rllama

Commit Graph

Author	SHA1	Message	Date
Mikko Juola	8c64313fec	Rewrite the matrix multiplication. This is something like ~10 times faster than the old one. But surprisingly this didn't have much impact on text generation time. Maybe most of the remaining slowness is no more from matrix multiplication. Also this slowed down CPU implementation. I think I'll try adding another kernel later for CPU OpenCL.	3 years ago
Mikko Juola	862d4a15d6	Add repetition penalty, add colors to outputs based on probabilities, try to make softmax() more numerically stable.	3 years ago
Mikko Juola	f4629ca987	Respect the stop token from the model.	3 years ago
Mikko Juola	de477314ed	Fix newlines not recognized when feeding newlines in the prompt. Tokenizer would misinterpret the newlines. In general, the non-printable control characters don't seem to be tokenized correctly at the moment. I added band-aid for newlines but should maybe fix the others too.	3 years ago
Mikko Juola	687bbf1249	Add instructions on how to use OpenCL in the README.md	3 years ago
Mikko Juola	8de18bdc77	Add screenshot to README.md.	3 years ago
Mikko Juola	a2e88c1193	Update README.md	3 years ago
Mikko Juola	b4d5cf91a7	Mention in README.md that using OpenCL does not cast weights to 32-bit floats.	3 years ago
Mikko Juola	99da6ed71a	Update README.md benchmarks for new attention OpenCL thing.	3 years ago
Mikko Juola	35b0c372a8	Implement some attention operations for OpenCL.	3 years ago
Mikko Juola	6e456e64f3	Add new benchmarks now that this is partially OpenCLified.	3 years ago
Mikko Juola	63d27dba90	Add partial OpenCL support, it's used in feed forward network only.	3 years ago
Mikko Juola	df079bceb0	Add records of my benchmarks to README.md so I can compare it later.	3 years ago
Mikko Juola	c9c861d199	Add some measurements so we can get tokens per second.	3 years ago
Mikko Juola	22792b26cc	Add an idea about on-disk cache for initial prompt processing (not for weights).	3 years ago
Mikko Juola	9087c50efa	Add notes about improving sampler to README.md	3 years ago
Mikko Juola	1a88482988	Add some OpenCL bits. I wrote an OpenCL matrix_mul_inplace_transposed. It is much faster than my CPU implementation for GPU, and also quite a lot faster on CPU (OpenCL runs on CPU and GPU) than my own implementation. Basically it can destroy all of my crappy code. So I think I will be replacing some of my other operations with this stuff in near future.	3 years ago
Mikko Juola	a92017bf56	Add some initial OpenCL stuff. I can copy tensors to GPU and back but not much more. Maybe next time I'll try implementing matrix_mul_transposed or something on the GPU.	3 years ago
Mikko Juola	53d367e6fa	Add some beginnings of OpenCL implementation. I think I'll try to get the smaller modules run faster.	3 years ago
Mikko Juola	846759b277	Optimize conversions to and from f16<->32. x86 cannot do f16 operations natively, but it does have an instruction to convert them to f32. I optimized those to use SIMD instructions.	3 years ago
Mikko Juola	8acb9f32b8	Update README.md for new discoveries.	3 years ago
Mikko Juola	26d5309cf7	Add support for bigger models. I've tested with 13B LLaMA model and it seems to work. There was a bug in unpickler that skipped over tuples of size 1. I had written bunch of code assuming there is no bug which I fixed and removed some unpickling code. I added functions to tensor.rs to be able construct tensors out of multiple files.	3 years ago
Mikko Juola	8a427bcb21	The project is actually called rllama, put that in readme.md.	3 years ago
Mikko Juola	18ef805458	Read parameters from model's JSON file instead of hard-coding them, make max sequence length configurable.	3 years ago
Mikko Juola	f103871bc0	Make the output colored. This is essential to be taken seriously. Also did some clippy happiness changes.	3 years ago
Mikko Juola	cd28aba5e2	Make the output look nicer.	3 years ago
Mikko Juola	d7a3f57510	Update README.md, add multithreading and optimizations to some operations, allow loading prompt from a file.	3 years ago
Mikko Juola	8bb9404168	Update README to clarify this is a Rust project and to show how to change temperature, top_k, top_p stuff.	3 years ago
Mikko Juola	f6217e0036	Add readme, make clippy happy.	3 years ago
Mikko Juola	3b8f904f13	First commit. LLaMA works now. It is not pretty but it does generate text from prompts. Yay.	3 years ago
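
Note on commit 862d4a15d6: the message says it tries to make softmax() more numerically stable but does not say how. One common approach, shown here only as a minimal sketch and not as the code from that commit, is to subtract the largest logit before exponentiating so that exp() cannot overflow. The function name softmax_stable and its signature are assumptions made for illustration.

```rust
/// Hypothetical helper (not taken from the rllama source): computes softmax
/// over a slice of logits, subtracting the maximum first so that every
/// exponent is <= 0 and exp() stays finite even for very large inputs.
fn softmax_stable(logits: &[f32]) -> Vec<f32> {
    if logits.is_empty() {
        return Vec::new();
    }
    // Largest logit; shifting by it does not change the softmax result.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    // Exponentiate the shifted values; the largest one becomes exp(0) = 1.
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    // The sum is at least 1, so the division below is safe.
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}
```

Calling softmax_stable(&[1000.0, 1001.0, 1002.0]) returns finite probabilities that sum to roughly 1, whereas exponentiating those values directly in f32 would overflow to infinity.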

30 Commits (8c64313fecebb42d07613fcc64ebd5aeebb50df9)
