Mikko Juola
f233f8ad8f
Forgot to mark last benchmark at March 17
3 years ago
Mikko Juola
db0f22ed26
Update README.md, add a nice animation.
3 years ago
Mikko Juola
016b609481
More install instructions.
3 years ago
Mikko Juola
2666571e2b
Update README.md to show `rllama` is on crates.io now.
3 years ago
Mikko Juola
109171b50e
Mention that this is AMD64 only because of AVX2.
3 years ago
Mikko Juola
44e0abf0f1
Clarify that the OpenCL implementations all use f16.
3 years ago
Mikko Juola
58463458ee
Put benchmarks on top of README.md.
3 years ago
Mikko Juola
882ff05254
Update README.md for new benchmarks.
3 years ago
Mikko Juola
8134c20d57
We can now run in (mostly) f16 mode without any OpenCL. It's not the fastest way, but right now it looks like the most memory-friendly.
3 years ago
Mikko Juola
09f76dfcfa
Update README.md opening with new benchmark numbers.
3 years ago
Mikko Juola
4b8accee44
Update benchmarks.
3 years ago
Mikko Juola
862d4a15d6
Add repetition penalty, add colors to outputs based on probabilities, try to make softmax() more numerically stable.
3 years ago
Mikko Juola
f4629ca987
Respect the stop token from the model.
3 years ago
Mikko Juola
687bbf1249
Add instructions on how to use OpenCL in the README.md
3 years ago
Mikko Juola
8de18bdc77
Add screenshot to README.md.
3 years ago
Mikko Juola
a2e88c1193
Update README.md
3 years ago
Mikko Juola
b4d5cf91a7
Mention in README.md that using OpenCL does not cast weights to 32-bit floats.
3 years ago
Mikko Juola
99da6ed71a
Update README.md benchmarks for new attention OpenCL thing.
3 years ago
Mikko Juola
6e456e64f3
Add new benchmarks now that this is partially OpenCLified.
3 years ago
Mikko Juola
df079bceb0
Add records of my benchmarks to README.md so I can compare it later.
3 years ago
Mikko Juola
22792b26cc
Add an idea about on-disk cache for initial prompt processing (not for weights).
3 years ago
Mikko Juola
9087c50efa
Add notes about improving sampler to README.md
3 years ago
Mikko Juola
1a88482988
Add some OpenCL bits.
I wrote an OpenCL matrix_mul_inplace_transposed. On GPU it is much faster than
my CPU implementation, and even on CPU (OpenCL runs on both CPU and GPU) it is
quite a lot faster than my own implementation.
Basically it can destroy all of my crappy code. So I think I will be
replacing some of my other operations with this stuff in the near future.
3 years ago
Mikko Juola
8acb9f32b8
Update README.md for new discoveries.
3 years ago
Mikko Juola
26d5309cf7
Add support for bigger models.
I've tested with the 13B LLaMA model and it seems to work.
There was a bug in the unpickler that skipped over tuples of size 1. I had
written a bunch of code assuming there was no bug; I fixed the bug and removed
some unpickling code.
I added functions to tensor.rs to be able to construct tensors out of
multiple files.
3 years ago
Mikko Juola
8a427bcb21
The project is actually called rllama, put that in readme.md.
3 years ago
Mikko Juola
d7a3f57510
Update README.md, add multithreading and optimizations to some operations, allow loading prompt from a file.
3 years ago
Mikko Juola
8bb9404168
Update README to clarify this is a Rust project and to show how to change temperature, top_k, top_p stuff.
3 years ago
Mikko Juola
f6217e0036
Add readme, make clippy happy.
3 years ago
Mikko Juola
3b8f904f13
First commit. LLaMA works now. It is not pretty but it does generate text from prompts. Yay.
3 years ago