3 Commits (53d367e6fa02c9cdae20c4f7889a05a3efd92882)

Author SHA1 Message Date
Mikko Juola 53d367e6fa Add some beginnings of OpenCL implementation.
I think I'll try to get the smaller modules to run faster.
3 years ago
Mikko Juola 846759b277 Optimize conversions to and from f16<->f32.
x86 cannot do f16 operations natively, but it does have an instruction
to convert them to f32. I optimized those to use SIMD instructions.
3 years ago
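The conversion described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: it assumes the instruction in question is the F16C `vcvtph2ps` conversion (exposed in Rust as `_mm_cvtph_ps`), and the function names `f16_bits_to_f32`, `convert4_f16c`, and `f16_slice_to_f32` are hypothetical. Half-precision values are represented here as raw `u16` bit patterns, with a scalar fallback for CPUs or targets without F16C.

```rust
/// Scalar fallback: decode one IEEE 754 binary16 value given as raw bits.
/// (Hypothetical helper, not taken from the repository.)
pub fn f16_bits_to_f32(h: u16) -> f32 {
    let sign = ((h >> 15) & 1) as u32;
    let exp = ((h >> 10) & 0x1f) as u32;
    let frac = (h & 0x3ff) as u32;
    let bits = match (exp, frac) {
        (0, 0) => sign << 31, // signed zero
        (0, _) => {
            // Subnormal: value is frac * 2^-24, exactly representable in f32.
            let v = (frac as f32) * (1.0 / 16777216.0);
            return if sign == 1 { -v } else { v };
        }
        (0x1f, 0) => (sign << 31) | 0x7f80_0000, // infinity
        (0x1f, _) => (sign << 31) | 0x7fc0_0000 | (frac << 13), // NaN
        // Normal: rebias the exponent (127 - 15 = 112) and widen the fraction.
        _ => (sign << 31) | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(bits)
}

/// SIMD path: convert four f16 values at once with the F16C instruction.
/// Assumption: this is the kind of instruction the commit message refers to.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "f16c")]
unsafe fn convert4_f16c(src: &[u16; 4]) -> [f32; 4] {
    use std::arch::x86_64::{__m128i, _mm_cvtph_ps, _mm_loadl_epi64, _mm_storeu_ps};
    let mut out = [0.0f32; 4];
    unsafe {
        // Load 64 bits (four packed f16 values) into the low half of a register.
        let packed = _mm_loadl_epi64(src.as_ptr() as *const __m128i);
        // Widen the four f16 lanes to four f32 lanes.
        let wide = _mm_cvtph_ps(packed);
        _mm_storeu_ps(out.as_mut_ptr(), wide);
    }
    out
}

/// Convert a slice of f16 bit patterns to f32, using F16C when the CPU has it.
pub fn f16_slice_to_f32(src: &[u16], dst: &mut [f32]) {
    assert_eq!(src.len(), dst.len());
    let mut i = 0;
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("f16c") {
            while i + 4 <= src.len() {
                let chunk = [src[i], src[i + 1], src[i + 2], src[i + 3]];
                // Safe to call: the feature was checked at runtime just above.
                let out = unsafe { convert4_f16c(&chunk) };
                dst[i..i + 4].copy_from_slice(&out);
                i += 4;
            }
        }
    }
    // Remaining elements (or all of them, without F16C) use the scalar path.
    while i < src.len() {
        dst[i] = f16_bits_to_f32(src[i]);
        i += 1;
    }
}
```

Calling `f16_slice_to_f32(&src, &mut dst)` with equal-length slices fills `dst` with the widened values. The runtime feature check keeps the sketch usable on machines without F16C, at the cost of a branch per call rather than per element.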
Mikko Juola 3b8f904f13 First commit. LLaMA works now. It is not pretty but it does generate text from prompts. Yay.
3 years ago