rllama

Commit Graph

Author	SHA1	Message	Date
Mikko Juola	8aef5d8831	Rename to_gpu and to_cpu to to_gpu_inplace and to_cpu_inplace to make the use of the _inplace suffix consistent.	3 years ago
Mikko Juola	1c5ec04217	Add a different kernel to be used when the OpenCL device is a CPU. This is almost the same code I had before. It runs better on CPUs than on GPUs.	3 years ago
Mikko Juola	8c64313fec	Rewrite the matrix multiplication. This is roughly 10 times faster than the old one, but surprisingly it didn't have much impact on text generation time. Maybe most of the remaining slowness no longer comes from matrix multiplication. This also slowed down the CPU implementation. I think I'll try adding another kernel later for CPU OpenCL.	3 years ago
Mikko Juola	63d27dba90	Add partial OpenCL support; it's used in the feed forward network only.	3 years ago
Mikko Juola	1a88482988	Add some OpenCL bits. I wrote an OpenCL matrix_mul_inplace_transposed. On a GPU it is much faster than my CPU implementation, and on a CPU it is also quite a lot faster than my own implementation (OpenCL runs on both CPU and GPU). Basically it can destroy all of my crappy code. So I think I will be replacing some of my other operations with this stuff in the near future.	3 years ago
Mikko Juola	a92017bf56	Add some initial OpenCL stuff. I can copy tensors to GPU and back but not much more. Maybe next time I'll try implementing matrix_mul_transposed or something on the GPU.	3 years ago
Mikko Juola	53d367e6fa	Add some beginnings of an OpenCL implementation. I think I'll try to get the smaller modules to run faster.	3 years ago

7 Commits (8aef5d8831bf57e3ef11b964a9be108a3573de7b)