3 Commits (c9c861d199bd2d87d7e883e3087661c1e287f6c4)

Author SHA1 Message Date
Mikko Juola 1a88482988 Add some OpenCL bits.
I wrote an OpenCL matrix_mul_inplace_transposed. It is much faster than
my CPU implementation for GPU, and also quite a lot faster on CPU
(OpenCL runs on CPU and GPU) than my own implementation.

Basically it can destroy all of my crappy code. So I think I will be
replacing some of my other operations with this stuff in near future.
3 years ago
Mikko Juola a92017bf56 Add some initial OpenCL stuff.
I can copy tensors to GPU and back but not much more. Maybe next time
I'll try implementing matrix_mul_transposed or something on the GPU.
3 years ago
Mikko Juola 53d367e6fa Add some beginnings of OpenCL implementation.
I think I'll try to get the smaller modules run faster.
3 years ago