rllama

Commit Graph

Author	SHA1	Message	Date
Mikko Juola	40121e1c82	Multithread the k4 * f32 matrix multiplication.	3 years ago
Mikko Juola	b8946da2d8	Implement matrix multiplication for 4-bit * 32-bit floats. As of this commit, test works. But I want to optimize this a bit, seeing if increasing load instruction : arithmetic instruction ratio will make single-threaded performance a bit speedier.	3 years ago
Mikko Juola	9c86c17318	Refactor all SIMD to one file, simd_support.rs. This should make it a bit easier to port to other SIMD instruction sets when the SIMD instructions are not littered randomly around the tensor.rs file.	3 years ago

3 Commits (40121e1c82e00a3f9567c7af66d632b3d41fca22)