3 Commits (2f3e9bc0f5e6e482efdaea40c53cfbab02ef9687)

Author SHA1 Message Date
Mikko Juola 2f3e9bc0f5 K4 bit inference works now. Performance isn't as good as I'd like it to be though. 3 years ago
Mikko Juola b8946da2d8 Implement matrix multiplication for 4-bit * 32-bit floats.
As of this commit, the test passes. I still want to optimize this a bit, to see
whether increasing the ratio of load instructions to arithmetic instructions
makes single-threaded performance faster.
3 years ago
Mikko Juola f6249e8d9f Add skeleton code for 4-bit quantization.
The type is now recognized, and there is a very simple quantizer too, but no
operations are implemented yet.
3 years ago