I wonder what the intended use case is. One would think that if you want to work with reduced-precision data, you would already store the weights that way. Possibly it’s for accelerating intermediate processing where you mix different data types: for example, one operation could accumulate into FP32, and you could then use this feature to truncate the result for free. I don’t know, really. Apple invested a lot of effort into supporting heterogeneous inputs, where the two matrices can be of different types and are converted to a common format without any additional runtime cost. So I assume there is a reason for it, even if I don’t see it immediately.
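To make the “accumulate in FP32, then truncate for free” idea concrete, here is a rough NumPy sketch (this is illustrative only, not Apple’s API): the inputs are stored in FP16, the matrix product is accumulated in FP32 to limit rounding error, and the final result is cast back down to the storage type in one step.

```python
import numpy as np

# FP16 storage types for the inputs (e.g. pre-quantized weights).
a = np.random.rand(64, 64).astype(np.float16)
b = np.random.rand(64, 64).astype(np.float16)

# Accumulate the product in FP32 so intermediate sums keep precision...
acc = a.astype(np.float32) @ b.astype(np.float32)

# ...then truncate the finished result back to FP16. In a fused kernel
# this cast can happen on the way out of the accumulator, i.e. "for free".
out = acc.astype(np.float16)
```

In a real fused implementation the up-cast, accumulation, and down-cast would all happen inside one kernel, so no extra FP32 buffer is ever materialized; the NumPy version above just spells out the numeric behavior.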