Changing AI math could reduce the hardware burden, researchers show


Sophisticated AI models tend to require a lot of memory and take up a lot of storage space. One of the ways to reduce that footprint involves a process called quantization, which changes how model weights are represented and stored. But quantization has its drawbacks.
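Here is a minimal Python sketch of one common form of quantization, symmetric per-tensor int8. It is a generic textbook scheme chosen only for illustration. It is not presented as the method of any vendor or as the one covered in this story, and the function names `quantize_int8` and `dequantize_int8` are hypothetical. Each 4-byte float becomes a 1-byte integer plus one shared scale factor. The round trip exposes rounding error, which is one commonly cited cost of quantization.

```python
# Minimal sketch of symmetric, per-tensor int8 quantization.
# Generic illustrative scheme; function names are hypothetical.
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map float weights to int8 codes in [-127, 127] plus one float scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Guard against an all-zero tensor, which would otherwise give a zero scale.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.12, -0.5, 0.031, 0.999, -0.75]
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    # Each weight now needs 1 byte instead of 4, at the cost of rounding error.
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f} (error {abs(original - approx):.4f})")
```

The per-weight error is at most half the scale factor. Because the scale is set by the largest absolute weight, a tensor with a few large outliers loses more precision across all of its other values.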

Andrés Mac Allister, CEO and founder of The SEMQ Group, believes there's another way to make machine learning more efficient and less resource-intensive. Instead of compressing model weights (specifically embeddings), he contends you can separate the semantics (the meaning) from how that meaning is represented.

Model weights, including embeddings (which map tokens to vectors), are the numbers in a machine learning model that determine how strongly one piece of information relates to another. Taken all together, they reflect learned behavior.
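The toy Python sketch below illustrates an embedding. It stores one vector per token ID and looks vectors up by that ID. The vocabulary, the 4-dimensional vectors, and their values are all hypothetical. Real embeddings are learned during training and are far wider. The dot product is included only as one simple way to score how strongly two vectors relate.

```python
# Toy embedding table: one row of floats per token ID.
# Vocabulary, dimension, and values are hypothetical, for illustration only.
from typing import Dict, List

EMBEDDING_DIM = 4  # hypothetical; real models use far larger dimensions

vocab: Dict[str, int] = {"the": 0, "cat": 1, "sat": 2}

# Each row is the vector (a set of weights) for one token.
embedding_table: List[List[float]] = [
    [0.10, -0.20, 0.05, 0.30],   # "the"
    [0.70, 0.10, -0.40, 0.20],   # "cat"
    [-0.30, 0.60, 0.25, -0.10],  # "sat"
]


def embed(tokens: List[str]) -> List[List[float]]:
    """Map each token to its ID, then return that ID's row of the table."""
    return [embedding_table[vocab[t]] for t in tokens]


def dot(a: List[float], b: List[float]) -> float:
    """Dot product: one simple measure of how strongly two vectors relate."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    cat, sat = embed(["cat", "sat"])
    print(f"similarity(cat, sat) = {dot(cat, sat):.3f}")
```

Each vector is stored as plain floats here. The table's size grows with vocabulary size times vector width, which is why embeddings are a natural target for compression.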

These parameters are commonly represented in full precision (FP32), which requires 4 bytes per parameter. A 7B-parameter model at FP32 would need about...
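The storage arithmetic is parameter count multiplied by bytes per parameter. The Python sketch below assumes that simple model and counts weights only. It ignores activations, optimizer state, caches, and file-format overhead. The FP16, INT8, and INT4 entries are standard data widths added here for comparison and are not taken from the article. The helper names are hypothetical.

```python
# Rough weight-storage estimate: parameter count x bytes per parameter.
# Weights only: ignores activations, optimizer state, caches, and file overhead.
BYTES_PER_PARAM = {
    "fp32": 4,    # full precision
    "fp16": 2,    # half precision
    "int8": 1,    # 8-bit quantized
    "int4": 0.5,  # 4-bit quantized (two weights packed per byte)
}


def weight_bytes(num_params: int, dtype: str) -> float:
    """Return the bytes needed to store num_params weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_gib(num_bytes: float) -> float:
    """Convert bytes to gibibytes (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    num_params = 7_000_000_000  # the 7B example from the text
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {to_gib(weight_bytes(num_params, dtype)):.1f} GiB")
```

Because the estimate is linear in bytes per parameter, halving the width of each weight halves the storage for the weights, regardless of model size.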

Copyright of this story solely belongs to theregister.com.