Chinese researchers unveil LLaVA-o1 to challenge OpenAI’s o1 model
VentureBeat
OpenAI’s o1 model has shown that inference-time scaling (using more compute during inference) can significantly boost a language model’s reasoning abilities. LLaVA-o1, a new model developed by researchers from multiple universities in China, brings this paradigm to open-source vision language models (VLMs).
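OpenAI has not published how o1 spends its extra inference-time compute. The sketch below therefore illustrates the general idea with a different, well-known technique instead: self-consistency, which samples several answers to the same question and keeps the most common one. The "model" is a hypothetical stand-in (`toy_model`) that answers correctly 60% of the time. The names `majority_vote`, `toy_model`, and `accuracy` are illustrative and do not come from o1 or LLaVA-o1.

```python
import random
from collections import Counter


def majority_vote(answers):
    """Return the most common answer; ties go to the answer seen first."""
    # Counter.most_common orders equal counts by first appearance (Python 3.7+).
    return Counter(answers).most_common(1)[0][0]


def toy_model(rng, p_correct=0.6, correct="A", distractors=("B", "C", "D")):
    """Hypothetical stand-in for one sampled model answer (not a real model)."""
    if rng.random() < p_correct:
        return correct
    return rng.choice(distractors)


def accuracy(num_samples, trials=2000, seed=0):
    """Fraction of questions answered correctly when voting over num_samples samples."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    hits = 0
    for _ in range(trials):
        answers = [toy_model(rng) for _ in range(num_samples)]
        if majority_vote(answers) == "A":
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(f"{n:>2} samples per question -> accuracy {accuracy(n):.3f}")
```

Running it should show accuracy rising as more samples are drawn per question, which is the basic trade of extra inference compute for better answers. It says nothing about how o1 or LLaVA-o1 actually allocate that compute.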
Early open-source VLMs typically use a direct prediction approach, generating answers without reasoning about the prompt or the steps required to solve it. Without a structured reasoning process, they are less effective at tasks that require logical reasoning. Advanced prompting techniques such as chain-of-thought (CoT) prompting, where the model is encouraged to generate intermediate reasoning steps, yield only marginal improvements, and VLMs still often produce errors or hallucinate.
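To make the contrast between direct prediction and chain-of-thought prompting concrete, here is a minimal sketch of the two prompt styles. It assumes a text-only prompt (the image a VLM would also receive is omitted) and a hypothetical convention that the model ends its output with a line beginning `Answer:`. The function names are illustrative. "Let's think step by step" is the widely used zero-shot CoT cue, not something specific to LLaVA-o1.

```python
def direct_prompt(question: str) -> str:
    # Direct prediction: ask for the answer only, with no intermediate steps.
    return f"Question: {question}\nAnswer with a single word or number."


def cot_prompt(question: str) -> str:
    # Zero-shot chain-of-thought: the cue asks the model to write out its
    # reasoning before committing to a final answer.
    return (
        f"Question: {question}\n"
        "Let's think step by step, then give the final answer on a line "
        "starting with 'Answer:'."
    )


def extract_final_answer(completion: str) -> str:
    # Keep the text after the last 'Answer:' marker (hypothetical convention);
    # fall back to the whole completion if the marker is missing.
    marker = "Answer:"
    if marker in completion:
        return completion.rsplit(marker, 1)[1].strip()
    return completion.strip()


if __name__ == "__main__":
    q = "How many cats are in the picture?"
    print(direct_prompt(q))
    print(cot_prompt(q))
    print(extract_final_answer("Two cats on the sofa, one on the floor.\nAnswer: 3"))
```

The only difference between the two prompts is the instruction to reason first; as the article notes, that cue alone gives VLMs limited gains.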
The researchers identified a key problem: the reasoning process in existing VLMs is not sufficiently systematic or structured. The models do not generate reasoning chains and often get stuck in ...
Copyright of this story solely belongs to VentureBeat.