Meta Introduces Spirit LM open source model that combines text and speech inputs/outputs


Image credit: VentureBeat, made with ChatGPT

Just in time for Halloween 2024, Meta has unveiled Meta Spirit LM, the company’s first open-source multimodal language model capable of seamlessly integrating text and speech inputs and outputs.

As such, it competes directly with OpenAI’s GPT-4o (also natively multimodal) and other multimodal models such as Hume’s EVI 2, as well as dedicated text-to-speech and speech-to-text offerings such as ElevenLabs.

Designed by Meta’s Fundamental AI Research (FAIR) team, Spirit LM aims to address the limitations of existing AI voice experiences by offering more expressive and natural-sounding speech generation, while learning cross-modal tasks such as automatic speech recognition (ASR), text-to-speech (TTS), and speech classification.

Unfortunately for entrepreneurs and business leaders, the model is currently available only for non-commercial use under Meta’s FAIR Noncommercial Research License, which grants users the right to use, reproduce, modify, and create derivative works of ...


Copyright of this story belongs solely to VentureBeat.