OpenAI has announced its new flagship generative AI model, GPT-4o, where the "o" stands for "omni," indicating the model's ability to work with text, speech, and video. GPT-4o will be gradually introduced into the company's products for both developers and consumers over the next few weeks.
Breakthrough in multimodal AI
According to OpenAI CTO Mira Murati, GPT-4o offers GPT-4-level intelligence but improves on GPT-4's capabilities across multiple modalities and media. At a presentation at OpenAI's offices in San Francisco, Murati said:
"GPT-4o thinks through voice, text, and visuals. This is extremely important as we look to the future of human-machine interaction."
New ChatGPT features
GPT-4o significantly improves the functionality of OpenAI's chatbot, ChatGPT. The platform already offered a voice mode that used a text-to-speech model to read the chatbot's responses aloud. With GPT-4o, however, users can interact with ChatGPT more like an assistant. They can ask a question and interrupt ChatGPT while it is answering. The model responds in "real time" and can pick up on nuances in the user's voice, generating responses in different emotional tones, including singing.
Enhanced visual capabilities
GPT-4o also improves ChatGPT's image capabilities. Given a photo or a desktop screenshot, ChatGPT can now quickly answer related questions, from "What's going on in this program code?" to "What brand of shirt is this person wearing?"
The future of interaction
Murati added that these features will continue to evolve. Today, GPT-4o can translate a menu into another language; in the future, the model may be able to "watch" a live sports game and explain its rules. "We know that these models are getting more and more complex, but we want to make interacting with them more natural and easier so that users can focus on the ChatGPT experience rather than the interface," Murati said. "Over the past few years, we've focused on improving the intelligence of these models... But now we're taking a huge step forward in terms of usability."
Multilingual capabilities and accessibility
GPT-4o also performs better in approximately 50 languages. In the OpenAI API and Microsoft's Azure OpenAI Service, GPT-4o is twice as fast as GPT-4 Turbo, costs half as much, and has higher rate limits.
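For developers budgeting API usage, the "half the cost" claim can be turned into a rough estimate. The sketch below is illustrative only: the per-token price and the monthly token volume are hypothetical placeholders, not actual OpenAI rates, and `estimate_cost` is a helper name invented for this example.

```python
def estimate_cost(tokens: int, price_per_million: float) -> float:
    """Return the dollar cost of processing `tokens` tokens at a flat rate."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical placeholder price, NOT an actual OpenAI rate.
GPT4_TURBO_PRICE = 10.0  # dollars per million tokens (assumed)

# "Half the cost" of GPT-4 Turbo, as stated in the article.
GPT4O_PRICE = GPT4_TURBO_PRICE / 2

# Assumed monthly workload, chosen only for illustration.
monthly_tokens = 50_000_000

turbo_cost = estimate_cost(monthly_tokens, GPT4_TURBO_PRICE)
gpt4o_cost = estimate_cost(monthly_tokens, GPT4O_PRICE)

print(f"GPT-4 Turbo: ${turbo_cost:,.2f}")
print(f"GPT-4o:      ${gpt4o_cost:,.2f}")
print(f"Savings:     ${turbo_cost - gpt4o_cost:,.2f}")
```

Actual bills depend on real published prices, which typically differ for input and output tokens, so a flat rate like this is only a first approximation.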
GPT-4o voice capabilities are currently not available to all API customers. Citing the risk of abuse, OpenAI plans to initially roll out support for the new audio capabilities to a "small group of trusted partners" in the coming weeks.
GPT-4o is available in ChatGPT's free plan starting today, and subscribers to ChatGPT Plus and Team premium plans will receive "5x higher" message limits. ChatGPT's enhanced voice experience, based on GPT-4o, will be available in alpha for Plus users next month, along with enterprise-focused options.
Revamped ChatGPT UI and new features
OpenAI also announced an update to the ChatGPT UI on the web version with a new, more conversational home screen and message layout, as well as a desktop version of ChatGPT for macOS that allows users to ask questions with keyboard shortcuts or discuss screenshots. ChatGPT Plus users will be the first to have access to the app starting today, with the Windows version coming later this year.
In addition, the GPT Store, OpenAI's library of third-party chatbots built on its AI models and the tools for creating them, is now available to ChatGPT free users. Free users can also take advantage of ChatGPT features that were previously reserved for paying subscribers, such as the ability to save preferences for future interactions, upload files and photos, and search the web for answers to questions.