The AI Digest
China's new AI knocks GPT-4o off its throne
Welcome, AI Enthusiasts!
In today’s AI newsletter:
→ AI Newsflash
→ Google finally fixes Gemini's image problems
→ China's new Qwen2-VL beats GPT-4o
→ OpenAI and Anthropic partner with US gov
→ AI start-up reaches a 100M-token context window
→ The revolutionary AI phone agent
→ 5 Best video generation and editing tools
Reading time: 11 minutes
AI Newsflash
Meta reported significant growth for its Llama AI models, with downloads nearing 350 million and usage surging 10x since January.
Nous Research released the Hermes Function Calling V1 dataset aimed at training AI models in function calling and structured output capabilities.
Nvidia and Apple are reportedly in talks to join OpenAI's funding round alongside Microsoft, a round that could value the AI startup at over $100 billion.
California lawmakers approved a bill proposing sweeping AI regulations, including mandatory safety testing and possible legal consequences for harmful AI systems.
Yale University announced a $150 million investment over 5 years to bolster AI research, development, and education throughout the institution.
Codeium raised $150 million in Series C funding at a $1.25 billion valuation, reaching unicorn status less than two years after its launch.
Playground launched a new AI-powered graphic design tool that allows users to create logos, social media designs, t-shirts, and more—all for free.
As you might recall, in February Google paused Gemini's ability to generate images of people following complaints about inaccuracies. For example, when asked to depict "a Roman legion," the AI produced a diverse array of soldiers, and when prompted with "Zulu warriors," it generated stereotypical representations of Black figures.
Google CEO Sundar Pichai apologised, and DeepMind’s co-founder, Demis Hassabis, promised a quick fix.
Clearly, it took longer than expected.
Now, users on paid Gemini plans (Advanced, Business, or Enterprise) can once again generate images of people, though for now only as an early-access test and only in English. Google has not announced when this feature will roll out to free users or to other languages.
Here’s what you should know:
→ Generating images of people is back, but only for paid users in early access.
→ Imagen 3, the latest model, aims to create fairer, more diverse images and will be available to all users.
→ Google introduces "Gems," custom AI experts for premium users, though these aren’t shareable for now.
Better late than never?
Google's latest model, Imagen 3, is designed to produce more equitable images by improving the diversity of its training data. While Google has been vague about the specifics, it claims that extensive testing has reduced the likelihood of undesirable results.
Soon, all Gemini users will gain access to Imagen 3, although generating people remains exclusive to premium tiers. To address concerns about deepfakes, Imagen 3 incorporates SynthID, an invisible watermarking technology developed by DeepMind.
Additionally, Google is rolling out "Gems" for premium users: custom versions of Gemini designed to serve as topic experts, similar to OpenAI's GPTs. These Gems can assist with tasks like brainstorming, project planning, and writing captions, and will be available on both desktop and mobile in 150 countries. However, unlike GPTs, Gems cannot be shared, as Google is currently focusing on how users apply them for creativity and productivity.
Alibaba just introduced Qwen2-VL, a new vision-language AI model that surpasses GPT-4o on several benchmarks, particularly in document comprehension and multilingual text-image understanding.
The details:
→ Qwen2-VL can process images of various resolutions and ratios, as well as videos over 20 minutes long.
→ The model shines in complex tasks like college-level problem-solving, mathematical reasoning, and document analysis.
→ It also supports multilingual text understanding in images, covering most European languages, as well as Japanese, Korean, Arabic, and Vietnamese.
→ You can try Qwen2-VL on Hugging Face, with more information on the official announcement blog.
P.S. Remember to avoid sharing personal or traceable data when interacting with the AI assistant. It’s a product from China after all, so stay cautious!
Why it matters: Another strong contender has entered the cutting-edge AI model scene, this time from China's Alibaba. Qwen2-VL’s ability to handle diverse visual inputs and multilingual queries could pave the way for more advanced, globally accessible AI applications.
OPENAI & ANTHROPIC
OpenAI and Anthropic partner with the US gov
OpenAI and Anthropic just signed groundbreaking agreements with the U.S. Artificial Intelligence Safety Institute, allowing the government to access and test their AI models before they're publicly released.
The details:
→ The U.S. AI Safety Institute will get access to major new models from both companies, both before and after their public release.
→ This collaboration marks a significant step toward AI regulation and safety, enabling the U.S. government to evaluate AI models’ capabilities and associated risks.
→ The institute will give OpenAI and Anthropic feedback on potential safety improvements to their models.
→ These agreements come at a time when AI companies are under increasing regulatory scrutiny, highlighted by California’s recent passage of a broad AI regulation bill.
Why it matters: The world's two most prominent AI companies are now giving the U.S. government early access to their unreleased models. This move could fundamentally change how AI is developed, tested, and deployed globally, with significant implications for innovation, safety, and international competition in the AI space, whether for better or worse.
Magic just developed LTM-2-mini, a model capable of processing 100 million tokens of context (equivalent to about 10 million lines of code or 750 novels), and partnered with Google Cloud to build advanced AI supercomputers.
The details:
→ LTM-2-mini can process and understand 100 million tokens of context during inference, surpassing current models by 50x.
→ The model’s innovative algorithm processes long sequences of data 1000x more efficiently than the current top-performing AI models.
→ Magic is partnering with Google Cloud to build supercomputers powered by Nvidia’s latest and most advanced GPUs.
→ The company has raised over $450 million in total funding, including a recent $320 million investment round.
Why it matters: This breakthrough in context length enables AI agents to process and reason over dense, complex codebases, vast databases, and years of conversation history in a single inference. It’s a significant step toward creating AI assistants with near-perfect recall and memory.
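To put those numbers in perspective, here is a quick back-of-the-envelope check using only the figures quoted above. The per-unit ratios are derived by simple division, not official figures from Magic, and the "current models" context size is just what the quoted 50x claim implies.

```python
# Back-of-the-envelope check of the equivalences quoted for LTM-2-mini.
# Inputs are the newsletter's own figures; the ratios below are derived, not official.
CONTEXT_TOKENS = 100_000_000   # claimed context window
LINES_OF_CODE = 10_000_000     # "about 10 million lines of code"
NOVELS = 750                   # "750 novels"
CONTEXT_MULTIPLE = 50          # "surpassing current models by 50x"

tokens_per_line = CONTEXT_TOKENS / LINES_OF_CODE        # implied tokens per line of code
tokens_per_novel = CONTEXT_TOKENS / NOVELS              # implied tokens per novel
implied_current_context = CONTEXT_TOKENS / CONTEXT_MULTIPLE  # context the 50x claim implies for today's models

print(f"{tokens_per_line:.0f} tokens per line of code")
print(f"{tokens_per_novel:,.0f} tokens per novel")
print(f"{implied_current_context:,.0f} tokens for current models")
```

In other words, the comparison assumes roughly 10 tokens per line of code, about 133,000 tokens per novel, and a current state of the art of around 2 million tokens.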
NLPearl, an AI-driven phone agent, is redefining customer communication with its ability to handle complex conversations across multiple industries. Offering an adaptive, lifelike experience that surpasses traditional automation, it stands out in the market. Here’s what this AI brings to the table:
→ Natural Voice Interactions: Using advanced technology, NLPearl delivers conversations that feel real, capturing human tones, pauses, emotions, and the nuances of genuine dialogue for a seamless experience.
→ Continuous Learning: Powered by deep learning, NLPearl refines its responses with every interaction, adapting to both industry specifics and individual business needs.
→ Multilingual Capabilities: Communicates effortlessly in multiple languages and accents, making it perfect for global businesses.
→ Automated Functions: Efficiently manages tasks like scheduling, payments, and CRM workflows, all in real time and fully scalable.
→ Data-Driven Insights: Analyzes call data to provide actionable insights, helping businesses boost engagement and improve customer experiences.
Ready to experience the future of communication?
5 Best AI video generation and editing tools
InVideo - Offers user-friendly tools for creating videos with AI-powered templates, stock footage, and advanced editing features.
DeepBrain - Utilizes AI to simplify video creation with realistic avatars and text-to-speech capabilities, perfect for dynamic video presentations.
Vidnoz - Provides a comprehensive suite of video editing tools enhanced with AI for animations, transitions, and voiceovers.
CapCut - A popular video editing app featuring AI tools for effortless video creation and editing, including effects, transitions, and filters.
HeyGen - Delivers AI-driven video creation tools, including facial animation and automatic video editing, to streamline the video production process.
That’s a wrap!
We had a lot to talk about, so let's wrap it up for now. If you have any questions, feel free to shoot us an e-mail and we will get back to you within 24 hours.
If you have specific feedback or anything interesting you’d like to share, please let us know by replying to this e-mail. Cya!