Executive Brief: May 18

AI assistants with audio and visual capabilities

Highlights

1. Research shows that many leaders lack a full understanding of their employees’ AI usage and readiness.
2. GPT-4o from OpenAI features lower costs, faster responses, and conversational and multimodal capabilities.
3. Google’s Project Astra is a new AI assistant that can understand and respond to visual and spoken inputs, making sense of what a smartphone camera sees and engaging in natural language conversations.
4. Google and OpenAI’s developments highlight a shift towards more sophisticated AI assistants that combine audio, visual, and text data.
5. Google’s Veo, an AI video-generation model, may compete with OpenAI’s Sora.
6. Stanford researchers have developed a prototype of lightweight AR glasses without the bulkiness of traditional AR headsets.

Innovation Insights

1. What companies don’t know about how workers use AI (Harvard Business Review)
Research shows that many leaders lack a full understanding of their employees’ AI usage and readiness, and there is widespread distrust among Americans regarding businesses’ use of AI. To bridge these gaps, leaders should measure current AI usage, empower managers to build trust, and adopt a purpose-led AI strategy aligned with the company’s mission rather than relying on restrictive, fear-based approaches.

2. Today’s AI models are impressive. Teams of them will be formidable (The Economist)
Large language models (LLMs) working as multi-agent systems (MAS) demonstrate enhanced capabilities and intelligence, tackling complex tasks like trip planning and strategic games through collaborative and independent decision-making. These AI teams can refine solutions and minimize misinformation, yet they also present new risks, such as the potential for illogical solutions and for misuse in scenarios like phishing or identity theft. Microsoft and other entities are developing frameworks like AutoGen and Camel to facilitate MAS creation, emphasizing the need for robust security measures as these systems evolve and potentially access sensitive data.

3. Want to make the most of generative AI? Use your imagination (MIT Sloan)
Microsoft’s chief scientist Jaime Teevan argues that leaders should consider the broader potential of generative AI beyond productivity gains. In her keynote at the MIT Sloan AI & ML Conference, Teevan discussed the transformative impact of GPT-4 on work, emphasizing its ability to enhance efficiency, assist inexperienced workers, and facilitate new ways of collaborating and generating ideas. She also projected that generative AI would revolutionize online search by offering personalized feedback and interactive engagement, and she urged more imaginative applications of the technology.

AI Innovations  

1. Google vs. OpenAI
OpenAI has introduced GPT-4o, which features lower costs, faster responses, conversational abilities, and a range of new capabilities including 3D generation, image reading, language translation, and emotion recognition (OpenAI) (OpenAI Demo) (Khan Demo). OpenAI also introduced Google Drive and OneDrive integration, as well as interactive tables and charts (OpenAI).

Google’s Project Astra is a new AI assistant that can understand and respond to visual and spoken inputs, making sense of what a smartphone camera sees and engaging in natural language conversations. Announced shortly after OpenAI revealed similar advancements for ChatGPT, Astra uses Google’s advanced Gemini Ultra model to deliver multimodal capabilities, including recognizing objects, analyzing code, and recalling information. The developments from both Google and OpenAI highlight a shift towards more sophisticated AI assistants that combine audio, visual, and text data (Wired). Google also introduced Veo, an AI video-synthesis model capable of creating high-definition videos from text, image, or video prompts, akin to OpenAI’s Sora (Ars Technica).

Android devices are getting deeper integration with Gemini, including exclusive features like an upgraded Circle to Search tool that now identifies products and solves math equations. Gemini 1.5 Pro and Flash, now enhanced for tasks such as translation, coding, and reasoning, are available in global preview and will officially launch in June. Google’s lightweight language model, Nano, will be incorporated into the Chrome browser, offering on-device AI capabilities like text generation (ZDNet).

2. Prompt generator
Anthropic has introduced a Prompt Generator tool for its business and API users. The tool helps automatically create optimal prompts using natural language when performing tasks with its Claude models. It uses advanced prompt techniques like chain-of-thought reasoning (Anthropic).

3. Tutoring
Duolingo is expanding a generative AI tool that lets users interact with a human-like AI tutor (Wall Street Journal).

4. New model
The Falcon 2 series by the Technology Innovation Institute (TII) includes models with 180 billion parameters. These models, such as Falcon 180B, are designed for diverse applications, offering open access to researchers and commercial users while outperforming competitors like Meta’s LLaMA 2 in various benchmarks (Falcon).

5. AI narration
ElevenLabs introduced its inaugural consumer app, Reader: AI Audio, which can read web pages, PDFs, and other documents using 11 distinct voices (VentureBeat).

Other Innovations  

1. AR glasses
Researchers led by Stanford Professor Gordon Wetzstein have developed a prototype of lightweight AR glasses using metasurface waveguides and AI-driven holography, offering a seamless integration of digital and real-world visuals without the bulkiness of traditional AR/VR headsets. This breakthrough allows the glasses to control light paths and generate complex, real-time visual experiences with the help of AI. The technology is still in the prototype stage (ZDNet).

2. Robots
The Unitree G1 is a $16,000 humanoid robot designed for various tasks, equipped with 360-degree joints, 3D LiDAR sensors, a depth camera, and human-like hands. It is trained through reinforcement learning and can perform tasks such as smashing walnuts and household chores, making it a versatile tool for both home and industrial environments (New Atlas).

3. iPhone and iPad
Apple announced new assistive features for iPhones and iPads, including built-in eye-tracking, customizable vocal shortcuts, music haptics, and motion cues for vehicle use. These updates enhance navigation using gaze, improve voice control, provide haptic feedback for music, and alleviate motion sickness (Engadget).

4. 3D display
Japanese researchers have developed “Fairy Lights,” a 3-D midair plasma display created with femtosecond lasers that is safe to touch. This display uses ultra-fast laser pulses to ionize air molecules, producing glowing voxels that can form moving images and provide haptic feedback, allowing users to feel the plasma when they interact with it (IEEE Spectrum).
 
5. Obesity drugs
Researchers have developed a new obesity drug candidate that combines a gut-hormone mimic with a small molecule to target both GLP-1 and NMDA receptors, showing enhanced weight-loss effects in mice and rats. This innovative approach could potentially outperform existing treatments and is expected to enter clinical trials by 2025 (Nature).

A long-term trial of the weight-loss drug Wegovy (semaglutide) showed that initial weight loss, averaging 10.2% of body weight, was maintained for up to four years (Ars Technica).

6. DNA technology for identifying victims
Rapid DNA analysis has revolutionized the identification process in mass disasters, offering quick and reliable results, as demonstrated in the aftermath of the deadly Maui fires, where victims were identified within hours. This technology significantly reduces the time families wait for closure, in contrast to traditional DNA methods that often take months or years (MIT Technology Review).
