Innovation Insights
1. Four common pitfalls of Gen AI strategy and how to avoid them (Board of Innovation)
The author identifies four common pitfalls in generative AI strategy and offers guidance on avoiding them. The first is rushing to build custom AI tools before exploring foundation models and understanding what they can already do in a specific business context. The second is failing to connect high-level AI strategy with practical, task-level applications, leaving strategic goals detached from ground-level execution. The remaining pitfalls are narrowing AI strategy to chatbot applications and underestimating the organizational transformation that effective AI integration requires; avoiding both calls for a holistic approach that considers the entire operating model.
2. Generative AI has a clean-energy problem (The Economist)
The rapid growth of generative AI poses a significant challenge to a global energy system already strained by rising demand from other sectors. Data centers, which have traditionally consumed a modest share of global energy, face a surge in electricity demand driven by the power-hungry hardware behind AI, such as GPUs. This demand is growing just as electrification and zero-carbon energy goals intensify, and because renewable capacity cannot be expanded quickly enough under current logistical and financial constraints, the risk is greater reliance on fossil fuels.
3. Small language models and Edge AI
Small language models (SLMs) are gaining traction as a more efficient and adaptable alternative to large language models (LLMs), which are becoming increasingly resource-intensive and showing signs of a performance plateau. With fewer parameters and simpler designs, SLMs can be tailored for specific tasks, making them highly effective for targeted uses such as sentiment analysis and domain-specific question answering. These models not only reduce computational demand and cost but also enhance privacy and security, making them suitable for sensitive applications and potentially democratizing AI access across industries (VentureBeat).
The evolution of AI technology is mirroring the historical trajectory of computing hardware, transitioning from large, centralized systems to more specialized and localized applications known as “edge AI.” Just as computing power has dramatically increased and become more efficient—from the ENIAC to modern microchips—AI is also becoming more accessible and sustainable, moving processing closer to where data is generated, thereby enhancing efficiency and privacy. Edge AI optimizes processing by using smaller, specialized models that handle tasks directly on local devices, reducing latency and resource use, while posing new challenges in quality control and security governance (VentureBeat).
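To make the "small, task-specific model" idea concrete, here is a minimal sketch that runs a compact sentiment classifier locally with the Hugging Face transformers library; the checkpoint named below is a public example chosen for illustration, not one cited in the articles.

```python
# Minimal sketch: a small, task-specific model running locally on CPU.
# Assumes the `transformers` package is installed; the checkpoint is an
# illustrative public model, not one named in the source articles.
from transformers import pipeline

# A distilled sentiment classifier (tens of millions of parameters) is small
# enough for laptops or edge devices, unlike general-purpose LLMs.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The new edge deployment cut our inference latency in half."))
# Expected shape of output: [{'label': 'POSITIVE', 'score': ...}]
```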
4. How People Are Really Using GenAI (Harvard Business Review)
Researchers analyzed thousands of online comments to categorize uses of generative AI into six main themes: Technical Assistance & Troubleshooting (23%), Content Creation & Editing (22%), Personal & Professional Support (17%), Learning & Education (15%), Creativity & Recreation (13%), and Research, Analysis & Decision Making (10%). This breadth reflects the technology's integration into many aspects of both home and work life and its versatility in addressing diverse challenges. The data suggests that while generative AI is already enhancing productivity and creative work across many sectors, there remains significant room for growth and adoption worldwide. This article could help us identify business use cases for AI.
AI Innovations
1. AI for fashion suggestions
eBay is integrating an AI-powered “shop the look” feature into its iOS app, offering personalized fashion recommendations based on users’ shopping histories and preferences. This feature, which displays as an interactive carousel, will soon be available to users in the U.S. and UK who have viewed at least 10 fashion items on eBay over the past 180 days, with plans to extend support to Android devices later this year. The initiative is part of eBay’s broader use of its eBay.ai platform, which also supports AI-generated product listings and is part of a wider trend of e-commerce platforms like Meta and Amazon integrating AI tools to enhance shopping experiences (Mashable).
2. AI model releases
Google has launched Gemini 1.5 Pro, now available in public preview on Vertex AI. The model significantly expands the context it can process, handling between 128,000 and 1 million tokens, which supports sophisticated tasks such as analyzing extensive code libraries or maintaining detailed, long-running dialogues. Gemini 1.5 Pro is multilingual and multimodal, capable of processing text, images, videos, and now audio streams, making it versatile for applications such as media analysis and long-form audio transcription (TechCrunch).
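For readers who want to try it, below is a hedged sketch of calling the model through the Vertex AI Python SDK; the project ID and the exact preview model identifier are assumptions for illustration, so check your region's model catalog for the current name.

```python
# Hedged sketch of calling Gemini 1.5 Pro via the Vertex AI Python SDK.
# The project ID and model ID string are placeholders, not values from the article.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-gcp-project", location="us-central1")  # placeholder project

model = GenerativeModel("gemini-1.5-pro-preview-0409")  # assumed preview model ID
response = model.generate_content(
    "Summarize the key decisions in this meeting transcript: ..."
)
print(response.text)
```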
French AI startup Mistral has launched Mixtral 8x22B, an open-source large language model (LLM) designed to surpass its predecessor, Mixtral 8x7B, and compete with prominent AI models like OpenAI’s GPT-3.5 and Meta’s Llama 2. Mixtral 8x22B features a 65,000-token context window and a vast parameter count of 176 billion, highlighting its advanced capabilities in processing and decision-making (ZDNet).
OpenAI and Meta are poised to launch new AI models that incorporate reasoning and planning. These capabilities are intended to let applications such as chatbots carry out complex sequences of tasks and anticipate the consequences of their actions. Meta plans to integrate its upcoming Llama 3 model into consumer products such as WhatsApp and its Ray-Ban smart glasses, while OpenAI's forthcoming model, potentially GPT-5, is expected to handle more complex and extended tasks (Financial Times).
Google’s RecurrentGemma leverages linear recurrences from traditional recurrent neural networks (RNNs) to process sequential data efficiently by focusing on smaller segments at a time, thus reducing the computational load and memory usage compared to Transformer-based models. This model updates its hidden state as new data is processed, maintaining a continuous flow of information without the need for extensive intermediate data storage, making it particularly effective for long text sequences and suitable for deployment on resource-limited devices. By minimizing the reliance on high-powered GPUs and enabling more local processing, RecurrentGemma is well-positioned for edge computing applications, potentially transforming AI deployment in mobile and embedded systems while maintaining high processing efficiency (VentureBeat).
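As a rough illustration of why recurrence keeps memory use flat, the toy sketch below updates a fixed-size hidden state token by token; it is not RecurrentGemma's actual architecture (which combines gated linear recurrences with local attention), just the underlying idea.

```python
# Toy sketch of a linear recurrence: the state stays a fixed size no matter
# how long the input is, unlike a Transformer's growing key/value cache.
# This illustrates the general principle only, not RecurrentGemma itself.
import numpy as np

def linear_recurrence(inputs: np.ndarray, a: float, b: float) -> np.ndarray:
    """Fold a (seq_len, dim) sequence into a (dim,) state via h = a*h + b*x."""
    h = np.zeros(inputs.shape[1])
    for x_t in inputs:
        h = a * h + b * x_t  # decay the old state, mix in the new token
    return h

tokens = np.random.randn(10_000, 64)           # long sequence, small feature dim
state = linear_recurrence(tokens, a=0.9, b=0.1)
print(state.shape)                             # (64,) -- independent of sequence length
```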
3. Chips for AI
Apple is preparing to launch its next-generation M4 processors in at least three variants to upgrade all Mac models, with public availability planned to start late this year and continue into early next year. Despite a recent downturn in Mac sales, which fell 27% last fiscal year, the company is looking to reinvigorate its lineup by enhancing AI capabilities and integrating these features with its upcoming macOS version. The M4 chips, including the entry-level Donan, mid-range Brava, and high-end Hidra models, aim to bolster Apple's competitive edge in AI against tech giants like Microsoft and Google, while also addressing memory capacity limits in its high-end Mac desktops (Yahoo!).
Meta has introduced its next-generation custom chips designed to enhance AI workloads, which are set to improve the performance of ranking and recommendation models on platforms like Facebook and Instagram. These new chips are a crucial part of Meta’s expanded investment in AI infrastructure, aimed at supporting more sophisticated AI models and applications across its suite of technologies. The updated Meta Training and Inference Accelerator (MTIA) chips promise doubled compute and memory bandwidth, optimizing the efficiency for Meta’s specific AI needs and allowing better integration with future technological advancements (Meta).
Intel’s new Gaudi 3 AI accelerator chip aims to surpass Nvidia’s H100 in training large language models (LLMs) like GPT-3, boasting a 40% faster training time. The chip features a dual-die architecture with advanced memory and compute capabilities, including 48 megabytes of cache, matrix multiplication engines, and tensor processor cores, which collectively offer significant improvements over its predecessor, Gaudi 2. Although Intel’s Gaudi 3 shows promising performance and energy efficiency improvements, it faces competition from Nvidia’s forthcoming Blackwell B200 GPU, necessitating ongoing enhancements in memory technology and process nodes to stay competitive (IEEE Spectrum).
Arm has introduced its Ethos-U85 Neural Processing Unit (NPU) and the Corstone-320 IoT Reference Design Platform, targeting enhanced performance and integration for edge AI applications. These new technologies are designed to significantly boost the efficiency and capabilities of edge AI systems, such as smart home devices and factory automation, aiming to transform the Internet of Things with more advanced, energy-efficient computing power (VentureBeat).
AMD has expanded its Versal adaptive SoC lineup with the introduction of the Versal AI Edge Series Gen 2 and Versal Prime Series Gen 2 devices. These new programmable SoCs are designed to significantly improve the performance, power efficiency, and scalability of AI-driven embedded systems, boasting up to three times higher TOPS per watt and up to ten times more scalar compute than their predecessors, thereby setting new industry performance standards (VentureBeat).
4. Photo editing
Google has announced that its AI-powered editing tools, previously exclusive to Pixel devices and Google One subscribers, will now be available for free to all Google Photos users. These tools include Magic Eraser for removing unwanted objects, Photo Unblur for sharpening blurry images, and Portrait Light for adjusting lighting, among others. The rollout begins on May 15 and will reach users gradually, subject to minimum hardware requirements such as Android 8.0 or higher, iOS 15 or higher, or specific ChromeOS devices (TechCrunch).
5. AI music generator
Udio, a new AI music generator, has moved from closed beta to a public launch, allowing users to access its services for free on its website. The platform differentiates itself by generating customized songs from text prompts, with adjustable song length, vocals, and lyrics, and claims better ease of use, voice quality, and musicality than competitors such as Google's MusicFX and Stability AI's Stable Audio. Despite launch-day server delays that extended song-generation times, the final outputs were professional-sounding and richly detailed, suggesting a competitive edge in the AI music generation market (ZDNet).
Suno AI is a new tool that can generate songs in any musical style by taking lyrical prompts from users, but its ability to mimic existing artists’ sounds is raising concerns about potential copyright violations and undercutting human musicians. As AI music generation becomes more advanced, there are fears it could flood streaming platforms with machine-made songs that dilute royalties for human artists (The Guardian).
6. Virtual influencers
TikTok is developing a feature that lets advertisers and sellers on TikTok Shop use AI avatars as virtual influencers that read scripts to promote products. The initiative is still in testing and has so far proven less effective at generating e-commerce sales than human influencers. Concerns remain about how the technology might affect earning opportunities for human creators on the platform, especially following the discontinuation of TikTok's $1 billion creator fund (The Verge).
7. OpenAI API
OpenAI has updated its API so that vision requests now support JSON mode and function calling, simplifying integration for developers who previously had to stitch together separate models for text and image processing. This makes it easier to build applications that trigger automated actions such as sending emails or making purchases, though OpenAI strongly recommends adding user-confirmation steps. GPT-4 Turbo with Vision is already being used by companies such as Cognition for autonomous coding, Healthify for nutritional analysis of meal photos, and TLDraw for converting drawings into websites, even as it faces competition from newer AI models (VentureBeat).
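A hedged sketch of what such a mixed text-and-image request looks like with the OpenAI Python SDK follows; the prompt and image URL are placeholders inspired by the meal-photo use case above, not code from the article.

```python
# Sketch of a single API call that combines text and an image, as described above.
# Requires the `openai` package and an OPENAI_API_KEY in the environment;
# the image URL and prompt are placeholders for illustration.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the meal in this photo and estimate its calories."},
            {"type": "image_url", "image_url": {"url": "https://example.com/meal.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```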
8. Google Cloud Next 2024
At Cloud Next 2024, Google highlighted the expansion of its AI initiatives, including broader access to advanced models such as Gemini 1.5 Pro. Google Workspace is gaining new generative AI features across its productivity suite, including Google Vids, an AI-powered application that helps users write, produce, and edit videos collaboratively. Google Vids, set to join Workspace Labs in June 2024, is designed to support storytelling in the workplace by generating video storyboards, compiling scenes from stock footage, and selecting voiceovers. Alongside Google Vids, Workspace will also gain AI-driven add-ons for meetings, messaging, and security, including automatic note-taking in Google Meet, translation features, and sensitive-file classification in Google Drive, enhancing the platform's overall functionality and security. Separately, Imagen 2's new text-to-live-image feature produces short GIF-like animations, initially at 24 frames per second, a resolution of 360×640 pixels, and a length of four seconds (Google).
9. AI for the physical world
Archetype AI is a startup developing an AI system called Newton that can interpret data from various sensors and describe in plain language what is happening in the physical world, such as the status of a package shipment or a factory. By acting as a translation layer between humans and complex sensor data, Newton aims to help people easily understand and gain insights from the wealth of information captured about the physical environment. While holding great potential for optimizing operations, the creators acknowledge concerns around privacy and say they are focused on detecting behaviors rather than identifying individuals through the sensor inputs (Wired).
Meta has introduced OpenEQA, an open-source framework designed to enhance how AI agents perceive and understand the world, particularly for applications in home robots and smart glasses. The framework enables AI to use sensory inputs to interpret its environment and interact with humans in clear language, providing practical assistance based on environmental cues and episodic memory. Despite its potential, Meta acknowledges significant challenges for these vision+language models (VLMs): current models lack the spatial understanding needed to perform well and will require substantial improvements in perception and reasoning (ZDNet).
10. Search and retrieval
Snowflake has introduced Copilot, an intelligent SQL query assistant now in public preview, following its recent collaboration with Coda. Copilot leverages Snowflake's proprietary text-to-SQL model together with the Mistral Large language model, helping users generate SQL queries to simplify data exploration and understanding. The tool, which integrates within SQL worksheets, offers a conversational interface where users type questions in natural language and receive optimized SQL code, supporting data-driven decision-making across enterprises (VentureBeat).
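To make that interaction concrete, here is a purely hypothetical example of the kind of exchange a text-to-SQL assistant supports; the table and column names are invented for illustration, and this is not Snowflake Copilot's actual interface, which lives inside SQL worksheets.

```python
# Purely hypothetical illustration of a text-to-SQL exchange like the one
# described above. The table and column names are invented; this is not
# Copilot's actual API or output.
question = "Which five customers generated the most revenue last quarter?"

# The kind of SQL a text-to-SQL assistant would be expected to return:
generated_sql = """
SELECT customer_name,
       SUM(order_total) AS revenue
FROM orders
WHERE order_date >= DATEADD(quarter, -1, CURRENT_DATE)
GROUP BY customer_name
ORDER BY revenue DESC
LIMIT 5;
"""

print(f"Question: {question}\nGenerated SQL:{generated_sql}")
```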
The newly launched Rerank 3 model by Cohere is specifically designed to optimize enterprise search and Retrieval Augmented Generation (RAG) systems, boasting compatibility with any database or search index and integration into legacy applications with minimal latency impact. This model enhances search capabilities across diverse data formats—including multi-aspect, semi-structured data like emails and tables—and supports multilingual retrieval in over 100 languages, significantly improving search quality and operational efficiency. Rerank 3 also offers advanced features such as a 4k context length for better handling of longer documents and high precision semantic reranking to ensure only the most relevant data is used, reducing costs and improving the accuracy of generated responses in RAG applications (Cohere).
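Below is a hedged sketch of a rerank call with Cohere's Python SDK as it might appear in a RAG pipeline; the model identifier, sample documents, and API-key handling are assumptions for illustration rather than details from the announcement.

```python
# Hedged sketch: reranking a few candidate documents before passing the best
# ones to a generator, as in the RAG workflow described above. The model name
# and API key are placeholders/assumptions.
import cohere

co = cohere.Client("YOUR_API_KEY")  # placeholder key
results = co.rerank(
    model="rerank-english-v3.0",    # assumed Rerank 3 model identifier
    query="What is our refund policy for enterprise contracts?",
    documents=[
        "Refunds for enterprise contracts are prorated after 30 days.",
        "Our office hours are 9am to 5pm on weekdays.",
        "Enterprise invoices are issued quarterly.",
    ],
    top_n=2,
)
for r in results.results:
    print(r.index, round(r.relevance_score, 3))  # most relevant documents first
```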
11. Robotics
The field of robotics is experiencing a significant transformation, largely driven by the integration of AI, which could potentially replicate the disruptive effect seen with technologies like ChatGPT. Researchers are now using AI to enhance robots’ ability to learn and adapt to diverse environments such as homes, where previous robots struggled due to the unpredictability and variety of tasks. Innovations such as Stretch, a more affordable and adaptable robot, demonstrate this shift towards practical, everyday applications in domestic settings, highlighting a promising future where robots could become a common part of household environments (MIT Technology Review).
Other Innovations
1. Cancer fighting
Researchers have enhanced CAR T cells, a type of bioengineered immune cell used in cancer treatment, by boosting their levels of a protein that gives them stem-cell-like properties, leading to improved longevity and efficacy in fighting cancer. Two independent studies published in Nature found that these modified CAR T cells exhibit gene activity akin to stem cells, enabling them to better combat both solid tumors and blood cancers without becoming exhausted. This breakthrough opens new avenues for treating cancer more effectively with CAR T cells and is moving toward clinical trials that could see these rejuvenated cells tested in patients within two years (Nature).
2. Cloud-based quantum computing
Scientists at Oxford University Physics have developed a method called “blind quantum computing,” which enhances security and privacy for users engaging in cloud-based quantum computing, as detailed in their study published in Physical Review Letters. This breakthrough allows individuals and companies to perform large-scale quantum computations remotely without compromising the confidentiality of their data or algorithms, and even verify the accuracy of results. The approach, which uses a combination of quantum memory and photon detection, could lead to the development of devices that integrate securely with laptops, promoting wider access and use of quantum computing technologies (Oxford).
3. Spatial computing for business
Apple Vision Pro ushers in a new era of spatial computing for businesses, enabling customized workspaces, enhanced collaboration on 3D designs, and specialized training experiences that were previously unattainable. Built on visionOS, the platform uses high-resolution displays and powerful sensors to blend digital content with the physical world, providing a seamless interface for applications such as Microsoft 365 and SAP Analytics Cloud and enabling intricate tasks such as detailed 3D modeling and real-time data visualization. Apple positions the device as a way to improve productivity, support immersive training and guided work, and manage and secure devices at scale, with significant impacts promised across various industries (Apple).
4. AR for surgery
Dr. Alberto Rodriguez, a surgeon and CEO of Levita Magnetics, performed the first-ever augmented reality (AR) abdominal surgery using Meta's Quest 3 XR headset and Levita's MARS system, enabling a less invasive gallbladder removal with enhanced visibility and control for the surgical team. The FDA-approved MARS system combines magnets and robotics to minimize incisions, reducing patient pain and speeding recovery, with AR adding immersion and precision during surgery. Rodriguez plans to evaluate the benefits of AR in surgery through clinical trials focused on ergonomics and precision, and he highlights the potential for global collaboration and real-time data access during operations, which could transform surgical practice and outcomes (FoxNews).