The Shifting Landscape of AI Safety: As Capabilities Advance, Do Our Frameworks Keep Pace?
In the rapidly evolving world of artificial intelligence, the relationship between innovation and safety has become increasingly complex. As AI systems grow more capable, the frameworks designed to ensure their safe development must evolve in parallel. Yet a critical question emerges: Are our safety approaches keeping pace with technological advancement, or are we witnessing a gradual erosion of safeguards in the race to build increasingly powerful systems?
Recent developments across the AI landscape—from OpenAI’s revised safety framework to DeepMind’s scientific vision for AGI, from novel applications in consumer devices to international governance efforts—paint a picture of an industry in flux, struggling to balance competitive pressures with responsible development. This tension lies at the heart of today’s AI ecosystem and will shape how these technologies impact our future.

The Evolution of OpenAI’s Safety Approach
OpenAI recently updated its “Preparedness Framework,” the document that details how the company monitors its AI models for potentially catastrophic risks. While the update includes several improvements, one significant change has raised eyebrows among safety experts: the company will no longer assess its AI models before release for risks related to persuasion and manipulation—capabilities that could potentially influence elections or enable effective propaganda campaigns.
Instead of treating these risks as core technical concerns requiring pre-deployment testing, OpenAI will now address them through terms of service restrictions and post-deployment monitoring. The company has also indicated a willingness to release models even if they present “high risk,” provided appropriate mitigation steps are taken. Perhaps most notably, OpenAI stated it might consider releasing a “critical risk” model if a competitor has already released something similar.
This shift in approach has divided the AI safety community. Some experts commend OpenAI for voluntarily publishing its framework and note improvements like clearer risk categories. Others, including former OpenAI safety researcher Steven Adler, have criticized the reduction in safety commitments, particularly the removal of persuasion risks as a core concern.
“OpenAI appears to be shifting its approach,” observed Shyam Krishna, a research leader in AI policy and governance at RAND Europe. “Instead of treating persuasion as a core risk category, it may now be addressed either as a higher-level societal and regulatory issue or integrated into OpenAI’s existing guidelines on model development and usage restrictions.”
More critical voices, like Courtney Radsch, a senior fellow at Brookings, called the framework “another example of the technology sector’s hubris,” noting that downgrading persuasion risks “ignores context – for example, persuasion may be existentially dangerous to individuals such as children or those with low AI literacy or in authoritarian states and societies.”
But others defend OpenAI’s approach, suggesting that addressing persuasion risks through terms of service is reasonable, given the challenges of evaluating these risks in pre-deployment testing. Unlike more concrete dangers—such as biological weapons risks or cybersecurity vulnerabilities—persuasion capabilities exist on a spectrum and are contextually dependent.
Perhaps most concerning is the statement that if “another frontier AI developer releases a high-risk system without comparable safeguards, we may adjust our requirements.” This has prompted fears of a “race to the bottom” in safety standards, where competitive pressures drive companies to prioritize capabilities over safeguards.
“They’re basically signaling that none of what they say about AI safety is carved in stone,” said AI critic Gary Marcus. “What really governs their decisions is competitive pressure—not safety.”
DeepMind’s Scientific Vision for AGI
While OpenAI’s approach appears increasingly influenced by market forces, DeepMind CEO Demis Hassabis continues to frame his company’s pursuit of artificial general intelligence (AGI) primarily in scientific terms. Fresh off winning the 2024 Nobel Prize in Chemistry for AlphaFold—an AI system that can predict protein structures with unprecedented accuracy—Hassabis envisions AGI as a scientific tool that could help solve humanity’s greatest challenges.
“I think some of the biggest problems that face us today as a society, whether that’s climate or disease, will be helped by AI solutions,” Hassabis says. “I’d be very worried about society today if I didn’t know that something as transformative as AI was coming down the line.”
For Hassabis, AGI represents the ultimate scientific instrument—a technology that could not only solve existing problems but generate entirely new explanations for the universe. His definition of AGI centers on scientific discovery: a system that could develop theories as groundbreaking as Einstein’s general relativity using only the information available in Einstein’s time.
This vision sets Hassabis apart from other industry leaders. While OpenAI defines AGI in economic terms—as a technology that can perform most economically valuable tasks better than humans—Hassabis frames it as a path to scientific enlightenment. “I identify myself as a scientist first and foremost,” he says. “The whole reason I’m doing everything I’ve done in my life is in the pursuit of knowledge and trying to understand the world around us.”
Yet even as Hassabis maintains this scientific idealism, he must navigate the realities of corporate ownership and geopolitical tension. When Google acquired DeepMind in 2014, Hassabis insisted on a contractual firewall explicitly prohibiting his technology from being used for military applications. That protection has since disappeared, and DeepMind’s AI systems are now being sold to militaries, including Israel’s.
Hassabis frames this change not as a compromise to access Google’s resources, but as a pragmatic response to geopolitical reality: “I think we can’t take for granted anymore that democratic values are going to win out.” This justification, whether sincere or expedient, raises questions about what other principles might be sacrificed as AGI approaches reality.
Hassabis believes AGI could arrive within five to ten years—a relatively conservative timeline by industry standards—and acknowledges two primary categories of risk. First are the capabilities risks: the possibility that AI systems could enhance bad actors’ abilities to cause harm, such as by synthesizing deadly pathogens. Second are what might be called alignment risks: the possibility that increasingly autonomous AI systems might deceive their creators or act against human interests.
What truly concerns Hassabis, however, are the coordination challenges. Even if well-intentioned companies build safe AGI systems, that doesn’t prevent the creation and proliferation of unsafe alternatives. International cooperation becomes essential, yet increasingly difficult in an era of rising geopolitical tensions, particularly between the U.S. and China.
Despite these challenges, Hassabis remains optimistic about AGI’s potential to create abundance rather than scarcity. “In the limited-resource world which we’re in, things ultimately become zero-sum,” he says. “What I’m thinking about is a world where it’s not a zero-sum game anymore, at least from a resource perspective.”
Novel Applications Expanding AI’s Reach
As leading AI labs debate safety frameworks and AGI timelines, novel applications continue to expand AI’s footprint across domains previously considered beyond reach. One striking example is Google’s DolphinGemma, an AI model designed to analyze dolphin communication patterns.
Developed in collaboration with the Georgia Institute of Technology and the Wild Dolphin Project, DolphinGemma represents a significant step in applying language model technology to interspecies communication. The system doesn’t aim to directly translate dolphin speech but rather to recognize patterns in vocalizations and predict likely responses—similar to how human-focused language models anticipate the next word in a sentence.
The foundation for this research is a dataset collected by the Wild Dolphin Project, which has conducted underwater studies of Atlantic spotted dolphins in the Bahamas since 1985. By correlating specific vocalizations—signature whistles, burst-pulse squawks, and buzzing clicks—with social contexts, researchers have created a framework for DolphinGemma’s pattern recognition capabilities.
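The analogy to next-word prediction can be made concrete with a deliberately simplified sketch. The Python below builds a tiny transition model over hand-labeled vocalization types; the token names and sequences are invented for illustration, and DolphinGemma itself learns from audio representations at far greater scale rather than from symbolic labels like these.

```python
from collections import Counter, defaultdict

# Toy sequences of labeled vocalization types (invented for illustration only).
training_sequences = [
    ["signature_whistle", "burst_pulse", "buzzing_click", "burst_pulse"],
    ["signature_whistle", "burst_pulse", "burst_pulse"],
    ["burst_pulse", "buzzing_click", "burst_pulse", "signature_whistle"],
]

# Count how often each vocalization type follows another.
transitions = defaultdict(Counter)
for seq in training_sequences:
    for current, nxt in zip(seq, seq[1:]):
        transitions[current][nxt] += 1

def predict_next(vocalization: str) -> str:
    """Return the vocalization most often observed after the given one."""
    counts = transitions.get(vocalization)
    if not counts:
        return "<unknown>"
    return counts.most_common(1)[0][0]

print(predict_next("signature_whistle"))  # prints "burst_pulse" for this toy data
```

Running the sketch simply prints the vocalization most often observed after a signature whistle in the toy data: the same "predict what comes next" framing the researchers describe, at a vastly smaller scale.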
What makes this application particularly notable is its deployment method: the model is small enough to run on Pixel smartphones, allowing researchers to conduct real-time analysis in the field without requiring large-scale computing infrastructure. Google plans to release DolphinGemma as an open model this summer, enabling researchers to adapt it for other cetacean species.
Meanwhile, AI is also being integrated more deeply into consumer devices. Perplexity, an AI search and assistant company, is reportedly partnering with Motorola to feature its AI assistant in the upcoming Razr foldable phone. According to Bloomberg, Perplexity Assistant will be offered alongside Google’s Gemini as an option, with a special user interface designed to encourage customer engagement.
This integration represents another frontier in AI deployment: the embedding of sophisticated AI assistants directly into consumer devices, with capabilities that extend beyond traditional virtual assistants. Perplexity is also reportedly in early talks with Samsung about similar integrations and working with T-Mobile’s parent company on an “AI Phone” with agents that could handle complex tasks without requiring users to interact with individual apps.
These applications—from interspecies communication to advanced consumer assistants—illustrate how AI capabilities continue to expand into new domains. Each new application creates novel governance and safety challenges, as these systems interact with users and environments in increasingly complex ways.
The Global Governance Response
As AI capabilities advance and applications diversify, governance frameworks are evolving to address the associated risks. In May 2024, the Council of Europe adopted what it claims is the first binding international treaty on artificial intelligence, establishing a legal framework covering the entire lifecycle of AI systems.
Unlike the European Union’s AI Act, which applies only to EU member states, this treaty can be signed by countries outside Europe. Indeed, eleven non-member states—including Argentina, Israel, Japan, the United States, and Uruguay—participated in drafting the convention.
The treaty was opened for signature on 5 September 2024 in Vilnius, Lithuania. Andorra, Georgia, Iceland, Montenegro, Norway, the Republic of Moldova, San Marino, and the United Kingdom signed it, as did Israel, the United States of America, and the European Union. In April 2025, Japan and Canada signed the treaty as well.
The treaty aims to ensure “transparency and oversight requirements tailored to specific contexts and risks,” including the identification of AI-generated content. Signatories must ensure accountability for adverse impacts and respect for equality, privacy rights, and non-discrimination principles.
“It is a response to the need for an international legal standard supported by states in different continents which share the same values to harness the benefits of Artificial intelligence, while mitigating the risks,” said Council of Europe Secretary-General Marija Pejčinović Burić.
This development represents a significant step toward international cooperation on AI governance—a necessary complement to the corporate frameworks and national regulations being developed across the globe. Yet significant challenges remain, particularly in harmonizing governance approaches across different regions with varying cultural, economic, and political contexts.
The effectiveness of such treaties ultimately depends on widespread adoption and enforcement, particularly by the nations and companies at the forefront of AI development. As competitive pressures intensify, maintaining robust governance standards may become increasingly difficult without coordinated global action.
The Next Generation of AI Models
While governance frameworks continue to evolve, AI capabilities are advancing at a remarkable pace. OpenAI recently released its latest models—o3 and o4-mini—which the company describes as capable of “thinking with images.” These models can understand and analyze user sketches and diagrams, even those of low quality, integrating visual information directly into their reasoning processes.
OpenAI claims these are the first AI models that don’t just “see” images but can “integrate visual information directly into the reasoning chain.” The o3 model is specifically optimized for mathematics, coding, science, and image understanding, while o4-mini operates faster and at lower cost. Both models can independently use all ChatGPT tools—web browsing, Python coding, image understanding, and image generation—to solve complex, multi-step problems.
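For developers, this image-reasoning capability is exposed through the same interface used for text. The sketch below shows one plausible way to send a whiteboard diagram to o4-mini using the OpenAI Python SDK's standard image-input format; the file path and prompt are placeholders, and the exact options supported by these models may differ from what is shown here.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode a local sketch or diagram as a base64 data URL (placeholder file path).
with open("whiteboard_sketch.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Ask the model to reason about the diagram alongside a text prompt.
response = client.chat.completions.create(
    model="o4-mini",  # model name as reported; availability and parameters may vary
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Explain the system drawn in this sketch."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```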
This represents another significant advancement in AI capabilities, moving systems closer to the kind of multi-modal reasoning characteristic of human cognition. As models become more sophisticated in their ability to process and reason with different types of information, they potentially become more capable of the kinds of persuasion and manipulation that OpenAI no longer evaluates before deployment.
OpenAI emphasized that both models underwent its “most rigorous safety program to date” and referenced its updated Preparedness Framework—the same framework that no longer considers persuasion risks during pre-deployment evaluation. This juxtaposition of advancing capabilities and evolving safety standards exemplifies the tension at the heart of today’s AI development landscape.
Finding Balance in an Accelerating Field
The developments across the AI landscape—from OpenAI’s revised safety framework to DeepMind’s scientific vision, from novel applications to governance efforts—reveal an industry at a critical inflection point. As capabilities advance, the frameworks designed to ensure safety and responsible development are evolving, sometimes in concerning directions.
OpenAI’s decision to downgrade persuasion risks in its safety evaluation process comes just as its models gain enhanced reasoning capabilities that could potentially make them more persuasive. DeepMind’s Hassabis maintains his scientific idealism while making pragmatic concessions to geopolitical realities. Novel applications expand AI’s reach into new domains, creating governance challenges that existing frameworks may be ill-equipped to address.
The central question remains: Are our safety approaches keeping pace with technological advancement? The evidence suggests a complex answer. In some areas, such as international governance, we see promising developments like the Council of Europe treaty. In others, like corporate safety frameworks, competitive pressures appear to be driving a reduction in pre-deployment safeguards.
As we navigate this critical period in AI development, maintaining robust safety standards while enabling beneficial innovation will require unprecedented coordination among companies, governments, and civil society. The decisions made today—about safety frameworks, governance structures, and deployment strategies—will shape not just the trajectory of AI development but potentially the future of human society.
The responsibility falls not just on AI developers but on all stakeholders to ensure that as AI capabilities advance, our commitment to safe and beneficial development advances in parallel. In this rapidly evolving landscape, vigilance and engagement from diverse perspectives will be essential to realizing AI’s potential while mitigating its risks.
References
- Fortune. (2025, April 16). OpenAI updated its safety framework—but no longer sees mass manipulation and disinformation as a critical risk. https://fortune.com/2025/04/16/openai-safety-framework-manipulation-deception-critical-risk/
- Time. (2025). Demis Hassabis Is Preparing for AI’s Endgame. https://time.com/7277608/demis-hassabis-interview-time100-2025/
- Newsweek. (2025, April 14). Google Launches AI That Talks to Dolphins. https://www.newsweek.com/dolphingemma-google-ai-dolphins-2059955
- The Verge. (2025). Perplexity is reportedly key to Motorola’s next Razr. https://www.theverge.com/news/650585/perplexity-ai-samsung-motorola-razr-assistant
- Euronews. (2024, May 17). Council of Europe adopts first binding international AI treaty. https://www.euronews.com/next/2024/05/17/council-of-europe-adopts-first-binding-international-ai-treaty
- CNBC. (2025, April 16). OpenAI says newest AI model can ‘think with images,’ understanding diagrams and sketches. https://www.cnbc.com/2025/04/16/openai-releases-most-advanced-ai-model-yet-o3-o4-mini-reasoning-images.html