Automation

AI’s risk: Big tech’s bold moves, strange missteps & the search for safety

As AI becomes central to search, decision-making, and even creative work, the question isn’t just whether these models can perform, but how much risk they carry when they fail.

Big tech often makes moves that confuse. Last year, OpenAI moved away from its non-profit structure, pleasing investors by becoming more like a conventional startup. At the same time, however, the AI giant disbanded its superalignment team, which focused on the long-term risks of AI. As a consumer of its products, I'm still wondering where OpenAI stands on AI risk.

And AI risk is significant.

AI models have been hallucinating a lot. Recently, Google's Gemini refused to generate images of white people, especially white men; instead, users found themselves generating images of Black popes and female Nazi soldiers. Google's effort to make its LLM less biased had backfired, and the company apologized and paused the feature.

Google’s “AI Overview” feature told users they could use glue to stick cheese to pizza, that “geologists recommend humans eat one rock per day”, and that they should “drink a couple of liters of light-colored urine” to pass kidney stones. A few months ago, Google also upset Indian IT minister Rajeev Chandrasekhar when Gemini gave a biased opinion about Prime Minister Narendra Modi. And Gemini inaccurately depicted people of color in Nazi-era uniforms, showcasing historically inaccurate and insensitive images.

In another incident, Microsoft’s Bing chat told a New York Times reporter to leave his wife.

Experts like Joscha Bach say Gemini’s behaviour reflects the social processes and prompts fed into it rather than being purely algorithmic. According to MIT Technology Review, the models that power AI search engines simply predict the next word (or token) in a sequence, which makes them appear fluent but also leaves them prone to making things up. They have no ground truth to rely on; they choose each word purely on the basis of a statistical calculation. Worst of all, there may be no way to fix this, which is a good reason not to trust AI search engines blindly.
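To see why fluency and truth come apart, here is a minimal sketch of that next-word-prediction idea, using a hypothetical toy probability table rather than a real model. The model picks whichever word is statistically likeliest to come next; nothing in the procedure checks whether the result is true.

```python
def next_token(context, probabilities):
    """Return the statistically likeliest next word for a context.

    `probabilities` is a toy lookup table standing in for a trained
    model's learned distribution; a real LLM computes these numbers
    from billions of parameters, but the selection logic is the same.
    """
    candidates = probabilities.get(context, {})
    # Pure statistics: the highest-probability word wins, true or not.
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical learned probabilities: after "eat one rock per",
# "day" is simply the likeliest continuation in the training data.
toy_probs = {
    "eat one rock per": {"day": 0.7, "week": 0.2, "year": 0.1},
}

print(next_token("eat one rock per", toy_probs))  # prints "day"
```

The point of the sketch: "day" is chosen because it is probable, not because eating rocks is advisable. There is no step where the model consults a ground truth.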

According to research, people trust the advice of AI ethics advisors just as much as that of human ethics advisors, and they assign the same responsibility to both. Is that trust well founded, though?

Earlier this month, a group of researchers from multiple universities argued that LLM agents should be evaluated primarily on the basis of their riskiness, not just how well they perform. In real-world, application-driven environments, especially with AI agents, unreliability, hallucinations, and brittleness are ruinous. One wrong move could spell disaster when money or safety is on the line.
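A hypothetical illustration of the difference (not the researchers' actual metric): weighting an agent's failures by what they cost, rather than scoring raw accuracy alone.

```python
def expected_loss(success_rate, cost_per_failure):
    """Expected cost per task when each failure carries a real-world price.

    Both arguments are illustrative: success_rate is the fraction of
    tasks the agent completes correctly, cost_per_failure the damage
    (money, safety) done by one mistake.
    """
    return (1 - success_rate) * cost_per_failure


# An agent that is right 99% of the time looks excellent on accuracy,
# yet still loses about $1,000 per task on average if a single failure
# costs $100,000.
print(expected_loss(0.99, 100_000))
```

On a leaderboard, 99% accuracy wins; by expected loss, the same agent may be unusable, which is the shift in evaluation the researchers argue for.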

What if we could measure this risk?

A UK project called Safeguarded AI aims to build AI systems that can provide quantitative guarantees, such as risk scores, about AI’s effect on the real world. The project seeks to build AI safety mechanisms by combining scientific world models with mathematical proofs. The proofs would explain the AI’s work, while humans verify whether the model’s safety checks are correct. In August, Yoshua Bengio, known as a ‘godfather’ of AI, joined the project, a sign of how crucial such work is.

A paper from OpenAI shows that a little bad training can make AI models go rogue. Researchers discovered that fine-tuning a model (in their case, OpenAI’s GPT-4o) on code containing security vulnerabilities could cause the model to respond with hateful, obscene, or otherwise harmful content, even when the user’s prompts are completely benign.

Fortunately, the researchers found that this problem is generally easy to fix: they could detect evidence of this so-called misalignment and even shift the model back to its regular state through additional fine-tuning on accurate information.

AI’s growing influence demands more than clever features and flashy launches; it calls for accountability. As AI reshapes industries and everyday life, the path forward must balance ambition with caution. After all, progress without safeguards doesn’t just confuse; it can endanger the very trust these technologies depend on.

Navanwita Bora Sachdev

Navanwita is the editor of The Tech Panda who also frequently publishes stories in news outlets such as The Indian Express, Entrepreneur India, and The Business Standard
