Major Chatbots Fail Accuracy Tests on Elections and News, Triggering Global Warnings

Main Takeaway
A study by 22 public broadcasters found AI chatbots misrepresent news 45% of the time, prompting election officials in several countries to warn voters against relying on chatbots for election information.
How the Tests Were Conducted
Researchers from 22 international public service media organizations, including the BBC and DW, tested four major AI assistants: OpenAI's ChatGPT, Microsoft's Copilot, Google's Gemini, and Perplexity AI. The BBC gave these chatbots content from its website and asked them questions about the news. In parallel, AlgorithmWatch and CASM Technology evaluated chatbot responses to questions about German state elections in Thuringia, Saxony, and Brandenburg. Stanford researchers evaluated 15 large language models from OpenAI, Google, Meta, and DeepSeek on over 6,000 claims fact-checked by PolitiFact across 18 years.
The breadth of testing approaches shows how widely concern about chatbot reliability has spread internationally. No single methodology dominated: some tests used controlled content from partner news organizations, while others examined live responses to current political questions. The Dutch Data Protection Authority conducted its own comparison of ChatGPT, Gemini, Grok, and Le Chat ahead of the October 2025 Dutch parliamentary elections.
What the Data Actually Shows
The headline finding from the 22-broadcaster study: AI assistants misrepresent news content 45% of the time, regardless of language or territory. The BBC found that four major chatbots inaccurately summarized news stories when given direct access to source material. Stanford's preprint showed that even when tasked with fact-checking against a curated database, models struggled to consistently match PolitiFact's professional ratings.
Accuracy problems aren't uniform. AlgorithmWatch noted that some chatbots had improved in certain respects since 2024 but still fell short of their providers' promises to combat election misinformation. MIT researchers identified an additional concern: chatbots give less accurate and more dismissive answers to vulnerable users, compounding risks for marginalized communities. The Dutch study identified bias as a core problem across all tested systems.
Specific Failures in Election Contexts
Election information proved particularly treacherous territory. The BBC found chatbots giving misleading advice ahead of the Senedd (Welsh Parliament) election, with errors in candidate lists, constituency names, and policy details. AlgorithmWatch's German research showed continued unreliability for state election queries. A 2024 PBS report, based on findings from AI experts and bipartisan election officials, documented false and misleading information from popular chatbots that threatened to disenfranchise voters during U.S. presidential primaries.
U.S. government officials have responded with direct warnings. New York Attorney General Letitia James was among officials cautioning voters against relying on AI chatbots for election-related questions, as CNBC reported in November 2024. The Dutch Data Protection Authority issued an explicit advisory against using chatbots for voting advice, citing both unreliability and bias.
Why Technical Safeguards Keep Falling Short
Providers say their systems include safeguards against election misinformation. AlgorithmWatch found that Microsoft improved Copilot's election misinformation protections in German after researcher feedback. Yet barriers to data access restricted investigations into other chatbots, pointing to transparency gaps that hinder accountability.
Source selection presents another structural problem. Research from Ruhr University Bochum and the Max Planck Institute found that AI chatbots use different sources than Google search, often citing less-known websites. This divergence means chatbot answers rest on a different, sometimes less authoritative, information foundation than traditional search. One journalist's month-long experiment in using chatbots as a news source, reported by MENAFN, ended with Gemini inventing an entirely fictional news outlet to support a false claim about a Quebec school bus strike.
What Regulators and Platforms Are Doing Now
Responses have been fragmented. The Dutch Data Protection Authority conducted its own testing and issued public warnings. U.S. officials including state attorneys general have added chatbot warnings to broader election integrity communications. The BBC's head of news and current affairs stated that developers of these tools are "playing with fire," reflecting frustration among content creators whose work is being mangled by summarization systems.
Platform responses remain largely reactive. Microsoft adjusted Copilot after German researcher intervention. OpenAI, Google, Anthropic, and xAI have not announced systematic overhauls of election information handling in response to these specific findings. The recurring pattern: studies document failures, limited fixes follow for high-profile cases, and fundamental reliability issues persist across new election cycles.
What This Means for the Next Election Cycle
The timing raises the stakes. With U.S. midterms approaching and multiple European elections scheduled, chatbot inaccuracy shifts from a technical concern to an active democratic risk. A 45% misrepresentation rate from the broadcaster study means nearly half of the news answers from major AI tools contained errors, leaving users only slightly better than coin-flip odds of receiving an accurate summary.
For voters, the practical implication is stark: official election websites, verified candidate materials, and established news organizations remain more reliable than chatbot queries. For AI providers, the accumulated evidence from multiple independent studies suggests incremental safety improvements aren't keeping pace with deployment scale. The gap between marketing claims about responsible AI and measured performance on high-stakes topics continues to widen.
The Deeper Problem of Information Access
Beyond accuracy metrics lies a structural challenge. AlgorithmWatch noted that barriers to data access greatly restricted investigations, meaning much chatbot behavior remains unexamined. Without systematic transparency, independent verification of improvement claims becomes impossible.
The research also reveals a growing divergence in how information reaches users. Traditional search engines, for all their flaws, operate through mechanisms that researchers broadly understand and can audit. AI chatbots introduce opacity at the source-selection, citation, and synthesis stages. Stanford's finding that curated evidence can reduce, but not eliminate, fact-checking failures points toward partial mitigations rather than complete fixes. The trajectory suggests chatbots will remain prominent information intermediaries without being reliable ones, leaving users to navigate an increasingly confusing landscape without adequate guardrails.
Key Points
22-broadcaster study finds 45% news misrepresentation rate across major AI chatbots
Election queries trigger specific failures (wrong candidate lists, constituency names, and policy details), and chatbots have invented sources to back false claims
Officials in the US and the Netherlands, along with German election researchers, warn voters against chatbot election advice
MIT research shows chatbots give less accurate answers to vulnerable and marginalized users
Provider claims of improved safeguards outpace independently verified performance gains
Questions Answered
Which chatbots were tested?
Researchers tested ChatGPT, Google Gemini, Microsoft Copilot, Anthropic Claude, xAI Grok, Perplexity AI, and Le Chat across multiple studies, and Stanford's study also covered models from Meta and DeepSeek.
How often do AI assistants get the news wrong?
A study by 22 public broadcasters found AI assistants misrepresent news content 45% of the time, regardless of language or territory.
What kinds of errors were found?
Documented errors include incorrect candidate lists, wrong constituency names, fabricated policy details, and invented news sources to support false claims.
Have officials warned voters against using chatbots?
Yes. U.S. state attorneys general, the Dutch Data Protection Authority, and German election researchers have all issued public warnings against using chatbots for voting information.
Do chatbots answer all users equally well?
MIT research found chatbots provide less accurate and more dismissive answers to vulnerable and marginalized users, compounding existing information inequities.
Are providers fixing these problems?
Some limited improvements exist, such as Microsoft adjusting Copilot after German researcher feedback, but systematic fixes lag behind deployment scale and marketing claims.