With artificial intelligence now embedded in everyday marketing workflows, new data from NP Digital’s AI Hallucinations and Accuracy Report reveals that AI inaccuracies are widespread. The report finds that 47.1% of marketers encounter AI errors several times per week, with 36.5% saying hallucinated or inaccurate AI-generated content has gone live.
The report examines which models are most prone to errors, the specific mistakes that occur most frequently and how those issues affect marketers’ day-to-day work. The data is drawn from an analysis of 600 prompts tested for accuracy across six major large language models (LLMs), including ChatGPT, Claude and Gemini, alongside a survey of 565 U.S.-based digital marketers.
“As AI becomes more deeply embedded in marketing workflows, accuracy can’t be treated as an afterthought,” said Ronnie Malewski, Managing Director at NP Digital Canada. “Our findings reinforce just how critical human review remains in an AI-driven environment. When errors and hallucinations go unchecked, they can quickly reach clients or the public, eroding trust and damaging brand credibility and performance. Marketers need stronger processes, better tools, and most importantly greater rigour and accountability to ensure AI supports growth without compromising trust.”
Key findings from the report include:
- Time-Intensive Fact-Checking: Nearly half of marketers (47.1%) say they encounter AI inaccuracies several times per week, and more than 70% spend one to five hours each week fact-checking AI-generated output.
- AI Errors Made Public: More than one-third of marketers (36.5%) report that hallucinated or incorrect AI content has been published publicly, most often due to false facts, broken citations or brand-unsafe language.
- Marketers Skip Human Review: Despite widespread awareness of hallucinations and the risk they pose, 23% of marketers say they feel comfortable using AI output without human review.
- Most Accurate Model: In NP Digital’s prompt accuracy analysis, ChatGPT delivered the highest rate of fully correct responses at 59.7%. However, no model consistently avoided errors, particularly on multi-part, niche or real-time questions.
- Most Common Hallucinations: The most common error types across models included omissions, outdated information, fabrication and misclassification, often delivered with high confidence.
- Tasks With Most Errors: AI errors are most common in tasks requiring structure or precision, such as HTML or schema creation, full content development and reporting.
The research highlights a clear takeaway for marketers: AI performs best when supported by strong prompts, stringent review processes and clear guidelines that keep humans in control. And with no single LLM emerging as error-free, the data underscores that consistent oversight is key to producing reliable outcomes.
See complete findings here: NP Digital’s AI Hallucinations and Accuracy Report