Researchers Surprised That With AI, Toxicity is Harder To Fake Than Intelligence

Researchers from four universities have released a study revealing that AI models remain easily detectable in social media conversations despite optimization attempts. The team tested nine language models across Twitter/X, Bluesky and Reddit, developing classifiers that identified AI-generated replies at 70 to 80% accuracy rates. Overly polite emotional tone served as the most persistent indicator. The models consistently produced lower toxicity scores than authentic human posts across all three platforms.

Instruction-tuned models performed worse than their base counterparts at mimicking humans, and the 70-billion-parameter Llama 3.1 showed no advantage over smaller 8-billion-parameter versions. The researchers found a fundamental tension: models optimized to avoid detection strayed further from actual human responses semantically.

Researchers Surprised That With AI, Toxicity is Harder To Fake Than Intelligence

Published by admin on November 12, 2025

GrapheneOS Defends Data-Wiping Function That Blocked US Border Search

Amazon Trying to Launch a Global Satellite Cellphone Network In 2028

Nvidia, Tech Giants Launch AI Safety Initiative

Researchers Surprised That With AI, Toxicity is Harder To Fake Than Intelligence

Published by admin on November 12, 2025

Related Posts

GrapheneOS Defends Data-Wiping Function That Blocked US Border Search

Amazon Trying to Launch a Global Satellite Cellphone Network In 2028

Nvidia, Tech Giants Launch AI Safety Initiative