AI models are prone to deteriorating performance
Recent research from prominent institutions, including Stanford University, UC Berkeley, and Princeton University, as well as teams at Google, has shed light on a troubling trend: after a period of notable advancement, the performance of large language models (LLMs), the powerhouse behind generative AI, may actually start to deteriorate.
This phenomenon has been particularly noticeable with ChatGPT, likely because its widespread popularity invites increased scrutiny. It remains uncertain whether its competitors, such as Google's Gemini and Anthropic's Claude, will undergo similar performance declines. I won't go into the technical reasons for declining LLM performance here. For a deeper dive into the scientific insights on this issue, I recommend checking out the latest publications from these research institutions and the resource list below.
For people keen to adopt or continue using Gen AI, the potential decline in performance raises crucial questions about the sustainability and future improvement of these AI systems. What can you do?
Domain expertise to the rescue, again
This issue underscores a fundamental truth about using AI or any data-driven technology: the importance of domain and sector experience. To use AI tools "appropriately"—that is, in ways that optimise your tasks while minimising potential harms—a thorough understanding of both the technology and its application context is essential.
Historically, the effective use of data techniques—be it in data science, analytics, or artificial intelligence—has always required a nuanced understanding of the relevant domain. Without this expertise, even the most advanced tools can lead to suboptimal outcomes or unintended consequences.
The bottom line? If you're not a domain expert, always engage one when using an AI tool. Or, tread very carefully...
For example, we encourage businesses using Gen AI for content and copy research to keep domain experts in the editorial loop, so that toxic or false information doesn't accidentally seep into copy and cause reputational harm.
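As a rough illustration, here is a minimal Python sketch of what such an editorial gate might look like in practice. The Draft class and the expert_review and publish functions are hypothetical names invented for this example, not part of any real library; the point is simply that generated copy is held back until a named domain expert has signed it off.

```python
from dataclasses import dataclass, field

@dataclass
class Draft:
    """A piece of AI-generated copy awaiting editorial sign-off."""
    text: str
    approved: bool = False
    reviewer_notes: list[str] = field(default_factory=list)

def expert_review(draft: Draft, reviewer: str, approve: bool, note: str = "") -> Draft:
    """Record a domain expert's verdict; nothing ships without approval."""
    draft.approved = approve
    if note:
        draft.reviewer_notes.append(f"{reviewer}: {note}")
    return draft

def publish(draft: Draft) -> None:
    """Refuse to publish any draft that lacks expert approval."""
    if not draft.approved:
        raise RuntimeError("Draft has not been approved by a domain expert.")
    print(draft.text)

# Usage: a dubious AI-generated claim is caught before it ships.
draft = Draft(text="Our new widget cures headaches.")
draft = expert_review(draft, reviewer="Dr. Reyes", approve=False,
                      note="Unsubstantiated medical claim; rewrite.")
try:
    publish(draft)
except RuntimeError as err:
    print(err)
```

However you implement it, the design choice that matters is the hard gate: publication should fail by default unless a human expert has explicitly approved the copy.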
For those looking to use AI effectively, consider exploring resources that offer best practices for AI implementation across industries. MIT Technology Review, for example, provides excellent insights into how businesses are integrating AI into their operations safely and effectively.
Finally, here's a list of research and articles on this topic:
- Lingjiao Chen and James Zou (Stanford) with Matei Zaharia (Berkeley), "How Is ChatGPT's Behavior Changing over Time?": https://arxiv.org/pdf/2307.09009.pdf
- Changmao Li and Jeffrey Flanigan (UC Santa Cruz) on why LLMs struggle with new data: https://arxiv.org/abs/2312.16337
- A general discussion of model collapse from TechTarget: https://www.techtarget.com/whatis/feature/Model-collapse-explained-How-synthetic-training-data-breaks-AI
- A discussion of the Stanford and Berkeley research in Search Engine Journal: https://www.searchenginejournal.com/chatgpt-quality-worsened/492145/
Love you. Bye!
Found this edition of Little Missions interesting?
Subscribe to get Little Missions delivered straight to your inbox.