08.11.2024
Reading time: 4-5 minutes

How to avoid an AI crisis and develop systems that won't become liabilities [2025 update]

Software Improvement Group


Last updated on 02-09-2025

Artificial Intelligence (AI) is transforming the world at an unprecedented pace.

Yet, beneath the surface, many AI systems face severe quality challenges that threaten their long-term success.

To truly embrace the opportunities, it’s important to understand the foundation of AI systems.

After all, while impressive, they are not magic.

So, what are they?

According to ISO/IEC 5338, co-developed by Software Improvement Group (SIG), AI is classified as a software system with unique characteristics.

AI is software, but what sets AI apart from traditional software is that it doesn’t just follow fixed rules. Instead, it learns by analyzing large datasets, identifying patterns, and applying that knowledge to make informed decisions, solve problems, or answer questions. AI’s unique features often lead to misunderstandings and significant risks, including security threats and even potential harm. To manage these risks, AI needs strong engineering practices and proper regulation.

In this article you’ll learn:

  • The most common quality pitfalls in AI and big data systems.
  • Why these issues occur, from education gaps to siloed teams.
  • Actionable recommendations to avoid a crisis and ensure your AI remains maintainable, secure, and adaptable.

On average, AI/big data systems score lower on build quality than other software. In fact, in our latest 2025 State of Software report, 73% scored below average.

The build quality problem

Our dataset of AI/big data systems was compiled by selecting systems that revolve around statistical analysis or machine learning, based on the technologies used (e.g. R and TensorFlow) and documentation.

Our benchmark results show that most AI/big data systems have a low build quality score. Fortunately, the research also highlights that building high-quality, maintainable AI/big data systems is possible, with some systems bucking the trend and achieving high build quality. This prompts the question: where do things typically go wrong?

Why do AI systems falter?

Engineering robust, future-proof AI systems is still a relatively new field.
We see many organizations struggling to transition AI from experimental projects in the lab to scalable, secure, compliant, and maintainable real-world applications. The engineering challenges stem from how AI engineers—such as data scientists—are traditionally managed and trained.

Their focus is often on quickly creating insights and models, not on building systems that are secure, reliable, maintainable, reusable, easy to transfer, and testable.
When examining low-scoring AI and big data systems, a consistent pattern emerges that confirms this assessment: minimal testing, sparse documentation, and frequent security, privacy, and scalability gaps.

These weaknesses often stem from a deeper structural issue—long, complex code segments that are hard to analyze, modify, reuse, and test.

In the sections that follow, we’ll break down the most common root causes of these challenges, from “lab programming” habits to team silos, and show how to address them before they trigger a full-blown AI quality crisis.

Long and complex code segments are hard to analyze, modify, reuse, and test

The more responsibilities a single block of code takes on, the more decision paths it contains, making comprehensive test coverage unrealistic.

This is reflected in the strikingly small amount of test code found in AI systems. In a typical AI/big data system, only 1.5% of the code is test code, compared to 43% in traditional systems.
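
To illustrate the path explosion (a minimal sketch with invented names, not code from the systems we measured): a single routine with three independent options already has 2^3 = 8 paths to cover, whereas focused functions can each be tested in isolation.

    # One block with three independent flags: 2**3 = 8 path combinations.
    def prepare(values, impute=True, clip=True, scale=True):
        if impute:
            values = [v if v is not None else 0.0 for v in values]
        if clip:
            values = [min(max(v, 0.0), 100.0) for v in values]
        if scale:
            values = [v / 100.0 for v in values]
        return values

    # The same steps as focused functions: each is trivially testable,
    # and the number of tests grows linearly instead of combinatorially.
    def impute_missing(values):
        return [v if v is not None else 0.0 for v in values]

    def clip_range(values, lo=0.0, hi=100.0):
        return [min(max(v, lo), hi) for v in values]

    def rescale(values, factor=100.0):
        return [v / factor for v in values]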

Low build quality drives up change costs and raises the risk of undetected errors. Over time, as data and requirements change, adjustments are typically ‘patched on’ rather than properly integrated, making things even more complicated. Furthermore, transferring the system to another team becomes less feasible. In other words, typical AI/big data code tends to become a burden.

Why do long and complex pieces of code occur?  

These issues usually stem from unfocused code—code that serves multiple purposes without a clear separation of responsibilities—and a lack of abstraction.

Useful pieces of code are often duplicated rather than isolated into separate functions, leading to problems with adaptability and readability.
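
A small before-and-after sketch (the data and logic are invented for illustration). Note that the duplicated version below also hides a modelling mistake: the test set should be scaled with the training set’s minimum and maximum.

    train = [4.0, 10.0, 16.0]
    test = [7.0, 13.0]

    # Duplicated across scripts, a common "lab" pattern:
    train_scaled = [(x - min(train)) / (max(train) - min(train)) for x in train]
    test_scaled = [(x - min(test)) / (max(test) - min(test)) for x in test]

    # Isolated into one function, the logic has a single place to fix,
    # reuse, and test, and the choice of statistics becomes explicit:
    def min_max_scale(values, lo, hi):
        return [(v - lo) / (hi - lo) for v in values]

    lo, hi = min(train), max(train)
    train_scaled = min_max_scale(train, lo, hi)
    test_scaled = min_max_scale(test, lo, hi)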

Added to this is the lack of functional test code (‘unit tests’) mentioned above.

One reason for this is that AI engineers tend to rely on integration tests only, which measure the accuracy of the AI model. If the model performs poorly, it may indicate a fault.

This approach has two issues: 

  1. The lack of functional test code makes it unclear where the problem lies.
  2. The model might perform well, but a hidden fault could prevent it from achieving even better results. For example, a model predicting drink sales using weather reports might score 80% accuracy.

If an error causes the temperature to always read zero, the model can’t reach its full potential of a 95% score. Without proper test code, such errors remain undetected.
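
Such a fault is exactly what a small unit test would catch. A minimal sketch (the loader, record format, and names are invented for this example; the test runs with pytest):

    # Hypothetical feature loader for the drink-sales model above.
    def load_temperatures(records):
        # Bug: the wrong key silently yields 0.0 for every record.
        return [r.get("temp", 0.0) for r in records]

    # A focused unit test that exposes the fault long before model
    # accuracy figures would hint at it:
    def test_temperatures_are_not_constant_zero():
        records = [{"temperature_c": 21.5}, {"temperature_c": 18.0}]
        temps = load_temperatures(records)
        assert any(t != 0.0 for t in temps), "temperature reads as constant zero"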

Five key causes of AI quality issues

We see the following as underlying causes of AI/big data quality issues:  

1. Lab programming

Generally speaking, data scientists excel at ad hoc experimentation to develop working AI models, without intending to deliver long-term production solutions. Once a model works, there is little incentive to improve the code, which then lacks sufficient testing and risks unnoticed malfunctions when changes are made. A working model, and the code that leads to it, should be maintainable, transferable, scalable, secure, and robust.

2. Data science education

Data science education often focuses more on data science than on software engineering best practices. Data scientists and AI engineers are trained to create working models, but not sufficiently to create models that must work outside the lab.

3. Traditional data science development

Traditional data science development tools offer little support for software engineering best practices. Tools like R and Jupyter are designed for experimentation, not for creating maintainable software, and some data science languages lack powerful abstraction and testing mechanisms.
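
A common remedy, sketched here under the assumption that pytest is available: move notebook cell logic into an importable module, which immediately gives it the abstraction and testing mechanisms the notebook lacks.

    # features.py: cell logic moved out of the notebook into a module.
    def rolling_mean(series, window=3):
        out = []
        for i in range(len(series)):
            chunk = series[max(0, i - window + 1):i + 1]
            out.append(sum(chunk) / len(chunk))
        return out

    # test_features.py: the same logic can now be unit-tested with pytest.
    def test_window_of_one_is_identity():
        assert rolling_mean([1.0, 2.0, 3.0], window=1) == [1.0, 2.0, 3.0]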

4. The SQL pattern

SQL is a standard language for managing data in databases, but its extensive use in AI/big data systems (75-90% of programming work) presents maintainability challenges. Data scientists often find this the least enjoyable and most difficult part of their work, leading to further complications [1].
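
One structure that helps, shown as a sketch over an invented schema (embedded in Python here): build long queries from named common table expressions (CTEs), so each intermediate result reads like a well-named function.

    # Invented schema; the point is the structure, not this specific query.
    DAILY_SALES_WEATHER = """
    WITH daily_sales AS (
        SELECT sale_date, SUM(amount) AS revenue
        FROM sales
        GROUP BY sale_date
    ),
    daily_weather AS (
        SELECT obs_date, AVG(temp_c) AS avg_temp
        FROM weather
        GROUP BY obs_date
    )
    SELECT s.sale_date, s.revenue, w.avg_temp
    FROM daily_sales AS s
    JOIN daily_weather AS w ON w.obs_date = s.sale_date
    """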

5. Siloed teams

In AI/big data systems, teams are often composed mainly of data scientists, whose focus on creating functional models can lead to a lack of software engineering best practices, ultimately causing maintainability issues.

Preventing an AI crisis in your organization

By integrating continuous quality measurement, building cross-functional teams, and applying established software engineering practices to AI, organizations can better ensure that AI systems are more robust, secure, and adaptable.

Continuously measure and improve

Maintainability and test coverage (including functional and unit testing to complement model-level integration tests) must be monitored in real time, with feedback loops for teams.
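
A minimal example of such a feedback loop, assuming a Python codebase with pytest and the pytest-cov plugin installed (“my_package” is a placeholder): fail the pipeline the moment coverage drops below an agreed threshold.

    import subprocess
    import sys

    # CI gate: the build fails when line coverage falls below 60%, so
    # quality regressions surface immediately instead of at handover.
    result = subprocess.run(["pytest", "--cov=my_package", "--cov-fail-under=60"])
    sys.exit(result.returncode)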

Have data scientists and software engineers collaborate

Form cross-functional teams in which data scientists work alongside software engineers, so that data scientists learn to write more future-proof and robust code, embracing these practices as they see the benefits in their daily work.

Remember that AI is software

It’s crucial to view AI as software with unique characteristics, as outlined in the new ISO/IEC 5338 standard for AI engineering. Instead of creating a new process, this standard builds on the existing software lifecycle framework, ISO/IEC/IEEE 12207.

Organizations typically have proven practices like version control, testing, DevOps, knowledge management, documentation, and architecture, which only need minor adaptations for AI. AI should also be included in security and privacy activities, like penetration testing, considering its unique challenges [2]. This inclusive approach in software engineering allows AI to responsibly grow beyond the lab and prevent a crisis.

During our IT leadership event SCOPE 2024, Rob van der Veer, Chief AI Officer, gave a keynote on how leadership can navigate AI and addressed these exact elements.

The risks outlined in this article are not theoretical. They’re visible in the majority of AI and big data systems in production today. 

But they are also preventable. 

AI is no longer just a lab experiment; we can help you put robust AI tools and systems in production. Our approach leverages our expertise in code quality management for AI and includes benchmark-based assessments, technology readiness evaluations, objective setting, and the identification of code issues specific to AI—ensuring that the systems you develop are reliable, secure, and scalable.

References: 

[1] Amershi et al. (Microsoft), “Software Engineering for Machine Learning: A Case Study”, presented at the 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP).

[2] For more information on AI security and privacy, refer to the OWASP AI Security and Privacy Guide.


To prevent AI crises, our AI readiness guide, authored by Rob van der Veer, outlines 19 steps for secure and responsible AI development. It emphasizes practical strategies for governance, risk management, and IT development. With a focus on testing, DevOps, and collaboration, the guide ensures robust AI integration and minimizes risk while maximizing potential (last updated in August 2025).
 
Prevent AI-related crises before they happen. Download our AI readiness guide for actionable steps to ensure robust and secure AI implementation.

Experience Sigrid live

Request your demo of the Sigrid® | Software Assurance Platform.