Metrics in machine learning

This study examined the use of metrics for quality assurance of AI systems across the entire ML lifecycle, from data preparation and feature engineering to training and inference. Based on a survey of ML experts, it provides a structured overview of common evaluation metrics, tests and approaches.

The results show that practitioners work iteratively and prefer a small number of simple, easily interpretable error measures; more complex metrics are used less frequently because their implementations differ across libraries and they require greater interpretation effort. By critically comparing the advantages and disadvantages of different metrics, supplemented by practical recommendations, the study supports the selection of suitable methods for specific applications. It shows that the combination of careful data hygiene, robust evaluation metrics, targeted tests, focused tuning and a resilient inference setup leads to trustworthy and high-performing AI systems in productive use.
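To illustrate the kind of simple, easily interpretable error measures the surveyed experts favour, the following sketch computes mean absolute error (MAE) and root mean squared error (RMSE) in plain Python; the example data is invented for demonstration:

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals,
    in the same unit as the target variable."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: also in the target's unit,
    but penalises large residuals more strongly than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical ground-truth and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

print(mae(y_true, y_pred))   # 0.75
print(rmse(y_true, y_pred))  # ≈ 0.935
```

Because both measures are expressed in the unit of the target, they are directly interpretable by domain experts, which the study identifies as a key reason for their popularity over more complex metrics.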