Objectives

Building on the fundamental data analysis techniques covered in the prerequisite course Data analysis and knowledge discovery, this course goes deeper into techniques for building trustworthy artificial intelligence (AI) through rigorous performance estimation with modern resampling methods. The core aim is to adopt a scientific way of thinking about evaluation design for machine-learning-based AI systems: how to answer statistical questions about prediction performance that are more specific than the usual question of generalization to unseen data. One such question is how well an AI system works on new data that is known to differ from the already observed training data in a specific way. For example, if the system is to make predictions at a geolocation known a priori to lie at a certain distance from the existing measurements, the resampling-based performance evaluation method can be designed accordingly. In addition to the theoretical concepts of performance estimation, students learn how to apply them to practical real-world problems, including case studies in chemistry, geoinformatics, and medical informatics.