Prerequisites: Knowledge of probability theory, statistics, programming, and the basics of machine learning.

# Machine Learning

Jana Faganeli Pucer

Lectures:

- What machine learning is, what its basic principles are, and what we are trying to achieve with it.
- A review of linear regression and an in-depth overview of regularised linear regression methods.
- Classification using logistic regression.
- What a cost function is and which cost functions are most commonly used.
- Gradient descent and stochastic gradient descent, and why they are useful in machine learning.
- Generalised linear models.
- Evaluation of machine learning models (cross-validation, the bootstrap).
- Ensemble methods, with emphasis on bagging, boosting, and random forests.
- Kernel methods (Gaussian processes, support vector machines).
- Artificial neural networks (activation functions, backpropagation, neural network training, regularisation).
- Methods for dimensionality reduction (principal component analysis, matrix factorisation, clustering).
- Explainable machine learning models.
- Reinforcement learning.
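Several of these topics fit together naturally: as a minimal sketch (not course material, with synthetic data and NumPy assumed), gradient descent can minimise the cost function of a ridge-regularised linear regression:

```python
import numpy as np

# Synthetic data: y = 1 + 3x + noise (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
X = np.hstack([np.ones((100, 1)), X])  # prepend an intercept column
y = 1.0 + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

def ridge_cost(w, X, y, lam):
    # Mean squared error plus an L2 penalty (intercept not penalised)
    residuals = X @ w - y
    return residuals @ residuals / len(y) + lam * (w[1:] @ w[1:])

def gradient_step(w, X, y, lam, lr):
    # One gradient-descent update on the ridge cost
    grad = 2 * X.T @ (X @ w - y) / len(y)
    grad[1:] += 2 * lam * w[1:]
    return w - lr * grad

w = np.zeros(2)
for _ in range(500):
    w = gradient_step(w, X, y, lam=0.01, lr=0.1)

print(w)  # close to the true coefficients [1, 3], slightly shrunk by the penalty
```

Swapping the full-data gradient for the gradient on a random mini-batch turns this into stochastic gradient descent; the step names, data, and hyperparameters here are illustrative choices, not prescriptions from the course.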

Lab work:

In the lab sessions, students consolidate the material covered in lectures by applying it to practical problems. The emphasis is on independent work with the help of lab assistants. The aim of the lab work is to explore, through programming, how different methods behave in practice.

Readings:

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (Vol. 112). New York: Springer.

Hastie, T., Tibshirani, R., & Friedman, J. (2016). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York: Springer.

Murphy, K. P. (2022). Probabilistic machine learning: An introduction. MIT press.

The course aims to deepen the knowledge of machine learning that students acquired in their undergraduate studies. Students learn about the most successful approaches and examine them in depth to understand how they work and where their limitations lie. The course prepares students for further, more in-depth study of machine learning approaches. It also prepares them to apply machine learning methods in practice: by the end of the course, they will be able to judge which of the presented techniques to use for a given problem and build a prototype solution.

On successful completion of the course, the students will:

- be able to apply, in practice, various machine learning techniques and methods used in data modelling.
- be able to choose the most appropriate technique to solve a problem.
- be able to evaluate different solutions and their limitations.
- be able to explain a machine learning model.

Lectures, lab work and homework. Special emphasis will be placed on the implementation of different methods to give students an understanding of how they work.

Continuing assessment (homework, lab work)

Final (written and/or oral exam)

Grading: 5 (fail), 6-10 (pass), according to the Statute of the University of Ljubljana (UL).

Demaeyer, J., Bhend, J., Lerch, S., Primo, C., Van Schaeybroeck, B., Atencia, A., Ben Bouallègue, Z., Chen, J., Dabernig, M., Evans, G., & Faganeli Pucer, J. (2023). The EUPPBench postprocessing benchmark dataset v1.0. Earth System Science Data, 15(6), 2635-2653.

Mlakar, P., & Faganeli Pucer, J. (2023). Mixture Regression for Clustering Atmospheric-Sounding Data: A Study of the Relationship between Temperature Inversions and PM10 Concentrations. Atmosphere, 14(3), 481.

Faganeli Pucer, J., & Štrumbelj, E. (2018). Impact of changes in climate on air pollution in Slovenia between 2002 and 2017. Environmental pollution, 242, 398-406.

Faganeli Pucer, J., Pirš, G., & Štrumbelj, E. (2018). A Bayesian approach to forecasting daily air-pollutant levels. Knowledge and Information Systems, 57(3), 635-654.

Pucer, J. F., & Kukar, M. (2018). A topological approach to delineation and arrhythmic beats detection in unprocessed long-term ECG signals. Computer methods and programs in biomedicine, 164, 159-168.