There are no prerequisites.
Machine learning for data science 1
Tomaž Hočevar, Blaž Zupan
Linear models. Linear regression. Linear discriminant analysis. Logistic regression. Gradient descent. Stochastic gradient descent.
The machine learning approach. Cost functions. Empirical risk minimization. Maximum likelihood estimation. Model evaluation. Cross-validation.
Feature selection. Search-based feature selection. Regularization.
Tree-based models. Decision trees. Random forest. Bagging. Gradient tree boosting.
Clustering. k-means. Expectation-maximization.
Non-linear regression. Basis functions. Splines. Support vector machines. Kernel trick.
Neural networks. Perceptron. Activation functions. Backpropagation.
- T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. New York: Springer, 2017.
The course aims to familiarize students with the fundamentals of machine learning, classical machine learning models, and the practicalities of applying machine learning to real-world problems. It prepares students for the study of advanced machine learning methods.
After successfully completing the course, students should be able to:
- Apply the machine learning approach to data analysis.
- Evaluate different types of models.
- Choose the correct model for the problem at hand.
- Interpret machine learning results.
- Identify potential issues.
Lectures, homework, and a set of smaller projects.
Continuing (homework, projects)
Final (written exam)
Grading: 6-10 pass, 5 fail
Hočevar T, Zupan B, Stålring J (2021) Conformal Prediction with Orange. Journal of Statistical Software 98:1-22.
Hočevar T, Demšar J (2014) A combinatorial approach to graphlet counting. Bioinformatics 30(4):559-65.
Čopar A, Žitnik M, Zupan B (2017) Scalable non-negative matrix tri-factorization. BioData Mining 10:41.
Žitnik M, Zupan B (2016) Jumping across biomedical contexts using compressive data fusion. Bioinformatics 32(12):i90-i100.
Stražar M, Žitnik M, Zupan B, Ule J, Curk T (2016) Orthogonal matrix factorization enables integrative analysis of multiple RNA binding proteins. Bioinformatics 32(10):1527-35.
Žitnik M, Nam EA, Dinh C, Kuspa A, Shaulsky G, Zupan B (2015) Gene prioritization by compressive data fusion and chaining. PLoS Computational Biology 11(10):e1004552.
Starič A, Demšar J, Zupan B (2015) Concurrent software architectures for exploratory data analysis. WIREs Data Mining and Knowledge Discovery 5(4):165-180.