Topics in data analysis

2022/2023
Programme: Mathematics, First Cycle
Year: 3rd year
Semester: second
Kind: optional
Group: B
ECTS: 5
Language: Slovenian
Hours per week – 2nd semester:
Lectures: 2
Seminar: 0
Tutorial: 2
Lab: 0
Prerequisites

Completion of the course Introduction to Programming.

Content (Syllabus outline)

Multivariate methods for the analysis of multidimensional data: principal components, multidimensional scaling, clustering.
Modern approaches to data visualization: e.g. ggplot2, Many Eyes, D3.js
Basics of analysis of temporal and spatial data. Time series analysis: basic properties, smoothing, ARIMA.
Basic network analysis: basic properties, network visualization, distribution of vertex degrees, scale-free networks, important parts of networks.
Big data: symbolic data analysis, parallel and distributed computation (Hadoop).
The topics will mostly be presented in a 'black box' manner: the course describes WHAT can be achieved with particular methods and under WHICH conditions, but does not explain in detail HOW the methods work. References will be given for students interested in the details and background of the methods.

Readings

B. Everitt, T. Hothorn: An Introduction to Applied Multivariate Analysis with R. Springer, 2011.
W.N. Venables, B.D. Ripley: Modern Applied Statistics with S (fourth edition). Springer, 2002.
L. Wilkinson. The Grammar of Graphics (second edition). Springer, 2005.
H. Wickham: ggplot2 - Elegant Graphics for Data Analysis. Springer, 2009.
R.H. Shumway, D.S. Stoffer: Time Series Analysis and Its Applications With R Examples (third edition). Springer, 2011.
R.S. Bivand, E.J. Pebesma, V. Gómez-Rubio: Applied Spatial Data Analysis with R. Springer, 2008.
W. de Nooy, A. Mrvar, V. Batagelj: Exploratory Social Network Analysis with Pajek (revised and expanded second edition). Cambridge University Press, 2012.
E. Diday, M. Noirhomme-Fraiture (eds.): Symbolic Data Analysis and the SODAS Software. Wiley, 2008.
Q.E. McCallum, S. Weston: Parallel R. O’Reilly, 2012.
Website: http://d3js.org/
Website: http://www-958.ibm.com/software/analytics/manyeyes/

Objectives and competences

Through practical work on various selected data sets, students learn the basics of data analysis and data visualization. They gain basic experience with a relatively wide range of analytic and visualization methods, but without in-depth understanding of the methods.

Intended learning outcomes

Knowledge and understanding: Students familiarise themselves with various types of data sets and a wide range of basic approaches and methods for data analysis and visualization. They also improve their data-analytic and, in part, programming skills.
Application: Carrying out basic data analysis on various types of data sets. Building analytic methods. Preparing charts and graphical presentations of data.
Reflection: The importance of modern information technology for analyzing large amounts of data, and the importance of data visualization.
Transferable skills: Working with a computer, data-analytic and algorithmic way of thinking.

Learning and teaching methods

Lectures, exercises, homework, consultations

Assessment

Homework and a project
Oral exam
Grading: 5 (fail), 6-10 (pass) (according to the Statute of UL)

Lecturer's references

Alen Orbanić:
ORBANIĆ, Alen. Tools for networks. In: ALHAJJ, Reda (ed.), ROKNE, Jon (ed.). Encyclopedia of social network analysis and mining. New York: Springer, cop. 2014, pp. 2166-2175, illus. [COBISS-SI-ID 17145433]
ŠIROK, Brane, BIZJAN, Benjamin, ORBANIĆ, Alen, BAJCAR, Tom. Mineral wool melt fiberization on a spinner wheel. Transactions of the Institution of Chemical Engineers. Part A, Chemical engineering research and design, ISSN 0263-8762, 2014, vol. 92, issue 1, pp. 80-90, illus. [COBISS-SI-ID 13057819]
BIZJAN, Benjamin, ORBANIĆ, Alen, ŠIROK, Brane, KOVAČ, Boštjan, BAJCAR, Tom, KAVKLER, Iztok. A computer-aided visualization method for flow analysis. Flow measurement and instrumentation, ISSN 0955-5986. [Print ed.], Aug. 2014, vol. 38, pp. 1-8, illus. [COBISS-SI-ID 13484571]
Alex Simpson:
EGGER, Jeff, MØGELBERG, Rasmus Ejlers, SIMPSON, Alex. The enriched effect calculus: syntax and semantics. Journal of logic and computation, ISSN 0955-792X, 2014, vol. 24, iss. 3, pp. 615-654. [COBISS-SI-ID 17090137]
EGGER, Jeff, MØGELBERG, Rasmus Ejlers, SIMPSON, Alex. Linear-use CPS translations in the enriched effect calculus. Logical methods in computer science, ISSN 1860-5974, 2012, vol. 8, iss. 4, paper 2 (pp. 1-27). [COBISS-SI-ID 17090905]
Ljupčo Todorovski:
BRENCE, Jure, TODOROVSKI, Ljupčo, DŽEROSKI, Sašo. Probabilistic grammars for equation discovery. Knowledge-based systems. [Print ed.]. 2021, vol. 224, pp. 107077-1-107077-12. [COBISS-SI-ID 61709059]
LUKŠIČ, Žiga, TANEVSKI, Jovan, DŽEROSKI, Sašo, TODOROVSKI, Ljupčo. Meta-model framework for surrogate-based parameter estimation in dynamical systems. IEEE access. 2019, vol. 7, pp. 181829-181841. [COBISS-SI-ID 33102631]
BOGATINOVSKI, Jasmin, TODOROVSKI, Ljupčo, DŽEROSKI, Sašo, KOCEV, Dragi. Explaining the performance of multilabel classification methods with data set properties. International journal of intelligent systems. [Print ed.]. [in press] 2022, 43 pp. [COBISS-SI-ID 96459267]