There are no prerequisites.
High performance computing
Uroš Lotrič
Parallel and distributed computing. Quantifying parallelisation. Architectures. Memory access. Granularity. Topologies.
Modern parallel architectures. Shared-memory systems. Distributed-memory systems. Graphics processing units. Modern coprocessors. FPGA circuits. Heterogeneous systems.
Parallel languages and programming environments. OpenMP. MPI. OpenCL. MapReduce.
Parallel algorithms. Analysis and programming. Data and functional parallelism. Pipeline. Scalability. Programming strategies. Performance analysis. Implementation of standard scientific algorithms. Choosing the appropriate architecture.
Parallel performance. Load balancing. Scheduling. Communication overhead. Cache effects. Spatial and temporal locality. Energy efficiency.
Using the national high performance computing infrastructure.
Selected advanced and current topics in high performance computing.
• V. Eijkhout et al. Introduction to High Performance Scientific Computing. Creative Commons, 2015.
• P. S. Pacheco. An Introduction to Parallel Programming. Morgan Kaufmann, 2011.
• M. J. Quinn. Parallel Programming in C with MPI and OpenMP. McGraw-Hill, 2003.
• B. R. Gaster et al. Heterogeneous Computing with OpenCL. Morgan Kaufmann, 2013.
• G. Coulouris et al. Distributed Systems: Concepts and Design. Pearson, 2012.
To gain the theoretical and practical knowledge of parallel and distributed systems, parallel programming and parallel processing needed to solve computational problems efficiently on modern computing platforms and tools.
Parallelise problems from science and engineering by structuring the problem and choosing the appropriate hardware and programming concepts to produce an efficient solution.
Gain the knowledge needed to work with the national high performance computing infrastructure.
After successfully completing the course, students should be able to:
• Design programs for modern parallel architectures.
• Choose the appropriate hardware to speed up a particular algorithm.
• Analyse the performance of computer code.
• Identify parts of the code that can be sped up.
• Use the national high performance computing infrastructure.
• Connect the theory and practice of parallel and distributed systems.
Lectures, tutorials, homework, project.
Continuous assessment (homework, project work)
Final (oral exam)
Grading: 6-10 pass, 5 fail (according to the rules of the University of Ljubljana).
• SILVA, Catarina, LOTRIČ, Uroš, RIBEIRO, Bernardete, DOBNIKAR, Andrej. Distributed text classification with an ensemble kernel-based learning approach. IEEE trans. syst. man cybern., Part C Appl. rev., May 2010, vol. 40, 287-297
• LOTRIČ, Uroš, BULIĆ, Patricio. Applicability of approximate multipliers in hardware neural networks. Neurocomputing, 2012, vol. 96, 57-65
• CANKAR, Matija, ARTAČ, Matej, ŠTERK, Marjan, LOTRIČ, Uroš, SLIVNIK, Boštjan. Co-allocation with collective requests in grid systems. Journal of Universal Computer Science, 2013, vol. 96, 282-300
• SLUGA, Davor, CURK, Tomaž, ZUPAN, Blaž, LOTRIČ, Uroš. Heterogeneous computing architecture for fast detection of SNP-SNP interactions. BMC bioinformatics, 2014, vol. 15, 1-16
• LOTRIČ, Uroš, BULIĆ, Patricio. Logarithmic arithmetic for low-power adaptive control systems. Circuits Systems and Signal Processing, 2017, vol. 36, 3564-3584
Full bibliography:
http://sicris.izum.si/search/rsr.aspx?lang=slv&id=9241.