Before we Start


  • Use RStudio to write and run R programs.
  • Use install.packages() to install packages (libraries).
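
A minimal sketch of installing a package only when it is missing. The helper name `ensure_package` is hypothetical, and the `parallel` package is used here only because it ships with R, so no download is triggered.

```r
# Hypothetical helper: install a package only when it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can be loaded after the attempt.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "parallel" is part of a standard R installation, so nothing is downloaded here.
ok <- ensure_package("parallel")
library(parallel)
```

For a package that is not yet installed, the same call would fall through to `install.packages()`, which needs network access and a configured CRAN mirror.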

Using RMarkdown


  • Use .md files for episodes when you want static content
  • Use .Rmd files for episodes when you need to generate output
  • Run sandpaper::check_lesson() to identify any issues with your lesson
  • Run sandpaper::build_lesson() to preview your lesson locally

Submit a parallel job


  • Parallel R code distributes work across multiple cores or nodes
  • Parallelism comes in two main forms: shared memory and distributed memory
  • You can test parallel code on your own local machine
  • There are several different job schedulers, but they share many similarities, so you can learn a new one when needed
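
One way to test parallel code locally, sketched with the base `parallel` package: run the same task sequentially and on a small cluster, then check that the answers match. The function `slow_square` and the choice of two workers are illustrative assumptions, not part of the lesson.

```r
library(parallel)

# Illustrative task: a short sleep stands in for real computation.
slow_square <- function(x) {
  Sys.sleep(0.01)
  x^2
}

inputs <- 1:8

# detectCores() reports the cores on this machine (it can be NA on some platforms).
available_cores <- detectCores()

# Reference answer computed sequentially.
sequential <- lapply(inputs, slow_square)

# Same computation on two local worker processes.
cl <- makeCluster(2L)
parallel_result <- parLapply(cl, inputs, slow_square)
stopCluster(cl)

# The parallel run should reproduce the sequential answer exactly.
same_answer <- identical(sequential, parallel_result)
```

Once the two results agree on a laptop, the same script can be submitted through a job scheduler with a larger worker count.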

Multicore


  • To evaluate the fitted model, the available data is split into training and testing sets
  • Parallelisation decreases the training time
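
The split-then-evaluate idea can be sketched as follows, using the built-in `mtcars` data and `lm()` as stand-ins for the lesson's data and model. The 80/20 split, the candidate formulas, and the core count are assumptions chosen for illustration. On data this small the parallel version will not be faster; the point is the pattern.

```r
library(parallel)

set.seed(42)  # make the random split reproducible

# Split the rows into roughly 80% training and 20% testing.
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.8 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Illustrative candidate models to train.
formulas <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + qsec)

# Fit on the training set, then score on the held-out testing set (RMSE).
fit_and_score <- function(f) {
  fit <- lm(f, data = train)
  pred <- predict(fit, newdata = test)
  sqrt(mean((test$mpg - pred)^2))
}

# mclapply() forks worker processes; forking is unavailable on Windows,
# so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
rmse <- unlist(mclapply(formulas, fit_and_score, mc.cores = n_cores))
```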

BLAS


  • Many statistical calculations require matrix and vector operations
  • When libraries are used, setting their parameters appropriately can improve your time to solution
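
A small sketch of inspecting and tuning the linear algebra library behind R. The matrix size and the thread count are arbitrary assumptions, and the optional `RhpcBLASctl` package is used only if it happens to be installed. With the reference BLAS, changing the thread count has no effect.

```r
set.seed(1)
n <- 300
A <- matrix(rnorm(n * n), n, n)

# Report which BLAS and LAPACK libraries this R session is linked against.
blas_info <- extSoftVersion()["BLAS"]
lapack_info <- La_library()

# Optional: limit BLAS threads when the RhpcBLASctl package is available.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  RhpcBLASctl::blas_set_num_threads(2)
}

# Two ways to form t(A) %*% A; crossprod() avoids the explicit transpose.
t_mult  <- system.time(G1 <- t(A) %*% A)
t_cross <- system.time(G2 <- crossprod(A))

# Both routes should give the same matrix up to rounding.
results_agree <- isTRUE(all.equal(G1, G2))
```

Comparing `t_mult` and `t_cross` under different thread settings gives a rough picture of how library parameters affect time to solution on a given machine.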

MPI - Distributed Memory Parallelism


  • One can run a distributed memory program on a shared memory node
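
To illustrate distributed-memory behaviour on a single node without requiring an MPI installation, the sketch below uses a socket cluster from the base `parallel` package in place of MPI. Each worker is a separate process with its own memory, so data has to be sent explicitly. The variable name `big_constant` is an assumption for illustration.

```r
library(parallel)

# Two separate worker processes on this one node.
cl <- makeCluster(2L)

big_constant <- 10

# Workers start with their own empty workspaces: the variable is not there yet.
visible_before <- unlist(clusterEvalQ(cl, exists("big_constant")))

# Data must be sent explicitly, much like a message between MPI ranks.
clusterExport(cl, "big_constant")
visible_after <- unlist(clusterEvalQ(cl, exists("big_constant")))

# Changing the workers' copies leaves the parent's copy untouched.
invisible(clusterEvalQ(cl, big_constant <- big_constant + 1))
worker_values <- unlist(clusterEvalQ(cl, big_constant))

stopCluster(cl)
```

After this runs, each worker holds the value 11 while the parent session still holds 10, which is the defining property of distributed memory even though everything ran on one machine.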

pbdMPI - Parallel and Big Data interface to MPI


  • The message passing interface offers many operations that can be used to efficiently and portably add parallelism to your program
  • Parallel libraries can minimize the amount of parallel programming you need to do for data exploration and data analysis
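
A hedged sketch of the single-program, multiple-data pattern with pbdMPI: each rank sums its own share of the work and `allreduce()` combines the partial results. The helpers `my_indices` and `partial_sum` are hypothetical, and the MPI part is guarded so the script still runs where pbdMPI is not installed. A typical launch would look like `mpirun -np 4 Rscript script.R`, where the script name is a placeholder.

```r
# Hypothetical helper: the indices in 1..n owned by `rank` out of `size` ranks,
# dealt out round-robin.
my_indices <- function(n, rank, size) {
  which((seq_len(n) - 1L) %% size == rank)
}

# Each rank computes the sum of squares over its own indices only.
partial_sum <- function(n, rank, size) {
  sum(my_indices(n, rank, size)^2)
}

# Run the MPI part only when pbdMPI is installed (assumption: a working MPI).
if (requireNamespace("pbdMPI", quietly = TRUE)) {
  suppressMessages(library(pbdMPI))
  init()
  rank <- comm.rank()   # this process's id, starting at 0
  size <- comm.size()   # total number of MPI processes
  # Combine the partial sums from all ranks; every rank receives the total.
  total <- allreduce(partial_sum(100, rank, size), op = "sum")
  comm.print(total)     # printed by rank 0 only
  finalize()
}
```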

MPI - Distributed Memory Parallelism


  • Classification can be used for data other than digits, such as diamonds
  • Distributed memory parallelism can speed up training

Parallel Randomized Singular Value Decomposition for Classification


  • There are a variety of machine learning algorithms which can be used for classification
  • Some will work better than others for your data
  • Memory and compute requirements differ between algorithms, so choose your algorithms and their implementations wisely!
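
For readers unfamiliar with the technique named in this episode's title, here is a serial base-R sketch of a randomized SVD on synthetic data. It is not the lesson's implementation: the function name `rsvd_sketch`, the oversampling amount, and the test matrix are all assumptions. The heavy steps are matrix products, which is where a tuned BLAS or a distributed matrix library would do the parallel work.

```r
set.seed(123)

# Synthetic data: a rank-5 matrix plus a little noise, standing in for real data.
m <- 200; n <- 50; r <- 5
A <- matrix(rnorm(m * r), m, r) %*% matrix(rnorm(r * n), r, n) +
  1e-3 * matrix(rnorm(m * n), m, n)

# Hypothetical helper: rank-k randomized SVD via a random range finder.
rsvd_sketch <- function(A, k, oversample = 5L) {
  l <- min(ncol(A), k + oversample)
  # Random test matrix used to sample the column space of A.
  Omega <- matrix(rnorm(ncol(A) * l), ncol(A), l)
  # Orthonormal basis for the sampled range.
  Q <- qr.Q(qr(A %*% Omega))
  # Project A onto that basis and decompose the much smaller matrix.
  B <- crossprod(Q, A)
  s <- svd(B)
  list(d = s$d[1:k],
       u = (Q %*% s$u)[, 1:k, drop = FALSE],
       v = s$v[, 1:k, drop = FALSE])
}

approx <- rsvd_sketch(A, k = r)

# Reconstruct A from the truncated factors and measure the relative error.
A_hat <- approx$u %*% diag(approx$d) %*% t(approx$v)
rel_error <- norm(A - A_hat, "F") / norm(A, "F")
```

The small projected matrix `B` is what keeps memory and compute requirements low compared with a full SVD of `A`.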
