HPC pbdR: Key Points

Before we Start

Use RStudio to write and run R programs.
Use install.packages() to install packages (libraries).

Using RMarkdown

Use .md files for episodes when you want static content
Use .Rmd files for episodes when you need to generate output
Run sandpaper::check_lesson() to identify any issues with your lesson
Run sandpaper::build_lesson() to preview your lesson locally

Submit a parallel job

Parallel R code distributes work
Parallelism comes in two main forms: shared memory and distributed memory
You can test parallel code on your own local machine
There are several different job schedulers, but they share many similarities, so you can pick up a new one when needed
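
As a minimal sketch of testing parallel code on a local machine, the example below runs the same task serially and in parallel with the base `parallel` package and checks that the results agree. The function `slow_task` and the choice of two worker processes are assumptions made for illustration, not part of the lesson.

```r
library(parallel)

# Hypothetical stand-in for an expensive task: mean of simulated draws
slow_task <- function(i) {
  set.seed(i)          # seed per task so serial and parallel runs match
  mean(rnorm(1e5))
}

# mclapply forks worker processes, which is not supported on Windows,
# so fall back to a single core there (2 workers is an assumed value)
cores <- if (.Platform$OS.type == "windows") 1L else 2L

inputs <- 1:8
serial_result <- lapply(inputs, slow_task)
parallel_result <- mclapply(inputs, slow_task, mc.cores = cores)

# Both approaches should produce the same values
print(identical(serial_result, parallel_result))
```

Once this works locally, the same script can usually be submitted through a job scheduler with the core count matched to the resources requested.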

Multicore

To evaluate the fitted model, the available data is split into training and testing sets
Parallelisation decreases the training time
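
The two points above can be sketched with base R alone. This example assumes the built-in `mtcars` data, a 75/25 split, and three candidate linear models. These are stand-ins chosen so the sketch needs no extra packages and are not the model or data used in the lesson.

```r
library(parallel)

set.seed(42)

# Split the available data into training and testing sets (75% / 25%)
n <- nrow(mtcars)
train_idx <- sample(n, size = floor(0.75 * n))
train <- mtcars[train_idx, ]
test <- mtcars[-train_idx, ]

# Hypothetical candidate models, each trained independently
formulas <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + qsec)

# Forking is unavailable on Windows, so use one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Train each model on the training set and score it on the testing set
rmse <- unlist(mclapply(formulas, function(f) {
  fit <- lm(f, data = train)
  pred <- predict(fit, newdata = test)
  sqrt(mean((test$mpg - pred)^2))
}, mc.cores = cores))

print(rmse)
```

For models this small the parallel overhead may outweigh the gain. The benefit tends to appear when each training task is expensive.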

BLAS

Many statistical calculations require matrix and vector operations
When numerical libraries such as BLAS are used, setting their parameters (for example, the number of threads) appropriately can improve your time to solution
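
As a rough illustration, the sketch below times the same matrix product computed two ways, with an explicit transpose and with `crossprod()`. The matrix size is an assumed value, and the measured times depend entirely on which BLAS library R is linked against and how many threads it uses. Many BLAS builds read environment variables such as `OMP_NUM_THREADS` or `OPENBLAS_NUM_THREADS` when R starts, but which one applies depends on your installation.

```r
set.seed(1)

# Assumed problem size for illustration
X <- matrix(rnorm(500 * 200), nrow = 500)

# Explicit transpose followed by a matrix multiply
t_explicit <- system.time(A <- t(X) %*% X)

# crossprod() computes t(X) %*% X without forming the transpose
t_crossprod <- system.time(B <- crossprod(X))

# The two results should agree up to floating point rounding
print(all.equal(A, B))

# Report the elapsed times and the BLAS library R is using
print(rbind(explicit = t_explicit, crossprod = t_crossprod)[, "elapsed"])
print(sessionInfo()$BLAS)
```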

MPI - Distributed Memory Parallelism

One can run a distributed memory program on a shared memory node

pbdMPI - Parallel and Big Data interface to MPI

The message passing interface offers many operations that can be used to efficiently and portably add parallelism to your program
It is possible to use parallel libraries to minimize the amount of parallel programming you need to do for your data exploration and data analysis
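
A minimal sketch of a collective operation with pbdMPI follows, assuming the package and an MPI library are installed. Each rank sums its own share of the integers 1 to 100, and `allreduce()` combines the partial sums on every rank. The helper `chunk_for_rank` is hypothetical and written here only for illustration.

```r
# Hypothetical helper: the indices of 1..n assigned to a given rank
# (ranks are numbered from 0), dealt out round-robin
chunk_for_rank <- function(rank, size, n) {
  which((seq_len(n) - 1L) %% size == rank)
}

# Only run the MPI part when pbdMPI is available
if (requireNamespace("pbdMPI", quietly = TRUE)) {
  suppressMessages(library(pbdMPI))
  init()

  rank <- comm.rank()   # this process's rank, starting at 0
  size <- comm.size()   # total number of ranks

  # Each rank works on its own share of the data
  local_sum <- sum(chunk_for_rank(rank, size, 100L))

  # Combine the partial sums so every rank holds the total
  total <- allreduce(local_sum, op = "sum")

  # comm.cat prints from rank 0 only by default
  comm.cat("ranks:", size, "total:", total, "\n")

  finalize()
}
```

Saved as a script (the name `sum_mpi.R` is an assumption), this would typically be launched with something like `mpirun -np 4 Rscript sum_mpi.R`, although the exact launcher depends on your MPI installation and scheduler. The same command also works on a single shared memory node.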

MPI - Distributed Memory Parallelism

Classification can be used for data other than digits, such as diamonds
Distributed memory parallelism can speed up training

Parallel Randomized Singular Value Decomposition for Classification

There are a variety of machine learning algorithms which can be used for classification
Some will work better than others for your data
Memory and compute requirements differ between algorithms, so choose your algorithms and their implementations wisely!
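
To make the technique named in this section's title concrete, here is a serial sketch of a basic randomized SVD in base R. It is not the parallel implementation from the lesson. The test matrix, the target rank `k`, and the oversampling amount `p` are all assumed values chosen so the approximation can be checked against `svd()`.

```r
set.seed(7)

# Assumed test matrix: 200 x 50 with exact rank 10
A <- matrix(rnorm(200 * 10), 200, 10) %*% matrix(rnorm(10 * 50), 10, 50)

k <- 5   # number of singular values wanted (assumed)
p <- 5   # oversampling columns (assumed)

# 1. Sample the range of A with a random Gaussian test matrix
Omega <- matrix(rnorm(ncol(A) * (k + p)), ncol(A), k + p)
Y <- A %*% Omega

# 2. Orthonormal basis for the sampled range
Q <- qr.Q(qr(Y))

# 3. Project A onto that basis and take the SVD of the small matrix
B <- crossprod(Q, A)
s <- svd(B)

# 4. Map the left singular vectors back to the original space
U <- Q %*% s$u
d_approx <- s$d[1:k]

# Compare with the leading singular values from a full SVD
d_exact <- svd(A)$d[1:k]
print(all.equal(d_approx, d_exact, tolerance = 1e-6))
```

The expensive steps are the two products with `A`, which is why this approach suits distributed data: the SVD itself is only taken of the small projected matrix `B`.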
