HPC pbdR
- Use RStudio to write and run R programs.
- Use `install.packages()` to install packages (libraries).
- Use `.md` files for episodes when you want static content
- Use `.Rmd` files for episodes when you need to generate output
- Run `sandpaper::check_lesson()` to identify any issues with your lesson
- Run `sandpaper::build_lesson()` to preview your lesson locally
- Parallel R code distributes work across multiple processor cores or nodes
- There are two kinds of parallelism: shared memory and distributed memory
- You can test parallel code on your own local machine
- There are several different job schedulers, but they share many similarities, so you can learn a new one when needed
- To evaluate the fitted model, the available data is split into training and testing sets
- Parallelisation can decrease training time
- Many statistical calculations require matrix and vector operations
- When libraries are used, setting their parameters appropriately can improve your time to solution
- One can run a distributed memory program on a shared memory node
- The Message Passing Interface (MPI) offers many operations that can be used to add parallelism to your program efficiently and portably
- It is possible to use parallel libraries to minimize the amount of parallel programming you need to do for your data exploration and data analysis
- Classification can be used for data other than digits, such as diamonds
- Distributed memory parallelism can speed up training
- A variety of machine learning algorithms can be used for classification
- Some will work better than others for your data
- The memory and compute requirements will differ; choose your algorithms and their implementations wisely!
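The points on parallel R code and testing it on your own machine can be illustrated with a minimal sketch. It uses the `parallel` package that ships with base R, not pbdR itself, and assumes a toy workload (the hypothetical function `sum_of_squares()`) and two local worker processes. Each worker is a separate R process with its own memory, so this is also a small instance of running distributed-memory-style code on a single shared memory node.

```r
library(parallel)

# Hypothetical toy workload: sum of squares of 1..n
sum_of_squares <- function(n) sum(seq_len(n)^2)

inputs <- c(10, 100, 1000, 10000)

# Serial reference result
serial_result <- lapply(inputs, sum_of_squares)

# Start two local worker processes (PSOCK clusters work on Linux, macOS and Windows)
cl <- makeCluster(2)

# Distribute the inputs across the workers; each worker evaluates
# sum_of_squares() on its share of the elements
parallel_result <- parLapply(cl, inputs, sum_of_squares)

# Shut the workers down when finished
stopCluster(cl)

# The parallel run should reproduce the serial answer exactly
print(identical(serial_result, parallel_result))
```

On a cluster, the number of workers would usually be chosen to match the cores allocated by the job scheduler rather than the hard-coded `2` used here.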
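Splitting the available data into training and testing sets can be sketched in base R as follows. This is an illustration under assumptions that do not come from the lesson: it uses the built-in `mtcars` data set, an 80/20 split, the linear model `mpg ~ wt + hp`, and root mean squared error (RMSE) as the evaluation metric.

```r
# Make the random split reproducible
set.seed(42)

n <- nrow(mtcars)

# Randomly choose 80% of the rows for training; the rest are held out for testing
train_idx <- sample(n, size = floor(0.8 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit the model on the training set only
fit <- lm(mpg ~ wt + hp, data = train)

# Evaluate on rows the model has not seen
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))

cat("training rows:", nrow(train), " testing rows:", nrow(test), "\n")
cat("test RMSE:", round(rmse, 2), "\n")
```

Because the test rows are never used for fitting, the RMSE gives an estimate of how the model performs on new data rather than how well it memorised the training set.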
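As an example of a statistical calculation that reduces to matrix and vector operations, the least squares coefficients of a linear model solve the normal equations X'X b = X'y. The sketch below, again assuming the built-in `mtcars` data and the predictors `wt` and `hp`, computes them with `crossprod()` and `solve()` and compares the result with `lm()`, which uses a QR decomposition internally.

```r
# Design matrix: intercept column plus two predictors from the built-in mtcars data
X <- cbind(intercept = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

# Normal equations: crossprod(X) is t(X) %*% X, crossprod(X, y) is t(X) %*% y
beta <- solve(crossprod(X), crossprod(X, y))

# Compare with R's built-in linear model fit
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(cbind(normal_equations = beta[, 1], lm = coef(fit)))

# The two approaches should agree up to floating point tolerance
print(all.equal(unname(beta[, 1]), unname(coef(fit))))
```

Forming X'X squares the condition number of the problem, so QR-based routines such as `lm()` are generally preferred for accuracy; the normal equations are shown here only because they make the underlying matrix products explicit.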