HPC Carpentry at LLNL

We ran the full user workshop at LLNL!

In the first week of June 2024, instructors from HPC Carpentry taught our full workflow workshop for the first time. Over a four-day stint at Lawrence Livermore National Laboratory, we delivered this content not once, but twice!

It was immensely rewarding to see all this material come together in one place. Traveling to teach in person, while not without hiccups, was extremely worthwhile. We believe we served our learners pretty well, and we learned a few lessons relevant to future workshops.

Workshop Structure

Each workshop ran over two days. On the first day, we did the Unix Shell intro lesson from Software Carpentry in the morning, and our own HPC Intro lesson in the afternoon. On the second day, we did a variant of the workflow lesson, adapted for the Maestro workflow tool (rather than Snakemake), because it is developed and used at LLNL.

The instructor team consisted of Andrew Reid and Trevor Keller from the HPC Carpentry steering committee, and Jane Herriman from LLNL, along with helpers from the LLNL community.

While split-terminal tools exist, we used vanilla tmux with two terminals attached to the same session. This allowed the instructors to type on their own laptop while referencing the lesson webpage and selectively sharing the terminal. Learners followed along on the enhanced terminal displayed at the front of the room. Note: to “scroll up” in tmux, press Ctrl+b, then [, to enter copy mode; move around with the arrow keys, and press q to return to the prompt.

Maestro

Maestro is a capable workflow engine, and one we would not have explored had Jane not ported the Snakemake lesson so expertly. Maestro favors reproducibility, running every step of the task from scratch at every invocation. This is a significant difference from Snakemake, which, like Make, does not re-execute completed “targets.” A significant benefit of Maestro is that the tool does not persist while jobs execute: it generates and submits native Slurm jobs, with tooling in place to check the status of running workflows. This makes it a much better fit for HPC, particularly for large-scale or time-consuming jobs.
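To make the re-execution difference concrete, here is a toy sketch in Python. It is not how Maestro, Snakemake, or Make are actually implemented: the step names, output files, and both runner functions are hypothetical, and the "Make-style" runner only checks whether an output file exists, whereas the real tools also compare timestamps and dependencies.

```python
import tempfile
from pathlib import Path

# Hypothetical two-step workflow: (step name, output file it produces).
STEPS = [("simulate", "sim.out"), ("analyze", "stats.out")]


def run_make_style(steps, workdir, log):
    """Skip any step whose output file already exists in workdir."""
    for name, output in steps:
        target = workdir / output
        if target.exists():
            log.append(f"skip {name}")
        else:
            target.write_text(f"output of {name}\n")
            log.append(f"run {name}")


def run_from_scratch(steps, workdir, log):
    """Run every step unconditionally, overwriting any earlier output."""
    for name, output in steps:
        (workdir / output).write_text(f"output of {name}\n")
        log.append(f"run {name}")


def demo():
    """Invoke each runner twice and return what each one did."""
    make_log, scratch_log = [], []
    with tempfile.TemporaryDirectory() as make_dir, \
            tempfile.TemporaryDirectory() as scratch_dir:
        for _ in range(2):
            run_make_style(STEPS, Path(make_dir), make_log)
            run_from_scratch(STEPS, Path(scratch_dir), scratch_log)
    return make_log, scratch_log


if __name__ == "__main__":
    make_log, scratch_log = demo()
    print("make-style:  ", make_log)
    print("from scratch:", scratch_log)
```

On the second invocation the Make-style runner skips both steps, while the from-scratch runner repeats them. That is the trade-off described above: less repeated work on one side, a clean and reproducible run on the other.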

Learners

Learners had a range of backgrounds, from undergraduate bioinformatics students to experienced Linux HPC users. The lessons generally went at a slightly faster pace than expected, without leaving anyone behind. This was in part because access to LLNL’s system Ruby was by means of pre-authorized RSA tokens, removing a lot of the friction from the initial connection process that has been time-consuming in other versions of the workshop. The instructors live-coded plenty of mistakes, opening discussions on some interesting tangential topics. LLNL runs a pool of “login nodes” per HPC system, rather than a single machine, which made for an interesting early discussion of networked filesystems. The sheer number of nodes also made the output of sinfo tricky to comprehend at a glance, which is awesome.

Lesson Feedback

One major takeaway is that the workflow lesson in particular is vulnerable to learners losing the thread if they miss a step. This lesson, in either its Maestro or Snakemake version, builds up an increasingly sophisticated workflow specification file, incrementally demonstrating workflow concepts in the context of the tool. Consequently, a learner who misses a step and falls behind can find themselves unable to recover, since the remainder of the lesson builds on precisely the content that was missed. The workflow lesson differs in this respect from the Shell and HPC Intro lessons, where later steps can better stand on their own.

The solution, which we had already started to implement by the second workshop, is to keep a shared online notepad with “checkpoint” versions of the file. Learners who fall behind can refer to these, with helpers bridging the content gap for them. Also, LLNL supports and uses the give tool, allowing users to easily pass files around: it’s nifty!

The hands-on Carpentries approach proved itself once again, building muscle memory and vocabulary in learners, who could then move on to their LLNL summer research projects with greater confidence in their ability to productively use the shared high-performance computing resources.

For the project, this was confirmation that the HPC User workshop can work, and it yielded valuable feedback about checkpoint files and a shared notepad. We look forward to teaching this workshop more often, and to moving it out of beta status and into our main curriculum.
