Getting started¶

Initial setup¶

ml-toolkit was built primarily for use on the Grace-Hopper nodes of the N8 HPC cluster Bede. You will first need to register for an account; instructions for this can be found in the Bede Documentation.

You will also need to ensure you are logged into the Grace-Hopper partition using the ghlogin command.

Alternatively, you can install ml-toolkit on your local machine. Just be aware that some of the instructions only apply to Bede (I will try to make it clear when this is the case).

System Requirements:¶

ml-toolkit relies on Apptainer which is only officially supported on Linux systems. As such we only support and provide pip builds for Linux (Arm64 and X86).
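As a quick sanity check before installing, the following sketch reports whether the current machine looks like one of the supported platforms. It only uses the standard uname command; the variable names are illustrative and not part of ml-toolkit. Linux usually reports Arm64 as aarch64, and a WSL shell reports itself as Linux.

```shell
# Report whether this machine matches the supported platforms
# (Linux on Arm64 or x86-64). Variable names are illustrative only.
os="$(uname -s)"     # kernel name, e.g. Linux or Darwin
arch="$(uname -m)"   # hardware name, e.g. x86_64 or aarch64

case "$os/$arch" in
    Linux/x86_64|Linux/aarch64|Linux/arm64)
        platform_status="supported" ;;
    *)
        platform_status="unsupported" ;;
esac

echo "Detected $os on $arch: $platform_status"
```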

Users on Windows 10/11 can install Apptainer through Windows Subsystem for Linux (WSL), whilst macOS users can use a tool called Lima. Instructions for these can be found here:

Note that these routes have not been thoroughly tested. They appear to work, but your mileage may vary.

Installation instructions¶

To install ml-toolkit you have two options. You can:

install ml-toolkit for the current user using pip
download the Python source files from GitHub and build it from there.

Using pip (recommended for most users):¶

To install ml-toolkit run the following commands:

# create a Python virtual environment (recommended)
python -m venv ~/.venv/ml-toolkit
source ~/.venv/ml-toolkit/bin/activate
# install ml-toolkit
pip install bede-ml-toolkit
# if needed
install_apptainer
install_ml-toolkit

This will download and install all the necessary files to your system. It will also install Apptainer if needed and create a directory called ML_Toolkit in your home directory.

This directory houses the config files, definition files and container images used by ml-toolkit. If you want to put it somewhere else on your system, pass the -p option followed by the path you wish to install to:

install_ml-toolkit -p /path/to/install/to
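Once the install scripts have finished, you may want to confirm that Apptainer is actually available in your shell. The sketch below is one way to do that; it relies only on the standard command -v builtin and Apptainer's own --version flag, and the status variable is purely illustrative.

```shell
# Check whether the apptainer executable is on PATH and print its version.
if command -v apptainer >/dev/null 2>&1; then
    apptainer_status="found"
    apptainer --version
else
    apptainer_status="missing"
    echo "apptainer not found on PATH; try running install_apptainer" >&2
fi
```

If the command reports that Apptainer is missing straight after installation, opening a new shell (so that PATH is re-read) is worth trying before re-running the installer.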

Installing from source (recommended for developers/advanced users):¶

You will first need to download the git repository code using:

git clone https://github.com/bjthorpe/Bede_containers
cd Bede_containers
git checkout dev

Next you will need to install some python packages using pip. We recommend using a virtual environment for this:

python3 -m venv ~/.venv/ml-toolkit
source ~/.venv/ml-toolkit/bin/activate
pip install .

Finally you will need to run the two install scripts:

install_apptainer
install_ml-toolkit

The install_ml-toolkit script also has an optional --dev flag, which uses the Data directory of the git repo as ML_TOOLKIT_HOME, with git ignore rules set up so that images, logs, etc. are not tracked. This is useful when developing: your changes are all tracked within git, and you don't have to remember to copy code over to the ~/ML_Toolkit directory for testing.

It also pairs well with pip install -e . so you can change code in the repo and not have to re-run pip install.

Optionally, you can now run a more thorough set of unit tests using:

pytest

Note: there are two tests that are disabled by default. These require a separate working installation of CASTEP which is used by some of the included AI models. If desired these can be enabled with:

pytest -m CASTEP

The ML_Toolkit directory¶

The software we downloaded contains a number of different files and folders so I think it is worthwhile briefly covering the important files/folders and what each of them is used for.

All files used by ml-toolkit are located in the ML_Toolkit directory, which was created during installation and is located in the user's home directory by default.

Note: if you ever need to know where this directory is located you can run:

echo $ML_TOOLKIT_HOME
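If the variable turns out to be empty (for example in a shell started before installation), a small fallback can be handy. The sketch below assumes the default location of ~/ML_Toolkit described above; toolkit_home is an illustrative variable name, not something ml-toolkit defines.

```shell
# Use ML_TOOLKIT_HOME when it is set, otherwise assume the default
# install location (~/ML_Toolkit).
toolkit_home="${ML_TOOLKIT_HOME:-$HOME/ML_Toolkit}"

if [ -d "$toolkit_home" ]; then
    echo "ml-toolkit directory: $toolkit_home"
else
    echo "No directory at $toolkit_home; has install_ml-toolkit been run?" >&2
fi
```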

Getting our bearings¶

The ML_Toolkit directory contains the following sub-directories:

Container_Configs: Contains all the .yaml config files used to configure the containers.
Definitions: Contains a number of .def files used to build containers.
Scripts: Contains various Python scripts that demonstrate how to use pre-trained models from MatBench Discovery.
Images: Contains the container image (.sif) files used by Apptainer. These store all the data and files used by the container.
Models: Contains the checkpoint files used by the various pre-trained models.
logs: Log files containing more detailed output from the program, useful for debugging.
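The layout above can be checked with a short loop such as the following. This is a minimal sketch that assumes the default ~/ML_Toolkit location when ML_TOOLKIT_HOME is unset; some directories (Images and logs, for instance) may only appear after the first build or run, so a "missing" entry is not necessarily an error.

```shell
# Walk the expected sub-directories and report which ones exist.
toolkit_home="${ML_TOOLKIT_HOME:-$HOME/ML_Toolkit}"
missing=0

for sub in Container_Configs Definitions Scripts Images Models logs; do
    if [ -d "$toolkit_home/$sub" ]; then
        echo "ok       $sub"
    else
        echo "missing  $sub"
        missing=$((missing + 1))
    fi
done

echo "$missing of 6 sub-directories missing under $toolkit_home"
```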

In order to use containers with the ml-toolkit we need two things:

A container config (.yaml) file
A container definition

Both of these combined will tell ml-toolkit and Apptainer what software, files and steps are needed in order to build our container. We will go into more detail about how these files are constructed in later sections.

The Container_Configs directory holds a number of example config files, most of which give access to the various pre-trained models used for machine-learned atomic potentials. See the section on ML Atomic Potentials for more details.

For this first example, we will be using Test_containers.yaml. This defines two containers, each consisting of a tiny Linux installation, that can be used to test the installation of Apptainer. It will be used in the next section to check that everything is installed correctly.

Related to this is the Definitions directory, which contains various Apptainer definition (.def) files. These can be thought of as blueprints that tell Apptainer how to set up the system and what software to install, and Apptainer uses them to create the various containers.

With this out of the way we shall now run a small test container to check everything is working correctly.

Your first container¶

To test everything is installed correctly we will now build and run a simple test container. This will consist of a tiny Linux installation which only has a program called cowsay installed. Cowsay is a small program that generates ASCII art pictures of a cow with a message.

An Apptainer definition file to setup and run this container can be found under Definitions/cowsay.def.

These can be quite involved and a full tutorial on Apptainer is beyond the scope of this document. However those interested can consult this Tutorial or the Apptainer Documentation for a full breakdown of how these files work.

For our purposes however, we can build the container with the following command (note this is case sensitive):

ml-toolkit build TestContainer

This will download and build the cowsay container from the definition file and create a container image TestContainer.sif in the Images directory. A container image contains all the files the container needs to run in a format Apptainer can understand.
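To confirm that the build produced an image, you can look for the file directly. The sketch below assumes the default ~/ML_Toolkit location when ML_TOOLKIT_HOME is unset; the variable names are illustrative.

```shell
# Check that the build step left a TestContainer.sif image in Images/.
toolkit_home="${ML_TOOLKIT_HOME:-$HOME/ML_Toolkit}"
image="$toolkit_home/Images/TestContainer.sif"

if [ -f "$image" ]; then
    image_status="present"
    ls -lh "$image"    # show the size of the image file
else
    image_status="absent"
    echo "No image at $image; run: ml-toolkit build TestContainer" >&2
fi
```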

Once this is complete we can run a command inside this container using:

ml-toolkit run TestContainer COMMAND

Here COMMAND is the Linux command we wish the container to run. In our case we will run a script (found in the ML_Toolkit/Scripts directory) that runs the fortune command to generate a random (possibly silly) quote, then passes it to the cowsay command, which displays the message.

ml-toolkit run TestContainer $ML_TOOLKIT_HOME/Scripts/speak_wisdom.sh

If all has gone to plan you should see similar output to the following:

*********************************************************************
***************** Loading Model Config Files ************************
*********************************************************************
                       All config files look good
*********************************************************************
***************** Running: TestContainer *********************
*********************************************************************
_________________________________________
/ You will be reincarnated as a toad; and \
\ you will be much happier.               /
 -----------------------------------------
            \   ^__^
            \   (oo)\_______
                (__)\       )\/\
                    ||----w |
                    ||     ||

You will also find more detailed output in the log file logs/log.log. This is overwritten each time you use the build or run command and, notably, does not contain the output of the container itself (in this case the ASCII art cow). However, it does contain useful information, including:

what files/folders the container is accessing
the variables used for each config ml-toolkit has found
a summary of the underlying Apptainer commands the script is running
warnings about config issues that won’t necessarily crash the container but may cause issues.

All of which is useful for debugging.
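A quick way to pull config warnings out of the log is to search it with grep. This sketch assumes that warning lines contain the word "warning" in some capitalisation, which is a guess about the log format rather than something documented here, and that the log lives in the default location when ML_TOOLKIT_HOME is unset.

```shell
# Search the most recent log for warning lines.
# ASSUMPTION: warnings include the word "warning" (any capitalisation).
log_file="${ML_TOOLKIT_HOME:-$HOME/ML_Toolkit}/logs/log.log"

if [ -f "$log_file" ]; then
    log_status="checked"
    # -i ignores case, -n prefixes each match with its line number
    grep -in "warning" "$log_file" || echo "No warnings found in $log_file"
else
    log_status="absent"
    echo "No log file at $log_file yet; run a build or run command first" >&2
fi
```

Because the log is overwritten by every build or run, copy it elsewhere first if you need to keep it.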

Other useful commands:¶

In addition to build and run commands there are several other useful commands which are worth knowing:

list
start
stop

List is a useful command for checking what containers are available.

ml-toolkit list

The full list, however, may not be that useful as it is quite long. The --group option can be used to filter it by group tag. For example:

ml-toolkit list --group Test

This will list only the containers with the tag Test.

You can also use the -m <model_name> option to get details of a specific model and the -l option to expand the model description beyond the default 80 characters.

For example:

ml-toolkit list -l -m PET-MAD-S

Start and stop are also useful commands: they allow you to run containers as background processes, which is useful for certain types of software.

The start command runs the command defined in the %startscript section of the definition (.def) file. You can also list all currently running containers with:

apptainer instance list

Getting Help:¶

If you need a quick refresher on which options exist and what they do, this can be displayed using the -h (or --help) option as follows:

ml-toolkit -h

usage: run_container.py [-h] [--config_file CONFIG_FILE] [--debug] {run,build,load,list,start,stop} ...

A CLI tool for easily running AI/ML containers on Bede.

positional arguments:
{run,convert,build,load,list,start,stop}
                        Operation to perform.
    run                 Run command(s), with the Container
    convert             Convert existing Model Container to/from editable/static, useful for development as it saves having to re-build
                        containers when making small changes.
    build               Build the Container, exactly equivalent to load
    load                Build the Container, exactly equivalent to build
    list                List available containers
    start               Start Container as background process
    stop                Stop container that is running in the background

options:
-h, --help            show this help message and exit
--debug               Print generated Apptainer command instead of running container, useful for sanity checking

Note that this is the best place to check first: if new options are added, the help output will likely be more up to date than these docs.
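The --debug option listed above can be combined with the earlier test container to preview what would be run. The sketch below places --debug before the sub-command, following the usage line in the help output; this placement is an assumption and has not been checked against the tool itself. The guard simply skips the call when ml-toolkit is not on PATH.

```shell
# Print the generated Apptainer command for the cowsay test instead of running it.
# ASSUMPTION: --debug is a top-level option and goes before the sub-command.
if command -v ml-toolkit >/dev/null 2>&1; then
    debug_status="ran"
    ml-toolkit --debug run TestContainer "$ML_TOOLKIT_HOME/Scripts/speak_wisdom.sh"
else
    debug_status="skipped"
    echo "ml-toolkit not found on PATH; activate the virtual environment first" >&2
fi
```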

Many of these commands also have their own sub-options. For example, those for run can be displayed with:

ml-toolkit run -h

usage: run_container.py run [-h] model_name cmd [cmd ...]

positional arguments:
model_name  Name of Model to use
cmd         Command(s) to run

options:
-h, --help  show this help message and exit