Understanding Boutiques Descriptors
Boutiques descriptors are flexible, JSON-based documents that describe how to interact with a containerized application. For example, a descriptor lists the expected inputs and outputs of a pipeline and the arguments that can be used to configure processing.
Boutiques descriptors are how CBRAIN provides a consistent interface to a wide variety of pipelines. On the CBRAIN web portal, the processing options are automatically presented to users as drop-down menus, text fields, and check-boxes. Because of the complexity and scale of HBCD processing, we instead interact with CBRAIN (and the Boutiques descriptors within it) through CBRAIN’s API. In either case, the “descriptor” for a given tool determines how a user’s arguments are passed to the container for processing.
In this section of the documentation, we describe the details of Boutiques descriptors that are most relevant to understanding HBCD processing. Further details can be found in the Boutiques publication and on the Boutiques webpage.
First, at the top level of a Boutiques descriptor, a few key fields describe the tool itself. These include:
name: The name of the tool.
description: A description of the tool.
tool-version: The version of the tool, for display in CBRAIN.
container-image: A field whose sub-fields describe where the container image lives.
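As a rough illustration, the top-level fields above could be assembled in Python as a plain dictionary and serialized to JSON. The tool name, version, and image reference are hypothetical placeholders, and the `container-image` sub-fields shown (`type` and `image`) are only a subset of what the Boutiques schema allows. A real descriptor also carries other fields, such as `schema-version`, `command-line`, and `inputs`, which this sketch does not check.

```python
import json

# Hypothetical top-level fields for an imaginary tool; all values are placeholders.
descriptor = {
    "name": "pipeline_name",
    "description": "Example pipeline used for illustration only.",
    "tool-version": "1.0.0",
    "container-image": {
        "type": "singularity",  # container technology
        "image": "example/pipeline_name:1.0.0",  # hypothetical image reference
    },
}

# The four top-level fields discussed in this section (not the full schema).
TOOL_FIELDS = ("name", "description", "tool-version", "container-image")


def missing_tool_fields(desc):
    """Return the names of any of the four fields above that are absent."""
    return [field for field in TOOL_FIELDS if field not in desc]


print(missing_tool_fields(descriptor))  # prints []
print(json.dumps(descriptor["container-image"], indent=2))
```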
Next are the fields that describe how the tool operates. All arguments, including those that point to input and output files or folders, are specified in the “inputs” section. Arguments can have default values, can be required or optional, and can represent a number of data types, including strings, numbers, and files. Two example input fields are shown below:
{
"id": "ParticipantLabel",
"name": "Participant Label",
"description": "select a specific subject to be processed (with or without sub- prefix)",
"type": "String",
"optional": false,
"value-key": "[Arg1]"
},
{
"id": "SessionLabel",
"name": "Session Label",
"description": "select a specific session to be processed (with or without ses- prefix)",
"type": "String",
"optional": true,
"command-line-flag": "--session-id",
"value-key": "[Arg2]"
}
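To see how these fields are read, the sketch below loads the two example inputs with Python's `json` module, wrapped in a list because the fragment above is two elements of the “inputs” array (the descriptions are omitted for brevity). It then separates required inputs from optional ones using the `optional` field. The helper name `split_by_optional` is hypothetical.

```python
import json

# The two example input definitions, as elements of an "inputs" array.
INPUTS_JSON = """
[
  {"id": "ParticipantLabel", "name": "Participant Label", "type": "String",
   "optional": false, "value-key": "[Arg1]"},
  {"id": "SessionLabel", "name": "Session Label", "type": "String",
   "optional": true, "command-line-flag": "--session-id", "value-key": "[Arg2]"}
]
"""

inputs = json.loads(INPUTS_JSON)


def split_by_optional(input_defs):
    """Return (required_ids, optional_ids); a missing "optional" is treated as required here."""
    required = [i["id"] for i in input_defs if not i.get("optional", False)]
    optional = [i["id"] for i in input_defs if i.get("optional", False)]
    return required, optional


required, optional = split_by_optional(inputs)
print(required)  # ['ParticipantLabel']
print(optional)  # ['SessionLabel']
```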
The “inputs” of a pipeline are turned into an actual command through the “command-line” section. When CBRAIN sets up a processing job for a given subject and pipeline, it first creates a single directory that contains all of the input/output folders needed during processing (more on that later), and then shells into the container. The “command-line” section specifies the command that is run inside the container.
Often, certain manipulations are made to the input/output files to facilitate processing. In that case, the “command-line” text may start or end with the commands that perform these manipulations. For example, if CBRAIN needs to create working or output directories on behalf of the pipeline, the “command-line” text may look as follows:
"command-line": "mkdir -p work out; pipeline_name [Arg1] [Arg2]"
In the above example, the directories “work” and “out” are created first. The hypothetical pipeline is then run by calling “pipeline_name”, a command that exists in the container; most containers place their commands on the PATH so that they can be called directly in this way. Following “pipeline_name” are [Arg1] and [Arg2], which are the value-keys of arguments defined in the “inputs” section of the descriptor. If the arguments with the ids “ParticipantLabel” and “SessionLabel” were set to sub-1 and ses-1, respectively, the following command would be run in the container:
# ses-1 is preceded by --session-id because the SessionLabel input
# defines a "command-line-flag" in the descriptor. sub-1 has no flag,
# so it is passed as a positional argument.
mkdir -p work out; pipeline_name sub-1 --session-id ses-1
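The substitution step can be approximated in a few lines of Python. This is a simplified sketch of the behavior described above, not the Boutiques implementation: each value-key is replaced by its value, preceded by the input's `command-line-flag` when one is defined, and the value-keys of optional inputs that were not set are dropped. Quoting, flag separators, list inputs, and other types are ignored, and the helper name `build_command` is hypothetical.

```python
import re

# The example inputs and command line from this section.
INPUTS = [
    {"id": "ParticipantLabel", "optional": False, "value-key": "[Arg1]"},
    {"id": "SessionLabel", "optional": True,
     "command-line-flag": "--session-id", "value-key": "[Arg2]"},
]
COMMAND_LINE = "mkdir -p work out; pipeline_name [Arg1] [Arg2]"


def build_command(command_line, input_defs, values):
    """Hypothetical helper: substitute each value-key with its (optionally flagged) value."""
    cmd = command_line
    for spec in input_defs:
        value = values.get(spec["id"])
        if value is None:
            if not spec.get("optional", False):
                raise ValueError(f"missing required input: {spec['id']}")
            replacement = ""  # unset optional input: drop its value-key
        elif "command-line-flag" in spec:
            replacement = f"{spec['command-line-flag']} {value}"
        else:
            replacement = str(value)  # positional argument
        cmd = cmd.replace(spec["value-key"], replacement)
    # Collapse double spaces and trim whitespace left behind by dropped value-keys.
    return re.sub(r" {2,}", " ", cmd).strip()


print(build_command(COMMAND_LINE, INPUTS,
                    {"ParticipantLabel": "sub-1", "SessionLabel": "ses-1"}))
# mkdir -p work out; pipeline_name sub-1 --session-id ses-1
print(build_command(COMMAND_LINE, INPUTS, {"ParticipantLabel": "sub-1"}))
# mkdir -p work out; pipeline_name sub-1
```

The second call shows why the optional flag matters: without a session label, the `--session-id` flag is omitted entirely rather than passed with an empty value.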
The “output-files” section of the descriptor describes the files or folders that the pipeline is expected to generate. Each output file includes a “path-template” field that gives its expected path in the processing directory. After processing is complete, CBRAIN saves the files and folders listed in the “output-files” section to the appropriate Data Provider (which in HBCD’s case is a path on an S3 bucket). The “BoutiquesForcedOutputBrowsePath” module determines where the individual outputs are placed relative to the root path of the Data Provider. In general, outputs are routed to output folders named after the pipeline that generated the data. The saved outputs are generally folders or HTML reports named after the subject being processed.
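A rough sketch of the output side: a `path-template` can contain the same value-keys as the command line, so it resolves to a concrete path once the inputs are known, and a forced browse path then determines where that output lands relative to the Data Provider's root. The output id `SubjectOutput`, the `out/[Arg1]` template, and the `pipeline_name` browse path are hypothetical examples, and the path join below only illustrates the idea; it is not how CBRAIN actually stores files on S3.

```python
from pathlib import PurePosixPath

# Hypothetical output-files entry: one output folder named after the subject.
OUTPUT_FILES = [
    {"id": "SubjectOutput", "name": "Subject output", "path-template": "out/[Arg1]"},
]
INPUT_VALUES = {"[Arg1]": "sub-1"}  # value-key -> value for this task
FORCED_BROWSE_PATH = "pipeline_name"  # hypothetical browse path


def resolve_outputs(output_files, values, browse_path):
    """Map each output id to (path in the task directory, destination under the Data Provider root)."""
    resolved = {}
    for out in output_files:
        local = out["path-template"]
        for key, val in values.items():
            local = local.replace(key, val)  # fill value-keys in the template
        dest = PurePosixPath(browse_path) / PurePosixPath(local).name
        resolved[out["id"]] = (local, str(dest))
    return resolved


print(resolve_outputs(OUTPUT_FILES, INPUT_VALUES, FORCED_BROWSE_PATH))
# {'SubjectOutput': ('out/sub-1', 'pipeline_name/sub-1')}
```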
BoutiquesBidsSingleSubjectMaker and other “Custom” modules
All Boutiques descriptors used in HBCD processing have a “custom” section that specifies a number of additional modules CBRAIN uses to adjust data in some way. Mostly, these modules modify the inputs and outputs so that they behave as desired.
The most important of these modules is “BoutiquesBidsSingleSubjectMaker”, which transforms a BIDS “subject” folder into a new BIDS “Dataset”. As a result, you may see an argument such as [SubjectName] in the command-line section; if that argument is also passed through “BoutiquesBidsSingleSubjectMaker”, it is replaced by a BIDS Dataset that contains the subject folder along with an empty participants.tsv file and a dataset_description.json file. Therefore, the following:
"command-line" : "pipeline_name [SubjectName]"
Will be expanded into something like:
pipeline_name BidsDataset
Where BidsDataset has the following structure:
BidsDataset
├── participants.tsv
├── dataset_description.json
└── sub-<label>
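The wrapping step can be imitated with Python's standard library. This sketch only mirrors the structure shown above and is not CBRAIN's code. The function name `make_single_subject_dataset` is hypothetical, the subject folder is symlinked rather than copied (an assumption made to avoid duplicating data), and the `dataset_description.json` content uses placeholder values for `Name` and `BIDSVersion`, the two fields the BIDS specification requires in that file.

```python
import json
import os
import tempfile
from pathlib import Path


def make_single_subject_dataset(subject_dir, dataset_dir):
    """Hypothetical helper: wrap one BIDS subject folder in a minimal BIDS dataset."""
    subject_dir = Path(subject_dir).resolve()
    dataset_dir = Path(dataset_dir)
    dataset_dir.mkdir(parents=True, exist_ok=True)
    # Empty participants.tsv, as described above.
    (dataset_dir / "participants.tsv").touch()
    # Minimal dataset_description.json; the values are placeholders.
    description = {"Name": "BidsDataset", "BIDSVersion": "1.8.0"}
    (dataset_dir / "dataset_description.json").write_text(json.dumps(description))
    # Expose the subject inside the dataset via a symlink instead of a copy.
    os.symlink(subject_dir, dataset_dir / subject_dir.name, target_is_directory=True)
    return dataset_dir


with tempfile.TemporaryDirectory() as tmp:
    subject = Path(tmp) / "sub-01"
    (subject / "anat").mkdir(parents=True)
    dataset = make_single_subject_dataset(subject, Path(tmp) / "BidsDataset")
    listing = sorted(p.name for p in dataset.iterdir())
    desc = json.loads((dataset / "dataset_description.json").read_text())

print(listing)  # ['dataset_description.json', 'participants.tsv', 'sub-01']
```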
The second most important module is likely “BoutiquesBidsSubjectFileSelector”. Using this module, we explicitly choose which files the pipeline is exposed to during processing. This allows us to remove redundant or low-quality images that would otherwise degrade processing.
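A hedged sketch of the selection idea: given a subject folder and an explicit list of relative paths to keep, build a new subject folder that contains only those files. The function name `select_subject_files` and the example file names are hypothetical, and the sketch copies files with `shutil.copy2`; how CBRAIN's module actually records and applies the selection is not shown here.

```python
import shutil
import tempfile
from pathlib import Path


def select_subject_files(subject_dir, selected, out_dir):
    """Hypothetical helper: copy only the selected files (relative to subject_dir) into out_dir."""
    subject_dir, out_dir = Path(subject_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for rel in selected:
        src = subject_dir / rel
        if not src.is_file():
            raise FileNotFoundError(f"selected file not found: {rel}")
        dst = out_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
    # Report what the pipeline would see, as paths relative to the new folder.
    return sorted(str(p.relative_to(out_dir)) for p in out_dir.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    subject = Path(tmp) / "sub-01"
    (subject / "anat").mkdir(parents=True)
    for name in ("sub-01_run-1_T1w.nii.gz", "sub-01_run-2_T1w.nii.gz"):
        (subject / "anat" / name).touch()
    # Keep run-1 only, e.g. because run-2 was judged to be low quality.
    kept = select_subject_files(subject, ["anat/sub-01_run-1_T1w.nii.gz"],
                                Path(tmp) / "selected" / "sub-01")

print(kept)  # ['anat/sub-01_run-1_T1w.nii.gz']
```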
Other modules of note include:
cbrain:no-run-id-for-outputs: This module prevents CBRAIN from appending a task id to the output files.
BoutiquesFileTypeVerifier: This module ensures that the input file provided to a task has a specific file type. In CBRAIN, file types include types that represent groups of files, such as “BidsSubject” or “NibabiesOutput”.
BoutiquesOutputFileTypeSetter: Sets the file type assigned to a specific pipeline output. The output file type normally indicates which pipeline was run to produce the outputs.
BoutiquesForcedOutputBrowsePath: Used to redirect outputs to a specific location in the output Data Provider.
BoutiquesTaskLogsCopier: Used to save the task’s logs along with a copy of the tool’s current Boutiques descriptor. See here for more details.
BoutiquesInputSubdirMaker: Used to transform an input file/folder into a sub-directory. This is useful for ensuring that input files are where we want them to be during processing.
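To make the module list more concrete, the sketch below shows one possible shape for the “custom” section and a small check in the spirit of BoutiquesFileTypeVerifier. The nesting under a `cbrain:integrator_modules` key, the value format given to each module, the input id `SubjectFolder`, and the output id `SubjectOutput` are all assumptions for illustration; an actual HBCD descriptor is the authority on the exact layout.

```python
# Assumed layout of a descriptor's "custom" section (illustrative only).
CUSTOM = {
    "cbrain:no-run-id-for-outputs": True,
    "cbrain:integrator_modules": {
        "BoutiquesFileTypeVerifier": {"SubjectFolder": ["BidsSubject"]},
        "BoutiquesBidsSingleSubjectMaker": "SubjectFolder",
        "BoutiquesForcedOutputBrowsePath": {"SubjectOutput": "pipeline_name"},
    },
}


def enabled_modules(custom):
    """Return the sorted names of the modules configured under the assumed key."""
    return sorted(custom.get("cbrain:integrator_modules", {}))


def verify_file_type(custom, input_id, file_type):
    """Mimic a file-type check: True if no rule exists for the input or the type is allowed."""
    modules = custom.get("cbrain:integrator_modules", {})
    allowed = modules.get("BoutiquesFileTypeVerifier", {}).get(input_id)
    return allowed is None or file_type in allowed


print(enabled_modules(CUSTOM))
# ['BoutiquesBidsSingleSubjectMaker', 'BoutiquesFileTypeVerifier', 'BoutiquesForcedOutputBrowsePath']
print(verify_file_type(CUSTOM, "SubjectFolder", "BidsSubject"))     # True
print(verify_file_type(CUSTOM, "SubjectFolder", "NibabiesOutput"))  # False
```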