.. HBCD_CBRAIN_PROCESSING documentation master file, created by
   sphinx-quickstart on Wed Jun  5 10:48:12 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Understanding Boutiques Descriptors
===================================

`Boutiques descriptors <https://arxiv.org/abs/1711.09713>`_ are
flexible JSON-based instructions that describe how a containerized
application can be interacted with. For example, this includes descriptions
of the expected inputs and outputs for a pipeline, and the arguments
that can be used to configure processing. 

Boutiques descriptors are how CBRAIN creates a consistent interface
for interacting with a wide variety of pipelines. On the CBRAIN web portal,
the options for processing are automatically configured in a convenient way
for users via utilities like drop-down menus, text fields, and check-boxes.
Because of the complexity and scale of HBCD processing, we instead interact
with CBRAIN (and the Boutiques descriptors within CBRAIN) via CBRAIN's API.
In either case, the "descriptor" for a given tool will determine how
arguments from a user are conveyed to a container for processing purposes.

In this section of the documentation, we describe some of the relevant details
of Boutiques descriptors that are useful for understanding HBCD processing.
Other details can also be found in the Boutiques `publication <https://arxiv.org/abs/1711.09713>`_ 
and `webpage <https://boutiques.github.io/>`_.

First, at the top level of a Boutiques descriptor, there are a few key fields
that describe the tool itself. These include:

    * **name**: The name of the tool.
    * **description**: A description of the tool.
    * **tool-version**: The version of the tool for display in CBRAIN.
    * **container-image**: A field with sub-fields that describe where the container lives.

Next are details that are more closely related to how the tool operates. All arguments (including
both groups of input and output files) will be specified in the "inputs" section. Arguments can
have default values, be either required or optional, and can represent a number of data types including
strings, numbers, and files. An example of two input fields may be as follows: ::

      {
        "id":                "ParticipantLabel",
        "name":              "Participant Label",
        "description":       "select a specific subject to be processed (with or without sub- prefix)",
        "type":              "String",
        "optional":          false,
        "value-key":         "[Arg1]"
      },
      {
        "id":                "SessionLabel",
        "name":              "Session Label",
        "description":       "select a specific session to be processed (with or without ses- prefix)",
        "type":              "String",
        "optional":          true,
        "command-line-flag": "--session-id",
        "value-key":         "[Arg2]"
      }
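
As a rough illustration of how these fields can be read programmatically, the Python sketch
below parses the two example inputs and reports whether each one is required and whether it
is passed with a flag. The "summarize_inputs" helper is hypothetical and is not part of
Boutiques or CBRAIN: ::

    import json

    # The two example inputs from above, wrapped in an "inputs" list.
    DESCRIPTOR_FRAGMENT = """
    {
      "inputs": [
        {"id": "ParticipantLabel", "name": "Participant Label",
         "type": "String", "optional": false, "value-key": "[Arg1]"},
        {"id": "SessionLabel", "name": "Session Label",
         "type": "String", "optional": true,
         "command-line-flag": "--session-id", "value-key": "[Arg2]"}
      ]
    }
    """

    def summarize_inputs(descriptor):
        """Return one (id, required, flag, value_key) tuple per input."""
        summary = []
        for item in descriptor["inputs"]:
            # A missing "optional" field is treated as required
            # (an assumption of this sketch).
            required = not item.get("optional", False)
            # None means the value is passed as a positional argument.
            flag = item.get("command-line-flag")
            summary.append((item["id"], required, flag, item["value-key"]))
        return summary

    inputs_summary = summarize_inputs(json.loads(DESCRIPTOR_FRAGMENT))
    for row in inputs_summary:
        print(row)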

The "inputs" to a pipeline take their form via the "command-line" section. When
CBRAIN is setting up a processing job for a given subject and pipeline, CBRAIN
first sets up a single directory that contains all the input/output folders
needed during processing (more on that later), and then CBRAIN will shell
into the container. The "command-line" section then specifies the command that
will be run inside of the container.

Often, certain manipulations need to be made to the input/output files
to facilitate processing. If this is the case, the "command-line" text may either start
or end with commands that perform these manipulations. For example, if CBRAIN needs
to create working or output directories on behalf of the pipeline, the "command-line"
text may look as follows: ::

    "command-line": "mkdir -p work out; pipeline_name [Arg1] [Arg2]"

In the above example, the directories "work" and "out" would be created first. Following
this, the hypothetical pipeline would be run by calling "pipeline_name", which refers to
a command that exists in the container. Most containers have commands on the PATH that
can be directly called in this way. Following "pipeline_name" are [Arg1] and [Arg2],
which correspond to arguments that have been defined in the "inputs" section of the descriptor.
During processing, if we set the arguments with the ids "ParticipantLabel"
and "SessionLabel" to sub-1 and ses-1, respectively, the following command would be provided
to the container: ::
    
    # ses-1 has a flag because the input has a "command-line-flag"
    # field specified in the descriptor. sub-1 does not have a flag
    # and in this case would be passed as a positional argument.
    mkdir -p work out; pipeline_name sub-1 --session-id ses-1
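
The substitution described above can be sketched in a few lines of Python. This is a
simplified illustration rather than the code that CBRAIN or Boutiques actually runs: the
"render_command" helper is hypothetical, values are not shell-quoted, and unset optional
inputs are assumed to be dropped from the command: ::

    def render_command(command_line, inputs, values):
        """Replace each value-key in command_line with its flag and value.

        values maps input ids to user-supplied values.
        """
        rendered = command_line
        for item in inputs:
            value = values.get(item["id"])
            if value is None:
                # Assumption: an unset input contributes nothing.
                replacement = ""
            else:
                flag = item.get("command-line-flag")
                replacement = f"{flag} {value}" if flag else str(value)
            rendered = rendered.replace(item["value-key"], replacement)
        # Collapse extra spaces left behind by omitted inputs.
        return " ".join(rendered.split())

    example_inputs = [
        {"id": "ParticipantLabel", "value-key": "[Arg1]"},
        {"id": "SessionLabel", "value-key": "[Arg2]",
         "command-line-flag": "--session-id"},
    ]
    template = "mkdir -p work out; pipeline_name [Arg1] [Arg2]"

    full_command = render_command(
        template, example_inputs,
        {"ParticipantLabel": "sub-1", "SessionLabel": "ses-1"},
    )
    no_session_command = render_command(
        template, example_inputs, {"ParticipantLabel": "sub-1"},
    )
    print(full_command)
    print(no_session_command)

In this sketch, leaving "SessionLabel" unset removes both the flag and its value, because the
flag is only emitted together with a value.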


The "output-files" section of the descriptor describes the files or folders that are expected to
be generated by the pipeline. Each output file includes a "path-template" field that describes
the expected path of the file in the processing directory. After processing is complete, CBRAIN will
save any files mentioned in the "output-files" section to the appropriate Data Provider (which in HBCD's
case is a path on an S3 bucket). The "BoutiquesForcedOutputBrowsePath" module determines where the individual
outputs are placed relative to the root path of the Data Provider. In general, outputs are routed
to output folders named after the pipeline that was used to generate the data. The outputs
being saved are generally folders or HTML reports with the name of the subject being processed.
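
To show how a "path-template" relates to the processing directory, the sketch below expands
a template and joins it to a task directory. The output id "pipeline_output", the template
"out/[Arg1]", and the directory "/path/to/task_dir" are made up for this example, and the
"expected_output_paths" helper is hypothetical; it only assumes that value-keys in a template
are replaced by the corresponding input values: ::

    from pathlib import PurePosixPath

    def expected_output_paths(output_files, inputs, values, workdir):
        """Map each output id to its expected path inside workdir."""
        key_to_value = {
            item["value-key"]: str(values[item["id"]])
            for item in inputs
            if item["id"] in values
        }
        paths = {}
        for out in output_files:
            relative = out["path-template"]
            for key, value in key_to_value.items():
                relative = relative.replace(key, value)
            paths[out["id"]] = str(PurePosixPath(workdir) / relative)
        return paths

    # Hypothetical descriptor fragments for illustration only.
    example_outputs = [
        {"id": "pipeline_output", "name": "Pipeline output",
         "path-template": "out/[Arg1]"},
    ]
    example_inputs = [{"id": "ParticipantLabel", "value-key": "[Arg1]"}]

    output_paths = expected_output_paths(
        example_outputs, example_inputs,
        {"ParticipantLabel": "sub-1"}, "/path/to/task_dir",
    )
    print(output_paths)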

BoutiquesBidsSingleSubjectMaker and other "Custom" modules
----------------------------------------------------------

All Boutiques descriptors used in HBCD processing have a "custom"
section that is used to specify a number of
additional modules that CBRAIN uses to adjust data in some way.
Mostly these are modifications that make the inputs and outputs behave
as desired.

The most important of these modules is the "BoutiquesBidsSingleSubjectMaker".
This module is used to transform a BIDS "subject" folder into a new BIDS "Dataset".
Because of this, you may see an argument such as [SubjectName] listed in the
command-line section, but if that argument is also passed through "BoutiquesBidsSingleSubjectMaker",
then the argument will be transformed into a BIDS Dataset with an empty participants.tsv file
and dataset_description.json. Therefore, the following: ::
    
    "command-line" : "pipeline_name [SubjectName]"
    
will be expanded into something like: ::
    
    pipeline_name BidsDataset

where BidsDataset has the following structure: ::

    BidsDataset
    ├── participants.tsv
    ├── dataset_description.json
    └── sub-<label>
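
The following Python sketch mimics this transformation on a temporary directory. It is not
CBRAIN's implementation: the "make_single_subject_dataset" helper is hypothetical, the subject
folder is copied for simplicity (a real implementation may link it instead), and the contents
written to dataset_description.json are placeholder values assumed for this example: ::

    import json
    import shutil
    import tempfile
    from pathlib import Path

    def make_single_subject_dataset(subject_dir, dataset_dir):
        """Wrap one BIDS subject folder in a minimal BIDS dataset."""
        subject_dir = Path(subject_dir)
        dataset_dir = Path(dataset_dir)
        dataset_dir.mkdir(parents=True, exist_ok=True)
        # Empty participants.tsv, as described above.
        (dataset_dir / "participants.tsv").touch()
        # Placeholder contents (an assumption of this sketch).
        description = {"Name": "SingleSubjectDataset", "BIDSVersion": "1.8.0"}
        (dataset_dir / "dataset_description.json").write_text(
            json.dumps(description, indent=2)
        )
        # Place the subject folder inside the new dataset.
        shutil.copytree(subject_dir, dataset_dir / subject_dir.name)
        return dataset_dir

    with tempfile.TemporaryDirectory() as tmp:
        subject = Path(tmp) / "sub-1"
        (subject / "ses-1" / "anat").mkdir(parents=True)
        dataset = make_single_subject_dataset(subject, Path(tmp) / "BidsDataset")
        dataset_entries = sorted(p.name for p in dataset.iterdir())
        print(dataset_entries)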

The second most important module is likely the "BoutiquesBidsSubjectFileSelector" module.
Using this module, we explicitly choose which files the pipeline should be exposed to
during processing. This allows us to remove redundant or low-quality images that would
have a negative impact on processing.

**Other modules of note include:**

 - **cbrain:no-run-id-for-outputs**: This module prevents CBRAIN from appending a task id to the output files.
 - **BoutiquesFileTypeVerifier**: This module ensures that the input file provided to a task has a specific file type.
   In CBRAIN, file types often include groups of files such as "BidsSubject" or "NibabiesOutput".
 - **BoutiquesOutputFileTypeSetter**: Sets the file type to assign to a specific pipeline output. The output file type
   normally indicates which pipeline was run to produce the outputs.
 - **BoutiquesForcedOutputBrowsePath**: Used to redirect outputs to a specific location in the output Data Provider.
 - **BoutiquesTaskLogsCopier**: Used to save logs from the task along with a copy of the current Boutiques descriptor for the tool. See
   :doc:`here <hidden_proc_details_folder>` for more details.
 - **BoutiquesInputSubdirMaker**: Used to transform an input file/folder into a sub-directory. This is useful for
   ensuring that input files are where we want them to be during processing.
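
When inspecting a descriptor, it can be handy to check which of these modules it uses. The
Python sketch below searches a "custom" section for the module names listed on this page
without assuming how they are nested. The "find_custom_modules" helper is hypothetical, and
the layout and values of "example_custom" are invented for illustration; they do not
reproduce a real HBCD descriptor: ::

    # Module names taken from this page.
    KNOWN_MODULES = {
        "BoutiquesBidsSingleSubjectMaker",
        "BoutiquesBidsSubjectFileSelector",
        "BoutiquesFileTypeVerifier",
        "BoutiquesOutputFileTypeSetter",
        "BoutiquesForcedOutputBrowsePath",
        "BoutiquesTaskLogsCopier",
        "BoutiquesInputSubdirMaker",
        "cbrain:no-run-id-for-outputs",
    }

    def find_custom_modules(node):
        """Recursively collect known module names used as keys under node."""
        found = set()
        if isinstance(node, dict):
            for key, value in node.items():
                if key in KNOWN_MODULES:
                    found.add(key)
                found |= find_custom_modules(value)
        elif isinstance(node, list):
            for value in node:
                found |= find_custom_modules(value)
        return found

    # Invented layout and values, for illustration only.
    example_custom = {
        "hypothetical-module-group": {
            "BoutiquesBidsSingleSubjectMaker": "SubjectName",
            "BoutiquesForcedOutputBrowsePath": {
                "pipeline_output": "derivatives/pipeline_name",
            },
        },
        "cbrain:no-run-id-for-outputs": "pipeline_output",
    }

    modules_in_use = sorted(find_custom_modules(example_custom))
    print(modules_in_use)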