Provenance Database Schema
Main database
Below we describe the JSON schema for the anomalies and normalexecs collections of the main database component of the provenance database.
Function event schema
This section describes the JSON schema for the anomalies and normalexecs collections. The fields of the JSON object are shown in bold, each followed by a colon (:) and a brief description.
Function execution “events” in Chimbuko are labeled by a string, unique within each process, of the form “$RANK:$IO_STEP:$IDX” (e.g. “0:12:225”), where RANK, IO_STEP and IDX are the MPI rank, the I/O step and an integer index, respectively, and $VAL denotes the numerical value of the variable VAL. We refer to such a string as an “event label” below.
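As an illustration of the event label format described above, the following minimal Python sketch splits a label into its three integer components and rebuilds it. The names `EventLabel`, `parse_event_label` and `format_event_label` are hypothetical helpers introduced here for illustration; they are not part of Chimbuko.

```python
from typing import NamedTuple


class EventLabel(NamedTuple):
    """Components of an event label (hypothetical helper type, not a Chimbuko API)."""
    rank: int     # MPI rank
    io_step: int  # I/O step
    idx: int      # integer index of the event


def parse_event_label(label: str) -> EventLabel:
    """Split a 'RANK:IO_STEP:IDX' string into its integer components."""
    parts = label.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 'RANK:IO_STEP:IDX', got {label!r}")
    # int() raises ValueError if a component is not an integer
    rank, io_step, idx = (int(p) for p in parts)
    return EventLabel(rank, io_step, idx)


def format_event_label(rank: int, io_step: int, idx: int) -> str:
    """Build an event label string from its components."""
    return f"{rank}:{io_step}:{idx}"


label = parse_event_label("0:12:225")
print(label.rank, label.io_step, label.idx)
```

Because the label is unique only within a process, code that compares events across processes would presumably need additional identifying information beyond the label itself.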
For the SSTD (original) algorithm, the algo_params field has the following format:
The schema for the gpu_location field is as follows:
and for the gpu_parent field:
Note that Tau treats a GPU device/context/stream in much the same way as a CPU thread and assigns it a unique index. This index is the “thread index” for GPU events.
Metadata schema
Metadata are stored in the metadata collection using the following JSON schema:
Note that the tid (thread index) for metadata is usually 0, except for metadata associated with a GPU context/device/stream, for which it is the virtual thread index assigned by Tau to that context/device/stream.
Global database
Below we describe the JSON schema for the func_stats and counter_stats collections of the global database component of the provenance database.
Function profile statistics schema
func_stats contains aggregated profile information for all functions. The JSON schema is as follows:
Counter statistics schema
The counter_stats collection has the following schema: