Provenance Database Schema
Main database
Below we describe the JSON schema for the anomalies, normalexecs and metadata collections of the main database component of the provenance database.
Function event schema
This section describes the JSON schema for the anomalies and normalexecs collections. The fields of the JSON object are bolded, and a brief description follows the colon (:).
Function execution “events” in Chimbuko are labeled by a unique (for each process) string of the following form: “$RANK:$IO_STEP:$IDX” (e.g. “0:12:225”), where RANK, IO_STEP and IDX are the MPI rank, the I/O step and an integer index, respectively, and $VAL indicates the numerical value of the variable VAL. We will refer to such a string as an “event label” below.
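As an illustration of the event label format, the following is a minimal Python sketch that splits a label into its three integer components and rebuilds it. The names EventLabel, parse_event_label and format_event_label are hypothetical helpers introduced here for illustration only; they are not part of Chimbuko.

```python
from typing import NamedTuple


class EventLabel(NamedTuple):
    """Components of an event label (hypothetical helper type, not a Chimbuko API)."""
    rank: int     # MPI rank
    io_step: int  # I/O step
    idx: int      # integer index


def parse_event_label(label: str) -> EventLabel:
    """Split a label of the form RANK:IO_STEP:IDX into its integer components."""
    parts = label.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected RANK:IO_STEP:IDX, got {label!r}")
    # int() raises ValueError if a component is not numeric
    rank, io_step, idx = (int(p) for p in parts)
    return EventLabel(rank, io_step, idx)


def format_event_label(ev: EventLabel) -> str:
    """Rebuild the label string from its components."""
    return f"{ev.rank}:{ev.io_step}:{ev.idx}"
```

For the example label above, parse_event_label("0:12:225") yields rank 0, I/O step 12 and index 225, and format_event_label returns the original string.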
For the SSTD (original) algorithm, the algo_params field has the following format:
For the HBOS and COPOD algorithms, the algo_params field has the following format:
The schema for the gpu_location field is as follows:
and for the gpu_parent field:
Note that Tau treats a GPU device/context/stream in much the same way as a CPU thread, and assigns it a unique index. This index is the “thread index” for GPU events.
Metadata schema
Metadata are stored in the metadata collection in the following JSON schema:
Note that the tid (thread index) for metadata is usually 0, except for metadata associated with a GPU context/device/stream, for which the index is the virtual thread index assigned by Tau to that context/device/stream.
Global database
Below we describe the JSON schema for the func_stats, counter_stats and ad_model collections of the global database component of the provenance database.
A common data structure, RunStats, is used extensively to represent statistics (mean, min/max, standard deviation, etc.) of some quantity. It has the following schema:
Function profile statistics schema
func_stats contains aggregated profile information and anomaly information for all functions. The JSON schema is as follows:
Counter statistics schema
The counter_stats collection has the following schema:
AD model schema
The ad_model collection contains the final AD model for each function. It has the following schema:
The “model” entry has the same form as the “algo_params” entry of the main database, and is documented above.