This is dedicated to Installation and Usage section. Description of various parameters used during Chimbuko’s Install and Usage are described here.

Installation

Gtest Installation

apt-get install libgtest-dev
cd /usr/src/gtest
cmake CMakelists.txt
make
cp *.a /usr/lib

ADIOS2 Build Location

ADIOS2_INSTALL_DIR=/gpfs/alpine/world-shared/csc143/jyc/summit/sw/adios2/devel/gcc

GoogleTest on Summit

cd ${GTEST_SOURCE_DIR}
wget https://github.com/google/googletest/archive/release-1.10.0.tar.gz
tar -xvzf release-1.10.0.tar.gz
cd ${GTEST_BUILD_DIR}
CC=gcc CXX=g++ cmake -DCMAKE_INSTALL_PREFIX=${GTEST_INSTALL_DIR} ${GTEST_SOURCE_DIR}/googletest-release-1.10.0
make install

Manual installation of Chimbuko

Instructions on how to install the Chimbuko visualization can be found on GitHub here <https://github.com/CODARcode/ChimbukoVisualizationII/>. To build the AD module:

cd ${AD_SOURCE_DIR}
./autogen.sh
cd ${AD_BUILD_DIR}
${AD_SOURCE_DIR}/configure --with-adios2=${ADIOS2_INSTALL_DIR} --with-network=ZMQ --with-perf-metric --prefix=${AD_INSTALL_DIR}
make install

Here –with-perf-metric is an optional flag that enables generation of performance information of Chimbuko. The ZMQ network is the only one presently supported; an MPI network is maintained for legacy purposes and is not recommended for use. Assuming Sonata is installed and the mochi-sonata spack module loaded, the configure script will automatically detect the installation and will build the provenance database.

Instrumentation with TAU

Environment Variables TAU

  • TAU_MAKEFILE=${PATH_TO_TAU_MAKEFILE} : Ensure the Makefile is one that supports ADIOS2; these contain “adios2” in the filename

  • TAU_ADIOS2_PERIODIC=1 : This tells Tau to use stepped IO

  • TAU_ADIOS2_PERIOD=${PERIOD} : This is the period in microseconds(?) between IO steps. A value of PERIOD=100000 is a common choice, but a larger value may be necessary if Chimbuko is not able to keep pace with the data flow.

  • TAU_ADIOS2_ENGINE=SST : The engine should be SST for an online analysis in which Chimbuko and the application are running simultaneously. Chimbuko also supports offline analysis whereby an application is run without Chimbuko and the engine is set to BPfile.

  • TAU_ADIOS2_FILENAME=${STUB} : Here ${STUB} becomes the first component of the ADIOS2 filename, eg for “${STUB}=tau-metrics” and an application “main”, the filename will be “tau-metrics-main”. This filename is needed to start Chimbuko; in online mode the file exists only temporarily and is used to instantiate the communication, whereas for offline mode the filename becomes the location where the trace dump is written.

Usage

Chimbuko Config

Options for visualization module:

  • viz_root : Path to the visualization module.

  • viz_worker_port : The port on which to run the redis server for the visualization backend.

  • viz_port : the port on which to run the webserver

General options for Chimbuko backend:

  • backend_root : The root install directory of the PerformanceAnalysis libraries. If set to “infer” it will be inferred from the path of the executables

Options for the provenance database:

  • provdb_nshards : Number of database shards

  • provdb_engine : The OFI libfabric provider used for the Mochi stack.

  • provdb_port : The port of the provenance database

  • provdb_ninstances : Number of server instances (default 1)

Options for the parameter server:

  • ad_win_size : Number of events around an anomaly to store; provDB entry size is proportional to this

  • ad_alg : AD algorithm to use. “sstd” or “hbos” or “copod”

  • ad_outlier_sstd_sigma : number of standard deviations that defines an outlier.

  • ad_outlier_hbos_threshold : The percentile of events outside of which are considered anomalies by the HBOS algorithm.

Options for TAU:

  • export TAU_ADIOS2_ENGINE=${value} : Online communication engine (recommended SST, but alternative BP4 although this goes through the disk system and may be slower unless the BPfiles are stored on a burst disk)

  • export TAU_ADIOS2_ONE_FILE=FALSE : a different connection file for each rank

  • export TAU_ADIOS2_PERIODIC=1 : enable/disable ADIOS2 periodic output

  • export TAU_ADIOS2_PERIOD=1000000 : period in us between ADIOS2 io steps

  • export TAU_THREAD_PER_GPU_STREAM=1 : force GPU streams to appear as different TAU virtual threads

  • export TAU_THROTTLE=1 : enable/disable throttling of short-running functions

  • TAU_ADIOS2_PATH : path where the adios2 files are to be stored. Chimbuko services creates the directory chimbuko/adios2 in the working directory and this should be used by default

  • TAU_ADIOS2_FILE_PREFIX : the prefix of tau adios2 files; full filename is ${TAU_ADIOS2_PREFIX}-${EXE_NAME}-${RANK}.bp

Launch Services

Description of running the Chimbuko head node Services: First, This script sources the chimbuko config script with variables defined in previous section.

Next, it instantiates provenance database using the following command:

provdb_admin "${provdb_addr}" -engine ${provdb_engine} -nshards ${provdb_nshards} -nthreads ${provdb_nthreads} -db_write_dir ${provdb_writedir}

where ${provdb_addr} is address of provenance database and other variables are defined here.

Next, the following commands instantiates visualization module:

export SHARDED_NUM=${provdb_nshards}
export PROVDB_ADDR=${prov_add}

export SERVER_CONFIG="production"
export DATABASE_URL="sqlite:///${viz_dir}/main.sqlite"
export ANOMALY_STATS_URL="sqlite:///${viz_dir}/anomaly_stats.sqlite"
export ANOMALY_DATA_URL="sqlite:///${viz_dir}/anomaly_data.sqlite"
export FUNC_STATS_URL="sqlite:///${viz_dir}/func_stats.sqlite"
export PROVENANCE_DB=${provdb_writedir}
export CELERY_BROKER_URL="redis://${HOST}:${viz_worker_port}"

#Setup redis
cp -r $viz_root/redis-stable/redis.conf .
sed -i "s|^dir ./|dir ${viz_dir}/|" redis.conf
sed -i "s|^bind 127.0.0.1|bind 0.0.0.0|" redis.conf
sed -i "s|^daemonize no|daemonize yes|" redis.conf
sed -i "s|^pidfile /var/run/redis_6379.pid|pidfile ${viz_dir}/redis.pid|" redis.conf
sed -i "s|^logfile "\"\""|logfile ${log_dir}/redis.log|" redis.conf
sed -i "s|.*syslog-enabled no|syslog-enabled yes|" redis.conf

echo "==========================================="
echo "Chimbuko Services: Launch Chimbuko visualization server"
echo "==========================================="
cd ${viz_root}

echo "Chimbuko Services: create db ..."
python3 manager.py createdb

echo "Chimbuko Services: run redis ..."
redis-server ${viz_dir}/redis.conf
sleep 5

echo "Chimbuko Services: run celery ..."
CELERY_ARGS="--loglevel=info --concurrency=1"
python3 manager.py celery ${CELERY_ARGS} 2>&1 | tee "${log_dir}/celery.log" &
sleep 10

echo "Chimbuko Services: run webserver ..."
python3 run_server.py $HOST $viz_port 2>&1 | tee "${log_dir}/webserver.log" &
sleep 2

echo "Chimbuko Services: redis ping-pong ..."
redis-cli -h $HOST -p ${viz_worker_port} ping

cd ${base}

ws_addr="http://${HOST}:${viz_port}/api/anomalydata"
ps_extra_args+=" -ws_addr ${ws_addr}"

echo $HOST > ${var_dir}/chimbuko_webserver.host
echo $viz_port > ${var_dir}/chimbuko_webserver.port

After visualization module (its variables are described here) is successfully instantiated, the parameter server is launched as part of Chimbuko services

pserver -ad ${pserver_alg} -nt ${pserver_nt} -logdir ${log_dir} -port ${pserver_port} ${ps_extra_args}

The parameter server command line variables used as input for pserver command are described here.

Additional ProvDB Variables

  • -nthreads : Number of threads used by provenance database

  • -nshards : Number of shards used by provenance database

  • -db_write_dir : This is used to specify a path to provenance database to write on disk.

  • -engine : This is the OFI libfabric provider used for the Mochi stack. Its value can be set to “ofi+tcp;ofi_rxm”.

Visualization Variables

  • ${provdb_writedir} : A directory which stores provenance database

  • ${provdb_nshards} : Number of shards used between provenance database and visualization module.

  • ${VIZ_PORT} : The port to assign to the visualization module

  • ${VIZ_DATA_DIR}: A directory for storing logs and temporary data (assumed to exist)

  • ${VIZ_INSTALL_DIR}: The directory where the visualization module is installed

Parameter Server Variables

  • -port ${pserver_port} : the port used by parameter server

  • -nt ${pserver_nt} : The number of threads used to handle incoming communications from the AD modules

  • -logdir ${log_dir} : A directory for logging output

  • -ad ${pserver_alg} : Set AD algorithm to use for online analysis: “sstd” or “hbos”. Default value is “hbos”.

  • ${ps_extra_args} : Extra arguments used by parameter server.

Note that all the above are optional arguments, although if the VIZ_ADDRESS is not provided, no information will be sent to the webserver.

Additional pserver Variables

  • -ws_addr : Address of the visualization module.

  • -provdb_addr : The address of the provenance database (see above). This option enables the storing of the final globally-aggregated function profile information into the provenance database.

  • -prov_outputpath : This is the path to the provenance database on disk.

AD Variables

  • ${ADIOS2_ENGINE} : The ADIOS2 communications engine. For online analysis this should be SST by default (an alternative, BP4 is discussed below)

  • ${ADIOS2_PATH} : The directory in which the ADIOS2 file is written (see below)

  • ${ADIOS2_FILE_PREFIX} : The ADIOS2 file prefix.

  • ${EXE_NAME} : Name of the executable of application (see examples).

  • ${ad_opts} : This is a collection of all other arguments required by AD module for its instantiation.

Additional AD Variables

  • -prov_outputpath : The directory in which the provenance data will be output. This can be used in place of or in conjunction with the provenance database. An empty string (default) disables this output.

  • -outlier_sigma : The number of standard deviations from the mean function execution time outside which the execution is considered anomalous (default 6)

  • -anom_win_size : The number of events around an anomalous function execution that are captured as contextual information and placed in the provenance database and displayed in the visualization (default 10)

  • -program_idx : For workflows with multiple component programs, a “program index” must be supplied to the AD instances attached to those processes.

  • -rank : By default the data rank assigned to an AD instance is taken from its MPI rank in MPI_COMM_WORLD. This rank is used to verify the incoming trace data. This option allows the user to manually set the rank index.

  • -override_rank : This option disables the data rank verification and instead overwrites the data rank of the incoming trace data with the data rank stored in the AD instance. The value supplied must be the original data rank (this is used to generate the correct trace filename).

  • -ad_algorithm : This sets the AD algorithm to use for online analysis: “sstd” or “hbos” or “copod”. Default value is “hbos”.

  • -hbos_threshold : This sets the threshold to control density of detected anomalies used by HBOS algorithm. Its value ranges between 0 and 1. Default value is 0.99

Offline Analysis

For an offline analysis the user runs the application on its own, with Tau’s ADIOS2 plugin configured to use the BPFile engine (TAU_ADIOS2_ENGINE=BPFile environment option; see previous section). Once complete, Tau will generate a file with a .bp extension and a filename chosen according to the user-specified TAU_ADIOS2_FILENAME environment option. The user can then copy this file to a location accessible to the Chimbuko application, for example on a local machine.

The first step is to run the application:

mpirun -n ${RANKS} ${APPLICATION} ${APPLICATION_ARGS}

Once complete, the user should locate the .bp file and copy to a location accessible to Chimbuko.

  • ${RANKS} : Number MPI ranks.

  • ${APPLICATION} : Path to the application executable.

  • ${APPLICATION_ARGS} : Input arguments required by the application.

On the analysis machine, the provenance database and parameter server should be instantiated as in the previous section. The AD modules must still be spawned under MPI with one AD instance per rank of the original job:

mpirun -n ${RANKS} driver BPFile ${ADIOS2_FILE_DIR} ${ADIOS2_FILE_PREFIX} ${OUTPUT_LOC} -pserver_addr ${PSERVER_ADDR} -provdb_addr ${PROVDB_ADDR} ...

Note that the first argument of driver, which specifies the ADIOS2 engine, has been set to BPFile, and the process is not run in the background.