API¶
AD¶
The “Anomaly Detection” (AD) component of Chimbuko is deployed alongside an instance of the target application (e.g. for each MPI task) and analyzes the raw trace output provided by TAU. Using globally aggregated statistics, it decides locally whether a particular function execution is anomalous and forwards the anomaly information to the higher-level components of the tool.
chimbuko¶
The main interface for the AD module.
-
namespace chimbuko¶
-
class Chimbuko¶
- #include <chimbuko.hpp>
The main interface for the AD module.
Public Functions
-
Chimbuko()¶
-
~Chimbuko()¶
-
inline Chimbuko(const ChimbukoParams &params)¶
Construct and initialize the AD with the parameters provided.
-
void initialize(const ChimbukoParams &params)¶
Initialize the AD with the parameters provided (must be performed prior to running)
-
void finalize()¶
Free memory associated with AD components (called automatically by destructor)
-
inline bool use_ps() const¶
Whether the parameter server is in use.
-
inline void show_status(bool verbose = false) const¶
Request that the event manager print its status.
-
inline bool get_status() const¶
Whether the AD is connected through ADIOS2 to the trace input.
-
inline int get_step() const¶
Get the current IO step.
-
void run(unsigned long long &n_func_events, unsigned long long &n_comm_events, unsigned long long &n_counter_events, unsigned long &n_outliers, unsigned long &frames)¶
Run the main Chimbuko analysis loop.
- Parameters
n_func_events – [out] number of function events recorded
n_comm_events – [out] number of comm events recorded
n_counter_events – [out] number of counter events recorded
n_outliers – [out] number of anomalous events recorded
frames – [out] number of adios2 input steps
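A minimal sketch of the calling convention for `run`, which reports all of its results through out-parameters. `DemoParams`, `DemoDriver`, `RunTotals` and `run_once` are hypothetical stand-ins so that the snippet compiles without the Chimbuko headers; only the `initialize`/`run` signatures mirror the ones documented above. Whether the real `run` overwrites or accumulates into its arguments is not stated here, so the sketch zero-initializes the counters before the call.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for ChimbukoParams (only a few fields shown).
struct DemoParams {
  std::string trace_engineType = "BPfile";
  std::string trace_inputFile;
  int rank = 0;
};

// Hypothetical stand-in for the Chimbuko driver; it performs no analysis.
class DemoDriver {
public:
  void initialize(const DemoParams &params) {
    m_params = params;
    m_is_initialized = true;
  }

  // Same out-parameter layout as the documented run(); this stand-in reports zero work.
  void run(unsigned long long &n_func_events, unsigned long long &n_comm_events,
           unsigned long long &n_counter_events, unsigned long &n_outliers,
           unsigned long &frames) {
    assert(m_is_initialized && "initialize() must be called before run()");
    n_func_events = n_comm_events = n_counter_events = 0;
    n_outliers = frames = 0;
  }

private:
  DemoParams m_params;
  bool m_is_initialized = false;
};

// Bundles the five out-parameters so callers get them back as one value.
struct RunTotals {
  unsigned long long func = 0, comm = 0, counter = 0;
  unsigned long outliers = 0, frames = 0;
};

RunTotals run_once(DemoDriver &driver) {
  RunTotals totals;  // zero-initialized before the call
  driver.run(totals.func, totals.comm, totals.counter, totals.outliers, totals.frames);
  return totals;
}
```

Collecting the outputs in one struct keeps the five argument types (three `unsigned long long`, two `unsigned long`) from being mixed up at the call site.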
Private Functions
-
void init_io()¶
-
void init_parser()¶
-
void init_event()¶
-
void init_net_client()¶
-
void init_outlier()¶
-
void init_counter()¶
-
void init_normalevent_prov()¶
-
void init_metadata_parser()¶
-
bool parseInputStep(int &step, unsigned long long &n_func_events, unsigned long long &n_comm_events, unsigned long long &n_counter_event)¶
Signal the parser to parse the adios2 timestep.
- Parameters
step – [out] IO step index
n_func_events – [out] number of func events parsed
n_comm_events – [out] number of comm events parsed
n_counter_event – [out] number of counter events parsed
- Returns
false if unsuccessful, true otherwise
-
void extractEvents(unsigned long &first_event_ts, unsigned long &last_event_ts, int step)¶
Extract parsed events and insert into the event manager.
- Parameters
first_event_ts – [out] Earliest timestamp in io frame
last_event_ts – [out] Latest timestamp in io frame
step – The adios2 stream step index
-
void extractCounters(int rank, int step)¶
Extract parsed counters and insert into counter manager.
- Parameters
rank – The MPI rank of the process
step – The adios2 stream step index
-
void extractAndSendProvenance(const Anomalies &anomalies, const int step, const unsigned long first_event_ts, const unsigned long last_event_ts) const¶
Extract provenance information about anomalies and communicate to provenance DB.
-
void gatherAndSendPSdata(const Anomalies &anomalies, const int step, const unsigned long first_event_ts, const unsigned long last_event_ts) const¶
Gather and send the required data to the pserver.
-
void sendNewMetadataToProvDB(int step) const¶
Send new metadata entries collected during the current frame to the provenance DB.
Private Members
-
ADThreadNetClient *m_net_client¶
client for comms with parameter server
-
ADMetadataParser *m_metadata_parser¶
parser for metadata
-
ADNormalEventProvenance *m_normalevent_prov¶
maintain provenance info of normal events
-
mutable PerfPeriodic m_perf_prd¶
Performance temporal logging
-
ChimbukoParams m_params¶
Parameters to setup the AD
-
bool m_is_initialized¶
Whether the AD has been initialized
-
struct ChimbukoParams¶
- #include <chimbuko.hpp>
Parameters for setting up the AD.
Public Members
-
std::string trace_engineType¶
The ADIOS2 communications mode. If “SST” it will receive trace data in real-time, if “BPfile” it will parse an existing trace dump
-
std::string trace_inputFile¶
The input file. Assuming the environment variable TAU_FILENAME is set, the binary name is BINARY_NAME and the MPI rank is WORLD_RANK, the file name has the form inputFile = “${TAU_FILENAME}-${BINARY_NAME}-${WORLD_RANK}.bp”. Do not include the .sst file extension for SST mode.
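The naming pattern for `trace_inputFile` can be sketched as a small helper. `make_trace_input_file` is a hypothetical name introduced for illustration and is not part of Chimbuko; the three arguments correspond to TAU_FILENAME, BINARY_NAME and WORLD_RANK.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: assembles "<tau_filename>-<binary_name>-<world_rank>.bp".
std::string make_trace_input_file(const std::string &tau_filename,
                                  const std::string &binary_name,
                                  int world_rank) {
  assert(world_rank >= 0);  // MPI ranks are non-negative
  return tau_filename + "-" + binary_name + "-" + std::to_string(world_rank) + ".bp";
}
```

For example, `make_trace_input_file("tau-metrics", "app", 3)` yields `tau-metrics-app-3.bp`.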
-
int trace_connect_timeout¶
Timeout (in seconds) of ADIOS2 SST connection to trace data
-
double outlier_sigma¶
The number of sigma (standard deviations) away from the mean runtime for an event to be considered anomalous
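As an illustration of how a sigma threshold is typically applied, the sketch below flags a runtime that lies more than `sigma` standard deviations from the mean. `is_sigma_outlier` is a hypothetical helper; the two-sided test and the strict comparison are assumptions, not a statement of Chimbuko's exact rule.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: true when `runtime` is more than `sigma` standard
// deviations away from `mean` (two-sided, strict comparison; both assumed).
bool is_sigma_outlier(double runtime, double mean, double stddev, double sigma) {
  assert(stddev >= 0.0 && sigma >= 0.0);
  return std::fabs(runtime - mean) > sigma * stddev;
}
```

With a mean of 100, a standard deviation of 10 and `sigma = 6`, a runtime of 200 is flagged while a runtime of 130 is not.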
-
double hbos_threshold¶
Threshold used by HBOS algorithm to filter outliers. Set in config file
-
bool hbos_use_global_threshold¶
Global threshold flag in HBOS
-
std::string pserver_addr¶
The address of the parameter server. If no parameter server is in use, this string should be empty (length zero). If using ZmqNet (default), this is a TCP address of the form “tcp://${ADDRESS}:${PORT}”.
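A short sketch of the two conventions described for `pserver_addr`: the `tcp://` address form and the empty string meaning "no parameter server". Both helper names are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds "tcp://<address>:<port>".
std::string make_pserver_addr(const std::string &address, int port) {
  assert(port > 0 && port <= 65535);  // valid TCP port range
  return "tcp://" + address + ":" + std::to_string(port);
}

// An empty address string signals that no parameter server is in use.
bool pserver_in_use(const std::string &pserver_addr) {
  return !pserver_addr.empty();
}
```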
-
int hpserver_nthr¶
If using the hierarchical pserver, this parameter is used to compute a port offset for the particular endpoint that this AD rank connects to
-
std::string prov_outputpath¶
Directory where provenance data is written (in conjunction with provDB if active). Blank string indicates no output
-
unsigned int anom_win_size¶
When anomaly data are recorded, a window of this size (in units of events) around the anomalous event is also recorded (used both for viz and provDB)
-
std::string perf_outputpath¶
Output path for AD performance monitoring data. If an empty string no output is written.
-
int perf_step¶
How frequently (in IO steps) the performance data is dumped
-
int program_idx¶
Program index (for workflows with >1 component)
-
int rank¶
The rank index of the trace data
-
bool verbose¶
Enable verbose output. Typically one enables this only on a single node (e.g. verbose = (rank==0); )
-
bool only_one_frame¶
Force the AD to stop after a single IO frame
-
int interval_msec¶
Force the AD to pause for this number of ms at the end of each IO step
-
int parser_beginstep_timeout¶
Set the timeout in seconds on waiting for the next ADIOS2 timestep (default 30s)
-
bool override_rank¶
Set Chimbuko to overwrite the rank index in the parsed data with its own rank parameter. This disables verification of the data rank.
-
int step_report_freq¶
Steps between Chimbuko reporting IO step progress. Use 0 to deactivate this logging entirely (default 1)
-
int net_recv_timeout¶
Timeout (in ms) used for blocking receives functionality on client (driver) of parameter server
ADAnomalyProvenance¶
-
namespace chimbuko
-
class ADAnomalyProvenance¶
- #include <ADAnomalyProvenance.hpp>
A class that gathers provenance data associated with a detected anomaly.
Public Functions
-
ADAnomalyProvenance(const ExecData_t &call, const ADEventIDmap &event_man, const ParamInterface &algo_params, const ADCounter &counters, const ADMetadataParser &metadata, const int window_size, const int io_step, const unsigned long io_step_tstart, const unsigned long io_step_tend)¶
Extract the provenance information and store in internal JSON fields.
- Parameters
call – The anomalous execution
event_man – The event manager
algo_params – The algorithm parameters that resulted in the event classification
counters – The counter manager
metadata – The metadata manager
window_size – The number of events (on this process/rank/thread) either side of the anomalous event that are captured in the window field
io_step – Index of io step
io_step_tstart – Timestamp of beginning of io frame
io_step_tend – Timestamp of end of io frame
-
nlohmann::json get_json() const¶
Serialize anomaly data into JSON construct.
Public Static Functions
-
static void getProvenanceEntries(std::vector<nlohmann::json> &anom_event_entries, std::vector<nlohmann::json> &normal_event_entries, ADNormalEventProvenance &normal_event_manager, PerfStats &perf, const Anomalies &anomalies, const int step, const unsigned long first_event_ts, const unsigned long last_event_ts, const unsigned int anom_win_size, const ParamInterface &algo_params, const ADEventIDmap &event_man, const ADCounter &counters, const ADMetadataParser &metadata)¶
Extract the json provDB entries for the anomalies and normal events from an Anomalies collection.
- Parameters
anom_event_entries – The provDB entries for anomalous events
normal_event_entries – The provDB entries for normal events
normal_event_manager – An instance of ADNormalEventProvenance that maintains normal events between calls
perf – Performance timing
anomalies – The Anomalies object containing anomalies and select normal events for this io step
step – The io step
first_event_ts – The timestamp of the first event in the io step
last_event_ts – The timestamp of the last event in the io step
anom_win_size – The window size of events to capture around each anomaly
algo_params – The outlier algorithm parameters
event_man – The event manager object
counters – The counter manager object
metadata – The metadata manager object
-
static inline void getProvenanceEntries(std::vector<nlohmann::json> &anom_event_entries, std::vector<nlohmann::json> &normal_event_entries, ADNormalEventProvenance &normal_event_manager, const Anomalies &anomalies, const int step, const unsigned long first_event_ts, const unsigned long last_event_ts, const unsigned int anom_win_size, const ParamInterface &algo_params, const ADEventIDmap &event_man, const ADCounter &counters, const ADMetadataParser &metadata)¶
Extract the json provDB entries for the anomalies and normal events from an Anomalies collection.
- Parameters
anom_event_entries – The provDB entries for anomalous events
normal_event_entries – The provDB entries for normal events
normal_event_manager – An instance of ADNormalEventProvenance that maintains normal events between calls
anomalies – The Anomalies object containing anomalies and select normal events for this io step
step – The io step
first_event_ts – The timestamp of the first event in the io step
last_event_ts – The timestamp of the last event in the io step
anom_win_size – The window size of events to capture around each anomaly
algo_params – The outlier algorithm parameters
event_man – The event manager object
counters – The counter manager object
metadata – The metadata manager object
Private Functions
-
void getStackInformation(const ExecData_t &call, const ADEventIDmap &event_man)¶
Get the call stack.
-
void getWindowCounters(const ExecData_t &call)¶
Get counters in execution window.
-
void getGPUeventInfo(const ExecData_t &call, const ADEventIDmap &event_man, const ADMetadataParser &metadata)¶
Determine if it is a GPU event, and if so get the context.
-
void getExecutionWindow(const ExecData_t &call, const ADEventIDmap &event_man, const int window_size)¶
Get the execution window.
- Parameters
window_size – The number of events either side of the call that are captured
Private Members
-
ExecData_t m_call¶
The anomalous event
-
std::vector<nlohmann::json> m_callstack¶
Call stack from function back to root. Each entry is the function index and name
-
nlohmann::json m_algo_params¶
JSON object containing the algorithm parameters used to classify the anomaly
-
std::vector<nlohmann::json> m_counters¶
A list of counter events that occurred during the execution of the anomalous function
-
bool m_is_gpu_event¶
Is this an anomaly that occurred on a GPU?
-
nlohmann::json m_gpu_location¶
If it was a GPU event, which device/context/stream did it occur on
-
nlohmann::json m_gpu_event_parent_info¶
If a GPU event, info related to the CPU event that spawned it (name, thread, callstack)
-
nlohmann::json m_exec_window¶
Window of function executions and MPI communications around the anomalous execution
-
int m_io_step¶
IO step
-
unsigned long m_io_step_tstart¶
Timestamp of start of io step
-
unsigned long m_io_step_tend¶
Timestamp of end of io step
ADCounter¶
-
namespace chimbuko
Typedefs
-
typedef std::list<CounterData_t> CounterDataList_t¶
-
typedef CounterDataList_t::iterator CounterDataListIterator_t¶
-
typedef std::map<unsigned long, std::list<CounterDataListIterator_t>> CounterTimeStamps_t¶
-
typedef std::map<unsigned long, std::list<CounterDataListIterator_t>> CountersByIndex_t¶
-
typedef mapPRT<CounterDataList_t> CounterDataListMap_p_t¶
map of process, rank, thread -> CounterDataList_t
-
typedef mapPRT<CounterTimeStamps_t> CounterTimeStampMap_p_t¶
map of process, rank, thread -> CounterTimeStamps_t
-
class ADCounter¶
- #include <ADCounter.hpp>
A class that stores counter events.
Public Functions
-
inline ADCounter()¶
-
inline ~ADCounter()¶
-
inline void linkCounterMap(const std::unordered_map<int, std::string> *m)¶
pass in the pointer to the mapping of counter index to counter description
- Parameters
m – hash map to counter descriptions
-
void addCounter(const Event_t &event)¶
Insert a new counter.
- Parameters
event – Event_t wrapper around the counter data
-
void addCounter(const CounterData_t &cdata)¶
Insert a new counter in CounterData_t form.
This function does not require the counter index->name map to be linked, but if it is a consistency check will be performed
- Parameters
cdata – CounterData_t instance
-
inline CounterDataListMap_p_t const *getCounters() const¶
Return all counters collected in the timestep.
-
CounterDataListMap_p_t *flushCounters()¶
Return all counters and clear internal state.
- Returns
A pointer to a list of counters (should be deleted externally)
-
std::list<CounterDataListIterator_t> getCountersInWindow(const unsigned long pid, const unsigned long rid, const unsigned long tid, const unsigned long t_start, const unsigned long t_end) const¶
Get counters for a particular process/rank/thread that were recorded in the window (t_start, t_end) [inclusive].
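A window query of this kind maps naturally onto an ordered timestamp index like `CounterTimeStamps_t`. The sketch below uses hypothetical stand-in types (`DemoCounter`, `DemoList`, `DemoByTime`) for a single process/rank/thread and treats both window bounds as inclusive; it is not the library's implementation.

```cpp
#include <cassert>
#include <list>
#include <map>

// Hypothetical stand-in for CounterData_t.
struct DemoCounter {
  unsigned long ts;  // timestamp
  int idx;           // counter index
  long value;        // counter value
};

using DemoList = std::list<DemoCounter>;
using DemoIter = DemoList::const_iterator;
// Ordered map: timestamp -> iterators to all counters recorded at that time.
using DemoByTime = std::map<unsigned long, std::list<DemoIter>>;

// Build the timestamp index for one thread's counter list.
DemoByTime index_by_time(const DemoList &data) {
  DemoByTime by_time;
  for (DemoIter it = data.begin(); it != data.end(); ++it) by_time[it->ts].push_back(it);
  return by_time;
}

// Collect every counter with t_start <= ts <= t_end.
std::list<DemoIter> counters_in_window(const DemoByTime &by_time,
                                       unsigned long t_start, unsigned long t_end) {
  assert(t_start <= t_end);
  std::list<DemoIter> out;
  auto first = by_time.lower_bound(t_start);  // first key >= t_start
  auto last = by_time.upper_bound(t_end);     // first key > t_end
  for (auto it = first; it != last; ++it)
    out.insert(out.end(), it->second.begin(), it->second.end());
  return out;
}
```

Storing list iterators rather than copies is safe here because `std::list` iterators stay valid as other elements are inserted or erased; the index must still be rebuilt when the underlying list is flushed.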
-
inline const CountersByIndex_t &getCountersByIndex() const¶
Get the map of counters by index.
Private Members
-
CounterDataListMap_p_t *m_counters¶
process/rank/thread -> List of counters
-
CounterTimeStampMap_p_t m_timestampCounterMap¶
process/rank/thread -> Ordered map of timestamp to counter list iterator (flushed with flushCounters)
-
CountersByIndex_t m_countersByIdx¶
Counter index -> all instances of this counter in the timestep (flushed with flushCounters)
ADDefine¶
Details.
Defines
-
IDX_P¶
index of program id
-
IDX_R¶
index of rank id
-
IDX_T¶
index of thread id
-
IDX_E¶
index of event (entry/exit/send/recv) id
-
FUNC_EVENT_DIM¶
dimension of a function (timer) event vector
-
FUNC_IDX_F¶
index of function (timer) id
-
FUNC_IDX_TS¶
index of timestamp in function (timer) event
-
COMM_EVENT_DIM¶
dimension of a communication event vector
-
COMM_IDX_TAG¶
index of communication tag
-
COMM_IDX_PARTNER¶
index of communication partner
-
COMM_IDX_BYTES¶
index of communication size (in bytes)
-
COMM_IDX_TS¶
index of communication timestamp
-
COUNTER_EVENT_DIM¶
dimension of a counter event vector
-
COUNTER_IDX_ID¶
index of counter idx
-
COUNTER_IDX_VALUE¶
index of counter value
-
COUNTER_IDX_TS¶
index of counter timestamp
-
MAX_RUNTIME¶
maximum execution time of a function (or a timer)
-
IO_VERSION¶
IO version number (deprecated)
-
namespace chimbuko
Enums
-
enum ParserError¶
Error kinds of the ADParser class
Values:
-
enumerator OK¶
OK (no error)
-
enumerator NoFuncData¶
Failed to fetch function data
-
enumerator NoCommData¶
Failed to fetch communication data
-
enumerator NoCountData¶
Failed to fetch counter data
-
enum EventError¶
Error kinds of the ADEvent class.
Values:
-
enumerator OK¶
OK (no error)
-
enumerator UnknownEvent¶
unknown event error
-
enumerator UnknownFunc¶
unknown function (timer) error
-
enumerator CallStackViolation¶
call stack violation error
-
enumerator EmptyCallStack¶
empty call stack error (i.e. exit before entry )
-
enum IOError¶
Error kinds of the ADio class.
Values:
-
enumerator OK¶
OK (no error)
-
enumerator OutIndexRange¶
Out of index range error
-
enum IOMode¶
I/O mode of the ADio class.
Values:
-
enumerator Off¶
no I/O
-
enumerator Offline¶
offline mode, dump to files
-
enumerator Online¶
online mode, stream data
-
enumerator Both¶
both, dump to files and stream it
-
enum IOOpenMode¶
I/O open mode of the ADio class.
Values:
-
enumerator Read¶
Read
-
enumerator Write¶
Write
ADEvent¶
-
namespace chimbuko
Typedefs
-
typedef std::stack<CommData_t> CommStack_t¶
a stack of CommData_t
-
typedef mapPRT<CommStack_t> CommStackMap_p_t¶
map of process, rank, thread -> CommStack_t
-
typedef std::stack<CounterData_t> CounterStack_t¶
a stack of CounterData_t
-
typedef mapPRT<CounterStack_t> CounterStackMap_p_t¶
map of process, rank, thread -> CounterStack_t
-
typedef std::list<ExecData_t> CallList_t¶
list of function calls (ExecData_t) in entry time order
-
typedef CallList_t::iterator CallListIterator_t¶
iterator of CallList_t
-
typedef mapPRT<CallList_t> CallListMap_p_t¶
map of process, rank, thread -> CallList_t
-
typedef std::stack<CallListIterator_t> CallStack_t¶
function call stack
-
typedef mapPRT<CallStack_t> CallStackMap_p_t¶
map of process, rank, thread -> CallStack_t
-
typedef std::unordered_map<unsigned long, std::vector<CallListIterator_t>> ExecDataMap_t¶
hash map of a collection of ExecData_t per function
key is function id and value is a vector of CallListIterator_t (i.e. ExecData_t)
-
class ADEvent : public chimbuko::ADEventIDmap¶
- #include <ADEvent.hpp>
Event manager whose role is to correlate function entry and exit events and associate other counters with the function call.
When a function call with ENTRY signature is inserted, the event is placed on the call stack for that thread. Events associated with MPI comms and counters are also placed on their respective stacks. When a function call with EXIT signature on the same thread is inserted, a complete call is generated and placed in the call list, and all comm and counter events on their stacks are associated with that call.
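The entry/exit correlation described above can be sketched for a single thread as follows. All types here (`DemoKind`, `DemoError`, `DemoEvent`, `DemoCall`, `DemoThreadCalls`) are hypothetical simplifications: the error names echo `EventError`, but comm and counter association, the per-process/rank/thread maps and the entry-ordered call list of the real class are omitted.

```cpp
#include <cassert>
#include <stack>
#include <vector>

enum class DemoKind { Entry, Exit };
enum class DemoError { OK, CallStackViolation, EmptyCallStack };

struct DemoEvent {
  DemoKind kind;
  unsigned long func_id;
  unsigned long ts;
};

struct DemoCall {
  unsigned long func_id;
  unsigned long entry;  // entry timestamp
  unsigned long exit;   // exit timestamp
};

// Call-stack matcher for one thread (simplified: completed calls are stored in exit order).
class DemoThreadCalls {
public:
  DemoError add(const DemoEvent &ev) {
    if (ev.kind == DemoKind::Entry) {
      m_stack.push(ev);  // wait for the matching exit
      return DemoError::OK;
    }
    if (m_stack.empty()) return DemoError::EmptyCallStack;  // exit before entry
    if (m_stack.top().func_id != ev.func_id) return DemoError::CallStackViolation;
    assert(ev.ts >= m_stack.top().ts);  // exit cannot precede entry
    m_completed.push_back({ev.func_id, m_stack.top().ts, ev.ts});
    m_stack.pop();
    return DemoError::OK;
  }

  const std::vector<DemoCall> &completed() const { return m_completed; }

private:
  std::stack<DemoEvent> m_stack;
  std::vector<DemoCall> m_completed;
};
```

An exit with no open entry, or an exit whose function differs from the top of the stack, is reported as an error and leaves the stack untouched.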
Public Functions
-
ADEvent(bool verbose = false)¶
Construct a new ADEvent object.
- Parameters
verbose – true to print out detailed information (useful for debug)
-
inline void linkEventType(const std::unordered_map<int, std::string> *m)¶
copy a pointer to an externally defined event type object
- Parameters
m – event type object (hash map)
-
inline void linkFuncMap(const std::unordered_map<int, std::string> *m)¶
copy a pointer to an externally defined function map object
- Parameters
m – function map object
-
inline void linkCounterMap(const std::unordered_map<int, std::string> *m)¶
copy a pointer to an externally defined counter map object
- Parameters
m – counter map object
-
inline void linkGPUthreadMap(const std::unordered_map<unsigned long, GPUvirtualThreadInfo> *m)¶
Optional: give the event manager knowledge of which threads are GPU threads, improves error checking.
-
inline const std::unordered_map<int, std::string> *getFuncMap() const¶
Get the Func Map object.
- Returns
const std::unordered_map<int, std::string>* pointer to function map object
-
inline const std::unordered_map<int, std::string> *getEventType() const¶
Get the Event Type object.
- Returns
const std::unordered_map<int, std::string>* pointer to event type object
-
inline const std::unordered_map<int, std::string> *getCounterMap() const¶
Get the Counter name object.
- Returns
const std::unordered_map<int, std::string>* pointer to counter name object
-
inline const ExecDataMap_t *getExecDataMap() const¶
Get the Exec Data Map object ( map of function id -> vector of iterators to ExecData objects )
- Returns
const ExecDataMap_t* pointer to ExecDataMap_t object
-
inline const CallListMap_p_t *getCallListMap() const¶
Get the Call List Map object ( map of pid/rid/tid -> list of ExecData objects )
- Returns
const CallListMap_p_t* pointer to CallListMap_p_t object
-
inline CallListMap_p_t &getCallListMap()¶
Get the Call List Map object.
- Returns
CallListMap_p_t& reference to CallListMap_p_t object
-
virtual CallListIterator_t getCallData(const eventID &event_id) const override¶
Get an iterator to an ExecData_t instance with given event index string.
throws a runtime error if the call is not present in the call-list
-
virtual std::pair<CallListIterator_t, CallListIterator_t> getCallWindowStartEnd(const eventID &event_id, const int win_size) const override¶
Get a pair of iterators marking the start and one-past-the-end of a window of size (up to) win_size events on either side of the given event, occurring on the same thread.
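The "up to win_size on either side" semantics can be sketched on a plain `std::list`, clamping at both ends of the list. `DemoCalls` and `window_around` are hypothetical names; the real method looks the event up by `eventID` and works on the per-thread call list.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <utility>

using DemoCalls = std::list<int>;  // stand-in for one thread's call list
using DemoCallIter = DemoCalls::const_iterator;

// Returns [start, end): up to win_size elements before `center`, `center`
// itself, and up to win_size elements after it.
std::pair<DemoCallIter, DemoCallIter> window_around(const DemoCalls &calls,
                                                    DemoCallIter center, int win_size) {
  assert(win_size >= 0 && center != calls.end());
  DemoCallIter start = center;
  for (int i = 0; i < win_size && start != calls.begin(); ++i) --start;
  DemoCallIter end = std::next(center);  // one past the central element
  for (int i = 0; i < win_size && end != calls.end(); ++i) ++end;
  return {start, end};
}
```

Because the second iterator is one-past-the-end, the pair can be used directly in a standard `for (auto it = w.first; it != w.second; ++it)` loop.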
-
void clear()¶
Clear all data members.
-
EventError addEvent(const Event_t &event)¶
add an event
- Parameters
event – function or communication event
- Returns
EventError event error code
-
EventError addFunc(const Event_t &event)¶
add a function event
- Parameters
event – function event
- Returns
EventError event error code
-
EventError addComm(const Event_t &event)¶
add a communication event
- Parameters
event – communication event
- Returns
EventError event error code
-
EventError addCounter(const Event_t &event)¶
add a counter event
- Parameters
event – counter event
- Returns
EventError event error code
-
CallListIterator_t addCall(const ExecData_t &exec)¶
Add a complete function call, primarily for testing.
- Parameters
exec – Instance of ExecData_t
- Returns
Iterator to inserted call
-
CallListMap_p_t *trimCallList(int n_keep_thread = 0)¶
trim out all function calls that are completed (i.e. a pair of ENTRY and EXIT events are observed)
- Parameters
n_keep_thread – The number of events per thread to maintain [if they exist] (allows window view to extend into previous io step)
- Returns
CallListMap_p_t* trimmed function calls
-
size_t getCallListSize() const¶
Get the total number of function events in the call list over all pid/rid/tid.
-
void purgeCallList(int n_keep_thread = 0)¶
purge all function calls that are completed (i.e. a pair of ENTRY and EXIT events are observed)
Functionality is the same as trimCallList only it doesn’t return the trimmed function calls
- Parameters
n_keep_thread – The number of events per thread to maintain [if they exist] (allows window view to extend into previous io step)
-
void show_status(bool verbose = false) const¶
show current call stack tree status
- Parameters
verbose – true to see all details
-
inline const std::unordered_map<unsigned long, CallListIterator_t> &getUnmatchCorrelationIDevents() const¶
Get the map of correlation ID to event for those events that have yet to be partnered.
Private Functions
-
void checkAndMatchCorrelationID(CallListIterator_t it)¶
Check if the event has a correlation ID counter, if so try to match it to an outstanding unmatched event with a correlation ID.
-
void stackProtectGC(CallListIterator_t it)¶
Flag the call and all its parental line such that they are protected from deletion by the garbage collection.
-
void stackUnProtectGC(CallListIterator_t it)¶
Flag the call and all its parental line such that they are not protected from deletion by the garbage collection, stopping if a call with an unmatched correlation ID is encountered.
Private Members
-
const std::unordered_map<int, std::string> *m_funcMap¶
pointer to map of function index to function name
-
const std::unordered_map<int, std::string> *m_eventType¶
pointer to map of event index to event type string
-
const std::unordered_map<int, std::string> *m_counterMap¶
pointer to map of counter index to counter name string
-
int m_eidx_func_entry¶
If previously seen, the eid corresponding to the function entry event (-1 otherwise)
-
int m_eidx_func_exit¶
If previously seen, the eid corresponding to the function exit event (-1 otherwise)
-
int m_eidx_comm_send¶
If previously seen, the eid corresponding to the comm send event (-1 otherwise)
-
int m_eidx_comm_recv¶
If previously seen, the eid corresponding to the comm recv event (-1 otherwise)
-
const std::unordered_map<unsigned long, GPUvirtualThreadInfo> *m_gpu_thread_Map¶
Optional: give the event manager knowledge of which threads are GPU threads, improves error checking.
-
CommStackMap_p_t m_commStack¶
communication event stack. Once a function call has exited, all comms events are associated with that call and the stack is cleared
-
CounterStackMap_p_t m_counterStack¶
map of process,rank,thread to counter events. Once a function call has exited, all counter events are associated with that call and the stack is cleaned.
-
CallStackMap_p_t m_callStack¶
map of process,rank,thread to the current function call stack. As functions exit they are popped from the stack
-
CallListMap_p_t m_callList¶
map of process,rank,thread to a list of ExecData_t objects which contain entry/exit timestamps for function calls
In practice the call list is purged of completed events each IO step through calls to trimCallList unless those elements are marked as non-deletable
-
ExecDataMap_t m_execDataMap¶
map of function index to an array of complete calls to this function during this IO step
In practice this map is cleared every IO step by calls to trimCallList
-
std::unordered_map<eventID, CallListIterator_t> m_callIDMap¶
map of call event index string to the event
Completed calls are removed from this list every IO step by calls to trimCallList
-
std::unordered_map<unsigned long, CallListIterator_t> m_unmatchedCorrelationID¶
Events with unmatched correlation IDs.
Events that correspond to GPU kernel launches and executions are given correlation IDs as counters that allow us to match the CPU thread that launched them to the GPU kernel event
-
bool m_verbose¶
verbose
-
class ADEventIDmap¶
- #include <ADEvent.hpp>
An abstract interface for obtaining events given an event index.
Subclassed by chimbuko::ADEvent
Public Functions
-
virtual CallListIterator_t getCallData(const eventID &event_id) const = 0¶
Get an iterator to an ExecData_t instance with given event index string.
throws a runtime error if the call is not present in the call-list
-
virtual std::pair<CallListIterator_t, CallListIterator_t> getCallWindowStartEnd(const eventID &event_id, const int win_size) const = 0¶
Get a pair of iterators marking the start and one-past-the-end of a window of size (up to) win_size events on either side of the given event, occurring on the same thread.
-
inline virtual ~ADEventIDmap()¶
-
struct EventInfo¶
- #include <ADEvent.hpp>
A type that stores some information about an event whose data may have been deleted.
Public Functions
-
inline EventInfo()¶
-
inline EventInfo(const ExecData_t &e, int entry_or_exit)¶
Create from an ExecData_t.
- Parameters
entry_or_exit – 0:entry 1:exit
ADglobalFunctionIndexMap¶
-
namespace chimbuko
-
class ADglobalFunctionIndexMap¶
- #include <ADglobalFunctionIndexMap.hpp>
A class that maintains a mapping of a local function index to a global function index that is specified by the parameter server.
If the parameter server is not connected it will simply return the local index
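A minimal sketch of this behaviour, assuming the parameter server can be modelled as a callback that assigns a global index to a function name. `DemoIndexMap` and its `Resolver` callback are hypothetical; the real class talks to the pserver through `ADThreadNetClient`, and returning the local index from the const lookup when unconnected is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

class DemoIndexMap {
public:
  // Hypothetical stand-in for the pserver: maps a function name to a global index.
  using Resolver = std::function<unsigned long(const std::string &)>;

  explicit DemoIndexMap(Resolver resolver = nullptr) : m_resolver(std::move(resolver)) {}

  bool connected() const { return static_cast<bool>(m_resolver); }

  // Look up (and cache) the global index; falls back to the local index when unconnected.
  unsigned long lookup(unsigned long local_idx, const std::string &func_name) {
    assert(!func_name.empty());
    if (!connected()) return local_idx;
    auto it = m_idxmap.find(local_idx);
    if (it != m_idxmap.end()) return it->second;  // already resolved
    unsigned long global_idx = m_resolver(func_name);
    m_idxmap.emplace(local_idx, global_idx);
    return global_idx;
  }

  // Const lookup: never contacts the resolver, throws if the index is unknown.
  unsigned long lookup(unsigned long local_idx) const {
    if (!connected()) return local_idx;
    auto it = m_idxmap.find(local_idx);
    if (it == m_idxmap.end()) throw std::runtime_error("local index not yet mapped");
    return it->second;
  }

private:
  Resolver m_resolver;
  std::unordered_map<unsigned long, unsigned long> m_idxmap;  // local -> global
};
```

Caching the result locally means each function name is resolved remotely at most once per map instance.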
Public Functions
-
inline ADglobalFunctionIndexMap(unsigned long pid, ADThreadNetClient *net_client = nullptr)¶
Class constructor.
If a pointer to the net client is not provided the local index will not be synchronized between nodes
- Parameters
pid – The program index
net_client – A pointer to the ADNetClient
-
inline bool connectedToPS() const¶
Check if the pserver is connected.
-
inline void linkNetClient(ADThreadNetClient *net_client)¶
Link the net client.
-
unsigned long lookup(const unsigned long local_idx, const std::string &func_name)¶
Lookup the global index corresponding to the input local index.
Function names must be unique
-
std::vector<unsigned long> lookup(const std::vector<unsigned long> &local_idx, const std::vector<std::string> &func_name)¶
Lookup the global indices corresponding to the input local indices as a batch.
Function names must be unique
-
unsigned long lookup(const unsigned long local_idx) const¶
Lookup the global index corresponding to the input local index (const version; throws if not already present)
-
inline ADThreadNetClient *getNetClient()¶
Return a pointer to the net client.
Private Members
-
ADThreadNetClient *m_net_client¶
-
std::unordered_map<unsigned long, unsigned long> m_idxmap¶
Map of local function index to global function index
-
unsigned long m_pid¶
Program index
ADio¶
-
namespace chimbuko
-
class ADio¶
- #include <ADio.hpp>
A class that manages communication of JSON-formatted data to disk.
Public Functions
-
ADio(unsigned long program_idx, int rank)¶
Constructor.
- Parameters
program_idx – The program index
rank – MPI rank
-
~ADio()¶
-
inline void setProgramIdx(unsigned long pid)¶
Set the program index.
-
inline unsigned long getProgramIdx() const¶
Get the program index.
-
inline void setRank(int rank)¶
Set the MPI rank of the current process.
-
inline int getRank() const¶
Get the MPI rank of the current process.
-
void setOutputPath(std::string path)¶
For disk output, provide the write path.
A zero length string will disable disk IO
-
void setDispatcher(std::string name = "ioDispatcher", size_t thread_cnt = 1)¶
If a DispatchQueue instance has not previously been created, create an instance with the parameters provided.
-
inline size_t getNumIOJobs() const¶
Get the number of threads performing the IO.
-
IOError writeJSON(const std::vector<nlohmann::json> &data, long long step, const std::string &file_stub)¶
Write an array of JSON objects.
- Parameters
file_stub – File will be ${file_stub}.${step}.json
-
inline void setDestructorThreadWaitTime(const int secs)¶
Set the amount of time between completion of thread dispatcher tasks and destruction of the dispatcher in the class destructor.
- Parameters
secs – The time in seconds
Private Members
-
DispatchQueue *m_dispatcher¶
Instance of multi-threaded writer
-
unsigned long m_program_idx¶
Program index
-
int m_rank¶
The MPI rank of the current process
-
int destructor_thread_waittime¶
Choose thread wait time in seconds after threadhandler has completed (default 10s)
ADLocalCounterStatistics¶
-
namespace chimbuko
-
class ADLocalCounterStatistics¶
- #include <ADLocalCounterStatistics.hpp>
A class that gathers local counter statistics and communicates them to the parameter server.
Public Functions
-
inline ADLocalCounterStatistics(const unsigned long program_idx, const int step, const std::unordered_set<std::string> *which_counters, PerfStats *perf = nullptr)¶
Constructor.
- Parameters
program_idx – The program index
step – The io step
which_counters – Pointer to a set of counters that will be collected (not all might appear in any given run). Use nullptr to collect all.
perf – Attach a PerfStats object into which performance metrics are accumulated
-
void gatherStatistics(const CountersByIndex_t &cntrs_by_idx)¶
Add counters to internal statistics.
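To illustrate the kind of accumulation involved, the sketch below keeps running statistics per counter name and honours an optional filter set, where a null pointer means "collect all" as described for `which_counters`. `DemoStats` and `DemoCounterStats` are hypothetical; `DemoStats` uses Welford's update and is only a stand-in for `RunStats`, and the real `gatherStatistics` consumes a `CountersByIndex_t` rather than name/value pairs.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical running-statistics accumulator (Welford's online update).
struct DemoStats {
  unsigned long n = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the running mean

  void push(double x) {
    ++n;
    double delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean);
  }

  // Sample variance; zero until at least two values have been seen.
  double variance() const { return n > 1 ? m2 / (n - 1) : 0.0; }
};

class DemoCounterStats {
public:
  // `which` == nullptr collects every counter; otherwise only the named ones.
  explicit DemoCounterStats(const std::unordered_set<std::string> *which = nullptr)
      : m_which(which) {}

  void gather(const std::string &name, double value) {
    assert(!name.empty());
    if (m_which && m_which->count(name) == 0) return;  // filtered out
    m_stats[name].push(value);
  }

  const std::unordered_map<std::string, DemoStats> &stats() const { return m_stats; }

private:
  const std::unordered_set<std::string> *m_which;
  std::unordered_map<std::string, DemoStats> m_stats;
};
```

An online update like this avoids storing individual counter values, so only the per-counter summary needs to be serialized and sent.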
-
std::pair<size_t, size_t> updateGlobalStatistics(ADThreadNetClient &net_client) const¶
update (send) counter statistics gathered during this io step to the connected parameter server
The message string is the output of net_serialize()
- Parameters
net_client – The network client object
- Returns
std::pair<size_t, size_t> [sent, recv] message size
-
inline void linkPerf(PerfStats *perf)¶
Attach a PerfStats object into which performance metrics are accumulated.
-
inline const std::unordered_map<std::string, RunStats> &getStats() const¶
Get the map of counter name to statistics.
-
nlohmann::json get_json_state() const¶
Get the JSON object that is sent to the parameter server.
-
State get_state() const¶
Get the State object that is sent to the parameter server.
The string form of this object is sent to the pserver using updateGlobalStatistics
-
void set_state(const State &s)¶
Set the internal variables to the given state object.
Note: it does not set the list of counters that are collected by gatherStatistics (m_which_counter)
-
void net_deserialize(const std::string &s)¶
Deserialize this class after communication over the network.
-
void setStats(const std::string &counter, const RunStats &to)¶
Set the statistics for a particular counter (must be in the list of counters being collected). Primarily used for testing.
-
inline unsigned long getProgramIdex() const¶
Get the program index.
-
inline int getIOstep() const¶
Get the IO step.
-
inline bool operator==(const ADLocalCounterStatistics &r) const¶
Comparison operator.
-
inline bool operator!=(const ADLocalCounterStatistics &r) const¶
Inequality operator.
Protected Attributes
-
unsigned long m_program_idx¶
Program idx
-
int m_step¶
io step
Protected Static Functions
-
static std::pair<size_t, size_t> updateGlobalStatistics(ADThreadNetClient &net_client, const std::string &l_stats, int step)¶
update (send) counter statistics gathered during this io step to the connected parameter server
- Parameters
net_client – The network client object
l_stats – local statistics
step – step (or frame) number
- Returns
std::pair<size_t, size_t> [sent, recv] message size
-
struct State¶
- #include <ADLocalCounterStatistics.hpp>
Data structure containing the data that is sent (in serialized form) to the parameter server.
ADLocalFuncStatistics¶
-
namespace chimbuko
-
class ADLocalFuncStatistics¶
- #include <ADLocalFuncStatistics.hpp>
A class that gathers local function statistics and communicates them to the parameter server.
Public Functions
-
inline ADLocalFuncStatistics()¶
-
ADLocalFuncStatistics(const unsigned long program_idx, const unsigned long rank, const int step, PerfStats *perf = nullptr)¶
-
void gatherStatistics(const ExecDataMap_t *exec_data)¶
Add function executions to internal statistics.
-
std::pair<size_t, size_t> updateGlobalStatistics(ADThreadNetClient &net_client) const¶
update (send) function statistics (#anomalies, incl/excl run times) gathered during this io step to the connected parameter server
The message communicated is the string dump of the output of get_json_state()
- Parameters
net_client – The network client object
- Returns
std::pair<size_t, size_t> [sent, recv] message size
-
nlohmann::json get_json_state() const¶
Get the current state as a JSON object.
The string dump of this object is the serialized form sent to the parameter server
-
State get_state() const¶
Get the current state as a state object.
The string dump of this object is the serialized form sent to the parameter server
-
inline void linkPerf(PerfStats *perf)¶
Attach a PerfStats object into which performance metrics are accumulated.
-
inline const AnomalyData &getAnomalyData() const¶
Access the AnomalyData instance.
-
inline const std::unordered_map<unsigned long, FuncStats> &getFuncStats() const¶
Access the function profile statistics.
-
void net_deserialize(const std::string &s)¶
Deserialize this class after communication over the network.
-
inline bool operator==(const ADLocalFuncStatistics &r) const¶
Comparison operator.
-
inline bool operator!=(const ADLocalFuncStatistics &r) const¶
Inequality operator.
Protected Attributes
-
AnomalyData m_anom_data¶
AnomalyData instance holding information about the anomalies
Protected Static Functions
-
static std::pair<size_t, size_t> updateGlobalStatistics(ADThreadNetClient &net_client, const std::string &l_stats, int step)¶
update (send) function statistics (#anomalies, incl/excl run times) gathered during this io step to the connected parameter server
- Parameters
net_client – The network client object
l_stats – local statistics
step – step (or frame) number
- Returns
std::pair<size_t, size_t> [sent, recv] message size
-
struct State¶
- #include <ADLocalFuncStatistics.hpp>
Data structure containing the data that is sent (in serialized form) to the parameter server.
Public Functions
-
template<class Archive>
inline void serialize(Archive &archive)¶
Serialize using cereal.
Statistics on overall anomalies
-
void deserialize_cerealpb(const std::string &strstate)¶
Deserialize from Cereal portable binary format
-
nlohmann::json get_json() const¶
Create a JSON object from this instance.
Public Members
-
AnomalyData::State anomaly¶
Function stats for each function
ADMetadataParser¶
-
namespace chimbuko
-
class ADMetadataParser¶
- #include <ADMetadataParser.hpp>
A class that parses and maintains useful metadata.
Public Functions
-
void addData(const std::vector<MetaData_t> &new_metadata)¶
Add new metadata collected during this timeframe.
-
inline const std::unordered_map<unsigned long, GPUvirtualThreadInfo> &getGPUthreadMap() const¶
-
inline bool isGPUthread(const unsigned long thr) const¶
-
const GPUvirtualThreadInfo &getGPUthreadInfo(const unsigned long thread) const¶
Return the thread info struct for this thread. Throws an error if the thread is invalid.
Private Functions
-
void parseMetadata(const MetaData_t &m)¶
Parse an individual metadata entry.
-
struct GPUvirtualThreadInfo¶
- #include <ADMetadataParser.hpp>
Structure containing the CUDA device/context/stream associated with a given virtual thread index.
ADNetClient¶
-
namespace chimbuko
-
class ADLocalNetClient : public chimbuko::ADNetClient¶
- #include <ADNetClient.hpp>
Implementation of ADNetClient for intraprocess communications.
Public Functions
-
virtual void connect_ps(int rank, int srank = 0, std::string sname = "MPINET") override¶
connect to the parameter server
- Parameters
rank – Ignored
srank – Ignored
sname – Ignored
-
virtual void disconnect_ps() override¶
disconnect from the connected parameter server
Called automatically by destructor if not previously called
-
class ADNetClient¶
- #include <ADNetClient.hpp>
A wrapper class to facilitate communications between the AD and the parameter server.
Subclassed by chimbuko::ADLocalNetClient, chimbuko::ADThreadNetClient, chimbuko::ADZMQNetClient
Public Functions
-
ADNetClient()¶
-
virtual ~ADNetClient()¶
-
inline bool use_ps() const¶
check if the parameter server is in use
- Returns
true if the parameter server is in use
- Returns
false if the parameter server is not in use
-
virtual void connect_ps(int rank, int srank = 0, std::string sname = "MPINET") = 0¶
connect to the parameter server
- Parameters
rank – this process rank
srank – server process rank. If using ZMQNet this is not applicable
sname – server name. If using ZMQNet this is the server ip address, for MPINet it is not applicable
-
virtual void disconnect_ps() = 0¶
disconnect from the connected parameter server
Called automatically by destructor if not previously called
-
inline int get_server_rank() const¶
Return the MPI rank of the parameter server.
-
inline int get_client_rank() const¶
Return the MPI rank of this client.
-
virtual std::string send_and_receive(const Message &msg) = 0¶
Send a message to the parameter server and receive the response in a serialized format.
- Parameters
msg – The message
- Returns
The response message in serialized format. Use Message::set_msg( <serialized_msg>, true ) to unpack
-
void send_and_receive(Message &recv, const Message &send)¶
Send a message to the parameter server and receive the response both as Message objects.
Note recv and send can be the same object
- Parameters
send – The sent message
recv – The received message
-
virtual void async_send(const Message &send)¶
Perform a non-blocking send operation.
Not all net clients support non-blocking sends; those that do not will default to a blocking send
-
inline virtual bool async_send_supported() const¶
Check if a net client supports non-blocking sends.
-
inline void linkPerf(PerfStats *perf)¶
If linked, timing and packet size information will be gathered.
-
inline virtual void setRecvTimeout(const int timeout_ms)¶
Set the timeout for receiving messages (implementation dependent)
-
class ADThreadNetClient : public chimbuko::ADNetClient¶
- #include <ADNetClient.hpp>
ADNetClient inside a worker thread with blocking send/receive and non-blocking send.
Public Functions
-
ADThreadNetClient(bool local = false)¶
Constructor.
- Parameters
local – Use a local (in process) communicator if true, otherwise use the default network communicator
-
void enqueue_action(ClientAction *action)¶
Add an action to the queue.
Use only if you know what you are doing!
-
virtual void connect_ps(int rank, int srank = 0, std::string sname = "MPINET") override¶
Connect to the parameter server.
-
virtual void disconnect_ps() override¶
Disconnect from the parameter server.
-
virtual std::string send_and_receive(const Message &send) override¶
Perform a blocking send and receive operation.
- Parameters
send – The message
- Returns
The response message in serialized format. Use Message::set_msg( <serialized_msg>, true ) to unpack
-
inline virtual bool async_send_supported() const override¶
Check if a net client supports non-blocking sends.
-
virtual void setRecvTimeout(const int timeout_ms) override¶
Set a timeout (in ms) on receiving a response message.
-
void stopWorkerThread()¶
Stop the worker thread. Called automatically by destructor.
-
~ADThreadNetClient()¶
Private Functions
-
size_t getNwork() const¶
Get the number of outstanding net operations.
-
ClientAction *getWorkItem()¶
Get the next net operation.
-
void run(bool local = false)¶
Create the worker thread.
- Parameters
local – Use a local (in process) communicator if true, otherwise use the default network communicator
Private Members
-
std::queue<ClientAction*> queue¶
The queue of net operations
-
bool m_is_running¶
Is the worker thread running?
-
struct ClientAction¶
- #include <ADNetClient.hpp>
Virtual class representing actions performed by the worker thread.
Public Functions
-
virtual void perform(ADNetClient &client) = 0¶
Perform the action utilizing the underlying net implementation.
-
virtual bool do_delete() const = 0¶
Whether to delete the work object (instance of ClientAction) after completion.
-
inline virtual bool shutdown_worker() const¶
Whether to shutdown the worker thread after completing the action.
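The ClientAction contract described here (perform the action, optionally delete the work object, optionally stop the worker) can be sketched with a single-threaded drain loop. This is only an illustration of the control flow a worker might follow. `FakeClient`, `Action`, `SendAction`, `StopAction` and `drain` are hypothetical stand-ins, and the real ADThreadNetClient runs its loop inside a separate thread with synchronization that is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct FakeClient {  // stand-in for ADNetClient: just records what was "sent"
  std::vector<std::string> sent;
};

struct Action {  // mirrors the shape of the ClientAction interface
  virtual void perform(FakeClient &client) = 0;
  virtual bool do_delete() const = 0;
  virtual bool shutdown_worker() const { return false; }
  virtual ~Action() {}
};

struct SendAction : Action {
  std::string payload;
  explicit SendAction(std::string p) : payload(std::move(p)) {}
  void perform(FakeClient &client) override { client.sent.push_back(payload); }
  bool do_delete() const override { return true; }  // fire-and-forget: the loop owns it
};

struct StopAction : Action {
  void perform(FakeClient &) override {}
  bool do_delete() const override { return true; }
  bool shutdown_worker() const override { return true; }
};

// Process queued actions in order; returns the number of actions performed.
std::size_t drain(std::queue<Action *> &q, FakeClient &client) {
  std::size_t n = 0;
  while (!q.empty()) {
    Action *a = q.front();
    q.pop();
    assert(a != nullptr);
    a->perform(client);
    ++n;
    const bool stop = a->shutdown_worker();  // read before the object may be deleted
    if (a->do_delete()) delete a;
    if (stop) break;  // anything still queued is left untouched
  }
  return n;
}
```

Letting each action report whether it should be deleted allows fire-and-forget sends to be owned by the loop, while a blocking caller can keep ownership of its own action to read back the result.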
-
inline virtual ~ClientAction()¶
-
class ADZMQNetClient : public chimbuko::ADNetClient¶
- #include <ADNetClient.hpp>
Implementation of the ADNetClient interface for the ZMQNet network.
Public Functions
-
ADZMQNetClient()¶
-
~ADZMQNetClient()¶
-
virtual void connect_ps(int rank, int srank = 0, std::string sname = "MPINET") override¶
connect to the parameter server
- Parameters
rank – this process rank
srank – Ignored for this class
sname – The server ip address
-
virtual void disconnect_ps() override¶
disconnect from the connected parameter server
Called automatically by destructor if not previously called
-
virtual std::string send_and_receive(const Message &msg) override¶
Send a message to the parameter server and receive the response in a serialized format.
- Parameters
msg – The message
- Returns
The response message in serialized format. Use Message::set_msg( <serialized_msg>, true ) to unpack
-
inline virtual void setRecvTimeout(const int timeout_ms) override¶
Set the timeout on blocking receives. Must be called prior to connecting.
-
inline void *getZMQsocket()¶
Get the zeroMQ socket.
-
inline void *getZMQcontext()¶
Get the zeroMQ context.
-
void stopServer()¶
Issue a stop command to the server. The server will then stop once all clients have disconnected and all messages processed.
ADNormalEventProvenance¶
-
namespace chimbuko
-
class ADNormalEventProvenance¶
- #include <ADNormalEventProvenance.hpp>
A class that maintains the provenance information for the most recent normal event for each encountered function. A mechanism is provided for dealing with cases where a normal execution is not yet available. Once returned, the internal copy is deleted, ensuring that a given normal execution is only ever output once.
Public Functions
-
void addNormalEvent(const unsigned long pid, const unsigned long rid, const unsigned long tid, const unsigned long fid, const nlohmann::json &event)¶
Add a normal event. If a normal event already exists with this pid,rid,tid,fid it will be overwritten.
-
std::pair<nlohmann::json, bool> getNormalEvent(const unsigned long pid, const unsigned long rid, const unsigned long tid, const unsigned long fid, bool add_outstanding, bool do_delete)¶
Get a normal event if available.
- Parameters
add_outstanding – If true and the event is not available the pid/rid/tid/fid will be placed in a list of outstanding requests that will be furnished later
do_delete – If true and the event is available, the stored copy will be deleted
- Returns
The JSON data if available, and a bool indicating if the data is populated
-
std::vector<nlohmann::json> getOutstandingRequests(bool do_delete)¶
For normal event requests that were not previously available, calls to this function will see if a normal event now exists.
- Parameters
do_delete – If true and the event is available, the stored copy will be deleted
- Returns
A vector of outstanding requests that have now been furnished
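The request/furnish flow described above can be sketched as follows. This is a simplified stand-in rather than the Chimbuko implementation: events are plain strings instead of `nlohmann::json`, the four indices are packed into a `Key` array, and the class name `NormalEventStore` is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Key = std::array<unsigned long, 4>;  // (pid, rid, tid, fid)

class NormalEventStore {
public:
  // Store the most recent normal event for this key, overwriting any previous one.
  void addNormalEvent(const Key &k, const std::string &event) { m_events[k] = event; }

  // Return {event, true} if available; otherwise optionally remember the request.
  std::pair<std::string, bool> getNormalEvent(const Key &k, bool add_outstanding, bool do_delete) {
    auto it = m_events.find(k);
    if (it == m_events.end()) {
      if (add_outstanding) m_outstanding.push_back(k);
      return {std::string(), false};
    }
    std::string ev = it->second;
    if (do_delete) m_events.erase(it);
    return {ev, true};
  }

  // Furnish any outstanding requests whose event has arrived since they were made.
  std::vector<std::string> getOutstandingRequests(bool do_delete) {
    std::vector<std::string> furnished;
    std::vector<Key> still_waiting;
    for (const Key &k : m_outstanding) {
      auto r = getNormalEvent(k, false, do_delete);
      if (r.second) furnished.push_back(r.first);
      else still_waiting.push_back(k);
    }
    assert(furnished.size() + still_waiting.size() == m_outstanding.size());
    m_outstanding.swap(still_waiting);
    return furnished;
  }

private:
  std::map<Key, std::string> m_events;
  std::vector<Key> m_outstanding;
};
```

With `do_delete` set, a furnished event is removed from the store, so a second request for the same key returns nothing until a new normal event is added.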
Private Functions
-
void addOutstanding(const unsigned long pid, const unsigned long rid, const unsigned long tid, const unsigned long fid)¶
Add an entry to the list of outstanding requests.
ADOutlier¶
-
namespace chimbuko
-
class ADOutlier¶
- #include <ADOutlier.hpp>
abstract class for anomaly detection algorithms
Subclassed by chimbuko::ADOutlierCOPOD, chimbuko::ADOutlierHBOS, chimbuko::ADOutlierSSTD
Public Types
Public Functions
-
ADOutlier(OutlierStatistic stat = ExclusiveRuntime)¶
Construct a new ADOutlier object.
-
inline bool use_ps() const¶
check if the parameter server is in use
- Returns
true if the parameter server is in use
- Returns
false if the parameter server is not in use
-
inline void linkExecDataMap(const ExecDataMap_t *m)¶
Link (store a pointer to) the execution data map.
- Parameters
m – Pointer to the execution data map
-
void linkNetworkClient(ADThreadNetClient *client)¶
Link the interface for communicating with the parameter server.
-
virtual Anomalies run(int step = 0) = 0¶
abstract method to run the implemented anomaly detection algorithm
- Parameters
step – step (or frame) number
- Returns
data structure containing information on captured anomalies
-
inline void linkPerf(PerfStats *perf)¶
If linked, performance information on the sync_param routine will be gathered.
-
inline ParamInterface const *get_global_parameters() const¶
Get the local copy of the global parameters.
- Returns
Pointer to a ParamInterface object
Public Static Functions
-
static ADOutlier *set_algorithm(OutlierStatistic stat, const std::string &algorithm, const double &hbos_thres, const bool &glob_thres, const double &sstd_sigma)¶
Factory method to select AD algorithm at runtime.
Protected Functions
-
virtual unsigned long compute_outliers(Anomalies &outliers, const unsigned long func_id, std::vector<CallListIterator_t> &data) = 0¶
abstract method to compute outliers (or anomalies)
- Parameters
outliers – [out] data structure containing captured anomalies
func_id – function id
data – [inout] a list of function calls to inspect. Entries will be tagged as outliers
- Returns
unsigned long the number of outliers (or anomalies)
-
virtual std::pair<size_t, size_t> sync_param(ParamInterface const *param)¶
synchronize local parameters with global parameters
- Parameters
param – [in] local parameters
- Returns
std::pair<size_t, size_t> [sent, recv] message size
-
inline void setStatistic(OutlierStatistic to)¶
Set the statistic used for the anomaly detection.
-
double getStatisticValue(const ExecData_t &e) const¶
Extract the appropriate statistic from an ExecData_t object.
Protected Attributes
-
int m_rank¶
this process rank
-
bool m_use_ps¶
true if the parameter server is in use
-
ADThreadNetClient *m_net_client¶
interface for communicating to parameter server
-
std::unordered_map<std::array<unsigned long, 4>, size_t, ArrayHasher<unsigned long, 4>> m_local_func_exec_count¶
Map(program id, rank id, thread id, func id) -> number of times encountered on this node
-
const ExecDataMap_t *m_execDataMap¶
execution data map
-
ParamInterface *m_param¶
global parameters (kept in sync with parameter server)
Private Members
-
OutlierStatistic m_statistic¶
-
class ADOutlierCOPOD : public chimbuko::ADOutlier¶
- #include <ADOutlier.hpp>
COPOD anomaly detection algorithm.
Public Functions
-
ADOutlierCOPOD(OutlierStatistic stat = ExclusiveRuntime, double threshold = 0.99, bool use_global_threshold = true)¶
Construct a new ADOutlierCOPOD object.
-
~ADOutlierCOPOD()¶
Destroy the ADOutlierCOPOD object.
-
inline void set_alpha(double alpha)¶
Set the alpha value.
- Parameters
alpha – alpha value
Protected Functions
-
virtual unsigned long compute_outliers(Anomalies &outliers, const unsigned long func_id, std::vector<CallListIterator_t> &data) override¶
compute outliers (or anomalies) of the list of function calls
- Parameters
outliers – [out] Array of function calls that were tagged as outliers
func_id – function id
data – [inout] a list of function calls to inspect
- Returns
unsigned long the number of outliers (or anomalies)
-
double _scott_binWidth(std::vector<double> &vals)¶
Scott’s rule for bin_width estimation during histogram formation.
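For reference, the textbook form of Scott's rule is h = 3.49 · σ · n^(−1/3), where σ is the sample standard deviation and n the number of samples. The sketch below implements that textbook form as a free function. The name `scott_bin_width` is hypothetical, and Chimbuko's `_scott_binWidth` may differ in its constant or in how it handles degenerate input.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Textbook Scott's rule: h = 3.49 * sigma * n^(-1/3).
// Requires at least two samples so that the sample standard deviation is defined.
double scott_bin_width(const std::vector<double> &vals) {
  assert(vals.size() >= 2);
  const double n = static_cast<double>(vals.size());

  double mean = 0.0;
  for (double v : vals) mean += v;
  mean /= n;

  double ss = 0.0;  // sum of squared deviations from the mean
  for (double v : vals) ss += (v - mean) * (v - mean);
  const double sigma = std::sqrt(ss / (n - 1.0));

  return 3.49 * sigma * std::pow(n, -1.0 / 3.0);
}
```

A bin width of zero (all samples identical) would need special handling before it is used to build a histogram.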
-
class ADOutlierHBOS : public chimbuko::ADOutlier¶
- #include <ADOutlier.hpp>
HBOS anomaly detection algorithm.
Public Functions
-
ADOutlierHBOS(OutlierStatistic stat = ExclusiveRuntime, double threshold = 0.99, bool use_global_threshold = true)¶
Construct a new ADOutlierHBOS object.
-
~ADOutlierHBOS()¶
Destroy the ADOutlierHBOS object.
-
inline void set_alpha(double alpha)¶
Set the alpha value.
- Parameters
alpha – alpha value
Protected Functions
-
virtual unsigned long compute_outliers(Anomalies &outliers, const unsigned long func_id, std::vector<CallListIterator_t> &data) override¶
compute outliers (or anomalies) of the list of function calls
- Parameters
outliers – [out] Array of function calls that were tagged as outliers
func_id – function id
data – [inout] a list of function calls to inspect
- Returns
unsigned long the number of outliers (or anomalies)
-
class ADOutlierSSTD : public chimbuko::ADOutlier¶
- #include <ADOutlier.hpp>
Statistical-analysis-based anomaly detection algorithm
Public Functions
-
ADOutlierSSTD(OutlierStatistic stat = ExclusiveRuntime, double sigma = 6.0)¶
Construct a new ADOutlierSSTD object.
-
~ADOutlierSSTD()¶
Destroy the ADOutlierSSTD object.
-
inline void set_sigma(double sigma)¶
Set the sigma value.
- Parameters
sigma – sigma value
Protected Functions
-
virtual unsigned long compute_outliers(Anomalies &outliers, const unsigned long func_id, std::vector<CallListIterator_t> &data) override¶
compute outliers (or anomalies) of the list of function calls
- Parameters
outliers – [out] Array of function calls that were tagged as outliers
func_id – function id
data – [inout] a list of function calls to inspect
- Returns
unsigned long the number of outliers (or anomalies)
-
double computeScore(CallListIterator_t ev, const SstdParam &stats) const¶
Compute the anomaly score (probability) for an event assuming a Gaussian distribution.
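A minimal sketch of a sigma-threshold detector is given below, assuming that a value counts as an outlier when it lies outside mean ± sigma × stddev of the accumulated statistics. The score shown is one possible Gaussian score, the two-sided tail probability. Both the decision rule and the score are assumptions for illustration and may not match what ADOutlierSSTD computes. `RunningStats`, `sigma_outliers` and `gaussian_tail_probability` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct RunningStats {  // stand-in for the accumulated (global) statistics
  double mean;
  double stddev;
};

// Indices of the values lying outside mean +/- sigma * stddev.
std::vector<std::size_t> sigma_outliers(const std::vector<double> &vals,
                                        const RunningStats &st, double sigma = 6.0) {
  assert(sigma > 0.0 && st.stddev >= 0.0);
  const double lo = st.mean - sigma * st.stddev;
  const double hi = st.mean + sigma * st.stddev;
  std::vector<std::size_t> out;
  for (std::size_t i = 0; i < vals.size(); ++i)
    if (vals[i] < lo || vals[i] > hi) out.push_back(i);
  return out;
}

// Probability, under a Gaussian model, of a deviation at least as large as the one observed.
double gaussian_tail_probability(double value, const RunningStats &st) {
  assert(st.stddev > 0.0);
  const double z = std::fabs(value - st.mean) / st.stddev;
  return std::erfc(z / std::sqrt(2.0));
}
```

A small tail probability corresponds to a more anomalous event. With the default sigma of 6.0 only extreme deviations are flagged.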
Private Members
-
double m_sigma¶
sigma
ADParser¶
-
namespace chimbuko
-
class ADParser¶
- #include <ADParser.hpp>
Parses performance trace data streamed via ADIOS2
Note: The “function index” assigned to each function by Tau is not necessarily the same for every node as it depends on the order in which the function is encountered. To deal with this, if the parameter server is running it maintains a global mapping of function name to an index, which is synchronized to the parser (providing the net client is linked) and the local index is replaced by the global index in the incoming data stream.
Note2: The “program index” assigned by Tau is defunct (always 0). We must therefore replace it manually with a correct index to support workflows
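The local-to-global index replacement described in the first note can be sketched as below. `GlobalFunctionIndex` plays the role of the parameter server's name-to-index table and `LocalToGlobalMap` the per-node cache. Both are hypothetical stand-ins, and the real ADglobalFunctionIndexMap communicates over the net client rather than through a direct pointer.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for the server-side table: function name -> global index.
class GlobalFunctionIndex {
public:
  unsigned long lookup(const std::string &name) {
    auto it = m_idx.find(name);
    if (it != m_idx.end()) return it->second;
    const unsigned long idx = static_cast<unsigned long>(m_idx.size());
    m_idx.emplace(name, idx);  // first time this name is seen: assign the next index
    return idx;
  }

private:
  std::unordered_map<std::string, unsigned long> m_idx;
};

// Per-node cache: local index -> global index.
class LocalToGlobalMap {
public:
  explicit LocalToGlobalMap(GlobalFunctionIndex *server) : m_server(server) {}

  unsigned long toGlobal(unsigned long local_idx, const std::string &func_name) {
    assert(!func_name.empty());
    if (m_server == nullptr) return local_idx;  // no server: 1<->1 mapping
    auto it = m_map.find(local_idx);
    if (it != m_map.end()) return it->second;   // cached, no server round trip
    const unsigned long g = m_server->lookup(func_name);
    m_map.emplace(local_idx, g);
    return g;
  }

private:
  GlobalFunctionIndex *m_server;
  std::unordered_map<unsigned long, unsigned long> m_map;
};
```

Two nodes that encounter the same functions in a different order end up with different local indices but the same global index for each function name.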
Public Functions
-
ADParser(std::string inputFile, unsigned long program_idx, int rank, std::string engineType = "BPFile", int openTimeoutSeconds = 60)¶
Construct a new ADParser object.
- Parameters
inputFile – ADIOS2 BP filename
program_idx – The index to assign to the program whose trace data is being parsed
rank – Rank of current process
engineType – BPFile or SST
openTimeoutSeconds – Timeout for opening ADIOS2 stream
-
inline void linkNetClient(ADThreadNetClient *net_client)¶
Link the net client to the object that maintains a mapping of local function index to global index.
If this is performed, the parser will replace the local with global index in the incoming data stream
-
inline const std::unordered_map<int, std::string> *getFuncMap() const¶
Get the function hash map (function id > function name)
- Returns
const std::unordered_map<int, std::string>* function hash map
-
inline const std::unordered_map<int, std::string> *getEventType() const¶
Get the event type hash map (event type id > event name)
- Returns
const std::unordered_map<int, std::string>* event type hash map
-
inline const std::unordered_map<int, std::string> *getCounterMap() const¶
Get the counter hash map (counter id > counter description)
- Returns
const std::unordered_map<int, std::string>* counter hash map
-
inline bool getStatus() const¶
Get the status of this parser.
- Returns
true if it is connected with a writer
- Returns
false if it is disconnected or no more data are available
-
inline int getCurrentStep() const¶
Get the current step (or frame) number.
Returns a value of -1 if beginStep has yet to be called.
- Returns
int step number
-
int beginStep(bool verbose = false)¶
start fetching next available data
- Parameters
verbose – true to output additional information
- Returns
int current step number
-
void endStep()¶
End the current step (or frame). This only has an effect with the ADIOS2 SST engine.
-
inline void setBeginStepTimeout(int timeout)¶
Set the timeout in seconds on waiting for the next ADIOS2 timestep (default 30s)
-
void update_attributes()¶
Update attributes (or metadata). With the ADIOS2 BPFile engine the available attributes are fetched only once.
-
ParserError fetchFuncData()¶
fetching function (timer) data. Results stored internally and extracted using ADParser::getFuncData
- Returns
ParserError error code
-
ParserError fetchCommData()¶
fetching communication data. Results stored internally and extracted using ADParser::getCommData
- Returns
ParserError error code
-
ParserError fetchCounterData()¶
fetching counter data. Results stored internally and extracted using ADParser::getCounterData
- Returns
ParserError error code
-
inline const unsigned long *getFuncData(size_t idx) const¶
Get a pointer to the data array of the function event specified by idx.
- Parameters
idx – index of a function event
- Returns
pointer to a function event array
-
inline size_t getNumFuncData() const¶
Get the number of function events in the current step.
- Returns
size_t the number of function events
-
inline const unsigned long *getCommData(size_t idx) const¶
Get a pointer to the communication event array specified by idx.
- Parameters
idx – index of a communication event
- Returns
pointer to a communication event array
-
inline size_t getNumCommData() const¶
Get the number of communication events in the current step.
- Returns
size_t the number of communication events
-
inline const unsigned long *getCounterData(size_t idx) const¶
Get a pointer to the counter event array specified by idx.
- Parameters
idx – index of a counter event
- Returns
pointer to a counter event array
-
inline size_t getNumCounterData() const¶
Get the number of counter events in the current step.
- Returns
size_t the number of counter events
-
inline const std::vector<MetaData_t> &getNewMetaData() const¶
Get metadata parsed for the first time during the current step.
-
std::vector<Event_t> getEvents() const¶
Get all the events (func, comm and counter) occurring in the IO step.
Events are guaranteed ordered by their timestamp on a per-thread basis. Global ordering of events is not guaranteed
-
void addFuncData(unsigned long const *d)¶
For testing purposes, add the data in the array d to the internal m_event_timestamps array.
Will throw an error if the new array size exceeds the vector capacity as this would invalidate previous Event_t objects
- Parameters
d – An array of length FUNC_EVENT_DIM
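The capacity check matters because event objects hold pointers into a flat array, and a reallocation would leave those pointers dangling. The sketch below illustrates the idea with a hypothetical `FlatEventBuffer`. The record width `kEventDim` is an arbitrary stand-in for FUNC_EVENT_DIM, and the real parser's storage layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

constexpr std::size_t kEventDim = 4;  // hypothetical record width, stands in for FUNC_EVENT_DIM

class FlatEventBuffer {
public:
  // Reserve room for n_events records. Growing the reservation may reallocate,
  // which invalidates any pointers previously returned by add().
  void setCapacity(std::size_t n_events) {
    m_data.reserve(n_events * kEventDim);
    m_capacity = n_events;
  }

  // Append one record and return a pointer to its stored copy.
  // Throws rather than reallocating, so earlier pointers stay valid.
  const unsigned long *add(const unsigned long *d) {
    assert(d != nullptr);
    if (size() + 1 > m_capacity)
      throw std::runtime_error("append would exceed the reserved capacity");
    m_data.insert(m_data.end(), d, d + kEventDim);
    return m_data.data() + m_data.size() - kEventDim;
  }

  std::size_t size() const { return m_data.size() / kEventDim; }

private:
  std::vector<unsigned long> m_data;
  std::size_t m_capacity = 0;  // capacity in records, tracked explicitly
};
```

Reserving the capacity once, before any pointers are handed out, keeps every previously returned pointer valid for the lifetime of the buffer.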
-
void addCounterData(unsigned long const *d)¶
For testing purposes, add the data in the array d to the internal m_counter_timestamps array.
Will throw an error if the new array size exceeds the vector capacity as this would invalidate previous Event_t objects
- Parameters
d – An array of length COUNTER_EVENT_DIM
-
void addCommData(unsigned long const *d)¶
For testing purposes, add the data in the array d to the internal m_comm_timestamps array.
Will throw an error if the new array size exceeds the vector capacity as this would invalidate previous Event_t objects
- Parameters
d – An array of length COMM_EVENT_DIM
-
inline void setFuncDataCapacity(size_t cap)¶
Set the m_event_timestamps vector capacity in units of FUNC_EVENT_DIM. This will invalidate previous Event_t objects if it requires a realloc!
-
inline void setCommDataCapacity(size_t cap)¶
Set the m_comm_timestamps vector capacity in units of COMM_EVENT_DIM. This will invalidate previous Event_t objects if it requires a realloc!
-
inline void setCounterDataCapacity(size_t cap)¶
Set the m_counter_timestamp vector capacity in units of COUNTER_EVENT_DIM. This will invalidate previous Event_t objects if it requires a realloc!
-
inline void setFuncMap(const std::unordered_map<int, std::string> &m)¶
Set the function index->name map for testing.
-
inline void setEventTypeMap(const std::unordered_map<int, std::string> &m)¶
Set the function event index -> event type map for testing.
-
inline void setCounterMap(const std::unordered_map<int, std::string> &m)¶
Set the counter index->name map for testing.
-
inline unsigned long getGlobalFunctionIndex(const unsigned long local_idx) const¶
Get the global index corresponding to a given local function index. 1<->1 mapping if pserver not connected.
-
inline void setDataRankOverride(bool to)¶
When true the parser will override the rank index of the parsed data with the rank member parameter.
This is useful for example when multiple instances of a non-MPI program are being run and the user wishes to distinguish them by the rank index
Private Functions
-
bool checkEventOrder(const EventDataType type, bool exit_on_fail) const¶
Scan the data and check the events are in order.
- Parameters
exit_on_fail – Throw an error if the check fails
- Returns
true if the events are in order, false otherwise
-
bool validateEvent(const unsigned long *e) const¶
Validate an event to bypass corrupted input data (any event type)
- Parameters
e – Pointer to event data
-
std::pair<Event_t, bool> createAndValidateEvent(const unsigned long *data, EventDataType t, size_t idx, const eventID &id, bool log_error = true) const¶
Create an Event_t instance from the data at the provided pointer and run simple validation.
- Parameters
log_error – If true a recoverable error will be logged for invalid events
Private Members
-
adios2::ADIOS m_ad¶
adios2 handler
-
adios2::IO m_io¶
adios2 I/O handler
-
adios2::Engine m_reader¶
adios2 engine handler
-
int m_beginstep_timeout¶
the timeout in seconds on waiting for the next ADIOS2 timestep
-
bool m_status¶
parser status
-
bool m_opened¶
true if connected to a writer or a BP file
-
bool m_attr_once¶
true for BP engine
-
int m_current_step¶
current step
-
int m_rank¶
Rank of current process
-
unsigned long m_program_idx¶
Program index
-
std::vector<MetaData_t> m_new_metadata¶
New metadata that appeared on this step
-
std::unordered_map<int, std::string> m_funcMap¶
function hash map (global function id > function name)
-
size_t m_timer_event_count¶
the number of function events in current step
-
size_t m_comm_count¶
the number of communication events in current step
-
size_t m_counter_count¶
the number of counter events in the current step
-
ADglobalFunctionIndexMap m_global_func_idx_map¶
Maintains mapping of local function index to global function index (if pserver connected)
-
bool m_data_rank_override¶
Overwrite the rank index in the parsed data with member m_rank (default false)
ADProvenanceDBclient¶
ADProvenanceDBengine¶
AnomalyData¶
-
namespace chimbuko
Functions
-
bool operator==(const AnomalyData &a, const AnomalyData &b)¶
-
bool operator!=(const AnomalyData &a, const AnomalyData &b)¶
-
class AnomalyData¶
- #include <AnomalyData.hpp>
A class that contains data on the number of anomalies collected during the present timestep. It contains the number of anomalies and the timestamp window in which the anomalies occurred.
These data are aggregated over rank to form the anomaly_stats.anomaly field of the pserver streaming output
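The bookkeeping described above (an anomaly count plus the timestamp window in which the anomalies fell) and its aggregation over ranks can be sketched as follows. `AnomalyWindow` is a hypothetical simplification: it omits the app, rank and step indices and the outlier score statistics that AnomalyData also carries.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

struct AnomalyWindow {
  unsigned long min_ts = std::numeric_limits<unsigned long>::max();
  unsigned long max_ts = 0;
  unsigned long n_anomalies = 0;

  // Record one anomaly observed at timestamp ts.
  void record(unsigned long ts) {
    min_ts = std::min(min_ts, ts);
    max_ts = std::max(max_ts, ts);
    ++n_anomalies;
  }

  // Fold in the window reported by another rank for the same step.
  void merge(const AnomalyWindow &other) {
    min_ts = std::min(min_ts, other.min_ts);
    max_ts = std::max(max_ts, other.max_ts);
    n_anomalies += other.n_anomalies;
    assert(n_anomalies == 0 || min_ts <= max_ts);
  }
};
```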
Public Functions
-
AnomalyData()¶
Default constructor.
-
AnomalyData(unsigned long app, unsigned long rank, unsigned step, unsigned long min_ts, unsigned long max_ts, unsigned long n_anomalies)¶
Constructor by value.
- Parameters
app – The application index
rank – The MPI rank of the process
step – The io step
min_ts – The first timestamp at which an anomaly was observed in this io step
max_ts – The last timestamp at which an anomaly was observed in this io step
n_anomalies – The number of anomalies observed in this io step
-
void set(unsigned long app, unsigned long rank, unsigned step, unsigned long min_ts, unsigned long max_ts, unsigned long n_anomalies)¶
Set the parameters.
- Parameters
app – The application index
rank – The MPI rank of the process
step – The io step
min_ts – The first timestamp at which an anomaly was observed in this io step
max_ts – The last timestamp at which an anomaly was observed in this io step
n_anomalies – The number of anomalies observed in this io step
-
inline unsigned long get_app() const¶
Get the application index.
-
inline unsigned long get_rank() const¶
Get the MPI rank of the process.
-
inline unsigned long get_step() const¶
Get the io step.
-
inline unsigned long get_min_ts() const¶
Get the first timestamp at which an anomaly was observed in this io step.
-
inline unsigned long get_max_ts() const¶
Get the last timestamp at which an anomaly was observed in this io step.
-
inline unsigned long get_n_anomalies() const¶
Get the number of anomalies observed in this io step.
-
inline void set_min_ts(unsigned long to)¶
Set the earliest timestamp.
-
inline void set_max_ts(unsigned long to)¶
Set the last timestamp.
-
inline void incr_n_anomalies(unsigned long by)¶
Increment the number of anomalies.
-
inline void add_outlier_score(double val)¶
Add an outlier score to the internal statistics.
-
inline nlohmann::json get_json_state() const¶
Serialize the state (internal variables) of this instance in JSON format.
-
nlohmann::json get_json() const¶
Get the object in JSON format.
-
inline void set_json_state(const nlohmann::json &j)¶
Set the member variables according to the state in JSON format.
Private Members
-
unsigned long m_app¶
The application index
-
unsigned long m_rank¶
The MPI rank of the process
-
unsigned long m_step¶
The io step
-
unsigned long m_min_timestamp¶
The first timestamp at which an anomaly was observed in this io step
-
unsigned long m_max_timestamp¶
The last timestamp at which an anomaly was observed in this io step
-
unsigned long m_n_anomalies¶
The number of anomalies observed in this io step
Friends
- friend bool operator==(const AnomalyData &a, const AnomalyData &b)
Comparison operator.
- friend bool operator!=(const AnomalyData &a, const AnomalyData &b)
Negative comparison operator.
-
struct State¶
- #include <AnomalyData.hpp>
State struct for serialization.
Public Functions
-
State(const AnomalyData &p)¶
-
inline State()¶
-
State(const nlohmann::json &j)¶
-
void deserialize_cerealpb(const std::string &strstate)¶
Deserialize from Cereal portable binary format.
-
nlohmann::json get_json() const¶
Serialize this instance in JSON format.
-
void set_json(const nlohmann::json &j)¶
Set the state from a JSON object.
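The per-step bookkeeping that AnomalyData describes (first timestamp, last timestamp and anomaly count) can be sketched as below. This is a minimal illustration only: AnomalyWindow, record and span are hypothetical names, not part of the Chimbuko API, and the sketch omits the app/rank/step indices and the outlier score statistics.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical stand-in for the per-step anomaly record; not the Chimbuko class.
struct AnomalyWindow {
  unsigned long min_ts = std::numeric_limits<unsigned long>::max();
  unsigned long max_ts = 0;
  unsigned long n_anomalies = 0;

  // Record one anomaly observed at timestamp ts, widening the window if needed.
  void record(unsigned long ts) {
    min_ts = std::min(min_ts, ts);
    max_ts = std::max(max_ts, ts);
    ++n_anomalies;
  }

  // Width of the window; only meaningful once at least one anomaly was recorded.
  unsigned long span() const {
    assert(n_anomalies > 0);
    return max_ts - min_ts;
  }
};
```

Anomalies may be recorded in any order within a step; the window always ends up covering the earliest and latest timestamps seen.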
ExecData¶
-
namespace chimbuko
-
class CommData_t¶
- #include <ExecData.hpp>
wrapper for communication event
Public Functions
-
CommData_t()¶
Construct a new CommData_t object.
-
CommData_t(const Event_t &ev, const std::string &commType)¶
Construct a new CommData_t object.
- Parameters
ev – constant reference to a Event_t object
commType – communication type (e.g. SEND/RECV)
-
CommData_t(unsigned long pid, unsigned long rid, unsigned long tid, unsigned long partner, unsigned long bytes, unsigned long tag, unsigned long timestamp, const std::string &commType)¶
Construct a new CommData_t object by values.
- Parameters
pid – Program index
rid – Rank index
tid – Thread index
partner – The other rank involved in the communication
bytes – The number of bytes sent/received
tag – The tag of the event
timestamp – The time of the communication event
commType – Either “SEND” or “RECV” depending on the comm type
-
inline ~CommData_t()¶
Destroy the CommData_t object.
-
inline unsigned long ts() const¶
return timestamp
-
inline unsigned long src() const¶
return source process id of this communication event
-
inline unsigned long tar() const¶
return target (or destination) process id of this communication event
-
inline unsigned long tag() const¶
Get the integer tag associated with this comm event.
-
inline void set_exec_key(const eventID &key)¶
Set the execution key id (i.e. where this communication event occurs). This is equal to the “id” object associated with a parent ExecData_t object.
- Parameters
key – execution id
-
inline const eventID &get_exec_key() const¶
Get the execution key id. This is equal to the “id” object associated with a parent ExecData_t object.
-
bool is_same(const CommData_t &other) const¶
Compare two communication data objects.
- Parameters
other – The communication data object to compare against
- Returns
true if other is the same
- Returns
false if other is different
-
nlohmann::json get_json() const¶
Get the json object of this communication data.
Private Members
-
unsigned long m_pid¶
program id
-
unsigned long m_rid¶
rank id
-
unsigned long m_tid¶
thread id
-
unsigned long m_src¶
source process id
-
unsigned long m_tar¶
target process id
-
unsigned long m_bytes¶
communication data size in bytes
-
unsigned long m_tag¶
communication tag
-
unsigned long m_ts¶
communication timestamp
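One plausible way to derive the source and target ids from the rank, partner and comm type is sketched below. The direction convention is an assumption made for illustration (a "SEND" goes from the recording rank to the partner, a "RECV" the other way round), and resolve_endpoints is a hypothetical helper, not a Chimbuko function.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Returns {source, target} for a comm event recorded on rank `rid`.
// Assumption: "SEND" means rid -> partner and "RECV" means partner -> rid.
inline std::pair<unsigned long, unsigned long>
resolve_endpoints(unsigned long rid, unsigned long partner, const std::string &commType) {
  assert(commType == "SEND" || commType == "RECV");
  if (commType == "SEND") return {rid, partner};
  return {partner, rid};
}
```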
-
class CounterData_t¶
- #include <ExecData.hpp>
wrapper for counter event
Public Functions
-
CounterData_t()¶
Construct a new CounterData_t object.
-
CounterData_t(const Event_t &ev, const std::string &counter_name)¶
Construct a new CounterData_t object.
- Parameters
ev – constant reference to a Event_t object
counter_name – The name of the counter
-
CounterData_t(unsigned long pid, unsigned long rid, unsigned long tid, unsigned long counter_id, const std::string &counter_name, unsigned long counter_value, unsigned long timestamp)¶
Construct a new CounterData_t object by values.
- Parameters
pid – Program index
rid – Rank index
tid – Thread index
counter_id – Counter index
counter_name – The name of the counter (should match the counter_id through the name map)
counter_value – The value of the counter
timestamp – The time of the counter
-
nlohmann::json get_json() const¶
Get the json object of this counter data.
-
inline unsigned long get_pid() const¶
return program id
-
inline unsigned long get_rid() const¶
return rank id
-
inline unsigned long get_tid() const¶
return thread id
-
inline unsigned long get_value() const¶
Return the value of the counter.
-
inline unsigned long get_ts() const¶
Return the counter timestamp.
-
inline unsigned long get_counterid() const¶
Return the index of the counter.
-
inline void set_exec_key(const eventID &key)¶
Set the execution key id (i.e. where this counter event occurs). This is equal to the “id” object associated with a parent ExecData_t object.
- Parameters
key – execution id
-
inline const eventID &get_exec_key() const¶
Get the execution key id. This is equal to the “id” object associated with a parent ExecData_t object.
-
class Event_t¶
- #include <ExecData.hpp>
class to provide easy access to raw performance event vector
The data are passed in via ADIOS2 and stored internally in a compressed format in the form of an integer array, blocks of which are associated with particular events. Each block has a certain number of entries associated with it that relate to information such as program, comm and thread index, timestamp as well as detailed event information. The mappings are set out in ADDefine.hpp.
This class wraps the event data blocks allowing for retrieval of event information through explicit function calls. It works for all event types: function, comm and counter
Public Functions
-
inline Event_t(const unsigned long *data, EventDataType t, size_t idx, const eventID &id = eventID())¶
Construct a new Event_t object.
- Parameters
data – pointer to raw performance event vector
t – event type
idx – event index
id – event id
-
inline bool valid() const¶
check if the raw data pointer is valid
-
inline size_t idx() const¶
return event index, typically the index of the event in the input array for the timestep on which it was spawned
-
inline unsigned long pid() const¶
return program id
-
inline unsigned long rid() const¶
return rank id
-
inline unsigned long tid() const¶
return thread id
-
unsigned long eid() const¶
return event type id (FUNC/COMM only). E.g. for FUNC events it is ENTRY/EXIT
-
unsigned long ts() const¶
return timestamp of this event
-
inline EventDataType type() const¶
return event type
-
unsigned long fid() const¶
return function (timer) id (FUNC event only)
-
unsigned long tag() const¶
return communication tag id (COMM event only)
-
unsigned long partner() const¶
return communication partner id (COMM event only)
-
unsigned long bytes() const¶
return communication data size (in bytes) (COMM event only)
-
unsigned long counter_id() const¶
return counter id (COUNT event only)
-
unsigned long counter_value() const¶
return the value of the counter (COUNT event only)
-
bool operator==(const Event_t &r) const¶
Equivalence operation.
Note that the underlying array pointers can differ, provided the values are identical.
-
nlohmann::json get_json() const¶
Get the json object of this event object.
-
inline const unsigned long *get_ptr() const¶
Return the pointer to the underlying data.
-
int get_data_len() const¶
Get the length of the underlying array.
Private Members
-
const unsigned long *m_data¶
pointer to raw performance trace data vector
-
EventDataType m_t¶
event type
-
size_t m_idx¶
event index
Friends
- friend bool operator<(const Event_t &lhs, const Event_t &rhs)
compare two events
- friend bool operator>(const Event_t &lhs, const Event_t &rhs)
compare two events
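The idea of wrapping blocks of a flat integer array behind named accessors can be sketched as follows. The field offsets and block length here are invented for illustration; the real mappings are those set out in ADDefine.hpp and are not reproduced in this section. EventView and event_at are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical field offsets within one event block (not the ADDefine.hpp values).
constexpr std::size_t kPid = 0, kRid = 1, kTid = 2, kEid = 3, kFid = 4, kTs = 5;
constexpr std::size_t kBlockLen = 6;

// Non-owning view over one block of the raw event array.
class EventView {
public:
  explicit EventView(const unsigned long *data) : m_data(data) {}
  bool valid() const { return m_data != nullptr; }
  unsigned long pid() const { return field(kPid); }
  unsigned long rid() const { return field(kRid); }
  unsigned long tid() const { return field(kTid); }
  unsigned long eid() const { return field(kEid); }
  unsigned long fid() const { return field(kFid); }
  unsigned long ts() const { return field(kTs); }

private:
  // Read one entry of the block; the view must point at real data.
  unsigned long field(std::size_t offset) const {
    assert(valid());
    return m_data[offset];
  }
  const unsigned long *m_data;
};

// Select the idx-th block of a contiguous array of fixed-length blocks.
inline EventView event_at(const unsigned long *array, std::size_t idx) {
  return EventView(array + idx * kBlockLen);
}
```

Because the view does not own the data, the underlying array must outlive every view created from it.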
-
class ExecData_t¶
- #include <ExecData.hpp>
A pair of function (timer) events, ENTRY and EXIT.
Public Functions
-
ExecData_t()¶
Construct a new ExecData_t object.
-
ExecData_t(const Event_t &ev)¶
Construct a new ExecData_t object.
- Parameters
ev – constant reference to a Event_t object
-
ExecData_t(const eventID &id, unsigned long pid, unsigned long rid, unsigned long tid, unsigned long fid, const std::string &func_name, long entry, long exit = -1)¶
Construct a new ExecData_t object by values.
- Parameters
id – The id associated with the instance
pid – Program index
rid – Rank index
tid – Thread index
fid – Function index
func_name – The name of the function (should match the function index through the name map)
entry – Timestamp of function start (entry)
exit – Timestamp of function exit. A value of -1 (default) indicates that the exit is not yet known. It should be set later using update_exit
-
~ExecData_t()¶
Destroy the ExecData_t object.
-
inline unsigned long get_pid() const¶
Get the program id of this execution data.
-
inline unsigned long get_tid() const¶
Get the thread id of this execution data.
-
inline unsigned long get_rid() const¶
Get the rank id of this execution data.
-
inline unsigned long get_fid() const¶
Get the function id of this execution data.
-
inline long get_entry() const¶
Get the entry time of this execution data.
-
inline long get_exit() const¶
Get the exit time of this execution data.
-
inline long get_runtime() const¶
Get the (inclusive) running time of this execution data.
-
inline long get_inclusive() const¶
Get the (inclusive) running time of this execution data.
-
inline long get_exclusive() const¶
Get the exclusive running time of this execution data.
-
inline int get_label() const¶
Get the label of this execution data.
- Returns
int 1 if normal and -1 if anomalous. Returns 0 if no label has been assigned.
-
inline const std::deque<CommData_t> &get_messages() const¶
Get the list of communication events that occurred in this execution.
-
inline const std::deque<CounterData_t> &get_counters() const¶
Get the list of counter events that occurred in this execution.
-
inline unsigned long get_n_message() const¶
Get the number of communication events.
-
inline unsigned long get_n_children() const¶
Get the number of child functions.
-
inline unsigned long get_n_counter() const¶
Get the number of counter events.
-
inline double get_outlier_score() const¶
Return the outlier score assigned to the data point, representing how unlikely an event is.
-
inline double get_outlier_severity() const¶
Return the outlier severity, representing how important the outlier is.
-
inline void set_label(int label)¶
Set the label.
- Parameters
label – 1 for normal, -1 for anomaly.
-
inline void set_outlier_score(double score)¶
Set the outlier score.
-
inline void set_parent(const eventID &parent)¶
Set the parent function of this execution.
- Parameters
parent – the parent execution id
-
inline void set_funcname(const std::string &funcname)¶
Set the function name of this execution.
- Parameters
funcname – function name
-
bool update_exit(const Event_t &ev)¶
update exit event of this execution
- Parameters
ev – exit event
- Returns
true no errors
- Returns
false incorrect exit event
-
void update_exit(unsigned long exit)¶
update exit event of this execution
- Parameters
exit – timestamp
-
inline void update_exclusive(long t)¶
update exclusive running time
- Parameters
t – running time of a child function
-
inline void inc_n_children()¶
increase the number of child function by 1
-
bool add_message(const CommData_t &comm, ListEnd end = ListEnd::Back)¶
add communication data to one end of the message queue
- Parameters
comm – communication event that occurred in this execution
end – add to which end of the deque
- Returns
true no errors
- Returns
false invalid communication event
-
bool add_counter(const CounterData_t &count, ListEnd end = ListEnd::Back)¶
add counter data
- Parameters
count – counter event that occurred in this execution
end – add to which end of the deque
- Returns
true no errors
- Returns
false invalid counter event
-
bool is_same(const ExecData_t &other) const¶
compare with other execution
- Parameters
other – other execution data
- Returns
true if they are same
- Returns
false if they are different
-
nlohmann::json get_json(bool with_message = false, bool with_counter = false) const¶
Get the json object of this execution data.
- Parameters
with_message – if true, include all message (communication) information
with_counter – if true, include all counter information
- Returns
nlohmann::json json object
-
inline bool can_delete() const¶
Determine whether the event can be deleted by the garbage collection at the end of the io step.
-
inline void register_reference()¶
Increment the external reference counter, preventing object deletion.
-
void deregister_reference()¶
Decrement the external reference counter, allowing object deletion if 0.
-
inline unsigned long reference_count() const¶
Get the number of external references registered.
-
inline void set_GPU_correlationID_partner(const eventID &event_id)¶
Set the partner event linked by a GPU correlation ID.
-
inline bool has_GPU_correlationID_partner() const¶
Return true if this event has been matched to a partner event by a GPU correlation ID.
-
inline size_t n_GPU_correlationID_partner() const¶
Get the number of events linked by GPU correlation ID.
Private Members
-
unsigned long m_pid¶
program id
-
unsigned long m_tid¶
thread id
-
unsigned long m_rid¶
rank id
-
unsigned long m_fid¶
function id
-
long m_entry¶
entry time
-
long m_exit¶
exit time
-
long m_runtime¶
inclusive running time (i.e. including time of child calls)
-
long m_exclusive¶
exclusive running time (i.e. excluding time of child calls)
-
int m_label¶
1 for normal, -1 for abnormal execution
-
double m_score¶
Outlier score (implementation dependent)
-
unsigned long m_n_children¶
the number of child executions
-
unsigned long m_n_messages¶
the number of messages
-
std::deque<CommData_t> m_messages¶
a deque of all messages
-
std::deque<CounterData_t> m_counters¶
a deque of all counters
-
unsigned long m_references¶
track number of external references to object. When 0 the object can be deleted
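The relationship between inclusive and exclusive running time, and the reference count that gates deletion, can be sketched as follows. ExecRecord and add_child are hypothetical names, and the update rules are an assumption consistent with the member descriptions above (exclusive time is inclusive time minus the inclusive time of each child call), not a copy of the Chimbuko implementation.

```cpp
#include <cassert>

// Hypothetical, simplified record of one function execution.
struct ExecRecord {
  long entry = 0;
  long exit = -1;                // -1 means the exit has not been seen yet
  long runtime = 0;              // inclusive time, children included
  long exclusive = 0;            // inclusive time minus time spent in children
  unsigned long n_children = 0;
  unsigned long references = 0;  // external references; 0 allows deletion

  // Set the exit timestamp and fold the full duration into both times.
  void update_exit(long exit_ts) {
    assert(exit_ts >= entry);
    exit = exit_ts;
    runtime = exit - entry;
    exclusive += runtime;
  }

  // Account for a completed child call of inclusive duration t.
  void add_child(long t) {
    exclusive -= t;
    ++n_children;
  }

  bool can_delete() const { return references == 0; }
};
```

Since child calls finish before their parent exits, the child durations are subtracted first and the parent's own duration is added when its exit arrives; the result does not depend on that order.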
-
class MetaData_t¶
- #include <ExecData.hpp>
wrapper for metadata entries
Public Functions
-
MetaData_t(unsigned long pid, unsigned long rid, unsigned long tid, const std::string &descr, const std::string &value)¶
Construct an instance with the full set of parameters.
- Parameters
pid – Program index
rid – Rank
tid – Thread
descr – Key
value – Value
-
inline unsigned long get_pid() const¶
Get the origin program index.
-
inline unsigned long get_rid() const¶
Get the origin global comm rank.
-
inline unsigned long get_tid() const¶
Get the origin thread index.
-
nlohmann::json get_json() const¶
Get the json object of this metadata.
- Returns
nlohmann::json json object
utils¶
-
template<>
struct std::hash<chimbuko::eventID>¶
- #include <utils.hpp>
Specialize std::hash for eventID type.
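Specializing std::hash is what allows a user-defined id type to serve as a key in std::unordered_map and std::unordered_set. The sketch below shows the general pattern for a hypothetical three-field key; the fields of chimbuko::eventID are not listed in this section, so demo::EventKey and its members are assumptions, as is the particular hash-combining formula.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

namespace demo {
// Hypothetical identifier type; not the real chimbuko::eventID.
struct EventKey {
  unsigned long rank;
  unsigned long step;
  unsigned long idx;
  bool operator==(const EventKey &o) const {
    return rank == o.rank && step == o.step && idx == o.idx;
  }
};
}  // namespace demo

namespace std {
// Specialization for the program-defined type, combining the field hashes.
template<>
struct hash<demo::EventKey> {
  size_t operator()(const demo::EventKey &k) const noexcept {
    size_t h = std::hash<unsigned long>()(k.rank);
    h ^= std::hash<unsigned long>()(k.step) + 0x9e3779b9u + (h << 6) + (h >> 2);
    h ^= std::hash<unsigned long>()(k.idx) + 0x9e3779b9u + (h << 6) + (h >> 2);
    return h;
  }
};
}  // namespace std
```

The key type also needs an equality operator, since unordered containers compare keys that land in the same bucket.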
-
namespace chimbuko
Functions
-
unsigned char random_char()¶
Return a random character.
Anomaly Detection Algorithm Parameters¶
Parameters of the anomaly detection algorithm.
ParamInterface¶
-
namespace chimbuko
-
class NetPayloadGetParams : public chimbuko::NetPayloadBase¶
- #include <param.hpp>
Net payload for AD updating params from pserver.
Public Functions
-
inline NetPayloadGetParams(ParamInterface const *param)¶
-
inline virtual MessageKind kind() const override¶
The message kind to which the payload is to be bound.
-
inline virtual MessageType type() const override¶
The message type to which the payload is to be bound.
Private Members
-
ParamInterface const *m_param¶
-
inline NetPayloadGetParams(ParamInterface const *param)¶
-
class NetPayloadUpdateParams : public chimbuko::NetPayloadBase¶
- #include <param.hpp>
Net payload for pserver updating params from AD.
Public Functions
-
inline NetPayloadUpdateParams(ParamInterface *param, bool freeze = false)¶
Construct the NetPayloadUpdateParams object.
- Parameters
param – A pointer to an instance of ParamInterface
freeze – If true the state will not be modified by the update command
-
inline virtual MessageKind kind() const override¶
The message kind to which the payload is to be bound.
-
inline virtual MessageType type() const override¶
The message type to which the payload is to be bound.
Private Members
-
ParamInterface *m_param¶
-
bool m_freeze¶
If set to true, the additional data from the AD will be ignored and the parameter state will not change
-
inline NetPayloadUpdateParams(ParamInterface *param, bool freeze = false)¶
-
class ParamInterface¶
- #include <param.hpp>
The general interface for storing function statistics for anomaly detection.
Subclassed by chimbuko::CopodParam, chimbuko::HbosParam, chimbuko::SstdParam
Public Functions
-
ParamInterface()¶
-
inline virtual ~ParamInterface()¶
-
virtual void clear() = 0¶
Clear all statistics.
-
virtual size_t size() const = 0¶
Get the number of functions for which statistics are being collected.
-
virtual std::string serialize() const = 0¶
Convert internal run statistics to string format for IO.
- Returns
Run statistics in string format
-
virtual std::string update(const std::string &parameters, bool return_update = false) = 0¶
Update the internal run statistics with those included in the serialized input map.
Note: we combine update and serialize here in order to avoid having to acquire 2 successive mutex locks on the pserver
- Parameters
parameters – The parameters in serialized format
return_update – Indicates that the function should return a serialized copy of the updated parameters
- Returns
An empty string or a serialized copy of the updated parameters depending on return_update
-
virtual void assign(const std::string &parameters) = 0¶
Set the internal run statistics to match those included in the serialized input map. Overwrite performed only for those keys in input.
- Parameters
parameters – The serialized input map
-
virtual nlohmann::json get_algorithm_params(const unsigned long func_id) const = 0¶
Get the algorithm parameters associated with a given function. Format is algorithm dependent.
Public Static Functions
-
static ParamInterface *set_AdParam(const std::string &ad_algorithm)¶
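The note on update() about avoiding two successive mutex locks can be illustrated with a toy parameter store. SumParam is hypothetical: its entire state is a running sum serialized as decimal text, standing in for the per-function statistics held by the real subclasses. The point of the sketch is only that merging the input and producing the serialized result happen under a single lock acquisition.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical parameter store; the state is one running sum, serialized as text.
class SumParam {
public:
  // Merge the serialized input and, if requested, return the updated state,
  // holding the lock only once for both steps.
  std::string update(const std::string &parameters, bool return_update = false) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_sum += std::stol(parameters);
    return return_update ? std::to_string(m_sum) : std::string();
  }

  // Serialize the current state.
  std::string serialize() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return std::to_string(m_sum);
  }

private:
  mutable std::mutex m_mutex;
  long m_sum = 0;
};
```

Calling update(x) followed by serialize() would give the same result in a single-threaded setting, but another thread could modify the state between the two calls; the combined form returns exactly the state produced by this update.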
CopodParam¶
-
namespace chimbuko
-
class CopodParam : public chimbuko::ParamInterface¶
- #include <copod_param.hpp>
Implementation of ParamInterface for COPOD based anomaly detection
Public Functions
-
CopodParam()¶
-
~CopodParam()¶
-
virtual void clear() override¶
Clear all statistics.
-
const int find(const unsigned long &func_id)¶
-
inline const std::unordered_map<unsigned long, Histogram> &get_hbosstats() const¶
Get the internal map between global function index and statistics.
-
inline virtual size_t size() const override¶
Get the number of functions for which statistics are being collected.
-
virtual std::string serialize() const override¶
Convert internal Histogram to string format for IO.
- Returns
Histogram in string format
-
virtual std::string update(const std::string &parameters, bool return_update = false) override¶
Update the internal Histogram with those included in the serialized input map.
- Parameters
parameters – The parameters in serialized format
return_update – Controls return format
- Returns
An empty string if return_update==False, otherwise the serialized updated parameters
-
virtual void assign(const std::string &parameters) override¶
Set the internal Histogram to match those included in the serialized input map. Overwrite performed only for those keys in input.
- Parameters
parameters – The serialized input map
-
void assign(const std::unordered_map<unsigned long, Histogram> &copodstats)¶
Set the internal Histogram to match those included in the input map. Overwrite performed only for those keys in input.
- Parameters
copodstats – The input map between global function index and Histogram
-
inline Histogram &operator[](unsigned long id)¶
Get an element of the internal map.
- Parameters
id – The global function index
-
void update(const std::unordered_map<unsigned long, Histogram> &copodstats)¶
Update the internal histogram with those included in the input map.
- Parameters
copodstats – [in] The map between global function index and histogram
-
inline void update(const CopodParam &other)¶
Update the internal Histogram with those included in another CopodParam instance.
- Parameters
other – [in] The other CopodParam instance
-
void update_and_return(std::unordered_map<unsigned long, Histogram> &copodstats)¶
Update the internal histogram with those included in the input map. Input map is then updated to reflect new state.
- Parameters
copodstats – [inout] The map between global function index and statistics
-
inline void update_and_return(CopodParam &other)¶
Update the internal histogram with those included in another CopodParam instance. Other CopodParam is then updated to reflect new state.
- Parameters
other – [inout] The other CopodParam instance
-
virtual nlohmann::json get_algorithm_params(const unsigned long func_id) const override¶
Get the algorithm parameters associated with a given function. Format is algorithm dependent.
Public Static Functions
-
static std::string serialize_cerealpb(const std::unordered_map<unsigned long, Histogram> &copodstats)¶
Convert a Histogram mapping into a Cereal portable binary representation.
-
static void deserialize_cerealpb(const std::string &parameters, std::unordered_map<unsigned long, Histogram> &copodstats)¶
Convert a Histogram Cereal portable binary representation string into a map.
- Parameters
parameters – [in] The parameter string
copodstats – [out] The map between global function index and histogram statistics
HbosParam¶
-
namespace chimbuko
-
class HbosParam : public chimbuko::ParamInterface¶
- #include <hbos_param.hpp>
Implementation of ParamInterface for HBOS based anomaly detection
Public Functions
-
HbosParam()¶
-
~HbosParam()¶
-
virtual void clear() override¶
Clear all statistics.
-
const int find(const unsigned long &func_id)¶
-
inline const std::unordered_map<unsigned long, Histogram> &get_hbosstats() const¶
Get the internal map between global function index and statistics.
-
inline virtual size_t size() const override¶
Get the number of functions for which statistics are being collected.
-
virtual std::string serialize() const override¶
Convert internal Histogram to string format for IO.
- Returns
Histogram in string format
-
virtual std::string update(const std::string &parameters, bool return_update = false) override¶
Update the internal Histogram with those included in the serialized input map.
- Parameters
parameters – The parameters in serialized format
return_update – Controls return format
- Returns
An empty string if return_update==False, otherwise the serialized updated parameters
-
virtual void assign(const std::string &parameters) override¶
Set the internal Histogram to match those included in the serialized input map. Overwrite performed only for those keys in input.
- Parameters
parameters – The serialized input map
-
void assign(const std::unordered_map<unsigned long, Histogram> &hbosstats)¶
Set the internal Histogram to match those included in the input map. Overwrite performed only for those keys in input.
- Parameters
hbosstats – The input map between global function index and Histogram
-
inline Histogram &operator[](unsigned long id)¶
Get an element of the internal map.
- Parameters
id – The global function index
-
void update(const std::unordered_map<unsigned long, Histogram> &hbosstats)¶
Update the internal histogram with those included in the input map.
- Parameters
hbosstats – [in] The map between global function index and histogram
-
inline void update(const HbosParam &other)¶
Update the internal Histogram with those included in another HbosParam instance.
- Parameters
other – [in] The other HbosParam instance
-
void update_and_return(std::unordered_map<unsigned long, Histogram> &hbosstats)¶
Update the internal histogram with those included in the input map. Input map is then updated to reflect new state.
- Parameters
hbosstats – [inout] The map between global function index and statistics
-
inline void update_and_return(HbosParam &other)¶
Update the internal histogram with those included in another HbosParam instance. Other HbosParam is then updated to reflect new state.
- Parameters
other – [inout] The other HbosParam instance
-
virtual nlohmann::json get_algorithm_params(const unsigned long func_id) const override¶
Get the algorithm parameters associated with a given function. Format is algorithm dependent.
Public Static Functions
-
static std::string serialize_cerealpb(const std::unordered_map<unsigned long, Histogram> &hbosstats)¶
Convert a Histogram mapping into a Cereal portable binary representation.
-
static void deserialize_cerealpb(const std::string &parameters, std::unordered_map<unsigned long, Histogram> &hbosstats)¶
Convert a Histogram Cereal portable binary representation string into a map.
- Parameters
parameters – [in] The parameter string
hbosstats – [out] The map between global function index and histogram statistics
-
class Histogram¶
- #include <hbos_param.hpp>
Histogram Implementation.
Public Functions
-
inline void clear()¶
-
void push(double x)¶
-
int create_histogram(const std::vector<double> &r_times)¶
Create new histogram locally for AD module’s batch data instances.
- Parameters
r_times – a vector<double> of function run times
- Returns
returns 0 if success, else -1
-
int merge_histograms(const Histogram &g, const std::vector<double> &runtimes)¶
merges a Histogram with function runtimes
- Parameters
g – Histogram to merge
runtimes – Function runtimes
- Returns
0 if successful, -1 if failed
-
Histogram &operator+=(const Histogram &h)¶
Combine two Histogram instances such that the resulting statistics are the union of the two Histograms.
-
inline void set_glob_threshold(const double &l)¶
set global threshold for anomaly filtering
-
inline void add2counts(const int &count)¶
-
inline void add2counts(const int &id, const int &count)¶
-
inline void add2binedges(const double &bin_edge)¶
-
inline const double &get_threshold() const¶
-
nlohmann::json get_json() const¶
Get the current statistics as a JSON object.
-
struct Data¶
- #include <hbos_param.hpp>
Data structure that stores Histogram data (bin counts, bin edges)
Public Functions
-
inline Data()¶
Initialize histogram data.
-
inline Data(const double &g_threshold, const std::vector<int> &h_counts, const std::vector<double> &h_bin_edges)¶
Initialize histogram data with existing histogram data.
- Parameters
g_threshold – Global Threshold
h_counts – a vector<int> of histogram bin counts
h_bin_edges – a vector<double> of histogram bin edges
-
inline void clear()¶
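Combining two histograms so that the result represents the union of their data, as operator+= is described as doing, can be sketched as below. The rebinning rule is an assumption chosen for brevity (equal-width output bins spanning both ranges, with each source bin's count assigned to the output bin that holds its midpoint); the actual Chimbuko merge may differ. Hist, bin_of and merge_hist are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical histogram: edges.size() == counts.size() + 1, edges increasing.
struct Hist {
  std::vector<double> edges;
  std::vector<int> counts;
};

// Index of the bin of h that contains x; x is clamped into the histogram range.
inline std::size_t bin_of(const Hist &h, double x) {
  assert(!h.counts.empty() && h.edges.size() == h.counts.size() + 1);
  auto it = std::upper_bound(h.edges.begin(), h.edges.end(), x);
  if (it == h.edges.begin()) return 0;
  std::size_t i = static_cast<std::size_t>(it - h.edges.begin()) - 1;
  return std::min(i, h.counts.size() - 1);
}

// Merge a and b into nbins equal-width bins covering both ranges.
inline Hist merge_hist(const Hist &a, const Hist &b, std::size_t nbins) {
  assert(nbins > 0);
  const double lo = std::min(a.edges.front(), b.edges.front());
  const double hi = std::max(a.edges.back(), b.edges.back());
  assert(hi > lo);
  Hist out;
  out.counts.assign(nbins, 0);
  for (std::size_t i = 0; i <= nbins; ++i)
    out.edges.push_back(lo + (hi - lo) * static_cast<double>(i) / static_cast<double>(nbins));
  // Assign each source bin's count to the output bin holding its midpoint.
  for (const Hist *src : {&a, &b})
    for (std::size_t i = 0; i < src->counts.size(); ++i)
      out.counts[bin_of(out, 0.5 * (src->edges[i] + src->edges[i + 1]))] += src->counts[i];
  return out;
}
```

The total count is preserved by construction; resolution is lost whenever an output bin is wider than the source bins that fall into it.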
SstdParam¶
-
namespace chimbuko
-
class SstdParam : public chimbuko::ParamInterface¶
- #include <sstd_param.hpp>
Implementation of ParamInterface for anomaly detection based on function time distribution (mean, std. dev., etc)
Public Functions
-
SstdParam()¶
-
~SstdParam()¶
-
virtual void clear() override¶
Clear all statistics.
-
inline virtual size_t size() const override¶
Get the number of functions for which statistics are being collected.
-
virtual std::string serialize() const override¶
Convert internal run statistics to string format for IO.
- Returns
Run statistics in string format
-
virtual std::string update(const std::string &parameters, bool return_update = false) override¶
Update the internal run statistics with those included in the serialized input map.
- Parameters
parameters – The parameters in serialized format
return_update – Indicates that the function should return a serialized copy of the updated parameters
- Returns
An empty string or a serialized copy of the updated parameters depending on return_update
-
virtual void assign(const std::string &parameters) override¶
Set the internal run statistics to match those included in the serialized input map. Overwrite performed only for those keys in input.
- Parameters
parameters – The serialized input map
-
void update(const std::unordered_map<unsigned long, RunStats> &runstats)¶
Update the internal run statistics with those included in the input map.
- Parameters
runstats – [in] The map between global function index and statistics
-
inline void update(const SstdParam &other)¶
Update the internal statistics with those included in another SstdParam instance.
- Parameters
other – [in] The other SstdParam instance
-
void update_and_return(std::unordered_map<unsigned long, RunStats> &runstats)¶
Update the internal run statistics with those included in the input map. Input map is then updated to reflect new state.
- Parameters
runstats – [inout] The map between global function index and statistics
-
inline void update_and_return(SstdParam &other)¶
Update the internal statistics with those included in another SstdParam instance. Other SstdParam is then updated to reflect new state.
- Parameters
other – [inout] The other SstdParam instance
-
void assign(const std::unordered_map<unsigned long, RunStats> &runstats)¶
Set the internal run statistics to match those included in the input map. Overwrite performed only for those keys in input.
- Parameters
runstats – The input map between global function index and statistics
-
inline RunStats &operator[](unsigned long id)¶
Get an element of the internal map.
- Parameters
id – The global function index
-
inline const std::unordered_map<unsigned long, RunStats> &get_runstats() const¶
Get the internal map between global function index and statistics.
-
virtual nlohmann::json get_algorithm_params(const unsigned long func_id) const override¶
Get the algorithm parameters associated with a given function.
Public Static Functions
-
static std::string serialize_json(const std::unordered_map<unsigned long, RunStats> &runstats)¶
Convert a run statistics mapping into a JSON string.
- Parameters
runstats – The map between global function index and statistics
- Returns
Run statistics in string format
-
static void deserialize_json(const std::string &parameters, std::unordered_map<unsigned long, RunStats> &runstats)¶
Convert a run statistics JSON string into a map.
- Parameters
parameters – [in] The parameter string
runstats – [out] The map between global function index and statistics
-
static std::string serialize_cerealpb(const std::unordered_map<unsigned long, RunStats> &runstats)¶
Convert a run statistics mapping into a Cereal portable binary representation.
- Parameters
runstats – The run stats mapping
- Returns
Run statistics in string format
-
static void deserialize_cerealpb(const std::string &parameters, std::unordered_map<unsigned long, RunStats> &runstats)¶
Convert a run statistics Cereal portable binary representation string into a map.
- Parameters
parameters – [in] The parameter string
runstats – [out] The map between global function index and statistics
Protected Functions
-
SstdParam()¶
-
class SstdParam : public chimbuko::ParamInterface¶
Parameter Server¶
The parameter server runs on the head node and aggregates function anomaly and counter statistics for visualization. Aggregated statistics for function executions are also maintained and synchronized back to the AD instances such that the anomaly detection algorithm uses the most complete statistics to identify anomalies.
AnomalyStat¶
Warning
doxygenfile: Cannot find file “AnomalyStat.hpp
global_anomaly_stats¶
Warning
doxygenfile: Cannot find file “global_anomaly_stats.hpp
global_counter_stats¶
Warning
doxygenfile: Cannot find file “global_counter_stats.hpp
PSglobalFunctionIndexMap¶
-
namespace chimbuko
-
class NetPayloadGlobalFunctionIndexMap : public chimbuko::NetPayloadBase¶
- #include <PSglobalFunctionIndexMap.hpp>
Net payload for communicating function index pserver->AD.
Public Functions
-
inline NetPayloadGlobalFunctionIndexMap(PSglobalFunctionIndexMap *idxmap)¶
-
inline virtual MessageKind kind() const override¶
The message kind to which the payload is to be bound.
-
inline virtual MessageType type() const override¶
The message type to which the payload is to be bound.
Private Members
-
PSglobalFunctionIndexMap *m_idxmap¶
-
class NetPayloadGlobalFunctionIndexMapBatched : public chimbuko::NetPayloadBase¶
- #include <PSglobalFunctionIndexMap.hpp>
Net payload for communicating function index pserver->AD in batches.
Public Functions
-
inline NetPayloadGlobalFunctionIndexMapBatched(PSglobalFunctionIndexMap *idxmap)¶
-
inline virtual MessageKind kind() const override¶
The message kind to which the payload is to be bound.
-
inline virtual MessageType type() const override¶
The message type to which the payload is to be bound.
Private Members
-
PSglobalFunctionIndexMap *m_idxmap¶
-
class PSglobalFunctionIndexMap¶
- #include <PSglobalFunctionIndexMap.hpp>
A class that maintains a global mapping between function name and an index, which is to be synchronized over the nodes.
Public Functions
-
inline PSglobalFunctionIndexMap()¶
< Next unassigned index
-
unsigned long lookup(unsigned long pid, const std::string &func_name)¶
Lookup a function by name and return the index. A new index will be assigned if the function has not been encountered before.
- Parameters
pid – The program index
func_name – The function name
-
bool contains(unsigned long pid, const std::string &func_name) const¶
Check if the map contains the specified function.
- Parameters
pid – The program index
func_name – The function name
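The difference between lookup (which assigns an index on first sight) and contains (which never assigns) can be sketched as follows. This is an illustration only: the class name, the nested-map layout and the single shared counter are assumptions, and the real PSglobalFunctionIndexMap may be organised differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch of a (program index, function name) -> global index map.
class FunctionIndexMapSketch {
public:
  // Return the index for (pid, func_name), assigning the next free index on first sight.
  unsigned long lookup(unsigned long pid, const std::string &func_name) {
    auto &per_program = m_map[pid];
    auto it = per_program.find(func_name);
    if (it != per_program.end()) return it->second;
    unsigned long idx = m_next_idx++;
    per_program.emplace(func_name, idx);
    return idx;
  }

  // True if (pid, func_name) already has an index; never assigns one.
  bool contains(unsigned long pid, const std::string &func_name) const {
    auto pit = m_map.find(pid);
    return pit != m_map.end() && pit->second.count(func_name) != 0;
  }

private:
  unsigned long m_next_idx = 0;  // next unassigned index (assumed shared by all programs)
  std::unordered_map<unsigned long, std::unordered_map<std::string, unsigned long>> m_map;
};

// Usage: repeated lookups are stable, and new functions receive new indices.
inline void function_index_map_example() {
  FunctionIndexMapSketch m;
  assert(!m.contains(0, "foo"));
  unsigned long a = m.lookup(0, "foo");
  assert(m.contains(0, "foo"));
  assert(m.lookup(0, "foo") == a);
  assert(m.lookup(0, "bar") != a);
}
```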
-
nlohmann::json serialize() const¶
Serialize the map to a JSON object.
-
void deserialize(const nlohmann::json &fmap)¶
Set the map to the contents of the JSON object.
PSProvenanceDBclient¶
PSstatSender¶
-
namespace chimbuko
-
class PSstatSender¶
- #include <PSstatSender.hpp>
A class that periodically sends aggregate statistics to the visualization module via curl using a background thread.
Public Functions
-
PSstatSender(size_t send_freq = 1000)¶
Constructor.
- Parameters
send_freq – The frequency (in milliseconds) at which sends are performed to the viz module
-
~PSstatSender()¶
-
inline void set_send_freq(const size_t freq)¶
Change the frequency (in milliseconds) at which sends are performed to the viz module. Must be set prior to calling run_stat_sender.
-
void run_stat_sender(const std::string &url, const std::string &stat_save_dir = "")¶
Start sending global anomaly stats to the visualization module (curl)
- Parameters
url – The URL of the visualization module
stat_save_dir – Optionally output the stats to disk in this directory alongside/instead of to the viz module
-
void stop_stat_sender(int wait_msec = 0)¶
Stop sending global anomaly stats to the visualization module (curl)
-
inline void add_payload(PSstatSenderPayloadBase *payload)¶
Add a payload. Takes ownership of pointer, which is freed.
-
inline bool bad() const¶
If an exception is caught in the thread loop, the thread will stop issuing sends and set this bool to true.
Private Members
-
size_t m_send_freq¶
Number of milliseconds between sends to viz
-
std::atomic_bool m_bad¶
If an exception is caught in the thread loop, the thread will stop issuing sends and set this bool to true
-
std::vector<PSstatSenderPayloadBase*> m_payloads¶
Vector of payload wrappers defining the sets of data sent to the parameter server
-
struct PSstatSenderPayloadBase¶
- #include <PSstatSender.hpp>
Base class for wrappers around objects/object pointers that return JSON objects that are sent to the parameter server.
The JSON objects are collected into a single object whose members are tagged according to the “tag” provided by the wrapper. Nothing will be sent if the resulting JSON object is empty.
Subclassed by chimbuko::PSstatSenderGlobalAnomalyMetricsPayload, chimbuko::PSstatSenderGlobalAnomalyStatsPayload, chimbuko::PSstatSenderGlobalCounterStatsPayload
Public Functions
-
virtual void add_json(nlohmann::json &into) const = 0¶
Add the JSON object payload to ‘into’ as a new member with an appropriate tag (user should ensure no duplicate tags!)
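A minimal sketch of the tagged-collection pattern described above is given here. To stay dependency-free it uses a std::map of strings as a stand-in for nlohmann::json; the type names, the collect function and the example tags are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for nlohmann::json, used only to keep this sketch free of dependencies.
using JsonSketch = std::map<std::string, std::string>;

// Hypothetical analogue of PSstatSenderPayloadBase.
struct PayloadSketch {
  virtual void add_json(JsonSketch &into) const = 0;
  virtual ~PayloadSketch() {}
};

// A payload that adds one member under a fixed tag, or nothing if it has no data.
struct TaggedPayloadSketch : PayloadSketch {
  std::string tag, data;
  TaggedPayloadSketch(std::string t, std::string d) : tag(std::move(t)), data(std::move(d)) {}
  void add_json(JsonSketch &into) const override {
    if (!data.empty()) into[tag] = data;
  }
};

// Collect every payload into one object; a caller would skip the send if it is empty.
inline JsonSketch collect(const std::vector<std::unique_ptr<PayloadSketch>> &payloads) {
  JsonSketch packet;
  for (const auto &p : payloads) p->add_json(packet);
  return packet;
}

// Usage: only the payload that has data contributes a member.
inline void collect_example() {
  std::vector<std::unique_ptr<PayloadSketch>> payloads;
  payloads.emplace_back(new TaggedPayloadSketch("anomaly_stats", "{}"));
  payloads.emplace_back(new TaggedPayloadSketch("counter_stats", ""));
  JsonSketch packet = collect(payloads);
  assert(packet.size() == 1);
  assert(packet.count("anomaly_stats") == 1);
}
```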
-
inline virtual bool do_fetch() const¶
Whether to request a callback to process the response (optional).
-
inline virtual void process_callback(const std::string &packet, const std::string &returned) const¶
If a callback is requested, this function is called once the response has been returned.
- Parameters
packet – The string packet returned by the previous call to get_json()
returned – The string returned in response
-
inline virtual ~PSstatSenderPayloadBase()¶
Network¶
The network is the communication pathway between the AD instances and the parameter server. The default implementation, ZMQNet, uses ZeroMQ; a deprecated interface via MPI is also provided and can be selected at compile time.
NetInterface¶
-
namespace chimbuko
-
-
class NetInterface¶
- #include <net.hpp>
Network interface class.
Subclassed by chimbuko::LocalNet, chimbuko::ZMQMENet, chimbuko::ZMQNet
Public Types
-
typedef std::unordered_map<MessageKind, std::unordered_map<MessageType, std::unique_ptr<NetPayloadBase>>> PayloadMapType¶
Map of message kind/type to payloads
-
typedef std::unordered_map<int, PayloadMapType> WorkerPayloadMapType¶
Map of worker index and message type to payloads
Public Functions
-
NetInterface()¶
Construct a new Net Interface object.
-
virtual ~NetInterface()¶
Destroy the Net Interface object.
-
virtual void init(int *argc = nullptr, char ***argv = nullptr, int nt = 1) = 0¶
(virtual) initialize network interface
- Parameters
argc – command line argc
argv – command line argv
nt – the number of threads for a thread pool
-
virtual void finalize() = 0¶
(virtual) finalize network
-
virtual void run() = 0¶
(virtual) Run network server
-
virtual void stop() = 0¶
(virtual) stop network server
-
virtual std::string name() const = 0¶
(virtual) name of network server
- Returns
std::string name of network server
-
void add_payload(NetPayloadBase *payload, int worker_idx = 0)¶
Add a payload to the receiver bound to particular message kind/type specified internally.
Assumes ownership of the NetPayloadBase object and deletes it in the destructor. The meaning of worker_idx depends on the implementation: for ZMQNet and MPINet always use 0; for ZMQMENet, worker_idx corresponds to the endpoint thread.
- Parameters
payload – The payload
worker_idx – The worker index to which the payload is bound (implementation defined, see above)
Public Static Functions
-
static void find_and_perform_action(int worker_id, Message &msg_reply, const Message &msg, const WorkerPayloadMapType &payloads)¶
Find the action associated with the given worker_id and message type and perform the action.
- Parameters
worker_id – The worker index
msg_reply – The reply message
msg – The input message
payloads – The map of worker/message type to payload
-
static void find_and_perform_action(Message &msg_reply, const Message &msg, const PayloadMapType &payloads)¶
Find the action associated with the given message type and perform the action.
- Parameters
msg_reply – The reply message
msg – The input message
payloads – The map of message kind/type to payload
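The kind/type dispatch performed by add_payload and find_and_perform_action can be sketched with stand-in types. Everything below is hypothetical (the enums, MessageSketch, and in particular the choice to throw when no payload is bound); it is meant only to show the two-level map lookup, not the actual behaviour of NetInterface.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins; the real MessageKind/MessageType enums and Message class differ.
enum class KindSketch { Parameters, AnomalyStats };
enum class TypeSketch { ReqAdd, ReqGet };
struct MessageSketch { KindSketch kind; TypeSketch type; std::string body; };

struct PayloadSketch {
  virtual KindSketch kind() const = 0;
  virtual TypeSketch type() const = 0;
  virtual void action(MessageSketch &response, const MessageSketch &message) = 0;
  virtual ~PayloadSketch() {}
};

using PayloadMapSketch =
    std::map<KindSketch, std::map<TypeSketch, std::unique_ptr<PayloadSketch>>>;

// Bind the payload to the kind/type it reports; the map takes ownership.
inline void add_payload(PayloadMapSketch &payloads, std::unique_ptr<PayloadSketch> p) {
  KindSketch k = p->kind();
  TypeSketch t = p->type();
  payloads[k][t] = std::move(p);
}

// Look up the payload bound to the message's kind/type and run its action.
// Throwing on a missing binding is an assumption made for this sketch.
inline void find_and_perform_action(MessageSketch &reply, const MessageSketch &msg,
                                    const PayloadMapSketch &payloads) {
  auto kit = payloads.find(msg.kind);
  if (kit == payloads.end()) throw std::runtime_error("no payload for message kind");
  auto tit = kit->second.find(msg.type);
  if (tit == kit->second.end()) throw std::runtime_error("no payload for message type");
  tit->second->action(reply, msg);
}

// Example payload: echoes the request back as the response.
struct EchoPayloadSketch : PayloadSketch {
  KindSketch kind() const override { return KindSketch::Parameters; }
  TypeSketch type() const override { return TypeSketch::ReqGet; }
  void action(MessageSketch &response, const MessageSketch &message) override {
    response = message;
  }
};
```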
Protected Functions
-
virtual void init_thread_pool(int nt) = 0¶
initialize thread pool
- Parameters
nt – the number of threads in the pool
Protected Attributes
-
int m_nt¶
The number of threads in the pool
-
WorkerPayloadMapType m_payloads¶
Map of worker index (implementation defined), message kind and message type to a payload
-
class NetPayloadBase¶
Subclassed by chimbuko::NetPayloadGetParams, chimbuko::NetPayloadGlobalFunctionIndexMap, chimbuko::NetPayloadGlobalFunctionIndexMapBatched, chimbuko::NetPayloadHandShake, chimbuko::NetPayloadRecvCombinedADdata, chimbuko::NetPayloadUpdateAnomalyMetrics, chimbuko::NetPayloadUpdateAnomalyStats, chimbuko::NetPayloadUpdateCounterStats, chimbuko::NetPayloadUpdateParams
Public Functions
-
virtual MessageKind kind() const = 0¶
The message kind to which the payload is to be bound.
-
virtual MessageType type() const = 0¶
The message type to which the payload is to be bound.
-
virtual void action(Message &response, const Message &message) = 0¶
Act on the message and formulate a response.
-
inline void check(const Message &msg) const¶
Helper function to ensure the message is of the correct kind/type.
-
inline virtual ~NetPayloadBase()¶
-
class NetPayloadHandShake : public chimbuko::NetPayloadBase¶
- #include <net.hpp>
Default handshake response; this is bound automatically to the network.
Public Functions
-
inline virtual MessageKind kind() const override¶
The message kind to which the payload is to be bound.
-
inline virtual MessageType type() const override¶
The message type to which the payload is to be bound.
-
namespace DefaultNetInterface¶
Functions
-
NetInterface &get()¶
Get the default network interface, for ease of use.
- Returns
NetInterface& default network
MPINet¶
ZMQNet¶
-
namespace chimbuko
-
class ZMQNet : public chimbuko::NetInterface¶
- #include <zmq_net.hpp>
A network interface using ZeroMQ.
Public Types
Public Functions
-
ZMQNet()¶
-
~ZMQNet()¶
-
virtual void init(int *argc, char ***argv, int nt) override¶
(virtual) initialize network interface
- Parameters
argc – command line argc
argv – command line argv
nt – the number of threads for a thread pool
-
virtual void finalize() override¶
Finalize network.
-
virtual void run() override¶
(virtual) Run network server
-
virtual void stop() override¶
Stop network server.
-
inline virtual std::string name() const override¶
Name of network server.
- Returns
std::string name of network server
-
inline void setMaxMsgPerPollCycle(const int max_msg)¶
Set the maximum number of messages that the router thread will route front->back and back->front per poll cycle.
-
inline void setIOthreads(const int nt)¶
Set the number of IO threads used by ZeroMQ (default 1). Must be called prior to init(…)
-
inline void setPort(const int port)¶
Set the port upon which the connection is made. Must be called prior to run(…). Default 5559.
-
inline void setAutoShutdown(const bool to)¶
Set the rule for automatic shutdown once all clients have disconnected (default true)
-
inline void setTimeOut(const long time_ms)¶
Set the timeout on polling for client requests or responses from worker threads (-1 = no timeout [default])
Protected Functions
-
virtual void init_thread_pool(int nt) override¶
initialize thread pool
- Parameters
nt – the number of threads in the pool
Private Functions
-
int recvAndSend(void *skFrom, void *skTo, int max_msg)¶
Route a message to/from worker thread pool.
- Parameters
skFrom – ZMQ origin socket
skTo – ZMQ destination socket
max_msg – The maximum number of messages this function will attempt to drain from the queue (including disconnect message)
- Returns
the number of messages routed
Private Members
-
void *m_context¶
ZeroMQ context pointer
-
long long m_n_requests¶
Accumulated number of RPC requests
-
std::vector<PerfStats> m_perf_thr¶
Performance monitoring for worker threads; will be combined with m_perf before write
-
int m_max_pollcyc_msg¶
Maximum number of front->back and back->front messages that will be routed per poll cycle. Too many and we risk starving a socket; too few and we might hit performance issues
-
int m_io_threads¶
The number of IO threads used by ZeroMQ (default 1)
-
int m_clients¶
Number of connected clients
-
bool m_client_has_connected¶
At least one client has connected previously
-
int m_port¶
The port upon which the net connects
-
bool m_autoshutdown¶
The network will shut down once all clients have disconnected
-
long m_poll_timeout¶
The timeout (in ms) of inactivity after which the network will shut down (default -1: infinite)
-
bool m_remote_stop_cmd¶
Registration of requests for server to stop issued by clients
ZMQMENet¶
-
namespace chimbuko
-
class ZMQMENet : public chimbuko::NetInterface¶
- #include <zmqme_net.hpp>
A multi-endpoint, multi-threaded interface using ZeroMQ.
Public Functions
-
ZMQMENet()¶
-
~ZMQMENet()¶
-
virtual void init(int *argc, char ***argv, int nt) override¶
(virtual) initialize network interface
- Parameters
argc – command line argc
argv – command line argv
nt – the number of endpoint threads
-
virtual void finalize() override¶
Finalize network; blocking wait for worker threads to finish.
-
virtual void run() override¶
(virtual) Run network server
-
virtual void stop() override¶
Stop network server.
Protected Functions
-
virtual void init_thread_pool(int nt) override¶
initialize thread pool
- Parameters
nt – the number of threads in the pool
Private Members
-
int m_base_port¶
Port of first endpoint
-
int m_nthread¶
Number of endpoint threads
-
std::vector<PerfStats> m_perf_thr¶
Performance monitoring for worker threads; will be combined with m_perf before write
-
std::vector<int> m_clients_thr¶
Tracker of number of connected clients, used to determine when a thread exits
-
bool m_finalized¶
Has previously been finalized
Message¶
-
namespace chimbuko
Enums
-
enum MessageType¶
Enum of the message “type” or action.
Values:
-
enumerator REQ_ADD¶
-
enumerator REQ_GET¶
-
enumerator REQ_CMD¶
-
enumerator REQ_QUIT¶
-
enumerator REQ_ECHO¶
-
enumerator REP_ADD¶
-
enumerator REP_GET¶
-
enumerator REP_CMD¶
-
enumerator REP_QUIT¶
-
enumerator REP_ECHO¶
-
class Message¶
- #include <message.hpp>
A class containing a message and header that can be serialized in JSON form for communication.
Public Functions
-
void set_info(int src, int dst, int type, int kind, int frame = 0, int size = 0)¶
Set the message information (header)
- Parameters
src – source rank
dst – destination rank
type – message type
kind – message kind
frame – frame index
size – message size
-
void set_msg(const std::string &msg, bool include_head = false)¶
Set the message contents.
If ‘include_head’ is true, the string ‘msg’ will be interpreted as a serialized message and unpacked into this object (use this to parse received messages). If ‘include_head’ is false, the message contents will be set to ‘msg’ and the size entry of the header will be set to the length of the string (use this to generate new messages to send).
-
void set_msg(int cmd)¶
Set the message contents to an integer; equivalent to set_msg(int_as_string, false)
-
inline int src() const¶
Get the origin rank.
-
inline int dst() const¶
Get the destination rank.
-
inline int type() const¶
Get the message type.
-
inline int kind() const¶
Get the message kind.
-
inline int size() const¶
Get the message size in bytes.
-
inline int frame() const¶
Get the message io frame (step)
-
inline void clear()¶
clear data buffer
-
class Header¶
Public Functions
-
inline Header()¶
header size in bytes
-
inline int &src()¶
source rank
- Returns
int& reference to the source rank
-
inline int src() const¶
-
inline int &dst()¶
destination rank
- Returns
int& reference to the destination rank
-
inline int dst() const¶
-
inline int &type()¶
message type
- Returns
int& reference to the message type
-
inline int type() const¶
-
inline int &kind()¶
message kind
- Returns
int& reference to the message kind
-
inline int kind() const¶
-
inline int &size()¶
message size
- Returns
int& reference to the message size
-
inline int size() const¶
-
inline int &frame()¶
message frame index
- Returns
int& reference to the message frame index
-
inline int frame() const¶
-
nlohmann::json get_json() const¶
-
void set_header(const nlohmann::json &j)¶
Private Members
-
int m_h[8]¶
header information
0: src rank
1: dst rank
2: message type
3: message kind
4: message size (except header) in bytes
5: frame index (or step index)
6: reserved
7: reserved
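The paired accessors on Header (a non-const overload returning a reference, a const overload returning a copy) over the eight-integer layout listed above can be sketched as follows. HeaderSketch and make_header are hypothetical names, and only the slot indices are taken from the list above.

```cpp
#include <cassert>

// Hypothetical sketch of the eight-int header layout.
class HeaderSketch {
public:
  // Non-const accessors return references so that fields can be assigned in place.
  int &src() { return m_h[0]; }
  int &dst() { return m_h[1]; }
  int &type() { return m_h[2]; }
  int &kind() { return m_h[3]; }
  int &size() { return m_h[4]; }
  int &frame() { return m_h[5]; }

  // Const accessors return copies.
  int src() const { return m_h[0]; }
  int size() const { return m_h[4]; }
  int frame() const { return m_h[5]; }

private:
  int m_h[8] = {0};  // slots 6 and 7 are reserved
};

// Fill a header in the same argument order as Message::set_info.
inline HeaderSketch make_header(int src, int dst, int type, int kind, int frame = 0, int size = 0) {
  HeaderSketch h;
  h.src() = src;
  h.dst() = dst;
  h.type() = type;
  h.kind() = kind;
  h.frame() = frame;
  h.size() = size;
  return h;
}

// Usage: write through the reference accessor, read through the const one.
inline void header_example() {
  HeaderSketch h = make_header(3, 0, 1, 2, 7, 128);
  const HeaderSketch &ch = h;
  assert(ch.src() == 3 && ch.frame() == 7 && ch.size() == 128);
  h.size() = 64;
  assert(ch.size() == 64);
}
```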
Utils¶
Utility functions and classes.
ADIOS2parseUtils¶
-
namespace chimbuko
Functions
-
std::ostream &operator<<(std::ostream &os, const mapPrint &mp)¶
ostream output of a map using mapPrint wrapper
-
template<typename T>
std::ostream &operator<<(std::ostream &os, const vecPrint<T> &mp)¶
ostream output of a vector using vecPrint wrapper
-
varBase *parseVariable(const std::string &name, const std::map<std::string, std::string> &varinfo, adios2::IO &io, adios2::Engine &eng)¶
A factory for generating varBase derived class instances that contain the data read from the input stream.
Returns a NULL pointer if the type is not supported. The name/varinfo data can be obtained using the adios2::IO::AvailableVariables method.
-
struct mapPrint¶
- #include <ADIOS2parseUtils.hpp>
Wrapper allowing ostream output of a string map object.
-
struct varBase¶
- #include <ADIOS2parseUtils.hpp>
Abstract interface for an object that reads, stores and outputs data or arrays of data from ADIOS2 streams.
Subclassed by chimbuko::varPOD< T >, chimbuko::varTensor< T >
-
template<typename T>
class varPOD : public chimbuko::varBase¶
- #include <ADIOS2parseUtils.hpp>
Capture POD (single-value) data.
-
template<typename T>
class varTensor : public chimbuko::varBase¶
- #include <ADIOS2parseUtils.hpp>
Capture multi-dimensional tensor data.
Public Functions
-
inline varTensor(const std::string &name, const std::vector<unsigned long> &shape, adios2::IO &io, adios2::Engine &eng)¶
-
inline virtual void get(adios2::IO &io, adios2::Engine &eng)¶
Read the variable from the ADIOS2 stream.
-
inline virtual void put(adios2::IO &io, adios2::Engine &eng)¶
Write the variable to the ADIOS2 stream.
-
inline T &operator()(const std::vector<unsigned long> &coord)¶
Get the value at given coordinate (non-const)
-
template<typename T>
struct vecPrint¶
- #include <ADIOS2parseUtils.hpp>
Wrapper allowing ostream output of a vector object.
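The wrapper idiom used by mapPrint and vecPrint (a thin struct that selects a dedicated operator<< without overloading it for the standard container itself) can be sketched as below. VecPrintSketch and the bracketed, comma-separated output format are assumptions; the real wrappers may format their output differently.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper in the spirit of vecPrint: it only holds a reference to the vector.
template <typename T>
struct VecPrintSketch {
  const std::vector<T> &vec;
  explicit VecPrintSketch(const std::vector<T> &v) : vec(v) {}
};

// Stream the wrapped vector as a bracketed, comma-separated list.
template <typename T>
std::ostream &operator<<(std::ostream &os, const VecPrintSketch<T> &vp) {
  os << "[";
  for (std::size_t i = 0; i < vp.vec.size(); ++i) {
    if (i != 0) os << ", ";
    os << vp.vec[i];
  }
  return os << "]";
}

// Usage: format a vector into a string through the wrapper.
inline std::string vec_to_string_example() {
  std::vector<int> v{1, 2, 3};
  std::ostringstream os;
  os << VecPrintSketch<int>(v);
  assert(os.str() == "[1, 2, 3]");
  return os.str();
}
```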
Anomalies¶
-
namespace chimbuko
-
class Anomalies¶
- #include <Anomalies.hpp>
A class that contains information about the anomalies captured by the AD. Also stored are a few examples of normal executions, allowing for comparison with outliers.
Public Functions
-
void insert(CallListIterator_t event, EventType type)¶
Insert a detected outlier/normal execution.
-
void insert(CallListIterator_t event, EventType type, double runtime, double hbos_score, double threshold)¶
Insert used in HBOS for test purposes.
-
void insert(CallListIterator_t event, EventType type, std::vector<double> thres_hilo_mean_std)¶
Insert used in SSTD for test purposes.
-
inline const std::unordered_map<unsigned long, std::vector<std::vector<double>>> &allHbosScores() const¶
-
const std::vector<CallListIterator_t> &funcEvents(const unsigned long func_id, EventType type) const¶
Get the outlier/normal events associated with a given function.
-
inline const std::vector<CallListIterator_t> &allEvents(EventType type) const¶
Get all outliers/normal events.
Private Members
-
std::vector<CallListIterator_t> m_all_outliers¶
Array of outliers
-
std::unordered_map<unsigned long, std::vector<CallListIterator_t>> m_func_outliers¶
Map of function index to associated outliers
-
std::vector<CallListIterator_t> m_all_normal_execs¶
Array of normal executions (the algorithm will capture a limited number of these for comparison with outliers)
-
std::unordered_map<unsigned long, std::vector<CallListIterator_t>> m_func_normal_execs¶
Map of function index to associated normal executions
barrier¶
commandLineParser¶
Defines
-
_GMP_GET_TYPE(T)¶
Macros for generating the structure list needed for addOptionalArgMultiArg.
-
_GMP_GET_MEMBER_TYPE(T, NAME)¶
-
_GMP_GET_MEMBER_PTR(T, NAME)¶
-
GET_MEMBER_PTR_CON(T, NAME)¶
-
_GMP_GE_0(T, NAME)¶
-
_GMP_GE_1(T, NAME)¶
-
_GMP_GE_2(T, NAME, ...)¶
-
_GMP_GE_3(T, NAME, ...)¶
-
_GMP_GE_4(T, NAME, ...)¶
-
_GMP_GE_5(T, NAME, ...)¶
-
_GMP_GET_MACRO(_0, _1, _2, _3, _4, _5, NAME, ...)¶
-
GET_MEMBER_PTR_CONS(T, ...)¶
-
addOptionalCommandLineArg(PARSER, NAME, HELP_STR)¶
Helper macro to add an optional command line arg to the parser PARSER with given name NAME and help string HELP_STR. Option enabled by “-NAME” on command line.
-
addOptionalCommandLineArgDefaultHelpString(PARSER, NAME)¶
Helper macro to add an optional command line arg to the parser PARSER with given name NAME and default help string “Provide the value for NAME”.
-
addOptionalCommandLineArgWithFlag(PARSER, NAME, FLAGNAME, HELP_STR)¶
Helper macro to add an optional command line arg to the parser PARSER with given name NAME and help string HELP_STR. Option enabled by “-NAME” on command line. A bool field FLAGNAME will be set to true if parsed.
-
addOptionalCommandLineArgMultiValue(PARSER, ARG_NAME, HELP_STR, ...)¶
Helper macro to add an optional command line arg to the parser PARSER with argument name -${ARG_NAME} which sets multiple variables.
Supports up to 5 variables
Example usage: addOptionalCommandLineArgMultiValue(parser_instance, set_2vals, a, b) called with -set_2vals 1 2 will set the structure members a and b to 1 and 2, respectively
-
addMandatoryCommandLineArg(PARSER, NAME, HELP_STR)¶
Helper macro to add a mandatory command line arg to the parser PARSER with given name NAME and help string HELP_STR.
-
addMandatoryCommandLineArgDefaultHelpString(PARSER, NAME)¶
Helper macro to add a mandatory command line arg to the parser PARSER with given name NAME and default help string “Provide the value for NAME”.
-
namespace chimbuko
-
template<typename ArgsStruct>
class commandLineParser¶
- #include <commandLineParser.hpp>
The main parser class for a generic struct ArgsStruct.
Public Types
-
typedef ArgsStruct StructType¶
Public Functions
-
inline void addOptionalArg(optionalCommandLineArgBase<ArgsStruct> *arg_parser)¶
Add an optional argument parser object. Assumes ownership of pointer.
-
template<typename T, T ArgsStruct::* P>
inline void addOptionalArg(const std::string &arg, const std::string &help_str)¶
Add an optional argument with the given type, member pointer (eg &ArgsStruct::a) with provided argument (eg “-a”) and help string.
-
template<typename T, T ArgsStruct::* P, bool ArgsStruct::* Flag>
inline void addOptionalArgWithFlag(const std::string &arg, const std::string &help_str)¶
Add an optional argument with the given type, member pointer (eg &ArgsStruct::a), a bool flag member pointer (eg &ArgsStruct::got_value), with provided argument (eg “-a”) and help string.
-
template<class ...MemberPtrContainers>
inline void addOptionalArgMultiValue(const std::string &arg, const std::string &help_str)¶
Add an optional argument that has multiple associated values. Template parameters should be a list of specializations of MemberPtrContainer, e.g MemberPtrContainer<ArgsStruct, A, &ArgsStruct::a>, MemberPtrContainer<ArgsStruct, B, &ArgsStruct::b>
-
template<typename T, T ArgsStruct::* P>
inline void addMandatoryArg(const std::string &help_str)¶
Add a mandatory argument with the given type, member pointer (eg &ArgsStruct::a) and help string.
-
inline size_t nMandatoryArgs() const¶
Get the number of mandatory arguments.
-
inline void parse(ArgsStruct &into, const int narg, const char **args)¶
Parse an array of strings of length ‘narg’ into the structure.
Parsing will commence with first entry of args
-
inline void parseCmdLineArgs(ArgsStruct &into, int argc, char **argv)¶
Parse the command line arguments into the structure.
Parsing will commence with second entry of argv
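The member-pointer template parameters used throughout this parser (T ArgsStruct::* P) let the destination field be chosen at compile time. A minimal sketch of that idea follows; ArgsSketch and parse_into are hypothetical, and the stream-based conversion is an assumption, not necessarily how commandLineParser converts strings.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical argument struct for the sketch.
struct ArgsSketch {
  int nthreads;
  double threshold;
};

// Parse 'val' into the member selected at compile time by the member pointer P.
// Returns false, leaving 'into' unchanged, if 'val' cannot be fully converted to T.
template <typename ArgsStruct, typename T, T ArgsStruct::*P>
bool parse_into(ArgsStruct &into, const std::string &val) {
  std::istringstream is(val);
  T tmp;
  if (!(is >> tmp)) return false;
  is >> std::ws;                  // skip trailing whitespace, if any
  if (!is.eof()) return false;    // leftover characters mean a partial parse
  into.*P = tmp;
  return true;
}

// Usage: a valid value is stored, an invalid one is rejected.
inline void parse_into_example() {
  ArgsSketch args{1, 0.0};
  bool ok = parse_into<ArgsSketch, int, &ArgsSketch::nthreads>(args, "4");
  assert(ok && args.nthreads == 4);
  ok = parse_into<ArgsSketch, double, &ArgsSketch::threshold>(args, "abc");
  assert(!ok && args.threshold == 0.0);
}
```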
Private Members
-
std::vector<std::unique_ptr<mandatoryCommandLineArgBase<ArgsStruct>>> m_man_args¶
Container for the individual mandatory arg parsers
-
std::vector<std::unique_ptr<optionalCommandLineArgBase<ArgsStruct>>> m_opt_args¶
Container for the individual optional arg parsers
-
template<typename ArgsStruct, typename T, T ArgsStruct::* P>
class mandatoryCommandLineArg : public chimbuko::mandatoryCommandLineArgBase<ArgsStruct>¶
- #include <commandLineParser.hpp>
A class that parses an argument of a given type into the struct.
Public Functions
-
inline mandatoryCommandLineArg(const std::string &help_str)¶
Create an instance with the provided help string.
-
inline virtual bool parse(ArgsStruct &into, const std::string &val) override¶
Parse the value into the struct. Returns false if val cannot be parsed.
-
template<typename ArgsStruct>
class mandatoryCommandLineArgBase¶
- #include <commandLineParser.hpp>
Base class for mandatory arg parsing structs.
Subclassed by chimbuko::mandatoryCommandLineArg< ArgsStruct, T, P >
Public Functions
-
virtual bool parse(ArgsStruct &into, const std::string &val) = 0¶
Parse the value into the struct. Returns false if val cannot be parsed.
-
virtual void help(std::ostream &os) const = 0¶
Print the help string for this argument to the ostream.
-
inline virtual ~mandatoryCommandLineArgBase()¶
-
template<typename S, typename T, T S::* P>
struct MemberPtrContainer¶
- #include <commandLineParser.hpp>
A class containing a member function pointer.
-
template<typename ArgsStruct, typename T, T ArgsStruct::* P>
class optionalCommandLineArg : public chimbuko::optionalCommandLineArgBase<ArgsStruct>¶
- #include <commandLineParser.hpp>
A class that parses an argument of a given type into the struct.
Public Functions
-
inline optionalCommandLineArg(const std::string &arg, const std::string &help_str)¶
Create an instance with the provided argument and help string.
-
inline virtual int parse(ArgsStruct &into, const std::string &arg, const char **vals, const int vals_size) override¶
If the first string matches the internal arg string (eg “-help”), a number of strings are consumed from the array ‘vals’ and that number returned. A value of -1 indicates the argument did not match.
- Parameters
into – The output structure
vals – An array of strings
vals_size – The length of the string array
-
template<typename ArgsStruct>
class optionalCommandLineArgBase¶
- #include <commandLineParser.hpp>
Base class for optional arg parsing structs.
Subclassed by chimbuko::optionalCommandLineArg< ArgsStruct, T, P >, chimbuko::optionalCommandLineArgMultiValue< ArgsStruct, MemberPtrContainers >, chimbuko::optionalCommandLineArgWithFlag< ArgsStruct, T, P, Flag >
Public Functions
-
virtual int parse(ArgsStruct &into, const std::string &arg, const char **vals, const int vals_size) = 0¶
If the first string matches the internal arg string (eg “-help”), a number of strings are consumed from the array ‘vals’ and that number returned. A value of -1 indicates the argument did not match.
- Parameters
into – The output structure
vals – An array of strings
vals_size – The length of the string array
-
virtual void help(std::ostream &os) const = 0¶
Print the help string for this argument to the ostream.
-
inline virtual ~optionalCommandLineArgBase()¶
-
template<typename ArgsStruct, class ...MemberPtrContainers>
class optionalCommandLineArgMultiValue : public chimbuko::optionalCommandLineArgBase<ArgsStruct>¶
- #include <commandLineParser.hpp>
A class that parses an argument of a given type into the struct with multiple values.
Public Functions
-
inline optionalCommandLineArgMultiValue(const std::string &arg, const std::string &help_str)¶
Create an instance with the provided argument and help string.
-
inline virtual int parse(ArgsStruct &into, const std::string &arg, const char **vals, const int vals_size) override¶
If the first string matches the internal arg string (eg “-help”), a number of strings are consumed from the array ‘vals’ and that number returned. A value of -1 indicates the argument did not match.
- Parameters
into – The output structure
vals – An array of strings
vals_size – The length of the string array
Public Static Attributes
-
static constexpr int NValues = std::tuple_size<std::tuple<MemberPtrContainers...>>::value¶
-
template<typename ArgsStruct, typename TheMemberPtrContainer, class ...RemainingMemberPtrContainers>
struct optionalCommandLineArgMultiValue_parse¶
- #include <commandLineParser.hpp>
Recursive template class for parsing multiple values.
Public Static Functions
-
static inline void parse(ArgsStruct &into, const char **vals)¶
-
template<typename ArgsStruct, typename TheMemberPtrContainer>
struct optionalCommandLineArgMultiValue_parse<ArgsStruct, TheMemberPtrContainer>¶
Public Static Functions
-
static inline void parse(ArgsStruct &into, const char **vals)¶
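The primary template plus single-container specialization shown here is the usual shape of a recursive variadic parse: the general case consumes one value and recurses, and the specialization terminates the recursion. The sketch below illustrates that pattern with hypothetical names (MemberPtrSketch, MultiValueParseSketch, PairArgsSketch) and a stream-based conversion that is assumed for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical analogue of MemberPtrContainer: carries a member pointer as a type.
template <typename S, typename T, T S::*P>
struct MemberPtrSketch {
  static void set(S &into, const char *val) {
    std::istringstream is(val);
    is >> (into.*P);
  }
};

// General case: consume one value for the first container, then recurse on the rest.
template <typename S, typename First, typename... Rest>
struct MultiValueParseSketch {
  static void parse(S &into, const char **vals) {
    First::set(into, vals[0]);
    MultiValueParseSketch<S, Rest...>::parse(into, vals + 1);
  }
};

// Base case: a single container remains, so the recursion stops here.
template <typename S, typename Last>
struct MultiValueParseSketch<S, Last> {
  static void parse(S &into, const char **vals) { Last::set(into, vals[0]); }
};

struct PairArgsSketch {
  int a;
  double b;
};

// Usage: two consecutive strings are parsed into members a and b.
inline void multi_value_example() {
  PairArgsSketch args{0, 0.0};
  const char *vals[] = {"1", "2.5"};
  MultiValueParseSketch<PairArgsSketch,
                        MemberPtrSketch<PairArgsSketch, int, &PairArgsSketch::a>,
                        MemberPtrSketch<PairArgsSketch, double, &PairArgsSketch::b>>::parse(args, vals);
  assert(args.a == 1);
  assert(args.b == 2.5);
}
```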
-
template<typename ArgsStruct, typename T, T ArgsStruct::* P, bool ArgsStruct::* Flag>
class optionalCommandLineArgWithFlag : public chimbuko::optionalCommandLineArgBase<ArgsStruct>¶
- #include <commandLineParser.hpp>
A class that parses an argument of a given type into the struct and sets a bool flag argument to true.
Public Functions
-
inline optionalCommandLineArgWithFlag(const std::string &arg, const std::string &help_str)¶
Create an instance with the provided argument and help string.
-
inline virtual int parse(ArgsStruct &into, const std::string &arg, const char **vals, const int vals_size) override¶
If the first string matches the internal arg string (eg “-help”), a number of strings are consumed from the array ‘vals’ and that number returned. A value of -1 indicates the argument did not match.
- Parameters
into – The output structure
vals – An array of strings
vals_size – The length of the string array
DispatchQueue¶
-
namespace chimbuko
-
class DispatchQueue¶
- #include <DispatchQueue.hpp>
A class for dispatching work items over a thread pool.
Public Functions
-
DispatchQueue(std::string name, size_t thread_cnt = 1)¶
Construct an instance of class, providing a name for the instance and the number of threads.
- Parameters
name – The name of the instance
thread_cnt – The number of threads (default 1)
-
~DispatchQueue()¶
-
void dispatch(const fp_t &op)¶
Enqueue a work item (lvalue reference)
- Parameters
op – An instance of std::function<void(void)>
-
void dispatch(fp_t &&op)¶
Enqueue a work item (rvalue reference)
- Parameters
op – An instance of std::function<void(void)>
-
size_t size()¶
Return the number of outstanding work items in the queue.
Private Functions
-
void thread_handler(void)¶
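As a rough illustration of how a dispatch queue of this shape might work, the sketch below implements a small stand-in with the same dispatch/size interface. It is not the Chimbuko source: DemoDispatchQueue and demoDispatchCount are hypothetical names, the name argument is omitted, and the choice to drain outstanding work in the destructor is an assumption of this sketch.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for illustration; not the Chimbuko DispatchQueue.
class DemoDispatchQueue {
public:
  using fp_t = std::function<void(void)>;

  explicit DemoDispatchQueue(std::size_t thread_cnt = 1) {
    for (std::size_t i = 0; i < thread_cnt; ++i)
      threads_.emplace_back(&DemoDispatchQueue::thread_handler, this);
  }

  // Assumption of this sketch: remaining work is drained before the threads are joined.
  ~DemoDispatchQueue() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      quit_ = true;
    }
    cv_.notify_all();
    for (auto &t : threads_) t.join();
  }

  // Enqueue a work item (lvalue reference): the item is copied.
  void dispatch(const fp_t &op) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(op);
    }
    cv_.notify_one();
  }

  // Enqueue a work item (rvalue reference): the item is moved.
  void dispatch(fp_t &&op) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(op));
    }
    cv_.notify_one();
  }

  // Number of work items still waiting in the queue.
  std::size_t size() {
    std::lock_guard<std::mutex> lock(mutex_);
    return queue_.size();
  }

private:
  // Each worker thread loops here, popping and running work items.
  void thread_handler() {
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      cv_.wait(lock, [this] { return quit_ || !queue_.empty(); });
      if (queue_.empty()) return;  // only reachable once quit_ is set
      fp_t op = std::move(queue_.front());
      queue_.pop();
      lock.unlock();
      op();  // run the work item without holding the lock
      lock.lock();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<fp_t> queue_;
  std::vector<std::thread> threads_;
  bool quit_ = false;
};

// Dispatches n increments of a shared counter and returns the final count.
inline int demoDispatchCount(int n, std::size_t thread_cnt) {
  std::atomic<int> counter{0};
  {
    DemoDispatchQueue q(thread_cnt);
    for (int i = 0; i < n; ++i) q.dispatch([&counter] { ++counter; });
  }  // the destructor returns only after every item has run
  return counter.load();
}
```

Running the work item outside the lock lets other threads enqueue or dequeue while a task executes, which is the main reason for the unlock/lock pair in the handler.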
error¶
-
namespace chimbuko
Functions
-
ErrorWriter &Error()¶
The global error writer instance.
-
void writeErrorTerminateHandler()¶
For fatal errors we delay writing the error to the output stream in case it is caught. This terminate handler ensures it is written.
After flushing the error, the handler calls terminateHandlerAbortAction.
-
struct ErrorWriter¶
- #include <error.hpp>
A class for writing out errors to an output stream.
Public Functions
-
ErrorWriter()¶
-
inline void setRank(const int rank)¶
Set the MPI rank. This will add the rank to the error output.
-
void recoverable(const std::string &msg, const std::string &func, const std::string &file, const unsigned long line)¶
Signal a recoverable error.
Private Functions
hash¶
-
namespace chimbuko
-
template<typename T, size_t N>
struct ArrayHasher¶ - #include <hash.hpp>
Hash function for std::array.
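A hasher of this kind is what allows std::array to be used as a key in unordered containers. The sketch below shows one possible shape, assuming a Boost-style hash-combine step; DemoArrayHasher, DemoKey and DemoMap are hypothetical names, and the actual ArrayHasher may combine element hashes differently.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical hasher for illustration; not the Chimbuko ArrayHasher.
template<typename T, std::size_t N>
struct DemoArrayHasher {
  std::size_t operator()(const std::array<T, N> &a) const {
    std::size_t h = 0;
    for (const T &e : a) {
      // Boost-style hash_combine step (an assumption of this sketch)
      h ^= std::hash<T>{}(e) + 0x9e3779b9 + (h << 6) + (h >> 2);
    }
    return h;
  }
};

// Example: key a map by a (process, rank, thread) triple.
using DemoKey = std::array<unsigned long, 3>;
using DemoMap = std::unordered_map<DemoKey, int, DemoArrayHasher<unsigned long, 3>>;
```

Equality of keys is handled by std::array's own operator==, so only the hash functor needs to be supplied.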
map¶
-
namespace chimbuko
Typedefs
Functions
-
template<typename T>
T *getElemPRT(const unsigned long pid, const unsigned long rid, const unsigned long tid, std::unordered_map<unsigned long, std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>> &map)¶ Get an element from the commonly occurring triple-depth map of process/rank/thread to element (non-const)
- Parameters
pid – The process index
rid – The rank index
tid – The thread index
map – The map
- Returns
A pointer to the element if it exists, nullptr otherwise
-
template<typename T>
T const *getElemPRT(const unsigned long pid, const unsigned long rid, const unsigned long tid, const std::unordered_map<unsigned long, std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>> &map)¶ Get an element from the commonly occurring triple-depth map of process/rank/thread to element (const)
- Parameters
pid – The process index
rid – The rank index
tid – The thread index
map – The map
- Returns
A pointer to the element if it exists, nullptr otherwise
-
template<typename T>
std::unordered_map<unsigned long, T> *getMapPR(const unsigned long pid, const unsigned long rid, std::unordered_map<unsigned long, std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>> &map)¶ Get the map between thread and element from the commonly occurring triple-depth map of process/rank/thread to element (non-const)
- Parameters
pid – The process index
rid – The rank index
map – The map
- Returns
A pointer to the map element if it exists, nullptr otherwise
-
template<typename T>
std::unordered_map<unsigned long, T> const *getMapPR(const unsigned long pid, const unsigned long rid, const std::unordered_map<unsigned long, std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>> &map)¶ Get the map between thread and element from the commonly occurring triple-depth map of process/rank/thread to element (const)
- Parameters
pid – The process index
rid – The rank index
map – The map
- Returns
A pointer to the map element if it exists, nullptr otherwise
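The point of these helpers is to look up an entry without inserting intermediate maps, which is what chained operator[] calls would do. A minimal sketch of such a non-inserting lookup follows; demoGetElemPRT and the DemoPRTMap alias are hypothetical names, and the body is an assumption about how the lookup could be written rather than the Chimbuko source.

```cpp
#include <unordered_map>

// Alias for the triple-depth process/rank/thread map (hypothetical name).
template<typename T>
using DemoPRTMap = std::unordered_map<unsigned long,
    std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>>;

// Returns a pointer to the element if it exists, nullptr otherwise.
// Uses find() at every level so that a failed lookup never modifies the map.
template<typename T>
T *demoGetElemPRT(const unsigned long pid, const unsigned long rid,
                  const unsigned long tid, DemoPRTMap<T> &map) {
  auto pit = map.find(pid);
  if (pit == map.end()) return nullptr;          // unknown process
  auto rit = pit->second.find(rid);
  if (rit == pit->second.end()) return nullptr;  // unknown rank
  auto tit = rit->second.find(tid);
  if (tit == rit->second.end()) return nullptr;  // unknown thread
  return &tit->second;
}
```

Returning a pointer rather than a reference lets the caller distinguish "absent" from "present" without exceptions.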
memutils¶
-
namespace chimbuko
mtQueue¶
-
template<typename T>
class mtQueue¶ - #include <mtQueue.hpp>
A thread-safe wrapper around a FIFO queue (std::queue)
Public Functions
-
inline mtQueue()¶
-
inline ~mtQueue()¶
-
inline bool tryPop(T &out)¶
Try to obtain a value from the front of the queue.
- Parameters
out – [out] The value
- Returns
True if the value is populated, false if the queue is invalid or the queue is empty
-
inline bool waitPop(T &out)¶
Wait until the queue either has an entry or is invalidated. Value taken from front of queue.
- Parameters
out – [out] The value
- Returns
True if queue is valid, false otherwise
-
inline bool empty() const¶
Return true if the queue is empty.
-
inline void clear()¶
Remove all entries from the queue.
-
inline void invalidate()¶
Mark the queue as invalid.
-
inline bool is_valid() const¶
Check whether the queue is still valid (i.e. has not been invalidated).
-
inline size_t size() const¶
The number of entries in the queue.
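The tryPop/waitPop/invalidate semantics described above can be sketched with a mutex and a condition variable. The class below is a hypothetical stand-in (DemoMtQueue), not the Chimbuko mtQueue; in particular the push member is an assumption, since no enqueue function appears in the listing above.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for illustration; not the Chimbuko mtQueue.
template<typename T>
class DemoMtQueue {
public:
  // Assumed enqueue operation: add a value and wake one waiting consumer.
  void push(T value) {
    {
      std::lock_guard<std::mutex> lock(m_);
      q_.push(std::move(value));
    }
    cv_.notify_one();
  }

  // Non-blocking: false if the queue is invalid or empty.
  bool tryPop(T &out) {
    std::lock_guard<std::mutex> lock(m_);
    if (!valid_ || q_.empty()) return false;
    out = std::move(q_.front());
    q_.pop();
    return true;
  }

  // Blocking: waits until an entry is available or the queue is invalidated.
  bool waitPop(T &out) {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [this] { return !valid_ || !q_.empty(); });
    if (!valid_) return false;
    out = std::move(q_.front());
    q_.pop();
    return true;
  }

  // Mark the queue invalid and release every thread blocked in waitPop.
  void invalidate() {
    {
      std::lock_guard<std::mutex> lock(m_);
      valid_ = false;
    }
    cv_.notify_all();
  }

  bool is_valid() const {
    std::lock_guard<std::mutex> lock(m_);
    return valid_;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(m_);
    return q_.size();
  }

private:
  mutable std::mutex m_;  // mutable so const accessors can lock
  std::condition_variable cv_;
  std::queue<T> q_;
  bool valid_ = true;
};
```

Invalidation gives worker threads a clean way to leave a blocking waitPop at shutdown, which is presumably why the thread pool documented later stores its tasks in an mtQueue.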
PerfStats¶
-
namespace chimbuko
-
class PerfPeriodic¶
- #include <PerfStats.hpp>
A class for storing and writing periodic data, e.g. memory usage or outstanding provDB requests. It stores and writes only if _PERF_METRIC is active; otherwise it does nothing.
Public Functions
-
PerfPeriodic()¶
Construct with empty path and filename (no output will be written unless these are set)
-
void setWriteLocation(const std::string &output_path, const std::string &filename)¶
Set the output path and file name.
-
void write()¶
Write the running statistics to the file. Only writes out if a path and filename have been provided. After writing, stored values are purged.
-
class PerfStats¶
- #include <PerfStats.hpp>
A class that maintains performance statistics of various aspects of the AD module. Its constituent functions only do anything if the _PERF_METRIC flag is enabled.
Public Functions
-
PerfStats()¶
Construct with empty path and filename (no output will be written unless these are set)
-
void setWriteLocation(const std::string &output_path, const std::string &filename)¶
Set the output path and file name.
-
void write() const¶
Write the running statistics to the file. Only writes out if a path and filename have been provided.
-
class PerfTimer¶
- #include <PerfStats.hpp>
A timer class that only measures time if _PERF_METRIC compile flag is set.
Public Functions
-
PerfTimer(bool start_now = true)¶
-
void start()¶
(Re)start the timer
-
void pause()¶
Pause the timer.
-
void unpause()¶
Unpause the timer.
This is the same as start but it does not zero the accumulated time from previous active periods.
-
double elapsed_us() const¶
Compute the elapsed time in microseconds since start/unpause plus accumulated time from previous active periods.
-
double elapsed_ms() const¶
Compute the elapsed time in milliseconds since start/unpause plus accumulated time from previous active periods.
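The difference between start and unpause (whether previously accumulated time is zeroed) is easiest to see in code. The following is a minimal sketch of a pausable stopwatch built on std::chrono::steady_clock; DemoTimer is a hypothetical name, the choice of clock is an assumption, and the _PERF_METRIC compile-time switch is deliberately left out.

```cpp
#include <chrono>

// Hypothetical pausable stopwatch for illustration; not the Chimbuko PerfTimer.
class DemoTimer {
  using clock = std::chrono::steady_clock;  // assumption: a monotonic clock
  using duration = clock::duration;

public:
  explicit DemoTimer(bool start_now = true) {
    if (start_now) start();
  }

  // (Re)start: zero the accumulated time and begin a new active period.
  void start() {
    accumulated_ = duration::zero();
    begin_ = clock::now();
    active_ = true;
  }

  // Pause: fold the current active period into the accumulated time.
  void pause() {
    if (active_) {
      accumulated_ += clock::now() - begin_;
      active_ = false;
    }
  }

  // Unpause: begin a new active period but keep the accumulated time.
  void unpause() {
    if (!active_) {
      begin_ = clock::now();
      active_ = true;
    }
  }

  // Accumulated time plus the current active period, in microseconds.
  double elapsed_us() const {
    duration total = accumulated_;
    if (active_) total += clock::now() - begin_;
    return std::chrono::duration<double, std::micro>(total).count();
  }

  // Same quantity in milliseconds.
  double elapsed_ms() const { return elapsed_us() / 1000.0; }

private:
  duration accumulated_{duration::zero()};
  clock::time_point begin_{};
  bool active_ = false;
};
```

While paused, repeated calls to elapsed_us return the same value, since only the accumulated time contributes.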
RunMetric¶
-
namespace chimbuko
-
class RunMetric¶
- #include <RunMetric.hpp>
A class containing a map of a string to its aggregated statistics, used for performance logging.
Public Functions
-
inline RunMetric()¶
-
inline ~RunMetric()¶
-
inline void add(std::string name, double val)¶
Add a value to the statistics tagged by the provided name.
A new entry in the map is created if the name has not been provided previously
RunStats¶
-
namespace chimbuko
Functions
-
class RunStats¶
- #include <RunStats.hpp>
Compute statistics in a single pass.
Computes the minimum, maximum, mean, variance, standard deviation, skewness, and kurtosis. Optionally, also computes accumulated values.
RunStats objects may also be added together and copied.
Based entirely on the C++ code by John D Cook at http://www.johndcook.com/skewness_kurtosis.html
Public Functions
-
RunStats(bool do_accumulate = false)¶
Constructor.
- Parameters
do_accumulate – If true the sum of the provided values will also be collected
-
~RunStats()¶
-
void clear()¶
Reset the statistics.
-
inline const State &get_state() const¶
Return the current set of internal variables (state) as an instance of State.
-
nlohmann::json get_json_state() const¶
Return the current set of internal variables (state) as a JSON object.
-
void set_json_state(const nlohmann::json &s)¶
Set the internal variables from a JSON object.
-
std::string get_strstate()¶
Get the current set of internal variables (state) as a JSON-formatted string.
-
void net_deserialize(const std::string &s)¶
Deserialize this class after communication over the network.
-
void push(double x)¶
Add a new value to be included in internal statistics.
-
double count() const¶
Get the number of values added to the statistics.
-
double minimum() const¶
-
double maximum() const¶
-
double accumulate() const¶
If m_do_accumulate is true, return the accumulated sum of all values added; otherwise return 0.
-
double mean() const¶
-
double variance(double ddof = 1.0) const¶
Return the variance of the data.
If ddof=1 (default), the variance includes Bessel’s correction and represents an estimate of the population variance. If ddof=0, the result is the variance of the sample itself.
-
double stddev(double ddof = 1.0) const¶
-
double skewness() const¶
-
double kurtosis() const¶
-
inline void set_do_accumulate(bool do_accumulate)¶
Set whether the sum of all values is to be maintained.
-
inline bool get_do_accumulate() const¶
Determine whether the sum of all values is to be maintained.
-
nlohmann::json get_json() const¶
Get the current statistics as a JSON object.
-
RunStatsValues get_stat_values() const¶
Get the current statistics as a RunStatsValues object.
Public Static Functions
-
static inline RunStats from_state(const State &s)¶
Create an instance of this class from a State instance.
Private Members
-
bool m_do_accumulate¶
True if the sum of the input values is maintained.
Friends
- friend RunStats operator+ (const RunStats &a, const RunStats &b)
Combine two RunStats instances such that the resulting statistics are the union of the two.
- friend bool operator== (const RunStats &a, const RunStats &b)
Comparison operator.
- friend bool operator!= (const RunStats &a, const RunStats &b)
Negative comparison operator.
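To make the single-pass and merge behaviour concrete, the sketch below tracks only the count, mean, second central moment, minimum and maximum, using the standard Welford update and the pairwise combination formula. It is a reduced, hypothetical stand-in (DemoRunStats) rather than the Chimbuko class: skewness, kurtosis and the optional accumulated sum are omitted, and returning 0 when the count does not exceed ddof is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Hypothetical reduced version for illustration; not the Chimbuko RunStats.
struct DemoRunStats {
  double n = 0.0;      // number of values pushed
  double mean_ = 0.0;  // running mean
  double m2 = 0.0;     // running sum of squared deviations from the mean
  double min_ = std::numeric_limits<double>::infinity();
  double max_ = -std::numeric_limits<double>::infinity();

  // Welford's single-pass update.
  void push(double x) {
    n += 1.0;
    const double delta = x - mean_;
    mean_ += delta / n;
    m2 += delta * (x - mean_);
    min_ = std::min(min_, x);
    max_ = std::max(max_, x);
  }

  // ddof=1 applies Bessel's correction; ddof=0 gives the variance of the sample.
  // Assumption: return 0 when there are too few values to divide safely.
  double variance(double ddof = 1.0) const {
    return n > ddof ? m2 / (n - ddof) : 0.0;
  }

  double stddev(double ddof = 1.0) const { return std::sqrt(variance(ddof)); }
};

// Combine two instances so the result describes the union of both data sets.
inline DemoRunStats operator+(const DemoRunStats &a, const DemoRunStats &b) {
  DemoRunStats c;
  c.n = a.n + b.n;
  if (c.n == 0.0) return c;  // both inputs empty
  const double delta = b.mean_ - a.mean_;
  c.mean_ = a.mean_ + delta * b.n / c.n;
  c.m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / c.n;
  c.min_ = std::min(a.min_, b.min_);
  c.max_ = std::max(a.max_, b.max_);
  return c;
}
```

Because the merge needs only these few numbers from each side, statistics gathered independently on different ranks can be combined without revisiting the raw data.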
-
struct RunStatsValues¶
- #include <RunStats.hpp>
A serializable object containing the stats values.
Public Functions
-
inline bool operator==(const RunStatsValues &r) const¶
Comparison operator.
-
struct State¶
- #include <RunStats.hpp>
Internal state of RunStats object.
Note the variables in https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance are M2,M3,M4. The mappings are provided in the comments below.
Public Functions
-
inline State()¶
-
inline State(double _count, double _eta, double _rho, double _tau, double _phi, double _min, double _max, double _acc)¶
-
inline void clear()¶
Reset state.
-
nlohmann::json get_json() const¶
Get this object as a JSON instance.
-
void set_json(const nlohmann::json &to)¶
Set this object to the values stored in the JSON instance.
string¶
-
namespace chimbuko
threadPool¶
-
class threadPool¶
- #include <threadPool.hpp>
A class maintaining a queue of tasks that are performed by a pool of threads.
Public Functions
-
inline threadPool()¶
-
inline explicit threadPool(const std::uint32_t nt)¶
Instantiate a pool of nt threads.
- Parameters
nt – The number of threads to instantiate
-
threadPool(const threadPool &rhs) = delete¶
The class is not copyable but can be moved.
-
threadPool &operator=(const threadPool &rhs) = delete¶
-
inline ~threadPool()¶
-
template<typename Func, typename ...Args>
inline auto sumit(Func &&func, Args&&... args)¶ Submit a function object and its arguments to the queue.
-
inline size_t pool_size() const¶
Return the number of threads in the pool.
-
inline size_t queue_size() const¶
Return the number of tasks in the queue.
Private Members
-
mtQueue<std::unique_ptr<IThreadTask>> m_workQueue¶
-
class IThreadTask¶
Base class of thread tasks.
Public Functions
-
IThreadTask() = default¶
-
virtual ~IThreadTask() = default¶
-
IThreadTask(const IThreadTask &rhs) = delete¶
-
IThreadTask &operator=(const IThreadTask &rhs) = delete¶
-
IThreadTask(IThreadTask &&other) = default¶
-
IThreadTask &operator=(IThreadTask &&other) = default¶
-
virtual void execute() = 0¶
Perform the task (executed by thread)
-
template<typename T>
class TaskFuture¶ - #include <threadPool.hpp>
A wrapper class for an std::future instance representing the result of an asynchronous operation.
Public Functions
-
inline ~TaskFuture()¶
The destructor waits for the asynchronous operation to complete before exiting.
-
TaskFuture(const TaskFuture &rhs) = delete¶
-
TaskFuture &operator=(const TaskFuture &rhs) = delete¶
-
TaskFuture(TaskFuture &&other) = default¶
-
TaskFuture &operator=(TaskFuture &&other) = default¶
-
inline auto get()¶
Wait until the asynchronous operation has completed and return the value.
-
template<typename Func>
class ThreadTask : public threadPool::IThreadTask¶ A thread task executing a functional object.
Public Functions
-
~ThreadTask() override = default¶
-
ThreadTask(const ThreadTask &rhs) = delete¶
-
ThreadTask &operator=(const ThreadTask &rhs) = delete¶
-
ThreadTask(ThreadTask &&other) = default¶
-
ThreadTask &operator=(ThreadTask &&other) = default¶
-
inline void execute() override¶
time¶
-
namespace chimbuko
Functions
-
class Timer¶
- #include <time.hpp>
A timer / stopwatch class.
Public Functions
-
Timer(bool start_now = true)¶
-
void start()¶
(Re)start the timer
-
void pause()¶
Pause the timer.
-
void unpause()¶
Unpause the timer.
This is the same as start but it does not zero the accumulated time from previous active periods.
-
double elapsed_us() const¶
Compute the elapsed time in microseconds since start/unpause plus accumulated time from previous active periods.
-
double elapsed_ms() const¶
Compute the elapsed time in milliseconds since start/unpause plus accumulated time from previous active periods.
Private Types
verbose¶
Defines
-
verboseStream¶
Macro for log output that appears when verbose logging is enabled.
Example usage: verboseStream << "Hello world!" << std::endl;
-
progressStream¶
Macro for log output that includes the date and time, intended for reporting progress on service components for which there is only one rank.
Example usage: progressStream << "Hello world!" << std::endl;
-
headProgressStream(rank)¶
Macro for log output that appears when either the rank is equal to the head rank or verbose logging is enabled.
- Parameters
rank – The rank of the current process
Example usage: headProgressStream(rank) << "Hello world!" << std::endl;
-
namespace chimbuko