API
AD
The “Anomaly Detection” (AD) component of Chimbuko is deployed alongside each instance of the target application (e.g. one per MPI task) and analyzes the raw trace output provided by TAU. Using globally aggregated statistics, it decides locally whether a particular function execution is anomalous and forwards the anomaly information to the higher-level components of the tool.
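This section does not specify the rule used to make the local decision. As a purely illustrative sketch, assume a simple threshold rule in which a call is flagged when its runtime lies more than a chosen number of standard deviations from the globally aggregated mean. The names GlobalStats and isAnomalous and the threshold rule itself are hypothetical and are not part of the Chimbuko API.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the globally aggregated runtime statistics
// of a single function (not a Chimbuko type).
struct GlobalStats {
  double mean;    // mean runtime in microseconds
  double stddev;  // standard deviation of the runtime
};

// Assumed rule: a call is anomalous if its runtime is more than n_sigma
// standard deviations away from the global mean.
inline bool isAnomalous(double runtime_us, const GlobalStats &g, double n_sigma) {
  return std::fabs(runtime_us - g.mean) > n_sigma * g.stddev;
}
```

With a mean of 100 and a standard deviation of 10, a threshold of 6 sigma flags a runtime of 200 but not one of 130.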
chimbuko
The main interface for the AD module.
ADAnomalyProvenance
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
-
class ADAnomalyProvenance
- #include <ADAnomalyProvenance.hpp>
A class that gathers provenance data associated with a set of detected anomalies.
Public Functions
-
ADAnomalyProvenance(const ADEventIDmap &event_man)
-
void getProvenanceEntries(std::vector<nlohmann::json> &anom_event_entries, std::vector<nlohmann::json> &normal_event_entries, const ADExecDataInterface &iface, const int step, const unsigned long first_event_ts, const unsigned long last_event_ts)
Extract the JSON provDB entries for the anomalies and normal events from a data interface after running the AD.
- Parameters:
anom_event_entries – The provDB entries for anomalous events
normal_event_entries – The provDB entries for normal events
iface – The interface object containing references to labeled events on this io step
step – The io step
first_event_ts – The timestamp of the first event in the io step
last_event_ts – The timestamp of the last event in the io step
-
nlohmann::json getEventProvenance(const ExecData_t &call, const int step, const unsigned long first_event_ts, const unsigned long last_event_ts) const
Extract the provenance information for a specific call.
Note that the minimum anomaly time for recorded data does not apply to this call.
- Parameters:
call – The call
step – The io step
first_event_ts – The timestamp of the first event in the io step
last_event_ts – The timestamp of the last event in the io step
-
inline void setWindowSize(const int sz)
Set the number of events on either side of the anomaly to record in the window view (default 5)
-
inline void setMinimumAnomalyTime(const unsigned long to)
Set the minimum exclusive runtime (in microseconds) for recorded anomalies (default 0)
Anomalies with exclusive runtime less than this will not have their data recorded
-
inline void linkAlgorithmParams(ParamInterface const *algo_params)
Link the algorithm parameters to have algorithm information included in the output.
-
inline void linkMonitoring(ADMonitoring const *monitoring)
Link the monitoring data to have node state information included in the output.
-
inline void linkMetadata(ADMetadataParser const *metadata)
Link the metadata owner to have metadata (GPU information, hostname, etc) included in the output.
This is also necessary to have Chimbuko track back the host-side parent of a GPU kernel
-
inline void linkEventManager(ADEventIDmap const *event_man)
Link the event manager. The event manager pointer is set in the constructor but this function allows it to be changed.
Private Functions
-
void getStackInformation(nlohmann::json &into, const ExecData_t &call) const
Get the call stack.
-
void getWindowCounters(nlohmann::json &into, const ExecData_t &call) const
Get counters in execution window.
-
void getGPUeventInfo(nlohmann::json &into, const ExecData_t &call) const
Determine if it is a GPU event, and if so get the context.
Requires metadata manager to be linked
-
void getExecutionWindow(nlohmann::json &into, const ExecData_t &call) const
Get the execution window.
-
void getNodeState(nlohmann::json &into) const
Get the node state data.
Requires the monitoring class to be linked
-
void getHostname(nlohmann::json &into) const
Get the hostname metadata.
Requires metadata manager to be linked
-
void getAlgorithmParams(nlohmann::json &into, const ExecData_t &call) const
Get the algorithm parameters.
Requires the algorithm parameters manager to be linked
Private Members
-
ADEventIDmap const *m_event_man
Contains the map between event index and event
-
ADMetadataParser const *m_metadata
Contains metadata information
-
int m_window_size
The number of events either side of the anomaly to capture in the window
-
ADMonitoring const *m_monitoring
Node state information from TAU’s monitoring plugin
-
ParamInterface const *m_algo_params
The algorithm parameters
-
unsigned long m_min_anom_time
Anomalies with exclusive runtime less than this will not have their data recorded
-
ADNormalEventProvenance m_normalevents
Maintain information on a selection of normal events
ADCounter
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
Typedefs
-
typedef std::list<CounterData_t> CounterDataList_t
-
typedef CounterDataList_t::iterator CounterDataListIterator_t
-
typedef std::map<unsigned long, std::list<CounterDataListIterator_t>> CounterTimeStamps_t
-
typedef std::map<unsigned long, std::list<CounterDataListIterator_t>> CountersByIndex_t
-
typedef mapPRT<CounterDataList_t> CounterDataListMap_p_t
map of process, rank, thread -> CounterDataList_t
-
typedef mapPRT<CounterTimeStamps_t> CounterTimeStampMap_p_t
map of process, rank, thread -> CounterTimeStamps_t
-
class ADCounter
- #include <ADCounter.hpp>
A class that stores counter events.
Public Functions
-
inline ADCounter()
-
inline ~ADCounter()
-
inline void linkCounterMap(const std::unordered_map<int, std::string> *m)
pass in the pointer to the mapping of counter index to counter description
- Parameters:
m – hash map to counter descriptions
-
void addCounter(const Event_t &event)
Insert a new counter.
- Parameters:
event – Event_t wrapper around the counter data
-
void addCounter(const CounterData_t &cdata)
Insert a new counter in CounterData_t form.
This function does not require the counter index->name map to be linked, but if it is a consistency check will be performed
- Parameters:
cdata – CounterData_t instance
-
inline CounterDataListMap_p_t const *getCounters() const
Return all counters collected in the timestep.
-
CounterDataListMap_p_t *flushCounters()
Return all counters and clear internal state.
- Returns:
A pointer to a list of counters (should be deleted externally)
-
std::list<CounterDataListIterator_t> getCountersInWindow(const unsigned long pid, const unsigned long rid, const unsigned long tid, const unsigned long t_start, const unsigned long t_end) const
Get counters for a particular process/rank/thread that were recorded in the window [t_start, t_end] (inclusive).
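An inclusive time-window query over an ordered timestamp map can be sketched as follows. This is a minimal illustration under stated assumptions, not the ADCounter implementation: TimeStampMap and countersInWindow are hypothetical names, and plain integers stand in for the counter list iterators.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in: timestamp -> counter values recorded at that time.
using TimeStampMap = std::map<unsigned long, std::vector<int>>;

// Collect every counter with t_start <= timestamp <= t_end.
inline std::vector<int> countersInWindow(const TimeStampMap &m,
                                         unsigned long t_start,
                                         unsigned long t_end) {
  std::vector<int> out;
  if (t_start > t_end) return out;      // empty window
  auto first = m.lower_bound(t_start);  // first key >= t_start
  auto last = m.upper_bound(t_end);     // first key > t_end
  for (auto it = first; it != last; ++it)
    out.insert(out.end(), it->second.begin(), it->second.end());
  return out;
}
```

Because the map is ordered by timestamp, both bounds are found in logarithmic time and only the entries inside the window are visited.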
-
inline const CountersByIndex_t &getCountersByIndex() const
Get the map of counters by index.
Private Members
-
CounterDataListMap_p_t *m_counters
process/rank/thread -> List of counters
-
CounterTimeStampMap_p_t m_timestampCounterMap
process/rank/thread -> Ordered map of timestamp to counter list iterator (flushed with flushCounters)
-
CountersByIndex_t m_countersByIdx
Counter index -> all instances of this counter in the timestep (flushed with flushCounters)
ADDefine
Details.
Defines
-
IDX_P
index of program id
-
IDX_R
index of rank id
-
IDX_T
index of thread id
-
IDX_E
index of event (entry/exit/send/recv) id
-
FUNC_EVENT_DIM
dimension of a function (timer) event vector
-
FUNC_IDX_F
index of function (timer) id
-
FUNC_IDX_TS
index of timestamp in function (timer) event
-
COMM_EVENT_DIM
dimension of a communication event vector
-
COMM_IDX_TAG
index of communication tag
-
COMM_IDX_PARTNER
index of communication partner
-
COMM_IDX_BYTES
index of communication size (in bytes)
-
COMM_IDX_TS
index of communication timestamp
-
COUNTER_EVENT_DIM
dimension of a counter event vector
-
COUNTER_IDX_ID
index of counter idx
-
COUNTER_IDX_VALUE
index of counter value
-
COUNTER_IDX_TS
index of counter timestamp
-
MAX_RUNTIME
maximum execution time of a function (or a timer)
-
IO_VERSION
IO version number (deprecated)
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
Enums
-
enum class ParserError
Error kinds of the ADParser class
Values:
-
enumerator OK
OK (no error)
-
enumerator NoFuncData
Failed to fetch function data
-
enumerator NoCommData
Failed to fetch communication data
-
enumerator NoCountData
Failed to fetch counter data
-
enum class EventError
Error kinds of the ADEvent class.
Values:
-
enumerator OK
OK (no error)
-
enumerator UnknownEvent
unknown event error
-
enumerator UnknownFunc
unknown function (timer) error
-
enumerator CallStackViolation
call stack violation error
-
enumerator EmptyCallStack
empty call stack error (i.e. exit before entry)
-
enum class IOMode
I/O mode of the ADio class.
Values:
-
enumerator Off
no I/O
-
enumerator Offline
offline mode, dump to files
-
enumerator Online
online mode, stream data
-
enumerator Both
both, dump to files and stream it
-
enum class IOOpenMode
I/O open mode of the ADio class.
Values:
-
enumerator Read
Read
-
enumerator Write
Write
ADEvent
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
Typedefs
-
typedef std::stack<CommData_t> CommStack_t
a stack of CommData_t
-
typedef mapPRT<CommStack_t> CommStackMap_p_t
map of process, rank, thread -> CommStack_t
-
typedef std::stack<CounterData_t> CounterStack_t
a stack of CounterData_t
-
typedef mapPRT<CounterStack_t> CounterStackMap_p_t
map of process, rank, thread -> CounterStack_t
-
typedef std::list<ExecData_t> CallList_t
list of function calls (ExecData_t) in entry time order
-
typedef CallList_t::iterator CallListIterator_t
iterator of CallList_t
-
typedef mapPRT<CallList_t> CallListMap_p_t
map of process, rank, thread -> CallList_t
-
typedef std::stack<CallListIterator_t> CallStack_t
function call stack
-
typedef mapPRT<CallStack_t> CallStackMap_p_t
map of process, rank, thread -> CallStack_t
-
typedef std::unordered_map<unsigned long, std::vector<CallListIterator_t>> ExecDataMap_t
hash map of a collection of ExecData_t per function
key is function id and value is a vector of CallListIterator_t (i.e. ExecData_t)
-
struct EventInfo
- #include <ADEvent.hpp>
A type that stores some information about an event whose data may have been deleted.
Public Functions
-
inline EventInfo()
-
inline EventInfo(const ExecData_t &e, int entry_or_exit)
Create from an ExecData_t.
- Parameters:
entry_or_exit – 0:entry 1:exit
-
class ADEventIDmap
- #include <ADEvent.hpp>
An abstract interface for obtaining events given an event index.
Subclassed by chimbuko::modules::performance_analysis::ADEvent
Public Functions
-
virtual CallListIterator_t getCallData(const eventID &event_id) const = 0
Get an iterator to an ExecData_t instance with given event index string.
throws a runtime error if the call is not present in the call-list
-
virtual std::pair<CallListIterator_t, CallListIterator_t> getCallWindowStartEnd(const eventID &event_id, const int win_size) const = 0
Get a pair of iterators marking the start and one-past-the-end of a window of (up to) win_size events on either side of the given event, restricted to events occurring on the same thread.
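The window semantics described above can be sketched on a plain std::list. This is an illustrative sketch only: CallList, CallIt and windowStartEnd are hypothetical names, integers stand in for ExecData_t, and the caller is assumed to pass a valid (dereferenceable) centre iterator.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <utility>

using CallList = std::list<int>;  // stand-in for a per-thread call list
using CallIt = CallList::const_iterator;

// Return [start, one-past-the-end) covering up to win_size elements on
// either side of 'centre', clipped at the ends of the list.
inline std::pair<CallIt, CallIt> windowStartEnd(const CallList &calls,
                                                CallIt centre, int win_size) {
  CallIt start = centre;
  for (int i = 0; i < win_size && start != calls.cbegin(); ++i) --start;
  CallIt end = std::next(centre);  // one past the centre element
  for (int i = 0; i < win_size && end != calls.cend(); ++i) ++end;
  return {start, end};
}
```

The "up to" in the description corresponds to the clipping: near the beginning or end of the list the window simply contains fewer events.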
-
inline virtual ~ADEventIDmap()
-
class ADEvent : public chimbuko::modules::performance_analysis::ADEventIDmap
- #include <ADEvent.hpp>
Event manager whose role is to correlate function entry and exit events and associate other counters with the function call.
When a function call with ENTRY signature is inserted, the event is placed on the call stack for that thread. Events associated with MPI comms and counters are also placed on their respective stacks. When a function call with EXIT signature on the same thread is inserted, a complete call is generated and placed in the call list, and all comm and counter events on their stacks are associated with that call.
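The entry/exit correlation described above can be sketched for a single thread as follows. This is a simplified illustration, not the ADEvent implementation: MiniEventManager, Kind, Status and Call are hypothetical names (Status only mirrors some EventError enumerators), comm and counter events are omitted, and completed calls are stored in exit order rather than entry order.

```cpp
#include <cassert>
#include <stack>
#include <vector>

enum class Kind { Entry, Exit };
enum class Status { OK, EmptyCallStack, CallStackViolation };

struct Call {
  int func_id;
  unsigned long entry_ts;
  unsigned long exit_ts;
};

// Single-thread sketch: ENTRY pushes onto the call stack, EXIT pops the
// matching call and records it as complete.
class MiniEventManager {
public:
  Status add(Kind kind, int func_id, unsigned long ts) {
    if (kind == Kind::Entry) {
      m_stack.push(Call{func_id, ts, 0});
      return Status::OK;
    }
    if (m_stack.empty()) return Status::EmptyCallStack;  // exit before entry
    if (m_stack.top().func_id != func_id) return Status::CallStackViolation;
    Call done = m_stack.top();
    m_stack.pop();
    done.exit_ts = ts;
    m_completed.push_back(done);
    return Status::OK;
  }
  const std::vector<Call> &completed() const { return m_completed; }

private:
  std::stack<Call> m_stack;       // calls that have entered but not exited
  std::vector<Call> m_completed;  // calls with both entry and exit seen
};
```

A per-thread version would keep one such stack for each process/rank/thread triple, which is the role the mapPRT typedefs play in the real class.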
Public Functions
-
ADEvent(bool verbose = false)
Construct a new ADEvent object.
- Parameters:
verbose – true to print out detailed information (useful for debug)
-
inline void linkEventType(const std::unordered_map<int, std::string> *m)
Store a pointer to an externally defined event type object
- Parameters:
m – event type object (hash map)
-
inline void linkFuncMap(const std::unordered_map<int, std::string> *m)
Store a pointer to an externally defined function map object
- Parameters:
m – function map object
-
inline void linkCounterMap(const std::unordered_map<int, std::string> *m)
Store a pointer to an externally defined counter map object
- Parameters:
m – counter map object
-
inline void linkGPUthreadMap(const std::unordered_map<unsigned long, GPUvirtualThreadInfo> *m)
Optional: give the event manager knowledge of which threads are GPU threads, improves error checking.
-
inline const std::unordered_map<int, std::string> *getFuncMap() const
Get the Func Map object.
- Returns:
const std::unordered_map<int, std::string>* pointer to function map object
-
inline const std::unordered_map<int, std::string> *getEventType() const
Get the Event Type object.
- Returns:
const std::unordered_map<int, std::string>* pointer to event type object
-
inline const std::unordered_map<int, std::string> *getCounterMap() const
Get the Counter name object.
- Returns:
const std::unordered_map<int, std::string>* pointer to counter name object
-
inline const ExecDataMap_t *getExecDataMap() const
Get the Exec Data Map object ( map of function id -> vector of iterators to ExecData objects )
- Returns:
const ExecDataMap_t* pointer to ExecDataMap_t object
-
inline const CallListMap_p_t *getCallListMap() const
Get the Call List Map object ( map of pid/rid/tid -> list of ExecData objects )
- Returns:
const CallListMap_p_t* pointer to CallListMap_p_t object
-
inline CallListMap_p_t &getCallListMap()
Get the Call List Map object.
- Returns:
CallListMap_p_t& reference to CallListMap_p_t object
-
virtual CallListIterator_t getCallData(const eventID &event_id) const override
Get an iterator to an ExecData_t instance with given event index string.
throws a runtime error if the call is not present in the call-list
-
virtual std::pair<CallListIterator_t, CallListIterator_t> getCallWindowStartEnd(const eventID &event_id, const int win_size) const override
Get a pair of iterators marking the start and one-past-the-end of a window of (up to) win_size events on either side of the given event, restricted to events occurring on the same thread.
-
std::vector<CallListIterator_t> getCallStack(const eventID &event_id) const
Get the call stack for a specific function.
- Parameters:
event_id – The index of the event
- Returns:
the call stack starting from the provided event (first entry) back to the root event (last entry)
-
void clear()
Clear all data members.
-
EventError addEvent(const Event_t &event)
add an event
- Parameters:
event – function or communication event
- Returns:
EventError event error code
-
EventError addFunc(const Event_t &event)
add a function event
- Parameters:
event – function event
- Returns:
EventError event error code
-
EventError addComm(const Event_t &event)
add a communication event
- Parameters:
event – communication event
- Returns:
EventError event error code
-
EventError addCounter(const Event_t &event)
add a counter event
- Parameters:
event – counter event
- Returns:
EventError event error code
-
CallListIterator_t addCall(const ExecData_t &exec)
Add a complete function call, primarily for testing.
- Parameters:
exec – Instance of ExecData_t
- Returns:
Iterator to inserted call
-
CallListMap_p_t *trimCallList(int n_keep_thread = 0)
Trim all completed function calls (i.e. those for which both the ENTRY and EXIT events have been observed)
- Parameters:
n_keep_thread – The number of events per thread to maintain [if they exist] (allows window view to extend into previous io step)
- Returns:
CallListMap_p_t* trimmed function calls
-
size_t getCallListSize() const
Get the total number of function events in the call list over all pid/rid/tid.
-
void purgeCallList(int n_keep_thread = 0, purgeReport *report = nullptr)
Purge all completed function calls (i.e. those for which both the ENTRY and EXIT events have been observed)
Functionality is the same as trimCallList, except that it does not return the trimmed function calls
- Parameters:
n_keep_thread – The number of events per thread to maintain [if they exist] (allows window view to extend into previous io step)
report – If non-null, information on the number of events purged/maintained will be recorded
-
void show_status(bool verbose = false) const
show current call stack tree status
- Parameters:
verbose – true to see all details
-
inline const std::unordered_map<unsigned long, CallListIterator_t> &getUnmatchCorrelationIDevents() const
Get the map of correlation ID to event for those events that have yet to be partnered.
Private Functions
-
void checkAndMatchCorrelationID(CallListIterator_t it)
Check if the event has a correlation ID counter, if so try to match it to an outstanding unmatched event with a correlation ID.
-
void stackProtectGC(CallListIterator_t it)
Flag the call and all of its parental line such that they are protected from deletion by the garbage collection.
-
void stackUnProtectGC(CallListIterator_t it)
Flag the call and all of its parental line such that they are not protected from deletion by the garbage collection, stopping if a call with an unmatched correlation ID is encountered.
Private Members
-
const std::unordered_map<int, std::string> *m_funcMap
pointer to map of function index to function name
-
const std::unordered_map<int, std::string> *m_eventType
pointer to map of event index to event type string
-
const std::unordered_map<int, std::string> *m_counterMap
pointer to map of counter index to counter name string
-
int m_eidx_func_entry
If previously seen, the eid corresponding to the function entry event (-1 otherwise)
-
int m_eidx_func_exit
If previously seen, the eid corresponding to the function exit event (-1 otherwise)
-
int m_eidx_comm_send
If previously seen, the eid corresponding to the comm send event (-1 otherwise)
-
int m_eidx_comm_recv
If previously seen, the eid corresponding to the comm recv event (-1 otherwise)
-
int m_cidx_corr_id
If previously seen, the counter index corresponding to the Correlation ID counter (-1 otherwise)
-
const std::unordered_map<unsigned long, GPUvirtualThreadInfo> *m_gpu_thread_Map
Optional: give the event manager knowledge of which threads are GPU threads, improves error checking.
-
CommStackMap_p_t m_commStack
communication event stack. Once a function call has exited, all comms events are associated with that call and the stack is cleared
-
CounterStackMap_p_t m_counterStack
map of process,rank,thread to counter events. Once a function call has exited, all counter events are associated with that call and the stack is cleaned.
-
CallStackMap_p_t m_callStack
map of process,rank,thread to the current function call stack. As functions exit they are popped from the stack
-
CallListMap_p_t m_callList
map of process,rank,thread to a list of ExecData_t objects which contain entry/exit timestamps for function calls
In practice the call list is purged of completed events each IO step through calls to trimCallList unless those elements are marked as non-deletable
-
ExecDataMap_t m_execDataMap
map of function index to an array of complete calls to this function during this IO step
In practice, labeled events are cleared every IO step by calls to purgeCallList
-
std::unordered_map<eventID, CallListIterator_t> m_callIDMap
map of call event index string to the event
Completed calls are removed from this list every IO step by calls to trimCallList
-
std::unordered_map<unsigned long, CallListIterator_t> m_unmatchedCorrelationID
Events with unmatched correlation IDs.
Events that correspond to GPU kernel launches and executions are given correlation IDs as counters that allow us to match the CPU thread that launched them to the GPU kernel event
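The partnering of events through correlation IDs can be sketched with a map of outstanding IDs. This is a minimal sketch under stated assumptions, not the ADEvent logic: CorrelationMatcher is a hypothetical name, integers stand in for call list iterators, and each correlation ID is assumed to be shared by exactly two events.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

class CorrelationMatcher {
public:
  // Register an event carrying a correlation ID.
  // Returns true if this event completed a pair, false if it is still unmatched.
  bool add(unsigned long corr_id, int event) {
    auto it = m_unmatched.find(corr_id);
    if (it == m_unmatched.end()) {
      m_unmatched.emplace(corr_id, event);  // first of the pair: wait for partner
      return false;
    }
    m_pairs.emplace_back(it->second, event);  // (earlier event, later event)
    m_unmatched.erase(it);
    return true;
  }
  std::size_t unmatched() const { return m_unmatched.size(); }
  const std::vector<std::pair<int, int>> &pairs() const { return m_pairs; }

private:
  std::unordered_map<unsigned long, int> m_unmatched;  // correlation ID -> waiting event
  std::vector<std::pair<int, int>> m_pairs;            // matched partners
};
```

Events left in the unmatched map are the ones that would need protecting from garbage collection until their partner arrives.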
-
std::unordered_set<std::string> m_ignoreCorrelationID
List of function names for which correlation IDs are ignored.
-
std::unordered_set<eventID> m_stackLockedUnlabeled
Set of events that have been stack locked because they weren’t able to be labeled.
-
bool m_verbose
verbose
-
struct purgeReport
- #include <ADEvent.hpp>
Structure for recording information about purged events.
Public Members
-
size_t n_purged
Number of function calls purged
-
size_t n_kept_protected
Number of calls maintained because they have been protected
-
size_t n_kept_incomplete
Number of calls maintained because they have not yet completed
-
size_t n_kept_window
Number of calls maintained because they may be needed for provenance window capture on next io step
-
size_t n_kept_unlabeled
Number of calls maintained because they have not yet been labeled
ADglobalFunctionIndexMap
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
-
class ADglobalFunctionIndexMap
- #include <ADglobalFunctionIndexMap.hpp>
A class that maintains a mapping of a local function index to a global function index that is specified by the parameter server.
If the parameter server is not connected it will simply return the local index
Public Functions
-
inline ADglobalFunctionIndexMap(unsigned long pid, ADNetClient *net_client = nullptr)
Class constructor.
If a pointer to the net client is not provided, the local index will not be synchronized between nodes
- Parameters:
pid – The program index
net_client – A pointer to the ADNetClient
-
inline bool connectedToPS() const
Check if the pserver is connected.
-
inline void linkNetClient(ADNetClient *net_client)
Link the net client.
-
unsigned long lookup(const unsigned long local_idx, const std::string &func_name)
Lookup the global index corresponding to the input local index.
Function names must be unique
-
std::vector<unsigned long> lookup(const std::vector<unsigned long> &local_idx, const std::vector<std::string> &func_name)
Lookup the global indices corresponding to the input local indices as a batch.
Function names must be unique
-
unsigned long lookup(const unsigned long local_idx) const
Lookup the global index corresponding to the input local index (const version; throws if not already present)
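The lookup behaviour described for this class (return the local index when no parameter server is connected, otherwise obtain and cache a global index) can be sketched as below. This is an illustration under stated assumptions: IndexMapSketch and GlobalIndexQuery are hypothetical names, and a std::function stands in for the request that the real class would send through the ADNetClient.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a request to the parameter server:
// function name -> global index.
using GlobalIndexQuery = std::function<unsigned long(const std::string &)>;

class IndexMapSketch {
public:
  explicit IndexMapSketch(GlobalIndexQuery query = nullptr)
      : m_query(std::move(query)) {}

  bool connected() const { return static_cast<bool>(m_query); }

  // Non-const lookup: query the server on first use and cache the result.
  unsigned long lookup(unsigned long local_idx, const std::string &func_name) {
    if (!connected()) return local_idx;  // no server: use the local index
    auto it = m_idxmap.find(local_idx);
    if (it != m_idxmap.end()) return it->second;
    unsigned long global_idx = m_query(func_name);
    m_idxmap[local_idx] = global_idx;
    return global_idx;
  }

  // Const lookup: never contacts the server, throws if the index is unknown.
  unsigned long lookup(unsigned long local_idx) const {
    if (!connected()) return local_idx;
    auto it = m_idxmap.find(local_idx);
    if (it == m_idxmap.end()) throw std::runtime_error("index not present");
    return it->second;
  }

private:
  GlobalIndexQuery m_query;
  std::unordered_map<unsigned long, unsigned long> m_idxmap;  // local -> global
};
```

Caching means each function name is sent to the server at most once per client, which is why function names need to be unique.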
-
inline ADNetClient *getNetClient()
Return a pointer to the net client.
Private Members
-
ADNetClient *m_net_client
-
std::unordered_map<unsigned long, unsigned long> m_idxmap
Map of local function index to global function index
-
unsigned long m_pid
Program index
ADio
-
namespace chimbuko
Enums
-
class ADio
- #include <ADio.hpp>
A class that manages communication of JSON-formatted data to disk.
Public Functions
-
ADio(unsigned long program_idx, int rank)
Constructor.
- Parameters:
program_idx – The program index
rank – MPI rank
-
~ADio()
-
inline void setProgramIdx(unsigned long pid)
Set the program index.
-
inline unsigned long getProgramIdx() const
Get the program index.
-
inline void setRank(int rank)
Set the MPI rank of the current process.
-
inline int getRank() const
Get the MPI rank of the current process.
-
void setOutputPath(std::string path)
For disk output, provide the write path.
A zero length string will disable disk IO
-
void setDispatcher(std::string name = "ioDispatcher", size_t thread_cnt = 1)
If a DispatchQueue instance has not previously been created, create an instance with the parameters provided.
-
inline size_t getNumIOJobs() const
Get the number of threads performing the IO.
-
IOError writeJSON(const std::vector<nlohmann::json> &data, long long step, const std::string &file_stub)
Write an array of JSON objects.
- Parameters:
file_stub – File will be ${file_stub}.${step}.json
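The file naming pattern given for file_stub can be sketched as a small helper. Only the ${file_stub}.${step}.json pattern comes from the description above; the helper name jsonFileName is hypothetical, and the directory in which the file is placed is not covered here.

```cpp
#include <cassert>
#include <string>

// Build "<file_stub>.<step>.json" following the documented pattern.
inline std::string jsonFileName(const std::string &file_stub, long long step) {
  return file_stub + "." + std::to_string(step) + ".json";
}
```

For example, a stub of "anomalies" (an arbitrary illustrative value) at step 12 gives anomalies.12.json.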
-
inline void setDestructorThreadWaitTime(const int secs)
Set the amount of time between completion of thread dispatcher tasks and destruction of the dispatcher in the class destructor.
- Parameters:
secs – The time in seconds
Private Members
-
DispatchQueue *m_dispatcher
Instance of multi-threaded writer
-
unsigned long m_program_idx
Program index
-
int m_rank
The MPI rank of the current process
-
int destructor_thread_waittime
Choose thread wait time in seconds after threadhandler has completed (default 10s)
ADLocalCounterStatistics
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
-
class ADLocalCounterStatistics
- #include <ADLocalCounterStatistics.hpp>
A class that gathers local counter statistics and communicates them to the parameter server.
Public Functions
-
inline ADLocalCounterStatistics(const unsigned long program_idx, const int step, const std::unordered_set<std::string> *which_counters, PerfStats *perf = nullptr)
Constructor.
- Parameters:
program_idx – The program index
step – The io step
which_counters – Pointer to a set of counters that will be collected (not all might appear in any given run). Use nullptr to collect all.
perf – Attach a PerfStats object into which performance metrics are accumulated
-
inline ADLocalCounterStatistics()
-
void gatherStatistics(const CountersByIndex_t &cntrs_by_idx)
Add counters to internal statistics.
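Accumulating per-counter statistics with an optional counter filter can be sketched as follows. This is not the Chimbuko implementation: RunningStats is a hypothetical stand-in for RunStats that uses Welford's online update, and CounterStatsSketch is a hypothetical name. The only behaviour taken from the description is that a null filter pointer means all counters are collected.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a running-statistics accumulator.
struct RunningStats {
  unsigned long count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the mean

  void push(double x) {  // Welford's online update
    ++count;
    double delta = x - mean;
    mean += delta / count;
    m2 += delta * (x - mean);
  }
  double variance() const { return count > 1 ? m2 / (count - 1) : 0.0; }
};

class CounterStatsSketch {
public:
  // which == nullptr means "collect every counter".
  explicit CounterStatsSketch(const std::unordered_set<std::string> *which = nullptr)
      : m_which(which) {}

  void gather(const std::string &name, double value) {
    if (m_which && m_which->count(name) == 0) return;  // filtered out
    m_stats[name].push(value);
  }
  const std::unordered_map<std::string, RunningStats> &stats() const { return m_stats; }

private:
  const std::unordered_set<std::string> *m_which;
  std::unordered_map<std::string, RunningStats> m_stats;  // counter name -> stats
};
```

An online update keeps only a few numbers per counter, so the state that has to be serialized and sent each io step stays small regardless of how many counter values were seen.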
-
std::pair<size_t, size_t> updateGlobalStatistics(ADThreadNetClient &net_client) const
update (send) counter statistics gathered during this io step to the connected parameter server
The message string is the output of net_serialize()
- Parameters:
net_client – The network client object
- Returns:
std::pair<size_t, size_t> [sent, recv] message size
-
inline void linkPerf(PerfStats *perf)
Attach a PerfStats object into which performance metrics are accumulated.
-
inline const std::unordered_map<std::string, RunStats> &getStats() const
Get the map of counter name to statistics.
-
nlohmann::json get_json() const
Get object in the JSON format.
-
void deserialize_cerealpb(const std::string &strstate)
Deserialize from Cereal portable binary format
-
void net_deserialize(const std::string &s)
Unserialize this class after communication over the network.
-
void setStats(const std::string &counter, const RunStats &to)
Set the statistics for a particular counter (must be in the list of counters being collected). Primarily used for testing.
-
inline unsigned long getProgramIndex() const
Get the program index.
-
inline void setProgramIndex(unsigned long to)
Set the program index.
-
inline int getIOstep() const
Get the IO step.
-
inline void setIOstep(int to)
Set the IO step.
-
inline bool operator==(const ADLocalCounterStatistics &r) const
Comparison operator.
-
inline bool operator!=(const ADLocalCounterStatistics &r) const
Inequality operator.
Protected Attributes
-
unsigned long m_program_idx
Program idx
-
int m_step
io step
Protected Static Functions
-
static std::pair<size_t, size_t> updateGlobalStatistics(ADThreadNetClient &net_client, const std::string &l_stats, int step)
update (send) counter statistics gathered during this io step to the connected parameter server
- Parameters:
net_client – The network client object
l_stats – local statistics
step – step (or frame) number
- Returns:
std::pair<size_t, size_t> [sent, recv] message size
ADLocalFuncStatistics
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
-
class ADLocalFuncStatistics
- #include <ADLocalFuncStatistics.hpp>
A class that gathers local function statistics and communicates them to the parameter server.
Public Functions
-
inline ADLocalFuncStatistics()
-
ADLocalFuncStatistics(const unsigned long program_idx, const unsigned long rank, const int step, PerfStats *perf = nullptr)
-
void gatherStatistics(const ExecDataMap_t *exec_data)
Add function executions to internal statistics.
-
void gatherAnomalies(const ADExecDataInterface &iface)
Add anomalies to internal statistics.
-
std::pair<size_t, size_t> updateGlobalStatistics(ADThreadNetClient &net_client) const
update (send) function statistics (#anomalies, incl/excl run times) gathered during this io step to the connected parameter server
The message communicated is the string dump of the output of get_json_state()
- Parameters:
net_client – The network client object
- Returns:
std::pair<size_t, size_t> [sent, recv] message size
-
nlohmann::json get_json() const
Output the contents of this object in JSON format.
-
inline void linkPerf(PerfStats *perf)
Attach a PerfStats object into which performance metrics are accumulated.
-
inline const AnomalyData &getAnomalyData() const
Access the AnomalyData instance.
-
inline void setAnomalyData(const AnomalyData &to)
Set the AnomalyData member.
-
inline const std::unordered_map<unsigned long, FuncStats> &getFuncStats() const
Access the function profile statistics.
-
inline void setFuncStats(const std::unordered_map<unsigned long, FuncStats> &to)
Set the function profile statistics.
-
void deserialize_cerealpb(const std::string &strstate)
Deserialize from Cereal portable binary format
-
void net_deserialize(const std::string &s)
Unserialize this class after communication over the network.
-
inline bool operator==(const ADLocalFuncStatistics &r) const
Comparison operator.
-
inline bool operator!=(const ADLocalFuncStatistics &r) const
Inequality operator.
Protected Attributes
-
AnomalyData m_anom_data
AnomalyData instance holding information about the anomalies
Protected Static Functions
-
static std::pair<size_t, size_t> updateGlobalStatistics(ADThreadNetClient &net_client, const std::string &l_stats, int step)
update (send) function statistics (#anomalies, incl/excl run times) gathered during this io step to the connected parameter server
- Parameters:
net_client – The network client object
l_stats – local statistics
step – step (or frame) number
- Returns:
std::pair<size_t, size_t> [sent, recv] message size
ADMetadataParser
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
-
struct GPUvirtualThreadInfo
- #include <ADMetadataParser.hpp>
Structure containing the CUDA device/context/stream associated with a given virtual thread index.
-
class ADMetadataParser
- #include <ADMetadataParser.hpp>
A class that parses and maintains useful metadata.
Public Functions
-
inline ADMetadataParser()
-
void addData(const std::vector<MetaData_t> &new_metadata)
Add new metadata collected during this timeframe.
-
inline const std::unordered_map<unsigned long, GPUvirtualThreadInfo> &getGPUthreadMap() const
-
inline bool isGPUthread(const unsigned long thr) const
-
const GPUvirtualThreadInfo &getGPUthreadInfo(const unsigned long thread) const
Return the thread info struct for this thread. Throws an error if an invalid thread.
Private Functions
-
void parseMetadata(const MetaData_t &m)
Parse an individual metadata entry.
ADNetClient
-
namespace chimbuko
-
class ADNetClient
- #include <ADNetClient.hpp>
A wrapper class to facilitate communications between the AD and the parameter server.
Subclassed by chimbuko::ADLocalNetClient, chimbuko::ADThreadNetClient, chimbuko::ADZMQNetClient
Public Functions
-
ADNetClient()
-
virtual ~ADNetClient()
-
inline bool use_ps() const
check if the parameter server is in use
- Returns:
true if the parameter server is in use
- Returns:
false if the parameter server is not in use
-
virtual void connect_ps(int rank, int srank = 0, std::string sname = "MPINET") = 0
connect to the parameter server
- Parameters:
rank – this process rank
srank – server process rank. Not applicable when using ZMQNet
sname – server name. For ZMQNet this is the server ip address; for MPINet it is not applicable
-
virtual void disconnect_ps() = 0
disconnect from the connected parameter server
Called automatically by destructor if not previously called
-
inline int get_server_rank() const
Return the MPI rank of the parameter server.
-
inline int get_client_rank() const
Return the MPI rank of this client.
-
virtual std::string send_and_receive(const Message &msg) = 0
Send a message to the parameter server and receive the response in a serialized format.
- Parameters:
msg – The message
- Returns:
The response message in serialized format. Use Message::set_msg( <serialized_msg>, true ) to unpack
-
void send_and_receive(Message &recv, const Message &send)
Send a message to the parameter server and receive the response both as Message objects.
Note recv and send can be the same object
- Parameters:
send – The sent message
recv – The received message
-
virtual void async_send(const Message &send)
Perform a non-blocking send operation.
Not all net clients support non-blocking sends, in which case it will default to a blocking send
-
inline virtual bool async_send_supported() const
Check if a net client supports non-blocking sends.
-
inline void linkPerf(PerfStats *perf)
If linked timing and packet size information will be gathered.
-
inline virtual void setRecvTimeout(const int timeout_ms)
Set the timeout for receiving messages (implementation dependent)
-
class ADZMQNetClient : public chimbuko::ADNetClient
- #include <ADNetClient.hpp>
Implementation of the ADNetClient interface for the ZMQNet network.
Public Functions
-
ADZMQNetClient()
-
~ADZMQNetClient()
-
virtual void connect_ps(int rank, int srank = 0, std::string sname = "MPINET") override
connect to the parameter server
- Parameters:
rank – this process rank
srank – Ignored for this class
sname – The server ip address
-
virtual void disconnect_ps() override
disconnect from the connected parameter server
Called automatically by destructor if not previously called
-
virtual std::string send_and_receive(const Message &msg) override
Send a message to the parameter server and receive the response in a serialized format.
- Parameters:
msg – The message
- Returns:
The response message in serialized format. Use Message::set_msg( <serialized_msg>, true ) to unpack
-
inline virtual void setRecvTimeout(const int timeout_ms) override
Set the timeout on blocking receives. Must be called prior to connecting.
-
inline void *getZMQsocket()
Get the zeroMQ socket.
-
inline void *getZMQcontext()
Get the zeroMQ context.
-
void stopServer()
Issue a stop command to the server. The server will then stop once all clients have disconnected and all messages processed.
-
class ADLocalNetClient : public chimbuko::ADNetClient
- #include <ADNetClient.hpp>
Implementation of ADNetClient for intraprocess communications.
Public Functions
-
virtual void connect_ps(int rank, int srank = 0, std::string sname = "MPINET") override
connect to the parameter server
- Parameters:
rank – Stored as internal rank index
srank – Ignored
sname – Ignored
-
virtual void disconnect_ps() override
disconnect from the connected parameter server
Called automatically by destructor if not previously called
-
virtual std::string send_and_receive(const Message &msg) override
Send a message to the parameter server and receive the response in a serialized format.
- Parameters:
msg – The message
- Returns:
The response message in serialized format. Use Message::set_msg( <serialized_msg>, true ) to unpack
-
class ADThreadNetClient : public chimbuko::ADNetClient
- #include <ADNetClient.hpp>
ADNetClient inside a worker thread with blocking send/receive and non-blocking send.
Public Functions
-
ADThreadNetClient(bool local = false)
Constructor.
- Parameters:
local – Use a local (in process) communicator if true, otherwise use the default network communicator
-
void enqueue_action(ClientAction *action)
Add an action to the queue.
Use only if you know what you are doing!
-
virtual void connect_ps(int rank, int srank = 0, std::string sname = "MPINET") override
Connect to the parameter server.
-
virtual void disconnect_ps() override
Disconnect from the parameter server.
-
virtual std::string send_and_receive(const Message &send) override
Perform a blocking send and receive operation.
- Parameters:
send – The message
- Returns:
The response message in serialized format. Use Message::set_msg( <serialized_msg>, true ) to unpack
-
inline virtual bool async_send_supported() const override
Check if a net client supports non-blocking sends.
-
virtual void setRecvTimeout(const int timeout_ms) override
Set a timeout (in ms) on receiving a response message.
-
void stopWorkerThread()
Stop the worker thread. Called automatically by destructor.
-
size_t getNwork() const
Get the number of outstanding net operations.
Mutex locks the queue so using this too frequently may cause performance issues
-
~ADThreadNetClient()
Private Functions
-
ClientAction *getWorkItem()
Get the next net operation.
-
void run(bool local = false)
Create the worker thread.
- Parameters:
local – Use a local (in process) communicator if true, otherwise use the default network communicator
Private Members
-
std::queue<ClientAction*> queue
The queue of net operations
-
bool m_is_running
Is the worker thread running?
-
struct ClientAction
- #include <ADNetClient.hpp>
Virtual class representing actions performed by the worker thread.
Public Functions
-
virtual void perform(ADNetClient &client) = 0
Perform the action utilizing the underlying net implementation.
-
virtual bool do_delete() const = 0
Whether to delete the work object (instance of ClientAction) after completion.
-
inline virtual bool shutdown_worker() const
Whether to shutdown the worker thread after completing the action.
-
inline virtual ~ClientAction()
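The worker-thread pattern described for ADThreadNetClient and ClientAction can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Chimbuko's implementation: SketchClient, SketchAction, SendAction, StopAction and SketchWorker are hypothetical stand-ins, and the "network" merely records strings. It shows actions being queued under a mutex, performed in order by a single worker thread, deleted when do_delete() is true, and the worker stopping when an action reports shutdown_worker().

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the underlying net client; it only records what was "sent".
struct SketchClient {
  std::vector<std::string> sent;
};

// Hypothetical analogue of ClientAction: one unit of work for the worker thread.
struct SketchAction {
  virtual void perform(SketchClient &client) = 0;
  virtual bool do_delete() const = 0;                    // delete this object after completion?
  virtual bool shutdown_worker() const { return false; } // stop the worker after this action?
  virtual ~SketchAction() {}
};

struct SendAction : SketchAction {
  std::string payload;
  explicit SendAction(const std::string &p) : payload(p) {}
  void perform(SketchClient &client) override { client.sent.push_back(payload); }
  bool do_delete() const override { return true; }
};

struct StopAction : SketchAction {
  void perform(SketchClient &) override {}
  bool do_delete() const override { return true; }
  bool shutdown_worker() const override { return true; }
};

class SketchWorker {
public:
  SketchWorker() : m_thread([this] { loop(); }) {}
  ~SketchWorker() { stop(); }

  // Add an action to the queue and wake the worker.
  void enqueue(SketchAction *action) {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_queue.push(action);
    }
    m_cv.notify_one();
  }

  // Number of outstanding actions; locks the queue, so avoid calling it in a tight loop.
  std::size_t pending() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_queue.size();
  }

  // Queue a stop action behind any pending work, then wait for the worker to finish.
  void stop() {
    if (m_thread.joinable()) {
      enqueue(new StopAction);
      m_thread.join();
    }
  }

  // Only safe to read once the worker has been stopped.
  const std::vector<std::string> &sent() const { return m_client.sent; }

private:
  void loop() {
    for (;;) {
      SketchAction *action = nullptr;
      {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cv.wait(lock, [this] { return !m_queue.empty(); });
        action = m_queue.front();
        m_queue.pop();
      }
      action->perform(m_client);
      const bool stop_now = action->shutdown_worker();
      if (action->do_delete()) delete action;
      if (stop_now) return;
    }
  }

  SketchClient m_client;
  std::mutex m_mutex;
  std::condition_variable m_cv;
  std::queue<SketchAction *> m_queue;
  std::thread m_thread; // declared last so the members above exist before the thread starts
};
```

Because the stop request travels through the same FIFO queue as the sends, every action queued before stop() is performed before the worker exits.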
ADNormalEventProvenance
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
-
class ADNormalEventProvenance
- #include <ADNormalEventProvenance.hpp>
A class that maintains the provenance information for the most recent normal event for each encountered function. A mechanism is provided for dealing with cases where a normal execution is not yet available. Once returned, the internal copy is deleted, ensuring a given normal execution is only ever output once.
Public Functions
-
void addNormalEvent(const unsigned long pid, const unsigned long rid, const unsigned long tid, const unsigned long fid, const nlohmann::json &event)
Add a normal event. If a normal event already exists with this pid,rid,tid,fid it will be overwritten.
-
std::pair<nlohmann::json, bool> getNormalEvent(const unsigned long pid, const unsigned long rid, const unsigned long tid, const unsigned long fid, bool add_outstanding, bool do_delete)
Get a normal event if available.
- Parameters:
add_outstanding – If true and the event is not available the pid/rid/tid/fid will be placed in a list of outstanding requests that will be furnished later
do_delete – If true and the event is available, the stored copy will be deleted
- Returns:
The JSON data if available, and a bool indicating if the data is populated
-
std::vector<nlohmann::json> getOutstandingRequests(bool do_delete)
For normal event requests that were not previously available, calls to this function will see if a normal event now exists.
- Parameters:
do_delete – If true and the event is available, the stored copy will be deleted
- Returns:
A vector of outstanding requests that have now been furnished
Private Functions
-
void addOutstanding(const unsigned long pid, const unsigned long rid, const unsigned long tid, const unsigned long fid)
Add an entry to the list of outstanding requests.
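The outstanding-request mechanism described above can be illustrated with a small stand-alone sketch. It is not the Chimbuko class: NormalEventStoreSketch and Key are hypothetical names, and std::string stands in for nlohmann::json so the example needs only the standard library. It shows the three behaviours the documentation describes: a newer normal event overwrites an older one with the same pid/rid/tid/fid, a failed lookup can be recorded as outstanding, and a later call furnishes the requests that have since become available.

```cpp
#include <array>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key type holding (pid, rid, tid, fid).
using Key = std::array<unsigned long, 4>;

// Hypothetical stand-in for the provenance store; events are plain strings here.
class NormalEventStoreSketch {
public:
  // Store the event, overwriting any previous event with the same key.
  void addNormalEvent(const Key &k, const std::string &event) { m_events[k] = event; }

  // Return (event, true) if available; otherwise ("", false), optionally
  // remembering the key so the request can be furnished later.
  std::pair<std::string, bool> getNormalEvent(const Key &k, bool add_outstanding, bool do_delete) {
    auto it = m_events.find(k);
    if (it == m_events.end()) {
      if (add_outstanding) m_outstanding.insert(k);
      return {std::string(), false};
    }
    std::string out = it->second;
    if (do_delete) m_events.erase(it);
    return {out, true};
  }

  // Return the events for previously unavailable requests that can now be satisfied.
  std::vector<std::string> getOutstandingRequests(bool do_delete) {
    std::vector<std::string> out;
    for (auto it = m_outstanding.begin(); it != m_outstanding.end();) {
      auto r = getNormalEvent(*it, false, do_delete);
      if (r.second) {
        out.push_back(r.first);
        it = m_outstanding.erase(it); // request furnished, drop it from the list
      } else {
        ++it;
      }
    }
    return out;
  }

private:
  std::map<Key, std::string> m_events;
  std::set<Key> m_outstanding;
};
```

With do_delete set to true the stored copy is removed when it is returned, which is what ensures that a given normal execution is output only once.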
ADOutlier
-
namespace chimbuko
-
class ADOutlier
- #include <ADOutlier.hpp>
abstract class for anomaly detection algorithms
Subclassed by chimbuko::ADOutlierCOPOD, chimbuko::ADOutlierHBOS, chimbuko::ADOutlierSSTD
Public Functions
-
inline bool use_ps() const
check if the parameter server is in use
- Returns:
true if the parameter server is in use
- Returns:
false if the parameter server is not in use
-
void linkNetworkClient(ADNetClient *client)
Link the interface for communicating with the parameter server.
-
virtual void run(ADDataInterface &data, int step = 0) = 0
abstract method to run the implemented anomaly detection algorithm
- Parameters:
step – step (or frame) number
- Returns:
data structure containing information on captured anomalies
-
inline void linkPerf(PerfStats *perf)
If linked, performance information on the sync_param routine will be gathered.
-
inline ParamInterface const *get_global_parameters() const
Get the local copy of the global parameters.
- Returns:
Pointer to a ParamInterface object
-
void setGlobalParameters(const std::string &to)
Set the global parameters, overwriting the existing global model. Here the input is in serialized form.
Use in conjunction with setGlobalModelSyncFrequency(0) to set and freeze the model, not allowing it to be modified by the data
-
inline void setGlobalModelSyncFrequency(int to)
Set how often (in steps, or equivalently calls to “run”) the global model is updated. If to <= 0, the global model will never be updated.
Public Static Functions
-
static ADOutlier *set_algorithm(int rank, const AlgoParams &params)
Factory method to select AD algorithm at runtime.
Protected Functions
-
virtual std::pair<size_t, size_t> sync_param(ParamInterface *param)
Synchronize the input model with the global model
If we are connected to the pserver, the local model will be merged remotely with the current global model on the server, and the new global model returned. The internal global model will be replaced by this. If we are not connected to the pserver, the local model will be merged into the internal global model
- Parameters:
param – the input model
- Returns:
std::pair<size_t, size_t> [sent, recv] message size
-
void updateGlobalModel()
Every m_global_model_sync_freq calls to this function, synchronize the local model with the global model then flush the local model.
Protected Attributes
-
bool m_use_ps
true if the parameter server is in use
-
ADNetClient *m_net_client
interface for communicating to parameter server
-
ParamInterface *m_param
global parameters (kept in sync with parameter server)
-
ParamInterface *m_local_param
local parameters that have not yet been sync’d with the global model
-
int m_sync_call_count
count of calls to sync_param
-
int m_global_model_sync_freq
how often the local model is pushed and synchronized with the global model (default 1)
-
int m_rank
rank index, used only for staggering pserver sync events
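The interplay between the local model, the global model and the sync frequency can be sketched as follows. This is a simplified illustration, not the ADOutlier code: SumModel and SyncSketch are hypothetical, the "model" is just a running sum and count, the merge happens in-process as in the case without a parameter server, and the rank-based staggering mentioned above is omitted.

```cpp
// Hypothetical toy model: a running sum and sample count.
struct SumModel {
  double sum = 0.0;
  unsigned long n = 0;
  void merge(const SumModel &other) {
    sum += other.sum;
    n += other.n;
  }
  void clear() {
    sum = 0.0;
    n = 0;
  }
};

// Hypothetical sketch of the "accumulate locally, sync every N calls" logic.
class SyncSketch {
public:
  explicit SyncSketch(int sync_freq) : m_sync_freq(sync_freq) {}

  // Fold a new observation into the local (not yet synchronized) model.
  void addLocal(double value) {
    m_local.sum += value;
    ++m_local.n;
  }

  // Every m_sync_freq calls, merge the local model into the global model and
  // flush the local model. A frequency <= 0 freezes the global model.
  // Returns true if a synchronization happened on this call.
  bool updateGlobalModel() {
    ++m_call_count;
    if (m_sync_freq <= 0 || m_call_count % m_sync_freq != 0) return false;
    m_global.merge(m_local);
    m_local.clear();
    return true;
  }

  const SumModel &global() const { return m_global; }
  const SumModel &local() const { return m_local; }

private:
  SumModel m_global;
  SumModel m_local;
  int m_call_count = 0;
  int m_sync_freq;
};
```

Between synchronizations the local model keeps accumulating, so no data are lost when the frequency is greater than one; they simply reach the global model later.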
-
struct AlgoParams
- #include <ADOutlier.hpp>
Unified structure for passing the parameters of the AD algorithms to the factory method.
Public Functions
-
AlgoParams()
-
void setJson(const nlohmann::json &in)
Read the parameters from a json object. Note, only “algorithm” and the entries associated with the specific algorithm need to be set.
-
void loadJsonFile(const std::string &filename)
Read the parameters from a json file. Note, only “algorithm” and the entries associated with the specific algorithm need to be set.
-
nlohmann::json getJson() const
Return the parameters as a json object.
-
bool operator==(const AlgoParams &r) const
Equivalence operator.
Public Members
-
double sstd_sigma
The number of sigma that defines an outlier
-
double hbos_thres
The outlier threshold
-
bool glob_thres
Should the global threshold be used?
-
int hbos_max_bins
The maximum number of bins in a histogram
-
class cmdlineParser : public chimbuko::optionalCommandLineArgBase
- #include <ADOutlier.hpp>
Parser object that reads the data from a json file with provided filename.
Public Functions
-
inline cmdlineParser(AlgoParams &member, const std::string &arg, const std::string &help_str)
-
virtual int parse(const std::string &arg, const char **vals, const int vals_size) override
If the first string matches the internal arg string (eg “-help”), a number of strings are consumed from the array ‘vals’ and that number returned. A value of -1 indicates the argument did not match.
- Parameters:
vals – An array of strings
vals_size – The length of the string array
-
class ADOutlierSSTD : public chimbuko::ADOutlier
- #include <ADOutlier.hpp>
statistic analysis based anomaly detection algorithm
Public Functions
-
ADOutlierSSTD(int rank, double sigma = 6.0)
Construct a new ADOutlierSSTD object.
-
~ADOutlierSSTD()
Destroy the ADOutlierSSTD object.
-
inline void set_sigma(double sigma)
Set the sigma value: the number of standard deviations from the mean that defines an anomaly.
- Parameters:
sigma – sigma value
-
virtual void run(ADDataInterface &data, int step = 0) override
abstract method to run the implemented anomaly detection algorithm
- Parameters:
step – step (or frame) number
- Returns:
data structure containing information on captured anomalies
Protected Functions
Private Members
-
double m_sigma
sigma
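The role of sigma can be illustrated with a short sketch. It assumes a two-sided test against a known mean and standard deviation, which is an assumption made for illustration; labelBySigma is a hypothetical helper, and in practice the statistics would come from the (global) model rather than being passed in.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: label each runtime 1 (anomalous) if it lies more than
// `sigma` standard deviations from the mean, and 0 (normal) otherwise.
std::vector<int> labelBySigma(const std::vector<double> &runtimes,
                              double mean, double stddev, double sigma) {
  std::vector<int> labels(runtimes.size(), 0);
  for (std::size_t i = 0; i < runtimes.size(); ++i) {
    if (std::fabs(runtimes[i] - mean) > sigma * stddev) labels[i] = 1;
  }
  return labels;
}
```

For example, with a mean of 100, a standard deviation of 10 and the default sigma of 6.0, a runtime of 105 is labeled normal while a runtime of 200 is labeled anomalous, since it lies 10 standard deviations from the mean.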
-
class ADOutlierHBOS : public chimbuko::ADOutlier
- #include <ADOutlier.hpp>
HBOS anomaly detection algorithm.
Public Functions
-
ADOutlierHBOS(int rank, double threshold = 0.99, bool use_global_threshold = true, int maxbins = 200)
Construct a new ADOutlierHBOS object.
- Parameters:
threshold – The threshold defining an outlier
use_global_threshold – The threshold is maintained as part of the global model
maxbins – The maximum number of bins in the histograms
-
~ADOutlierHBOS()
Destroy the ADOutlierHBOS object.
-
inline void set_threshold(double to)
Set the outlier detection threshold.
-
inline void set_alpha(double alpha)
Set the alpha value.
- Parameters:
alpha – alpha value
-
inline void set_maxbins(int to)
Set the maximum number of histogram bins.
-
virtual void run(ADDataInterface &data, int step = 0) override
abstract method to run the implemented anomaly detection algorithm
- Parameters:
step – step (or frame) number
- Returns:
data structure containing information on captured anomalies
Protected Functions
-
void labelData(std::vector<ADDataInterface::Elem> &data_vals, size_t dset_idx, size_t model_idx)
Override the default threshold for a particular function.
Note: during the merge on the pserver, the merged threshold will be the larger of the values from the two inputs, hence the threshold should ideally be uniformly set for all ranks
Get the threshold used by the given data set function
compute outliers (or anomalies) of the list of data
- Parameters:
func – The function name
threshold – The new threshold
data_vals – [inout] Array of data
dset_idx – The data set index
model_idx – The index of the data type in the model
Private Members
-
double m_alpha
Used to prevent log2 overflow
-
double m_threshold
Threshold used to filter anomalies in HBOS
-
std::unordered_map<std::string, double> m_func_threshold_override
Optionally override the threshold for specific functions
-
bool m_use_global_threshold
Flag to use global threshold
-
int m_maxbins
Maximum number of bins in the histograms
-
class ADOutlierCOPOD : public chimbuko::ADOutlier
- #include <ADOutlier.hpp>
COPOD anomaly detection algorithm.
A description of the algorithm can be found in Li, Z., Zhao, Y., Botta, N., Ionescu, C. and Hu, X. COPOD: Copula-Based Outlier Detection. IEEE International Conference on Data Mining (ICDM), 2020. https://www.andrew.cmu.edu/user/yuezhao2/papers/20-icdm-copod-preprint.pdf
The implementation is based on the pyOD implementation https://pyod.readthedocs.io/en/latest/_modules/pyod/models/copod.html for which the computation of the outlier score differs slightly from that in the paper
Public Functions
-
ADOutlierCOPOD(int rank, double threshold = 0.99, bool use_global_threshold = true)
Construct a new ADOutlierCOPOD object.
-
~ADOutlierCOPOD()
Destroy the ADOutlierCOPOD object.
-
inline void set_alpha(double alpha)
Set the alpha value.
- Parameters:
alpha – alpha value
-
virtual void run(ADDataInterface &data, int step = 0) override
abstract method to run the implemented anomaly detection algorithm
- Parameters:
step – step (or frame) number
- Returns:
data structure containing information on captured anomalies
Protected Functions
-
void labelData(std::vector<ADDataInterface::Elem> &data_vals, size_t dset_idx, size_t model_idx)
Override the default threshold for a particular function.
Note: during the merge on the pserver, the merged threshold will be the larger of the values from the two inputs, hence the threshold should ideally be uniformly set for all ranks
Get the threshold used by the given function
- Parameters:
func – The function name
threshold – The new threshold
ADParser
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
-
class ADParser
- #include <ADParser.hpp>
parsing performance trace data streamed via ADIOS2
Note: The “function index” assigned to each function by Tau is not necessarily the same for every node as it depends on the order in which the function is encountered. To deal with this, if the parameter server is running it maintains a global mapping of function name to an index, which is synchronized to the parser (providing the net client is linked) and the local index is replaced by the global index in the incoming data stream.
Note2: The “program index” assigned by Tau is defunct (always 0). We must therefore replace it manually with a correct index to support workflows
Public Functions
-
ADParser(std::string inputFile, unsigned long program_idx, int rank, std::string engineType = "BPFile", int openTimeoutSeconds = 60)
Construct a new ADParser object.
- Parameters:
inputFile – ADIOS2 BP filename
program_idx – The index to assign to the program whose trace data is being parsed
rank – Rank of current process
engineType – BPFile or SST
openTimeoutSeconds – Timeout for opening ADIOS2 stream
-
inline void linkNetClient(ADNetClient *net_client)
Link the net client to the object that maintains a mapping of local function index to global index.
If this is performed, the parser will replace the local with global index in the incoming data stream
-
inline const std::unordered_map<int, std::string> *getFuncMap() const
Get the function hash map (function id –> function name)
- Returns:
const std::unordered_map<int, std::string>* function hash map
-
inline const std::unordered_map<int, std::string> *getEventType() const
Get the event type hash map (event type id –> event name)
- Returns:
const std::unordered_map<int, std::string>* event type hash map
-
inline const std::unordered_map<int, std::string> *getCounterMap() const
Get the counter hash map (counter id –> counter description)
- Returns:
const std::unordered_map<int, std::string>* counter hash map
-
inline bool getStatus() const
Get the status of this parser.
- Returns:
true if it is connected with a writer
- Returns:
false if it is disconnected or no more data is available
-
inline int getCurrentStep() const
Get the current step (or frame) number.
Returns a value of -1 if beginStep has yet to be called.
- Returns:
int step number
-
int beginStep(bool verbose = false)
start fetching next available data
- Parameters:
verbose – true to output additional information
- Returns:
int current step number
-
void endStep()
end current step (or frame); this only has an effect with the ADIOS2 SST engine
-
inline void setBeginStepTimeout(int timeout)
Set the timeout in seconds on waiting for the next ADIOS2 timestep (default 30s)
-
void update_attributes()
update attributes (or metadata). With the ADIOS2 BPFile engine the available attributes are fetched only once.
-
ParserError fetchFuncData()
fetching function (timer) data. Results stored internally and extracted using ADParser::getFuncData
- Returns:
ParserError error code
-
ParserError fetchCommData()
fetching communication data. Results stored internally and extracted using ADParser::getCommData
- Returns:
ParserError error code
-
ParserError fetchCounterData()
fetching counter data. Results stored internally and extracted using ADParser::getCounterData
- Returns:
ParserError error code
-
inline const unsigned long *getFuncData(size_t idx) const
get pointer to the array of the function event specified by idx
- Parameters:
idx – index of a function event
- Returns:
pointer to a function event array
-
inline size_t getNumFuncData() const
Get the number of function events in the current step.
- Returns:
size_t the number of function events
-
inline const unsigned long *getCommData(size_t idx) const
get pointer to the communication event array specified by idx
- Parameters:
idx – index of a communication event
- Returns:
pointer to a communication event array
-
inline size_t getNumCommData() const
Get the number of communication events in the current step.
- Returns:
size_t the number of communication events
-
inline const unsigned long *getCounterData(size_t idx) const
get pointer to the counter event array specified by idx
- Parameters:
idx – index of a counter event
- Returns:
pointer to a counter event array
-
inline size_t getNumCounterData() const
Get the number of counter events in the current step.
- Returns:
size_t the number of counter events
-
inline const std::vector<MetaData_t> &getNewMetaData() const
Get metadata parsed for the first time during the current step.
-
std::vector<Event_t> getEvents() const
Get all the events (func, comm and counter) occurring in the IO step.
Events are guaranteed ordered by their timestamp on a per-thread basis. Global ordering of events is not guaranteed
-
void addFuncData(unsigned long const *d)
For testing purposes, add the data in the array d to the internal m_event_timestamps array.
Will throw an error if the new array size exceeds the vector capacity as this would invalidate previous Event_t objects
- Parameters:
d – An array of length FUNC_EVENT_DIM
-
void addCounterData(unsigned long const *d)
For testing purposes, add the data in the array d to the internal m_counter_timestamps array.
Will throw an error if the new array size exceeds the vector capacity as this would invalidate previous Event_t objects
- Parameters:
d – An array of length COUNTER_EVENT_DIM
-
void addCommData(unsigned long const *d)
For testing purposes, add the data in the array d to the internal m_comm_timestamps array.
Will throw an error if the new array size exceeds the vector capacity as this would invalidate previous Event_t objects
- Parameters:
d – An array of length COMM_EVENT_DIM
-
inline void setFuncDataCapacity(size_t cap)
Set the m_event_timestamps vector capacity in units of FUNC_EVENT_DIM. This will invalidate previous Event_t objects if it requires a realloc!
-
inline void setCommDataCapacity(size_t cap)
Set the m_comm_timestamps vector capacity in units of COMM_EVENT_DIM. This will invalidate previous Event_t objects if it requires a realloc!
-
inline void setCounterDataCapacity(size_t cap)
Set the m_counter_timestamps vector capacity in units of COUNTER_EVENT_DIM. This will invalidate previous Event_t objects if it requires a realloc!
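The capacity warnings above follow from the events being stored in one flat array, with each event occupying a fixed-size block and accessors handing out raw pointers into it. The sketch below illustrates the idea under stated assumptions: FlatEventBuffer is a hypothetical class, and SKETCH_EVENT_DIM is a placeholder value, not the actual FUNC_EVENT_DIM, COMM_EVENT_DIM or COUNTER_EVENT_DIM defined in ADDefine.hpp.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical block size; the real dimensions are defined in ADDefine.hpp.
constexpr std::size_t SKETCH_EVENT_DIM = 4;

// Hypothetical flat event store handing out raw pointers into a std::vector.
class FlatEventBuffer {
public:
  // Reserve room for n_events blocks. If this reallocates, every pointer
  // previously returned by get() becomes invalid.
  void setCapacity(std::size_t n_events) { m_data.reserve(n_events * SKETCH_EVENT_DIM); }

  // Append one event block. Refuse to grow past the current capacity, because
  // a reallocation would silently invalidate pointers held by callers.
  void add(const unsigned long *d) {
    if (m_data.size() + SKETCH_EVENT_DIM > m_data.capacity())
      throw std::runtime_error("adding this event would reallocate the buffer");
    m_data.insert(m_data.end(), d, d + SKETCH_EVENT_DIM);
  }

  // Pointer to the start of the block belonging to event idx.
  const unsigned long *get(std::size_t idx) const { return &m_data[idx * SKETCH_EVENT_DIM]; }

  // Number of complete event blocks stored.
  std::size_t size() const { return m_data.size() / SKETCH_EVENT_DIM; }

private:
  std::vector<unsigned long> m_data;
};
```

Setting the capacity once, before any pointers are taken, keeps those pointers stable for as long as the number of events stays within that capacity.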
-
inline void setFuncMap(const std::unordered_map<int, std::string> &m)
Set the function index->name map for testing.
-
inline void setEventTypeMap(const std::unordered_map<int, std::string> &m)
Set the function event index -> event type map for testing.
-
void setCounterMap(const std::unordered_map<int, std::string> &m)
Set the counter index->name map for testing.
-
inline unsigned long getGlobalFunctionIndex(const unsigned long local_idx) const
Get the global index corresponding to a given local function index. 1<->1 mapping if pserver not connected.
-
inline int getCorrelationIDcounterIdx() const
Get the counter index of the special counter “Correlation ID” if known; else return -1.
-
inline void setDataRankOverride(bool to)
When true the parser will override the rank index of the parsed data with the rank member parameter.
This is useful for example when multiple instances of a non-MPI program are being run and the user wishes to distinguish them by the rank index
Private Functions
-
bool checkEventOrder(const EventDataType type, bool exit_on_fail) const
Scan the data and check the events are in order.
- Parameters:
exit_on_fail – Throw an error if the check fails
- Returns:
true if the events are in order, false otherwise
-
bool validateEvent(const unsigned long *e) const
Validate an event to bypass corrupted input data (any event type)
- Parameters:
e – Pointer to event data
-
std::pair<Event_t, bool> createAndValidateEvent(const unsigned long *data, EventDataType t, size_t idx, const eventID &id, bool log_error = true) const
Create an Event_t instance from the data at the provided pointer and run simple validation.
- Parameters:
log_error – If true a recoverable error will be logged for invalid events
Private Members
-
adios2::ADIOS m_ad
adios2 handler
-
adios2::IO m_io
adios2 I/O handler
-
adios2::Engine m_reader
adios2 engine handler
-
int m_beginstep_timeout
the timeout in seconds on waiting for the next ADIOS2 timestep
-
bool m_status
parser status
-
bool m_opened
true if connected to a writer or a BP file
-
bool m_attr_once
true for BP engine
-
int m_current_step
current step
-
int m_rank
Rank of current process
-
unsigned long m_program_idx
Program index
-
std::vector<MetaData_t> m_new_metadata
New metadata that appeared on this step
-
std::unordered_map<int, std::string> m_funcMap
function hash map (global function id –> function name)
-
int m_correlation_id_cid
counter id of “Correlation ID” counters if known (-1 otherwise)
-
size_t m_timer_event_count
the number of function events in current step
-
size_t m_comm_count
the number of communication events in current step
-
size_t m_counter_count
the number of counter events in the current step
-
ADglobalFunctionIndexMap m_global_func_idx_map
Maintains mapping of local function index to global function index (if pserver connected)
-
bool m_data_rank_override
Overwrite the rank index in the parsed data with member m_rank (default false)
ADProvenanceDBclient
ADProvenanceDBengine
AnomalyData
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
Functions
-
bool operator==(const AnomalyData &a, const AnomalyData &b)
-
bool operator!=(const AnomalyData &a, const AnomalyData &b)
-
class AnomalyData
- #include <AnomalyData.hpp>
A class that contains data on the number of anomalies collected during the present timestep. It contains the number of anomalies and the timestamp window in which the anomalies occurred.
These data are aggregated over rank to form the anomaly_stats.anomaly field of the pserver streaming output
Public Functions
-
AnomalyData()
Default constructor.
-
AnomalyData(unsigned long app, unsigned long rank, unsigned step, unsigned long min_ts, unsigned long max_ts, unsigned long n_anomalies)
Constructor by value.
- Parameters:
app – The application index
rank – The MPI rank of the process
step – The io step
min_ts – The first timestamp at which an anomaly was observed in this io step
max_ts – The last timestamp at which an anomaly was observed in this io step
n_anomalies – The number of anomalies observed in this io step
-
void set(unsigned long app, unsigned long rank, unsigned step, unsigned long min_ts, unsigned long max_ts, unsigned long n_anomalies)
Set the parameters.
- Parameters:
app – The application index
rank – The MPI rank of the process
step – The io step
min_ts – The first timestamp at which an anomaly was observed in this io step
max_ts – The last timestamp at which an anomaly was observed in this io step
n_anomalies – The number of anomalies observed in this io step
-
inline unsigned long get_app() const
Get the application index.
-
inline unsigned long get_rank() const
Get the MPI rank of the process.
-
inline unsigned long get_step() const
Get the io step.
-
inline unsigned long get_min_ts() const
Get the first timestamp at which an anomaly was observed in this io step.
-
inline unsigned long get_max_ts() const
Get the last timestamp at which an anomaly was observed in this io step.
-
inline unsigned long get_n_anomalies() const
Get the number of anomalies observed in this io step.
-
inline void set_min_ts(unsigned long to)
Set the earliest timestamp.
-
inline void set_max_ts(unsigned long to)
Set the last timestamp.
-
inline void incr_n_anomalies(unsigned long by)
Increment the number of anomalies.
-
inline void add_outlier_score(double val)
Add an outlier score to the internal statistics.
-
nlohmann::json get_json() const
Get the object in JSON format.
Private Members
-
unsigned long m_app
The application index
-
unsigned long m_rank
The MPI rank of the process
-
unsigned long m_step
The io step
-
unsigned long m_min_timestamp
The first timestamp at which an anomaly was observed in this io step
-
unsigned long m_max_timestamp
The last timestamp at which an anomaly was observed in this io step
-
unsigned long m_n_anomalies
The number of anomalies observed in this io step
Friends
-
friend bool operator==(const AnomalyData &a, const AnomalyData &b)
Comparison operator.
-
friend bool operator!=(const AnomalyData &a, const AnomalyData &b)
Negative comparison operator.
ExecData
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
Functions
-
class Event_t
- #include <ExecData.hpp>
class to provide easy access to raw performance event vector
The data are passed in via ADIOS2 and stored internally in a compressed format in the form of an integer array, blocks of which are associated with particular events. Each block has a certain number of entries associated with it that relate to information such as program, comm and thread index, timestamp as well as detailed event information. The mappings are set out in ADDefine.hpp.
This class wraps the event data blocks allowing for retrieval of event information through explicit function calls. It works for all event types: function, comm and counter
Public Functions
-
inline Event_t(const unsigned long *data, EventDataType t, size_t idx, const eventID &id = eventID())
Construct a new Event_t object.
- Parameters:
data – pointer to raw performance event vector
t – event type
idx – event index
id – event id
-
inline bool valid() const
check if the raw data pointer is valid
-
inline size_t idx() const
return event index, typically the index of the event in the input array for the timestep on which it was spawned
-
inline unsigned long pid() const
return program id
-
inline unsigned long rid() const
return rank id
-
inline unsigned long tid() const
return thread id
-
unsigned long eid() const
return event type id (FUNC/COMM only). E.g. for FUNC events it is ENTRY/EXIT
-
unsigned long ts() const
return timestamp of this event
-
inline EventDataType type() const
return event type
-
unsigned long fid() const
return function (timer) id (FUNC event only)
-
unsigned long tag() const
return communication tag id (COMM event only)
-
unsigned long partner() const
return communication partner id (COMM event only)
-
unsigned long bytes() const
return communication data size (in bytes) (COMM event only)
-
unsigned long counter_id() const
return counter id (COUNT event only)
-
unsigned long counter_value() const
return the value of the counter (COUNT event only)
-
bool operator==(const Event_t &r) const
Equivalence operation.
Note the underlying array pointers can be different providing the values are identical
-
nlohmann::json get_json() const
Get the json object of this event object.
-
inline const unsigned long *get_ptr() const
Return the pointer to the underlying data.
-
int get_data_len() const
Get the length of the underlying array.
Private Members
-
const unsigned long *m_data
pointer to raw performance trace data vector
-
EventDataType m_t
event type
-
size_t m_idx
event index
-
class CommData_t
- #include <ExecData.hpp>
wrapper for communication event
Public Functions
-
CommData_t()
Construct a new CommData_t object.
-
CommData_t(const Event_t &ev, const std::string &commType)
Construct a new CommData_t object.
- Parameters:
ev – constant reference to a Event_t object
commType – communication type (e.g. SEND/RECV)
-
CommData_t(unsigned long pid, unsigned long rid, unsigned long tid, unsigned long partner, unsigned long bytes, unsigned long tag, unsigned long timestamp, const std::string &commType)
Construct a new CommData_t object by values.
- Parameters:
pid – Program index
rid – Rank index
tid – Thread index
partner – The other rank involved in the communication
bytes – The number of bytes sent/received
tag – The tag of the event
timestamp – The time of the communication event
commType – Either “SEND” or “RECV” depending on the comm type
-
inline ~CommData_t()
Destroy the CommData_t object.
-
inline unsigned long ts() const
return timestamp
-
inline unsigned long src() const
return source process id of this communication event
-
inline unsigned long tar() const
return target (or destination) process id of this communication event
-
inline unsigned long tag() const
Get the integer tag associated with this comm event.
-
inline unsigned long bytes() const
Return the size of the communication in bytes.
-
inline void set_exec_key(const eventID &key)
Set the execution key id (i.e. where this communication event occurs). This is equal to the “id” object associated with a parent ExecData_t object.
- Parameters:
key – execution id
-
inline const eventID &get_exec_key() const
Get the execution key id. This is equal to the “id” object associated with a parent ExecData_t object.
-
bool is_same(const CommData_t &other) const
compare two communication data
- Parameters:
other –
- Returns:
true if other is same
- Returns:
false if other is different
-
nlohmann::json get_json() const
Get the json object of this communication data.
Private Members
-
unsigned long m_pid
program id
-
unsigned long m_rid
rank id
-
unsigned long m_tid
thread id
-
unsigned long m_src
source process id
-
unsigned long m_tar
target process id
-
unsigned long m_bytes
communication data size in bytes
-
unsigned long m_tag
communication tag
-
unsigned long m_ts
communication timestamp
-
class CounterData_t
- #include <ExecData.hpp>
wrapper for counter event
Public Functions
-
CounterData_t()
Construct a new CounterData_t object.
-
CounterData_t(const Event_t &ev, const std::string &counter_name)
Construct a new CounterData_t object.
- Parameters:
ev – constant reference to a Event_t object
counter_name – The name of the counter
-
CounterData_t(unsigned long pid, unsigned long rid, unsigned long tid, unsigned long counter_id, const std::string &counter_name, unsigned long counter_value, unsigned long timestamp)
Construct a new CounterData_t object by values.
- Parameters:
pid – Program index
rid – Rank index
tid – Thread index
counter_id – Counter index
counter_name – The name of the counter (should match the counter_id through the name map)
counter_value – The value of the counter
timestamp – The time of the counter
-
nlohmann::json get_json() const
Get the json object of this counter data.
-
inline unsigned long get_pid() const
return program id
-
inline unsigned long get_rid() const
return rank id
-
inline unsigned long get_tid() const
return thread id
-
inline unsigned long get_value() const
Return the value of the counter.
-
inline unsigned long get_ts() const
Return the counter timestamp.
-
inline unsigned long get_counterid() const
Return the index of the counter.
-
inline void set_exec_key(const eventID &key)
Set the execution key id (i.e. where this counter event occurs). This is equal to the “id” string associated with a parent ExecData_t object.
- Parameters:
key – execution id
-
inline const eventID &get_exec_key() const
Get the execution key id. This is equal to the “id” string associated with a parent ExecData_t object.
-
class ExecData_t
- #include <ExecData.hpp>
A pair of function (timer) events, ENTRY and EXIT.
Public Functions
-
ExecData_t()
Construct a new ExecData_t object.
-
ExecData_t(const Event_t &ev)
Construct a new ExecData_t object.
- Parameters:
ev – constant reference to a Event_t object
-
ExecData_t(const eventID &id, unsigned long pid, unsigned long rid, unsigned long tid, unsigned long fid, const std::string &func_name, long entry, long exit = -1)
Construct a new ExecData_t object by values.
- Parameters:
id – The id associated with the instance
pid – Program index
rid – Rank index
tid – Thread index
fid – Function index
func_name – The name of the function (should match the function index through the name map)
entry – Timestamp of function start (entry)
exit – Timestamp of function exit. A value of -1 (default) indicates that the exit is not yet known. It should be set later using update_exit
-
~ExecData_t()
Destroy the ExecData_t object.
-
inline unsigned long get_pid() const
Get the program id of this execution data.
-
inline unsigned long get_tid() const
Get the thread id of this execution data.
-
inline unsigned long get_rid() const
Get the rank id of this execution data.
-
inline unsigned long get_fid() const
Get the function id of this execution data.
-
inline long get_entry() const
Get the entry time of this execution data.
-
inline long get_exit() const
Get the exit time of this execution data.
-
inline long get_runtime() const
Get the (inclusive) running time of this execution data.
-
inline long get_inclusive() const
Get the (inclusive) running time of this execution data.
-
inline long get_exclusive() const
Get the exclusive running time of this execution data.
-
inline int get_label() const
Get the label of this execution data.
- Returns:
int 1 if normal and -1 if anomaly. Returns 0 if no label has been assigned.
-
inline const std::deque<CommData_t> &get_messages() const
Get the list of communication events that occurred in this execution data.
-
inline const std::deque<CounterData_t> &get_counters() const
Get the list of counter events that occurred in this execution data.
-
inline unsigned long get_n_message() const
Get the number of communication events.
-
inline unsigned long get_n_children() const
Get the number of child functions.
-
inline unsigned long get_n_counter() const
Get the number of counter events.
-
inline double get_outlier_score() const
Return the outlier score assigned to the data point, representing how unlikely an event is.
-
inline double get_outlier_severity() const
Return the outlier severity, representing how important the outlier is.
-
inline void set_label(int label)
Set the label.
- Parameters:
label – 1 for normal, -1 for anomaly.
-
inline void set_outlier_score(double score)
Set the outlier score.
-
inline void set_parent(const eventID &parent)
Set the parent function of this execution.
- Parameters:
parent – the parent execution id
-
inline void set_funcname(const std::string &funcname)
Set the function name of this execution.
- Parameters:
funcname – function name
-
bool update_exit(const Event_t &ev)
update exit event of this execution
- Parameters:
ev – exit event
- Returns:
true no errors
- Returns:
false incorrect exit event
-
void update_exit(unsigned long exit)
update exit event of this execution
- Parameters:
exit – timestamp
-
inline void update_exclusive(long t)
update exclusive running time
- Parameters:
t – running time of a child function
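The relationship between inclusive and exclusive running time can be sketched as follows. `ExecTiming` is a hypothetical stand-in for the timing part of ExecData_t, and the bookkeeping order used here (subtract each child's runtime as the child completes, then add the call's own duration on exit) is an assumption rather than a statement about the actual implementation.

```cpp
#include <cassert>

namespace sketch {
// Hypothetical stand-in for the timing fields of an execution record.
class ExecTiming {
public:
  explicit ExecTiming(long entry) : m_entry(entry) {}

  // A child call finished: its runtime does not count towards our
  // exclusive time. Children exit before their parent, so this runs first.
  void update_exclusive(long child_runtime) { m_exclusive -= child_runtime; }

  // The call itself finished: the inclusive time is the full duration, and
  // the exclusive time is that duration minus what the children consumed.
  void update_exit(long exit) {
    m_exit = exit;
    m_runtime = m_exit - m_entry;
    m_exclusive += m_runtime;
  }

  long get_inclusive() const { return m_runtime; }
  long get_exclusive() const { return m_exclusive; }

private:
  long m_entry;
  long m_exit = -1;      // -1 marks "exit not yet known"
  long m_runtime = 0;    // inclusive running time
  long m_exclusive = 0;  // exclusive running time
};
}  // namespace sketch
```

For a parent that enters at 100 and exits at 200 with a single child that ran for 30 time units, the sketch gives an inclusive time of 100 and an exclusive time of 70.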
-
inline void inc_n_children()
increase the number of child functions by 1
-
bool add_message(const CommData_t &comm, ListEnd end = ListEnd::Back)
add communication data to one end of the message queue
- Parameters:
comm – communication event that occurred in this execution
end – add to which end of the deque
- Returns:
true no errors
- Returns:
false invalid communication event
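The choice of which end of the deque receives a new entry can be sketched with a small helper. Only `ListEnd::Back` appears in the signature above; the `ListEnd::Front` enumerator and the `add_to_end` helper are assumptions made for this illustration.

```cpp
#include <cassert>
#include <deque>

namespace sketch {
// Hypothetical mirror of the ListEnd selector; Front is assumed to exist.
enum class ListEnd { Front, Back };

// Append or prepend an item depending on the requested end.
template <typename T>
void add_to_end(std::deque<T>& queue, const T& item,
                ListEnd end = ListEnd::Back) {
  if (end == ListEnd::Back) {
    queue.push_back(item);
  } else {
    queue.push_front(item);
  }
}
}  // namespace sketch
```

A deque suits this pattern because insertion at either end is cheap, which would allow events discovered out of order to be placed before those already stored.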
-
bool add_counter(const CounterData_t &count, ListEnd end = ListEnd::Back)
add counter data
- Parameters:
count – counter event that occurred in this execution
end – add to which end of the deque
- Returns:
true no errors
- Returns:
false invalid counter event
-
bool is_same(const ExecData_t &other) const
compare with other execution
- Parameters:
other – other execution data
- Returns:
true if they are same
- Returns:
false if they are different
-
nlohmann::json get_json(bool with_message = false, bool with_counter = false) const
Get the json object of this execution data.
- Parameters:
with_message – if true, include all message (communication) information
with_counter – if true, include all counter information
- Returns:
nlohmann::json json object
-
inline bool can_delete() const
Determine whether the event can be deleted by the garbage collection at the end of the io step.
-
inline void register_reference()
Increment the external reference counter, preventing object deletion.
-
void deregister_reference()
Decrement the external reference counter, allowing object deletion if 0.
-
inline unsigned long reference_count() const
Get the number of external references registered.
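The four reference-tracking calls above fit together roughly as in the sketch below. `RefTracked` is a hypothetical class, it assumes that deletability depends on the reference count alone, and throwing on underflow is a choice made for this example only.

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch {
// Hypothetical object that garbage collection may delete only when no
// external component still refers to it.
class RefTracked {
public:
  bool can_delete() const { return m_references == 0; }

  void register_reference() { ++m_references; }

  void deregister_reference() {
    // Guard against underflow of the unsigned counter (sketch-only policy).
    if (m_references == 0)
      throw std::runtime_error("reference count is already zero");
    --m_references;
  }

  unsigned long reference_count() const { return m_references; }

private:
  unsigned long m_references = 0;
};
}  // namespace sketch
```

A holder would call `register_reference()` when it starts pointing at the object and `deregister_reference()` when it lets go, so that an end-of-step sweep can skip anything for which `can_delete()` is false.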
-
inline void set_GPU_correlationID_partner(const eventID &event_id)
Set the partner event linked by a GPU correlation ID.
-
inline bool has_GPU_correlationID_partner() const
Return true if this event has been matched to a partner event by a GPU correlation ID.
-
inline size_t n_GPU_correlationID_partner() const
Get the number of events linked by GPU correlation ID.
Private Members
-
unsigned long m_pid
program id
-
unsigned long m_tid
thread id
-
unsigned long m_rid
rank id
-
unsigned long m_fid
function id
-
long m_entry
entry time
-
long m_exit
exit time
-
long m_runtime
inclusive running time (i.e. including time of child calls)
-
long m_exclusive
exclusive running time (i.e. excluding time of child calls)
-
int m_label
1 for normal, -1 for abnormal execution
-
double m_score
Outlier score (implementation dependent)
-
unsigned long m_n_children
the number of child executions
-
unsigned long m_n_messages
the number of messages
-
std::deque<CommData_t> m_messages
a deque of all messages
-
std::deque<CounterData_t> m_counters
a deque of all counters
-
unsigned long m_references
track number of external references to object. When 0 the object can be deleted
-
class MetaData_t
- #include <ExecData.hpp>
wrapper for metadata entries
Public Functions
-
MetaData_t(unsigned long pid, unsigned long rid, unsigned long tid, const std::string &descr, const std::string &value)
Construct an instance with a full set of parameters.
- Parameters:
pid – Program index
rid – Rank
tid – Thread
descr – Key
value – Value
-
inline unsigned long get_pid() const
Get the origin program index.
-
inline unsigned long get_rid() const
Get the origin global comm rank.
-
inline unsigned long get_tid() const
Get the origin thread index.
-
nlohmann::json get_json() const
Get the json object of this metadata.
- Returns:
nlohmann::json json object
utils
-
namespace chimbuko
Functions
-
unsigned char random_char()
Return a random character.
Anomaly Detection Algorithm Parameters
Parameters of the anomaly detection algorithm.
ParamInterface
-
namespace chimbuko
-
class ParamInterface
- #include <param.hpp>
The general interface for storing function statistics for anomaly detection.
Subclassed by chimbuko::CopodParam, chimbuko::HbosParam, chimbuko::SstdParam
Public Functions
-
ParamInterface()
-
inline virtual ~ParamInterface()
-
virtual void clear() = 0
Clear all statistics.
-
virtual size_t size() const = 0
Get the number of models for which statistics are being collected.
-
virtual std::string serialize() const = 0
Convert internal models to string format for IO.
- Returns:
Run statistics in string format
-
virtual std::string update(const std::string &parameters, bool return_update = false) = 0
Update the internal models with those included in the serialized input map.
Note: we combine update and serialize here in order to avoid having to acquire 2 successive mutex locks on the pserver
- Parameters:
parameters – The parameters in serialized format
return_update – Indicates that the function should return a serialized copy of the updated parameters
- Returns:
An empty string or a serialized copy of the updated parameters depending on return_update
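The reason for combining update and serialize in one call can be sketched as below: the merge and the optional serialization both happen under a single lock acquisition. `CountParam`, its per-model counters and the `index:count;` text format are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <sstream>
#include <string>

namespace sketch {
// Hypothetical model store: model index -> event count. The serialized
// form is a plain "index:count;" string chosen only for this sketch.
class CountParam {
public:
  // Merge the serialized input and, if requested, return the updated state.
  // Both steps run under one lock, so no second acquisition is needed.
  std::string update(const std::string& parameters,
                     bool return_update = false) {
    std::lock_guard<std::mutex> lock(m_mutex);
    merge_locked(parameters);
    return return_update ? serialize_locked() : std::string();
  }

  std::string serialize() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return serialize_locked();
  }

private:
  // Caller must hold m_mutex.
  void merge_locked(const std::string& parameters) {
    std::istringstream in(parameters);
    std::string entry;
    while (std::getline(in, entry, ';')) {
      const std::size_t colon = entry.find(':');
      if (colon == std::string::npos) continue;  // skip malformed entries
      m_counts[std::stoul(entry.substr(0, colon))] +=
          std::stoul(entry.substr(colon + 1));
    }
  }

  // Caller must hold m_mutex.
  std::string serialize_locked() const {
    std::ostringstream out;
    for (const auto& kv : m_counts) out << kv.first << ':' << kv.second << ';';
    return out.str();
  }

  mutable std::mutex m_mutex;
  std::map<unsigned long, unsigned long> m_counts;
};
}  // namespace sketch
```

Calling `update(..., true)` therefore returns a state that is guaranteed to include the caller's own contribution, which a separate `update` followed by `serialize` could not promise if another thread updated in between.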
-
virtual void update(const ParamInterface &other) = 0
Update the internal run statistics with those from another instance.
The instance will be dynamically cast to the derived type internally, and an error will be thrown if the types do not match. The other instance will be locked during the process for thread safety.
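The cast-and-check pattern for updating from a base-class reference can be sketched as follows. The `Base`, `CounterA` and `CounterB` types are hypothetical, the exception type is an assumption, and the locking of the other instance is omitted for brevity.

```cpp
#include <cassert>
#include <stdexcept>

namespace sketch {
// Hypothetical base interface with a polymorphic update.
struct Base {
  virtual ~Base() = default;
  virtual void update(const Base& other) = 0;
};

// Derived type that can only merge with instances of its own type.
struct CounterA : Base {
  unsigned long total = 0;

  void update(const Base& other) override {
    // dynamic_cast on a pointer yields nullptr when the types differ.
    const CounterA* o = dynamic_cast<const CounterA*>(&other);
    if (o == nullptr) throw std::invalid_argument("update: type mismatch");
    total += o->total;
  }
};

// A second, unrelated derived type used to trigger the mismatch.
struct CounterB : Base {
  void update(const Base&) override {}
};
}  // namespace sketch
```

Casting the pointer rather than the reference keeps the mismatch check explicit and lets the code raise an error with a message of its own choosing.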
-
virtual void update(const std::vector<ParamInterface*> &other)
Update the internal run statistics with those from multiple other instances.
Each instance will be dynamically cast to the derived type internally, and an error will be thrown if the types do not match. Each other instance will be locked during the process for thread safety.
-
virtual void assign(const std::string &parameters) = 0
Set the internal run statistics to match those included in the serialized input model. Overwrite performed only for those model indices in the input.
- Parameters:
parameters – The serialized input model
-
virtual nlohmann::json get_algorithm_params(const unsigned long model_idx) const = 0
Get the algorithm parameters associated with a given model index. Format is algorithm dependent.
-
virtual std::unordered_map<unsigned long, nlohmann::json> get_all_algorithm_params() const = 0
Get the algorithm parameters for all model indices. Returns a map of model index to JSON-formatted parameters. Parameter format is algorithm dependent.
-
virtual nlohmann::json serialize_json() const = 0
Serialize the set of algorithm parameters in JSON form for purpose of inter-run persistence, format may differ from the above.
-
virtual void deserialize_json(const nlohmann::json &from) = 0
De-serialize the set of algorithm parameters from JSON form created by serialize_json.
Public Static Functions
-
static ParamInterface *set_AdParam(const std::string &ad_algorithm)
Return a pointer to a new instance of the ParamInterface derived class associated with the given algorithm.
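A name-to-type factory of this kind can be sketched as below. The algorithm strings ("hbos", "sstd", "copod"), the stand-in types and the `make_param` name are assumptions for illustration. The sketch also returns a `std::unique_ptr` instead of the raw pointer that `set_AdParam` returns, so that ownership is explicit.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

namespace sketch {
// Hypothetical base class standing in for the parameter interface.
struct ParamBase {
  virtual ~ParamBase() = default;
  virtual std::string algorithm() const = 0;
};

struct HbosLike : ParamBase {
  std::string algorithm() const override { return "hbos"; }
};
struct SstdLike : ParamBase {
  std::string algorithm() const override { return "sstd"; }
};
struct CopodLike : ParamBase {
  std::string algorithm() const override { return "copod"; }
};

// Map an algorithm name (assumed spelling) to a new derived instance.
inline std::unique_ptr<ParamBase> make_param(const std::string& ad_algorithm) {
  if (ad_algorithm == "hbos") return std::unique_ptr<ParamBase>(new HbosLike());
  if (ad_algorithm == "sstd") return std::unique_ptr<ParamBase>(new SstdLike());
  if (ad_algorithm == "copod") return std::unique_ptr<ParamBase>(new CopodLike());
  throw std::invalid_argument("unknown AD algorithm: " + ad_algorithm);
}
}  // namespace sketch
```

With a raw-pointer factory, as in the documented signature, the caller would presumably be responsible for deleting the returned object.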
-
class NetPayloadUpdateParams : public chimbuko::NetPayloadBase
- #include <param.hpp>
Net payload for pserver updating params from AD.
Public Functions
-
inline NetPayloadUpdateParams(ParamInterface *param, bool freeze = false)
Construct the NetPayloadUpdateParams object.
- Parameters:
param – A pointer to an instance of ParamInterface
freeze – If true the state will not be modified by the update command
-
inline virtual int kind() const override
The message kind to which the payload is to be bound.
Chimbuko core reserves range -1…-infty, modules must use range 0..+infty
-
inline virtual MessageType type() const override
The message type to which the payload is to be bound.
Private Members
-
ParamInterface *m_param
-
bool m_freeze
If set to true, the additional data from the AD will be ignored and the parameter state will not change
-
class NetPayloadGetParams : public chimbuko::NetPayloadBase
- #include <param.hpp>
Net payload for AD updating params from pserver.
Public Functions
-
inline NetPayloadGetParams(ParamInterface const *param)
-
inline virtual int kind() const override
The message kind to which the payload is to be bound.
Chimbuko core reserves range -1…-infty, modules must use range 0..+infty
-
inline virtual MessageType type() const override
The message type to which the payload is to be bound.
Private Members
-
ParamInterface const *m_param
CopodParam
-
namespace chimbuko
-
class CopodFuncParam
- #include <copod_param.hpp>
The algorithm parameters for a given model.
Public Functions
-
CopodFuncParam()
-
inline const Histogram &getHistogram() const
-
inline Histogram &getHistogram()
-
inline double getInternalGlobalThreshold() const
-
inline void setInternalGlobalThreshold(double to)
-
void merge(const CopodFuncParam &other)
Merge another instance of CopodFuncParam into this one.
- Parameters:
other – The other instance
-
nlohmann::json get_json() const
-
inline bool operator==(const CopodFuncParam &other) const
-
inline bool operator!=(const CopodFuncParam &other) const
-
class CopodParam : public chimbuko::ParamInterface
- #include <copod_param.hpp>
Implementation of ParamInterface for COPOD based anomaly detection.
Public Functions
-
CopodParam()
-
~CopodParam()
-
virtual void clear() override
Clear all statistics.
-
inline void copy(const CopodParam &r)
Copy an existing CopodParam.
-
const int find(const unsigned long model_id)
Check if the statistics for a model exist in the histogram.
-
inline const std::unordered_map<unsigned long, CopodFuncParam> &get_copodstats() const
Get the internal map between model index and statistics.
-
inline void set_copodstats(const std::unordered_map<unsigned long, CopodFuncParam> &to)
Set the internal map between model index and statistics to the provided input.
-
inline virtual size_t size() const override
Get the number of functions for which statistics are being collected.
-
virtual std::string serialize() const override
Convert internal Histogram to string format for IO.
- Returns:
Histogram in string format
-
virtual std::string update(const std::string &parameters, bool return_update = false) override
Update the internal Histogram with those included in the serialized input map.
- Parameters:
parameters – The parameters in serialized format
return_update – Controls return format
- Returns:
An empty string if return_update is false, otherwise the serialized updated parameters
-
virtual void assign(const std::string &parameters) override
Set the internal Histogram to match those included in the serialized input map. Overwrite performed only for those keys in input.
- Parameters:
parameters – The serialized input map
-
void assign(const CopodParam &other)
Set the internal data to match those included in the input. Overwrite performed only for those keys in input.
- Parameters:
other – The input data
-
inline CopodFuncParam &operator[](unsigned long id)
Get an element of the internal map.
- Parameters:
id – The model index
-
void update(const CopodParam &other)
Update the internal Histogram with those included in another CopodParam instance.
The other instance is locked during the process
- Parameters:
other – [in] The other CopodParam instance
-
inline virtual void update(const ParamInterface &other) override
Update the internal model from another instance.
The instance will be dynamically cast to the derived type internally, and will throw an error if the types do not match
-
void update_and_return(CopodParam &other)
Update the internal histogram with those included in another CopodParam instance. Other CopodParam is then updated to reflect new state.
- Parameters:
other – [inout] The other CopodParam instance
-
virtual nlohmann::json get_algorithm_params(const unsigned long model_idx) const override
Get the parameters (histogram) associated with a specific model index in JSON format.
-
virtual std::unordered_map<unsigned long, nlohmann::json> get_all_algorithm_params() const override
Get the algorithm parameters for all model indices. Returns a map of model index to JSON-formatted parameters. Parameter format is algorithm dependent.
-
virtual nlohmann::json serialize_json() const override
Serialize the set of algorithm parameters in JSON form for purpose of inter-run persistence, format may differ from the above.
-
virtual void deserialize_json(const nlohmann::json &from) override
De-serialize the set of algorithm parameters from JSON form created by serialize_json.
Public Static Functions
-
static std::string serialize_cerealpb(const CopodParam &param)
Convert a CopodParam into a Cereal portable binary representation.
- Parameters:
param – The CopodParam instance
- Returns:
Histogram in string format
-
static void deserialize_cerealpb(const std::string &parameters, CopodParam &p)
Deserialize a CopodParam instance.
- Parameters:
parameters – [in] The parameter string
p – [out] The CopodParam instance
Private Members
-
std::unordered_map<unsigned long, CopodFuncParam> m_copodstats
Map of model index and corresponding Histogram
HbosParam
-
namespace chimbuko
-
class HbosFuncParam
- #include <hbos_param.hpp>
The algorithm parameters for a given model.
Public Functions
-
HbosFuncParam()
-
inline const Histogram &getHistogram() const
-
inline Histogram &getHistogram()
-
inline double getInternalGlobalThreshold() const
-
inline void setInternalGlobalThreshold(double to)
-
void merge(const HbosFuncParam &other, const binWidthSpecifier &bw)
Merge another instance of HbosFuncParam into this one.
- Parameters:
other – The other instance
bw – The specifier for the bin width used in the histogram merge
-
nlohmann::json get_json() const
-
inline bool operator==(const HbosFuncParam &other) const
-
inline bool operator!=(const HbosFuncParam &other) const
-
class HbosParam : public chimbuko::ParamInterface
- #include <hbos_param.hpp>
Implementation of ParamInterface for HBOS based anomaly detection.
Public Functions
-
HbosParam()
-
~HbosParam()
-
virtual void clear() override
Clear all statistics.
-
bool find(const unsigned long model_id) const
Check if the statistics for a model exist in the histogram.
-
inline const std::unordered_map<unsigned long, HbosFuncParam> &get_hbosstats() const
Get the internal map between model index and statistics.
-
inline void set_hbosstats(const std::unordered_map<unsigned long, HbosFuncParam> &to)
Set the internal map between model index and statistics to the provided input.
-
inline virtual size_t size() const override
Get the number of models for which statistics are being collected.
-
virtual std::string serialize() const override
Convert internal Histogram to string format for IO.
- Returns:
Histogram in string format
-
virtual std::string update(const std::string &parameters, bool return_update = false) override
Update the internal Histogram with those included in the serialized input map.
- Parameters:
parameters – The parameters in serialized format
return_update – Controls return format
- Returns:
An empty string if return_update is false, otherwise the serialized updated parameters
-
virtual void assign(const std::string &parameters) override
Set the internal Histogram to match those included in the serialized input map. Overwrite performed only for those keys in input.
- Parameters:
parameters – The serialized input data
-
void assign(const HbosParam &params)
Set the internal data to match those included in the input. Overwrite performed only for those keys in input.
- Parameters:
params – The input data
-
inline HbosFuncParam &operator[](unsigned long id)
Get an element of the internal map.
- Parameters:
id – The model index
-
void update(const HbosParam &other)
Update the internal histogram with those included in the input data.
The other instance is locked during the process
- Parameters:
other – [in] The input HbosParam object
-
inline virtual void update(const ParamInterface &other) override
Update the internal run statistics with those from another instance.
The instance will be dynamically cast to the derived type internally, and will throw an error if the types do not match
-
void update_and_return(HbosParam &from_into)
Update the internal data with those included in the input. The input is then updated to reflect new state.
- Parameters:
from_into – [inout] The input/output data
-
virtual nlohmann::json get_algorithm_params(const unsigned long model_idx) const override
Get the parameters (histogram) associated with a specific model index in JSON format.
-
virtual std::unordered_map<unsigned long, nlohmann::json> get_all_algorithm_params() const override
Get the algorithm parameters for all model indices. Returns a map of model index to JSON-formatted parameters. Parameter format is algorithm dependent.
-
inline int getMaxBins() const
Get the maximum number of bins.
-
inline void setMaxBins(const int b)
Set the maximum number of bins.
-
void generate_histogram(const unsigned long model_id, const std::vector<double> &values, double global_threshold_init, HbosParam const *global_param = nullptr)
Generate the histogram for a particular function based on the batch of runtimes.
- Parameters:
model_id – The model index
values – The data used to generate the model
global_param – A pointer to the current global histogram. If non-null both the global model and the values dataset will be used to determine the optimal bin width
global_threshold_init – The initial value of the internal, global threshold
-
virtual nlohmann::json serialize_json() const override
Serialize the set of algorithm parameters in JSON form for purpose of inter-run persistence, format may differ from the above.
-
virtual void deserialize_json(const nlohmann::json &from) override
De-serialize the set of algorithm parameters from JSON form created by serialize_json.
Public Static Functions
Private Functions
Private Members
-
std::unordered_map<unsigned long, HbosFuncParam> m_hbosstats
Map of model id and corresponding Histogram
-
int m_maxbins
Maximum number of bins to use in the histograms
SstdParam
-
namespace chimbuko
-
class SstdParam : public chimbuko::ParamInterface
- #include <sstd_param.hpp>
Implementation of ParamInterface for anomaly detection based on function time distribution (mean, std. dev., etc).
Public Functions
-
SstdParam()
-
~SstdParam()
-
virtual void clear() override
Clear all statistics.
-
inline virtual size_t size() const override
Get the number of models for which statistics are being collected.
-
virtual std::string serialize() const override
Convert internal run statistics to string format for IO.
- Returns:
Run statistics in string format
-
virtual std::string update(const std::string &parameters, bool return_update = false) override
Update the internal run statistics with those included in the serialized input map.
- Parameters:
parameters – The parameters in serialized format
return_update – Indicates that the function should return a serialized copy of the updated parameters
- Returns:
An empty string or a serialized copy of the updated parameters depending on return_update
-
virtual void assign(const std::string &parameters) override
Set the internal run statistics to match those included in the serialized input map. Overwrite performed only for those keys in input.
- Parameters:
parameters – The serialized input map
-
void update(const std::unordered_map<unsigned long, RunStats> &runstats)
Update the internal run statistics with those included in the input map.
- Parameters:
runstats – [in] The map between global function index and statistics
-
void update(const SstdParam &other)
Update the internal statistics with those included in another SstdParam instance.
The other instance is locked during the process
- Parameters:
other – [in] The other SstdParam instance
-
inline virtual void update(const ParamInterface &other) override
Update the internal run statistics with those from another instance.
The instance will be dynamically cast to the derived type internally, and will throw an error if the types do not match
-
void update_and_return(std::unordered_map<unsigned long, RunStats> &runstats)
Update the internal run statistics with those included in the input map. Input map is then updated to reflect new state.
- Parameters:
runstats – [inout] The map between global function index and statistics
-
inline void update_and_return(SstdParam &other)
Update the internal statistics with those included in another SstdParam instance. Other SstdParam is then updated to reflect new state.
- Parameters:
other – [inout] The other SstdParam instance
-
void assign(const std::unordered_map<unsigned long, RunStats> &runstats)
Set the internal run statistics to match those included in the input map. Overwrite performed only for those keys in input.
- Parameters:
runstats – The input map between global function index and statistics
-
inline RunStats &operator[](unsigned long id)
Get an element of the internal map.
- Parameters:
id – The model index
-
inline const std::unordered_map<unsigned long, RunStats> &get_runstats() const
Get the internal map between model index and statistics.
-
virtual nlohmann::json get_algorithm_params(const unsigned long model_idx) const override
Get the algorithm parameters associated with a given model index.
-
virtual std::unordered_map<unsigned long, nlohmann::json> get_all_algorithm_params() const override
Get the algorithm parameters for all model indices. Returns a map of model index to JSON-formatted parameters. Parameter format is algorithm dependent.
-
virtual nlohmann::json serialize_json() const override
Serialize the set of algorithm parameters in JSON form for purpose of inter-run persistence, format may differ from the above.
-
virtual void deserialize_json(const nlohmann::json &from) override
De-serialize the set of algorithm parameters from JSON form created by serialize_json.
Public Static Functions
-
static std::string serialize_cerealpb(const std::unordered_map<unsigned long, RunStats> &runstats)
Convert a run statistics mapping into a Cereal portable binary representation.
- Parameters:
runstats – The run stats mapping
- Returns:
Run statistics in string format
-
static void deserialize_cerealpb(const std::string &parameters, std::unordered_map<unsigned long, RunStats> &runstats)
Convert a run statistics Cereal portable binary representation string into a map.
- Parameters:
parameters – [in] The parameter string
runstats – [out] The map between global function index and statistics
Protected Functions
Parameter Server
The parameter server runs on the head node and aggregates function anomaly and counter statistics for visualization. Aggregated statistics for function executions are also maintained and synchronized back to the AD instances such that the anomaly detection algorithm uses the most complete statistics to identify anomalies.
AnomalyStat
global_anomaly_stats
global_counter_stats
PSglobalFunctionIndexMap
-
namespace chimbuko
-
namespace modules
-
namespace performance_analysis
-
class PSglobalFunctionIndexMap
- #include <PSglobalFunctionIndexMap.hpp>
A class that maintains a global mapping between function name and an index, which is to be synchronized over the nodes.
Public Functions
-
inline PSglobalFunctionIndexMap()
Construct an empty map and initialize the next unassigned index.
-
unsigned long lookup(unsigned long pid, const std::string &func_name)
Lookup a function by name and return the index. A new index will be assigned if the function has not been encountered before.
- Parameters:
pid – The program index
func_name – The function name
-
bool contains(unsigned long pid, const std::string &func_name) const
Check if the map contains the specified function.
- Parameters:
pid – The program index
func_name – The function name
-
std::unordered_map<unsigned long, std::pair<unsigned long, std::string>> getFunctionIndexMap() const
Get a map between the function index and the pid/function name.
- Returns:
A map of function index -> (program index, function name)
-
nlohmann::json serialize() const
Serialize the map to a JSON object.
-
void deserialize(const nlohmann::json &fmap)
Set the map to the contents of the JSON object.
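The lookup contract described above (a new index is handed out the first time a program index/function name pair is seen, while contains never assigns one) can be illustrated with a small stand-in class. This is only a sketch: `FunctionIndexMapSketch`, its internal container and the choice of 0 as the first index are assumptions made for illustration and are not Chimbuko's implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for PSglobalFunctionIndexMap; not the real class.
class FunctionIndexMapSketch {
public:
  // Return the index for (pid, func_name), assigning the next free index on first sight.
  unsigned long lookup(unsigned long pid, const std::string &func_name) {
    auto key = std::make_pair(pid, func_name);
    auto it = m_map.find(key);
    if (it != m_map.end()) return it->second;
    unsigned long idx = m_next++;
    m_map.emplace(key, idx);
    return idx;
  }

  // True if (pid, func_name) already has an index; never assigns one.
  bool contains(unsigned long pid, const std::string &func_name) const {
    return m_map.count(std::make_pair(pid, func_name)) != 0;
  }

private:
  std::map<std::pair<unsigned long, std::string>, unsigned long> m_map;
  unsigned long m_next = 0; // next unassigned index (starting at 0 is an assumption)
};
```

Repeated lookups of the same pair return the same index, and the same function name under a different program index is treated as a distinct entry.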
-
class NetPayloadGlobalFunctionIndexMap : public chimbuko::NetPayloadBase
- #include <PSglobalFunctionIndexMap.hpp>
Net payload for communicating function index pserver->AD.
Public Functions
-
inline NetPayloadGlobalFunctionIndexMap(PSglobalFunctionIndexMap *idxmap)
-
inline virtual int kind() const override
The message kind to which the payload is to be bound.
Chimbuko core reserves the range -1 to -infinity; modules must use the range 0 to +infinity.
-
inline virtual MessageType type() const override
The message type to which the payload is to be bound.
Private Members
-
PSglobalFunctionIndexMap *m_idxmap
-
class NetPayloadGlobalFunctionIndexMapBatched : public chimbuko::NetPayloadBase
- #include <PSglobalFunctionIndexMap.hpp>
Net payload for communicating function index pserver->AD in batches.
Public Functions
-
inline NetPayloadGlobalFunctionIndexMapBatched(PSglobalFunctionIndexMap *idxmap)
-
inline virtual int kind() const override
The message kind to which the payload is to be bound.
Chimbuko core reserves the range -1 to -infinity; modules must use the range 0 to +infinity.
-
inline virtual MessageType type() const override
The message type to which the payload is to be bound.
Private Members
-
PSglobalFunctionIndexMap *m_idxmap
PSProvenanceDBclient
Warning
doxygenfile: Cannot find file “PSProvenanceDBclient.hpp
PSstatSender
-
namespace chimbuko
-
struct PSstatSenderPayloadBase
- #include <PSstatSender.hpp>
Base class for wrappers around objects/object pointers that return JSON objects that are sent to the parameter server.
The JSON objects are collected into a single object whose members are tagged according to the “tag” provided by the wrapper. Nothing will be sent if the resulting JSON object is empty.
Subclassed by chimbuko::PSstatSenderCreatedAtTimestampPayload, chimbuko::PSstatSenderVersionPayload, chimbuko::modules::performance_analysis::PSstatSenderGlobalAnomalyMetricsCombinePayload, chimbuko::modules::performance_analysis::PSstatSenderGlobalAnomalyMetricsPayload, chimbuko::modules::performance_analysis::PSstatSenderGlobalAnomalyStatsCombinePayload, chimbuko::modules::performance_analysis::PSstatSenderGlobalAnomalyStatsPayload, chimbuko::modules::performance_analysis::PSstatSenderGlobalCounterStatsCombinePayload, chimbuko::modules::performance_analysis::PSstatSenderGlobalCounterStatsPayload
Public Functions
-
virtual void add_json(nlohmann::json &into) const = 0
Add the JSON object payload to ‘into’ as a new member with an appropriate tag (user should ensure no duplicate tags!)
-
inline virtual bool do_fetch() const
Whether to request a callback to process the response (optional)
-
inline virtual void process_callback(const std::string &packet, const std::string &returned) const
If a callback is requested, this function is called after the response is returned.
- Parameters:
packet – The string packet returned by the previous call to get_json()
returned – The string returned in response
-
inline virtual ~PSstatSenderPayloadBase()
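The collection step described for this base class (each payload adds its own tagged member to a shared object, and nothing is sent if that object ends up empty) might look roughly as follows. This is a sketch under stated assumptions: `JsonSketch` is a flat string map standing in for nlohmann::json, and the payload types, the `collect` helper and the tag values are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Stand-in for nlohmann::json: a flat tag -> value map (an assumption for this sketch).
using JsonSketch = std::map<std::string, std::string>;

// Mirrors the documented add_json contract of PSstatSenderPayloadBase.
struct PayloadSketch {
  virtual void add_json(JsonSketch &into) const = 0;
  virtual ~PayloadSketch() {}
};

// Hypothetical payload that always adds a "version" entry (the value is made up).
struct VersionPayloadSketch : PayloadSketch {
  void add_json(JsonSketch &into) const override { into["version"] = "1"; }
};

// Hypothetical payload that adds nothing when it has no data.
struct MaybeEmptyPayloadSketch : PayloadSketch {
  std::string data;
  void add_json(JsonSketch &into) const override {
    if (!data.empty()) into["stats"] = data;
  }
};

// Collect every payload into one object; a caller would skip the send if it is empty.
inline JsonSketch collect(const std::vector<std::unique_ptr<PayloadSketch>> &payloads) {
  JsonSketch out;
  for (const auto &p : payloads) p->add_json(out);
  return out;
}
```

Because every payload writes into the same object, two payloads using the same tag would overwrite each other, which is why the documentation asks the user to avoid duplicate tags.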
-
struct PSstatSenderCreatedAtTimestampPayload : public chimbuko::PSstatSenderPayloadBase
- #include <PSstatSender.hpp>
A payload to insert a “created_at” timestamp entry into the record.
Public Functions
-
virtual void add_json(nlohmann::json &into) const override
Add the timestamp to ‘into’.
-
struct PSstatSenderVersionPayload : public chimbuko::PSstatSenderPayloadBase
- #include <PSstatSender.hpp>
A payload to insert a “version” entry into the record.
Public Functions
-
virtual void add_json(nlohmann::json &into) const override
Add the version number to ‘into’.
-
class PSstatSender
- #include <PSstatSender.hpp>
A class that periodically sends aggregate statistics to the visualization module via curl using a background thread.
Public Functions
-
PSstatSender(size_t send_freq = 1000)
Constructor.
- Parameters:
send_freq – The frequency (in milliseconds) at which sends are performed to the viz module
-
~PSstatSender()
-
inline void set_send_freq(const size_t freq)
Change the frequency (in milliseconds) at which sends are performed to the viz module. Must be set prior to calling run_stat_sender.
-
void run_stat_sender(const std::string &url, const std::string &stat_save_dir = "")
Start sending global anomaly stats to the visualization module (curl)
- Parameters:
url – The URL of the visualization module
stat_save_dir – Optionally output the stats to disk in this directory alongside/instead of to the viz module
-
void stop_stat_sender(int wait_msec = 0)
Stop sending global anomaly stats to the visualization module (curl)
-
inline void add_payload(PSstatSenderPayloadBase *payload)
Add a payload. Takes ownership of the pointer and is responsible for freeing it.
-
inline bool bad() const
If an exception is caught in the thread loop, the thread will stop issuing sends and set this bool to true.
Private Members
-
size_t m_send_freq
Interval (in milliseconds) between sends to viz
-
std::atomic_bool m_bad
If an exception is caught in the thread loop, the thread will stop issuing sends and set this bool to true
-
std::vector<PSstatSenderPayloadBase*> m_payloads
Vector of payload wrappers defining the sets of data sent to the parameter server
Network
The network is the communication pathway between the AD instances and the parameter server. The default implementation, ZMQNet, uses ZeroMQ; a deprecated MPI interface is also provided and can be selected at compile time.
NetInterface
-
namespace chimbuko
Enums
-
class NetPayloadBase
Subclassed by chimbuko::NetPayloadGetParams, chimbuko::NetPayloadGetParamsFromManager, chimbuko::NetPayloadHandShake, chimbuko::NetPayloadPing, chimbuko::NetPayloadUpdateParamManager, chimbuko::NetPayloadUpdateParams, chimbuko::modules::performance_analysis::NetPayloadGlobalFunctionIndexMap, chimbuko::modules::performance_analysis::NetPayloadGlobalFunctionIndexMapBatched, chimbuko::modules::performance_analysis::NetPayloadRecvCombinedADdata, chimbuko::modules::performance_analysis::NetPayloadRecvCombinedADdataArray, chimbuko::modules::performance_analysis::NetPayloadUpdateAnomalyMetrics, chimbuko::modules::performance_analysis::NetPayloadUpdateAnomalyStats, chimbuko::modules::performance_analysis::NetPayloadUpdateCounterStats
Public Functions
-
virtual int kind() const = 0
The message kind to which the payload is to be bound.
Chimbuko core reserves the range -1 to -infinity; modules must use the range 0 to +infinity.
-
virtual MessageType type() const = 0
The message type to which the payload is to be bound.
-
virtual void action(Message &response, const Message &message) = 0
Act on the message and formulate a response.
-
inline void check(const Message &msg) const
Helper function to ensure the message is of the correct kind/type.
-
inline virtual ~NetPayloadBase()
-
class NetPayloadHandShake : public chimbuko::NetPayloadBase
- #include <net.hpp>
Default handshake response; this is bound automatically to the network.
Public Functions
-
inline virtual int kind() const override
The message kind to which the payload is to be bound.
Chimbuko core reserves the range -1 to -infinity; modules must use the range 0 to +infinity.
-
inline virtual MessageType type() const override
The message type to which the payload is to be bound.
-
class NetPayloadPing : public chimbuko::NetPayloadBase
- #include <net.hpp>
Ping command.
Public Functions
-
inline virtual int kind() const override
The message kind to which the payload is to be bound.
Chimbuko core reserves the range -1 to -infinity; modules must use the range 0 to +infinity.
-
inline virtual MessageType type() const override
The message type to which the payload is to be bound.
-
class NetInterface
- #include <net.hpp>
Network interface class.
Subclassed by chimbuko::LocalNet, chimbuko::ZMQMENet, chimbuko::ZMQNet
Public Types
-
typedef std::unordered_map<int, std::unordered_map<MessageType, std::unique_ptr<NetPayloadBase>>> PayloadMapType
Map of message kind/type to payloads
-
typedef std::unordered_map<int, PayloadMapType> WorkerPayloadMapType
Map of worker index and message type to payloads
Public Functions
-
NetInterface()
Construct a new Net Interface object.
-
virtual ~NetInterface()
Destroy the Net Interface object.
-
virtual void init(int *argc = nullptr, char ***argv = nullptr, int nt = 1) = 0
(virtual) initialize network interface
- Parameters:
argc – command line argc
argv – command line argv
nt – the number of threads for a thread pool
-
virtual void finalize() = 0
(virtual) finalize network
-
virtual void run() = 0
(virtual) Run network server
-
virtual void stop() = 0
(virtual) stop network server
-
virtual std::string name() const = 0
(virtual) name of network server
- Returns:
std::string name of network server
-
void add_payload(NetPayloadBase *payload, int worker_idx = 0)
Add a payload to the receiver bound to particular message kind/type specified internally.
Assumes ownership of the NetPayloadBase object and deletes it in the destructor. The meaning of worker_idx depends on the implementation:
ZMQNet - worker_idx corresponds to the worker thread; payloads must be assigned to each thread
MPINet - always use 0
ZMQMENet - worker_idx corresponds to the endpoint thread
- Parameters:
payload – The payload
worker_idx – The worker index to which the payload is bound (implementation defined, see above)
Public Static Functions
-
static void find_and_perform_action(int worker_id, Message &msg_reply, const Message &msg, const WorkerPayloadMapType &payloads)
Find the action associated with the given worker_id and message type and perform the action.
- Parameters:
worker_id – The worker index
msg_reply – The reply message
msg – The input message
payloads – The map of worker/message type to payload
-
static void find_and_perform_action(Message &msg_reply, const Message &msg, const PayloadMapType &payloads)
Find the action associated with the given message type and perform the action.
- Parameters:
msg_reply – The reply message
msg – The input message
payloads – The map of message kind/type to payload
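The binding of payloads to a message kind and type, and the lookup performed by find_and_perform_action, can be sketched with simplified stand-in types. Everything suffixed `Sketch` below is hypothetical, the two free functions only imitate the documented member functions, and throwing on a missing binding is an assumption rather than documented behaviour.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Simplified stand-ins for chimbuko::MessageType and chimbuko::Message (assumptions).
enum class MsgTypeSketch { REQ_ECHO, REQ_GET };
struct MessageSketch {
  int kind = 0;
  MsgTypeSketch type = MsgTypeSketch::REQ_ECHO;
  std::string content;
};

// Mirrors the documented NetPayloadBase interface: kind(), type() and action().
struct PayloadSketch {
  virtual int kind() const = 0;
  virtual MsgTypeSketch type() const = 0;
  virtual void action(MessageSketch &response, const MessageSketch &message) = 0;
  virtual ~PayloadSketch() {}
};

// Hypothetical module payload; kind 0 lies in the range left to modules.
struct EchoPayloadSketch : PayloadSketch {
  int kind() const override { return 0; }
  MsgTypeSketch type() const override { return MsgTypeSketch::REQ_ECHO; }
  void action(MessageSketch &response, const MessageSketch &message) override {
    response.content = message.content;
  }
};

// kind -> type -> payload, following the shape of PayloadMapType.
using PayloadMapSketch =
    std::map<int, std::map<MsgTypeSketch, std::unique_ptr<PayloadSketch>>>;

// The payload itself reports the kind/type it is bound to; the map takes ownership.
inline void add_payload(PayloadMapSketch &map, PayloadSketch *payload) {
  map[payload->kind()][payload->type()].reset(payload);
}

// Look up the payload for the message's kind/type and run its action.
// Throwing when no payload is bound is an assumption of this sketch.
inline void find_and_perform_action(MessageSketch &reply, const MessageSketch &msg,
                                    const PayloadMapSketch &map) {
  auto kit = map.find(msg.kind);
  if (kit == map.end()) throw std::runtime_error("no payload for message kind");
  auto tit = kit->second.find(msg.type);
  if (tit == kit->second.end()) throw std::runtime_error("no payload for message type");
  tit->second->action(reply, msg);
}
```

The per-worker variant adds one more map level keyed by the worker index, so that each worker or endpoint thread owns its own set of payloads.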
Protected Functions
-
virtual void init_thread_pool(int nt) = 0
initialize thread pool
- Parameters:
nt – the number of threads in the pool
Protected Attributes
-
int m_nt
The number of threads in the pool
-
WorkerPayloadMapType m_payloads
Map of worker index (implementation defined), message kind and message type to a payload
-
namespace DefaultNetInterface
Functions
-
NetInterface &get()
Get the default network interface, for ease of use.
- Returns:
NetInterface& default network
MPINet
ZMQNet
-
namespace chimbuko
-
class ZMQNet : public chimbuko::NetInterface
- #include <zmq_net.hpp>
A network interface using ZeroMQ.
Public Types
Public Functions
-
ZMQNet()
-
~ZMQNet()
-
virtual void init(int *argc, char ***argv, int nt) override
(virtual) initialize network interface
- Parameters:
argc – command line argc
argv – command line argv
nt – the number of threads for a thread pool
-
virtual void finalize() override
Finalize network.
-
virtual void run() override
(virtual) Run network server
-
virtual void stop() override
Stop network server.
-
inline virtual std::string name() const override
Name of network server.
- Returns:
std::string name of network server
-
inline void setMaxMsgPerPollCycle(const int max_msg)
Set the maximum number of messages that the router thread will route front->back and back->front per poll cycle.
-
inline void setIOthreads(const int nt)
Set the number of IO threads used by ZeroMQ (default 1). Must be called prior to init(…)
-
inline void setPort(const int port)
Set the port upon which the connection is made. Must be called prior to run(..). Default 5559.
-
inline void setAutoShutdown(const bool to)
Set the rule for automatic shutdown once all clients have disconnected (default true)
-
inline void setTimeOut(const long time_ms)
Set the timeout on polling for client requests or responses from worker threads (-1 = no timeout [default])
Public Static Functions
Protected Functions
-
virtual void init_thread_pool(int nt) override
initialize thread pool
- Parameters:
nt – the number of threads in the pool
Private Functions
-
int recvAndSend(void *skFrom, void *skTo, int max_msg)
Route a message to/from worker thread pool.
- Parameters:
skFrom – ZMQ origin socket
skTo – ZMQ destination socket
max_msg – The maximum number of messages this function will attempt to drain from the queue (including disconnect message)
- Returns:
the number of messages routed
Private Members
-
void *m_context
ZeroMQ context pointer
-
long long m_n_requests
Accumulated number of RPC requests
-
std::vector<PerfStats> m_perf_thr
Performance monitoring for worker threads; will be combined with m_perf before write
-
mutable std::vector<std::mutex> m_thr_mutex
Mutexes for locking between main thread and individual worker threads
-
int m_max_pollcyc_msg
Maximum number of front->back and back->front messages that will be routed per poll cycle. Too many and we risk starving a socket; too few and we might hit performance issues.
-
int m_io_threads
The number of IO threads used by ZeroMQ (default 1)
-
int m_clients
Number of connected clients
-
bool m_client_has_connected
At least one client has connected previously
-
int m_port
The port upon which the net connects
-
bool m_autoshutdown
The network will shut down once all clients have disconnected
-
long m_poll_timeout
The timeout (in ms) after which, if there is no activity, the network will shut down (default -1: infinite)
-
bool m_remote_stop_cmd
Registration of requests for server to stop issued by clients
ZMQMENet
-
namespace chimbuko
-
class ZMQMENet : public chimbuko::NetInterface
- #include <zmqme_net.hpp>
A multi-endpoint, multi-threaded interface using ZeroMQ.
Public Functions
-
ZMQMENet()
-
~ZMQMENet()
-
virtual void init(int *argc, char ***argv, int nt) override
(virtual) initialize network interface
- Parameters:
argc – command line argc
argv – command line argv
nt – the number of endpoint threads
-
virtual void finalize() override
Finalize network; blocking wait for worker threads to finish.
-
virtual void run() override
(virtual) Run network server
-
virtual void stop() override
Stop network server.
-
inline virtual std::string name() const override
Name of network server.
- Returns:
std::string name of network server
-
inline void setBasePort(const int port)
Set the base port upon which the connection is made (thread ports are offset by thread index). Must be called prior to run(..). Default 5559.
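As a small worked example of the port offset mentioned above, the helper below lists the ports that a set of endpoint threads would use. It assumes the simplest reading of the description, namely that endpoint thread i uses base port + i; the helper name `endpoint_ports` is hypothetical and the actual assignment should be checked against the ZMQMENet source.

```cpp
#include <cassert>
#include <vector>

// Assumption: endpoint thread i listens on base_port + i.
inline std::vector<int> endpoint_ports(int base_port, int nthread) {
  std::vector<int> ports;
  for (int i = 0; i < nthread; ++i) ports.push_back(base_port + i);
  return ports;
}
```

Under that assumption, the default base port 5559 with three endpoint threads gives ports 5559, 5560 and 5561, so a contiguous block of ports needs to be free on the head node.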
Public Static Functions
Protected Functions
-
virtual void init_thread_pool(int nt) override
initialize thread pool
- Parameters:
nt – the number of threads in the pool
Private Members
-
int m_base_port
Port of first endpoint
-
int m_nthread
Number of endpoint threads
-
std::vector<PerfStats> m_perf_thr
Performance monitoring for worker threads; will be combined with m_perf before write
-
std::vector<int> m_clients_thr
Tracker of number of connected clients, used to determine when a thread exits
-
bool m_finalized
Has previously been finalized
Message
-
namespace chimbuko
Enums
-
enum MessageType
Enum of the message “type” or action.
Values:
-
enumerator REQ_ADD
-
enumerator REQ_GET
-
enumerator REQ_CMD
-
enumerator REQ_QUIT
-
enumerator REQ_ECHO
-
enumerator REP_ADD
-
enumerator REP_GET
-
enumerator REP_CMD
-
enumerator REP_QUIT
-
enumerator REP_ECHO
-
enum BuiltinMessageKind
Enum of the message “kind” describing the context of the action for builtin actions.
NOTE: Builtin actions reserve the range -1 to -infinity for values; modules must use the range 0 to +infinity for defined functionality
Values:
-
enumerator DEFAULT
-
enumerator CMD
-
enumerator PARAMETERS
Functions
-
std::string toString(const MessageType m)
-
std::string toString(const BuiltinMessageKind m)
-
std::string toString(const MessageCmd m)
-
class Message
- #include <message.hpp>
A class containing a message and header that can be serialized in JSON form for communication.
Public Functions
-
void set_info(int src, int dst, int type, int kind, int frame = 0, int size = 0)
Set the message information (header)
- Parameters:
src – source rank
dst – destination rank
type – message type
kind – message kind
frame – frame index
size – message size
-
void deserializeMessage(const std::string &from)
Unpack the string into this object, setting the header and content as appropriate.
-
inline int src() const
Get the origin rank.
-
inline int dst() const
Get the destination rank.
-
inline int type() const
Get the message type.
-
inline int kind() const
Get the message kind.
-
inline int size() const
Get the message size in bytes.
-
inline int frame() const
Get the message io frame (step)
-
inline void clear()
clear data buffer
-
class Header
Public Functions
-
inline Header()
header size in bytes
-
inline int &src()
source rank
- Returns:
int& reference to the source rank
-
inline int src() const
-
inline int &dst()
destination rank
- Returns:
int& reference to the destination rank
-
inline int dst() const
-
inline int &type()
message type
- Returns:
int& reference to the message type
-
inline int type() const
-
inline int &kind()
message kind
- Returns:
int& reference to the message kind
-
inline int kind() const
-
inline int &size()
message size
- Returns:
int& reference to the message size
-
inline int size() const
-
inline int &frame()
message frame index
- Returns:
int& reference to the message frame index
-
inline int frame() const
-
nlohmann::json get_json() const
-
void set_header(const nlohmann::json &j)
Private Members
-
int m_h[8]
header information
0: src rank
1: dst rank
2: message type
3: message kind
4: message size (except header) in bytes
5: frame index (or step index)
6: reserved
7: reserved
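The slot layout of the header array can be made concrete with a small stand-in class whose accessors map onto the eight integers in the documented order. `HeaderSketch` is hypothetical; zero-initializing the slots is an assumption, and the JSON conversion functions of the real Header are omitted.

```cpp
#include <cassert>

// Hypothetical stand-in for Message::Header using the documented slot layout.
class HeaderSketch {
public:
  HeaderSketch() : m_h{0, 0, 0, 0, 0, 0, 0, 0} {} // zero-initialized (assumption)

  // Mutable accessors return references into the underlying array.
  int &src() { return m_h[0]; }
  int &dst() { return m_h[1]; }
  int &type() { return m_h[2]; }
  int &kind() { return m_h[3]; }
  int &size() { return m_h[4]; }
  int &frame() { return m_h[5]; }

  // Read-only accessors for const headers.
  int src() const { return m_h[0]; }
  int dst() const { return m_h[1]; }
  int type() const { return m_h[2]; }
  int kind() const { return m_h[3]; }
  int size() const { return m_h[4]; }
  int frame() const { return m_h[5]; }

private:
  int m_h[8]; // slots 6 and 7 are reserved
};
```

Keeping the header as a fixed array of integers means it has a constant size regardless of the message content, with the content size recorded in slot 4.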
Utils
Utility functions and classes.
ADIOS2parseUtils
-
namespace chimbuko
Functions
-
std::ostream &operator<<(std::ostream &os, const mapPrint &mp)
ostream output of a map using mapPrint wrapper
-
template<typename T>
std::ostream &operator<<(std::ostream &os, const vecPrint<T> &mp)
ostream output of a vector using vecPrint wrapper
-
varBase *parseVariable(const std::string &name, const std::map<std::string, std::string> &varinfo, adios2::IO &io, adios2::Engine &eng)
A factory for generating varBase derived class instances that contain the data read from the input stream.
Returns a NULL pointer if the type is not supported. The name/varinfo data can be obtained using the adios2::IO::AvailableVariables method.
-
struct mapPrint
- #include <ADIOS2parseUtils.hpp>
Wrapper allowing ostream output of a string map object.
-
template<typename T>
struct vecPrint
- #include <ADIOS2parseUtils.hpp>
Wrapper allowing ostream output of a vector object.
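The wrapper idiom used by mapPrint and vecPrint (wrap the container in a thin struct so that a dedicated operator<< overload can be selected) can be sketched as below. `VecPrintSketch`, the helper `to_text` and the "(a,b,c)" output format are assumptions for illustration and may differ from what the real wrappers print.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical wrapper: holds a reference to the vector to be printed.
template <typename T>
struct VecPrintSketch {
  const std::vector<T> &vec;
  explicit VecPrintSketch(const std::vector<T> &v) : vec(v) {}
};

// The output format "(a,b,c)" is chosen for illustration only.
template <typename T>
std::ostream &operator<<(std::ostream &os, const VecPrintSketch<T> &vp) {
  os << "(";
  for (std::size_t i = 0; i < vp.vec.size(); ++i) {
    if (i > 0) os << ",";
    os << vp.vec[i];
  }
  return os << ")";
}

// Convenience helper that renders a vector of ints through the wrapper.
inline std::string to_text(const std::vector<int> &v) {
  std::ostringstream os;
  os << VecPrintSketch<int>(v);
  return os.str();
}
```

Using a wrapper type avoids defining operator<< directly on std::vector or std::map, which could clash with overloads defined elsewhere.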
-
struct varBase
- #include <ADIOS2parseUtils.hpp>
Abstract interface for an object that reads, stores and outputs data or arrays of data from ADIOS2 streams.
Subclassed by chimbuko::varPOD< T >, chimbuko::varTensor< T >
-
template<typename T>
class varPOD : public chimbuko::varBase
- #include <ADIOS2parseUtils.hpp>
Capture POD (single-value) data.
-
template<typename T>
class varTensor : public chimbuko::varBase
- #include <ADIOS2parseUtils.hpp>
Capture multi-dimensional tensor data.
Public Functions
-
inline varTensor(const std::string &name, const std::vector<unsigned long> &shape, adios2::IO &io, adios2::Engine &eng)
-
inline virtual void get(adios2::IO &io, adios2::Engine &eng)
Read the variable from the ADIOS2 stream.
-
inline virtual void put(adios2::IO &io, adios2::Engine &eng)
Write the variable to the ADIOS2 stream.
-
inline T &operator()(const std::vector<unsigned long> &coord)
Get the value at given coordinate (non-const)
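Accessing a tensor element by coordinate requires mapping the coordinate vector onto a position in flat storage. The sketch below does this for a hypothetical `TensorSketch` class, assuming row-major ordering and bounds checking; the storage order and error handling used by varTensor are not stated above, so both are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical tensor with row-major storage (the real storage order is not stated above).
template <typename T>
class TensorSketch {
public:
  explicit TensorSketch(const std::vector<unsigned long> &shape) : m_shape(shape) {
    std::size_t n = 1;
    for (unsigned long d : shape) n *= d;
    m_data.assign(n, T());
  }

  // Map a multi-dimensional coordinate onto the flat storage and return the element.
  T &operator()(const std::vector<unsigned long> &coord) {
    if (coord.size() != m_shape.size())
      throw std::out_of_range("wrong number of coordinates");
    std::size_t off = 0;
    for (std::size_t i = 0; i < coord.size(); ++i) {
      if (coord[i] >= m_shape[i]) throw std::out_of_range("coordinate out of bounds");
      off = off * m_shape[i] + coord[i]; // the last index varies fastest
    }
    return m_data[off];
  }

  std::size_t size() const { return m_data.size(); }

private:
  std::vector<unsigned long> m_shape;
  std::vector<T> m_data;
};
```

For a shape of (2, 3), the coordinate (1, 2) maps to offset 1 * 3 + 2 = 5, the last of the six elements.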
Anomalies
Warning
doxygenfile: Cannot find file “Anomalies.hpp
barrier
commandLineParser
Defines
-
_WRP_P_0(PARENT, CHILD)
Macros for generating the structure list needed for addOptionalArgMultiArg.
-
_WRP_P_1(PARENT, CHILD)
-
_WRP_P_2(PARENT, CHILD, ...)
-
_WRP_P_3(PARENT, CHILD, ...)
-
_WRP_P_4(PARENT, CHILD, ...)
-
_WRP_P_5(PARENT, CHILD, ...)
-
_WRP_P_GET_MACRO(_0, _1, _2, _3, _4, _5, NAME, ...)
-
WRAP_PARENT(PARENT, ...)
-
addOptionalCommandLineArg(PARSER, PARENT, NAME, HELP_STR)
Helper macro to add an optional command line arg to the parser PARSER for member with given name NAME, PARENT struct instance, and help string HELP_STR. Option enabled by “-NAME” on command line.
-
addOptionalCommandLineArgWithDefault(PARSER, PARENT, NAME, DEFAULT, HELP_STR)
Helper macro to add an optional command line arg to the parser PARSER for member with given name NAME, default value DEFAULT, PARENT struct instance, and help string HELP_STR. Option enabled by “-NAME” on command line.
-
addOptionalCommandLineArgDefaultHelpString(PARSER, PARENT, NAME)
Helper macro to add an optional command line arg to the parser PARSER for member with given name NAME, PARENT struct instance, and default help string “Provide the value for NAME”.
-
addOptionalCommandLineArgWithFlag(PARSER, PARENT, NAME, FLAGNAME, HELP_STR)
Helper macro to add an optional command line arg to the parser PARSER for member with given name NAME and flag with FLAGNAME both belonging to PARENT struct instance, and help string HELP_STR. Option enabled by “-NAME” on command line. FLAGNAME will be set true if parsed.
-
addOptionalCommandLineArgMultiValue(PARSER, PARENT, ARG_NAME, HELP_STR, ...)
Helper macro to add an optional command line arg to the parser PARSER with argument name -${ARG_NAME} which sets multiple variables.
Macro supports up to 5 members
Example usage: addOptionalCommandLineArgMultiValue(parser_instance, struct_instance, set_2vals, “the help”, a, b), called with -set_2vals 1 2, will set the struct_instance members a and b to 1 and 2, respectively.
-
addOptionalCommandLineArgOptArg(PARSER, PARENT, NAME, OPT_ARG, HELP_STR)
Helper macro to add an optional command line arg to the parser PARSER with given member name NAME, PARENT struct instance, and help string HELP_STR. Option enabled by “-OPT_ARG” on command line.
-
addOptionalCommandLineArgOptArgWithDefault(PARSER, PARENT, NAME, OPT_ARG, DEFAULT, HELP_STR)
Helper macro to add an optional command line arg to the parser PARSER with given member name NAME, default value DEFAULT, PARENT struct instance, and help string HELP_STR. Option enabled by “-OPT_ARG” on command line.
-
addMandatoryCommandLineArg(PARSER, PARENT, NAME, HELP_STR)
Helper macro to add a mandatory command line arg to the parser PARSER with given member name NAME, PARENT struct instance, and help string HELP_STR.
-
addMandatoryCommandLineArgDefaultHelpString(PARSER, PARENT, NAME)
Helper macro to add a mandatory command line arg to the parser PARSER with given member name NAME, PARENT struct instance, and default help string “Provide the value for NAME”.
-
namespace chimbuko
-
-
class optionalCommandLineArgBase
- #include <commandLineParser.hpp>
Base class for optional arg parsing structs.
Subclassed by chimbuko::ADOutlier::AlgoParams::cmdlineParser, chimbuko::optionalCommandLineArg< MemberType >, chimbuko::optionalCommandLineArgMultiValue< MemberTypes >, chimbuko::optionalCommandLineArgWithFlag< MemberType >
Public Functions
-
virtual int parse(const std::string &arg, const char **vals, const int vals_size) = 0
If the first string matches the internal arg string (eg “-help”), a number of strings are consumed from the array ‘vals’ and that number returned. A value of -1 indicates the argument did not match.
- Parameters:
vals – An array of strings
vals_size – The length of the string array
-
virtual void help(std::ostream &os) const = 0
Print the help string for this argument to the ostream.
-
inline virtual ~optionalCommandLineArgBase()
-
template<typename MemberType>
class optionalCommandLineArg : public chimbuko::optionalCommandLineArgBase
- #include <commandLineParser.hpp>
A class that parses an argument of a given type into the struct.
Public Functions
-
inline optionalCommandLineArg(MemberType &member, const std::string &arg, const std::string &help_str)
Create an instance with the provided argument and help string.
-
inline virtual int parse(const std::string &arg, const char **vals, const int vals_size) override
If the first string matches the internal arg string (eg “-help”), a number of strings are consumed from the array ‘vals’ and that number returned. A value of -1 indicates the argument did not match.
- Parameters:
vals – An array of strings
vals_size – The length of the string array
-
template<typename MemberType>
class optionalCommandLineArgWithFlag : public chimbuko::optionalCommandLineArgBase
- #include <commandLineParser.hpp>
A class that parses an argument of a given type into the struct and sets a bool flag argument to true.
Public Functions
-
inline optionalCommandLineArgWithFlag(MemberType &member, bool &flag, const std::string &arg, const std::string &help_str)
Create an instance with the provided argument and help string.
-
inline virtual int parse(const std::string &arg, const char **vals, const int vals_size) override
If the first string matches the internal arg string (eg “-help”), a number of strings are consumed from the array ‘vals’ and that number returned. A value of -1 indicates the argument did not match.
- Parameters:
vals – An array of strings
vals_size – The length of the string array
-
template<typename MemberType, class ...RemainingMemberTypes>
struct optionalCommandLineArgMultiValue_parse
- #include <commandLineParser.hpp>
Recursive template class for parsing multiple values.
Public Functions
-
inline optionalCommandLineArgMultiValue_parse(MemberType &member, RemainingMemberTypes&... rem)
-
inline void parse(const char **vals)
Public Members
-
MemberType &member
-
template<typename MemberType>
struct optionalCommandLineArgMultiValue_parse<MemberType>
Public Functions
-
inline optionalCommandLineArgMultiValue_parse(MemberType &member)
-
inline void parse(const char **vals)
Public Members
-
MemberType &member
-
template<class ...MemberTypes>
class optionalCommandLineArgMultiValue : public chimbuko::optionalCommandLineArgBase
- #include <commandLineParser.hpp>
A class that parses an argument of a given type into the struct with multiple values.
Public Functions
-
inline optionalCommandLineArgMultiValue(MemberTypes&... members, const std::string &arg, const std::string &help_str)
Create an instance with the provided argument and help string.
-
inline virtual int parse(const std::string &arg, const char **vals, const int vals_size) override
If the first string matches the internal arg string (eg “-help”), a number of strings are consumed from the array ‘vals’ and that number returned. A value of -1 indicates the argument did not match.
- Parameters:
vals – An array of strings
vals_size – The length of the string array
Public Static Attributes
-
static constexpr int NValues = std::tuple_size<std::tuple<MemberTypes...>>::value
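The recursive multi-value parsing and the parse return convention (the number of strings consumed, or -1 when the argument does not match) can be sketched together. The code below reproduces the "-set_2vals 1 2" example with hypothetical names (`parse_value`, `parse_values`, `TwoValsSketch`, `parse_set_2vals`); throwing when too few values are supplied is an assumption, not documented behaviour.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Convert one string into a value of type T (hypothetical helper).
template <typename T>
void parse_value(T &into, const char *from) {
  std::istringstream ss(from);
  if (!(ss >> into)) throw std::runtime_error(std::string("cannot parse ") + from);
}

// Base case: one member left.
template <typename T>
void parse_values(const char **vals, T &member) {
  parse_value(member, vals[0]);
}

// Recursive case: fill the first member, then recurse on the remaining ones.
template <typename T, typename... Rest>
void parse_values(const char **vals, T &member, Rest &...rest) {
  parse_value(member, vals[0]);
  parse_values(vals + 1, rest...);
}

// Hypothetical argument struct with two members set by one option.
struct TwoValsSketch {
  int a = 0;
  double b = 0.0;
};

// Returns the number of strings consumed, or -1 if 'arg' does not match.
inline int parse_set_2vals(TwoValsSketch &s, const std::string &arg,
                           const char **vals, int vals_size) {
  if (arg != "-set_2vals") return -1;
  if (vals_size < 2) throw std::runtime_error("-set_2vals needs 2 values"); // assumption
  parse_values(vals, s.a, s.b);
  return 2;
}
```

Returning the number of consumed strings lets the calling parser advance past the values before examining the next argument, while -1 tells it to try the next registered option.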
Private Members
-
class mandatoryCommandLineArgBase
- #include <commandLineParser.hpp>
Base class for mandatory arg parsing structs.
Subclassed by chimbuko::mandatoryCommandLineArg< MemberType >
-
template<typename MemberType>
class mandatoryCommandLineArg : public chimbuko::mandatoryCommandLineArgBase
- #include <commandLineParser.hpp>
A class that parses an argument of a given type into the struct.
Public Functions
-
inline mandatoryCommandLineArg(MemberType &member, const std::string &help_str)
Create an instance with the provided argument and help string.
-
class commandLineParser
- #include <commandLineParser.hpp>
The main parser class for a generic struct ArgsStruct.
Public Functions
-
inline void addOptionalArg(optionalCommandLineArgBase *arg_parser)
Add an optional argument parser object. Assumes ownership of pointer.
-
template<typename MemberType>
inline void addOptionalArg(MemberType &member, const std::string &arg, const std::string &help_str)
Add an optional argument bound to the given member object with provided argument (eg “-a”) and help string.
-
template<typename MemberType>
inline void addOptionalArgWithFlag(MemberType &member, bool &flag, const std::string &arg, const std::string &help_str)
Add an optional argument bound to the given member object and a bool flag, with provided argument (eg “-a”) and help string.
-
template<class ...MemberTypes>
inline void addOptionalArgMultiValue(const std::string &arg, const std::string &help_str, MemberTypes&... members)
Add an optional argument that has multiple associated values.
-
template<typename MemberType>
inline void addMandatoryArg(MemberType &member, const std::string &help_str)
Add a mandatory argument with the given type, member pointer (eg &ArgsStruct::a) and help string.
-
inline size_t nMandatoryArgs() const
Get the number of mandatory arguments.
-
inline void parse(const int narg, const char **args)
Parse an array of strings of length ‘narg’ into the structure.
Parsing will commence with first entry of args
-
inline void parseCmdLineArgs(int argc, char **argv)
Parse the command line arguments into the structure.
Parsing will commence with second entry of argv
Private Members
-
std::vector<std::unique_ptr<mandatoryCommandLineArgBase>> m_man_args
Container for the individual mandatory arg parsers
-
std::vector<std::unique_ptr<optionalCommandLineArgBase>> m_opt_args
Container for the individual optional arg parsers
DispatchQueue
-
namespace chimbuko
-
class DispatchQueue
- #include <DispatchQueue.hpp>
A class for dispatching work items over a thread pool.
Public Functions
-
DispatchQueue(std::string name, size_t thread_cnt = 1)
Construct an instance of class, providing a name for the instance and the number of threads.
- Parameters:
name – The name of the instance
thread_cnt – The number of threads (default 1)
-
~DispatchQueue()
-
void dispatch(const fp_t &op)
Enqueue a work item (lvalue reference)
- Parameters:
op – An instance of std::function<void(void)>
-
void dispatch(fp_t &&op)
Enqueue a work item (rvalue reference)
- Parameters:
op – An instance of std::function<void(void)>
-
size_t size()
Return the number of outstanding work items in the queue.
Private Functions
-
void thread_handler(void)
error
-
namespace chimbuko
Functions
-
ErrorWriter &Error()
The global error writer instance.
-
void writeErrorTerminateHandler()
For fatal errors we delay writing the error to the output stream in case it is caught. This terminate handler ensures it is written.
After flushing the error the handler calls the terminateHandlerAbortAction above
-
void set_error_output_stream(const int rank, std::ostream *strm = &std::cerr)
Set the error output of the global error writer to a stream and specify the rank.
-
struct ErrorWriter
- #include <error.hpp>
A class for writing out errors to an output stream.
Public Functions
-
ErrorWriter()
-
inline void setRank(const int rank)
Set the MPI rank. This will add the rank to the error output.
-
void recoverable(const std::string &msg, const std::string &func, const std::string &file, const unsigned long line)
Signal a recoverable error.
Private Functions
hash
-
namespace chimbuko
-
template<typename T, size_t N>
struct ArrayHasher
- #include <hash.hpp>
Hash function for std::array.
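A hasher of this kind is what lets a std::array serve as the key of an unordered container. The sketch below is a hypothetical `ArrayHashSketch`; the combining formula is the widely used boost-style hash_combine, chosen here as an assumption rather than taken from hash.hpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical hasher for std::array; the combining step is an assumption.
template <typename T, std::size_t N>
struct ArrayHashSketch {
  std::size_t operator()(const std::array<T, N> &a) const {
    std::size_t seed = 0;
    for (const T &e : a)
      seed ^= std::hash<T>{}(e) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
    return seed;
  }
};

using Key = std::array<unsigned long, 3>;  // e.g. {process, rank, thread}

std::string array_key_example() {
  std::unordered_map<Key, std::string, ArrayHashSketch<unsigned long, 3>> m;
  m[Key{0, 1, 2}] = "thread 2 of rank 1";
  m[Key{0, 1, 3}] = "thread 3 of rank 1";
  return m.at(Key{0, 1, 2});
}
```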
map
-
namespace chimbuko
Typedefs
Functions
-
template<typename T>
T *getElemPRT(const unsigned long pid, const unsigned long rid, const unsigned long tid, std::unordered_map<unsigned long, std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>> &map)
Get an element from the commonly occurring triple-depth map of process/rank/thread to element (non-const).
- Parameters:
pid – The process index
rid – The rank index
tid – The thread index
map – The map
- Returns:
A pointer to the element if it exists, nullptr otherwise
-
template<typename T>
T const *getElemPRT(const unsigned long pid, const unsigned long rid, const unsigned long tid, const std::unordered_map<unsigned long, std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>> &map)
Get an element from the commonly occurring triple-depth map of process/rank/thread to element (const).
- Parameters:
pid – The process index
rid – The rank index
tid – The thread index
map – The map
- Returns:
A pointer to the element if it exists, nullptr otherwise
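The value of a lookup helper like this is that it never inserts: chaining `operator[]` on a nested std::unordered_map would silently create empty entries for missing keys. A minimal sketch of the documented behaviour follows; `PRTmap`, `getElemSketch` and `lookup_example` are hypothetical names and the code is not copied from map.hpp.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

template <typename T>
using PRTmap = std::unordered_map<
    unsigned long,
    std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>>;

// Hypothetical re-implementation of the documented lookup: returns nullptr
// instead of inserting when any level of the map is missing.
template <typename T>
T *getElemSketch(unsigned long pid, unsigned long rid, unsigned long tid,
                 PRTmap<T> &map) {
  auto pit = map.find(pid);
  if (pit == map.end()) return nullptr;
  auto rit = pit->second.find(rid);
  if (rit == pit->second.end()) return nullptr;
  auto tit = rit->second.find(tid);
  if (tit == rit->second.end()) return nullptr;
  return &tit->second;
}

bool lookup_example() {
  PRTmap<std::string> m;
  m[0][3][1] = "call stack";
  std::string *hit = getElemSketch(0, 3, 1, m);
  std::string *miss = getElemSketch(0, 4, 1, m);  // rank 4 does not exist
  // The failed lookup must not have created an entry for rank 4.
  return hit != nullptr && *hit == "call stack" && miss == nullptr &&
         m[0].size() == 1;
}
```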
-
template<typename T>
std::unordered_map<unsigned long, T> *getMapPR(const unsigned long pid, const unsigned long rid, std::unordered_map<unsigned long, std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>> &map)
Get the map between thread and element from the commonly occurring triple-depth map of process/rank/thread to element (non-const).
- Parameters:
pid – The process index
rid – The rank index
map – The map
- Returns:
A pointer to the map element if it exists, nullptr otherwise
-
template<typename T>
std::unordered_map<unsigned long, T> const *getMapPR(const unsigned long pid, const unsigned long rid, const std::unordered_map<unsigned long, std::unordered_map<unsigned long, std::unordered_map<unsigned long, T>>> &map)
Get the map between thread and element from the commonly occurring triple-depth map of process/rank/thread to element (const).
- Parameters:
pid – The process index
rid – The rank index
map – The map
- Returns:
A pointer to the map element if it exists, nullptr otherwise
-
template<typename T>
struct _plus_equals
- #include <map.hpp>
Implementation of recursive += for unordered map.
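One way to read "recursive +=" is as a template with a base case for plain values and a specialization that merges maps key by key, descending into the mapped type. The sketch below illustrates that reading; `PlusEqualsSketch` and `merge_example` are hypothetical, and the real `_plus_equals` may treat missing keys or non-map types differently.

```cpp
#include <cassert>
#include <unordered_map>

// Base case: plain values are combined with their own operator+=.
template <typename T>
struct PlusEqualsSketch {
  static void doit(T &into, const T &from) { into += from; }
};

// Recursive case: merge maps key by key, descending into the mapped type.
template <typename K, typename V>
struct PlusEqualsSketch<std::unordered_map<K, V>> {
  static void doit(std::unordered_map<K, V> &into,
                   const std::unordered_map<K, V> &from) {
    for (const auto &kv : from) {
      auto it = into.find(kv.first);
      if (it == into.end())
        into.emplace(kv.first, kv.second);  // new key: copy the entry
      else
        PlusEqualsSketch<V>::doit(it->second, kv.second);  // shared key: recurse
    }
  }
};

int merge_example() {
  using Inner = std::unordered_map<unsigned long, int>;
  std::unordered_map<unsigned long, Inner> a, b;
  a[0][0] = 1;
  b[0][0] = 2;  // same keys: values are summed
  b[0][1] = 5;  // new inner key: copied
  b[1][0] = 7;  // new outer key: copied
  PlusEqualsSketch<decltype(a)>::doit(a, b);
  return a[0][0] * 100 + a[0][1] * 10 + a[1][0];
}
```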
memutils
-
namespace chimbuko
mtQueue
-
template<typename T>
class mtQueue
- #include <mtQueue.hpp>
A thread-safe wrapper around a FIFO queue (std::queue).
Public Functions
-
inline mtQueue()
-
inline ~mtQueue()
-
inline bool tryPop(T &out)
Try to obtain a value from the front of the queue.
- Parameters:
out – [out] The value
- Returns:
True if the value is populated, false if the queue is invalid or the queue is empty
-
inline bool waitPop(T &out)
Wait until the queue either has an entry or is invalidated. Value taken from front of queue.
- Parameters:
out – [out] The value
- Returns:
True if queue is valid, false otherwise
-
inline bool empty() const
Return true if the queue is empty.
-
inline void clear()
Remove all entries from the queue.
-
inline void invalidate()
Mark the queue as invalid.
-
inline bool is_valid() const
Check whether the queue is still valid, i.e. has not been invalidated.
-
inline size_t size() const
The number of entries in the queue.
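The interplay of waitPop and invalidate is easiest to see in a producer/consumer loop: the consumer blocks in waitPop and leaves its loop once the queue is invalidated. The following is a minimal sketch under stated assumptions. `MtQueueSketch` is a hypothetical stand-in, its `push` method is assumed because no push function is listed above, and the choice that an invalidated queue refuses to pop even if entries remain is also an assumption.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for mtQueue; not the Chimbuko class.
template <typename T>
class MtQueueSketch {
public:
  void push(T v) {  // assumed interface
    {
      std::lock_guard<std::mutex> lk(m_mtx);
      m_q.push(std::move(v));
    }
    m_cv.notify_one();
  }
  // Non-blocking: fails if the queue is empty or has been invalidated.
  bool tryPop(T &out) {
    std::lock_guard<std::mutex> lk(m_mtx);
    if (!m_valid || m_q.empty()) return false;
    out = std::move(m_q.front());
    m_q.pop();
    return true;
  }
  // Blocking: sleeps until an entry arrives or the queue is invalidated.
  bool waitPop(T &out) {
    std::unique_lock<std::mutex> lk(m_mtx);
    m_cv.wait(lk, [this] { return !m_q.empty() || !m_valid; });
    if (!m_valid) return false;
    out = std::move(m_q.front());
    m_q.pop();
    return true;
  }
  bool empty() const {
    std::lock_guard<std::mutex> lk(m_mtx);
    return m_q.empty();
  }
  // Wake every waiter so that blocked consumers can exit.
  void invalidate() {
    {
      std::lock_guard<std::mutex> lk(m_mtx);
      m_valid = false;
    }
    m_cv.notify_all();
  }

private:
  mutable std::mutex m_mtx;
  std::condition_variable m_cv;
  std::queue<T> m_q;
  bool m_valid = true;
};

int queue_example() {
  MtQueueSketch<int> q;
  int received = 0;
  std::thread consumer([&] {
    int v;
    while (q.waitPop(v)) received += v;  // ends once the queue is invalidated
  });
  for (int i = 1; i <= 3; ++i) q.push(i);
  while (!q.empty()) std::this_thread::yield();  // let the consumer drain
  q.invalidate();
  consumer.join();
  return received;
}
```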
PerfStats
-
namespace chimbuko
-
class PerfTimer
- #include <PerfStats.hpp>
A timer class that only measures time if _PERF_METRIC compile flag is set.
Public Functions
-
PerfTimer(bool start_now = true)
-
void start()
(Re)start the timer
-
void pause()
Pause the timer.
-
void unpause()
Unpause the timer.
This is the same as start but it does not zero the accumulated time from previous active periods
-
double elapsed_us() const
Compute the elapsed time in microseconds since start/unpause plus accumulated time from previous active periods.
-
double elapsed_ms() const
Compute the elapsed time in milliseconds since start/unpause plus accumulated time from previous active periods.
-
class PerfStats
- #include <PerfStats.hpp>
A class that maintains performance statistics of various aspects of the AD module. Its constituent functions only do anything if the _PERF_METRIC flag is enabled.
Public Functions
-
PerfStats()
Construct with empty path and filename (no output will be written unless these are set)
-
void setWriteLocation(const std::string &output_path, const std::string &filename)
Set the output path and file name.
-
void write() const
Write the running statistics to the file. Only writes out if a path and filename have been provided.
-
inline void clear()
Clear the metrics state.
-
class PerfPeriodic
- #include <PerfStats.hpp>
A class for storing and writing periodic data, eg memory usage, outstanding provDB requests. It stores and writes only if _PERF_METRIC is active, otherwise it does nothing.
Public Functions
-
PerfPeriodic()
Construct with empty path and filename (no output will be written unless these are set)
-
void setWriteLocation(const std::string &output_path, const std::string &filename)
Set the output path and file name.
-
void write()
Write the running statistics to the file. Only writes out if a path and filename have been provided. After writing, stored values are purged.
RunMetric
-
namespace chimbuko
-
class RunMetric
- #include <RunMetric.hpp>
A class containing a map of a string to its aggregated statistics, used for performance logging.
Public Functions
-
inline RunMetric()
-
inline ~RunMetric()
-
inline void add(std::string name, double val)
Add a value to the statistics tagged by the provided name.
A new entry in the map is created if the name has not been provided previously
-
inline void dump(std::string path, std::string filename = "metric.json") const
Write the data to disk in JSON form.
-
inline RunMetric &operator+=(const RunMetric &r)
Combine this instance with another. Does not perform any action on the last recorded value.
-
inline RunStats const *getMetric(const std::string &tag) const
Get the statistics for a particular metric. Returns nullptr if the tag does not exist.
-
inline std::pair<bool, double> getLastRecorded(const std::string &tag) const
Get the last recorded value for a particular metric.
- Returns:
A bool,double pair where the first entry indicates whether the tag exists, and the second its value
-
inline void clear()
Clear the state.
RunStats
-
namespace chimbuko
Functions
-
class RunStats
- #include <RunStats.hpp>
Compute statistics in a single pass.
Computes the minimum, maximum, mean, variance, standard deviation, skewness, and kurtosis. Optionally, also computes accumulated values.
RunStats objects may also be added together and copied.
Based entirely on the C++ code by John D Cook at http://www.johndcook.com/skewness_kurtosis.html
Public Functions
-
RunStats(bool do_accumulate = false)
Constructor.
- Parameters:
do_accumulate – If true the sum of the provided values will also be collected
-
~RunStats()
-
void clear()
Reset the statistics.
-
template<class Archive>
inline void serialize(Archive &archive)
Serialize using cereal, for example as part of a compound object.
-
void deserialize_cerealpb(const std::string &strstate)
Deserialize from the cereal portable binary format.
-
void net_deserialize(const std::string &s)
Deserialize this class after communication over the network.
-
void push(double x)
Add a new value to be included in internal statistics.
-
double count() const
Get the number of values added to the statistics.
-
double minimum() const
-
double maximum() const
-
double accumulate() const
If m_do_accumulate, the accumulated sum of all values added, otherwise 0.
-
double mean() const
-
double variance(double ddof = 1.0) const
Return the variance of the data.
If ddof=1 (default), the variance includes Bessel’s correction and is an unbiased estimate of the population variance. If ddof=0, it is the variance of the sample itself.
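The effect of ddof can be illustrated with a single-pass (Welford-style) accumulator that tracks only the count, the mean and the second central moment M2, so that the variance is M2 / (count - ddof). This is a sketch, not the RunStats source: `StatsSketch` and `variance_example` are hypothetical, and the member names merely echo the eta/rho naming used in this reference.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical single-pass accumulator limited to the first two moments.
struct StatsSketch {
  double count = 0;  // number of values pushed
  double eta = 0;    // running mean
  double rho = 0;    // M2 = sum_i (x_i - mean)^2

  void push(double x) {
    count += 1;
    const double delta = x - eta;
    eta += delta / count;
    rho += delta * (x - eta);  // uses the mean before and after the update
  }
  // ddof = 1 applies Bessel's correction; ddof = 0 gives the sample variance.
  double variance(double ddof = 1.0) const { return rho / (count - ddof); }
};

double variance_example(double ddof) {
  StatsSketch s;
  const double data[] = {2, 4, 4, 4, 5, 5, 7, 9};  // mean 5, M2 = 32
  for (double x : data) s.push(x);
  return s.variance(ddof);
}
```

For these eight values the sketch gives 32/8 = 4 with ddof=0 and 32/7 with ddof=1, up to floating-point rounding.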
-
double stddev(double ddof = 1.0) const
-
double skewness() const
-
double kurtosis() const
-
inline void set_do_accumulate(bool do_accumulate)
Set whether the sum of all values is to be maintained.
-
inline bool get_do_accumulate() const
Determine whether the sum of all values is to be maintained.
-
nlohmann::json get_json() const
Get the current statistics as a JSON object.
-
RunStatsValues get_stat_values() const
Get the current statistics as a RunStatsValues object.
-
RunStats &operator+=(const RunStats &rs)
Combine two RunStats instances such that the resulting statistics are the union of the two.
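Combining two accumulators without revisiting the data relies on the standard pairwise update for the mean and second central moment (Chan et al.). The sketch below shows that update for count, mean and M2 only; the third and fourth moments, minimum, maximum and accumulated sum that RunStats also tracks are omitted. `MergeableStats` and `merge_matches_single_pass` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical accumulator showing the pairwise merge of two partial results.
struct MergeableStats {
  double count = 0, eta = 0, rho = 0;  // count, mean, M2

  void push(double x) {
    count += 1;
    const double delta = x - eta;
    eta += delta / count;
    rho += delta * (x - eta);
  }

  MergeableStats &operator+=(const MergeableStats &b) {
    if (b.count == 0) return *this;
    if (count == 0) {
      *this = b;
      return *this;
    }
    const double n = count + b.count;
    const double delta = b.eta - eta;
    rho += b.rho + delta * delta * count * b.count / n;  // M2 of the union
    eta += delta * b.count / n;                          // mean of the union
    count = n;
    return *this;
  }
};

bool merge_matches_single_pass() {
  const double data[] = {2, 4, 4, 4, 5, 5, 7, 9};
  MergeableStats all, first, second;
  for (int i = 0; i < 8; ++i) {
    all.push(data[i]);
    (i < 3 ? first : second).push(data[i]);  // split the data into two parts
  }
  first += second;
  return first.count == all.count && std::fabs(first.eta - all.eta) < 1e-12 &&
         std::fabs(first.rho - all.rho) < 1e-9;
}
```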
-
bool equiv(const RunStats &b) const
Test for equivalence up to a fixed tolerance allowing for finite-precision errors.
-
inline void set_eta(double to)
Set the eta (mean) parameter.
-
inline void set_rho(double to)
Set the rho parameter (variance * [count-1])
-
inline void set_count(double to)
Set the count parameter.
Protected Attributes
-
double m_count
count of instances
-
double m_eta
mean
-
double m_rho
= M2 = \sum_i (x_i - \bar x)^2
-
double m_tau
= M3 = \sum_i (x_i - \bar x)^3
-
double m_phi
= M4 = \sum_i (x_i - \bar x)^4
-
double m_min
minimum
-
double m_max
maximum
-
double m_acc
sum
-
bool m_do_accumulate
True if the sum of the input values is maintained
Friends
-
struct RunStatsValues
- #include <RunStats.hpp>
A serializable object containing the stats values.
Public Functions
-
inline bool operator==(const RunStatsValues &r) const
Comparison operator.
string
-
namespace chimbuko
threadPool
-
class threadPool
- #include <threadPool.hpp>
A class maintaining a queue of tasks that are performed by a pool of threads.
Public Functions
-
inline threadPool()
-
inline explicit threadPool(const std::uint32_t nt)
Instantiate a pool of nt threads.
- Parameters:
nt – The number of threads to instantiate
-
threadPool(const threadPool &rhs) = delete
The class is not copyable but can be moved.
-
threadPool &operator=(const threadPool &rhs) = delete
-
inline ~threadPool()
-
template<typename Func, typename ...Args>
inline auto sumit(Func &&func, Args&&... args)
Submit a function object and its arguments to the queue.
-
inline size_t pool_size() const
Return the number of threads in the pool.
-
inline size_t queue_size() const
Return the number of tasks in the queue.
Private Members
-
mtQueue<std::unique_ptr<IThreadTask>> m_workQueue
-
class IThreadTask
Base class of thread tasks.
Public Functions
-
IThreadTask() = default
-
virtual ~IThreadTask() = default
-
IThreadTask(const IThreadTask &rhs) = delete
-
IThreadTask &operator=(const IThreadTask &rhs) = delete
-
IThreadTask(IThreadTask &&other) = default
-
IThreadTask &operator=(IThreadTask &&other) = default
-
virtual void execute() = 0
Perform the task (executed by thread)
-
template<typename T>
class TaskFuture
- #include <threadPool.hpp>
A wrapper class for an std::future instance representing the result of an asynchronous operation.
Public Functions
-
inline ~TaskFuture()
The destructor waits for the asynchronous operation to complete before exiting.
-
TaskFuture(const TaskFuture &rhs) = delete
-
TaskFuture &operator=(const TaskFuture &rhs) = delete
-
TaskFuture(TaskFuture &&other) = default
-
TaskFuture &operator=(TaskFuture &&other) = default
-
inline auto get()
Wait until the asynchronous operation has completed and return the value.
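The submit-then-get pattern described here can be sketched with standard library pieces: the callable is wrapped in a std::packaged_task, queued for a worker thread, and its result retrieved through a std::future. `PoolSketch`, its `submit` method and `pool_example` are hypothetical. The sketch returns a plain std::future rather than a TaskFuture wrapper and makes no claim about how threadPool is actually implemented.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for threadPool; not the Chimbuko class.
class PoolSketch {
public:
  explicit PoolSketch(unsigned nt) {
    for (unsigned i = 0; i < nt; ++i) m_threads.emplace_back([this] { run(); });
  }
  // Assumption: outstanding tasks are completed before the threads are joined.
  ~PoolSketch() {
    {
      std::lock_guard<std::mutex> lk(m_mtx);
      m_done = true;
    }
    m_cv.notify_all();
    for (auto &t : m_threads) t.join();
  }
  // Queue func(args...) and return a future for its result.
  template <typename Func, typename... Args>
  auto submit(Func &&func, Args &&...args)
      -> std::future<decltype(func(args...))> {
    using R = decltype(func(args...));
    auto task = std::make_shared<std::packaged_task<R()>>(
        std::bind(std::forward<Func>(func), std::forward<Args>(args)...));
    std::future<R> fut = task->get_future();
    {
      std::lock_guard<std::mutex> lk(m_mtx);
      m_tasks.push([task] { (*task)(); });
    }
    m_cv.notify_one();
    return fut;
  }

private:
  void run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lk(m_mtx);
        m_cv.wait(lk, [this] { return m_done || !m_tasks.empty(); });
        if (m_tasks.empty()) return;  // only reached once m_done is set
        job = std::move(m_tasks.front());
        m_tasks.pop();
      }
      job();  // run the task without holding the lock
    }
  }
  std::mutex m_mtx;
  std::condition_variable m_cv;
  std::queue<std::function<void()>> m_tasks;
  bool m_done = false;
  std::vector<std::thread> m_threads;
};

int pool_example() {
  PoolSketch pool(2);
  std::future<int> a = pool.submit([](int x, int y) { return x + y; }, 2, 3);
  std::future<int> b = pool.submit([] { return 10; });
  return a.get() * b.get();  // get() blocks until each task has finished
}
```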
-
template<typename Func>
class ThreadTask : public threadPool::IThreadTask
A thread task executing a functional object.
Public Functions
-
~ThreadTask() override = default
-
ThreadTask(const ThreadTask &rhs) = delete
-
ThreadTask &operator=(const ThreadTask &rhs) = delete
-
ThreadTask(ThreadTask &&other) = default
-
ThreadTask &operator=(ThreadTask &&other) = default
-
inline void execute() override
time
-
namespace chimbuko
Functions
-
class Timer
- #include <time.hpp>
A timer / stopwatch class.
Public Functions
-
Timer(bool start_now = true)
-
void start()
(Re)start the timer
-
void pause()
Pause the timer.
-
void unpause()
Unpause the timer.
This is the same as start but it does not zero the accumulated time from previous active periods
-
double elapsed_us() const
Compute the elapsed time in microseconds since start/unpause plus accumulated time from previous active periods.
-
double elapsed_ms() const
Compute the elapsed time in milliseconds since start/unpause plus accumulated time from previous active periods.
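The difference between start and unpause comes down to whether the accumulated time is reset. A minimal stopwatch sketch built on std::chrono::steady_clock illustrates this; `TimerSketch` and `timer_example` are hypothetical, and the behaviour while paused (reporting only the accumulated time) is an assumption rather than something stated above.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical stopwatch; not the Chimbuko Timer class.
class TimerSketch {
  using clock = std::chrono::steady_clock;

public:
  explicit TimerSketch(bool start_now = true) {
    if (start_now) start();
  }
  // (Re)start: zero the accumulated time and begin a new active period.
  void start() {
    m_accum = 0.0;
    m_running = true;
    m_start = clock::now();
  }
  // Pause: fold the current active period into the accumulated time.
  void pause() {
    if (m_running) {
      m_accum += since_start_us();
      m_running = false;
    }
  }
  // Unpause: begin a new active period but keep the accumulated time.
  void unpause() {
    m_running = true;
    m_start = clock::now();
  }
  double elapsed_us() const {
    return m_accum + (m_running ? since_start_us() : 0.0);
  }
  double elapsed_ms() const { return elapsed_us() / 1000.0; }

private:
  double since_start_us() const {
    return std::chrono::duration<double, std::micro>(clock::now() - m_start)
        .count();
  }
  clock::time_point m_start{};
  double m_accum = 0.0;
  bool m_running = false;
};

bool timer_example() {
  TimerSketch t;
  std::this_thread::sleep_for(std::chrono::milliseconds(2));
  t.pause();
  const double at_pause = t.elapsed_us();
  std::this_thread::sleep_for(std::chrono::milliseconds(2));
  const bool frozen = (t.elapsed_us() == at_pause);  // paused time is not counted
  t.unpause();
  std::this_thread::sleep_for(std::chrono::milliseconds(2));
  return frozen && at_pause > 0.0 && t.elapsed_us() > at_pause;
}
```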
Private Types
verbose
Defines
-
verboseStream
Macro for log output that appears when verbose logging is enabled.
Example usage: verboseStream << "Hello world!" << std::endl;
-
verboseStreamAdd
Macro for log output that appears when verbose logging is enabled. This version does not include the prefix and so can be used as a line continuance.
Example usage: verboseStreamAdd << "Hello world!" << std::endl;
-
progressStream
Macro for log output that includes the date and time, intended for reporting progress on service components for which there is only one rank.
Example usage: progressStream << "Hello world!" << std::endl;
-
headProgressStream(rank)
Macro for log output that appears when either the rank is equal to the head rank or verbose logging is enabled.
- Parameters:
rank – The rank of the current process
Example usage: headProgressStream(rank) << "Hello world!" << std::endl;
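Stream-style logging macros such as these are commonly written as an if/else whose else branch is the stream expression, so that nothing after the macro is evaluated when logging is disabled. The sketch below shows that idiom only. The macro names, the globals `g_verbose` and `g_log`, the prefixes, and the choice of rank 0 as the head rank are all assumptions, not the definitions in verbose.hpp.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical globals; the real macros consult Chimbuko's own verbosity state.
static bool g_verbose = false;
static std::ostream *g_log = &std::cout;

// With the if/else form the streamed operands are skipped entirely when
// logging is off, and the macro remains safe inside an enclosing if/else.
#define sketchVerboseStream \
  if (!g_verbose) {} else (*g_log) << "[verbose] "
#define sketchHeadProgressStream(rank) \
  if ((rank) != 0 && !g_verbose) {} else (*g_log) << "[rank " << (rank) << "] "

std::string verbose_example() {
  std::ostringstream captured;
  g_log = &captured;  // capture output for inspection
  g_verbose = false;
  sketchVerboseStream << "hidden" << std::endl;          // suppressed
  sketchHeadProgressStream(0) << "head" << std::endl;    // head rank: printed
  sketchHeadProgressStream(3) << "worker" << std::endl;  // suppressed
  g_verbose = true;
  sketchVerboseStream << "shown" << std::endl;           // printed
  g_log = &std::cout;
  return captured.str();
}
```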
-
namespace chimbuko