Management of Dataset States¶
A dataset describes a set of data with consistent metadata. It is uniquely identified by its ID and is defined by the ID of a base dataset and the ID of the datasetState that describes the differences between it and its base dataset. A datasetState describes the metadata of a dataset and is uniquely identified by its ID. Synchronizing both the datasetStates and the topologies of datasets is done by the datasetManager. The latter exists as a singleton in every kotekan instance that uses it in its configuration. If configured, different kotekan instances can synchronize their datasets and datasetStates via the centralized dataset_broker. The communication protocol is documented in Dataset Broker.
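The relationship between datasets, base datasets and states can be pictured as a tree. The following is a minimal standalone sketch, not kotekan code: `SketchDataset`, `SketchTopology`, the integer ID aliases and `collect_states` are hypothetical stand-ins, used only to illustrate how the full description of a dataset could be recovered by walking its chain of base datasets.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Hypothetical stand-ins for kotekan's dset_id_t and state_id_t.
using sketch_dset_id = std::uint64_t;
using sketch_state_id = std::uint64_t;

// A dataset is defined by one state ID and, unless it is a root, a base dataset ID.
struct SketchDataset {
    sketch_state_id state;
    std::optional<sketch_dset_id> base; // empty for a root dataset
};

using SketchTopology = std::map<sketch_dset_id, SketchDataset>;

// Walk from `start` towards the root and collect the state IDs on the way,
// closest state first. An ID that is not in the map ends the walk.
inline std::vector<sketch_state_id> collect_states(const SketchTopology& datasets,
                                                   sketch_dset_id start) {
    std::vector<sketch_state_id> states;
    std::optional<sketch_dset_id> current = start;
    while (current) {
        auto it = datasets.find(*current);
        if (it == datasets.end())
            break;
        states.push_back(it->second.state);
        current = it->second.base;
    }
    return states;
}

// Usage: dataset 1 is a root, 2 is based on 1, and 3 is based on 2.
inline void demo_collect_states() {
    SketchTopology datasets;
    datasets[1] = SketchDataset{100, std::nullopt};
    datasets[2] = SketchDataset{200, 1};
    datasets[3] = SketchDataset{300, 2};
    assert(collect_states(datasets, 3).size() == 3);
}
```

Because every dataset stores only its difference to the base, the states that apply to a dataset are the ones found along the path to its root.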
-
class datasetManager¶
Manages sets of state changes applied to datasets.
This is a singleton class. Use datasetManager::instance() to get a reference to it.
The datasetManager is used to manage the states of datasets that get passed through kotekan stages. A stage in the kotekan pipeline may use the dataset ID found in an incoming frame to get a set of states from the datasetManager.
To receive information about the inputs that the datasets in the frames contain, it could do the following:
auto& dm = datasetManager::instance();
auto input_state = dm.dataset_state<inputState>(ds_id_from_frame);
const std::vector<input_ctype>& inputs = input_state->get_inputs();
A stage that changes the state of the dataset in the frames it processes should inform the datasetManager by adding a new state and dataset. If multiple states are applied at the same time, a vector of states can be passed to add_dataset. This causes the datasets linking them to be generated, but only the final one is returned.
The dataset broker is a centralized part of the dataset management system. Using it allows the synchronization of datasets and states between multiple kotekan instances.
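Passing several states at once can be pictured as building a chain of datasets. The sketch below is a standalone illustration under stated assumptions, not the kotekan implementation: `SketchManager`, `SketchDataset` and the counter-based IDs are hypothetical, and the real manager may assign IDs differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

using sketch_id = std::uint64_t; // hypothetical stand-in for dset_id_t / state_id_t

struct SketchDataset {
    sketch_id state;
    std::optional<sketch_id> base; // empty for a root dataset
};

class SketchManager {
public:
    // Register a single dataset on top of `base` and return its ID.
    sketch_id add_dataset(sketch_id state, std::optional<sketch_id> base = std::nullopt) {
        sketch_id id = next_id_++;
        datasets_.emplace(id, SketchDataset{state, base});
        return id;
    }

    // Register one dataset per state, each based on the previous one.
    // Only the ID of the last dataset in the chain is returned.
    sketch_id add_dataset(const std::vector<sketch_id>& states,
                          std::optional<sketch_id> base = std::nullopt) {
        assert(!states.empty()); // this sketch requires at least one state
        for (sketch_id state : states)
            base = add_dataset(state, base);
        return *base;
    }

    const std::map<sketch_id, SketchDataset>& datasets() const { return datasets_; }

private:
    std::map<sketch_id, SketchDataset> datasets_;
    sketch_id next_id_ = 0; // counter IDs are an assumption of this sketch
};
```

With three states passed on top of a root, three linked datasets are created in this sketch; the caller only sees the ID of the last one, and the intermediate datasets remain reachable through the base links.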
- metrics
kotekan_datasetbroker_error_count: Number of errors encountered in communication with the broker.
- REST Endpoints
/force-update (GET): Forces the datasetManager to register all datasets and states with the dataset_broker.
- Author
Richard Shaw, Rick Nitsche
- Param use_dataset_broker:
Bool. If true, states and datasets will be registered with the dataset broker. If an ancestor cannot be found locally, dataset_state will ask the broker.
- Param ds_broker_port:
Int. The port of the dataset broker (if use_dataset_broker is True). Default 12050.
- Param ds_broker_host:
String. Address of the dataset broker (if use_dataset_broker is True). Prefer a numerical address, because the DNS lookup is blocking. Default “127.0.0.1”.
- Param retry_wait_time_ms:
Int. Time in ms to wait after a failed request to the broker before retrying. Default 1000.
- Param retries_rest_client:
Int. Retry value passed to libevent. Caution: Infinite retries are performed by the datasetManager. Default 0.
- Param timeout_rest_client_s:
Int. Timeout value passed to libevent. -1 will use the libevent default value (50s). Default 100.
Public Functions
-
datasetManager(const datasetManager&) = delete¶
-
void operator=(const datasetManager&) = delete¶
-
void stop()¶
Signal to stop request threads.
-
dset_id_t add_dataset(state_id_t state, dset_id_t base_dset = dset_id_t::null)¶
Register a new dataset. Omitting base_dset adds a root dataset.
If use_dataset_broker is set, this function will ask the dataset broker to assign an ID to the new dataset.
- Parameters:
state – The ID of the dataset state that describes the difference to the base dataset.
base_dset – The ID of the dataset this dataset is based on. Omit to create a root dataset.
- Returns:
The ID assigned to the new dataset.
-
dset_id_t add_dataset(const std::vector<state_id_t> &states, dset_id_t base_dset = dset_id_t::null)¶
Register a new dataset with multiple states.
If use_dataset_broker is set, this function will ask the dataset broker to assign an ID to the new dataset.
- Parameters:
states – The IDs of the dataset states that describe the difference to the base dataset.
base_dset – The ID of the dataset this dataset is based on. Omit to create a root dataset.
- Returns:
The ID assigned to the new dataset.
-
template<typename T, typename ...Args>
inline std::pair<state_id_t, const T*> create_state(Args&&... args)¶
Create and register a state with the manager.
This is the recommended way to create a datasetState, as it directly creates the datasetState instance under the ownership of the datasetManager. The caller receives the ID and a const pointer to the created state.
If use_dataset_broker is set, this function will also register the new state with the broker.
- Parameters:
args – Arguments forwarded through to the constructor of the sub-type.
- Returns:
The id assigned to the state and a read-only pointer to the state. The target of this pointer is not free’d during the lifetime of the datasetManager.
-
template<typename T>
inline std::pair<state_id_t, const T*> add_state(std::unique_ptr<T> &&state, typename std::enable_if<std::is_base_of<datasetState, T>::value>::type* = nullptr)¶
Register a state with the manager.
If use_dataset_broker is set, this function will also register the new state with the broker.
The second argument exists only to prevent this function from compiling when T is not derived from datasetState.
- Parameters:
state – The state to be added.
- Returns:
The id assigned to the state and a read-only pointer to the state. The target of this pointer is not free’d during the lifetime of the datasetManager.
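The compile-time restriction on `T` can be illustrated with a standalone sketch. Everything here is a hypothetical stand-in (`SketchState`, `SketchInputState`, `Unrelated`, `sketch_store`, `can_add`), and the simplified `add_state` returns only a pointer rather than an ID and pointer pair; only the `std::enable_if` and `std::is_base_of` pattern mirrors the signature above.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

struct SketchState {
    virtual ~SketchState() = default;
};
struct SketchInputState : SketchState {
    int num_inputs = 4;
};
struct Unrelated {};

// Hypothetical store that owns every registered state.
inline std::vector<std::unique_ptr<SketchState>>& sketch_store() {
    static std::vector<std::unique_ptr<SketchState>> store;
    return store;
}

// The unnamed second parameter only has a valid type when T derives from
// SketchState, so other types are rejected at compile time.
template<typename T>
const T* add_state(std::unique_ptr<T>&& state,
                   typename std::enable_if<std::is_base_of<SketchState, T>::value>::type* =
                       nullptr) {
    const T* view = state.get();
    sketch_store().push_back(std::move(state)); // ownership moves into the store
    return view;
}

// Detects at compile time whether add_state accepts a std::unique_ptr<T>.
template<typename T, typename = void>
struct can_add : std::false_type {};
template<typename T>
struct can_add<T, std::void_t<decltype(add_state(std::declval<std::unique_ptr<T>>()))>>
    : std::true_type {};

static_assert(can_add<SketchInputState>::value, "derived states are accepted");
static_assert(!can_add<Unrelated>::value, "unrelated types are rejected");

// Usage: the returned read-only pointer stays valid while the store owns the state.
inline void demo_add_state() {
    const SketchInputState* state = add_state(std::make_unique<SketchInputState>());
    assert(state->num_inputs == 4);
}
```

The sketch needs C++17 for `std::void_t`; the constraint itself only relies on C++11 facilities.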
-
std::string summary()¶
Return the state table.
- Returns:
A string summarising the state table.
-
const std::map<state_id_t, const datasetState*> states()¶
Get a read-only map of the states.
- Returns:
The set of states. The targets of these state pointers are not free’d during the lifetime of the datasetManager.
-
const std::map<dset_id_t, dataset> datasets()¶
Get a read-only map of the datasets.
- Returns:
The set of datasets.
-
std::optional<std::pair<dset_id_t, dataset>> closest_dataset_of_type(dset_id_t dset, const std::string &type)¶
Find the closest ancestor of a given type.
If use_dataset_broker is set and no ancestor of the given type is found, this will ask the broker for a complete list of ancestors for the given dataset. In that case, this function blocks until the broker answers. If you want to do something else while waiting for the return value of this function, use std::future.
- Parameters:
dset – The ID of the dataset to start from.
type – The type name of the state change we are searching for.
- Returns:
The dataset entry matching the type. Unset if no state of given type exists.
-
template<typename T>
inline const T *dataset_state(dset_id_t dset)¶
Find the closest ancestor of a given type.
If use_dataset_broker is set and no ancestor of the given type is found, this will ask the broker for a complete list of ancestors for the given dataset. In that case, this function blocks until the broker answers. If you want to do something else while waiting for the return value of this function, use std::future.
- Parameters:
dset – The ID of the dataset to start from.
- Returns:
A read-only pointer to the ancestor state. Returns a nullptr if not found in ancestors or in a failure case. The target of this pointer is not free’d during the lifetime of the datasetManager.
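The advice to use std::future for a potentially blocking lookup can be sketched as follows. This is a standalone illustration: `blocking_lookup` is a hypothetical stand-in that merely sleeps to simulate a round trip to the broker, and `lookup_while_working` is an illustrative name, not part of kotekan.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>

// Hypothetical stand-in for a lookup that may block while waiting for the broker.
inline std::string blocking_lookup(int dataset_id) {
    std::this_thread::sleep_for(std::chrono::milliseconds(50)); // simulated round trip
    return "state-of-" + std::to_string(dataset_id);
}

// Start the lookup on another thread, do other work, then collect the result.
inline std::string lookup_while_working(int dataset_id, int& work_done) {
    std::future<std::string> pending =
        std::async(std::launch::async, blocking_lookup, dataset_id);
    // Unrelated work proceeds while the lookup is in flight.
    for (int i = 0; i < 1000; ++i)
        work_done += 1;
    return pending.get(); // blocks only for the time that is still needed
}

// Usage: both the lookup result and the side work are available afterwards.
inline void demo_lookup() {
    int work = 0;
    std::string state = lookup_while_working(7, work);
    assert(state == "state-of-7" && work == 1000);
}
```

In a stage, the callable handed to std::async would presumably be a lambda that calls the lookup on the datasetManager. Depending on the toolchain, thread support may need to be enabled when linking.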
-
fingerprint_t fingerprint(dset_id_t ds_id, const std::set<std::string> &state_types)¶
Fingerprint a dataset for specified states.
Generate a summary of the specified states present in the requested dataset. This will be unique for datasets where one or more of the requested states differ. Datasets that share all these states will give the same fingerprint regardless of differences in any other states.
The fingerprint does not depend on the order of state_types. It is also specific to the types, even when states are missing. This means that for a dataset which contains a state of type_A, but neither of type_B or type_C, the fingerprints with respect to {type_A, type_B} and {type_A, type_C} will be different.
- Parameters:
ds_id – Dataset ID of the incoming frame.
state_types – Names of the state types to fingerprint.
- Returns:
Fingerprint of the dataset.
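The properties described for the fingerprint (independent of the order of the requested types, insensitive to other states, and specific to the requested types even when a state is missing) can be illustrated with a standalone sketch. `SketchStates` and `sketch_fingerprint` are hypothetical; the sketch returns a canonical string where a real implementation would more likely return a hash, and it is not how kotekan computes `fingerprint_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in: the states that apply to one dataset,
// keyed by state type name, with the state ID as value.
using SketchStates = std::map<std::string, std::size_t>;

// Build a canonical summary of the requested state types. std::set iterates in
// sorted order, so the order in which the caller lists the types does not matter.
// Missing types are recorded by name, which keeps the result type-specific.
inline std::string sketch_fingerprint(const SketchStates& present,
                                      const std::set<std::string>& state_types) {
    std::string summary;
    for (const std::string& type : state_types) {
        auto it = present.find(type);
        summary += type;
        summary += '=';
        summary += (it == present.end()) ? "missing" : std::to_string(it->second);
        summary += ';';
    }
    return summary;
}

// Usage: a state of type_A is present, type_B and type_C are not.
inline void demo_fingerprint() {
    SketchStates states{{"type_A", 1}};
    assert(sketch_fingerprint(states, {"type_A", "type_B"})
           != sketch_fingerprint(states, {"type_A", "type_C"}));
}
```

Two datasets that agree on all requested states produce the same summary in this sketch, regardless of any other states they carry.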
-
void force_update_callback(kotekan::connectionInstance &conn)¶
Callback for endpoint force-update called by the restServer.
- Parameters:
conn – The HTTP connection object.
Public Static Functions
-
static datasetManager &instance()¶
Get the global datasetManager.
- Returns:
A reference to the global datasetManager instance.
-
static datasetManager &instance(kotekan::Config &config)¶
Set and apply the static config to datasetManager.
- Parameters:
config – The config.
- Returns:
A reference to the global datasetManager instance.
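The singleton access pattern behind the two `instance` overloads can be sketched in a standalone form. This assumes a function-local static, which is one common way to implement such a singleton; `SketchManager` and its `broker_host` setting are hypothetical, and the real class takes a `kotekan::Config` instead of a string.

```cpp
#include <cassert>
#include <string>

class SketchManager {
public:
    // Copying is disabled so that only one instance can exist.
    SketchManager(const SketchManager&) = delete;
    void operator=(const SketchManager&) = delete;

    // Get the single global instance.
    static SketchManager& instance() {
        static SketchManager dm; // constructed once, on first use
        return dm;
    }

    // Apply a setting and return the same global instance.
    static SketchManager& instance(const std::string& broker_host) {
        SketchManager& dm = instance();
        dm.broker_host_ = broker_host;
        return dm;
    }

    const std::string& broker_host() const { return broker_host_; }

private:
    SketchManager() = default;
    std::string broker_host_ = "127.0.0.1";
};

// Usage: both overloads hand out a reference to the same object.
inline void demo_singleton() {
    SketchManager& first = SketchManager::instance();
    SketchManager& second = SketchManager::instance(std::string("10.0.0.5"));
    assert(&first == &second);
}
```

Since C++11 the initialization of a function-local static is thread-safe, which is why this form is a frequent choice for singletons.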
-
class dataset¶
The description of a dataset consisting of a dataset state and a base dataset.
A dataset is described by a dataset state applied to a base dataset. If the flag marking this dataset as a root dataset (a dataset that has no base dataset) is set, the base dataset ID value is not defined.
Public Functions
-
inline dataset(state_id_t state, std::string type, dset_id_t base_dset = dset_id_t::null)¶
Dataset constructor. Omitting the base_dset will create a root dataset.
- Parameters:
state – The state of this dataset.
type – The name of the dataset state type.
base_dset – The ID of the base dataset. Omit to create a root dataset.
-
dataset(nlohmann::json &js)¶
Dataset constructor from json object. The json object must have the following fields: is_root (boolean), state (integer), base_dset (integer), types (list of strings).
- Parameters:
js – Json object describing a dataset.
-
bool is_root() const¶
Access to the root dataset flag.
- Returns:
True if this is a root dataset (has no base dataset), otherwise False.
-
state_id_t state() const¶
Access to the dataset state ID of this dataset.
- Returns:
The dataset state ID.
-
dset_id_t base_dset() const¶
Access to the ID of the base dataset.
- Returns:
The base dataset ID. Undefined if this is a root dataset.
-
const std::string &type() const¶
Read only access to the type name of the dataset state.
- Returns:
The name of the dataset state type.
-
nlohmann::json to_json() const¶
Generates a json serialization of this dataset.
- Returns:
A json serialization.
-
class datasetState¶
A base class for representing state changes done to datasets.
This is meant to be subclassed. All subclasses must implement a constructor that can build the type from a json argument, and a data_to_json method that can serialise the type into a json object.
- Author
Richard Shaw, Rick Nitsche
Subclassed by RFIFrameDropState, beamState, eigenvalueState, flagState, freqState, gainState, gatingState, inputState, metadataState, prodState, stackState, subfreqState, timeState
Public Functions
-
inline virtual ~datasetState()¶
-
nlohmann::json to_json() const¶
Full serialisation of state into JSON.
- Returns:
JSON serialisation of state.
-
virtual nlohmann::json data_to_json() const = 0¶
Save the internal data of this instance into JSON.
This must be implemented by all derived classes and should save the information needed to reconstruct any subclass-specific internals.
- Returns:
JSON representing the internal state.
-
bool equals(datasetState &s) const¶
Compare to another dataset state.
- Parameters:
s – State to compare with.
- Returns:
True if states identical, False otherwise.
-
std::string type() const¶
Get the name of this state.
- Returns:
The state name.
Public Static Functions
-
static state_uptr from_json(const nlohmann::json &j)¶
Create a dataset state from a full json serialisation.
This will instantiate the correct type from the json.
- Parameters:
j – Full JSON serialisation.
- Returns:
The created datasetState or a nullptr in a failure case.
-
class freqState : public datasetState¶
A dataset state that describes the frequencies in a dataset.
- Author
Richard Shaw, Rick Nitsche
Public Functions
-
inline freqState(const nlohmann::json &data)¶
Constructor.
- Parameters:
data – The frequency information as serialized by freqState::to_json().
-
inline freqState(std::vector<std::pair<uint32_t, freq_ctype>> freqs)¶
Constructor.
- Parameters:
freqs – The frequency information as a vector of {frequency ID, frequency index map}.
-
inline const std::vector<std::pair<uint32_t, freq_ctype>> &get_freqs() const¶
Get frequency information (read only).
- Returns:
The frequency information as a vector of {frequency ID, frequency index map}
-
class inputState : public datasetState¶
A dataset state that describes the inputs in a dataset.
- Author
Richard Shaw, Rick Nitsche
Public Functions
-
inline inputState(const nlohmann::json &data)¶
Constructor.
- Parameters:
data – The input information as serialized by inputState::to_json().
-
inline inputState(std::vector<input_ctype> inputs)¶
Constructor.
- Parameters:
inputs – The input information as a vector of input index maps.
-
inline const std::vector<input_ctype> &get_inputs() const¶
Get input information (read only).
- Returns:
The input information as a vector of input index maps.
-
class prodState : public datasetState¶
A dataset state that describes the products in a dataset.
- Author
Richard Shaw, Rick Nitsche
Public Functions
-
inline prodState(const nlohmann::json &data)¶
Constructor.
- Parameters:
data – The product information as serialized by prodState::to_json().
-
inline prodState(std::vector<prod_ctype> prods)¶
Constructor.
- Parameters:
prods – The product information as a vector of product index maps.
-
inline const std::vector<prod_ctype> &get_prods() const¶
Get product information (read only).
- Returns:
The prod information as a vector of product index maps.
-
class stackState : public datasetState¶
A dataset state that describes a redundant baseline stacking.
- Author
Richard Shaw
Public Functions
-
inline stackState(const nlohmann::json &data)¶
Constructor.
- Parameters:
data – The stack information as serialized by stackState::to_json().
-
inline stackState(uint32_t num_stack, std::vector<rstack_ctype> &&rstack_map)¶
Constructor.
- Parameters:
rstack_map – Definition of how the products were stacked.
num_stack – Number of stacked visibilities.
-
inline const std::vector<rstack_ctype> &get_rstack_map() const¶
Get stack map information (read only).
For every product this says which stack to add the product into and whether it needs conjugating before doing so.
- Returns:
The stack map.
-
inline uint32_t get_num_stack() const¶
Get the number of stacks.
- Returns:
The number of stacks.
-
inline std::vector<stack_ctype> get_stack_map() const¶
Calculate and return the stack->prod mapping.
This is calculated on demand and so a full fledged vector is returned.
- Returns:
The stack map.
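Deriving a stack->prod mapping from the per-product map can be sketched as below. This is a standalone illustration under stated assumptions: the field layouts of `sketch_rstack` and `sketch_stack` are hypothetical stand-ins for `rstack_ctype` and `stack_ctype`, and choosing the first product that maps into a stack as its representative is an assumption, not a documented behaviour.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout: for one product, the stack it is added into and
// whether it needs conjugating first.
struct sketch_rstack {
    std::uint32_t stack;
    bool conjugate;
};

// Hypothetical layout: for one stack, a representative product and its conjugation flag.
struct sketch_stack {
    std::uint32_t prod;
    bool conjugate;
};

// Invert the prod->stack map. Assumption: the first product found for a stack
// represents that stack; stacks without any product keep {0, false}.
inline std::vector<sketch_stack> invert_stack_map(std::uint32_t num_stack,
                                                  const std::vector<sketch_rstack>& rstack_map) {
    std::vector<sketch_stack> stack_map(num_stack, sketch_stack{0, false});
    std::vector<bool> seen(num_stack, false);
    for (std::uint32_t prod = 0; prod < rstack_map.size(); ++prod) {
        const sketch_rstack& entry = rstack_map[prod];
        assert(entry.stack < num_stack); // every product must point at a valid stack
        if (!seen[entry.stack]) {
            stack_map[entry.stack] = {prod, entry.conjugate};
            seen[entry.stack] = true;
        }
    }
    return stack_map;
}
```

Because the result is built on each call, it has to be returned by value, which matches the note above that a full vector is returned rather than a reference.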
-
inline virtual nlohmann::json data_to_json() const override¶
Serialize the data of this state in a json object.
-
class timeState : public datasetState¶
A dataset state that keeps the time information of a dataset.
- Author
Rick Nitsche
Public Functions
-
inline timeState(const nlohmann::json &data)¶
Constructor.
- Parameters:
data – The time information as serialized by timeState::to_json().
-
inline timeState(std::vector<time_ctype> times)¶
Constructor.
- Parameters:
times – The time information as a vector of time index maps.
-
inline const std::vector<time_ctype> &get_times() const¶
Get time information (read only).
- Returns:
The time information as a vector of time index maps.
-
class metadataState : public datasetState¶
A dataset state that describes all the metadata that is written to file as “attributes”, but not defined by other states yet.
- Author
Rick Nitsche
Public Functions
-
inline metadataState(const nlohmann::json &data)¶
Constructor.
- Parameters:
data – The metadata as serialized by metadataState::to_json(), with the fields weight_type (string), instrument_name (string) and git_version_number (string).
-
inline metadataState(std::string weight_type, std::string instrument_name, std::string git_version_tag)¶
Constructor.
- Parameters:
weight_type – The weight type attribute.
instrument_name – The instrument name attribute.
git_version_tag – The git version tag attribute.
-
inline const std::string &get_weight_type() const¶
Get the weight type (read only).
- Returns:
The weight type.
-
inline const std::string &get_instrument_name() const¶
Get the instrument name (read only).
- Returns:
The instrument name.
-
inline const std::string &get_git_version_tag() const¶
Get the git version tag (read only).
- Returns:
The git version tag.
-
class eigenvalueState : public datasetState¶
A dataset state that keeps the eigenvalues of a dataset.
- Author
Rick Nitsche
Public Functions
-
inline eigenvalueState(const nlohmann::json &data)¶
Constructor.
- Parameters:
data – The eigenvalues as serialized by eigenvalueState::to_json().
-
inline eigenvalueState(std::vector<uint32_t> ev)¶
Constructor.
- Parameters:
ev – The eigenvalues.
-
inline eigenvalueState(size_t num_ev)¶
Constructor.
- Parameters:
num_ev – The number of eigenvalues. The indices will end up running from 0 to num_ev - 1
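Filling indices that run from 0 to num_ev - 1 can be done with `std::iota`. The helper below is a hypothetical standalone sketch (`make_ev_indices` is not a kotekan function) and does not claim to reproduce the constructor's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: build the index list 0, 1, ..., num_ev - 1.
inline std::vector<std::uint32_t> make_ev_indices(std::size_t num_ev) {
    std::vector<std::uint32_t> ev(num_ev);
    std::iota(ev.begin(), ev.end(), std::uint32_t{0});
    return ev;
}

// Usage: four eigenvalues give the indices 0 to 3.
inline void demo_ev_indices() {
    std::vector<std::uint32_t> ev = make_ev_indices(4);
    assert(ev.front() == 0 && ev.back() == 3);
}
```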
-
inline const std::vector<uint32_t> &get_ev() const¶
Get eigenvalues (read only).
- Returns:
The eigenvalues.
-
inline size_t get_num_ev() const¶
Get the number of eigenvalues.
- Returns:
The number of eigenvalues.