
Implementation of the data cleansing process.

class ecgan.preprocessing.cleansing.DataCleanser(lower_fault_threshold=None, upper_fault_threshold=None, nan_threshold=None, target_shape=None)

Check for dead or faulty sensors, NaNs and correct shape.

A DataCleanser object can be used for some or all of the above tasks. Most often, the ecgan.preprocessing.cleansing.DataCleanser.should_cleanse() method is called, which runs all of the checks on a series. Each check can also be called individually. The input is generally expected to be a single 2D series of shape (seq_len, features), where features are the different data channels. If no threshold or condition is set, all values are accepted by default.

  • lower_fault_threshold (Optional[int]) -- Lowest value accepted without removing the series from dataset.

  • upper_fault_threshold (Optional[int]) -- Highest value accepted without removing the series from dataset.

  • nan_threshold (Optional[float]) -- Upper limit of the allowed fraction of NaNs. The series is removed if more than \((self.nan\_threshold \cdot 100)\%\) of all values are NaN.

  • target_shape (Optional[Tuple[int, int]]) -- Accepted shape of series.


Conduct checks for a given 2D time series to determine if it should be cleansed.

Remove sample from dataset if any check fails.

Performed checks: shape, NaN values, dead sensors and faulty sensors.


series (ndarray) -- 2D series of shape (seq_len, features).

Return type



Flag indicating whether the sample should be removed from the final dataset.
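The "remove if any check fails" rule described above can be illustrated with a minimal sketch. This is not the ecgan implementation: the function name should_cleanse_sketch and the two example checks are hypothetical and only show how several boolean checks might be combined.

```python
import numpy as np
from typing import Callable, Iterable


def should_cleanse_sketch(
    series: np.ndarray,
    checks: Iterable[Callable[[np.ndarray], bool]],
) -> bool:
    """Return True if at least one check flags the series for removal.

    Hypothetical helper, not part of ecgan. Each check is expected to
    return True when the sample should be removed.
    """
    return any(check(series) for check in checks)


# Two illustrative checks (assumptions, not the library's own checks).
example_checks = [
    lambda s: s.ndim != 2,                # not a 2D (seq_len, features) array
    lambda s: bool(np.isnan(s).all()),    # contains nothing but NaNs
]

keep_me = np.zeros((5, 2))
drop_me = np.zeros(5)
print(should_cleanse_sketch(keep_me, example_checks))
print(should_cleanse_sketch(drop_me, example_checks))
```

Because any() short-circuits, later checks are skipped as soon as one check flags the sample.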


Check if the sample should be removed because of its shape.

If no target_shape is specified when the instance is created, the series is assumed to be a simple 2D array of shape (seq_len, features).


series (ndarray) -- 2D series of shape (seq_len, features).

Return type



Flag indicating whether the sample should be removed from the final dataset.
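A minimal sketch of such a shape check is given below. It assumes that, without a target shape, only the dimensionality of the series is verified. The function name shape_check_sketch is hypothetical and the code is not taken from ecgan.

```python
import numpy as np
from typing import Optional, Tuple


def shape_check_sketch(
    series: np.ndarray,
    target_shape: Optional[Tuple[int, int]] = None,
) -> bool:
    """Return True if the sample should be removed because of its shape.

    Hypothetical helper, not part of ecgan.
    """
    if target_shape is None:
        # Assumption: without a target shape, any 2D array is accepted.
        return series.ndim != 2
    # Otherwise the shape has to match the target exactly.
    return series.shape != tuple(target_shape)


print(shape_check_sketch(np.zeros((100, 2))))
print(shape_check_sketch(np.zeros((100, 2)), target_shape=(120, 2)))
```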


Check for NaN values in the data.

Data is marked for cleansing when at least \((self.nan\_threshold \cdot 100)\%\) of the values of one feature are NaN. The data is expected to be a single time series sample of shape (seq_len, features), i.e. a 2D array. For series that contain some NaNs but fewer than the allowed limit, the remaining NaNs can be imputed using the ecgan.preprocessing.preprocessor.BasePreprocessor.


series (ndarray) -- 2D series of shape (seq_len, features).

Return type



Flag indicating whether the sample should be removed from the final dataset.
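The per-feature NaN criterion can be sketched as follows. The sketch follows the "at least" wording of this method's description and assumes that a missing threshold accepts every series. The name nan_check_sketch is hypothetical and the code is not the ecgan implementation.

```python
import numpy as np
from typing import Optional


def nan_check_sketch(series: np.ndarray, nan_threshold: Optional[float] = None) -> bool:
    """Return True if any feature has a NaN fraction >= nan_threshold.

    Hypothetical helper, not part of ecgan.
    """
    if nan_threshold is None:
        # Assumption: no threshold means every series is accepted.
        return False
    # Fraction of NaN entries per feature (column) of the 2D series.
    nan_fraction = np.isnan(series).mean(axis=0)
    return bool(np.any(nan_fraction >= nan_threshold))


# One of four values in the first feature is NaN (fraction 0.25).
sample = np.array([[np.nan, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
print(nan_check_sketch(sample, nan_threshold=0.5))
print(nan_check_sketch(sample, nan_threshold=0.25))
```

Note that with this comparison a threshold of 0 would flag every series, including those without NaNs.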

static check_for_dead_sensor(series)

Check for dead sensors in the data.

Data is marked as dead and subsequently as 'to be cleansed' if the variance (and thus the standard deviation) of a sensor is close to zero.


series (ndarray) -- 2D series of shape (seq_len, features).

Return type



Flag indicating whether the sample should be removed from the final dataset.
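A variance-based dead sensor check could look like the sketch below. The tolerance tol, the choice to flag the sample as soon as one feature is dead, and the name dead_sensor_sketch are all assumptions made for illustration, since the text only states that the variance is "close to zero".

```python
import numpy as np


def dead_sensor_sketch(series: np.ndarray, tol: float = 1e-8) -> bool:
    """Return True if at least one feature has (almost) zero variance.

    Hypothetical helper, not part of ecgan. `tol` is an assumed tolerance.
    """
    # Variance per feature (column), ignoring NaN entries.
    variances = np.nanvar(series, axis=0)
    return bool(np.any(variances < tol))


t = np.arange(5, dtype=float)
flat = np.column_stack([t, np.ones(5)])   # second sensor is constant
alive = np.column_stack([t, t ** 2])      # both sensors vary
print(dead_sensor_sketch(flat))
print(dead_sensor_sketch(alive))
```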


Check for faulty sensors in the data.

Data is marked for cleansing if values in the data fall below lower_fault_threshold or exceed upper_fault_threshold, or if all values are NaN.


series (ndarray) -- 2D series of shape (seq_len, features).

Return type



Flag indicating whether the sample should be removed from the final dataset.
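The threshold logic can be sketched as below, assuming that a single value outside the accepted range is enough to flag the sample and that an unset threshold disables the corresponding bound. The function faulty_sensor_sketch and its parameter names lower and upper are hypothetical and not the ecgan implementation.

```python
import numpy as np
from typing import Optional


def faulty_sensor_sketch(
    series: np.ndarray,
    lower: Optional[float] = None,
    upper: Optional[float] = None,
) -> bool:
    """Return True if the series is all NaN or leaves the accepted range.

    Hypothetical helper, not part of ecgan.
    """
    if np.all(np.isnan(series)):
        # A series without any valid value is treated as faulty.
        return True
    # nanmin/nanmax ignore NaNs so isolated gaps do not hide real values.
    if lower is not None and np.nanmin(series) < lower:
        return True
    if upper is not None and np.nanmax(series) > upper:
        return True
    return False


spike = np.array([[0.0, 1.0], [2.0, 50.0]])
print(faulty_sensor_sketch(spike, upper=10))
print(faulty_sensor_sketch(spike, lower=-1, upper=100))
```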