Cleansing
Implementation of the data cleansing process.
- class ecgan.preprocessing.cleansing.DataCleanser(lower_fault_threshold=None, upper_fault_threshold=None, nan_threshold=None, target_shape=None)
Check for dead or faulty sensors, NaNs and correct shape.
A DataCleanser object can be used for some or all of the above tasks. Most often, the ecgan.preprocessing.cleansing.DataCleanser.should_cleanse() method is called, which checks whether the series passes all of the checks. Each check can also be called individually. The input is generally expected to be a single 2D series of shape (seq_len, features), where features are the different data channels. By default, all values are accepted if no threshold or condition is set.
- Parameters
lower_fault_threshold (Optional[int]) -- Lowest value accepted without removing the series from the dataset.
upper_fault_threshold (Optional[int]) -- Highest value accepted without removing the series from the dataset.
nan_threshold (Optional[float]) -- Upper limit of the allowed fraction of NaNs. Remove the series if more than \((self.nan\_threshold \cdot 100)\%\) of all values are NaN.
target_shape (Optional[Tuple[int, int]]) -- Accepted shape of the series.
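DataCleanser works on one 2D sample at a time. The following is a minimal sketch of how a per-sample removal flag might be applied to a whole dataset stacked as (num_samples, seq_len, features). The helper `filter_dataset` is hypothetical and not part of ecgan, and the NaN predicate is only a stand-in for a real check.

```python
import numpy as np
from typing import Callable


def filter_dataset(
    data: np.ndarray, should_remove: Callable[[np.ndarray], bool]
) -> np.ndarray:
    """Keep only the samples for which should_remove returns False.

    data: array of shape (num_samples, seq_len, features).
    should_remove: per-sample predicate returning True for samples to drop.
    """
    keep = np.array([not should_remove(sample) for sample in data], dtype=bool)
    return data[keep]


# Toy usage: drop samples containing any NaN (a stand-in predicate, not ecgan's check).
data = np.zeros((3, 4, 2))
data[1, 0, 0] = np.nan
cleaned = filter_dataset(data, lambda s: bool(np.isnan(s).any()))
print(cleaned.shape)  # (2, 4, 2)
```

With ecgan installed, the predicate would typically be the bound method of a configured instance, e.g. `DataCleanser(nan_threshold=0.1).should_cleanse`.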
- should_cleanse(series)
Run the checks on a given 2D time series to determine whether it should be cleansed.
The sample should be removed from the dataset if any check fails.
Performed checks:
ecgan.preprocessing.cleansing.DataCleanser.check_for_dead_sensor()
ecgan.preprocessing.cleansing.DataCleanser.check_for_faulty_sensor()
- Parameters
series (ndarray) -- 2D series of shape (seq_len, features).
- Return type
bool
- Returns
Flag indicating whether the sample should be removed from the final dataset.
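The rule that a sample is removed as soon as any check fails can be sketched as a short-circuiting `any` over individual check functions. `should_cleanse_sketch`, `flat_channel` and `too_large` are hypothetical toy functions written for this illustration; they are not ecgan's checks.

```python
import numpy as np
from typing import Callable, Sequence


def should_cleanse_sketch(
    series: np.ndarray, checks: Sequence[Callable[[np.ndarray], bool]]
) -> bool:
    """Return True (remove the sample) as soon as any check flags the series."""
    # any() short-circuits: once one check fails the sample, later checks are skipped.
    # With no checks configured, nothing is flagged and the sample is kept.
    return any(check(series) for check in checks)


# Toy checks for illustration only; each returns True when the series should be removed.
def flat_channel(series: np.ndarray) -> bool:
    # A channel whose values never change (peak-to-peak range of zero).
    return bool(np.any(np.ptp(series, axis=0) == 0))


def too_large(series: np.ndarray) -> bool:
    # Any absolute value above an arbitrary limit of 100.
    return bool(np.any(np.abs(series) > 100))


series = np.column_stack([np.linspace(0, 1, 50), np.full(50, 2.0)])
print(should_cleanse_sketch(series, [flat_channel, too_large]))  # True: second channel is constant
print(should_cleanse_sketch(series, []))  # False: no checks configured
```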
- check_shape(series)
Check whether the sample should be removed because of its shape.
If no target_shape is specified when the instance is created, the series is only required to be 2D, i.e. of shape (seq_len, features).
- Parameters
series (ndarray) -- 2D series of shape (seq_len, features).
- Return type
bool
- Returns
Flag indicating whether the sample should be removed from the final dataset.
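A possible shape check, assuming the behaviour described above: compare against `target_shape` when one is given, otherwise only require a 2D array. `check_shape_sketch` is a hypothetical stand-alone function, not ecgan's code.

```python
import numpy as np
from typing import Optional, Tuple


def check_shape_sketch(
    series: np.ndarray, target_shape: Optional[Tuple[int, int]] = None
) -> bool:
    """Return True if the sample should be removed because of its shape."""
    if target_shape is None:
        # Without a target shape, only require a 2D (seq_len, features) array.
        return series.ndim != 2
    # Otherwise the shape must match exactly.
    return series.shape != tuple(target_shape)


print(check_shape_sketch(np.zeros((500, 12)), (500, 12)))  # False
print(check_shape_sketch(np.zeros((480, 12)), (500, 12)))  # True
print(check_shape_sketch(np.zeros(500)))  # True: 1D, not (seq_len, features)
```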
- check_for_nan(series)
Check for NaN values in the data.
The data is marked for cleansing when at least \((self.nan\_threshold \cdot 100)\%\) of the values of one feature are NaN. The data is expected to be a single time series sample of shape (seq_len, features), i.e. a 2D array. Series that contain some NaNs, but fewer than the threshold allows, can have the remaining NaNs imputed using the ecgan.preprocessing.preprocessor.BasePreprocessor.
- Parameters
series (ndarray) -- 2D series of shape (seq_len, features).
- Return type
bool
- Returns
Flag indicating whether the sample should be removed from the final dataset.
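A minimal sketch of the per-feature NaN rule stated for this method, treating `nan_threshold` as a fraction in [0, 1] and a missing threshold as "accept everything". `check_for_nan_sketch` is hypothetical and may differ from ecgan's implementation in edge cases.

```python
import numpy as np
from typing import Optional


def check_for_nan_sketch(
    series: np.ndarray, nan_threshold: Optional[float] = None
) -> bool:
    """Return True if at least nan_threshold of one feature's values are NaN."""
    if nan_threshold is None:
        # No threshold configured: accept the series regardless of NaNs.
        return False
    # Fraction of NaNs per feature, i.e. per column of a (seq_len, features) array.
    nan_fraction = np.isnan(series).mean(axis=0)
    return bool(np.any(nan_fraction >= nan_threshold))


series = np.ones((10, 2))
series[:3, 0] = np.nan  # 30 % of feature 0 is NaN
print(check_for_nan_sketch(series, 0.2))  # True
series[1:3, 0] = 1.0  # now only 10 % of feature 0 is NaN
print(check_for_nan_sketch(series, 0.2))  # False: the remaining NaN could be imputed instead
```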
- static check_for_dead_sensor(series)
Check for dead sensors in the data.
A sensor is considered dead, and the sample is marked for cleansing, if the variance (and thus the standard deviation) of that sensor's signal is close to zero.
- Parameters
series (ndarray) -- 2D series of shape (seq_len, features).
- Return type
bool
- Returns
Flag indicating whether the sample should be removed from the final dataset.
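A sketch of the dead-sensor idea: compute each channel's standard deviation over time and flag the sample if any is near zero. The tolerance `atol=1e-8` and the choice to ignore NaNs are assumptions made for this example; the documentation only says "close to zero". `check_for_dead_sensor_sketch` is hypothetical.

```python
import numpy as np


def check_for_dead_sensor_sketch(series: np.ndarray, atol: float = 1e-8) -> bool:
    """Return True if any channel of a (seq_len, features) series is (nearly) constant."""
    # Standard deviation of each channel over the time axis, ignoring NaNs.
    std = np.nanstd(series, axis=0)
    # A channel counts as dead when its standard deviation is within atol of zero.
    return bool(np.any(np.isclose(std, 0.0, atol=atol)))


t = np.linspace(0, 2 * np.pi, 100)
signal = np.column_stack([np.sin(t), np.cos(t)])
print(check_for_dead_sensor_sketch(signal))  # False
signal[:, 1] = 0.0  # second sensor flat-lines
print(check_for_dead_sensor_sketch(signal))  # True
```

Checking the standard deviation instead of the variance gives the same result at zero, but a tolerance on the standard deviation is on the same scale as the signal itself.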
- check_for_faulty_sensor(series)
Check for faulty sensors in the data.
The data is marked for cleansing if any value lies outside the accepted range given by lower_fault_threshold and upper_fault_threshold, or if all values are NaN.
- Parameters
series (ndarray) -- 2D series of shape (seq_len, features).
- Return type
bool
- Returns
Flag indicating whether the sample should be removed from the final dataset.
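A minimal sketch of the faulty-sensor check, assuming that an unset threshold disables its side of the range test and reading "all values are NaN" as every value of the series being NaN. `check_for_faulty_sensor_sketch` is hypothetical; ecgan's own logic may differ.

```python
import numpy as np
from typing import Optional


def check_for_faulty_sensor_sketch(
    series: np.ndarray,
    lower_fault_threshold: Optional[float] = None,
    upper_fault_threshold: Optional[float] = None,
) -> bool:
    """Return True if the series should be removed as faulty."""
    if np.isnan(series).all():
        # Nothing usable was recorded.
        return True
    # NaN comparisons evaluate to False, so isolated NaNs never trip the thresholds.
    if lower_fault_threshold is not None and np.any(series < lower_fault_threshold):
        return True
    if upper_fault_threshold is not None and np.any(series > upper_fault_threshold):
        return True
    return False


x = np.array([[0.1, -0.2], [0.3, 4.0]])
print(check_for_faulty_sensor_sketch(x, -1.0, 3.0))  # True: 4.0 exceeds the upper threshold
print(check_for_faulty_sensor_sketch(x, -1.0, 5.0))  # False
print(check_for_faulty_sensor_sketch(np.full((5, 2), np.nan)))  # True: all values are NaN
```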