Datasets
As ECGAN aims to support a large variety of ECG datasets and tries to be compatible with other tasks related to time series, creating and using new datasets is a key feature.
Role in the ECGAN Pipeline
Datasets are the input to every model in this framework and are expected to be implemented in a consistent way. The dataset is passed as an argument when the config is initialized and is used during all steps: preprocessing, training and evaluation. Default values are preconfigured per dataset and affect the suggested parameter choices, so generating a new config file for a new dataset is usually better than manually changing the dataset name in an existing config.
Adding new Datasets
To add a new dataset, please follow these steps:
1. Think of an identifier which describes your dataset (e.g. my_ecg).
2. Add a descriptive class for your dataset which inherits from ecgan.utils.datasets.Dataset (e.g. MyEcgDataset) and add it to the ecgan.utils.datasets.DatasetFactory.
3. Add a retrieval class to download the data and store it in a data directory with the prefix DATASET_NAME/raw (e.g. data/my_ecg/raw). The stored information can be split into an arbitrary number of files at this point. Add the class to the ecgan.preprocessing.data_retrieval.DataRetrieverFactory.
4. Add a preprocessing class for your dataset inheriting from ecgan.preprocessing.preprocessor.Preprocessor. The output is saved into DATASET_NAME/processed (e.g. data/my_ecg/processed). At this point the data has to conform to the dataset format: it has to be saved to the data.pkl file as a 3D tensor of shape (num_samples, seq_len, num_channels). Add the class to the ecgan.preprocessing.preprocessor.PreprocessorFactory class.
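The format requirement in the last step can be illustrated with a minimal sketch. It assumes a NumPy array pickled with the standard library as a stand-in for the 3D tensor; the helper name save_processed and the example sizes are hypothetical, and the framework's own preprocessors may serialize the data differently.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np


def save_processed(data: np.ndarray, processed_dir: Path) -> Path:
    """Check the (num_samples, seq_len, num_channels) layout and write data.pkl."""
    if data.ndim != 3:
        raise ValueError(f"expected a 3D array, got {data.ndim} dimension(s)")
    processed_dir.mkdir(parents=True, exist_ok=True)
    target = processed_dir / "data.pkl"
    with target.open("wb") as handle:
        pickle.dump(data, handle)
    return target


# Hypothetical example: 8 univariate series with 128 time steps each.
raw = np.random.default_rng(0).normal(size=(8, 128))
data = raw[:, :, np.newaxis]  # add the channel axis -> (8, 128, 1)

# A temporary directory stands in for the data directory of the framework.
with tempfile.TemporaryDirectory() as root:
    path = save_processed(data, Path(root) / "my_ecg" / "processed")
    with path.open("rb") as handle:
        restored = pickle.load(handle)

print(restored.shape)
```

Univariate series need an explicit channel axis of size 1, which is why the 2D array is expanded before it is saved.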
Datasets supported by the framework can be found at Supported Datasets.
Note
While the ecgan.utils.datasets.Dataset class is used to describe arbitrary datasets, ECGAN further implements an ecgan.training.datasets.BaseDataset. This is a class inheriting from the PyTorch dataset class used to iterate through datasets. Keep in mind that these two are different.
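The difference between the two kinds of classes can be sketched as follows. This is an illustration only: StaticDescription and IterableSeries are hypothetical names, and the second class merely mimics the map-style protocol (__len__ and __getitem__) of a PyTorch dataset without importing torch, so it is not the BaseDataset implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class StaticDescription:
    """Metadata only: holds no samples and cannot be iterated."""

    name: str
    num_channels: int
    default_seq_len: int


class IterableSeries:
    """Map-style container exposing __len__ and __getitem__ over the samples."""

    def __init__(self, data: np.ndarray, labels: np.ndarray) -> None:
        if len(data) != len(labels):
            raise ValueError("data and labels must have the same length")
        self.data = data
        self.labels = labels

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, index: int):
        return self.data[index], self.labels[index]


# The description only fixes the expected layout of the data ...
description = StaticDescription(name="my_ecg", num_channels=1, default_seq_len=128)

# ... while the container holds the samples and hands them out one by one.
data = np.zeros((4, description.default_seq_len, description.num_channels))
labels = np.array([0, 0, 1, 0])
series = IterableSeries(data, labels)

sample, label = series[2]
print(len(series), sample.shape, int(label))
```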
Descriptions of supported datasets.
- class ecgan.utils.datasets.Dataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: object
Base class for static descriptions of ECG datasets.
- class ecgan.utils.datasets.ShaoxingDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.Dataset
Static description of the Shaoxing dataset.
- class ecgan.utils.datasets.MitbihDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.Dataset
Static description of the Mitbih dataset.
- class ecgan.utils.datasets.ElectricDevicesDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.Dataset
Static description of the ElectricDevices dataset.
- class ecgan.utils.datasets.WaferDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.Dataset
Static description of the Wafer dataset.
- class ecgan.utils.datasets.MitbihExtractedBeatsDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.Dataset
Static description of the MITBIH dataset with extracted and downsampled single beats.
- class ecgan.utils.datasets.MitbihBeatganDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.Dataset
Static description of the MITBIH Beatgan dataset.
- class ecgan.utils.datasets.SineDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.Dataset
Static description of the sine dataset.
- class ecgan.utils.datasets.PTBExtractedBeatsDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.Dataset
Static description of the PTB dataset.
- class ecgan.utils.datasets.CMUMoCapDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.Dataset
Static description of the CMU MoCap subset.
- class ecgan.utils.datasets.ExtendedCMUMoCapDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)
Bases: ecgan.utils.datasets.CMUMoCapDataset
Static description of the extended CMU MoCap subset.
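As a rough illustration of how a new description could follow the constructor parameters listed for the classes above, the sketch below defines a stand-in base class and a lookup by identifier. It does not import ecgan: DatasetDescription, DESCRIPTIONS and describe are hypothetical names, the parameter types and all values are assumptions, and the actual DatasetFactory may be organized differently.

```python
from typing import Callable, Dict, Optional


class DatasetDescription:
    """Hypothetical stand-in for ecgan.utils.datasets.Dataset.

    Only the parameter names come from the documented signature;
    the types used here are assumptions.
    """

    def __init__(
        self,
        name: str,
        num_channels: int,
        num_classes: int,
        default_seq_len: int,
        beat_types: Dict[str, int],
        percentage_normal: float,
        loading_src: Optional[str],
    ) -> None:
        self.name = name
        self.num_channels = num_channels
        self.num_classes = num_classes
        self.default_seq_len = default_seq_len
        self.beat_types = beat_types
        self.percentage_normal = percentage_normal
        self.loading_src = loading_src


class MyEcgDataset(DatasetDescription):
    """Static description of the hypothetical my_ecg dataset (placeholder values)."""

    def __init__(self) -> None:
        super().__init__(
            name="my_ecg",
            num_channels=1,
            num_classes=2,
            default_seq_len=128,
            beat_types={"normal": 0, "abnormal": 1},
            percentage_normal=0.9,
            loading_src=None,
        )


# Hypothetical registry standing in for a factory lookup by identifier.
DESCRIPTIONS: Dict[str, Callable[[], DatasetDescription]] = {"my_ecg": MyEcgDataset}


def describe(identifier: str) -> DatasetDescription:
    """Return the static description registered under the given identifier."""
    try:
        description_cls = DESCRIPTIONS[identifier]
    except KeyError:
        raise ValueError(f"unknown dataset identifier: {identifier!r}") from None
    return description_cls()


description = describe("my_ecg")
print(description.name, description.num_channels, description.default_seq_len)
```

Keeping the description free of any loading logic mirrors the "static description" wording used for the classes above.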