Datasets

ECGAN aims to support a wide variety of ECG datasets and to remain compatible with other time series tasks, so creating and using new datasets is a key feature.

Role in the ECGAN Pipeline

Datasets are the input to any model used in this framework and we expect them to be implemented in a consistent way. The dataset is given as an argument during initialization of the config and is used during all steps: preprocessing, training and evaluation. The default values in the config are preconfigured depending on the dataset and can affect the suggested parameter choices. Generating a new config file for a new dataset is therefore usually better than manually changing the name of the dataset in an existing config.

Adding new Datasets

To add a new dataset, please follow these steps:

  1. Think of an identifier which describes your dataset (e.g. my_ecg).

  2. Add a descriptive class for your dataset which inherits from ecgan.utils.datasets.Dataset (e.g. MyEcgDataset) and add it to the ecgan.utils.datasets.DatasetFactory.

  3. Add a retrieval class to download the data and store it in a data directory with the prefix DATASET_NAME/raw (e.g. data/my_ecg/raw). The stored information can be split into an arbitrary number of files at this point. Add the class to the ecgan.preprocessing.data_retrieval.DataRetrieverFactory.

  4. Add a preprocessing class for your dataset which inherits from ecgan.preprocessing.preprocessor.Preprocessor. The output is saved into DATASET_NAME/processed (e.g. data/my_ecg/processed). At this point the data has to conform to the dataset format: it has to be saved to the data.pkl file as a 3D tensor of shape (num_samples, seq_len, num_channels). Add the class to the ecgan.preprocessing.preprocessor.PreprocessorFactory class.
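The on-disk layout described in steps 3 and 4 can be sketched as follows. This is a minimal, dependency-free illustration and not ECGAN's own code: the helper names tensor_shape and save_processed are hypothetical, and nested Python lists stand in for the 3D tensor, which the real pipeline presumably stores as a tensor or array object.

```python
import pickle
import tempfile
from pathlib import Path


def tensor_shape(data):
    """Return (num_samples, seq_len, num_channels) for a rectangular nested list."""
    num_samples = len(data)
    seq_len = len(data[0])
    num_channels = len(data[0][0])
    for sample in data:
        if len(sample) != seq_len or any(len(step) != num_channels for step in sample):
            raise ValueError("data is not a rectangular 3D structure")
    return num_samples, seq_len, num_channels


def save_processed(data, data_root, dataset_name):
    """Write data to <data_root>/<dataset_name>/processed/data.pkl and return the path."""
    tensor_shape(data)  # fail early if the layout is wrong
    target_dir = Path(data_root) / dataset_name / "processed"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "data.pkl"
    with target.open("wb") as handle:
        pickle.dump(data, handle)
    return target


# Placeholder values: 4 samples, 5 time steps, 2 channels.
example = [[[0.0, 1.0] for _ in range(5)] for _ in range(4)]

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_processed(example, Path(tmp) / "data", "my_ecg")
    relative_path = saved_path.relative_to(tmp).as_posix()
    with saved_path.open("rb") as handle:
        restored_shape = tensor_shape(pickle.load(handle))
```

Checking the shape before writing keeps a malformed preprocessing result from reaching the training step, where the error would be harder to trace.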

Datasets supported by the framework can be found at Supported Datasets.

Note

While the ecgan.utils.datasets.Dataset class is used to describe arbitrary datasets, ECGAN further implements an ecgan.training.datasets.BaseDataset. This is a class inheriting from the PyTorch dataset class and is used to iterate through datasets. Keep in mind that these two are different.
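The difference between the two roles can be illustrated with a small sketch. Both classes below are hypothetical stand-ins written without PyTorch: DescriptionSketch only carries metadata, while IterableSketch holds samples and implements __len__ and __getitem__, the two methods a PyTorch map-style dataset provides. Returning a (sample, label) pair is an assumption made for the example, not a statement about BaseDataset.

```python
from typing import List, NamedTuple, Tuple

Series = List[List[float]]  # one sample: seq_len rows of num_channels values


class DescriptionSketch(NamedTuple):
    """Static metadata only; it holds no samples."""

    name: str
    num_channels: int


class IterableSketch:
    """Holds samples and serves them by index, like a map-style dataset."""

    def __init__(self, samples: List[Series], labels: List[int]) -> None:
        if len(samples) != len(labels):
            raise ValueError("samples and labels must have the same length")
        self.samples = samples
        self.labels = labels

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[Series, int]:
        return self.samples[index], self.labels[index]


description = DescriptionSketch(name="my_ecg", num_channels=1)
train_data = IterableSketch(samples=[[[0.1], [0.2]], [[0.3], [0.4]]], labels=[0, 1])
first_sample, first_label = train_data[0]
```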

Descriptions of supported datasets.

class ecgan.utils.datasets.Dataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: object

Base class for static descriptions of ECG datasets.
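As a rough illustration of what such a static description holds, the sketch below mirrors the seven constructor parameters listed above in a frozen dataclass. The class name DatasetDescription, all type annotations and all values are assumptions made for the example; the source only documents the parameter names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class DatasetDescription:
    """Hypothetical stand-in for ecgan.utils.datasets.Dataset; types are guesses."""

    name: str
    num_channels: int
    num_classes: int
    default_seq_len: int
    beat_types: Dict[int, str]
    percentage_normal: float
    loading_src: Optional[str]


# Invented values for the example dataset 'my_ecg'.
my_ecg = DatasetDescription(
    name="my_ecg",
    num_channels=1,
    num_classes=2,
    default_seq_len=256,
    beat_types={0: "normal", 1: "abnormal"},
    percentage_normal=0.8,
    loading_src=None,
)
```

Freezing the dataclass reflects the "static" nature of the description: fields cannot be reassigned after construction.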

class ecgan.utils.datasets.ShaoxingDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.Dataset

Static description of the Shaoxing dataset.

class ecgan.utils.datasets.MitbihDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.Dataset

Static description of the MITBIH dataset.

class ecgan.utils.datasets.ElectricDevicesDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.Dataset

Static description of the Electric Devices dataset.

class ecgan.utils.datasets.WaferDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.Dataset

Static description of the Wafer dataset.

class ecgan.utils.datasets.MitbihExtractedBeatsDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.Dataset

Static description of the MITBIH dataset with extracted and downsampled single beats.

class ecgan.utils.datasets.MitbihBeatganDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.Dataset

Static description of the MITBIH Beatgan dataset.

class ecgan.utils.datasets.SineDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.Dataset

Static description of the sine dataset.

class ecgan.utils.datasets.PTBExtractedBeatsDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.Dataset

Static description of the PTB dataset.

class ecgan.utils.datasets.CMUMoCapDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.Dataset

Static description of the CMU MoCap subset.

class ecgan.utils.datasets.ExtendedCMUMoCapDataset(name, num_channels, num_classes, default_seq_len, beat_types, percentage_normal, loading_src)

Bases: ecgan.utils.datasets.CMUMoCapDataset

Static description of the extended CMU MoCap subset.

class ecgan.utils.datasets.DatasetFactory

Bases: object

Meta module for creating dataset objects containing static data to describe the datasets.
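The factory idea can be sketched as a lookup from dataset identifier to description class. Everything below is a hypothetical illustration: the names DescriptionSketch, MyEcgDataset and DatasetFactorySketch, the registry dictionary and the call signature are assumptions, and the actual DatasetFactory may be organized differently.

```python
from typing import Dict, Type


class DescriptionSketch:
    """Minimal stand-in for a static dataset description."""

    def __init__(self, name: str, num_channels: int) -> None:
        self.name = name
        self.num_channels = num_channels


class MyEcgDataset(DescriptionSketch):
    """Hypothetical description of the example dataset 'my_ecg'."""

    def __init__(self) -> None:
        super().__init__(name="my_ecg", num_channels=1)


class DatasetFactorySketch:
    """Maps a dataset identifier to the class that describes it."""

    # Registering a new dataset means adding one entry here.
    _registry: Dict[str, Type[DescriptionSketch]] = {"my_ecg": MyEcgDataset}

    def __call__(self, dataset: str) -> DescriptionSketch:
        description_cls = self._registry.get(dataset)
        if description_cls is None:
            raise ValueError(f"Unknown dataset: {dataset}")
        return description_cls()


factory = DatasetFactorySketch()
described = factory("my_ecg")
```

Keeping the lookup in one place means the rest of the pipeline only needs the identifier from the config to obtain the matching description.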