Data Retrieval

Classes specifying the retrieval/creation of datasets.

class ecgan.preprocessing.data_retrieval.DataRetriever(dataset, cfg)[source]

Bases: ecgan.utils.configurable.Configurable

A DataRetriever base class for retrieval of datasets.

Objects of this class are used to download a given dataset and additional information on the dataset from a given source. More information on implemented datasets and how to add new datasets can be found in Datasets.

Parameters

dataset (str) -- Name of the dataset which has to be supported by ecgan.utils.datasets.DatasetFactory.

static configure()[source]

Return the default preprocessing configuration for a data retriever object.

Return type

Dict

abstract load()[source]

Download the dataset to disk.

Return type

None
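The pattern described above (a static configure() returning defaults plus an abstract load() that subclasses implement) can be sketched as follows. This is a minimal illustration, not ecgan's implementation: the class names SketchRetriever and InMemoryRetriever and the dictionary key are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict


class SketchRetriever(ABC):
    """Hypothetical stand-in for a retriever base class (not ecgan's code)."""

    def __init__(self, dataset: str, cfg: Dict):
        self.dataset = dataset
        self.cfg = cfg

    @staticmethod
    def configure() -> Dict:
        # Default settings; using LOADING_DIR as a dict key is an assumption.
        return {"LOADING_DIR": "data"}

    @abstractmethod
    def load(self) -> None:
        """Fetch or create the dataset and write it to disk."""


class InMemoryRetriever(SketchRetriever):
    """Toy subclass that records the call instead of downloading anything."""

    def __init__(self, dataset: str, cfg: Dict):
        super().__init__(dataset, cfg)
        self.loaded = False

    def load(self) -> None:
        self.loaded = True


retriever = InMemoryRetriever("toy", InMemoryRetriever.configure())
retriever.load()
```

Because load() is abstract, the base class itself cannot be instantiated; only subclasses that implement load() can.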

class ecgan.preprocessing.data_retrieval.KaggleDataRetriever(dataset, cfg)[source]

Bases: ecgan.preprocessing.data_retrieval.DataRetriever

A base class for downloading datasets from Kaggle.

Since datasets on Kaggle follow no fixed format, the raw dataset on disk has to be read and preprocessed by a custom Preprocessor.

Warning

Install the kaggle package if you want to download the data. It is listed in requirements.txt and can also be installed via pip install kaggle. Create a file with your authentication information at ~/.kaggle/kaggle.json or export the tokens on the command line (see Kaggle on GitHub for more information). If you cannot or do not want to use the Kaggle API, download the data from the individual Kaggle repositories and unzip it to <data_location>/<dataset_name>/raw.
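Before starting a download it can help to check that credentials are in place. The sketch below is an illustration and not part of ecgan: it looks for the token file at ~/.kaggle/kaggle.json and, as an alternative, for the KAGGLE_USERNAME and KAGGLE_KEY environment variables read by the Kaggle API client. The helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def kaggle_credentials_available(
    home: Optional[Path] = None,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True if a token file or both environment variables are present."""
    home = Path.home() if home is None else home
    env = os.environ if env is None else env
    token_file = home / ".kaggle" / "kaggle.json"
    has_env = bool(env.get("KAGGLE_USERNAME")) and bool(env.get("KAGGLE_KEY"))
    return token_file.is_file() or has_env


def manual_raw_dir(data_location: str, dataset_name: str) -> Path:
    """Build <data_location>/<dataset_name>/raw for manually unzipped data."""
    return Path(data_location) / dataset_name / "raw"
```

If kaggle_credentials_available() returns False, the manual route applies: unzip the downloaded archive into the directory returned by manual_raw_dir().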

load()[source]

Load a dataset from Kaggle.

The source url has to be given in the config as cfg.LOADING_SRC. The target directory has to be given as cfg.LOADING_DIR.

Return type

None

class ecgan.preprocessing.data_retrieval.MitbihDataRetriever(dataset, cfg)[source]

Bases: ecgan.preprocessing.data_retrieval.KaggleDataRetriever

The MITBIH dataset is downloaded via the regular ecgan.preprocessing.data_retrieval.KaggleDataRetriever.

This class exists to configure the KaggleDataRetriever correctly and supply relevant parameters required for further preprocessing. The given configuration is used only during initialization and can be changed if desired.

The dataset is the raw original dataset and cannot be used for classification by default; it requires manual preprocessing steps. To use the MITBIH data, either preprocess the downloaded data yourself or select one of the supported preprocessed datasets, mitbih_beats or mitbih_beatgan, during initialization.

Paper:
Information on source: Original data can be found at PhysioNet. This framework does not use the original data source but an unofficial Kaggle mirror. The data remains unchanged but is saved as CSV for easier preprocessing.

static configure()[source]

Return the default configuration for the default MITBIH dataset.

Return type

Dict

class ecgan.preprocessing.data_retrieval.MitbihExtractedBeatsDataRetriever(dataset, cfg)[source]

Bases: ecgan.preprocessing.data_retrieval.KaggleDataRetriever

Download the (beat-wise) segmented MITBIH dataset.

The segmented MITBIH dataset is downloaded via the regular KaggleDataRetriever.

Paper:
Information on source: Data is downloaded from the authors' official Kaggle repository.

static configure()[source]

Return the default configuration for the MITBIH dataset with extracted beats.

Return type

Dict

class ecgan.preprocessing.data_retrieval.PtbExtractedBeatsDataRetriever(dataset, cfg)[source]

Bases: ecgan.preprocessing.data_retrieval.KaggleDataRetriever

Download the (beat-wise) segmented PTB dataset.

The segmented PTB dataset is downloaded via the regular KaggleDataRetriever.

Information on source: Data is downloaded from the authors' official Kaggle repository.

static configure()[source]

Return the default configuration for the PTB dataset with extracted beats.

Return type

Dict

class ecgan.preprocessing.data_retrieval.CMUMoCapDataRetriever(dataset, cfg)[source]

Bases: ecgan.preprocessing.data_retrieval.KaggleDataRetriever

Download the subset of the CMU MoCap dataset used in BeatGAN.

The dataset is downloaded via the regular KaggleDataRetriever.

Information on source: Data is downloaded from an unofficial Kaggle repository.

static configure()[source]

Return the default configuration for the CMU MoCap dataset.

Return type

Dict

class ecgan.preprocessing.data_retrieval.ExtendedCMUMoCapDataRetriever(dataset, cfg)[source]

Bases: ecgan.preprocessing.data_retrieval.KaggleDataRetriever

Download an extended version of the subset of the CMU MoCap dataset used in BeatGAN.

The dataset is downloaded via the regular KaggleDataRetriever.

Information on source: Data is downloaded from an unofficial Kaggle repository.

static configure()[source]

Return the default configuration for the extended CMU MoCap dataset.

Return type

Dict

class ecgan.preprocessing.data_retrieval.SineDataRetriever(name, cfg)[source]

Bases: ecgan.preprocessing.data_retrieval.DataRetriever

Class to generate a synthetic dataset containing sine waves.

load()[source]

Generate a synthetic dataset with sine waves and save it.

Configuration is currently limited to the number of samples you want to create and the target sequence length.

By default, the domain of the sines is between 0 and 25, which can lead to imperfect generated sine waves. This is intended behavior to obtain more variety in the FFT of the generated sine waves and can be changed manually. The amplitude, frequency, phase and vertical translation are chosen randomly. Furthermore, the dataset is imbalanced: only 20% of the data is anomalous. Half of the anomalous data consists of noisy sine waves (added Gaussian noise) and the other half consists of superimposed sine waves. The resulting dataset can be used to assess the classification or generative capabilities of a given model.

Since the resulting dataset is already in the target shape, no further preprocessing is currently supported and the data is saved as an already preprocessed dataset.

Return type

None
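The generation scheme described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not ecgan's implementation: the function name make_sine_dataset, the sampling ranges for amplitude, frequency, phase and offset, and the noise scale are all hypothetical; only the domain [0, 25], the 20% anomaly share and the even split between noisy and superimposed anomalies come from the description.

```python
import math
import random
from typing import List, Tuple


def make_sine_dataset(
    num_samples: int, seq_len: int, seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Return (series, labels); label 1 marks an anomalous series."""
    rng = random.Random(seed)
    # Evenly spaced sample points on the domain [0, 25] (needs seq_len >= 2).
    xs = [25.0 * i / (seq_len - 1) for i in range(seq_len)]

    def random_sine() -> List[float]:
        # The ranges below are illustrative assumptions.
        amplitude = rng.uniform(0.5, 2.0)
        frequency = rng.uniform(0.5, 2.0)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        offset = rng.uniform(-1.0, 1.0)
        return [amplitude * math.sin(frequency * x + phase) + offset for x in xs]

    num_anomalous = num_samples // 5  # 20% of the data
    num_noisy = num_anomalous // 2    # half of the anomalies
    series, labels = [], []
    for i in range(num_samples):
        wave = random_sine()
        if i < num_noisy:
            # Anomaly type 1: additive Gaussian noise.
            wave = [v + rng.gauss(0.0, 0.3) for v in wave]
        elif i < num_anomalous:
            # Anomaly type 2: superimpose a second random sine wave.
            wave = [a + b for a, b in zip(wave, random_sine())]
        series.append(wave)
        labels.append(1 if i < num_anomalous else 0)
    return series, labels


series, labels = make_sine_dataset(num_samples=100, seq_len=64)
```

The sketch places the anomalous samples first for readability; shuffling series and labels together before use would avoid an ordering bias.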

static configure()[source]

Return the default configuration for an artificial sine dataset.

Return type

Dict

class ecgan.preprocessing.data_retrieval.UrlDataRetriever(dataset, cfg, delete_zip=False)[source]

Bases: ecgan.preprocessing.data_retrieval.DataRetriever

Class to download and extract zipped datasets from URLs.

load()[source]

Load publicly available datasets which are saved as zips and extract them.

The UrlDataRetriever does not support additional authentication. If errors occur, please check whether the dataset is still available at the URL specified in the configuration file and open an issue if this is not the case.

Subclasses need to implement the abstract methods to define metadata and determine how to unzip the data.

Warning

The urllib request might require the installation of a Python certificate on macOS.

Return type

None

abstract get_meta()[source]

Get meta information on the downloaded files if required.

Return type

List[Tuple]

abstract extract_data(save_location, unzip_location)[source]

Extract data from zip file.

Parameters
  • save_location (str) -- Reference to local directory where the zip is stored.

  • unzip_location (str) -- Reference to local directory where the data shall be extracted to.

Return type

None
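A subclass might implement the extraction step roughly as sketched below. This is an assumption-laden illustration, not ecgan's code: the function name extract_zip is hypothetical, and the delete_zip flag is assumed to mean "remove the archive after extraction", mirroring the constructor argument of the same name.

```python
import os
import zipfile


def extract_zip(save_location: str, unzip_location: str, delete_zip: bool = False) -> None:
    """Extract every zip archive found in save_location into unzip_location."""
    os.makedirs(unzip_location, exist_ok=True)
    for name in sorted(os.listdir(save_location)):
        if not name.endswith(".zip"):
            continue
        archive = os.path.join(save_location, name)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(unzip_location)
        if delete_zip:
            # Assumed semantics: free disk space once the data is extracted.
            os.remove(archive)
```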

class ecgan.preprocessing.data_retrieval.ShaoxingDataRetriever(dataset, cfg, delete_zip=False)[source]

Bases: ecgan.preprocessing.data_retrieval.UrlDataRetriever

Download and extract the zipped Shaoxing dataset.

Paper:
Information on source: Data is downloaded from their official figshare mirror.

get_meta()[source]

Get meta information on the downloaded files.

Return type

List[Tuple]

extract_data(save_location, unzip_location)[source]

Extract data from zip file.

Parameters
  • save_location (str) -- Reference to local directory where the zip is stored.

  • unzip_location (str) -- Reference to local directory where the data shall be extracted to.

Return type

None

static configure()[source]

Return the default configuration for the Shaoxing dataset.

The window_length, step size and target sequence length can be configured manually after initialization of the config file.

Return type

Dict

class ecgan.preprocessing.data_retrieval.MitbihBeatganDataRetriever(dataset, cfg, delete_zip=False)[source]

Bases: ecgan.preprocessing.data_retrieval.UrlDataRetriever

Download and extract the zipped MITBIH dataset based on the BeatGAN preprocessing.

Paper:
Information on source: Data is downloaded from the official Dropbox mirror.

get_meta()[source]

No metadata required.

Return type

List[Tuple]

extract_data(save_location, unzip_location)[source]

Extract data from zip file.

Parameters
  • save_location (str) -- Reference to local directory where the zip is stored.

  • unzip_location (str) -- Reference to local directory where the data shall be extracted to.

Return type

None

static configure()[source]

Return the default configuration for the MITBIH dataset based on the BeatGAN preprocessing.

Return type

Dict

class ecgan.preprocessing.data_retrieval.ElectricDevicesDataRetriever(dataset, cfg, delete_zip=False)[source]

Bases: ecgan.preprocessing.data_retrieval.UrlDataRetriever

Download the electric devices dataset from todo.

static configure()[source]

Return the default configuration for the electric devices dataset.

Return type

Dict

get_meta()[source]

No metadata required.

Return type

List[Tuple]

extract_data(save_location, unzip_location)[source]

Extract data from zip file.

Parameters
  • save_location (str) -- Reference to local directory where the zip is stored.

  • unzip_location (str) -- Reference to local directory where the data shall be extracted to.

Return type

None

class ecgan.preprocessing.data_retrieval.WaferDataRetriever(dataset, cfg, delete_zip=False)[source]

Bases: ecgan.preprocessing.data_retrieval.ElectricDevicesDataRetriever

Download the Wafer dataset from todo.

static configure()[source]

Return the default configuration for the Wafer dataset.

Return type

Dict

class ecgan.preprocessing.data_retrieval.DataRetrieverFactory[source]

Bases: object

Meta module for creating data retriever instances.

static choose_class(dataset)[source]

Retrieve a specified dataset and save it to disk.

Parameters

dataset (str) -- String specifying the dataset to be downloaded.

Return type

DataRetriever

Returns

DataRetriever instance.
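A name-to-class lookup of this kind is commonly built on a registry dictionary. The sketch below shows that pattern in isolation; it is not ecgan's implementation. All class names, the registry keys and the choose_retriever helper (which also takes a cfg argument so it can instantiate the class) are hypothetical.

```python
from typing import Dict, Type


class ToyRetriever:
    """Hypothetical stand-in for a retriever class."""

    def __init__(self, dataset: str, cfg: dict):
        self.dataset = dataset
        self.cfg = cfg


class ToySineRetriever(ToyRetriever):
    pass


class ToyZipRetriever(ToyRetriever):
    pass


# Illustrative registry; the keys are assumed dataset names.
RETRIEVERS: Dict[str, Type[ToyRetriever]] = {
    "sine": ToySineRetriever,
    "zipped": ToyZipRetriever,
}


def choose_retriever(dataset: str, cfg: dict) -> ToyRetriever:
    """Look up the class registered for `dataset` and instantiate it."""
    try:
        retriever_cls = RETRIEVERS[dataset]
    except KeyError:
        known = ", ".join(sorted(RETRIEVERS))
        raise ValueError(f"Unknown dataset '{dataset}'. Known: {known}") from None
    return retriever_cls(dataset, cfg)
```

Raising a ValueError that lists the known names makes a misspelled dataset name easy to diagnose.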

ecgan.preprocessing.data_retrieval.retrieve_fold_from_existing_split(data_dir, split_path, split_file, fold, target_dir, location=TrackerType.WEIGHTS_AND_BIASES)[source]

Load and split data given an existing split file.

The split file has to be previously saved by an instance of ecgan.evaluation.tracker.BaseTracker.

Parameters
  • data_dir (str) -- Directory containing the data/label pkl files (should be loaded from config used to create the split).

  • split_path (str) -- Pointing to the run from which the split shall be loaded. Format is usually <entity>/<project>/<run_id>.

  • split_file (str) -- Pointing to the file inside split_path containing the split indices.

  • fold (int) -- The fold used during the training run that shall be evaluated.

  • location (TrackerType) -- Tracker location of split file.

  • target_dir (str) -- Directory the split file is saved to if it is retrieved from remote host.

Return type

Tuple[Tensor, Tensor, Tensor, Tensor, Tensor, Tensor]

Returns

Tensors containing the train_x, test_x, vali_x, train_y, test_y, vali_y data from the given split.
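The core of such a fold lookup, selecting rows by stored indices and returning them in the order train_x, test_x, vali_x, train_y, test_y, vali_y, can be sketched as follows. The layout of the split file is not documented here, so the structure assumed below (a mapping from fold number to "train", "test" and "vali" index lists) is hypothetical, as is the function name select_fold. Plain lists stand in for tensors.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed layout: {fold: {"train": [...], "test": [...], "vali": [...]}}
Split = Dict[int, Dict[str, List[int]]]


def select_fold(
    data: Sequence, labels: Sequence, split: Split, fold: int
) -> Tuple[list, list, list, list, list, list]:
    """Return train_x, test_x, vali_x, train_y, test_y, vali_y for one fold."""
    indices = split[fold]

    def pick(values: Sequence, positions: List[int]) -> list:
        return [values[i] for i in positions]

    parts = ("train", "test", "vali")
    xs = tuple(pick(data, indices[p]) for p in parts)
    ys = tuple(pick(labels, indices[p]) for p in parts)
    return xs + ys


toy_split = {0: {"train": [0, 1], "test": [2], "vali": [3]}}
result = select_fold(["a", "b", "c", "d"], [0, 0, 1, 1], toy_split, fold=0)
```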