Embeddings

Functions to create low-dimensional embeddings.

ecgan.utils.embeddings.calculate_tsne(data, perplexity=30, early_exaggeration=12.0, n_components=2)[source]

Calculate t-SNE.

This is a wrapper around the corresponding sklearn implementation. It can be applied to both univariate and multivariate series, and the result can be visualized with ecgan.visualization.plotter.ScatterPlotter. Keep in mind that rerunning t-SNE does not return the same embedding: its cost function is not convex, so different initializations can lead to different results. t-SNE is slow compared to, e.g., UMAP. To speed up fitting the reducer, one might run it on the GPU; a CUDA implementation can be found on GitHub (CannyLab).

Parameters

data (ndarray) -- Data whose dimensionality shall be reduced. Either (batch, seq_len) or (batch, seq_len, channel) format.
perplexity (float) -- t-SNE perplexity, which is related to the number of nearest neighbors considered for each point.
early_exaggeration (float) -- Controls how tightly the embedded points are packed.
n_components (int) -- Dimension of the embedded space.

References

van der Maaten and Hinton, 2008

Return type: Tuple[ndarray, BaseEstimator]
Returns: The resulting low-dim embedding of shape (dim, samples) and the trained reducer.
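
As an illustration of what such a wrapper might look like, here is a minimal sketch built directly on sklearn.manifold.TSNE. The function name tsne_sketch, the flattening of the channel axis, and the final transpose are assumptions made for this example; they are not taken from the ecgan source. sklearn itself returns the embedding as (samples, components), so the sketch transposes it to put the samples along the last axis, as in the documented return shape.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_sketch(data, perplexity=30.0, early_exaggeration=12.0, n_components=2):
    """Hypothetical stand-in for calculate_tsne; not the ecgan implementation."""
    data = np.asarray(data)
    # Assumption: multivariate series are flattened to (batch, seq_len * channel).
    flat = data.reshape(data.shape[0], -1)
    reducer = TSNE(
        n_components=n_components,
        perplexity=perplexity,
        early_exaggeration=early_exaggeration,
    )
    # sklearn returns an array of shape (samples, n_components).
    embedding = reducer.fit_transform(flat)
    # Transposed so that the samples lie along the last axis.
    return embedding.T, reducer


rng = np.random.default_rng(0)
series = rng.normal(size=(100, 50, 2))  # (batch, seq_len, channel)
embedding, reducer = tsne_sketch(series)
print(embedding.shape)
```

Note that perplexity must be smaller than the number of samples, so very small batches need a lower value than the default of 30.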

ecgan.utils.embeddings.calculate_umap(data, target, n_neighbors=25, supervised_umap=True, n_components=2, rnd_seed=None, low_memory=True)[source]

UMAP embeddings according to McInnes et al. 2018.

Uses the public UMAP implementation, e.g. for 2D visualizations.

Parameters

data (ndarray) -- Univariate or multivariate series as a NumPy array.
target (Union[List[int], object]) -- List of the target classes encoded as integers.
n_neighbors (int) -- Number of neighbors UMAP uses to construct the high-dimensional graph.
supervised_umap (bool) -- Flag indicating whether to use supervised UMAP, which utilizes the target information.
n_components (int) -- Dimensionality of the low-dimensional embedding.
rnd_seed (Optional[int]) -- Set a random seed if you want to reproduce the embedding. Warning: Slows down performance!
low_memory (bool) -- Enables or disables low-memory mode. Should be True if you run into memory problems during NNDescent. Computation takes longer when enabled.

Return type

Tuple[ndarray, BaseEstimator]

Returns

The resulting low-dim UMAP embedding of shape (dim, samples) and the trained reducer.

ecgan.utils.embeddings.calculate_pca(data, n_components=2)[source]

PCA embeddings using the sklearn library.

Parameters

data (ndarray) -- Univariate or multivariate series as a NumPy array.
n_components (int) -- Dimensionality of the low-dimensional embedding.

Return type

Tuple[ndarray, BaseEstimator]

Returns

The resulting low-dim PCA embedding of shape (dim, samples) and the trained reducer.
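
A minimal sketch of how a PCA embedding and a reusable reducer can be obtained with sklearn.decomposition.PCA is shown below. The name pca_sketch, the flattening step, and the transpose to a samples-last layout are assumptions for illustration only and may differ from the actual ecgan code.

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_sketch(data, n_components=2):
    """Hypothetical stand-in for calculate_pca; not the ecgan implementation."""
    data = np.asarray(data)
    # Assumption: anything beyond the batch axis is flattened into features.
    flat = data.reshape(data.shape[0], -1)
    reducer = PCA(n_components=n_components)
    # sklearn returns an array of shape (samples, n_components).
    embedding = reducer.fit_transform(flat)
    # Transposed so that the samples lie along the last axis.
    return embedding.T, reducer


rng = np.random.default_rng(0)
series = rng.normal(size=(64, 128))  # (batch, seq_len)
embedding, reducer = pca_sketch(series)

# Unlike t-SNE, a fitted PCA reducer can project previously unseen data.
new_points = reducer.transform(rng.normal(size=(5, 128)))
print(embedding.shape, new_points.shape)
```

Returning the fitted reducer alongside the embedding is what makes it possible to project new series into the same low-dimensional space later on.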

ecgan.utils.embeddings.assert_and_reshape_dim(data)[source]

Assert that data is either of shape (a,b) or (a,b,c) and reshape if required.

Parameters: data (ndarray) -- Data of arbitrary shape.
Return type: ndarray
Returns: Reshaped 2D np.ndarray.
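
The reshaping step could look roughly like the following sketch. How the original function merges the trailing axes and which exception it raises are not stated here, so both the row-major flattening to (a, b * c) and the use of ValueError are assumptions; reshape_to_2d is a hypothetical name.

```python
import numpy as np


def reshape_to_2d(data):
    """Hypothetical sketch: accept (a, b) or (a, b, c) and return a 2D array."""
    data = np.asarray(data)
    if data.ndim == 2:
        # Already (a, b): nothing to do.
        return data
    if data.ndim == 3:
        # Assumption: merge the last two axes into one, giving (a, b * c).
        return data.reshape(data.shape[0], -1)
    raise ValueError(f"Expected 2 or 3 dimensions, got {data.ndim}.")


univariate = np.zeros((8, 100))       # (batch, seq_len)
multivariate = np.zeros((8, 100, 2))  # (batch, seq_len, channel)
print(reshape_to_2d(univariate).shape, reshape_to_2d(multivariate).shape)
```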