API Reference

Index

Classes

persistable.Persistable

Density-based clustering on finite metric spaces.

persistable.PersistableInteractive

Graphical user interface for doing parameter selection for Persistable.

persistable.Persistable methods

persistable.Persistable.cluster

Cluster dataset passed at initialization.

persistable.PersistableInteractive methods

persistable.PersistableInteractive.start_ui

Serves the GUI with a given persistable instance.

persistable.PersistableInteractive.cluster

Clusters the dataset with which the Persistable instance was initialized.

persistable.PersistableInteractive.save_ui_state

Save state of input fields in the UI as a Python object.

Details

class persistable.Persistable(X, metric='minkowski', measure=None, subsample=None, n_neighbors='auto', debug=False, threading=False, n_jobs=4, **kwargs)

Density-based clustering on finite metric spaces.

X: ndarray (n_samples, n_features)

A numpy array of shape (n_samples, n_features), or a square distance matrix of shape (n_samples, n_samples).

metric: string, optional, default is “minkowski”

A string determining which metric is used to compute distances between the points in X. It can be a metric in KDTree.valid_metrics or BallTree.valid_metrics (both classes can be imported with from sklearn.neighbors import KDTree, BallTree), or "precomputed" if X is a distance matrix.

measure: None or ndarray(n_samples), default is None

A numpy vector of length n_samples of non-negative numbers, which is interpreted as a measure on the data points. If None, the uniform measure, where each point has weight 1/n_samples, is used. If the measure does not sum to 1, it is normalized.

subsample: None or int, optional, default is None

Number of data points to subsample. The subsample is chosen so that its measure approximates the original measure on the full dataset as closely as possible, in the Prokhorov sense. If metric is "minkowski" and the dimensionality is not too big, computing the sample takes time O( log(size_subsample) * size_data ); otherwise it takes time O( size_subsample * size_data ).

n_neighbors: int or string, optional, default is “auto”

Number of neighbors for each point in X used to initialize the data structures used for clustering. If set to "all", the number of points in the dataset is used; if set to "auto", a reasonable default is chosen.

debug: bool, optional, default is False

Whether to print debug messages.

threading: bool, optional, default is False

Whether to use Python threads for parallel computation with joblib. If False, the loky backend is used. Threads are significantly slower because of the GIL, but the loky backend does not work well on some systems.

n_jobs: int, default is 4

Number of processes or threads to use to fit the data structures, for example to compute the nearest neighbors of all points in the dataset.

**kwargs:

Passed to KDTree or BallTree.
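As a rough illustration of the inputs described above, the following sketch builds a pairwise Euclidean distance matrix (the kind of X that goes with metric "precomputed") and normalizes a vector of non-negative weights in the way the measure parameter is described. The helper names pairwise_distances and normalize_measure are hypothetical and not part of persistable; plain Python lists are used only to keep the sketch self-contained, whereas the constructor is documented to take numpy arrays.

```python
import math


def pairwise_distances(points):
    """Return the full Euclidean distance matrix of a list of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def normalize_measure(weights, n_samples):
    """Mimic the documented behavior of the `measure` parameter.

    None yields the uniform measure; otherwise the weights are
    rescaled so that they sum to 1. This is an illustration of the
    description, not the library's own code.
    """
    if weights is None:
        return [1.0 / n_samples] * n_samples
    if len(weights) != n_samples:
        raise ValueError("measure must have one weight per data point")
    if any(w < 0 for w in weights):
        raise ValueError("measure must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("measure must have positive total weight")
    return [w / total for w in weights]


# Three points on a line; distances are 5, 5 and 10.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
D = pairwise_distances(points)

# Weights that do not sum to 1 are rescaled.
mu = normalize_measure([1, 1, 2], n_samples=len(points))
```

After conversion with numpy.asarray, D could presumably be passed as X together with metric="precomputed", and mu as measure.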

cluster(n_clusters, start, end, flattening_mode='conservative', keep_low_persistence_clusters=False)

Cluster dataset passed at initialization.

n_clusters: int

Integer determining how many clusters the final clustering must have. Note that the final clustering can have fewer clusters if the selected parameters do not allow for so many clusters.

start: (float, float)

Two-element list, tuple, or numpy array representing a point on the positive plane determining the start of the segment in the two-parameter hierarchical clustering used to do persistence-based clustering.

end: (float, float)

Two-element list, tuple, or numpy array representing a point on the positive plane determining the end of the segment in the two-parameter hierarchical clustering used to do persistence-based clustering.

flattening_mode: string, optional, default is “conservative”

If “exhaustive”, flatten the hierarchical clustering using the approach of ‘Persistence-Based Clustering in Riemannian Manifolds’ Chazal, Guibas, Oudot, Skraba. If “conservative”, use the more stable approach of ‘Stable and consistent density-based clustering’ Rolle, Scoccola. The conservative approach usually results in more unclustered points.

keep_low_persistence_clusters: bool, optional, default is False

Only has effect if flattening_mode is set to “exhaustive”. Whether to keep clusters that are born below the persistence threshold associated to the selected n_clusters. If set to True, the number of clusters can be larger than the selected one.

returns:

A numpy array with one entry per point in the dataset, containing integers from -1 to the number of clusters minus 1, which represent the labels of the final clustering. The label -1 marks noise points, i.e., points that the algorithm deems not to belong to any cluster.
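To make the label convention concrete, here is a minimal sketch that splits a label vector into clusters and noise. The labels list is a made-up stand-in for the output of cluster(), and group_by_label is a hypothetical helper, not part of persistable.

```python
from collections import defaultdict

NOISE_LABEL = -1  # label documented for points that belong to no cluster


def group_by_label(labels):
    """Split point indices into clusters and noise, given a label vector."""
    clusters = defaultdict(list)
    noise = []
    for index, label in enumerate(labels):
        if label == NOISE_LABEL:
            noise.append(index)
        else:
            clusters[int(label)].append(index)
    return dict(clusters), noise


# Hypothetical output of cluster() for a dataset of seven points.
labels = [0, 0, 1, -1, 1, 2, -1]
clusters, noise = group_by_label(labels)
```

The same helper works on a numpy array of labels, since it only iterates over the entries and compares them with -1.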

class persistable.PersistableInteractive(persistable)

Graphical user interface for doing parameter selection for Persistable.

persistable: Persistable

Persistable instance to interact with through the user interface.

cluster(flattening_mode='conservative', keep_low_persistence_clusters=False)

Clusters the dataset with which the Persistable instance was initialized.

flattening_mode: string, optional, default is “conservative”

If “exhaustive”, flatten the hierarchical clustering using the approach of ‘Persistence-Based Clustering in Riemannian Manifolds’ Chazal, Guibas, Oudot, Skraba. If “conservative”, use the more stable approach of ‘Stable and consistent density-based clustering’ Rolle, Scoccola. The conservative approach usually results in more unclustered points.

keep_low_persistence_clusters: bool, optional, default is False

Only has effect if flattening_mode is set to “exhaustive”. Whether to keep clusters that are born below the persistence threshold associated to the selected n_clusters. If set to True, the number of clusters can be larger than the selected one.

returns:

A numpy array with one entry per point in the dataset, containing integers from -1 to the number of clusters minus 1, which represent the labels of the final clustering. The label -1 marks noise points, i.e., points that the algorithm deems not to belong to any cluster.

save_ui_state()

Save state of input fields in the UI as a Python object. The output can then be used as the optional input of the start_ui() method.

returns: dictionary
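Since the state is returned as a dictionary, one possible way to reuse it across Python sessions is to write it to disk and read it back before calling start_ui(). The sketch below does this with pickle, under the assumption that the returned dictionary is picklable. The helpers dump_ui_state and load_ui_state, the file name, and the keys of example_state are all hypothetical; the real keys are whatever save_ui_state() returns.

```python
import pickle
import tempfile
from pathlib import Path


def dump_ui_state(ui_state, path):
    """Write the dictionary returned by save_ui_state() to disk."""
    with open(path, "wb") as handle:
        pickle.dump(ui_state, handle)


def load_ui_state(path):
    """Read a previously stored state, ready to pass as ui_state."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Placeholder dictionary standing in for the output of save_ui_state().
example_state = {
    "hypothetical_field": 10,
    "another_hypothetical_field": [0.0, 0.5],
}

# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "ui_state.pkl"
    dump_ui_state(example_state, state_file)
    restored = load_ui_state(state_file)
```

In a real session, the restored dictionary would be passed as start_ui(ui_state=restored).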

start_ui(ui_state=None, port=8050, debug=False, jupyter_mode='external')

Serves the GUI with a given persistable instance.

ui_state: dictionary, optional

The state of a previous UI session, as a Python object, obtained by calling the method save_ui_state().

port: int, optional, default is 8050

Integer representing which port of localhost to try to use to run the GUI. If the port is not available, an available one is searched for, starting from the given one.

debug: bool, optional, default is False

Whether to run Dash in debug mode.

jupyter_mode: string, optional, default is “external”

How to display the application when running inside a Jupyter notebook. Options are “external” to serve the app in a port returned by this function, “inline” to open the app inline in the Jupyter notebook, and “jupyterlab” to open the app in a separate tab in JupyterLab.

returns: int

Returns the port of localhost used to serve the UI.
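The port fallback described for the port parameter can be pictured with the following sketch, which probes ports one by one starting from a given number. This is not the library's implementation, only one common way to do such a search; find_available_port, its max_tries parameter, and the choice of 127.0.0.1 as host are assumptions made for the illustration.

```python
import socket


def find_available_port(start_port, host="127.0.0.1", max_tries=100):
    """Return the first port >= start_port that can be bound on host."""
    last_port = min(start_port + max_tries, 65536)
    for port in range(start_port, last_port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is in use or not permitted; try the next one
            return port
    raise RuntimeError(f"no free port in [{start_port}, {last_port})")


# Start the search at the documented default port.
port = find_available_port(8050)
```

A probe of this kind is inherently racy: another process may take the port between the check and the moment a server binds to it, which is one reason to rely on the port number that start_ui() returns rather than on the one requested.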