API Reference

Index

Classes

persistable.Persistable

Density-based clustering on finite metric spaces.

persistable.PersistableInteractive

Graphical user interface for doing parameter selection for Persistable.

persistable.FilteredGraph

Implements a one-parameter filtered graph.

persistable.Persistable methods

persistable.Persistable.cluster

Cluster dataset passed at initialization.

persistable.PersistableInteractive methods

persistable.PersistableInteractive.start_ui

Serves the GUI with a given persistable instance.

persistable.PersistableInteractive.cluster

Clusters the dataset with which the Persistable instance was initialized.

persistable.PersistableInteractive.save_ui_state

Save state of input fields in the UI as a Python object.

persistable.FilteredGraph methods

persistable.FilteredGraph.persistence_diagram

Compute the persistence diagram of the filtered graph.

persistable.FilteredGraph.prominence_diagram

Compute the prominence diagram of the filtered graph.

persistable.FilteredGraph.persistence_based_flattening

Compute the persistence-based flattening of the filtered graph.

Details

class persistable.Persistable(X, metric='minkowski', measure=None, subsample=None, n_neighbors='auto', debug=False, threading=False, n_jobs=4, **kwargs)

Density-based clustering on finite metric spaces.

X: ndarray (n_samples, n_features)

A numpy array of shape (n_samples, n_features), or a distance matrix.

metric: string, optional, default is “minkowski”

A string determining which metric is used to compute distances between the points in X. It can be any metric in KDTree.valid_metrics or BallTree.valid_metrics (both classes can be imported with from sklearn.neighbors import KDTree, BallTree), or "precomputed" if X is a distance matrix.

measure: None or ndarray(n_samples), default is None

A numpy vector of length n_samples of non-negative numbers, which is interpreted as a measure on the data points. If None, the uniform measure, where each point has weight 1/n_samples, is used. If the measure does not sum to 1, it is normalized.

subsample: None or int, optional, default is None

Number of data points to subsample. The subsample is chosen so that its measure approximates the original measure on the full dataset as well as possible, in the Prokhorov sense. If metric is "minkowski" and the dimensionality is not too big, computing the subsample takes time O( log(size_subsample) * size_data ); otherwise it takes time O( size_subsample * size_data ).

n_neighbors: int or string, optional, default is “auto”

Number of neighbors of each point in X used to initialize the data structures used for clustering. If set to "all", the number of points in the dataset is used; if set to "auto", a reasonable default is chosen.

debug: bool, optional, default is False

Whether to print debug messages.

threading: bool, optional, default is False

Whether to use Python threads for parallel computation with joblib. If False, the loky backend is used. Threads are significantly slower than loky because of the GIL, but loky does not work well on some systems.

n_jobs: int, default is 4

Number of processes or threads to use to fit the data structures, for example to compute the nearest neighbors of all points in the dataset.

**kwargs:

Passed to KDTree or BallTree.
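The handling of the measure parameter described above can be pictured with a small, pure-Python sketch. The helper normalize_measure is hypothetical and is not part of persistable; it only mirrors the documented behavior (uniform weights when measure is None, rescaling when the weights do not sum to 1).

```python
def normalize_measure(measure=None, n_samples=None):
    """Illustrative helper (not part of persistable): return weights summing to 1."""
    if measure is None:
        # Uniform measure: every point gets weight 1 / n_samples.
        if n_samples is None or n_samples <= 0:
            raise ValueError("n_samples must be positive when measure is None")
        return [1.0 / n_samples] * n_samples
    weights = [float(w) for w in measure]
    if any(w < 0 for w in weights):
        raise ValueError("measure must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("measure must have positive total mass")
    # Rescale so that the weights sum to 1.
    return [w / total for w in weights]


uniform = normalize_measure(n_samples=4)
weighted = normalize_measure([2, 1, 1])
print(uniform)   # [0.25, 0.25, 0.25, 0.25]
print(weighted)  # [0.5, 0.25, 0.25]
```

How the library treats edge cases such as an all-zero measure is not stated here; raising an error in that case is an assumption of this sketch.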

cluster(n_clusters, start, end, flattening_mode='conservative', keep_low_persistence_clusters=False)

Cluster dataset passed at initialization.

n_clusters: int

Integer determining how many clusters the final clustering must have. Note that the final clustering can have fewer clusters if the selected parameters do not allow for so many clusters.

start: (float, float)

Two-element list, tuple, or numpy array representing a point on the positive plane determining the start of the segment in the two-parameter hierarchical clustering used to do persistence-based clustering.

end: (float, float)

Two-element list, tuple, or numpy array representing a point on the positive plane determining the end of the segment in the two-parameter hierarchical clustering used to do persistence-based clustering.

flattening_mode: string, optional, default is “conservative”

If “exhaustive”, flatten the hierarchical clustering using the approach of ‘Persistence-Based Clustering in Riemannian Manifolds’ Chazal, Guibas, Oudot, Skraba. If “conservative”, use the more stable approach of ‘Stable and consistent density-based clustering’ Rolle, Scoccola. The conservative approach usually results in more unclustered points.

keep_low_persistence_clusters: bool, optional, default is False

Only has effect if flattening_mode is set to “exhaustive”. Whether to keep clusters that are born below the persistence threshold associated to the selected n_clusters. If set to True, the number of clusters can be larger than the selected one.

returns:

A numpy array of length the number of points in the dataset containing integers from -1 to the number of clusters minus 1, representing the labels of the final clustering. The label -1 represents noise points, i.e., points deemed not to belong to any cluster by the algorithm.
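As an illustration of how the returned labels might be read, the following sketch summarizes a label sequence in the format described above. The labels list is made up for the example, and summarize_labels is a hypothetical helper, not a persistable function.

```python
from collections import Counter


def summarize_labels(labels):
    """Count clusters and noise points in a label sequence (illustrative helper)."""
    counts = Counter(int(label) for label in labels)
    # The label -1 marks noise points, so it is not counted as a cluster.
    n_noise = counts.pop(-1, 0)
    return {
        "n_clusters": len(counts),
        "n_noise": n_noise,
        "cluster_sizes": dict(sorted(counts.items())),
    }


# Made-up labels in the documented format: integers from -1 to n_clusters - 1.
labels = [0, 0, 1, -1, 1, 1, -1, 2]
summary = summarize_labels(labels)
print(summary)
```

Because the helper only iterates over the labels, it should work equally well with a numpy array of integers.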

class persistable.PersistableInteractive(persistable)

Graphical user interface for doing parameter selection for Persistable.

persistable: Persistable

Persistable instance to interact with through the user interface.

cluster(flattening_mode='conservative', keep_low_persistence_clusters=False)

Clusters the dataset with which the Persistable instance was initialized.

flattening_mode: string, optional, default is “conservative”

If “exhaustive”, flatten the hierarchical clustering using the approach of ‘Persistence-Based Clustering in Riemannian Manifolds’ Chazal, Guibas, Oudot, Skraba. If “conservative”, use the more stable approach of ‘Stable and consistent density-based clustering’ Rolle, Scoccola. The conservative approach usually results in more unclustered points.

keep_low_persistence_clusters: bool, optional, default is False

Only has effect if flattening_mode is set to “exhaustive”. Whether to keep clusters that are born below the persistence threshold associated to the selected n_clusters. If set to True, the number of clusters can be larger than the selected one.

returns:

A numpy array of length the number of points in the dataset containing integers from -1 to the number of clusters minus 1, representing the labels of the final clustering. The label -1 represents noise points, i.e., points deemed not to belong to any cluster by the algorithm.

save_ui_state()

Save state of input fields in the UI as a Python object. The output can then be used as the optional input of the start_ui() method.

returns: dictionary

start_ui(ui_state=None, port=8050, debug=False, jupyter_mode='external')

Serves the GUI with a given persistable instance.

ui_state: dictionary, optional

The state of a previous UI session, as a Python object, obtained by calling the method save_ui_state().

port: int, optional, default is 8050

Integer representing which port of localhost to try to use to run the GUI. If the port is not available, the first available port is used, searching upward from the given one.

debug: bool, optional, default is False

Whether to run Dash in debug mode.

jupyter_mode: string, optional, default is “external”

How to display the application when running inside a Jupyter notebook. Options are “external” to serve the app on a port returned by this function, “inline” to open the app inline in the Jupyter notebook, and “jupyterlab” to open the app in a separate tab in JupyterLab.

returns: int

Returns the port of localhost used to serve the UI.
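The port fallback described for the port parameter can be sketched with the standard socket module. This is only an assumption about how such a search could work, not the library's implementation; find_available_port and its max_tries argument are hypothetical.

```python
import socket


def find_available_port(start_port, host="127.0.0.1", max_tries=100):
    """Return the first port >= start_port that can be bound on host (illustrative)."""
    last_port = min(start_port + max_tries, 65536)
    for port in range(start_port, last_port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                # Binding fails with OSError if the port is already in use.
                sock.bind((host, port))
            except OSError:
                continue
            return port
    raise RuntimeError(f"no available port found starting from {start_port}")


port = find_available_port(8050)
print(port)
```

A port found this way can still be taken by another process before a server binds it, so the result is best treated as a hint rather than a guarantee.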

class persistable.FilteredGraph(vertex_values, edges, edge_values, start=-inf, end=inf)

Implements a one-parameter filtered graph. The vertices and edges of the graph should have scalar filtration values such that, if (i,j) is an edge, then the filtration values of i and j are both less than or equal to the filtration value of (i,j).

vertex_values: ndarray (num_vertices)

A numpy vector containing the filtration values of the vertices of the graph. Implicitly, the vertices are 0, ..., num_vertices - 1.

edges: ndarray (num_edges, 2)

A numpy array containing the edges of the graph. A row edges[i,:] = (j,k) indicates (j,k) is an edge.

edge_values: ndarray (num_edges)

A numpy vector containing the filtration values of the graph edges. An entry edge_values[i] = x indicates that the edge edges[i,:] = (j,k) has filtration value x.

start: float, optional

The filtration value where the filtration begins.

end: float, optional

The filtration value where the filtration ends.
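The compatibility condition between vertex and edge filtration values can be checked before constructing a FilteredGraph. The sketch below uses plain Python sequences in place of numpy arrays; is_valid_filtration is a hypothetical helper, not part of persistable.

```python
def is_valid_filtration(vertex_values, edges, edge_values):
    """Check that each edge appears no earlier than both of its endpoints."""
    if len(edges) != len(edge_values):
        raise ValueError("edges and edge_values must have the same length")
    n_vertices = len(vertex_values)
    for (j, k), value in zip(edges, edge_values):
        # Vertices are implicitly 0, ..., num_vertices - 1.
        if not (0 <= j < n_vertices and 0 <= k < n_vertices):
            return False
        # The edge value must be at least the value of each endpoint.
        if vertex_values[j] > value or vertex_values[k] > value:
            return False
    return True


vertex_values = [0.0, 1.0, 2.0]
edges = [(0, 1), (1, 2)]
valid = is_valid_filtration(vertex_values, edges, [1.0, 2.5])
invalid = is_valid_filtration(vertex_values, edges, [0.5, 2.5])
print(valid, invalid)  # True False
```

In the second call, edge (0, 1) has value 0.5 while vertex 1 has value 1.0, so the condition fails.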

persistence_based_flattening(n_clusters, flattening_mode='conservative', keep_low_persistence_clusters=False)

Compute the persistence-based flattening of the filtered graph.

n_clusters: int

The desired number of clusters in the output.

flattening_mode: string, optional, default is “conservative”

Use “conservative” for the persistence-based flattening algorithm as described in Rolle and Scoccola, “Stable and consistent density-based clustering via multiparameter persistence”. Use “exhaustive” for the exhaustive persistence-based flattening algorithm from the same paper.

keep_low_persistence_clusters: boolean, optional, default is False

persistence_diagram()

Compute the persistence diagram of the filtered graph.

prominence_diagram()

Compute the prominence diagram of the filtered graph.
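For intuition, the degree-zero persistence diagram of a filtered graph can be computed with a union-find structure and the elder rule: when an edge merges two connected components, the one born later dies. The sketch below is a minimal pure-Python illustration under stated assumptions, not the library's implementation: it ignores the start and end parameters, drops zero-length bars, gives surviving components an infinite death value, and takes prominence to mean death minus birth. The function names are hypothetical.

```python
import math


def zeroth_persistence_diagram(vertex_values, edges, edge_values):
    """Return sorted (birth, death) pairs of connected components (illustrative)."""
    n_vertices = len(vertex_values)
    parent = list(range(n_vertices))
    birth = list(vertex_values)  # birth[r] is meaningful when r is a root

    def find(v):
        # Find the root of v, shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    pairs = []
    # Process edges in order of increasing filtration value.
    for value, (j, k) in sorted(zip(edge_values, edges), key=lambda item: item[0]):
        older, younger = find(j), find(k)
        if older == younger:
            continue  # the edge closes a cycle; no components merge
        if birth[older] > birth[younger]:
            older, younger = younger, older
        # Elder rule: the younger component dies when the edge appears.
        if value > birth[younger]:
            pairs.append((birth[younger], value))
        parent[younger] = older
    # Components that never merge are given an infinite death value.
    roots = {find(v) for v in range(n_vertices)}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)


def prominences(diagram):
    """Prominence of each pair, taken here as death minus birth."""
    return [death - birth_value for birth_value, death in diagram]


vertex_values = [0.0, 1.0, 2.0, 0.5]
edges = [(0, 1), (1, 2), (2, 3)]
edge_values = [3.0, 2.0, 4.0]
diagram = zeroth_persistence_diagram(vertex_values, edges, edge_values)
print(diagram)               # [(0.0, inf), (0.5, 4.0), (1.0, 3.0)]
print(prominences(diagram))  # [inf, 3.5, 2.0]
```

In this example, vertex 2 is born at 2.0 and merges at 2.0, so it contributes a zero-length bar that the sketch omits.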