API Reference
Index
Classes
- persistable.Persistable: Density-based clustering on finite metric spaces.
- persistable.PersistableInteractive: Graphical user interface for doing parameter selection for Persistable.
- persistable.FilteredGraph: Implements a one-parameter filtered graph.
persistable.Persistable methods
- cluster(): Cluster the dataset passed at initialization.
persistable.PersistableInteractive methods
- start_ui(): Serves the GUI with a given persistable instance.
- cluster(): Clusters the dataset with which the Persistable instance was initialized.
- save_ui_state(): Save state of input fields in the UI as a Python object.
persistable.FilteredGraph methods
- persistence_diagram(): Compute the persistence diagram of the filtered graph.
- prominence_diagram(): Compute the prominence diagram of the filtered graph.
- persistence_based_flattening(): Compute the persistence-based flattening of the filtered graph.
Details
- class persistable.Persistable(X, metric='minkowski', measure=None, subsample=None, n_neighbors='auto', debug=False, threading=False, n_jobs=4, **kwargs)
Density-based clustering on finite metric spaces.
- X: ndarray (n_samples, n_features)
A numpy array of shape (n_samples, n_features), or a distance matrix if metric is "precomputed".
- metric: string, optional, default is “minkowski”
A string determining which metric is used to compute distances between the points in X. It can be a metric in KDTree.valid_metrics or BallTree.valid_metrics (available via from sklearn.neighbors import KDTree, BallTree) or "precomputed" if X is a distance matrix.
- measure: None or ndarray(n_samples), default is None
A numpy vector of length n_samples of non-negative numbers, which is interpreted as a measure on the data points. If None, the uniform measure, in which each point has weight 1/n_samples, is used. If the measure does not sum to 1, it is normalized.
- subsample: None or int, optional, default is None
Number of datapoints to subsample. The subsample is chosen so that its measure approximates the original measure on the full dataset as closely as possible, in the Prokhorov sense. If metric is minkowski and the dimensionality is not too big, computing the subsample takes time O( log(size_subsample) * size_data ); otherwise it takes time O( size_subsample * size_data ).
- n_neighbors: int or string, optional, default is “auto”
Number of neighbors for each point in X used to initialize the data structures used for clustering. If set to "all", it will use the number of points in the dataset; if set to "auto", it will find a reasonable default.
- debug: bool, optional, default is False
Whether to print debug messages.
- threading: bool, optional, default is False
Whether to use Python threads for parallel computation with joblib. If False, the loky backend is used. Using threads is significantly slower because of the GIL, but the loky backend does not work well on some systems.
- n_jobs: int, default is 4
Number of processes or threads to use to fit the data structures, for example to compute the nearest neighbors of all points in the dataset.
- **kwargs:
Passed to KDTree or BallTree.
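As a rough illustration of the measure parameter described above, the following plain-Python sketch builds the uniform measure when none is given and normalizes a weight vector otherwise. The helper name normalize_measure is hypothetical and is not part of persistable; the library's own handling may differ in detail.

```python
def normalize_measure(measure, n_samples):
    """Return weights summing to 1, mirroring the documented behaviour.

    Hypothetical helper for illustration only; not part of persistable.
    """
    if measure is None:
        # Uniform measure: every point has weight 1 / n_samples.
        return [1.0 / n_samples] * n_samples
    if len(measure) != n_samples:
        raise ValueError("measure must have one weight per sample")
    if any(w < 0 for w in measure):
        raise ValueError("weights must be non-negative")
    total = sum(measure)
    if total <= 0:
        raise ValueError("weights must not all be zero")
    # Rescale so that the weights sum to 1.
    return [w / total for w in measure]


uniform = normalize_measure(None, 4)
weighted = normalize_measure([1, 1, 2], 3)
print(uniform)   # [0.25, 0.25, 0.25, 0.25]
print(weighted)  # [0.25, 0.25, 0.5]
```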
- cluster(n_clusters, start, end, flattening_mode='conservative', keep_low_persistence_clusters=False)
Cluster dataset passed at initialization.
- n_clusters: int
Integer determining how many clusters the final clustering must have. Note that the final clustering can have fewer clusters if the selected parameters do not allow for that many clusters.
- start: (float, float)
Two-element list, tuple, or numpy array representing a point on the positive plane determining the start of the segment in the two-parameter hierarchical clustering used to do persistence-based clustering.
- end: (float, float)
Two-element list, tuple, or numpy array representing a point on the positive plane determining the end of the segment in the two-parameter hierarchical clustering used to do persistence-based clustering.
- flattening_mode: string, optional, default is “conservative”
If “exhaustive”, flatten the hierarchical clustering using the approach of ‘Persistence-Based Clustering in Riemannian Manifolds’ Chazal, Guibas, Oudot, Skraba. If “conservative”, use the more stable approach of ‘Stable and consistent density-based clustering’ Rolle, Scoccola. The conservative approach usually results in more unclustered points.
- keep_low_persistence_clusters: bool, optional, default is False
Only has effect if flattening_mode is set to “exhaustive”. Whether to keep clusters that are born below the persistence threshold associated with the selected n_clusters. If set to True, the number of clusters can be larger than the selected one.
- returns:
A numpy array of length the number of points in the dataset containing integers from -1 to the number of clusters minus 1, representing the labels of the final clustering. The label -1 represents noise points, i.e., points deemed not to belong to any cluster by the algorithm.
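To make the label convention concrete, here is a small sketch that summarizes a label vector of the kind described above. The labels are made up for illustration and were not produced by running cluster(); summarize_labels is a hypothetical helper.

```python
from collections import Counter

# Hypothetical output of cluster() for a dataset with 8 points.
labels = [0, 0, 1, -1, 1, 1, -1, 0]


def summarize_labels(labels):
    """Return (number of clusters, cluster sizes, number of noise points)."""
    counts = Counter(labels)
    # The label -1 marks noise points, so it is not counted as a cluster.
    n_noise = counts.pop(-1, 0)
    return len(counts), dict(counts), n_noise


n_found, sizes, n_noise = summarize_labels(labels)
print(n_found, sizes, n_noise)  # 2 {0: 3, 1: 3} 2
```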
- class persistable.PersistableInteractive(persistable)
Graphical user interface for doing parameter selection for Persistable.
- persistable: Persistable
Persistable instance to interact with through the user interface.
- cluster(flattening_mode='conservative', keep_low_persistence_clusters=False)
Clusters the dataset with which the Persistable instance was initialized.
- flattening_mode: string, optional, default is “conservative”
If “exhaustive”, flatten the hierarchical clustering using the approach of ‘Persistence-Based Clustering in Riemannian Manifolds’ Chazal, Guibas, Oudot, Skraba. If “conservative”, use the more stable approach of ‘Stable and consistent density-based clustering’ Rolle, Scoccola. The conservative approach usually results in more unclustered points.
- keep_low_persistence_clusters: bool, optional, default is False
Only has effect if flattening_mode is set to “exhaustive”. Whether to keep clusters that are born below the persistence threshold associated with the selected n_clusters. If set to True, the number of clusters can be larger than the selected one.
- returns:
A numpy array of length the number of points in the dataset containing integers from -1 to the number of clusters minus 1, representing the labels of the final clustering. The label -1 represents noise points, i.e., points deemed not to belong to any cluster by the algorithm.
- save_ui_state()
Save state of input fields in the UI as a Python object. The output can then be used as the optional input of the start_ui() method.
- returns: dictionary
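Since the saved state is a dictionary, one way to reuse it in a later session is to serialize it to disk. The sketch below uses pickle from the standard library; the dictionary is a placeholder, as the real keys and values are determined by the library, and using pickle here is an assumption rather than a documented recommendation.

```python
import os
import pickle
import tempfile

# Placeholder standing in for the dictionary returned by save_ui_state();
# the real keys and values are determined by the library.
ui_state = {"placeholder_field": 1.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "ui_state.pkl")

    # Write the state to disk at the end of a session.
    with open(path, "wb") as f:
        pickle.dump(ui_state, f)

    # Read it back at the start of the next session.
    with open(path, "rb") as f:
        restored_state = pickle.load(f)

# restored_state could then be passed as the ui_state argument of start_ui().
```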
- start_ui(ui_state=None, port=8050, debug=False, jupyter_mode='external')
Serves the GUI with a given persistable instance.
- ui_state: dictionary, optional
The state of a previous UI session, as a Python object, obtained by calling the method save_ui_state().
- port: int, optional, default is 8050
Integer representing which port of localhost to try to use to run the GUI. If the port is not available, we look for one that is available, starting from the given one.
- debug: bool, optional, default is False
Whether to run Dash in debug mode.
- jupyter_mode: string, optional, default is “external”
How to display the application when running inside a Jupyter notebook. Options are “external” to serve the app on the port returned by this function, “inline” to open the app inline in the Jupyter notebook, and “jupyterlab” to open the app in a separate tab in JupyterLab.
- returns: int
Returns the port of localhost used to serve the UI.
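The port fallback described above can be sketched with the standard library as follows. This is only an illustration of the idea of scanning upward from the requested port; find_available_port and max_tries are hypothetical names, and the library's actual search may be implemented differently.

```python
import socket


def find_available_port(start_port, max_tries=100):
    """Return the first port >= start_port that can be bound on localhost."""
    last_port = min(start_port + max_tries, 65536)
    for port in range(start_port, last_port):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is taken, try the next one
            return port
    raise RuntimeError("no available port found")


port = find_available_port(8050)
print(port)
```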
- class persistable.FilteredGraph(vertex_values, edges, edge_values, start=-inf, end=inf)
Implements a one-parameter filtered graph. The vertices and edges of the graph should have scalar filtration values such that, if (i,j) is an edge, then the filtration values of i and j are less than or equal to the filtration value of (i,j).
- vertex_values: ndarray (num_vertices)
A numpy vector containing the filtration values of the vertices of the graph. Implicitly, the vertices are 0, ..., num_vertices - 1.
- edges: ndarray (num_edges, 2)
A numpy array containing the edges of the graph. A row edges[i,:] = (j,k) indicates that (j,k) is an edge.
- edge_values: ndarray (num_edges)
A numpy vector containing the filtration values of the graph edges. An entry edge_values[i] = x indicates that the edge edges[i,:] = (j,k) has filtration value x.
- start: float, optional
The filtration value where the filtration begins.
- end: float, optional
The filtration value where the filtration ends.
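The compatibility condition between vertex and edge values can be checked before constructing a FilteredGraph. The sketch below uses plain Python lists in place of numpy arrays; is_valid_filtration is a hypothetical helper, not part of persistable, and the example values are made up.

```python
def is_valid_filtration(vertex_values, edges, edge_values):
    """Check that no edge appears before either of its endpoints."""
    if len(edges) != len(edge_values):
        return False
    for (j, k), value in zip(edges, edge_values):
        # Both endpoints must already be present when the edge appears.
        if vertex_values[j] > value or vertex_values[k] > value:
            return False
    return True


# A path graph 0 - 1 - 2 with made-up filtration values.
vertex_values = [0.0, 1.0, 0.5]
edges = [(0, 1), (1, 2)]
edge_values = [2.0, 1.0]

print(is_valid_filtration(vertex_values, edges, edge_values))  # True
# Edge (1, 2) would appear at 0.75, before vertex 1 appears at 1.0.
print(is_valid_filtration(vertex_values, edges, [2.0, 0.75]))  # False
```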
- persistence_based_flattening(n_clusters, flattening_mode='conservative', keep_low_persistence_clusters=False)
Compute the persistence-based flattening of the filtered graph.
- n_clusters: int
The desired number of clusters in the output.
- flattening_mode: string, optional, default is “conservative”
Use “conservative” for the persistence-based flattening algorithm as described in Rolle and Scoccola, “Stable and consistent density-based clustering via multiparameter persistence”. Use “exhaustive” for the exhaustive persistence-based flattening algorithm from the same paper.
- keep_low_persistence_clusters: boolean, optional, default is False
- persistence_diagram()
Compute the persistence diagram of the filtered graph.
- prominence_diagram()
Compute the prominence diagram of the filtered graph.
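For intuition about what these diagrams record, the following sketch computes the births and deaths of connected components of a small filtered graph using a union-find structure and the elder rule. It is a textbook-style illustration and not persistable's implementation. It ignores the start and end parameters, drops components that die at the moment they are born, and assumes that the prominence of a component is its death minus its birth.

```python
import math


def persistence_pairs(vertex_values, edges, edge_values):
    """Return sorted (birth, death) pairs of connected components."""
    parent = list(range(len(vertex_values)))
    birth = list(vertex_values)  # birth[r] is meaningful when r is a root

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    pairs = []
    # Process edges in order of appearance in the filtration.
    for value, (j, k) in sorted(zip(edge_values, edges)):
        rj, rk = find(j), find(k)
        if rj == rk:
            continue  # the edge closes a cycle, no component dies
        # Elder rule: the component born later dies at this edge.
        if birth[rj] > birth[rk]:
            rj, rk = rk, rj
        if value > birth[rk]:
            pairs.append((birth[rk], value))
        parent[rk] = rj
    # Components that are never merged into an older one never die.
    roots = {find(v) for v in range(len(vertex_values))}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)


# A path graph 0 - 1 - 2 with made-up filtration values.
pairs = persistence_pairs([0.0, 1.0, 0.5], [(0, 1), (1, 2)], [2.0, 1.0])
prominences = sorted((death - birth for birth, death in pairs), reverse=True)
print(pairs)        # [(0.0, inf), (0.5, 2.0)]
print(prominences)  # [inf, 1.5]
```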