GriSPy documentation¶


GriSPy (Grid Search in Python) is a regular grid search algorithm for quick nearest-neighbor lookup.
This class indexes a set of k-dimensional points in a regular grid providing a fast aproach for nearest neighbors queries. Optional periodic boundary conditions can be provided for each axis individually. Additionally GriSPy provides the posibility of working with individual search radius for each query point in fixed-radius searches and minimum and maximum search radius for shell queries.
Repository and Issues¶
Citation¶
If you use GriSPy in a scientific publication, we would appreciate citations to the following paper:
Chalela, M., Sillero, E., Pereyra, L., García, M. A., Cabral, J. B., Lares, M., & Merchán, M. (2019). GriSPy: A Python package for Fixed-Radius Nearest Neighbors Search. arXiv preprint arXiv:1912.09585.
Bibtex
@article{
chalela2019grispy,
title={GriSPy: A Python package for Fixed-Radius Nearest Neighbors Search},
author={
Chalela, Martin and Sillero, Emanuel and Pereyra,
Luis and Garc{\'\i}a, Mario Alejandro and Cabral,
Juan B and Lares, Marcelo and Merch{\'a}n, Manuel},
journal={arXiv preprint arXiv:1912.09585},
year={2019}
}
Full-text: https://ui.adsabs.harvard.edu/abs/2019arXiv191209585C/abstract
Requirements¶
You need Python 3.6 or later to run GriSPy. You can have multiple Python versions (3.x) installed on the same system without problems.
Installation¶
The easiest way to install is using pip:
$ pip install grispy
This will install the latest stable version on PIPy.
If you want to use the latest development version from github, unpack or clone the repo on your local machine, change the directory to where setup.py is, and install using setuptools:
$ python setup.py install
or pip:
$ pip install -e .
Licence¶
MIT License
Copyright (c) 2019 Martin Chalela
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Tutorial¶
Example in 2D Uniform Distribution¶
This example generates a 2D random uniform distribution, and then uses GriSPy to search neighbors within a given radius and/or the n-nearest neighbors
Import GriSPy and others packages¶
[1]:
import numpy as np
import matplotlib.pyplot as plt
from grispy import GriSPy
[2]:
%matplotlib inline
Create random points and centres¶
[3]:
Npoints = 10 ** 3
Ncentres = 2
dim = 2
Lbox = 100.0
np.random.seed(0)
data = np.random.uniform(0, Lbox, size=(Npoints, dim))
centres = np.random.uniform(0, Lbox, size=(Ncentres, dim))
Build the grid with the data¶
[4]:
gsp = GriSPy(data)
Set periodicity. Periodic conditions on x-axis (or axis=0) and y-axis (or axis=1)
[5]:
periodic = {0: (0, Lbox), 1: (0, Lbox)}
gsp.set_periodicity(periodic)
[5]:
GriSPy(N_cells=20, periodic={0: (0, 100.0), 1: (0, 100.0)}, metric='euclid', copy_data=False)
Also you can build a periodic grid in the same step
[6]:
gsp = GriSPy(data, periodic=periodic)
Query for neighbors within upper_radii¶
[7]:
upper_radii = 10.0
bubble_dist, bubble_ind = gsp.bubble_neighbors(
centres, distance_upper_bound=upper_radii
)
Query for neighbors in a shell within lower_radii and upper_radii¶
[8]:
upper_radii = 10.0
lower_radii = 8.0
shell_dist, shell_ind = gsp.shell_neighbors(
centres,
distance_lower_bound=lower_radii,
distance_upper_bound=upper_radii
)
Query for nth nearest neighbors¶
[9]:
n_nearest = 10
near_dist, near_ind = gsp.nearest_neighbors(centres, n=n_nearest)
Plot results¶
[10]:
fig, axes = plt.subplots(1, 3, figsize=(14, 5))
ax = axes[0]
ax.set_title("Bubble query")
ax.scatter(data[:, 0], data[:, 1], c="k", marker=".", s=3)
for ind in bubble_ind:
ax.scatter(data[ind, 0], data[ind, 1], c="C3", marker="o", s=5)
ax.plot(centres[:,0],centres[:,1],'ro',ms=10)
ax = axes[1]
ax.set_title("Shell query")
ax.scatter(data[:, 0], data[:, 1], c="k", marker=".", s=2)
for ind in shell_ind:
ax.scatter(data[ind, 0], data[ind, 1], c="C2", marker="o", s=5)
ax.plot(centres[:,0],centres[:,1],'ro',ms=10)
ax = axes[2]
ax.set_title("n-Nearest query")
ax.scatter(data[:, 0], data[:, 1], c="k", marker=".", s=2)
for ind in near_ind:
ax.scatter(data[ind, 0], data[ind, 1], c="C0", marker="o", s=5)
ax.plot(centres[:,0],centres[:,1],'ro',ms=10)
fig.tight_layout()

Creating your curstom distance function¶
Let’s assume that we intend to compare our distances using levenshtein’s metric for similarity between text (https://en.wikipedia.org/wiki/Levenshtein_distance).
Luckly we have the excellent textdistance
library that implements efficiently this distance.
We can install it with
$ pip install textdistance
and then import it with
[11]:
import textdistance
So to make these custom distance compatible with GriSPy, we must define a function that receives 3 parameters: - c0
the center to which we seek the distance. - centres
the \(C\) centers to which we want to calculate the distance from a c0. - dim
the dimension of each center and c0.
Finally the function must return a np.ndarray
with \(C\) elements where the element \(j-nth\) corresponds to the distance between c0
and centres
\(_j\).
[12]:
def levenshtein(c0, centres, dim):
# textdistance only operates over list and tuples
c0 = tuple(c0)
# creates a empty array with the required
# number of distances
distances = np.empty(len(centres))
for idx, c1 in enumerate(centres):
# textdistance only operates over list and tuples
c1 = tuple(c1)
# calculate the distance
dis = textdistance.levenshtein(c0, c1)
# store the distance
distances[idx] = dis
return distances
Then we create the grid with the custom distance, and run the code
[13]:
gsp = GriSPy(data, metric=levenshtein)
upper_radii = 10.0
lev_dist, lev_ind = gsp.bubble_neighbors(
centres, distance_upper_bound=upper_radii)
Finally we can check our bubble_neighbors
result with a plot
[14]:
fig, axes = plt.subplots(figsize=(6, 6))
ax = axes
ax.set_title("Bubble query with Levenshtein distance")
ax.scatter(data[:, 0], data[:, 1], c="k", marker=".", s=3)
for ind in lev_ind:
ax.scatter(data[ind, 0], data[ind, 1], c="C3", marker="o", s=5)
ax.plot(centres[:,0],centres[:,1],'ro',ms=10)
fig.tight_layout()

API Module¶
Module grispy.core
¶
GriSPy core class.
-
class
grispy.core.
BuildStats
(buildtime, periodicity_set_at, datetime)[source]¶ Bases:
object
Statistics about the grid creation.
-
buildtime
¶ The number of seconds expended in build the grid.
Type: float
-
periodicity_set_at
¶ The date and time when the periodicity was setted.
Type: datetime.datetime
-
datetime
¶ The date and time of build.
Type: datetime.datetime
-
-
class
grispy.core.
PeriodicityConf
(periodic_flag, pd_hi, pd_low, periodic_edges, periodic_direc)[source]¶ Bases:
object
Internal representation of the periodicity of the Grid.
-
class
grispy.core.
GriSPy
(data=None, N_cells=64, periodic=NOTHING, metric='euclid', copy_data=False)[source]¶ Bases:
object
Grid Search in Python.
GriSPy is a regular grid search algorithm for quick nearest-neighbor lookup.
This class indexes a set of k-dimensional points in a regular grid providing a fast aproach for nearest neighbors queries. Optional periodic boundary conditions can be provided for each axis individually.
The algorithm has the following queries implemented: - bubble_neighbors: find neighbors within a given radius. A different radius for each centre can be provided. - shell_neighbors: find neighbors within given lower and upper radius. Different lower and upper radius can be provided for each centre. - nearest_neighbors: find the nth nearest neighbors for each centre.
Other methods: - set_periodicity: set periodicity condition after the grid was built.
To be implemented: - box_neighbors: find neighbors within a k-dimensional squared box of a given size and orientation. - n_jobs: number of cores for parallel computation.
Parameters: - data (ndarray, shape(n,k)) – The n data points of dimension k to be indexed. This array is not copied, and so modifying this data may result in erroneous results. The data can be copied if the grid is built with copy_data=True.
- N_cells (positive int, optional) – The number of cells of each dimension to build the grid. The final grid will have N_cells**k number of cells. Default: 64
- copy_data (bool, optional) – Flag to indicate if the data should be copied in memory. Default: False
- periodic (dict, optional) – Dictionary indicating if the data domain is periodic in some or all its dimensions. The key is an integer that correspond to the number of dimensions in data, going from 0 to k-1. The value is a tuple with the domain limits and the data must be contained within these limits. If an axis is not specified, or if its value is None, it will be considered as non-periodic. Important: The periodicity only works within one periodic range. Default: all axis set to None. Example, periodic = { 0: (0, 360), 1: None}.
- metric (str, optional) – Metric definition to compute distances. Options: ‘euclid’, ‘haversine’ ‘vincenty’ or a custom callable.
-
dim
¶ The dimension of a single data-point.
Type: int
-
grid_
¶ This dictionary contains the data indexed in a grid. The key is a tuple with the k-dimensional index of each grid cell. Empty cells do not have a key. The value is a list of data points indices which are located within the given cell.
Type: dict
-
k_bins_
¶ The limits of the grid cells in each dimension.
Type: ndarray, shape (N_cells+1,k)
-
periodic_flag_
¶ If any dimension has periodicity.
Type: bool
-
periodic_conf_
¶ Statistics and intermediate results to make easy and fast the searchs with periodicity.
Type: grispy.core.PeriodicityConf
-
time_
¶ Object containing the building time and the date of build.
Type: grispy.core.BuildStats
-
periodic_flag_
Proxy to
periodic_conf_.periodic_flag
.
-
set_periodicity
(periodic, inplace=False)[source]¶ Set periodicity conditions.
This allows to define or change the periodicity limits without having to construct the grid again.
Important: The periodicity only works within one periodic range.
Parameters: - periodic (dict, optional) – Dictionary indicating if the data domain is periodic in some or all its dimensions. The key is an integer that corresponds to the number of dimensions in data, going from 0 to k-1. The value is a tuple with the domain limits and the data must be contained within these limits. If an axis is not specified, or if its value is None, it will be considered as non-periodic. Default: all axis set to None. Example, periodic = { 0: (0, 360), 1: None}.
- inplace (boolean, optional (default=False)) – If its True, set the periodicity on the current GriSPy instance and return None. Otherwise a new instance is created and returned.
-
bubble_neighbors
(centres, distance_upper_bound=-1.0, sorted=False, kind='quicksort')[source]¶ Find all points within given distances of each centre.
Parameters: - centres (ndarray, shape (m,k)) – The point or points to search for neighbors of.
- distance_upper_bound (scalar or ndarray of length m) – The radius of points to return. If a scalar is provided, the same distance will apply for every centre. An ndarray with individual distances can also be rovided.
- sorted (bool, optional) – If True the returned neighbors will be ordered by increasing distance to the centre. Default: False.
- kind (str, optional) – When sorted = True, the sorting algorithm can be specified in this keyword. Available algorithms are: [‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’]. Default: ‘quicksort’
- njobs (int, optional) – Number of jobs for parallel computation. Not implemented yet.
Returns: - distances (list, length m) – Returns a list of m arrays. Each array has the distances to the neighbors of that centre.
- indices (list, length m) – Returns a list of m arrays. Each array has the indices to the neighbors of that centre.
-
shell_neighbors
(centres, distance_lower_bound=-1.0, distance_upper_bound=-1.0, sorted=False, kind='quicksort')[source]¶ Find all points within given lower and upper distances of each centre.
Parameters: - centres (ndarray, shape (m,k)) – The point or points to search for neighbors of.
- distance_lower_bound (scalar or ndarray of length m) – The minimum distance of points to return. If a scalar is provided, the same distance will apply for every centre. An ndarray with individual distances can also be rovided.
- distance_upper_bound (scalar or ndarray of length m) – The maximum distance of points to return. If a scalar is provided, the same distance will apply for every centre. An ndarray with individual distances can also be rovided.
- sorted (bool, optional) – If True the returned neighbors will be ordered by increasing distance to the centre. Default: False.
- kind (str, optional) – When sorted = True, the sorting algorithm can be specified in this keyword. Available algorithms are: [‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’]. Default: ‘quicksort’
- njobs (int, optional) – Number of jobs for parallel computation. Not implemented yet.
Returns: - distances (list, length m) – Returns a list of m arrays. Each array has the distances to the neighbors of that centre.
- indices (list, length m) – Returns a list of m arrays. Each array has the indices to the neighbors of that centre.
-
nearest_neighbors
(centres, n=1, kind='quicksort')[source]¶ Find the n nearest-neighbors for each centre.
Parameters: - centres (ndarray, shape (m,k)) – The point or points to search for neighbors of.
- n (int, optional) – The number of neighbors to fetch for each centre. Default: 1.
- kind (str, optional) – The returned neighbors will be ordered by increasing distance to the centre. The sorting algorithm can be specified in this keyword. Available algorithms are: [‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’]. Default: ‘quicksort’
- njobs (int, optional) – Number of jobs for parallel computation. Not implemented yet.
Returns: - distances (list, length m) – Returns a list of m arrays. Each array has the distances to the neighbors of that centre.
- indices (list, length m) – Returns a list of m arrays. Each array has the indices to the neighbors of that centre.
Module grispy.distances
¶
Distances implementations for GriSPy.
-
grispy.distances.
euclid
(c0, centres, dim)[source]¶ Classic Euclidean distance.
In mathematics, the Euclidean distance or Euclidean metric is the “ordinary” straight-line distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space. The associated norm is called the Euclidean norm. Older literature refers to the metric as the Pythagorean metric. A generalized term for the Euclidean norm is the L2 norm or L2 distance. More info: https://en.wikipedia.org/wiki/Euclidean_distance
-
grispy.distances.
haversine
(c0, centres, dim)[source]¶ Distance using the Haversine formulae.
The haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes. Important in navigation, it is a special case of a more general formula in spherical trigonometry, the law of haversines, that relates the sides and angles of spherical triangles. More info: https://en.wikipedia.org/wiki/Haversine_formula
-
grispy.distances.
vincenty
(c0, centres, dim)[source]¶ Calculate distance on the surface of a spheroid with Vincenty Formulae.
Vincenty’s formulae are two related iterative methods used in geodesy to calculate the distance between two points on the surface of a spheroid, developed by Thaddeus Vincenty (1975a). They are based on the assumption that the figure of the Earth is an oblate spheroid, and hence are more accurate than methods that assume a spherical Earth, such as great-circle distance. More info: https://en.wikipedia.org/wiki/Vincenty%27s_formulae
Module grispy.validators
¶
Functions to validate GriSPy input parameters.
-
grispy.validators.
validate_distance_bound
(distance, periodic)[source]¶ Distance bounds, upper and lower, can be scalar or numpy array.