Tutorial¶

Example with a 2D Uniform Distribution¶

This example generates a 2D random uniform distribution. We will use these points to show how you can use Grid and GriSPy to index and query for neighbors.

Keep in mind that the class Grid provides the funcitonality needed to index the points. The class GriSPy inherits from Grid all its functionalities and adds the relevant methods to perform neighbors queries: set_periodicity, bubble_neighbors, shell_neighbors and nearest_neighbors. For completeness, we will show the example using GriSPy.

### Table of Contents

* Create a random distribution of points

* Index the points with Grid/GriSPy

* Creating your curstom distance function

Create a random distribution of points¶

Import GriSPy and others packages¶

[1]:

import numpy as np
import matplotlib.pyplot as plt

import grispy as gsp

Create random points and centres¶

[2]:

Npoints = 10 ** 3
Ncentres = 2
dim = 2
Lbox = 100.0

rng = np.random.default_rng(seed=0)
data = rng.uniform(0, Lbox, size=(Npoints, dim))
centres = rng.uniform(0, Lbox, size=(Ncentres, dim))

Index the points with Grid/GriSPy¶

[3]:

grid = gsp.GriSPy(data, N_cells=32)

You can get some information about the grid using some helpfull methods:

[4]:

grid.shape   # number of cells per dimension

[4]:

(32, 32)

[5]:

grid.edges   # grid edges in each dimension

[5]:

array([[2.97613690e-01, 1.90001607e-02],
       [9.99501352e+01, 9.98067457e+01]])

[6]:

digits = grid.cell_digits(centres)   # check the cell indices (or digits) where a given set of points would fall
digits

[6]:

array([[31,  1],
       [29,  8]], dtype=int16)

This means that from the 2 input points, the first would be located in the cell (31, 1) and the second in (29, 8).

[7]:

grid.cell_count(digits)   # number of points in the array `data` that where indexed at initialization

[7]:

array([2, 0])

[8]:

grid.cell_centre(digits)    # position of each cell centre

[8]:

array([[98.39306458,  4.69655073],
       [92.16478198, 26.52512006]])

The digits of cells can be used in other useful methods like cell_walls or cell_points.

Another important method is contains which shows if a given set of new points are contained by the grid.

[9]:

points = np.array([
    [30., 50.],
    [-10., -10.]
])
grid.contains(points)

[9]:

array([ True, False])

Set periodicity conditions¶

Set periodicity conditions on x-axis (or axis=0) and y-axis (or axis=1)

[10]:

periodic = {0: (0, Lbox), 1: (0, Lbox)}
grid.set_periodicity(periodic, inplace=True)
grid

[10]:

GriSPy(N_cells=32, copy_data=False, periodic={0: (0, 100.0), 1: (0, 100.0)}, metric='euclid')

Also you can build a periodic grid in the same step

[11]:

grid = gsp.GriSPy(data, periodic=periodic)

Important: Periodic boundaries don’t have to be exactly the same as the values of grid.edges. The edges of the grid are computed from real data to optimize the queries on sparse data. So, there is nothing wrong on the edge of the grid being at 99.9 but the periodic boundary set exactly at 100.

Query for neighbors within upper_radii¶

[12]:

upper_radii = 10.0
bubble_dist, bubble_ind = grid.bubble_neighbors(
    centres, distance_upper_bound=upper_radii
)

Query for neighbors in a shell within lower_radii and upper_radii¶

[13]:

upper_radii = 20.0
lower_radii = 10.0
shell_dist, shell_ind = grid.shell_neighbors(
    centres,
    distance_lower_bound=lower_radii,
    distance_upper_bound=upper_radii
)

Query for nth nearest neighbors¶

[14]:

n_nearest = 10
near_dist, near_ind = grid.nearest_neighbors(centres, n=n_nearest)

Plot results¶

[15]:

fig, axes = plt.subplots(1, 3, figsize=(14, 5))

ax = axes[0]
ax.set_title("Bubble query")
ax.scatter(data[:, 0], data[:, 1], c="k", marker=".", s=3)
for ind in bubble_ind:
    ax.scatter(data[ind, 0], data[ind, 1], c="C3", marker="o", s=5)
ax.plot(centres[:,0],centres[:,1],'ro',ms=10)


ax = axes[1]
ax.set_title("Shell query")
ax.scatter(data[:, 0], data[:, 1], c="k", marker=".", s=2)
for ind in shell_ind:
    ax.scatter(data[ind, 0], data[ind, 1], c="C2", marker="o", s=5)
ax.plot(centres[:,0],centres[:,1],'ro',ms=10)

ax = axes[2]
ax.set_title("n-Nearest query")
ax.scatter(data[:, 0], data[:, 1], c="k", marker=".", s=2)
for ind in near_ind:
    ax.scatter(data[ind, 0], data[ind, 1], c="C0", marker="o", s=5)
ax.plot(centres[:,0],centres[:,1],'ro',ms=10)

fig.tight_layout()

Creating your curstom distance function¶

Let’s assume that we intend to compare our distances using levenshtein’s metric for similarity between text (https://en.wikipedia.org/wiki/Levenshtein_distance).

Luckly we have the excellent textdistance library that implements efficiently this distance.

We can install it with

$ pip install textdistance

and then import it with

[16]:

import textdistance

So to make these custom distance compatible with GriSPy, we must define a function that receives 3 parameters: - c0 the center to which we seek the distance. - centres the \(C\) centers to which we want to calculate the distance from a c0. - dim the dimension of each center and c0.

Finally the function must return a np.ndarray with \(C\) elements where the element \(j-nth\) corresponds to the distance between c0 and centres\(_j\).

[17]:

def levenshtein(c0, centres, dim):
    # textdistance only operates over list and tuples
    c0 = tuple(c0)

    # creates a empty array with the required
    # number of distances
    distances = np.empty(len(centres))
    for idx, c1 in enumerate(centres):

        # textdistance only operates over list and tuples
        c1 = tuple(c1)

        # calculate the distance
        dis = textdistance.levenshtein(c0, c1)

        # store the distance
        distances[idx] = dis

    return distances

Then we create the grid with the custom distance, and run the code

[18]:

grid = gsp.GriSPy(data, metric=levenshtein)

upper_radii = 10.0
lev_dist, lev_ind = grid.bubble_neighbors(
    centres, distance_upper_bound=upper_radii)

Finally we can check our bubble_neighbors result with a plot

[19]:

fig, axes = plt.subplots(figsize=(6, 6))

ax = axes
ax.set_title("Bubble query with Levenshtein distance")
ax.scatter(data[:, 0], data[:, 1], c="k", marker=".", s=3)
for ind in lev_ind:
    ax.scatter(data[ind, 0], data[ind, 1], c="C3", marker="o", s=5)
ax.plot(centres[:,0],centres[:,1],'ro',ms=10)

fig.tight_layout()