RDF: Choosing Bin Widths

The freud.density module computes a variety of quantities that describe how particles are distributed in space relative to one another. This example demonstrates the calculation of the radial distribution function \(g(r)\) using different bin widths.

import numpy as np
import freud
import util
import matplotlib.pyplot as plt
# Define some helper plotting functions.
def plot_lattice(box, points, colors=None):
    """Helper function for plotting points on a lattice."""
    fig, ax = plt.subplots(1, 1, figsize=(9, 6))
    box_points = util.box_2d_to_points(box)
    ax.plot(box_points[:, 0], box_points[:, 1], color='k')

    if colors is not None:
        # Color each point by the supplied values and show the color scale.
        p = ax.scatter(points[:, 0], points[:, 1], c=colors, cmap='plasma')
        fig.colorbar(p, ax=ax)
    else:
        ax.scatter(points[:, 0], points[:, 1])
    return fig, ax

def plot_rdf(box, points, prop, rmax=3, drs=(0.15, 0.04, 0.001)):
    """Helper function for plotting RDFs."""
    fig, axes = plt.subplots(1, len(drs), figsize=(16, 3))
    for i, dr in enumerate(drs):
        # Compute one RDF per bin width, all with the same cutoff rmax.
        rdf = freud.density.RDF(rmax, dr)
        rdf.compute(box, points)
        axes[i].plot(rdf.R, getattr(rdf, prop))
        axes[i].set_title("Bin width: {:.3f}".format(dr), fontsize=16)
    return fig, axes

To start, we construct and visualize a set of points sitting on a simple square lattice.

box, points = util.make_square(5, 5)
fig, ax = plot_lattice(box, points)

If we compute the RDF directly from this lattice, the result is rather uninteresting because the system is a perfect crystal. As we bin more and more finely, the computed RDF approaches the true RDF of a perfect crystal, which is a series of delta functions, one at each interparticle distance present in the lattice.

fig, ax = plot_rdf(box, points, 'RDF')

In these RDFs, we see two sharply defined peaks. The first corresponds to the nearest neighbors on the lattice (which are all at a distance 2 from each other), and the second, smaller peak is caused by the particles on the diagonal (which sit at distance \(\sqrt{2^2+2^2} \approx 2.83\)).
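To make the delta-function limit concrete, the following is a minimal NumPy-only sketch of a histogram-based \(g(r)\) for a periodic 2D square lattice. It is not freud's implementation: the helper names `pair_distances` and `rdf_2d` are hypothetical, and the unit lattice spacing is an assumption of this sketch that may differ from what `util.make_square` produces. Because a fixed number of pairs falls into an ever thinner shell, the tallest peak should grow roughly as \(1/dr\) as the bins narrow.

```python
import numpy as np

def pair_distances(points, box_length):
    """Distances between all unique pairs in a periodic square box."""
    delta = points[:, None, :] - points[None, :, :]
    # Minimum-image convention: wrap each separation into [-L/2, L/2].
    delta -= box_length * np.round(delta / box_length)
    i, j = np.triu_indices(len(points), k=1)
    return np.linalg.norm(delta[i, j], axis=-1)

def rdf_2d(points, box_length, r_max, dr):
    """Histogram-based g(r) for a 2D periodic system (needs r_max < L/2)."""
    n = len(points)
    n_bins = int(round(r_max / dr))
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts, _ = np.histogram(pair_distances(points, box_length), bins=edges)
    density = n / box_length**2
    shell_area = np.pi * (edges[1:]**2 - edges[:-1]**2)
    # Each unique pair is a neighbor of both particles, hence the factor of 2.
    g = 2.0 * counts / (n * density * shell_area)
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, g

# Perfect 10 x 10 square lattice with unit spacing (assumed for this sketch).
n_side = 10
grid = np.arange(n_side, dtype=float)
lattice = np.array([(x, y) for x in grid for y in grid])

peak_heights = []
for dr in (0.15, 0.04, 0.001):
    r, g = rdf_2d(lattice, box_length=float(n_side), r_max=3.0, dr=dr)
    peak_heights.append(g.max())
    print("dr = {:.3f}: tallest peak g = {:.1f} at r = {:.3f}".format(
        dr, g.max(), r[np.argmax(g)]))
```

The peak positions stay fixed while their heights keep increasing, which is the histogram's way of approximating a delta function.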

However, in more realistic systems, we expect that the lattice will not be perfectly formed. In this case, the RDF will exhibit more features. To demonstrate this fact, we reconstruct the square lattice of points from above, but we now introduce some noise into the system.

box, points = util.make_square(10, 10, noise=0.15)
fig, ax = plot_lattice(box, box.wrap(points), np.linalg.norm(points-np.round(points), axis=1))
ax.set_title("Colored by distance from lattice sites", fontsize=16);
fig, ax = plot_rdf(box, points, 'RDF')

In this RDF, we see the same rough features as we saw with the perfect lattice. However, the signal is much noisier. As the bins get narrower, each one collects fewer particle pairs, so the curve increasingly reflects sampling noise rather than structure, which is essentially overfitting the data. As a result, we have to be careful with how we choose the bin width when constructing the RDF object.
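One rough way to see why narrow bins look noisy is to count how many pairs land in each bin. The sketch below uses only NumPy and builds its own noisy lattice; the unit spacing and the Gaussian displacement with standard deviation 0.15 are assumptions of this example, not necessarily what `util.make_square` does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy 10 x 10 square lattice in a periodic box (assumed noise model).
n_side, box_length, r_max = 10, 10.0, 3.0
grid = np.arange(n_side, dtype=float)
lattice = np.array([(x, y) for x in grid for y in grid])
noisy = lattice + rng.normal(scale=0.15, size=lattice.shape)

# Unique pair distances under the minimum-image convention.
delta = noisy[:, None, :] - noisy[None, :, :]
delta -= box_length * np.round(delta / box_length)
i, j = np.triu_indices(len(noisy), k=1)
distances = np.linalg.norm(delta[i, j], axis=-1)

# For each bin width, record (mean pairs per bin, fraction of empty bins).
stats = {}
for dr in (0.15, 0.04, 0.001):
    n_bins = int(round(r_max / dr))
    counts, _ = np.histogram(distances, bins=n_bins, range=(0.0, r_max))
    stats[dr] = (counts.mean(), np.mean(counts == 0))
    print("dr = {:.3f}: {:5d} bins, {:7.2f} pairs per bin on average, "
          "{:.0%} empty".format(dr, n_bins, *stats[dr]))
```

With the narrowest bins, the average bin holds less than one pair, so the histogram is mostly isolated spikes. Adding more particles or averaging over more frames would be the usual way to support finer bins.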

An alternative way to avoid this problem is to use the cumulative RDF instead. The relationship between the cumulative RDF and the RDF is akin to that between a cumulative distribution function (CDF) and a probability density function (PDF): it measures the total number of particles encountered up to some distance rather than the value at that distance. Just as a CDF can help avoid certain mistakes common to plotting a PDF, plotting the cumulative RDF may be helpful in some cases. Here, we see that decreasing the bin width alters the features of the plot only in a very minor way (i.e., it makes the line slightly less smooth because of small jitters).

fig, ax = plot_rdf(box, points, 'n_r')
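The robustness of the cumulative RDF can be sketched with plain NumPy. This example assumes that `n_r` is the average number of neighbors within distance \(r\), computed here as a running sum of the pair histogram; the noisy lattice (unit spacing, Gaussian displacement with standard deviation 0.15) is again an assumption of the sketch rather than the output of `util.make_square`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy 10 x 10 square lattice in a periodic box (assumed noise model).
n_side, box_length, r_max = 10, 10.0, 3.0
grid = np.arange(n_side, dtype=float)
lattice = np.array([(x, y) for x in grid for y in grid])
noisy = lattice + rng.normal(scale=0.15, size=lattice.shape)
n_points = len(noisy)

# Unique pair distances under the minimum-image convention.
delta = noisy[:, None, :] - noisy[None, :, :]
delta -= box_length * np.round(delta / box_length)
i, j = np.triu_indices(n_points, k=1)
distances = np.linalg.norm(delta[i, j], axis=-1)

final_values = []
for dr in (0.15, 0.04, 0.001):
    n_bins = int(round(r_max / dr))
    counts, edges = np.histogram(distances, bins=n_bins, range=(0.0, r_max))
    # Running total of neighbors per particle; each unique pair counts twice.
    n_r = 2.0 * np.cumsum(counts) / n_points
    final_values.append(n_r[-1])
    print("dr = {:.3f}: n(r_max) = {:.2f}".format(dr, n_r[-1]))
```

The bin width only changes how finely the curve is sampled: every bin width counts the same pairs by the time it reaches `r_max`, so the end point is identical and the curve never decreases.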