Design Principles

Vision

The freud library is designed to be a powerful and flexible library for the analysis of simulation output. To support a variety of analysis routines, freud places few restrictions on its components. The primary requirement for an analysis routine in freud is that it should be substantially computationally intensive so as to require coding up in C++: all freud code should be composed of fast C++ routines operating on systems of particles in periodic boxes. To remain easy-to-use, all C++ modules should be wrapped in Python code so they can be easily accessed from Python scripts or through a Python interpreter.

In order to achieve this goal, freud takes the following viewpoints:

  • freud works directly with NumPy <http://www.numpy.org/>_ arrays to retain maximum flexibility. Integrations with other tools should be performed via the common data representations of NumPy arrays.

  • For ease of maintenance, freud uses Git for version control; GitHub for code hosting and issue tracking; and the PEP 8 standard for code, stressing explicitly written code which is easy to read.

  • To ensure correctness, freud employs unit testing using the Python unittest framework. In addition, freud utilizes CircleCI for continuous integration to ensure that all of its code works correctly and that any changes or new features do not break existing functionality.

Language choices

The freud library is written in two languages: Python and C++. C++ allows for powerful, fast code execution while Python allows for easy, flexible use. Intel Threading Building Blocks parallelism provides further power to C++ code. The C++ code is wrapped with Cython, allowing for user interaction in Python. NumPy provides the basic data structures in freud, which are commonly used in other Python plotting libraries and packages.

Unit Tests

All modules should include a set of unit tests which test the correct behavior of the module. These tests should be simple and short, testing a single function each, and completing as quickly as possible (ideally < 10 sec, but times up to a minute are acceptable if justified).

Benchmarks

Modules can be benchmarked in the following way. The following code is an example benchmark for the freud.density.RDF module.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
import numpy as np
import freud
from benchmark import Benchmark
from benchmarker import run_benchmarks


class BenchmarkDensityRDF(Benchmark):
    def __init__(self, r_max, bins, r_min):
        self.r_max = r_max
        self.bins = bins
        self.r_min = r_min

    def bench_setup(self, N):
        self.box_size = self.r_max*3.1
        np.random.seed(0)
        self.points = np.random.random_sample((N, 3)).astype(np.float32) \
            * self.box_size - self.box_size/2
        self.rdf = freud.density.RDF(self.bins, self.r_max, r_min=self.r_min)
        self.box = freud.box.Box.cube(self.box_size)

    def bench_run(self, N):
        self.rdf.compute((self.box, self.points), reset=False)
        self.rdf.compute((self.box, self.points))


def run():
    Ns = [1000, 10000]
    r_max = 10.0
    bins = 10
    r_min = 0
    number = 100
    name = 'freud.density.RDF'
    classobj = BenchmarkDensityRDF

    return run_benchmarks(name, Ns, number, classobj,
                          r_max=r_max, bins=bins, r_min=r_min)


if __name__ == '__main__':
    run()

in a file benchmark_density_RDF.py in the benchmarks directory. More examples can be found in the benchmarks directory. The runtime of BenchmarkDensityRDF.bench_run will be timed for number of times on the input sizes of Ns. Its runtime with respect to the number of threads will also be measured. Benchmarks are run as a part of continuous integration, with performance comparisons between the current commit and the master branch.

Make Execution Explicit

While it is tempting to make your code do things “automatically”, such as have a calculate method find all _calc methods in a class, call them, and add their returns to a dictionary to return to the user, it is preferred in freud to execute code explicitly. This helps avoid issues with debugging and undocumented behavior:

# this is bad
class SomeFreudClass(object):
    def __init__(self, **kwargs):
        for key in kwargs.keys:
            setattr(self, key, kwargs[key])

# this is good
class SomeOtherFreudClass(object):
    def __init__(self, x=None, y=None):
        self.x = x
        self.y = y

Code Duplication

When possible, code should not be duplicated. However, being explicit is more important. In freud this translates to many of the inner loops of functions being very similar:

// somewhere deep in function_a
for (int i = 0; i < n; i++)
    {
    vec3[float] pos_i = position[i];
    for (int j = 0; j < n; j++)
        {
        pos_j = = position[j];
        // more calls here
        }
    }

// somewhere deep in function_b
for (int i = 0; i < n; i++)
    {
    vec3[float] pos_i = position[i];
    for (int j = 0; j < n; j++)
        {
        pos_j = = position[j];
        // more calls here
        }
    }

While it might be possible to figure out a way to create a base C++ class all such classes inherit from, run through positions, call a calculation, and return, this would be rather complicated. Additionally, any changes to the internals of the code, and may result in performance penalties, difficulty in debugging, etc. As before, being explicit is better.

However, if you have a class which has a number of methods, each of which requires the calling of a function, this function should be written as its own method (instead of being copy-pasted into each method) as is typical in object-oriented programming.

Python vs. Cython vs. C++

The freud library is meant to leverage the power of C++ code imbued with parallel processing power from TBB with the ease of writing Python code. The bulk of your calculations should take place in C++, as shown in the snippet below:

# this is bad
def badHeavyLiftingInPython(positions):
    # check that positions are fine
    for i, pos_i in enumerate(positions):
        for j, pos_j in enumerate(positions):
            if i != j:
                r_ij = pos_j - pos_i
                # ...
                computed_array[i] += some_val
    return computed_array

# this is good
def goodHeavyLiftingInCPlusPlus(positions):
    # check that positions are fine
    cplusplus_heavy_function(computed_array, positions, len(pos))
    return computed_array

In the C++ code, implement the heavy lifting function called above from Python:

void cplusplus_heavy_function(float* computed_array,
                              float* positions,
                              int n)
    {
    for (int i = 0; i < n; i++)
        {
        for (int j = 0; j < n; j++)
            {
            if (i != j)
                {
                r_ij = pos_j - pos_i;
                // ...
                computed_array[i] += some_val;
                }
            }
        }
    }

Some functions may be necessary to write at the Python level due to a Python library not having an equivalent C++ library, complexity of coding, etc. In this case, the code should be written in Cython and a reasonable attempt to optimize the code should be made.