Overview

This document covers the brief for more than just SystemML, but since SystemML is the most basic part of the solution, and since it has to go somewhere, here it is.

Context

Our research group (the ABRG) is a Computational Neuroscience group, so a major facet of our work is the design and analysis of models of neural systems. For some models, off-the-shelf software exists that computes them efficiently; GENESIS, for example, solves compartmental neuron models. For other models, we have to resort to building bespoke software.

Since software design is not an entirely trivial task, we spend a substantial fraction of our time on it. Sometimes we build our bespoke models in Matlab, sometimes in C or C++ (usually for execution within Matlab as mex files), and sometimes in other environments (Python and Webots are recent examples). The choice in each case is influenced by the complexity of the computation that needs to be performed and by the ease and efficiency with which it can be represented in a particular language.

The Problem

Where off-the-shelf software does not exist, and since we are not in the business of software development, we do just enough to get the job done: we build a small piece of code that solves the model and does nothing else. This approach has served satisfactorily in the past, so what is the problem? Recent developments within the group and within the field have changed our environment.

Sharing
The group has recently expanded considerably, and it has become more obvious that at times two or more of us have developed substantially the same piece of software to solve much the same problem. In addition, the models we are building are becoming more complex (recent work in the group has included networks of millions of cortical neurons). This renders the software design task an order of magnitude harder, forcing us to develop in low-level languages, to perform clever memory management, or to design complex, efficient algorithms. "Coding up the model", then, ceases to be a throwaway afternoon of work and becomes instead a matter of a week or more, so the advantage to be gained from sharing is all the greater.
Connecting
We have entered an era of building "whole-brain" and "brain-and-body" models (see the WhiskerBot, REVERB, ICEA and BIOTACT projects, for examples we've been involved in). This requires us to build integrated systems with many and varied components. Whilst some components continue to be neural models, complete brain models and, especially, simulations of interactions with the world require other model types: classical control models of robot behaviour, or physics models of interactions with an environment. These modules must be connectable.
Deployment
We need to compute these integrated system models in diverse environments. At one end of the scale, we want to compute on a cluster with a high node count so that we can get answers in reasonable time. At the other, we want to deploy our models on robots, to satisfy the philosophy of "embodied modelling". We would like to satisfy those two aims with the same software, rather than have to develop separately to suit the two environments.

In summary, we wish to share our work with each other (and outside the group), we want to be able to connect our models into computable integrated systems, and we want to deploy those systems effectively in diverse environments.

Existing solutions

One solution is to use an existing model integration tool: Simulink, for example. Simulink suffers from some drawbacks, however. First, it is proprietary (i.e. not free in either sense). Second, it has a large resource footprint (a Matlab installation), which renders it unusable in constrained environments. Third, it offers no support for parallelisation. In addition, connection to third-party software (e.g. existing solvers like GENESIS) is not straightforward. Other existing solutions similarly lack generality.

Proposal

So we did what any organisation would do in such a situation: we went right ahead and formed a committee. The Integration Working Group (IWG) was tasked with finding solutions to the problems outlined above. Its first finding was that no existing tool solved all these problems in a general and extensible way. The remainder of the IWG's work has been in specifying our own solution.

At first, this solution is intended to benefit our group. However, we have worked hard from the outset to make the solution sufficiently general to solve the same types of problems experienced by other researchers and, crucially, to solve the problems of integration across organisational boundaries. If we're working hard in Sheffield to code a low-level neural model, should they be repeating exactly the same work in Edinburgh? This is not a sensible use of funding. Uptake amongst other groups in the consortia in which we are involved will be a good measure of whether we have succeeded in this wider aim.

Our proposed solution, BRAHMS, was born to solve the "connect" and "deploy" problems within the context of the WhiskerBot project. Almost accidentally, BRAHMS solved the "share" problem on a within-group basis, too. However, our final proposal is substantially more general, and BRAHMS, as an independent part of that proposal, is a much more complete creature than when it began. Our proposal has the following facets:

SystemML
An infrastructure and file format for sharing computable integrated systems. The file format allows the specification of a system built of unspecified processes communicating over unspecified interfaces, and facilitates the reuse of these systems or of any part thereof. The infrastructure facilitates the distribution of the software required to implement those processes and the interfaces between them.
BRAHMS
A SystemML execution client. The client is modular: each process in the system, and each interface in the system, is implemented by a software plug-in module written by a third party. Therefore, BRAHMS can run any SystemML system regardless of what bespoke software has been written to implement one or more of the processes or interfaces. The process plug-ins are distributed through the SystemML infrastructure.
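
The division of labour between the two facets can be illustrated with a minimal sketch in Python. It is not the SystemML file format and not the BRAHMS API: every name in it (Process, Link, System, run_step, the "example/..." plug-in identifiers) is hypothetical, and the synchronous update with a one-step delay on every link is an assumption made only to keep the example short. What it shows is the idea described above: the system description names processes and the links between them without saying how they are computed, and the execution client looks up a separately supplied plug-in for each process.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: this is NOT the SystemML schema or the BRAHMS API.

@dataclass
class Process:
    name: str    # unique name of the process within the system
    plugin: str  # identifier of the plug-in that implements it

@dataclass
class Link:
    src: str     # name of the process producing the value
    dst: str     # name of the process consuming it

@dataclass
class System:
    processes: List[Process] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)

# In this sketch a plug-in is just a function: list of inputs -> one output.
Plugin = Callable[[List[float]], float]

def run_step(system: System, plugins: Dict[str, Plugin],
             state: Dict[str, float]) -> Dict[str, float]:
    """Advance every process one step, reading inputs from the previous state."""
    new_state = {}
    for proc in system.processes:
        # Gather the previous outputs of every process linked into this one.
        inputs = [state[link.src] for link in system.links if link.dst == proc.name]
        # The client knows nothing about the computation; it only dispatches.
        new_state[proc.name] = plugins[proc.plugin](inputs)
    return new_state

# Two toy plug-ins standing in for modules written by third parties.
plugins: Dict[str, Plugin] = {
    "example/source": lambda inputs: 1.0,       # emits a constant
    "example/sum": lambda inputs: sum(inputs),  # adds up its inputs
}

# The system description: a source feeding an accumulator that also feeds itself.
system = System(
    processes=[Process("src", "example/source"), Process("acc", "example/sum")],
    links=[Link("src", "acc"), Link("acc", "acc")],
)

state = {p.name: 0.0 for p in system.processes}
for _ in range(3):
    state = run_step(system, plugins, state)
```

Swapping "example/sum" for a different plug-in changes what the system computes without touching either the description of the system or the client loop, which is the property the modular design is after.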