Overview

SystemML Systems are hierarchical: a Subsystem may be connected into a larger System just as a Process would be. Any System is complete and independent, so it remains computable if removed from its parent System (provided that any required inputs are supplied). We therefore refer simply to Systems, with the understanding that any of them may be nested in a parent System.

Systems and Processes offer the same interface to their parent (that is, Systems and Processes derive from a more abstract concept, which we call Computable). A Computable exposes to its parent two interfaces, input and output, each interface consisting of zero or more Sets, each Set consisting of zero or more Ports.
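
The containment described above (Computable, two interfaces, Sets, Ports) can be sketched as a small data model. This is only an illustration: the class and method names (PortSet, Interface, add_port, add) are hypothetical and are not part of SystemML, and the use of the empty string as the name of the Default Set anticipates the Identifiers section.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Port:
    name: str


@dataclass
class PortSet:
    name: str  # "" stands for the Default Set in this sketch
    ports: Dict[str, Port] = field(default_factory=dict)


@dataclass
class Interface:
    # Zero or more Sets, each holding zero or more Ports.
    sets: Dict[str, PortSet] = field(default_factory=dict)

    def add_port(self, port_name: str, set_name: str = "") -> Port:
        # Create the Set on first use, then register the Port in it.
        port_set = self.sets.setdefault(set_name, PortSet(set_name))
        return port_set.ports.setdefault(port_name, Port(port_name))


@dataclass
class Computable:
    # What a parent sees: a name plus an input and an output interface.
    name: str
    inputs: Interface = field(default_factory=Interface)
    outputs: Interface = field(default_factory=Interface)


@dataclass
class Process(Computable):
    pass


@dataclass
class System(Computable):
    # A System additionally contains other Computables.
    children: Dict[str, Computable] = field(default_factory=dict)

    def add(self, child: Computable) -> Computable:
        self.children[child.name] = child
        return child


root = System("Root")
sub = root.add(System("SubSystem"))
proc = sub.add(Process("Process"))
proc.outputs.add_port("OutputPort")
```

Because System and Process share the Computable base, a parent can hold either kind of child without distinguishing between them.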

Key to manipulating these Systems is the concept of the SystemML Identifier, which is the specification for the string that uniquely specifies a part of such a System.

Sets

A central task of SystemML is to link Process output Ports with Process input Ports. Inputs can be presented to a Process in a single unordered Set. In this case, the destination Process must work out for itself what each input represents. The input type may aid in this, as may the input name, as may the Process parameters. However, there are cases where the name is not useful, cases where semantically different classes of input are offered with the same type, and cases where we just don't wish to parametrize such information into the Process (it may be an off-the-shelf model, for instance). Without additional information, these different classes of input are therefore indistinguishable at the destination Process. Setting aside iffy practices such as relying on order of presentation, SystemML uses the concept of "Sets": groups of input (and output) Ports that add semantics to the otherwise unordered Ports.

By analogy, consider a physical audio mixer that takes zero or more of each of three classes of input: "inputs to route to both channels", "inputs to route to left channel", "inputs to route to right channel" (B, L, R). Given that audio signals all have the same type & structure, and that order is no use in this case because of the unknown number of members of each class, and that naming conventions may not be agreed upon between source and destination process (alright so you can't pass names down audio cables; no metaphor holds up to close examination), how is the audio mixer to know how to handle the different inputs? The answer is that the physical audio mixer defines input "Sets", usually groups of sockets delineated with coloured lines, which will be treated internally with identical semantics. It is then the Link itself (provided by the user plugging cables between source and destination) that specifies how a particular input is treated, by routing it to the correct input Set, B, L or R.

SystemML adopts the same convention, allowing individual Processes to define these delineated Sets on their input interfaces. The input Set can then be named in a Link, and the Process thus gets semantic information about the input. This is, admittedly, one-dimensional, but we feel that is probably enough for almost all cases, without imposing too much complexity.

Output Sets are also allowed, for slightly different reasons. They allow the routing of an entire semantic class of outputs to one place, without knowing how many outputs there are or what they are called. For instance, I can take all the "speaker" outputs from my audio mixer, and route them to a single speaker, or an oscilloscope.
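
As a rough sketch of the idea, routing an output Set can be thought of as expanding one Set-level link into one link per Port, whatever Ports the Set happens to contain. The mixer's Port names, the dictionary layout and the function expand_set_link are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical output interface of the mixer: Set name -> Port names.
mixer_outputs: Dict[str, List[str]] = {
    "speaker": ["main_left", "main_right", "monitor"],
    "": ["level_meter"],  # the Default Set
}


def expand_set_link(
    outputs: Dict[str, List[str]], src_set: str, destination: str
) -> List[Tuple[str, str]]:
    # One (source Port, destination) pair per Port in the chosen Set;
    # the caller never needs to know the Port names or their number.
    return [(port, destination) for port in outputs.get(src_set, [])]


links = expand_set_link(mixer_outputs, "speaker", "oscilloscope")
```

Here links holds three pairs, one for each Port in the "speaker" Set, and the Default Set's level_meter is left alone.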

Note that Sets are not an intellectual overhead: every Process has a Default Set on both its input and output interfaces, without having to take any action. If a Link does not specify an input or output Set, the Default Set is assumed. The Default Set thus behaves like SystemML without Sets. Additionally, Sets are a negligible computational overhead, because they have significance only during the initialisation phase.
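
To make the mixer analogy and the Default Set fallback concrete, here is a minimal sketch of how a destination might group incoming Links by input Set at initialisation. The Link tuple, its field names and the source names are assumptions made for this example, not SystemML structures; the empty string as the Default Set's name follows the Identifiers section.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

DEFAULT_SET = ""  # the Default Set is written as the empty string


class Link(NamedTuple):
    src_port: str
    dst_set: str = DEFAULT_SET  # an omitted Set falls back to the Default Set


def group_by_input_set(links: List[Link]) -> Dict[str, List[str]]:
    # The Link, not the source's name or type, decides which Set
    # (B, L or R for the mixer) an input lands in.
    grouped = defaultdict(list)
    for link in links:
        grouped[link.dst_set].append(link.src_port)
    return dict(grouped)


links = [
    Link("guitar", "L"),
    Link("vocals", "B"),
    Link("keys", "R"),
    Link("drums", "B"),
    Link("metronome"),  # no Set given, so the Default Set is assumed
]
inputs = group_by_input_set(links)
```

A Process that ignores Sets altogether would simply read everything from inputs[DEFAULT_SET].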

To summarise, Sets provide a layer of connection semantics that we can expect to need from time to time in a physically-familiar way, without incurring an overhead, freeing us from specifying those semantics (where required) through non-robust (name, type, order) or inconvenient (parameters) routes.

Identifiers

A Computable is referred to within its context by an identifier (a string). References in SystemML are all relative to the context in which they appear. Sets and Ports can be referenced by a similar identifier, with no overlap between identifiers for Computables, Sets or Ports, so an identifier may specify a Computable, a Set or a Port. Some examples follow.

  • SubSystem
  • SubSystem/Process
  • SubSystem/Process>>OutputSet
  • SubSystem/Process>>OutputSet>OutputPort
  • SubSystem/Process<<InputSet
  • SubSystem/Process<<InputSet<InputPort

Note that the forward slashes and chevrons (angle brackets) are delimiters, rather than part of any component of the identifier. The conventional ".." syntax is not used, since the context above a system is invisible to the system (see "Encapsulation" below). There exists one implied rule for filling in the blanks when an identifier is not specified completely, as follows:

If the OutputSet (or InputSet) is missing, the Default Set is assumed. In this way, Processes (and researchers) that do not need the functionality of Sets need not involve themselves with them. Both of the following reference the same OutputPort, on the Default Set.

  • SubSystem/Process>OutputPort
  • SubSystem/Process>>>OutputPort

Note, as above, that the correct way to specify the Default Set explicitly is with the empty string: in the second form, nothing appears between the Set delimiter (>>) and the Port delimiter (>).
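
The identifier forms listed above, including the Default Set rule, can be captured in a few lines. The following is a minimal sketch rather than a reference parser: the Identifier tuple, its field names and parse_identifier are hypothetical, and no validation of the individual names is attempted.

```python
from typing import NamedTuple, Optional, Tuple


class Identifier(NamedTuple):
    path: Tuple[str, ...]     # Computable names, outermost first
    direction: Optional[str]  # "in", "out", or None for a bare Computable
    set_name: Optional[str]   # "" means the Default Set
    port: Optional[str]


def parse_identifier(text: str) -> Identifier:
    # Everything before the first chevron is the slash-delimited path.
    hits = [i for i in (text.find("<"), text.find(">")) if i >= 0]
    cut = min(hits, default=len(text))
    head, tail = text[:cut], text[cut:]
    path = tuple(head.split("/")) if head else ()
    if not tail:
        return Identifier(path, None, None, None)
    mark = tail[0]
    direction = "out" if mark == ">" else "in"
    if tail.startswith(mark * 2):
        # Doubled chevron: a Set name follows, then optionally a Port.
        set_name, sep, port = tail[2:].partition(mark)
        return Identifier(path, direction, set_name, port if sep else None)
    # Single chevron: a Port on the Default Set.
    return Identifier(path, direction, "", tail[1:])


short_form = parse_identifier("SubSystem/Process>OutputPort")
explicit_form = parse_identifier("SubSystem/Process>>>OutputPort")
```

Both calls yield the same result, with set_name equal to the empty string, which is the equivalence stated above.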

Absolute Identifiers

An Absolute Identifier uniquely specifies an object within the Root System.

Relative Identifiers

A Relative Identifier specifies an object with respect to some Parent System (which may be the Root System in some cases).

Partial Identifiers

When some part of the identifier is implied, a Partial Identifier may (or sometimes must) be used. For instance, an Expose specifies that an internal member is exposed under an external identifier, As. For exposed Ports and Sets, the Computable part of As must be the System that contains the Expose, so As is specified as a Partial Identifier giving only the Set and/or Port parts.

Naming conventions (constraints)

SystemML assigns names to Systems, Processes, Sets, Ports and Data objects, and to abstract objects defining relationships between them (detailed elsewhere). It also uses a unique identifier to specify the class name of a component (Process or Data). The particular delimiters used in SystemML documents are the forward slash (/) and both chevrons or angle brackets (<, >); the forward slash is also used in class names. In addition, class names need to be usable directly as filenames on file-based implementations of SystemML namespaces, and it is fantastically convenient if some of the objects in a system (Systems, Processes, Sets, Ports) can be specified as variable names so that they can be represented simply in a structure in an interpreted language (e.g. Matlab).

All of which means that we could have different naming constraints for each class of items, but that would be confusing. At the loss of the ability to use certain sexy names for some things (e.g. dot conventions), we feel it is not unreasonable to constrain all names within SystemML to follow the standard variable-naming convention: start with a letter, and continue with zero or more letters, digits or underscores ([a-zA-Z][a-zA-Z0-9_]*, if you speak Regular Expression).

This constraint may be relaxed in future implementations, but is currently enforced.
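
For anyone wanting to check names up front, the constraint translates directly into code. This is a small sketch for a single name; the helper is_valid_name is hypothetical, and how the rule applies to slash-delimited class names is not addressed here.

```python
import re

# The pattern quoted above, applied to the whole string.
NAME_PATTERN = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")


def is_valid_name(name: str) -> bool:
    # fullmatch rejects names with any leading or trailing extras,
    # including the SystemML delimiters /, < and >.
    return NAME_PATTERN.fullmatch(name) is not None


checks = {name: is_valid_name(name) for name in ["Process_1", "1st", "sub.sys", "a/b"]}
```

Only Process_1 passes; the others fail on a leading digit, a dot and a delimiter respectively.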

Encapsulation and exposure

A Subsystem is complete and independent, i.e. encapsulated. We cannot generally know anything about the internals of a Subsystem, so we generally cannot refer to its internal Processes, Ports, and Subsystems (though the SystemML syntax allows it where the information is available, see above). Therefore, a Subsystem looks like a Process to its context - that is, it exposes an input and an output interface that can be accessed as if it were a Process. Systems don't have any explicit Sets or Ports on these interfaces, but Ports and Sets of contained Processes (and other Subsystems) can be Exposed onto these interfaces. In addition, internal Processes can be exposed in their entirety, though this is probably best avoided in general.

A practical line of argument leads to the same conclusion. Certain Processes may generate a single output with a fixed name (e.g. a summer may generate a single output named "out"). If two instantiations of such a process (A and B) appear in one Subsystem (S), it is then impossible to unambiguously specify one or other of the outputs using the label "out": S>out could refer to A>out or B>out. In some cases, we may wish to expose both of these Ports to the outside world. We could use explicit referencing, S/A>out and S/B>out, but we break encapsulation by doing this since we assume something about the internal structure of S that may change in future. Therefore, Subsystems should expose Ports in a way that does not imply anything about the internals of the Subsystem, and a parsimonious way to do this is using the Process-equivalent ("Computable") syntax, regardless of the depth at which the Ports are nested.
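
The two-summer example can be sketched as a lookup table held by the Subsystem S, mapping each exposed name to an internal Port. The table layout, the exposed names (sum_a, sum_b) and the function resolve_exposed are all invented for illustration and do not reflect the actual Expose syntax.

```python
from typing import Dict

# Hypothetical Expose table for Subsystem S:
# external Partial Identifier -> internal Relative Identifier.
exposes: Dict[str, str] = {
    ">sum_a": "A>out",
    ">sum_b": "B>out",
}


def resolve_exposed(table: Dict[str, str], external: str) -> str:
    # Only names that S has chosen to expose are visible from outside;
    # the internal structure (A, B) never appears in the caller's reference.
    try:
        return table[external]
    except KeyError:
        raise KeyError(f"{external!r} is not exposed by this Subsystem") from None


target = resolve_exposed(exposes, ">sum_a")
```

From the parent's point of view the references are S>sum_a and S>sum_b, so S is free to rename or restructure A and B later, provided it updates its own table.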

To achieve encapsulation, we avoid absolute references to Computables and Ports - that is, within some big nested system, we do not refer to a Port by an absolute name like subsys1/subsys2/process1>portA. As a result, references within SystemML only make sense in context (i.e. they are all relative to the Subsystem in which they appear). In instantiating a system, we might find it helpful to be able to make absolute references, especially if we are instantiating across multiple OS processes. Note, then, that although we cannot use absolute references within SystemML if we wish to encapsulate, we can use them during instantiation in an implementation, since in this phase absolute references do not change. This is a simple mechanism for referring to even remotely-instantiated processes within a large nested system. Such references could, for instance, be written in a UNIX-intuitive form such as "/computable/computable/computable>portA", where the leading forward slash indicates an absolute reference.
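
One way an implementation might derive such absolute references at instantiation time is to walk the nested system and prefix each name with its parent's path. This is a sketch under stated assumptions: the nested-dictionary representation (a dict for a Subsystem, None for a Process) and the function absolute_paths are hypothetical.

```python
from typing import Iterator


def absolute_paths(system: dict, prefix: str = "") -> Iterator[str]:
    # Assumed layout: each key is a Computable name; a dict value is a
    # nested Subsystem and None marks a Process.
    for name, child in system.items():
        path = f"{prefix}/{name}"  # leading slash marks an absolute reference
        yield path
        if isinstance(child, dict):
            yield from absolute_paths(child, path)


root = {"subsys1": {"subsys2": {"process1": None}, "process2": None}}
paths = list(absolute_paths(root))
```

The resulting list contains /subsys1/subsys2/process1, to which a Port part such as >portA could then be appended.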