Overview

DataML is an XML flavour that can represent most variables in a one-to-one fashion, which makes it a more-or-less general way of representing data. Support for DataML is made available to component authors through the DataMLNode class in the C++ bindings 1065 and 1199; this class is a functional overlay on XMLNode. BRAHMS deals in XMLNodes, and it is up to you how you handle them, whether they hold DataML or not. You will probably find it convenient, at least to begin with, to use the DataMLNode overlay to access DataML state information.

Typical Usage

If you parametrize your Process using DataML, things will work as follows. You will write a Matlab script that uses addprocess, from the SystemML Toolbox, to add your Process to a System. The script will pass the state data (parameters) for the Process as a Matlab structure. The SystemML Toolbox will see that your state data does not specify a particular StateML writer, and will attempt to use the built-in DataML writer. This will work for any Matlab state structure, unless it contains function handles or custom classes, which cannot be translated into DataML.

This state data will then be presented to your 1199 Process as a DataML (XML) node. In your Process, you will use the DataMLNode interface layer to access the data - see any of the Standard Library Components for in-depth examples, but the basics are covered in the Developing Processes Quickstart. If you are authoring in 1258 or 1262, the bindings will automatically translate the DataML into a form native to your Process. Note, then, that if you are authoring in 1258 (Matlab), the state data object you are passed during EVENT_STATE_SET is identical to the one you passed in using addprocess.

Format

The format of DataML is not documented. Use the SystemML Toolbox (implicitly, see above) to interact with DataML from Matlab, and use the DataMLNode class to interact with DataML from your Process. The format is private, and may change until the SystemML interface stabilises.

Notes

Since there's nowhere else to put this, note the potentially confusing storage arrangements for complex numeric data. Complex data may be stored interleaved (first dimension is size 2 and is interpreted as real/imag) or adjacent (last dimension is ...). A storage class suffixed "x" indicates complex adjacent, and one suffixed "y" indicates complex interleaved. Generally, we want to be able to store data in whichever form it is supplied, so we only translate between the two forms just-in-time, as required.
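As a rough illustration of the two layouts, the following Python sketch converts a flat buffer holding one sample into complex values. It assumes that elements are laid out with the first dimension varying fastest (as in Matlab), and that in adjacent form the real/imag dimension is the last one. The function names are hypothetical and are not part of BRAHMS.

```python
# Sketch only: decode one sample of complex data from a flat buffer.
# Assumes first-dimension-fastest (Matlab-style) element ordering.
# Function names are hypothetical, not part of any BRAHMS binding.

def from_interleaved(flat):
    """Interleaved ("y"): first dimension is real/imag.

    The buffer reads [re0, im0, re1, im1, ...].
    """
    if len(flat) % 2:
        raise ValueError("interleaved buffer must have even length")
    return [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def from_adjacent(flat):
    """Adjacent ("x"): last dimension is real/imag (assumed).

    The buffer reads [re0, re1, ..., im0, im1, ...].
    """
    if len(flat) % 2:
        raise ValueError("adjacent buffer must have even length")
    half = len(flat) // 2
    return [complex(re, im) for re, im in zip(flat[:half], flat[half:])]


def to_interleaved(values):
    """Flatten complex values as [re0, im0, re1, im1, ...]."""
    out = []
    for v in values:
        out.extend((v.real, v.imag))
    return out


def to_adjacent(values):
    """Flatten complex values as [re0, re1, ..., im0, im1, ...]."""
    return [v.real for v in values] + [v.imag for v in values]
```

Both decoders return the same values for the same data, so a consumer can translate to whichever form it needs only at the point of use.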

Matters are more complicated when data is stored unencapsulated, in a binary file, because binary files are always written with time as the last dimension, for obvious reasons. In this case, each sample is still adjacent or interleaved, as indicated by "x" and "y", but the log as a whole has time as the last dimension. Therefore, in adjacent form, the second-to-last dimension of the whole file is actually real/imag.

Unpacking is fairly straightforward once one of these three possibilities is divined. The key is that binary files always use the last dimension for time, regardless of complexity. Matlab requires unpacking to adjacent format, whereas NumPy requires unpacking to interleaved format, so there's no gain in choosing one over the other, as far as I can see. We stay agnostic.
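The unpacking described above might be sketched as follows in Python. This is an illustration under stated assumptions, not the BRAHMS implementation: it assumes the file contents have already been read into a flat sequence of numbers with the first dimension varying fastest, so that each time sample occupies one contiguous block. The function name, its parameters, and the layout labels are hypothetical.

```python
# Sketch only: split a flat log into per-time-step samples.
# Assumes first-dimension-fastest ordering, so that time (the last
# dimension) varies slowest and each sample is one contiguous block.
# unpack_log and its layout labels are hypothetical names.

def unpack_log(flat, elems_per_sample, layout):
    """Return a list of samples, one per time step.

    layout is "real", "adjacent" (storage class suffixed "x")
    or "interleaved" (storage class suffixed "y").
    """
    if layout not in ("real", "adjacent", "interleaved"):
        raise ValueError("unknown layout: %r" % (layout,))
    n = elems_per_sample
    # A complex sample occupies twice as many numbers as a real one.
    width = n if layout == "real" else 2 * n
    if width == 0 or len(flat) % width:
        raise ValueError("buffer length is not a whole number of samples")

    samples = []
    for start in range(0, len(flat), width):
        block = flat[start:start + width]
        if layout == "real":
            samples.append(list(block))
        elif layout == "interleaved":
            # Within the block: [re0, im0, re1, im1, ...]
            samples.append(
                [complex(block[i], block[i + 1]) for i in range(0, width, 2)]
            )
        else:
            # Within the block: [re0, ..., reN-1, im0, ..., imN-1]
            samples.append([complex(block[i], block[n + i]) for i in range(n)])
    return samples
```

Because time is the slowest-varying dimension in all three cases, the split into samples is the same for each; only the decoding within a sample differs.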

Dimensions must be supplied as an attribute of the DataML node, unless the node is scalar. Thus, absent dimensions imply a scalar node.

A node with no attributes is a special way to store a 1xN row vector of characters, i.e. a simple string.
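The two rules above can be illustrated with a short Python sketch. Since the DataML format is private and undocumented, the element name, the attribute names ("dims", "type") and the space-separated dimension syntax used here are all hypothetical, chosen only to show the decision logic; they should not be taken as the real format.

```python
# Sketch only: classify a node by the two rules stated above.
# The attribute names and dimension syntax are hypothetical, because
# the real DataML format is not documented.
import xml.etree.ElementTree as ET

DIMS_ATTR = "dims"  # hypothetical name for the dimensions attribute


def describe(node):
    """Return (kind, dims) for an XML element."""
    if not node.attrib:
        # No attributes at all: a simple string, i.e. a 1xN row of chars.
        text = node.text or ""
        return ("string", (1, len(text)))
    if DIMS_ATTR not in node.attrib:
        # Attributes present but no dimensions: a scalar (1x1 in Matlab terms).
        return ("scalar", (1, 1))
    dims = tuple(int(d) for d in node.attrib[DIMS_ATTR].split())
    return ("array", dims)
```

For example, under these assumed names:

```python
describe(ET.fromstring("<node>hello</node>"))                 # string, 1x5
describe(ET.fromstring('<node type="double">3.5</node>'))     # scalar
describe(ET.fromstring('<node type="double" dims="2 3"/>'))   # array, 2x3
```

In a real Process, prefer the DataMLNode class, which hides these details.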