brahms_bench is a Matlab function that benchmarks BRAHMS running on your system. Some benchmarks test your system; others test BRAHMS (on your system). To see which benchmarks are available, enter brahms_bench at the Matlab prompt. Each benchmark's results are shown alongside results obtained on various other systems, for comparison.
This benchmark measures the absolute performance of your machine in terms of floating-point and integer operations, and of writes to L2 cache and main RAM. The first two numbers reported ("Float" and "Integer") indicate the number of operations of each type that can be performed in a fixed time period. Currently, these numbers do not correspond directly to any standard measurement, so we do not attempt to relate them to one; this will probably change in a future release. The second pair of numbers ("BW(1)" and "BW(2)") indicate the memory write bandwidth, in MB/s, to L2 cache and to RAM respectively. These correspond loosely to the raw write speeds of these interfaces, but may be scaled. At this stage, therefore, all four numbers should be treated only as relative performance metrics.
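The idea behind the BW(1)/BW(2) pair can be sketched as follows. You time repeated writes to a buffer small enough to stay in L2 cache, then to a buffer far larger than any cache, and report bytes written per second in MB/s. This is a minimal Python sketch and does not reproduce how brahms_bench computes its numbers. The buffer sizes are illustrative assumptions: 128 KiB is assumed to fit in L2, and 64 MiB is assumed to spill to RAM. The repeat counts and the helper name write_bandwidth_mb_s are also hypothetical.

```python
import ctypes
import time

L2_SIZED_BYTES = 128 * 1024          # assumption: fits in L2 cache
RAM_SIZED_BYTES = 64 * 1024 * 1024   # assumption: exceeds every cache level


def write_bandwidth_mb_s(size_bytes: int, repeats: int) -> float:
    """Time repeated memset() writes over one buffer; return MB/s written."""
    buf = ctypes.create_string_buffer(size_bytes)
    addr = ctypes.addressof(buf)
    ctypes.memset(addr, 0, size_bytes)  # warm-up pass so all pages are mapped
    start = time.perf_counter()
    for i in range(repeats):
        # Vary the fill byte so successive passes are not identical.
        ctypes.memset(addr, i & 0xFF, size_bytes)
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size_bytes * repeats / elapsed / 1e6


if __name__ == "__main__":
    bw1 = write_bandwidth_mb_s(L2_SIZED_BYTES, repeats=2000)
    bw2 = write_bandwidth_mb_s(RAM_SIZED_BYTES, repeats=8)
    print(f"BW(1)-like (L2-sized buffer):  {bw1:,.0f} MB/s")
    print(f"BW(2)-like (RAM-sized buffer): {bw2:,.0f} MB/s")
```

As with BW(1) and BW(2), these figures are only meaningful relative to each other, or relative to the same measurement taken on another machine.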
This benchmark measures the absolute overhead of using BRAHMS, that is, how much performance you give up by using BRAHMS rather than developing a monolithic implementation of your system. Ideally, the numbers produced will be small! The output is an estimate of the iteration time of a process and of a link. To estimate the iteration cost of a whole system, simply add up the iteration times of its processes and links, since BRAHMS scales linearly (as illustrated in the AEI paper).
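The linear cost model described above reduces to a simple sum. The sketch below assumes you already have per-process and per-link iteration times, for instance read from the brahms_bench output. The function name and the example values are hypothetical. The values 0.2 µs per process and 0.3 µs per link were not measured; they were chosen only so that the total matches the 8 µs single-threaded figure quoted for a medium-sized system.

```python
from typing import Sequence


def system_iteration_time_us(process_times_us: Sequence[float],
                             link_times_us: Sequence[float]) -> float:
    """Linear model: per-iteration overhead of a whole system is the sum
    of the iteration times of all its processes and all its links."""
    return sum(process_times_us) + sum(link_times_us)


if __name__ == "__main__":
    # Hypothetical per-component times, NOT measured values.
    processes = [0.2] * 10   # 10 processes
    links = [0.3] * 20       # 20 links
    total = system_iteration_time_us(processes, links)
    print(f"Estimated overhead per system iteration: {total:.1f} us")
```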
Typical iteration times are nanosecond-scale on contemporary hardware when running single-threaded. When running multi-threaded, link iteration time is substantially longer (around a microsecond), because of the cost of asking the OS to perform inter-thread synchronisation. Thus, on single-core hardware, where multi-threaded execution offers no advantage, lightweight executions can run noticeably faster single-threaded than multi-threaded.
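The cost of OS-mediated inter-thread synchronisation can be illustrated with a ping-pong between two threads. The sketch below compares that hand-off with the equivalent plain function call in a single thread. It is a minimal Python sketch using threading.Event and does not use BRAHMS's own synchronisation mechanism. Python's interpreter overhead and global interpreter lock inflate the absolute numbers, so only the ratio between the two figures is of interest. The function names are hypothetical.

```python
import threading
import time


def threaded_handoff_us(n: int) -> float:
    """Mean time per ping-pong round trip between two threads, in microseconds."""
    ping = threading.Event()
    pong = threading.Event()

    def worker() -> None:
        for _ in range(n):
            ping.wait()   # block until the main thread signals
            ping.clear()
            pong.set()    # signal back to the main thread

    t = threading.Thread(target=worker)
    t.start()
    start = time.perf_counter()
    for _ in range(n):
        ping.set()
        pong.wait()
        pong.clear()
    elapsed = time.perf_counter() - start
    t.join()
    return elapsed / n * 1e6


def direct_call_us(n: int) -> float:
    """Mean time per equivalent hand-off done as a plain call, single-threaded."""
    def step() -> None:
        pass

    start = time.perf_counter()
    for _ in range(n):
        step()
    elapsed = time.perf_counter() - start
    return elapsed / n * 1e6


if __name__ == "__main__":
    n = 20000
    print(f"multi-threaded hand-off: {threaded_handoff_us(n):.2f} us")
    print(f"single-threaded call:    {direct_call_us(n):.3f} us")
```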
However, these iteration times are still short. For a medium-sized system of 10 processes and 20 links, the total BRAHMS overhead per system iteration might be around 8 µs (single-threaded, ST) or 25 µs (multi-threaded, MT). The system's own computation per iteration therefore need not be very substantial before it swamps this overhead. Such a medium-sized system, for example, can run at around 11 kHz (ST) or 4 kHz (MT) without incurring more than 10% overhead. These overheads are thus only really a concern when high wallclock-time iteration rates are required (e.g. when exchanging real-time signals with hardware).
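The rates quoted above follow from a short calculation. The calculation assumes that "10% overhead" means the BRAHMS overhead is at most 10% of the system's own computation time per iteration. Under that assumption, the computation must take at least ten times the overhead, and the maximum iteration rate is the reciprocal of computation plus overhead. The helper name max_rate_hz is hypothetical. With the 8 µs and 25 µs figures above, it gives about 11.4 kHz and 3.6 kHz, roughly the 11 kHz and 4 kHz quoted.

```python
def max_rate_hz(overhead_us: float, max_overhead_fraction: float = 0.10) -> float:
    """Highest iteration rate at which the overhead stays within the given
    fraction of the system's own computation time per iteration."""
    compute_us = overhead_us / max_overhead_fraction  # minimum compute per iteration
    period_us = compute_us + overhead_us              # wallclock time per iteration
    return 1e6 / period_us


if __name__ == "__main__":
    for label, overhead in (("ST", 8.0), ("MT", 25.0)):
        print(f"{label}: {max_rate_hz(overhead) / 1e3:.1f} kHz")
    # prints "ST: 11.4 kHz" then "MT: 3.6 kHz"
```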
- This benchmark has been run in a VirtualBox installation and produced extremely noisy results. I don't yet know why, but such a configuration is clearly not a performance environment, so I have not investigated further.