brahms_gperf
Overview

brahms_gperf displays a graphical breakdown of the performance (timing) data returned in a BRAHMS Execution Report, so that thread balance and the various sources of execution overhead can be reviewed.
Warnings
Example

M Source Code (against 995)
% invoke BRAHMS, asking for the execution report
[out, rep] = brahms(sys, exe);

% display the performance data in the report
brahms_gperf(rep)

High overhead

A BRAHMS execution will display high overhead if the amount of computation performed in each process iteration is low. This is because switching between processes (and/or threads), as well as passing state data and results in and out, can end up costing more resources than performing the computation itself. Here, we engineer such a case. We ran the following.

M Source Code (against 995)
% construct exe
exe = brahms_execution;
% ... (the Execution Parameters TimeRunPhase and MaxThreadCount are also set here;
% MaxThreadCount is 2, so the seven processes are shared between two worker threads)

In the figure below, the large orange blocks indicate CPU time for each of the two worker threads, as measured by the CPU timers. Each is marked with an axis up the side in CPU seconds (this is like Wall Clock time, but for each thread - since the threads run in parallel they can clock up more CPU seconds between them than have passed on the actual Wall Clock). This lets us see that the threads were relatively balanced, but we could do better by juggling the processes between them (there is not currently any support for this). These CPU times have relatively poor resolution (usually on the order of 10ms).

In the two threads, four and three processes ran, respectively; these are represented by the blue blocks. The height of the blue blocks gives the relative execution time taken by each process (as measured by microsecond timers against the Wall Clock; the microsecond timers are much more accurate, but cost CPU time themselves to operate). Thus, the tallest blue block is always as high as the orange block. This lets us see which processes are taking up all our execution time.

The pale orange block towards the bottom of the thread 0 block represents the time this thread spent in Init Phase. Often, Init Phase and Term Phase are so brief that they will not log any CPU time at all, remembering that the CPU timers have poor resolution (they have not appeared at all for thread 1, for instance). Within that block, the lighter blue bars indicate the relative usage of each process in the thread.

The green/red bar up the side of each thread block indicates the breakdown of CPU usage between User Mode and Kernel Mode. Kernel Mode represents time spent in system operations (particularly, filing system requests and, in this case, context switching).

Along the bottom is the overall execution timeline, as measured by the Invocation Bindings. The first section (pale green) is prepare, and was brief in this case. The second section (dark green) is execute. The third (pale green) is collect, and its length represents how long it took to collect the fairly large set of results from this execution and load them into Matlab. The execute section is further broken down, by the timings returned by the BRAHMS executable, into Init, Run and Term Phases.

The discrepancy between the time measured by the executable in execute and that measured by the bindings (around 100ms) represents the time taken by the OS to load the executable and get the timers started, plus the time not counted at the end while the Report File is written and the executable is unloaded. This overhead should remain constant for longer executions. The length of the Term Phase, here, is around 100ms; some of this represents the time taken to terminate the system as a whole, but the bulk of it is the time taken to terminate the Data objects in the System - since these are terminated in the caller thread, the time does not show up in the CPU timings.

Overall, this system used something like 0.3 seconds of User Mode Run Phase time (the sort that actually gets things done) in two seconds of total execution. The latter second was entirely concerned with loading the results into Matlab; even so, this execution has high overhead.

[Figure: brahms_gperf display for the high-overhead execution]
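For concreteness, here is a minimal sketch of how such a high-overhead case might be engineered: several processes that each do almost no work per iteration, running at a high sample rate. This is not the source used to produce the figure above; the process names, sample rate, state fields, stop time, and the two-output call to brahms are assumptions for illustration, and only the class name std/2009/source/numeric and the sml_system / addprocess / brahms_execution pattern are taken from elsewhere on this page.

% Illustrative sketch only - not the source used to produce the figure above.
% Assumed here: the process names, the sample rate, the state fields, the stop
% time field, and the two-output form [out, rep] = brahms(sys, exe).

fS = 1000;                          % high sample rate: many iterations, each doing little work
state = struct();
state.data = zeros(1, 1000);        % trivial data for the source to emit (field names assumed)
state.repeat = true;

sys = sml_system;
for p = 1:7                         % seven near-trivial processes, as in the figure above
    name = sprintf('src%i', p);
    sys = sys.addprocess(name, 'std/2009/source/numeric', fS, state);
end

exe = brahms_execution;
exe.stop = 10;                      % stop time in seconds (field name assumed)

% run, then review where the time went
[out, rep] = brahms(sys, exe);
brahms_gperf(rep)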
Low overhead

When the computation performed per iteration is greater, the overhead tends to be negligible. Here, we ran the following.

M Source Code (against 995)

srcdata = ...;                     % source data used by the execution (construction not shown)
exe = brahms_execution;
% ... (the Execution Parameters TimeRunPhase and ShowGUI are also set here)

This is a low overhead execution. All the overheads discussed above are greatly reduced or unmeasurable. There is no measurable use of CPU Kernel Time, which indicates that we spent all our time computing rather than servicing system calls. Having turned off all logging, there is very little to collect at the end of the execution, so the collect section is negligible too.

[Figure: brahms_gperf display for the low-overhead execution]

Parallel speed-up

Finally, we engineer an example that takes good advantage of the hardware to parallelise the execution for good speed-up; the source we ran is given below.

Perfect balance can always be achieved by not limiting the thread count - the scheduler will then simply schedule the processes on whichever CPU is free. The cost of this, however, is more context switching, and a compromise position will usually achieve the best overall performance (a sketch of how such a comparison might be made appears after the source code below). However, for typical largish systems, there is usually very little to be gained by attempting to manually balance the execution. For this reason, we do not discuss balancing in the documentation, other than this brief play around to illustrate the operation of brahms_gperf.

[Figure: brahms_gperf display for the parallelised execution]

M Source Code (against 995)
% RESAMPLE a numeric of arbitrary type
f1 = ...;                          % sample rate of the source process (value not shown)

sys = sml_system;
sys = sys.addprocess(..., 'std/2009/source/numeric', ...);
sys = sys.addprocess(..., 'std/2009/resample/numeric', ...);
sys = sys.link(...);               % link the source output to the resample input

exe = brahms_execution;
% ... (the Execution Parameters TimeRunPhase, ShowGUI and MaxThreadCount are also set here)
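To explore the trade-off between parallelism and context-switching cost discussed above, one rough approach is to time the same execution at a few different values of MaxThreadCount and compare the wall-clock durations, then use brahms_gperf on a report for the detailed breakdown. The sketch below is illustrative only: it assumes the system sys and execution exe from the listing above, uses plain Matlab tic/toc for the overall timing, and leaves the actual setting of the Execution Parameter MaxThreadCount as a placeholder comment, since the mechanism for setting Execution Parameters is documented elsewhere in this manual.

% Illustrative sketch only: compare wall-clock run time at a few thread counts.
% Assumes sys and exe have been constructed as in the listing above.

counts = [1 2 4];
t = zeros(size(counts));

for i = 1:length(counts)
    % set the Execution Parameter MaxThreadCount to counts(i) on the execution
    % here (the mechanism for setting Execution Parameters is not shown on this
    % page, so it is left as a placeholder)
    tic;
    [out, rep] = brahms(sys, exe);
    t(i) = toc;
end

disp([counts(:) t(:)])              % wall-clock seconds against thread count
brahms_gperf(rep)                   % detailed breakdown of the last run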