SystemML Class
std/2009/util/rng
Summary
General Purpose Random Number Generator
Status
Stable

Overview

A general purpose random number generator (RNG). Currently offers only the Ziggurat RNG due to Marsaglia and Tsang (2000), but later editions will offer additional generators. This RNG is expected to be an adequate source of random numbers for all processes that do not explicitly concern themselves with random number generation and do not require specialist random numbers. See at the bottom for a discussion of this algorithm.

Native Interface

Notes
  • As with all Data and Utility Components, this one presents its C++ interface in a namespace named for its SystemML Class, with underscores instead of slashes and the release tagged on the end. It may be easier to import this namespace, provided it does not conflict with other symbols in your source code, but you can always use symbols within it explicitly, as shown here. An alternative is to create a namespace alias with a more convenient name - this approach is used, for example, in the 1199 template.

The utility object provides the following C++ interface (for details of the C interface, see the associated header file).

C/C++ API (Utility)
struct Utility { void setName(const char* name); void create(Symbol hComponent); void select(const char* generator); void seed(const std::vector<UINT32>& seed); void seed(const UINT32* seed, UINT32 count); void fill(DOUBLE* dst, UINT64 count, DOUBLE gain = 1.0, DOUBLE offset = 0.0); void fill(FLOAT32* dst, UINT64 count, FLOAT32 gain = 1.0, FLOAT32 offset = 0.0); DOUBLE get(); };

Most of this interface is undocumented due to noone's done it yet, but hopefully it's more-or-less self-explanatory. Some functions whose operation is not obvious are detailed below. For usage examples, see the source code for any Standard Library process that uses this utility (this includes std/random/numeric).

void setName(const char* name)
If you want to give the Utility a name (which may help generate sensible error messages if you hold more than one utility in your component), call this function before calling create().
void select(const char* generator)
Select the generator to use. generator must be one of "MT2000.normal", "MT2000.exponential" or "MT2000.uniform" (Marsaglia & Tsang 2000).
DOUBLE get()
The returned value is unmodified, that is, has a unity gain and zero offset applied.

Example

For a usage example, see the source code for std/random/numeric.

Acknowledgements

RNG_MT2000 was originally written by George Marsaglia and Wai Wan Tsang in 2000 (published in Journal of Statistical Software, 5(8)).

Discussion

Third-party RNGs

Using dynamic-link third-party libraries is highly not recommended in BRAHMS (any any SystemML client), to avoid deficits in accountability. If you really want to use a different RNG, suggest that we add it to this general RNG class instead. You can, of course, do what you like, but we did warn you.

Issues with Marsaglia & Tsang (2000)

Some commentary originally from Rob Stewart, though I (mitch) may have edited it I can't remember.

According to Leong et al. (2005), the MT2000 RNG failed the chi-square test for normal distributions. They proposed to fix this by adding five lines of C code (Leong et al., 2005). It's just a question of the relatively short period of the uniform RNG from MT2000, coupled with the fact that they generated 20 billion random numbers.

As I understand it, the only problem with the SHR3 inline uniform generator in the 2000 code is its relatively short period (232-1 to be precise). By using this as the foundation of the ziggurat code, the Gaussian rng also has the same short period. Now, what Leong and others did in the 2005 paper was to generate 20 billion random numbers (or roughly 5 times the period of the rng) and this was enough to break Gaussianity by the chi-square measure. This is not a very interesting result, but does highlight that if you want a longer period, you need to use a better uniform generator. What is interesting is how close the polar method and the 2005 improved method are to the 95% confidence limit in their tests. Neither method is anywhere near 99% confidence, and I would be interested to see how a MT-ziggurat generator would fare on this test. Anyone want to give it a go?

I really don't see this as a big issue in simulation, and thus continue to use the original algorithm from Marsaglia & Tsang (2000). Ultimately, you need to ask yourself whether repeats in the rng every few billion calls matters in your simulation. For example, in many of my simulations, I use RNOR to add Gaussian iid noise to each model neuron's membrane potential at each simulation time step. I also use RNOR to generate log-normal numbers for the inter-spike-intervals of extrinsic neurons, but we'll ignore that for the moment. Say I have a network of 1000 neurons, and a time step of 0.2 ms, that gives 5000 RNOR calls every simulation ms, or 5 million every second. RNOR will repeat every 859 seconds of simulation time for this network model. Do your simulations run this long? Mine don't. Of course, when you have 100,000 model neurons you start to get repeats every few seconds, but are your models that large? These are important reality checks. Also, even if your rng does repeat, will this actually introduce any periodicities into your model's behaviour? Not likely. The state of the model will be different when the repeat occurs, and the repeats will be applied to different model components. You might also want to look here:

http://www.doornik.com/research/ziggurat.pdf

If you were really serious about using the best uniform generator for this, then I would look into implementing the Mersenne Twister (MT19937) to generate the uniforms:

http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html

http://en.wikipedia.org/wiki/Mersenne_twister

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=6614&objectType=file

Interestingly, for version 7.3 (I think), the mathworks incorporated the mersenne twister in as an option in rand as explained here:

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/index.html?/access/helpdesk/help/techdoc/ref/rand.html

This is a GOOD THING, and everyone using that version of matlab should use this rng in their matlab code.

Now, the short period is perhaps not strictly the _only_ problem with the ziggurat code. Doornik (2005) showed there was another problem with the method when it was modified to produce doubles rather than floats, but I'm not sure how relevant his result is for float generation.

It looks like someone (Jochen Voss) has already put mt19937 together with the ziggurat method. Nice! (jazz man voice). It is available here:

http://seehuhn.de/comp/ziggurat/

I haven't really looked at it yet, but it's GSL-centric. Anyone got time to give it a try? Of course, for any of you that didn't know already, the ziggurat method has been the one used for matlab's randn since matlab 5:

http://www.mathworks.com/support/solutions/data/1-ZAZ3C.html?solution=1-ZAZ3C

So the mathworks trust the method, but use a version with a longer period...

References

Leong, P. H. W., Zhang, G., Lee, D.-U., Luk, W., & Villasenor, J. D. (2005). A comment on the implementation of the ziggurat method. Journal of Statistical Software, 12(7). URL: http://www.jstatsoft.org/v12/i07/

Marsaglia, G. & Tsang, W. W. (2000). The ziggurat method for generating random variables. Journal of Statistical Software, 5(8). URL: http://www.jstatsoft.org/v05/i08/

Doornik, J.A. (2005). An Improved Ziggurat Method to Generate Normal Random Samples. Working papers, Nuffield College, Oxford. URL: http://www.doornik.com/research.html