For the risk analysts out there using Monte Carlo in their analysis: have you ever wondered why the industry standard for simulation sample sizes is set at five thousand iterations?
So many Monte Carlo systems I see in use today run a standard five thousand simulations, but why five thousand? Why not ten thousand, or why not five thousand and one?
How many samples should our simulation have, and when does the number of iterations become insignificant? To put it concisely: if you were to add another sample to your Monte Carlo simulation, when would it stop making a difference to the final result?
This is a tricky dilemma, as one would expect, because we are often generating random data from a function; in many cases that function is no more sophisticated than the output of the RAND() function in Microsoft Excel. Yet if we wanted to test how two sets of random numbers perform against each other, we could of course borrow testing methods from the paired T-Test technique to serve our experiment.
In our example I generated multiple paired sets of random numbers: a 2 by 4 group, a 2 by 8 group, then 2 by 256, 512, 1024, 2048 and 5000, and why not double that last size to take us to a pair of unique 10,000-sample sets to compare. If we were to do this and contrast the sample means and standard deviations between the two sets in each group, when would the differences between these distribution estimates become insignificant?
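For anyone who would rather experiment outside of Excel, a minimal Python sketch of this kind of paired comparison might look like the following. It is an illustration under my own assumptions, not the workbook used in the post: NumPy's uniform generator stands in for RAND(), the sample sizes mirror the groups listed above, and SciPy's paired t-test is one way to read the paired T-Test idea mentioned earlier.

```python
# A sketch of the paired comparison described above: for each group size,
# draw two independent sets of uniform random numbers (standing in for
# Excel's RAND()) and contrast their sample means and standard deviations.
import numpy as np
from scipy import stats

rng = np.random.default_rng()
sizes = [4, 8, 256, 512, 1024, 2048, 5000, 10000]

for n in sizes:
    a = rng.random(n)  # first sample set in the pair
    b = rng.random(n)  # second sample set in the pair
    mean_diff = abs(a.mean() - b.mean())
    sd_diff = abs(a.std(ddof=1) - b.std(ddof=1))
    # Borrowing from the paired T-Test idea: pair the two sets element-wise.
    t_stat, p_value = stats.ttest_rel(a, b)
    print(f"n={n:>6}  |mean diff|={mean_diff:.5f}  "
          f"|sd diff|={sd_diff:.5f}  paired-t p={p_value:.3f}")
```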
Perhaps we should ask ourselves, why should we care anyway?
Well, the results from our little experiment below tell us we might want to take a look at this.
The Test Method | Martin Davies
In a spreadsheet I created multiple test sets of varying sample sizes, then compared the differences in means and standard deviations between the two sets in each group. I repeated this exercise a thousand times for each paired group using a macro! That is a lot of calculations for Intel to bang through, and I have to say the tester macro ran so many samples that the processor fan sped up and the PC nearly blew up; at one point I had the sense that the laptop was going to take off.
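The macro itself is not reproduced in the post, but a rough Python equivalent of the procedure described, a thousand repeated paired draws per sample size, might be sketched as follows; the structure, the `results` dictionary and the variable names are my own assumptions.

```python
# A rough stand-in for the spreadsheet macro: repeat the paired draw a
# thousand times per sample size and keep the observed differences in
# means and standard deviations for later inspection.
import numpy as np

rng = np.random.default_rng()
sizes = [4, 8, 256, 512, 1024, 2048, 5000, 10000]
repeats = 1000
results = {}  # sample size -> (mean differences, sd differences)

for n in sizes:
    mean_diffs = np.empty(repeats)
    sd_diffs = np.empty(repeats)
    for i in range(repeats):
        a = rng.random(n)
        b = rng.random(n)
        mean_diffs[i] = abs(a.mean() - b.mean())
        sd_diffs[i] = abs(a.std(ddof=1) - b.std(ddof=1))
    results[n] = (mean_diffs, sd_diffs)
    print(f"n={n:>6}  average |mean diff| over {repeats} runs = {mean_diffs.mean():.5f}")
```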
The results were definitely worth the effort!!!
The Data Generator | Martin Davies
In the table below I have tallied up the differences between means and standard deviations across the testing sets, and captured a tally whenever the difference within a pair is greater than a tolerance of 0.0005. Tightening the test tolerance or significance to 0.00005 drove out more failures per test, just as one would expect.
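Assuming the `results` dictionary from the sketch above, that tally could be computed along these lines; the 0.0005 and 0.00005 tolerances come straight from the text, everything else is illustrative.

```python
# Tally how often the difference within a pair exceeds the tolerance,
# for the two tolerances mentioned in the post (0.0005 and 0.00005).
# Assumes the `results` dictionary built in the previous sketch.
for tolerance in (0.0005, 0.00005):
    print(f"\ntolerance = {tolerance}")
    for n, (mean_diffs, sd_diffs) in results.items():
        mean_failures = int((mean_diffs > tolerance).sum())
        sd_failures = int((sd_diffs > tolerance).sum())
        print(f"  n={n:>6}  mean failures={mean_failures:>4}  "
              f"sd failures={sd_failures:>4}  (out of {len(mean_diffs)})")
```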
The Test Results | Martin Davies
The outcome is quite interesting, and I will summarize it below.
Firstly, if we take a peep at the single blue line graph (see the image above), there is plenty of noise at the small end of the spectrum, and the line slowly flattens out as we move into larger sample sizes. Flat is what we are seeking, because flat means less error.
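For anyone who wants to reproduce a chart along those lines, a quick matplotlib sketch (my own addition, not the author's chart) could plot the average absolute difference in means against sample size, again assuming the `results` dictionary from the earlier sketch.

```python
# Sketch of an error curve like the one discussed: average |mean difference|
# per sample size, which is noisy for small samples and flattens as n grows.
# Assumes the `results` dictionary from the earlier sketch.
import matplotlib.pyplot as plt

sizes_sorted = sorted(results)
avg_errors = [results[n][0].mean() for n in sizes_sorted]

plt.plot(sizes_sorted, avg_errors, color="blue", marker="o")
plt.xlabel("Sample size")
plt.ylabel("Average |difference in means|")
plt.title("Error flattens as the sample size grows")
plt.show()
```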
It follows that if you are generating random numbers between 0.0005 and 0.99995 and you classify 0.0005 as significant, you may find two thousand samples are more than enough. If your significance is finer, say one more decimal place just as an example, well, you might want to run ten thousand iterations rather than five thousand, just to be sure.
In all the tests that I ran, ten thousand iterations never failed a test, but most of the time, when we are dealing with a significance of 0.0005, two thousand samples seem to generate stable outputs and a flat error curve.
Quite simply, I have the sense that five thousand samples just doesn't serve us so well, and we might be better off with more or fewer samples in our Monte Carlo simulations. It's a bit like Goldilocks in the story of the three bears when she eats their porridge: it's either too hot or not hot enough, or in this case, too much or not enough.
To finish up, and curiously might I add, whether we apply the T-Test method on its own, which was smoother in output, or any other kind of mean / standard deviation calculation, the test results seemed to be relatively consistent and repeatable.