
# Monte Carlo simulation

Monte Carlo simulation is the process of running a model many times, each time drawing a random value from the input distribution of each variable. The results of these numerous scenarios yield a "most likely" case along with a statistical distribution that quantifies the risk or uncertainty involved. Computer programs make it easy to run thousands of random samplings quickly.

## Developing a model

Monte Carlo simulation begins with a model, often built in a spreadsheet, with inputs described by probability distributions and outputs computed as functions of those inputs. The following description is drawn largely from Murtha. Monte Carlo simulation is an alternative to both single-point (deterministic) estimation and the scenario approach that presents worst-case, most-likely, and best-case scenarios. For an early historical review, see Halton.

A Monte Carlo simulation begins with a model (i.e., one or more equations together with assumptions and logic, relating the parameters in the equations). For purposes of illustration, we select one form of a volumetric model for oil in place, N, in terms of area, A; net pay, h; porosity, φ; water saturation, Sw; and formation volume factor, Bo.

N = 7,758 A h φ (1 − Sw) / Bo ………………………………(1)

Think of A, h, φ, Sw, and Bo as input parameters and N as the output. Once we specify values for each input, we can calculate an output value. Each input parameter is viewed as a random variable; it satisfies some cumulative-probability vs. value relationship. Thus, we may assume that the area, A, can be described by a log-normal distribution with a mean of 2,000 acres and a standard deviation of 800 acres, having a practical range of approximately 500 to 5,000 acres. Fig. 1 shows the distributions assumed for each of the input parameters.

A trial consists of randomly selecting one value for each input and calculating the output. Thus, we might select

• A = 3,127 acres
• h = 48 ft
• φ = 18%
• Sw = 43%
• Bo = 1.42 res bbl/STB

This combination of values would represent a particular realization of the prospect yielding 84.1 million bbl of oil. A simulation is a succession of hundreds or thousands of repeated trials, during which the output values are stored in a file in the computer memory. Afterward, the output values are diagnosed and usually grouped into a histogram or cumulative distribution function. Figs. 2 and 3 show the output and the sensitivity chart for this model.
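The trial loop described above can be sketched in a few vectorized lines. In this sketch, only the area distribution (log-normal, mean 2,000 acres, standard deviation 800 acres) comes from the text; the triangular parameters chosen for h, φ, Sw, and Bo are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # number of trials

def lognormal_from_mean_sd(mean, sd, size):
    """Sample a log-normal given its arithmetic mean and standard deviation."""
    sigma2 = np.log(1.0 + (sd / mean) ** 2)
    mu = np.log(mean) - 0.5 * sigma2
    return rng.lognormal(mu, np.sqrt(sigma2), size)

# Input distributions (triangular parameters are assumed for illustration;
# area follows the article: mean 2,000 acres, s.d. 800 acres)
A   = lognormal_from_mean_sd(2000.0, 800.0, n)   # area, acres
h   = rng.triangular(20.0, 50.0, 90.0, n)        # net pay, ft
phi = rng.triangular(0.10, 0.18, 0.26, n)        # porosity, fraction
Sw  = rng.triangular(0.25, 0.43, 0.60, n)        # water saturation, fraction
Bo  = rng.triangular(1.2, 1.4, 1.6, n)           # FVF, res bbl/STB

# Eq. 1: oil in place, stock-tank barrels, one value per trial
N = 7758.0 * A * h * phi * (1.0 - Sw) / Bo

p10, p50, p90 = np.percentile(N, [10, 50, 90])
print(f"P10 = {p10/1e6:.1f}, P50 = {p50/1e6:.1f}, P90 = {p90/1e6:.1f} MMbbl")
```

With all trials stored in `N`, any diagnostic of interest (histogram, cumulative curve, percentiles) can be read directly from the array.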

### Selecting input distributions

Log-normal distributions are often used for many of the volumetric model inputs, although net-to-gross ratio and hydrocarbon saturation are seldom skewed right and are always sharply truncated. Triangular distributions are also fairly common and easy to adapt because they can be symmetric or skewed either left or right. Sometimes the distributions are truncated to account for natural limits (porosity cutoffs, well spacing). When all the inputs are assumed to be log-normal with no truncation and independent of one another, the product can be obtained analytically.
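Truncation is easy to implement by rejection sampling: draw from the untruncated distribution and discard values outside the limits. A minimal sketch, with the porosity cutoff values assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_lognormal(mu, sigma, low, high, size):
    """Rejection-sample a log-normal restricted to [low, high]."""
    out = np.empty(0)
    while out.size < size:
        draw = rng.lognormal(mu, sigma, size)
        out = np.concatenate([out, draw[(draw >= low) & (draw <= high)]])
    return out[:size]

# e.g., porosity with an assumed economic cutoff of 8% and a cap of 30%
phi = truncated_lognormal(np.log(0.18), 0.3, 0.08, 0.30, 5000)
print(phi.min(), phi.max())
```

Rejection sampling is wasteful for very tight limits, but for mild truncation it is simple and preserves the shape of the parent distribution within the limits.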

### Shape of outputs

In this example, regardless of the distribution types of the inputs, the output is approximately log-normal. That is, the reserves distribution is always skewed right and “looks” log-normal. This follows from the central limit theorem: the logarithm of a product is a sum of logarithms, and sums of independent random variables tend toward normality as terms accumulate. In fact, a product of almost any distributions, even one including skewed-left factors, has the approximate shape of a log-normal distribution. For our first example, Fig. 2 displays the best-fitting log-normal curve overlaying the output histogram.
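A quick numerical check of this claim: multiply several independent, deliberately left-skewed triangular factors (parameters assumed for illustration) and observe that the product is nevertheless skewed right.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def skew(x):
    """Sample skewness: third standardized central moment."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(((x - x.mean()) / x.std()) ** 3))

# Eight independent triangular factors, each skewed *left*
# (mode placed near the upper limit)
factors = rng.triangular(1.0, 1.9, 2.0, size=(8, n))
prod = factors.prod(axis=0)

print(f"single-factor skewness: {skew(factors[0]):+.2f}")  # negative
print(f"product skewness:       {skew(prod):+.2f}")        # positive
```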

## Applications of Monte Carlo simulation

Although decision trees are widely used, they tend to be restrictive in the types of problems they can solve. Monte Carlo simulation, by contrast, has a broad range of applicability. For that reason, its applications merit an entire section rather than a short list here. Suffice it to say that Monte Carlo simulation is used to answer questions like:

• “What is the chance of losing money?”
• “What is the probability of exceeding the budget?”
• “How likely is it that we will complete the well before the icebergs are due to arrive?”
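Questions like these reduce to counting the fraction of trials in which the event occurs. A minimal sketch, using an entirely hypothetical NPV model (the revenue and cost distributions below are invented solely to illustrate the counting step):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

# Hypothetical project economics, $MM (assumed distributions)
revenue = rng.lognormal(mean=np.log(10.0), sigma=0.5, size=n)
cost    = rng.normal(loc=8.0, scale=2.0, size=n)
npv     = revenue - cost

# "What is the chance of losing money?" = fraction of trials with NPV < 0
p_loss = np.mean(npv < 0.0)
print(f"P(NPV < 0) ≈ {p_loss:.1%}")
```

The same indicator-mean pattern answers the budget and scheduling questions: replace `npv < 0.0` with `cost > budget` or `duration > deadline`.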

## Sensitivity analysis

Ask anyone what sensitivity analysis means and they are likely to tell you it has to do with changing a variable and observing what happens to the results. That is the gist of it, but the concept is much broader. We begin with traditional methods, compare their Monte Carlo simulation and decision tree counterparts, and then discuss some extensions and refinements.

The traditional tornado chart or diagram consists of bars of varying lengths indicating the range of values of some key output (cost, reserves, NPV) associated with the full range of values of one input, for example:

• some line item cost
• some geological attribute such as porosity
• capital investment

The calculations are done by holding all but one variable fixed at some base value, while the single input is varied.
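A sketch of that one-at-a-time calculation, applied to the volumetric model of Eq. 1. The base values and low/high ranges below are assumptions chosen for illustration.

```python
def oil_in_place(A, h, phi, Sw, Bo):
    """Eq. 1: volumetric oil in place, STB."""
    return 7758.0 * A * h * phi * (1.0 - Sw) / Bo

# Assumed base case and low/high limits for each input
base = dict(A=2000.0, h=50.0, phi=0.18, Sw=0.43, Bo=1.4)
ranges = {
    "A":   (500.0, 5000.0),
    "h":   (20.0, 90.0),
    "phi": (0.10, 0.26),
    "Sw":  (0.25, 0.60),
    "Bo":  (1.2, 1.6),
}

bars = {}
for name, (low, high) in ranges.items():
    # Hold everything at base except the one input being swept
    lo_N = oil_in_place(**dict(base, **{name: low}))
    hi_N = oil_in_place(**dict(base, **{name: high}))
    bars[name] = (min(lo_N, hi_N), max(lo_N, hi_N))

# Widest bar first: plotting in this order gives the tornado shape
for name, (lo_N, hi_N) in sorted(bars.items(),
                                 key=lambda kv: kv[1][0] - kv[1][1]):
    print(f"{name:>3}: {lo_N/1e6:7.1f} - {hi_N/1e6:7.1f} MMbbl")
```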

Although this interpretation is often useful and very widely used in presentations, it is flawed in several ways.

• Holding all variables but one fixed presumes the variables are fully independent. Many models have pairs of inputs that depend on each other or on some third variable; when one parameter increases, the other one tends to increase (positive correlation) or decrease (negative correlation).
• The base case at which all but one variable is held constant might be a mean or a mode or a median. There is no firm rule.
• There may not be a minimum or maximum value for a given input. Any input described by a normal or log-normal distribution has an infinite range. Even if we acknowledge some practical limits for purposes of the exercise, there is no guideline for what those limits should be (e.g., a P1 or P5 at the low end).
• Focusing on the extreme cases sheds no light on how likely such extremes are. There is no convenient way (and if there were, it would almost certainly be incorrect) to read a 90% confidence interval from the bars that make up the tornado chart.

All this is not to say that tornado charts are worthless. On the contrary, they are “quick and dirty” methods and can help us understand which inputs are most important. It is just that we do not want to rely upon them when better methods are available.

### Spider diagrams

Like tornado charts, a spider diagram is a traditional but somewhat limited tool. Again, one holds fixed all but one variable and examines how the output changes (usually measured as a percent change) as we vary that one input (usually by a few specific percentages). Typically, we might vary each input by 5, 10, and 20% and see how much the output changes. Often the percent change is not linear, causing the resulting graph to have broken line segments, accounting for the name: spider diagram.
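The spider-diagram calculation can be sketched the same way, perturbing each input by a fixed set of percentages around an assumed base case (the base values below are illustrative):

```python
def oil_in_place(A, h, phi, Sw, Bo):
    """Eq. 1: volumetric oil in place, STB."""
    return 7758.0 * A * h * phi * (1.0 - Sw) / Bo

base = dict(A=2000.0, h=50.0, phi=0.18, Sw=0.43, Bo=1.4)  # assumed
base_N = oil_in_place(**base)

# Vary each input by +/-5, 10, and 20% and record the % change in output
steps = [-0.20, -0.10, -0.05, 0.05, 0.10, 0.20]
results = {}
for name in base:
    pct = []
    for s in steps:
        case = dict(base, **{name: base[name] * (1.0 + s)})
        pct.append(100.0 * (oil_in_place(**case) / base_N - 1.0))
    results[name] = pct
    print(f"{name:>3}: " + ", ".join(f"{p:+6.1f}%" for p in pct))
```

Plotting each row against `steps` gives one "leg" of the spider. Note that A, h, and φ respond linearly (±20% in gives ±20% out), while Sw and Bo do not, because Sw enters through (1 − Sw) and Bo sits in the denominator; those bent legs give the diagram its name.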

As with classical tornado charts, the spider diagram makes several assumptions, most of which are unrealistic.

• The variables are completely independent (no correlation or conditionals between them).
• The same range (plus or minus 20%) is suitable for each of the inputs, whereas some inputs might have a natural range of variation of only a few percent, while others could vary by 50 or 100% from the base case.
• The base case is again arbitrary, possibly being the mean, median, or mode of each input.

Again, while these restrictions make the spider diagram less than perfect, it is often a good first pass at sensitivity and is widely used in management circles. See Figs. 4 and 5, respectively, for examples of tornado and spider diagrams.

### Regression and correlation methods

At the completion of a Monte Carlo simulation, the user has available two robust methods of sensitivity analysis. Consider the database consisting of one output, Y, and the corresponding inputs, X1, X2, ..., Xn. We can perform a multiple linear regression of Y on the Xi and obtain the βi values (standardized regression coefficients), numbers between –1 and +1 that indicate the fraction of a standard deviation by which the output changes when the ith input changes by one standard deviation. That is, suppose βi = 0.4, Y has a standard deviation of 50, and Xi has a standard deviation of 6. Then, changing Xi by 6 units would change Y by 20 units.
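A sketch of that calculation, reusing the volumetric model with assumed input distributions. Standardizing the inputs and the output first makes the ordinary least-squares coefficients the βi values directly:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Simulated trial database: inputs (assumed distributions) and output
X = np.column_stack([
    rng.lognormal(np.log(2000), 0.38, n),   # A
    rng.triangular(20, 50, 90, n),          # h
    rng.triangular(0.10, 0.18, 0.26, n),    # phi
    rng.triangular(0.25, 0.43, 0.60, n),    # Sw
    rng.triangular(1.2, 1.4, 1.6, n),       # Bo
])
Y = 7758.0 * X[:, 0] * X[:, 1] * X[:, 2] * (1 - X[:, 3]) / X[:, 4]

# Standardize, then least squares: the coefficients are the beta_i values
Z = (X - X.mean(axis=0)) / X.std(axis=0)
y = (Y - Y.mean()) / Y.std()
betas, *_ = np.linalg.lstsq(Z, y, rcond=None)

for name, b in zip(["A", "h", "phi", "Sw", "Bo"], betas):
    print(f"beta_{name:<3} = {b:+.2f}")
```

The signs carry meaning: A, h, and φ push the output up, while Sw and Bo push it down, so their βi come out negative.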

An alternative form of sensitivity from the Monte Carlo simulation is obtained by calculating the rank-order correlation coefficient between Y and Xi. These values also lie between –1 and +1 and indicate the strength of the relationship between the two variables (Xi and Y). Both regression and correlation are useful. While it may seem more natural to think in terms of the regression method, the xy scatter plot of the Y vs. Xi can be a powerful tool in presentations. It illustrates how a small sample from a key input (i.e., one with a high correlation coefficient) might restrict the output to a relatively narrow range, thus aiding in the interpretation of the sensitivity plot. Both of these methods can be presented as a “tornado” chart, with horizontal bars having orientation (right means positive, left means negative) and magnitude (between –1 and 1), thereby ranking the inputs according to strength or importance. Fig. 4 shows the chart for rank correlation; the corresponding chart for regression would be quite similar.
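The rank-order correlation can likewise be computed directly from the stored trial values. This sketch (same assumed model and distributions as above) uses a simple double-argsort to obtain ranks, which is adequate for continuous inputs with no ties:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

def rank_corr(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])

# Same illustrative volumetric model; distributions are assumptions
A   = rng.lognormal(np.log(2000), 0.38, n)
h   = rng.triangular(20, 50, 90, n)
phi = rng.triangular(0.10, 0.18, 0.26, n)
Sw  = rng.triangular(0.25, 0.43, 0.60, n)
Bo  = rng.triangular(1.2, 1.4, 1.6, n)
N = 7758.0 * A * h * phi * (1 - Sw) / Bo

corrs = {name: rank_corr(x, N)
         for name, x in [("A", A), ("h", h), ("phi", phi),
                         ("Sw", Sw), ("Bo", Bo)]}

# Ranking by magnitude is exactly the ordering shown on the sensitivity chart
for name, r in sorted(corrs.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>3}: {r:+.2f}")
```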