Content of PetroWiki is intended for personal use only and to supplement, not replace, engineering judgment. SPE disclaims any and all liability for your use of such content.

# Design of uncertainty models

Probabilistic models—like any models—benefit from good design. A Monte Carlo model is, in principle, just a worksheet in which some cells contain probability distributions rather than values. Thus, one can build a Monte Carlo model by converting a deterministic worksheet with the help of commercial add-in software. Practitioners, however, soon find that some of their deterministic models were constructed in a way that makes this transition difficult. Redundancy, hidden formulas, and contorted logic are common features of deterministic models that encumber the resulting Monte Carlo model.

Likewise, presentation of results from probabilistic analysis might seem no different from any other engineering presentation (problem statement, summary and conclusions, key results, method, and details). Monte Carlo and decision-tree models, however, demand special considerations during a presentation.

This page describes the features of probabilistic models, outlines elements of a good design, and suggests how to ensure that presentations are effective. For the most part, these comments pertain to Monte Carlo models.

## Model: equations + assumptions

For our purposes, a model is one or more equations together with assumptions about the way the variables (inputs or outputs) may be linked or restricted. Next, we give some guidelines for model builders.

**Specify all key equations** For example, $N = AhR$ for volumetric reserves or $q = q_i e^{-at}$ for an exponential decline production forecast. Some models have very simple equations, such as cost estimates where total cost is just an aggregation of line items. Other models have complex structure, such as cash-flow models with multiple production streams, alternative development plans, or intricate timing issues. While some aspects are routine (e.g., revenue = price × volume, cash = revenue – costs), features unique to the problem at hand should be stressed.
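One way to keep key equations visible, rather than buried in worksheet cells, is to give each one its own named function. The following is a minimal Python sketch, assuming $A$ is area (acres), $h$ is net pay (ft), $R$ is a recovery factor in bbl/acre-ft, and $a$ is a nominal decline rate per year. The triangular input ranges and all function names are hypothetical illustration values, not recommendations.

```python
import math
import random


def volumetric_reserves(area_acres: float, net_pay_ft: float,
                        recovery_bbl_per_acre_ft: float) -> float:
    """N = A * h * R, in barrels."""
    return area_acres * net_pay_ft * recovery_bbl_per_acre_ft


def exponential_decline(q_initial: float, decline_rate: float, t_years: float) -> float:
    """q = q_i * exp(-a * t): rate at time t for nominal decline rate a (1/yr)."""
    return q_initial * math.exp(-decline_rate * t_years)


def simulate_reserves(n_trials: int, seed: int = 1) -> list[float]:
    """Monte Carlo version of the volumetric equation with hypothetical inputs."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        # random.triangular takes (low, high, mode)
        area = rng.triangular(800.0, 2000.0, 1200.0)
        net_pay = rng.triangular(20.0, 60.0, 35.0)
        recovery = rng.triangular(100.0, 300.0, 180.0)
        results.append(volumetric_reserves(area, net_pay, recovery))
    return results
```

Because the equation lives in one function, the deterministic base case and every Monte Carlo trial use exactly the same formula.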

**Are there alternative models?** Sometimes there are two or more models that achieve much the same objective. Comparing the model at hand with others familiar to the audience can be useful.

**Other projects that use this model** Knowing that other projects have used a model adds credibility and opens the opportunity to learn from those applications.

**List all assumptions** For example:

- Two successful wells are necessary before the field is proved.
- If the field size exceeds 100 Bcf, then a second platform is needed.
- Gas price is fixed by contract.
- The success rate on the second well increases if the first well is commercial.
- The pipeline has a maximum capacity of 50,000 B/D.
- All reserves must be produced within 15 years.
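Assumptions like those listed above are easiest to audit when each one appears as an explicit, named rule instead of an adjustment hidden in a formula. The sketch below encodes them in Python; the thresholds come from the list, while the constant names, the contract gas price, and the two conditional success probabilities are hypothetical values chosen only for illustration.

```python
# Values taken from the assumption list; names are hypothetical.
SECOND_PLATFORM_THRESHOLD_BCF = 100.0
PIPELINE_CAPACITY_BPD = 50_000.0
MAX_PRODUCTION_YEARS = 15
CONTRACT_GAS_PRICE = 3.0  # hypothetical $/Mcf; held fixed, not sampled


def field_is_proved(successful_wells: int) -> bool:
    """Two successful wells are needed before the field is proved."""
    return successful_wells >= 2


def platforms_required(field_size_bcf: float) -> int:
    """A second platform is needed when field size exceeds 100 Bcf."""
    return 2 if field_size_bcf > SECOND_PLATFORM_THRESHOLD_BCF else 1


def second_well_success_probability(first_well_commercial: bool) -> float:
    """Success chance on well 2 rises if well 1 is commercial (hypothetical values)."""
    return 0.5 if first_well_commercial else 0.3


def constrained_rate(unconstrained_bpd: float) -> float:
    """Oil rate cannot exceed pipeline capacity."""
    return min(unconstrained_bpd, PIPELINE_CAPACITY_BPD)


def within_production_window(year: int) -> bool:
    """All reserves must be produced within 15 years."""
    return 1 <= year <= MAX_PRODUCTION_YEARS
```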

## Input distributions, types and dependency

**List all deterministic inputs** Although probability distributions occupy center stage in a Monte Carlo model, key deterministic values should be highlighted (e.g., interest rate = 10.5%, start time = 1 January 1996, duration = 10 years).

**List all input distributions: type, defining parameters, basis for choosing this distribution** Even in large models with hundreds of input distributions, it is essential to identify them all. Large models tend to have multiple parameters of the same kind, which can be typified by one particular variable. For instance, cash-flow models often have a new distribution each time period for prices and expenses. While there may be dozens or even hundreds of time steps, the prototype need be mentioned only once.

Of all the features of the model, the reason for selecting one distribution over another is often a point of discussion that will be raised in a presentation. Each distribution should be identified by type (e.g., normal, log-normal, beta) and by defining parameters (mean and standard deviation, or minimum, mode, and maximum). Moreover, the user should explain why the particular distribution was chosen (empirical data fit by software, experience, or fundamental principle). The justifications should usually be brief, especially when the user/presenter can state that the choice of distribution is not critical to the results. If other distributions were tested, a comparison of the results should be available if needed.
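A simple way to make sure every input distribution is documented by type, defining parameters, and basis is to keep them in a single registry that both drives sampling and prints the summary. This Python sketch assumes three distribution types; the class name, the entries, and their parameter values are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class InputDistribution:
    name: str
    kind: str     # "normal", "lognormal", or "triangular"
    params: tuple  # defining parameters, in the order noted per kind below
    basis: str    # why this distribution was chosen

    def sample(self, rng: random.Random) -> float:
        if self.kind == "normal":      # params = (mean, std_dev)
            return rng.gauss(*self.params)
        if self.kind == "lognormal":   # params = (mu, sigma) of ln(X)
            return rng.lognormvariate(*self.params)
        if self.kind == "triangular":  # params = (minimum, mode, maximum)
            low, mode, high = self.params
            return rng.triangular(low, high, mode)
        raise ValueError(f"unsupported distribution type: {self.kind}")


# Hypothetical registry entries, for illustration only.
INPUTS = [
    InputDistribution("porosity", "normal", (0.18, 0.02), "fit to core data"),
    InputDistribution("area_acres", "lognormal", (7.0, 0.4), "analog fields"),
    InputDistribution("oil_price", "triangular", (40.0, 60.0, 90.0), "experience"),
]


def input_summary(inputs: list[InputDistribution]) -> list[str]:
    """One line per input: name, type, parameters, and basis."""
    return [f"{d.name}: {d.kind}{d.params} [{d.basis}]" for d in inputs]
```

Printing `input_summary(INPUTS)` gives the presentation-ready list, and the same objects are used to draw samples, so the documentation cannot drift from the model.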

## Selection of outputs

In most models, everyone is aware of the natural output(s). In a cost model, we are interested in total cost, but we may also want to know certain subtotals. In reserves models, we want to know the distribution of reserves, but we may want to see the hydrocarbons in place or the breakdown into oil, gas, and liquids. In cash-flow models, we want net present value (NPV) and perhaps internal rate of return (IRR), but we might also want to see production forecasts or cash-flow forecasts, as well as derived quantities such as cost per barrel, profit-to-investment ratios, and so on. Bear in mind that an effective presentation focuses on key elements of the results. Too much detail obscures the bottom line and risks losing the audience's attention. The model designer must choose a suitable level of detail.

## Sampling process

Monte Carlo software gives the user a choice of two types of sampling: simple random (Monte Carlo) sampling and stratified sampling, also called Latin hypercube sampling. The vast majority of users prefer stratified sampling because the model converges to the desired level in far fewer iterations and thus runs faster, allowing the user to do more testing. An example of stratified sampling is to request 100 samples but insist that there be one representative of each percentile. That is, there would be one value between $P_0$ and $P_1$, another between $P_1$ and $P_2$, and so on.
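The percentile example can be sketched for a single input: draw one uniform value inside each of $n$ equal-probability strata, shuffle the order, and map the values through the inverse cumulative distribution of the input. This is a minimal Python illustration, not how any particular add-in implements it; with several inputs, each input would get its own independently shuffled strata. The function names and the normal input are assumptions.

```python
import random
from statistics import NormalDist


def simple_uniforms(n: int, rng: random.Random) -> list[float]:
    """Plain Monte Carlo: n independent uniform draws on [0, 1)."""
    return [rng.random() for _ in range(n)]


def stratified_uniforms(n: int, rng: random.Random) -> list[float]:
    """Latin hypercube for one input: one draw in each stratum [k/n, (k+1)/n)."""
    u = [(k + rng.random()) / n for k in range(n)]
    rng.shuffle(u)  # random order so strata are paired randomly across inputs
    return u


def to_normal(uniforms: list[float], mean: float, sd: float) -> list[float]:
    """Map uniforms to a normal input via the inverse CDF."""
    dist = NormalDist(mean, sd)
    # inv_cdf needs 0 < p < 1, so clamp away from the endpoints
    return [dist.inv_cdf(min(max(u, 1e-12), 1 - 1e-12)) for u in uniforms]
```

With 100 stratified draws, every percentile band from $P_0$–$P_1$ through $P_{99}$–$P_{100}$ holds exactly one value, whereas `simple_uniforms` can leave some bands empty and double up others.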

## Storage of iterations

Monte Carlo software gives the user a choice of how much of the input/output data to store and make accessible after the simulation. At one extreme, one can save only the designated outputs (the reserves, NPV, and total cost, for example). At the other extreme, one can store all sampled values from the input distributions. Having the inputs available at the end of a run is necessary for sensitivity analysis, which calculates the rank correlation coefficient between each output array and each input array, as well as stepwise linear regression coefficients (discussed later). Experienced modelers sometimes identify intermediate calculations and designate them as outputs just to make their values available for post-simulation analysis. Ref. 1 discusses “pseudocases,” which are constructed from these auxiliary variables. For small models, one can be generous in storing data. As models grow, some discretion may be necessary to avoid long execution times or massive volumes of data to file and document.
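The storage choice can be pictured as how much goes into each iteration's record. In this hypothetical Python sketch, `store_inputs=False` keeps only the designated output, while `store_inputs=True` also keeps the sampled inputs and an intermediate value (bulk volume) designated as an output so it is available for post-simulation analysis. The variable names and input ranges are assumptions.

```python
import random


def run_simulation(n_trials: int, store_inputs: bool = True, seed: int = 7) -> list[dict]:
    """Run a small reserves model and return one record per iteration."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_trials):
        area = rng.triangular(800.0, 2000.0, 1200.0)    # acres (hypothetical)
        net_pay = rng.triangular(20.0, 60.0, 35.0)      # ft (hypothetical)
        recovery = rng.triangular(100.0, 300.0, 180.0)  # bbl/acre-ft (hypothetical)
        bulk_volume = area * net_pay                    # intermediate calculation
        reserves = bulk_volume * recovery               # designated output
        record = {"reserves": reserves}
        if store_inputs:
            # Keeping inputs and intermediates enables sensitivity analysis later.
            record.update(area=area, net_pay=net_pay, recovery=recovery,
                          bulk_volume=bulk_volume)
        records.append(record)
    return records
```

The sampling sequence is the same either way, so the lean run and the full run produce identical outputs; only the amount of stored data differs.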

## Sensitivity analysis

Sensitivity analysis, in essence, is “what if” analysis. Classical sensitivity analyses, such as tornado diagrams and spider diagrams, are obtained by holding all but one variable fixed and measuring the change in a key output when the remaining input is varied by some specified amount.
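The one-at-a-time procedure behind a tornado diagram can be sketched in a few lines of Python: set each input to its low and then its high value while the others stay at base, record the output swing, and sort by swing so the widest bar is on top. The model, base case, and ranges below are hypothetical.

```python
def tornado(model, base: dict, ranges: dict) -> list[tuple]:
    """Return (input, output_at_low, output_at_high, swing), widest swing first."""
    bars = []
    for name, (low, high) in ranges.items():
        out_low = model(**{**base, name: low})    # all other inputs at base
        out_high = model(**{**base, name: high})
        bars.append((name, out_low, out_high, abs(out_high - out_low)))
    return sorted(bars, key=lambda bar: bar[3], reverse=True)


def reserves(area, net_pay, recovery):
    """Hypothetical volumetric model, N = A * h * R."""
    return area * net_pay * recovery


base = {"area": 1200.0, "net_pay": 35.0, "recovery": 180.0}
ranges = {"area": (800.0, 2000.0), "net_pay": (20.0, 60.0), "recovery": (100.0, 300.0)}
```

With these values the ranking is net pay, then recovery, then area. A spider diagram uses the same loop but steps each input through several values instead of only low and high.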

**Monte Carlo sensitivity** Monte Carlo models offer a robust form of sensitivity analysis, which usually comes with two choices of metrics: rank correlation and regression. In each case, the objective is to rank the various inputs according to their impact on a specified (target) output.

**Rank correlation sensitivity analysis** Let Y be an output and X an input for the model. The rank correlation coefficient, $r_r$, between Y and X is a number between –1 and +1. (See the definition and discussion in The tools of the trade.) The closer $r_r$ is to +1 or –1, the more influence X has on Y. Positive correlation indicates that as X increases, Y tends to increase. When $r_r$ is negative, Y tends to decrease as X increases. A sample of values appears in **Fig. 1**.
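The rank correlation coefficient is the ordinary (Pearson) correlation computed on ranks rather than raw values, which is why it captures any monotone relationship between X and Y. Below is a minimal standard-library Python sketch (Spearman's coefficient, with tied values given their average rank); applied to the stored input and output arrays of a run, it yields one $r_r$ per input for ranking.

```python
from statistics import mean


def ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def rank_correlation(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation r_r between input samples x and output samples y."""
    return pearson(ranks(x), ranks(y))
```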

**Regression sensitivity analysis** Let Y be an output and $X_1, \ldots, X_n$ be inputs. At the end of the simulation, a stepwise linear regression is done with Y as the dependent variable, generating a set of normalized regression coefficients for the $X_i$. These coefficients usually fall between –1 and 1; a value of –0.4 for $X_i$ would indicate that Y decreases by 0.4 standard deviations when $X_i$ increases by one standard deviation. Generally speaking, the two methods (correlation and regression) give the same ranking of the inputs.
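Normalized (standardized) coefficients come from rescaling Y and every $X_i$ to mean 0 and standard deviation 1 before fitting, so each coefficient is measured in standard deviations of Y per standard deviation of $X_i$. The Python sketch below fits all inputs at once by ordinary least squares; it deliberately does not reproduce the stepwise selection that commercial software performs, and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values: list[float]) -> list[float]:
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def standardized_regression(inputs: list[list[float]], output: list[float]) -> list[float]:
    """Least-squares coefficients of standardized Y on standardized X_1..X_n.

    No intercept is needed because every standardized variable has mean 0.
    """
    zs = [standardize(col) for col in inputs]
    zy = standardize(output)
    k = len(zs)
    xtx = [[sum(a * b for a, b in zip(zs[i], zs[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(zs[i], zy)) for i in range(k)]
    return solve(xtx, xty)
```

When inputs are strongly correlated with one another, standardized coefficients can drift outside –1 to 1, which is one reason the rank-correlation ranking is a useful cross-check.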

**Decision-tree sensitivity** Decision-tree sensitivity analysis relies on the classical sensitivity methods. We select one or two decision-tree inputs, namely probabilities or values, and let them vary over a prescribed range (containing the base value), solving the decision tree for each value. When one value is varied at a time, the resulting data can be displayed graphically as a plot of decision-tree value on the vertical axis and input value on the horizontal axis, with one segmented linear graph for each branch of the root decision node. See Decision tree analysis for more details and associated figures. When two values are varied simultaneously, the analogous graph requires three dimensions and has the form of a segmented planar surface, which is often hard to display and explain. Alternatively, one can display the two-dimensional grid of pairs of values for the two inputs being varied, coloring them according to which decision branch is optimal.
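A one-way decision-tree sensitivity can be sketched with a hypothetical two-branch tree: drill (a chance node with success probability $p$) or farm out (a fixed value). Solving the tree for each value of $p$ gives the data for the segmented linear graph: the drill branch is linear in $p$, and the optimal choice switches at the breakeven probability. All values are hypothetical, in $ million.

```python
def drill_value(p_success: float, success_npv: float = 40.0,
                dry_hole_npv: float = -10.0) -> float:
    """Expected value of the chance node on the 'drill' branch (hypothetical $ million)."""
    return p_success * success_npv + (1 - p_success) * dry_hole_npv


def solve_tree(p_success: float, farm_out_value: float = 3.0) -> tuple[str, dict]:
    """Roll back the tree: return the optimal branch and each branch's value."""
    branches = {"drill": drill_value(p_success), "farm out": farm_out_value}
    best = max(branches, key=branches.get)
    return best, branches


def one_way_sensitivity(p_values: list[float]) -> list[tuple]:
    """Solve the tree once per input value: (p, best branch, branch values)."""
    return [(p, *solve_tree(p)) for p in p_values]
```

With these numbers the drill branch is worth $50p - 10$, so the decision switches from farm out to drill at $p = 0.26$.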

One can do multiple one-way analyses and show a tornado or spider chart. Still, sensitivity analysis of decision trees has limits. More important, some decision trees have probabilities or values on different branches that are not independent. Consequently, users must be cautious when varying any value in the decision tree, ensuring that related values are also varied appropriately. For example, imagine a decision tree with two branches that estimates the cost of handling a kick under different conditions, say whether or not protective casing has been set. When the cost of the kick is changed for the case without protective casing, the change may be due in part to rig rates being higher than average, which would also raise the costs on the other branch. Again, Decision tree analysis provides more detail on decision-tree sensitivity analysis.
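The kick example can be made concrete by expressing both branch costs in terms of a shared driver, the rig rate, so that varying the driver moves both branches together. This hypothetical Python sketch contrasts that with naively overriding one branch's cost; the days, material costs, and rig rates are illustrative values in $ million.

```python
def kick_costs(rig_rate_per_day: float, days_with_casing: float = 3.0,
               days_without_casing: float = 8.0, materials_with: float = 0.2,
               materials_without: float = 0.6) -> dict:
    """Cost of handling a kick on each branch (hypothetical $ million).

    Both branches depend on the same rig rate, so they move together.
    """
    return {
        "casing set": days_with_casing * rig_rate_per_day + materials_with,
        "no casing": days_without_casing * rig_rate_per_day + materials_without,
    }


def naive_variation(no_casing_cost: float, base_rig_rate: float = 0.1) -> dict:
    """Override one branch only; the related branch stays at its base value."""
    costs = kick_costs(base_rig_rate)
    costs["no casing"] = no_casing_cost
    return costs


def linked_variation(rig_rate: float) -> dict:
    """Vary the shared driver so that both branches respond consistently."""
    return kick_costs(rig_rate)
```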

## Analysis and presentation of results

Presentation is everything: an overstatement, perhaps, but worth considering. People good at probabilistic analysis face their greatest challenge when presenting results to managers who are not well versed in statistics or analysis techniques but are responsible for making decisions based on limited information. Our job is to convey the essential information effectively, which requires finesse, discretion, and focus. Imagine a network newscast, in which time is severely limited and the audience may easily lose interest. Good model design and analysis deserve the best presentation possible. Recall how one student described an ineffective professor as one who “really knew the material but just didn’t communicate with us.”

An effective written report should be, at most, three pages long. An oral report should take less than 30 minutes. The essential ingredients are as follows.

- State the problem succinctly.
- Describe the model briefly, noting any unusual assumptions or model features.
- Show key results, using histograms and cumulative distribution functions (for single-cell outputs) and probabilistic time series (called trend charts or summary graphs, for production forecasts and cash flows).
- Display a sensitivity chart with at most 10 or 12 inputs for each important output; consider showing a crossplot of output vs. key input to help explain sensitivity.
- Use overlays of histograms or cumulative functions to compare alternative plans or solutions.
- Address correlation among inputs, showing the correlation matrix with a basis for choice of values.
- Compare probabilistic model results with previous deterministic results for base-case compatibility, and explain any inconsistencies.
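For the comparison with earlier deterministic results, one simple check is to report a few percentiles of the simulated output and where the old base case falls within that distribution. This Python sketch follows the convention used in this article, where $P_{10}$ is the 10th cumulative percentile; the function name is hypothetical.

```python
from statistics import quantiles


def summary_for_presentation(samples: list[float], deterministic_base: float) -> dict:
    """P10/P50/P90 (cumulative percentiles) and the percentile of the old base case."""
    deciles = quantiles(samples, n=10)  # 9 cut points: P10, P20, ..., P90
    frac_at_or_below = sum(s <= deterministic_base for s in samples) / len(samples)
    return {
        "P10": deciles[0],
        "P50": deciles[4],
        "P90": deciles[8],
        "base_case_percentile": 100 * frac_at_or_below,
    }
```

A base case that lands far from $P_{50}$ is a prompt to explain the difference, for example skewed input distributions or a base case built from optimistic most-likely values.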

A corporate statistician once told us that he is careful with the language he uses in presentations. Instead of a cumulative distribution or probability density function, he uses phrases like “probability vs. value chart.” Think of speaking in a foreign language: use simple terms when possible; save the esoteric language for your specialist colleagues, who might be impressed rather than turned off by it.

## References

- API RP 10B, Recommended Practice for Testing Well Cements, 22nd edition. 1997. Washington, DC: API.

## See also

Challenges with probabilistic models

Problems with deterministic models

Decision analysis: additional tools

Statistical concepts in risk analysis