
Decision tree analysis and Monte Carlo simulation are the most commonly used tools in decision and risk analysis. But other tools such as optimization, options analysis, and combinations of these various tools can also be useful. This article examines the importance of data analysis and the nature and application of these other tools.

## Data analysis

Regardless of the principal tool used in risk analysis—Monte Carlo simulation or decision trees—empirical data may play an important role. Estimating the probabilities and values for a decision tree is often done by examining historical data. Similarly, the input distributions selected for a Monte Carlo model are easier to justify when analogous data is available to support the choices of distribution type and value of defining parameters, such as mean and standard deviation.

There are two procedures we can follow when given a set of data, depending on how willing we are to make assumptions about it. First, we can make no assumptions about any underlying distribution and simply describe the data in terms of mean, median, mode, range, quartiles or deciles, and the like; we can draw a stem-and-leaf diagram, a histogram, and/or a cumulative distribution, looking for bimodality, outliers, and other anomalies. Second, we can assume the data is a sample from some particular population: we calculate standard deviation and skew and then find one or a few possible distribution types and defining parameters that would be likely candidates for that population. Stated tersely, we find the best-fitting probability distribution for the data.

The first method is straightforward. Using a spreadsheet, we can invoke functions such as AVERAGE, MEDIAN, MODE, MIN, MAX, and COUNT on the column or row of data, or we can use the menu sequence Tools/Data Analysis/Descriptive Statistics, followed by Tools/Data Analysis/Histogram.
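The same descriptive summary can be computed outside a spreadsheet. This minimal sketch uses Python's standard statistics module on a small hypothetical sample (the data values are purely illustrative):

```python
import statistics

# Hypothetical sample of well test rates (bbl/d); illustrative data only.
data = [410, 520, 480, 390, 610, 455, 500, 470, 530, 445]

summary = {
    "count": len(data),
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "min": min(data),
    "max": max(data),
    "stdev": statistics.stdev(data),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Comparing the mean (481.0) with the median (475.0) already gives a rough indication of right skew, the kind of anomaly the histogram and cumulative plots would confirm.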

The second method requires software that uses a "goodness-of-fit" metric to compare a fitted density function to the data's histogram. The most popular metric is the chi-square statistic, defined as χ² = Σᵢ dᵢ²/yᵢ, where dᵢ is the difference between the ith observed value and yᵢ, the function's prediction for that point. The distribution that minimizes this sum of normalized squared errors is deemed the best-fitting curve.
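As a sketch of how this metric works, the following code bins a hypothetical sample into histogram classes and computes χ² = Σᵢ dᵢ²/yᵢ against a normal density fitted from the sample mean and standard deviation. The data, seed, and class count are all illustrative assumptions:

```python
import math
import random

random.seed(42)

# Hypothetical sample assumed drawn from a normal population (illustrative only).
data = [random.gauss(100.0, 15.0) for _ in range(200)]

# Build a histogram with a chosen number of classes; the chi-square
# statistic below depends on this choice, as the text cautions.
n_classes = 8
lo, hi = min(data), max(data)
width = (hi - lo) / n_classes
observed = [0] * n_classes
for x in data:
    k = min(int((x - lo) / width), n_classes - 1)
    observed[k] += 1

# Fitted normal density, using the sample mean and standard deviation.
mu = sum(data) / len(data)
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / (len(data) - 1))

def normal_pdf(x):
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Expected count in each class: density at class midpoint * class width * n.
chi_square = 0.0
for k in range(n_classes):
    mid = lo + (k + 0.5) * width
    y = normal_pdf(mid) * width * len(data)  # expected count y_i
    d = observed[k] - y                      # difference d_i
    chi_square += d * d / y                  # accumulate d_i^2 / y_i

print(f"chi-square = {chi_square:.2f}")
```

Re-running with a different n_classes changes the statistic, which is exactly the class-count dependence discussed below.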

While this process seems simple, some caution is advised. For example, bearing in mind that the density function is supposed to pass as close as possible to the data (in the sense of minimizing the value of χ²), it is clear that the value of the chi-square best-fit statistic depends on the number of classes one chooses for the histogram. Nevertheless, the software generally yields a few good fits for your selection: distributions that would have very similar results in a model.

To avoid the dependence on the number of classes, you might choose one of two other popular fitting metrics, the Anderson-Darling and the Kolmogorov-Smirnov statistics. Neither depends on the number of histogram classes, because both compare the fitted cumulative distribution with the empirical cumulative distribution of the data rather than with a binned histogram.
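For illustration, the Kolmogorov-Smirnov statistic can be computed directly from the empirical CDF, with no histogram classes at all. This minimal sketch fits a normal distribution to a hypothetical sample and measures the largest gap between the two cumulative curves:

```python
import math
import random

random.seed(7)

# Hypothetical sample, tested against a normal CDF fitted to it (illustrative).
data = sorted(random.gauss(50.0, 10.0) for _ in range(300))
n = len(data)

mu = sum(data) / n
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / (n - 1))

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Kolmogorov-Smirnov statistic: maximum gap between the empirical CDF
# (a step function) and the fitted CDF. No binning is involved.
d_ks = 0.0
for i, x in enumerate(data):
    f = normal_cdf(x)
    d_ks = max(d_ks, abs((i + 1) / n - f), abs(f - i / n))

print(f"K-S statistic D = {d_ks:.4f}")
```

A small D indicates the fitted curve tracks the empirical CDF closely; comparing D across candidate distribution types is how the fitting software ranks them.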

This curve fitting—while resembling the common least squares, linear regression procedure of finding the best linear relationship Y = mX + b —differs in several respects.

• Linear regression requires only three or four points to establish a sensible trend between Y and X; but density-function fitting requires a dozen or so points or more to establish a histogram with a few classes and a few points per class.
• Linear regression is intuitive; anyone can draw a fairly good line through a scatter plot, but not many people are good at sketching log-normal or normal curves, and the best-fitting triangles are often surprising.
• The subroutines that minimize the goodness-of-fit function χ² are not as simple as the familiar formulas for regression, which are often given as an exercise in Calculus II once the student knows how to take simple partial derivatives of quadratic functions.

To repeat, one should use the curve-fitting software with care. A few other things to note:

• Often the best-fitting curve is not one of the familiar distributions, yet there is almost always a familiar type that is nearly as good a fit.
• The software may require that the user specify whether to fix the left bound of the distribution at some constant such as zero to obtain a good fit of a log-normal distribution, but this rules out normal curves and restricts triangles considerably.
• The software requires a minimum number of points to work properly. Check the user manual.
• A fit displayed against the cumulative histogram and cumulative distribution always looks better than the same fit displayed against the histogram and density function, so do not judge the quality of a fit by eye from cumulative plots alone.

### Using risk analysis to rank investments

Decision trees explicitly compare two or more alternatives and choose the one having the best expected value. In Monte Carlo simulation, the "answer" is simply one or more output distributions, not a single number. Suppose we are modeling reserves, for example. The output is a distribution having a mean, a standard deviation, a skewness, and a set of percentiles. When we include the dry-hole case, the distribution will not take the simple shape of a log-normal or normal distribution but will have a spike at zero and one or more modes, depending on the possibility of two or more layers or components (see Fig. 10.22). Similarly, if we are modeling NPV, we will often get a complicated distribution. Now suppose we had a competing prospect and estimated its reserves and its NPV. The question becomes, "Is there some way to compare these distributions to rank the two prospects?" There are numerous methods to rank and compare. We mention a few of them.

Let μA and μB be the means and σA and σB the standard deviations of distribution of reserves for two prospects A and B, and let pA and pB be their respective chances of success. Here are some possible ranks.

• According to the larger of μA and μB.
• According to the larger of pAμA and pBμB.
• According to the larger of μA/σA and μB/σB.

A 2D ranking can be realized by cross plotting (μA, σA) and (μB, σB). This works best with several prospects, where we look for dominance in the diagonal direction along which μ gets bigger and σ gets smaller. This is essentially the method of portfolio optimization. What all these metrics except the first have in common is that they scale back the mean by some factor of risk.
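A small sketch makes it concrete that these rankings can disagree. The prospect statistics below are hypothetical; note that the unrisked mean favors prospect A, while the risked mean and the μ/σ ratio both favor B:

```python
# Hypothetical prospect statistics (illustrative numbers, not from the text):
# mean reserves (MMbbl), standard deviation, and chance of success.
prospects = {
    "A": {"mu": 120.0, "sigma": 60.0, "p": 0.30},
    "B": {"mu": 90.0,  "sigma": 30.0, "p": 0.45},
}

def rank(metric):
    """Return prospect names sorted best-first by the given metric."""
    return sorted(prospects, key=lambda k: metric(prospects[k]), reverse=True)

print("by mean mu:         ", rank(lambda s: s["mu"]))          # unrisked
print("by risked mean p*mu:", rank(lambda s: s["p"] * s["mu"]))  # risked
print("by mu/sigma:        ", rank(lambda s: s["mu"] / s["sigma"]))
```

Here A has the larger mean (120 vs. 90), but B wins on risked mean (40.5 vs. 36) and on μ/σ (3 vs. 2), illustrating why the choice of metric matters.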

Now, let μA and μB be the means and σA and σB the standard deviations of the distributions of NPV for two investments A and B, and let IA and IB be their respective mean investments (one could be more elaborate and treat investment as a distribution). Next, we list some possible ranking criteria.

• According to the larger of μA and μB.
• According to the larger of μA/IA and μB/IB.
• According to the larger of μA/σA and μB/σB.
• By cross plotting (μA, σA) and (μB, σB) and looking for dominance in the diagonal direction where μ gets bigger and σ gets smaller.
• A similar cross plot but using the semistandard deviation obtained by averaging those squared deviations from the mean for which the value is less than the mean. This is the traditional portfolio optimization metric leading to the efficient frontier.
• According to the larger of μA/(μA – P5A) and μB/(μB – P5B). [This metric is somewhat inappropriately named risk-adjusted return on capital (RAROC), and P5 is called value at risk (VAR).]
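As an illustration of the last two metrics, the sketch below computes a semistandard deviation and a RAROC-style ratio from a small hypothetical set of simulated NPV outcomes. Real software would use many more trials and interpolate percentiles; the low-end percentile here is a crude stand-in:

```python
import statistics

# Hypothetical simulated NPV outcomes ($MM) for one investment (illustrative).
npv = [-40.0, -10.0, 5.0, 20.0, 35.0, 60.0, 80.0, 120.0]

mu = statistics.mean(npv)

# Semistandard deviation: average the squared deviations from the mean
# only for outcomes below the mean (downside risk).
downside = [(x - mu) ** 2 for x in npv if x < mu]
semi_sd = (sum(downside) / len(npv)) ** 0.5

# Crude low-end percentile: smallest of 8 values stands in for P5 here;
# real software would interpolate from many trials.
p5 = min(npv)
raroc = mu / (mu - p5)  # the RAROC-style ratio described in the text

print(f"mean = {mu:.2f}, semi-sd = {semi_sd:.2f}, RAROC-style ratio = {raroc:.3f}")
```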

Whatever measure you use to reduce a complex set of information (in this context, one or more probability distributions) to a single value, or to a pair of values plotted in a scatter plot, you should know that it will be imperfect. One reason there are so many different metrics is that people constantly find fault with them. The usual conflict arises when two investments, A and B, are ranked with A higher by the chosen metric, only to find that everyone agrees B is more attractive. One specific example with which the authors were involved used RAROC. The investment involved a government that could default at any point in time, causing a loss of investment and termination of revenue. A probability of default was assigned to each time period. After the base model was built, the probability of default was reduced (creating a more attractive investment), and yet the RAROC decreased.

## Optimization

Classical mathematical programming, which includes linear programming, features a standard optimization problem, which we shall describe in terms of NPV.

Suppose there is a fixed exploration budget, which you must decide how to allocate among four types of drilling prospects. For each, you know the chance of success, the range of drilling and completion cost, and the corresponding ranges of discovery size and ultimate value. You thereby assign each a risk, ranging from low to high. Your objective is to maximize net present value (NPV), but you want to avoid “too much” risk.

The deterministic version of this problem seeks to maximize NPV constrained by the limit on capital and uses average values for everything. The optimal solution is described by a budget allocation and the resulting average NPV. The user would have to decide what "too risky" means; for example, drilling all high-risk wells might be too risky.

The probabilistic version assumes distributions for all well costs, as well as the NPV for the successes, and furthermore assigns a P(S) for each prospect. One additional type of constraint can be included: we can describe “riskiness” of NPV by the coefficient of variation (CV) of the NPV distribution or some high-end percentile, say P90. Here is one way to state the optimization problem.

### Optimizing an exploration program

DDD Enterprises has investment prospects in four world locations, called ventures, and must decide how to allocate its exploration budget among them. Among its objectives are to maximize NPV and to avoid large cost overruns. Technical experts have estimated distributions for drilling and completion costs as well as NPV for discoveries. These distributions, along with the chances of success for each group, are listed in Table 1.

In three of the countries, prior commitments require that a minimum number of wells be drilled. Each country has an upper limit on the number of wells established by either available prospects or drilling rigs. The board of directors has urged that estimated average exposure be limited to \$170 million. Moreover, they require a 90% confidence level for the actual exposure to be less than \$200 million. Given these constraints, the Board wishes to maximize average NPV.

Exposure is defined to be total drilling cost (number of wells times average dry-hole cost) plus completion cost (number of successful wells times average completion cost). Average exposure is found by using the expected number of successes [P(S) times the number of wells drilled]. All prospects are assumed to be independent, and binomial distributions are used for the number of successes.
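A minimal Monte Carlo sketch of this exposure definition follows. The venture parameters are hypothetical stand-ins, not the values of Table 1; each trial draws binomial successes independently for each venture:

```python
import random

random.seed(1)

# Hypothetical venture parameters (illustrative, not Table 1 values):
# (number of wells, chance of success, avg dry-hole cost $MM, avg completion cost $MM)
ventures = [
    (5, 0.30, 8.0, 12.0),
    (3, 0.50, 6.0, 10.0),
    (4, 0.20, 10.0, 15.0),
]

def simulate_exposure():
    """One Monte Carlo trial: binomial successes per venture, independent."""
    total = 0.0
    for wells, p, dry_cost, compl_cost in ventures:
        successes = sum(1 for _ in range(wells) if random.random() < p)
        total += wells * dry_cost + successes * compl_cost
    return total

trials = sorted(simulate_exposure() for _ in range(5000))
mean_exposure = sum(trials) / len(trials)
p90 = trials[int(0.90 * len(trials))]  # crude 90th percentile of exposure

print(f"mean exposure = {mean_exposure:.1f} $MM, P90 = {p90:.1f} $MM")
```

The mean converges toward the deterministic expected exposure (here 143 $MM), while the P90 captures the upside of the cost distribution that the constraints in the problem statement are designed to control.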

Running the optimization consists of batch processing 50, 100, or more Monte Carlo simulations to find the one that maximizes mean NPV while honoring the constraints on exposure and the individual number of wells per country. Any simulation resulting in a distribution of exposure whose P90 exceeds the \$200 million limit is summarily rejected.
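The batch-processing idea can be sketched as an exhaustive search over candidate well allocations, running a Monte Carlo simulation for each and discarding any that violates the exposure constraints. Everything here (two plays instead of four, parameter values, trial counts, NPV per success assumed net of cost) is a simplifying assumption:

```python
import itertools
import random

random.seed(2)

# Hypothetical per-play parameters (illustrative): chance of success,
# NPV per success ($MM, assumed net of cost), dry-hole and completion costs.
P_S, NPV_MM, DRY, COMPL = [0.3, 0.5], [60.0, 30.0], [8.0, 6.0], [12.0, 10.0]
BUDGET_MEAN, P90_LIMIT = 170.0, 200.0  # mean-exposure and P90-exposure limits

def simulate(allocation, trials=2000):
    """Return (mean NPV, mean exposure, P90 exposure) for a candidate allocation."""
    npvs, exposures = [], []
    for _ in range(trials):
        npv = exposure = 0.0
        for i, wells in enumerate(allocation):
            hits = sum(1 for _ in range(wells) if random.random() < P_S[i])
            npv += hits * NPV_MM[i]
            exposure += wells * DRY[i] + hits * COMPL[i]
        npvs.append(npv)
        exposures.append(exposure)
    exposures.sort()
    return sum(npvs) / trials, sum(exposures) / trials, exposures[int(0.9 * trials)]

best = None
for alloc in itertools.product(range(8), repeat=2):  # candidate well counts 0..7
    mean_npv, mean_exp, p90_exp = simulate(alloc)
    if mean_exp <= BUDGET_MEAN and p90_exp <= P90_LIMIT:  # honor constraints
        if best is None or mean_npv > best[1]:
            best = (alloc, mean_npv)

print(f"best allocation = {best[0]}, mean NPV = {best[1]:.1f} $MM")
```

Real risk-optimization software searches the allocation space far more cleverly than this brute-force loop, but the accept/reject logic against the exposure constraints is the same.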

## Real options for risk analysis

One of the recent methods of risk analysis is real options. Borrowing the idea from the investment community, proponents argue that many of our assets possess characteristics similar to a financial option. First, we review simple puts and calls and then outline their counterparts in both the upstream and downstream components of our business.

• A financial option always references a specific underlying asset, which we shall envision as a share of stock.
• The investor pays for the option, an amount called the option price or premium.
• A call option (or simply a call) is the right to buy one share of a stock at a given price (the strike price) on or before a given date (the exercise date).

A put option (or simply a put) is the right to sell one share of a stock at the strike price on or before the exercise date. The value of a call on the exercise date is either zero or the difference between the market price and the strike price, whichever is greater; the value of a put is either zero or the difference between the strike price and the market price. That is, we do not exercise the option unless it is to our advantage.

A so-called European option requires that the option be exercised, if at all, on the exercise date; a so-called American option allows exercise on or before the exercise date. European options are simpler to model and think about. For instance, the decision to exercise a European option is straightforward: do it if you are "in the money" (i.e., if the value is positive on the exercise date).
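The exercise rule for European options reduces to the standard payoff formulas, sketched below (the prices are hypothetical):

```python
# Payoff at the exercise date of simple European options: exercise only
# when "in the money," so the payoff is never negative.

def call_payoff(market_price: float, strike: float) -> float:
    """Right to BUY at the strike: worth max(market - strike, 0)."""
    return max(market_price - strike, 0.0)

def put_payoff(market_price: float, strike: float) -> float:
    """Right to SELL at the strike: worth max(strike - market, 0)."""
    return max(strike - market_price, 0.0)

# Hypothetical prices for illustration.
print(call_payoff(55.0, 50.0))  # in the money: 5.0
print(call_payoff(45.0, 50.0))  # out of the money: 0.0
print(put_payoff(45.0, 50.0))   # in the money: 5.0
```

Whether the option was a good purchase overall depends on comparing this payoff with the premium paid, which is a sunk cost at the exercise date.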

A real option is similar to a financial option but is far more general. Corporations increasingly recognize the implicit value of certain aspects of their business. Specific types of real options that might be available in any development are listed next.

• Changing the scope of the project.
• Changing the time horizon: moving the start date up or back; extending or shrinking the duration, even abandoning the project.
• Changing the mode of operation.

While there are great similarities between financial and real options, their differences are noteworthy. For instance, the underlying asset of a financial option is a share of stock or some other asset available in a market. In theory, the option holder has no influence on the price of that asset (although in practice, things get more complicated; the option holder can buy or sell large quantities of the asset). A real option, however, is usually some kind of project or investment, and the holder of the option may have considerable influence over its value.

### Software for real options

There is special software for real options, but at the time of this writing there is nothing analogous to the inexpensive Monte Carlo simulation or decision tree packages that can be purchased for less than U.S. \$1,000 and run as an Excel add-in or as a standalone program.

Nevertheless, an experienced Monte Carlo simulation expert can model real options in Excel. Essentially, one must be careful to acknowledge the different possible times at which the option can be exercised, quantify the value, provide branches for the different decisions (whether to exercise or not), and alter the subsequent cash flow properly.
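A minimal sketch of this approach values a hypothetical option to abandon a project after year 1 in exchange for a salvage value. The cash flows, correlation, salvage value, discount rate, and decision rule are all illustrative assumptions:

```python
import random

random.seed(3)

# Option to abandon after year 1 for a salvage value ($MM); all figures
# below are hypothetical illustrations.
SALVAGE = 20.0
DISCOUNT = 0.10

def mean_npv(with_option: bool, trials: int = 10_000) -> float:
    total = 0.0
    for _ in range(trials):
        year1 = random.gauss(10.0, 15.0)        # uncertain year-1 cash flow
        year2 = year1 + random.gauss(0.0, 5.0)  # year 2 correlated with year 1
        npv = year1 / (1 + DISCOUNT)
        # Decision branch: exercise the abandonment option after a bad year 1,
        # taking the salvage value and forgoing the year-2 cash flow.
        if with_option and year1 < 0.0:
            npv += SALVAGE / (1 + DISCOUNT)
        else:
            npv += year2 / (1 + DISCOUNT) ** 2
        total += npv
    return total / trials

base = mean_npv(with_option=False)
flex = mean_npv(with_option=True)
print(f"NPV without option {base:.1f}, with option {flex:.1f}, "
      f"option value ~ {flex - base:.1f}")
```

The difference between the two mean NPVs is an estimate of the option's value: the flexibility to cut losses only pays off in the iterations where year 1 turns out badly.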

## Combination of tools

Risk optimization combines Monte Carlo simulation with classical optimization (e.g., linear programming, quadratic programming). Another combination that has been used since the late 1990s involves Monte Carlo simulation and decision trees. In essence, any value on the decision tree may be replaced with a continuous probability distribution. Then, on each iteration, samples are chosen from these distributions, and the new decision tree is created and solved, yielding an expected value for the root node. After a few hundred iterations, this root value distribution can then be reviewed. A refinement to this method captures which choices are selected on each iteration. At the end of the simulation, the report can indicate the percentage of time that each decision branch was selected. A branch selected a large percentage of the time would be regarded as an optimal path. This idea is analogous to project-scheduling software run with Monte Carlo enhancements, in which we capture the percentage of time that each activity appears on the critical path. Needless to say, combining tools in this way makes it even more imperative that the user be cautious in designing, testing, and implementing any model to avoid creating unrealistic realizations.
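The combination can be sketched with a toy two-branch tree ("drill" vs. "farm out") whose inputs are sampled from continuous distributions on each iteration; the tree is re-solved each time, and the report tallies how often each branch wins. All distributions and values are hypothetical:

```python
import random

random.seed(4)

# Monte Carlo over a two-branch decision tree: "drill" (uncertain outcome)
# vs. "farm out" (a carried interest). All inputs are illustrative.
TRIALS = 5000
drill_chosen = 0
root_values = []

for _ in range(TRIALS):
    # Sample the tree's inputs from continuous distributions.
    p_success = random.uniform(0.15, 0.35)
    success_npv = random.lognormvariate(4.0, 0.5)  # $MM if discovery
    dry_cost = random.uniform(5.0, 12.0)           # $MM if dry
    farmout_npv = random.uniform(2.0, 8.0)         # $MM carried interest

    # Solve the tree for this iteration: pick the branch with higher EV.
    drill_ev = p_success * success_npv - (1 - p_success) * dry_cost
    root_values.append(max(drill_ev, farmout_npv))
    if drill_ev > farmout_npv:
        drill_chosen += 1

mean_root = sum(root_values) / TRIALS
print(f"mean root value = {mean_root:.1f} $MM; "
      f"drill chosen {100 * drill_chosen / TRIALS:.0f}% of iterations")
```

The percentage of iterations on which "drill" wins is exactly the branch-selection report described above; a branch chosen in a large fraction of iterations would be regarded as the robust path.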

## Risk mitigation and risk management

Risk analysis involves the modeling and quantification of uncertainty. Risk mitigation happens after the analysis and focuses on those unacceptable ranges of possibility (of cost overruns, shortfalls of reserves or NPV, and so on). Risk management is sometimes used as an inclusive term that encompasses risk analysis and risk mitigation and other times is used interchangeably with risk mitigation. In either case, risk management concentrates on what you do after the risk analysis.

Once the drivers of uncertainty have been identified, the focus shifts to ways of reducing the uncertainty. If a reserves model proves to be most sensitive to the bulk volume of the prospect, the company may be more willing to acquire 3D seismic data. If the cash flow of a proposed gas-fired electric power plant proves highly sensitive to natural gas price, one strategy would be to hedge gas prices. When drilling an infill well where there is a great deal of uncertainty about initial production, it may make sense to fund a program of 10 or 50 wells rather than a single well, so that, on average, the wells produce according to expectations. In essence, the average of a sample tends toward the population average.
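The multiwell argument is just the shrinking spread of a sample mean, which falls roughly as 1/√n. The sketch below uses a hypothetical log-normal initial-production distribution to show how the spread of a program's average narrows as the number of wells grows:

```python
import random

random.seed(5)

# Hypothetical log-normal initial-production (IP) distribution per well;
# the parameters are illustrative only.

def spread_of_average(n_wells: int, trials: int = 2000) -> float:
    """Standard deviation of the average IP across a program of n_wells wells."""
    averages = []
    for _ in range(trials):
        ips = [random.lognormvariate(5.0, 0.8) for _ in range(n_wells)]
        averages.append(sum(ips) / n_wells)
    mu = sum(averages) / trials
    return (sum((a - mu) ** 2 for a in averages) / (trials - 1)) ** 0.5

for n in (1, 10, 50):
    print(f"{n:3d}-well program: std dev of average IP = {spread_of_average(n):.0f}")
```

The single-well program carries the full per-well spread, while the 50-well program's average clusters tightly around the population mean, which is the sense in which a larger program "achieves the law of averages."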

In general, risk mitigation is protection from unfavorable situations, using a variety of instruments and tools, including:

• Hedges
• Turnkey, price- or cost-lock contracts
• Guarantees
• Insurance
• Partnering and diversification
• Increased level of activity to help achieve the law of averages
• Alternate technology or redundancy

One key to risk management when doing Monte Carlo simulation is the sensitivity chart, which tells us the inputs that really matter; those are the ones that deserve our attention. While it may be important to some specialist, any input that fails to make the top 10 or so on the sensitivity chart does not deserve additional resources, assuming we are looking for reduced uncertainty in the outputs. One of the real benefits of risk analysis is the prioritizing of variables to direct the company toward the things that could make a difference. Murtha shows a detailed comparison between Monte Carlo simulation and decision trees by solving a problem using both methods.