
# Decision tree analysis

Risk analysis is a term used in many industries, often loosely, but we shall be precise. By risk analysis, we mean applying analytical tools to identify, describe, quantify, and explain uncertainty and its consequences for petroleum industry projects. Typically, there is money involved. Always, we are trying to estimate something of value or cost. Sometimes, but not always, we are trying to choose between competing courses of action.

The tools we use depend on the nature of the problem we are trying to solve. Often when we are choosing between competing alternatives, we turn toward decision trees. When we simply wish to quantify the risk or the uncertainty, the tool of choice is Monte Carlo simulation.

A decision tree is a visual model consisting of nodes and branches, such as Fig. 1, explained in detail later in this article. For now, observe that it grows from left to right, beginning with a root decision node (square, also called a choice node) the branches of which represent two or more competing options available to the decision makers. At the end of these initial branches, there is either an end node (triangle, also called a value node) or an uncertainty node (circle, also called a chance node). The end node represents a fixed value. The circle’s branches represent the possible outcomes along with their respective probabilities (which sum to 1.0). Beyond these initial uncertainty nodes’ branches, there may be more squares and more circles, which generally alternate until each path terminates in an end node.

## Purpose of decision trees

The idea is to describe several possible paths representing deliberate actions or choices, followed by events with different chances of occurrence. The actions are within the control of the decision makers, but the events are not. By assigning probabilities and values along the way, we can evaluate each path and select an optimal one. The evaluation is simple, consisting of alternately calculating a weighted average, or expected value, at each circle and choosing the best action at each square. Ultimately, we obtain a value for the root node. The solution to the decision tree consists of this pairing of root value and optimal path.

The numbers at end nodes generally represent either net present value (NPV) or marginal cost—the goal being to either maximize NPV or minimize cost. Thus, the optimal action at each square might be a maximum (for NPV) or a minimum (for cost) of the various branches emanating from that square.

Fig. 1 shows a simple decision tree with one choice node and one chance node. The decision tree represents a choice between a safe and a risky investment. Selecting the risky alternative results in a 50% chance of winning \$40 and a 50% chance of losing \$10. Alternatively, one can be guaranteed \$8. We solve the decision tree by first calculating the expected value of the chance node, 0.5 × 40 + 0.5 × (–10) = 15, and then selecting the better of the two alternatives: \$15 vs. \$8, namely \$15. The “correct” path is the risky investment, and its value is \$15.
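The arithmetic for Fig. 1 is easy to mirror in code. This minimal sketch, using the outcomes and probabilities stated above, computes the expected value of the chance node and then takes the better of the two alternatives:

```python
# Expected value of the risky branch from Fig. 1, then the root choice.
# Outcomes and probabilities are those stated in the text.
risky_outcomes = [(0.5, 40.0), (0.5, -10.0)]

expected_value = sum(p * v for p, v in risky_outcomes)  # 0.5*40 + 0.5*(-10) = 15
safe_value = 8.0

root_value = max(expected_value, safe_value)
best_path = "risky" if expected_value >= safe_value else "safe"

print(root_value, best_path)  # 15.0 risky
```

The same two steps, probability-weighted averaging at circles and optimization at squares, generalize to any decision tree.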

Some would question this logic and say that they prefer the sure thing of \$8 to the chance of losing \$10. A person who would prefer the guaranteed \$8 might also prefer \$7 or \$6 to the risky investment. Trial and error would reveal some value, say \$6, for which that person would be indifferent between the two alternatives. That is, they would be just as happy to have \$6 as they would to have the opportunity to flip a fair coin and get paid \$40 if heads comes up and lose \$10 if tails comes up. In this case, we call \$6 the certainty equivalent of the chance. The difference between the actual expected value and the certainty equivalent, in this case \$15 – \$6 = \$9, is called the risk premium, suggesting the price you would pay to mitigate the risk. Pursuing this line of reasoning leads us to the topic of utility functions.[1]

## Utility functions

Suppose you are faced with a risky choice, say whether to drill a prospect or divest yourself of it. If successful, you would then develop a field. If unsuccessful, you would lose the dry-hole cost. For simplicity, we imagine the worst and best possible NPVs: a loss of \$100 million and a gain of \$500 million. We proceed to construct a utility function for this investment. For brevity, we denote NPV as V and utility as U. We wish to construct a function, U = f(V), that maps the range [–100, 500], usually represented on the horizontal axis, to the range [0, 1] on the vertical axis. Typically, the shape is concave down, like U = log(V), U = √V, or U = 1 – exp(–V/R), where R is a large constant. There is a formal set of rules (axioms) of utility theory from which one can prove certain propositions. A company or an individual willing to obey these axioms can develop and use a utility function for decision making. Rather than go into the level of detail necessary to discuss the axioms, let us simply construct one utility curve to get the flavor of the process.

First, assign utility U = 1 for V = 500 and U = 0 for V = –100. Next, ask for what value V you would be indifferent between receiving V for certain and taking a 50-50 chance of –100 and 500. Suppose this value happens to be 50. This establishes that U = 0.5 corresponds to V = 50. The reason follows from the axioms of utility theory.[1] Essentially, these axioms allow us to build a decision tree with values and then replace the values with their utility counterparts. So, a decision tree offering a choice between a sure thing of 50 and a risky alternative with a 50% chance of –100 and a 50% chance of 500 would represent an indifferent choice. The corresponding utilities on the risky branch would have an expected utility of 0.5 × 0 + 0.5 × 1 = 0.5.

We now have three points on the utility curve. We obtain a fourth point by asking for a certainty equivalent of the 50-50 chance of –100 and +50. If the value chosen is –40, that says that U(–40) = 1/4. Next, we ask for a certainty equivalent of the 50-50 chance of 50 and 500. If this is 150, then U(150) = 3/4. We could continue this process indefinitely, selecting a pair of values whose utilities are known and generating a value whose utility is halfway between. The resulting table of pairs can be plotted to obtain the utility curve.
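The elicitation above produces a table of (value, utility) pairs. A minimal sketch, assuming the certainty equivalents quoted in the text (50, then –40 and 150), stores those pairs and interpolates linearly between them:

```python
# (value, utility) pairs elicited as in the text: endpoints first, then
# midpoints found as certainty equivalents of 50-50 gambles.
points = {
    -100.0: 0.0,   # worst case
     500.0: 1.0,   # best case
      50.0: 0.5,   # CE of a 50-50 gamble between -100 and 500
     -40.0: 0.25,  # CE of a 50-50 gamble between -100 and 50
     150.0: 0.75,  # CE of a 50-50 gamble between 50 and 500
}

def utility(v):
    """Piecewise-linear interpolation of the elicited utility curve."""
    xs = sorted(points)
    for lo, hi in zip(xs, xs[1:]):
        if lo <= v <= hi:
            t = (v - lo) / (hi - lo)
            return points[lo] + t * (points[hi] - points[lo])
    raise ValueError("value outside elicited range")

print(utility(50.0))  # 0.5
```

Continuing the elicitation simply adds more entries to the table, refining the interpolation.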

In theory, once the utility curve is established, all decisions are based on utility rather than value. So any decision tree we build with values is converted to the corresponding utility-valued tree and solved for maximal utility. The solution yields a path to follow and an expected utility, which can be converted back to a value, namely its certainty equivalent. Finally, the difference between the certainty equivalent and the expected value of the original (value-laden) decision tree is called the risk premium. Thus, everything that follows about decision trees could be coupled with utility theory, and the decision trees we build could be converted to ones with utilities rather than values. Software can do this effortlessly once a utility function is specified.
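To illustrate the round trip from values to utilities and back, here is a sketch using the exponential form U = 1 – exp(–V/R) mentioned earlier, applied to the safe-vs-risky tree of Fig. 1. The risk tolerance R = 20 is an arbitrary choice for illustration, not a recommended value:

```python
import math

R = 20.0  # hypothetical risk tolerance; smaller R means more risk-averse

def u(v):
    """Exponential utility, U = 1 - exp(-V/R)."""
    return 1.0 - math.exp(-v / R)

def u_inv(util):
    """Certainty equivalent: the value whose utility equals 'util'."""
    return -R * math.log(1.0 - util)

# Risky branch of Fig. 1: 50% chance of +40, 50% chance of -10.
expected_utility = 0.5 * u(40.0) + 0.5 * u(-10.0)
certainty_equiv = u_inv(expected_utility)
risk_premium = 15.0 - certainty_equiv  # 15 is the expected value from Fig. 1

# A decision maker with this utility compares the CE against the sure $8.
decision = "risky" if certainty_equiv > 8.0 else "safe"
```

With this fairly risk-averse R, the certainty equivalent of the gamble falls below the guaranteed \$8, so the utility-based decision reverses the expected-value decision, exactly the behavior described for the risk-averse investor above.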

## Decision tree basics

The expected value is an essential idea not only in decision trees, but throughout risk and decision analysis. Here are some of its interpretations and properties.

### Expected value

• Is the long-run average value of the chance
• Is the probability-weighted average of the end-node values
• Is a surrogate for the entire chance node
• Is a function of both the probabilities and the values
• Has the same units as the end-node values
• Is usually not equal to one of the end-node values, but always between the minimum and maximum
• Provides no information about risk

### Chance nodes

Any number of branches can emanate from a chance node. Typical decision-tree fragments have two, three, or four branches. As with choice nodes, we often limit the number of branches to three or four through consolidation. Sometimes, there are two or more decision trees that represent the same decision. For instance, consider the choice of playing a game in which you must flip a fair coin exactly twice, winning \$10 for each head and losing \$9 for each tail. We can represent the game path (as opposed to the choice of “pass” or “do not play”) with two consecutive chance nodes or with one chance node having either three or four outcomes. See Fig. 2. All of these decision trees are valid; each tells a different story of the game.
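The equivalence of the alternative representations in Fig. 2 is easy to verify numerically. This sketch computes the expected value of the coin game both as two consecutive chance nodes and as a single collapsed chance node:

```python
from itertools import product

# Per-flip payoff: heads +10, tails -9, fair coin flipped exactly twice.
payoff = {"H": 10.0, "T": -9.0}

# Representation 1: two consecutive chance nodes (enumerate both flips,
# each of the four sequences having probability 0.25).
ev_two_stage = sum(0.25 * (payoff[a] + payoff[b])
                   for a, b in product("HT", repeat=2))

# Representation 2: one chance node with three outcomes
# (HT and TH merged into a single +1 branch with probability 0.5).
ev_collapsed = 0.25 * 20.0 + 0.50 * 1.0 + 0.25 * (-18.0)

print(ev_two_stage, ev_collapsed)  # both 1.0
```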

### Choice nodes

Like chance nodes, choice nodes may have any number of branches, but often, they have two or three. Some simple examples are given next.

#### Two choices—no other alternatives

• Proceed or do not
• Drill vertical or slant hole
• Run 3D seismic or do not
• Replace bit or do not
• Set pipe or do not

#### Three or more choices

• Proceed/stop/delay

Solving a decision tree includes selecting the one branch from each choice node whose expected value is optimal: the largest value when the decision tree values are NPV and the smallest when they are cost. The people involved in constructing a decision tree (sometimes referred to as framing the problem) have the responsibility of including all possible choices for each choice node. In practice, there is a tendency to second-guess the solution process and disregard certain choices because they seem dominated by others. Avoid this. In general, the early stages of decision tree building should be more like a brainstorming session, in which participants are open to all suggestions. Clearly, there must be a balance between the extremes of summarily rejecting a choice and going into too much detail. Experienced leaders can be useful at the problem-framing stage.
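The solution procedure described here (expected values at chance nodes, optimization at choice nodes) can be written as a small recursive function. The node encoding below is an arbitrary convention for illustration; pass best=min for cost-valued trees:

```python
# A minimal recursive solver. Each node is a tuple:
#   ("end", value)
#   ("chance", [(prob, node), ...])   -- probabilities sum to 1
#   ("choice", [(label, node), ...])
# 'best' is max for NPV trees and min for cost trees. The returned path
# records the branch taken at each choice node.

def solve(node, best=max):
    kind = node[0]
    if kind == "end":
        return node[1], []
    if kind == "chance":
        ev = sum(p * solve(child, best)[0] for p, child in node[1])
        return ev, []
    # choice node: optimize over branches, remembering the label taken
    results = [(solve(child, best)[0], label, child)
               for label, child in node[1]]
    value, label, child = best(results, key=lambda r: r[0])
    _, sub_path = solve(child, best)
    return value, [label] + sub_path

# The safe-vs-risky tree of Fig. 1:
tree = ("choice", [
    ("safe",  ("end", 8.0)),
    ("risky", ("chance", [(0.5, ("end", 40.0)), (0.5, ("end", -10.0))])),
])
value, path = solve(tree)
print(value, path)  # 15.0 ['risky']
```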

### Discretization

One of the steps in reducing a Monte Carlo simulation to decision trees involves replacing a continuous distribution with a discrete counterpart. Elsewhere, we describe solutions to estimation problems by Monte Carlo simulation, resulting in an output distribution. Imagine we are trying to characterize the NPV of a field development that can range from an uncertain dry-hole cost through a large range of positive value, depending on several variables such as:

• Reserves
• Capital investment
• Productivity
• Oil/gas prices
• Operating expenses

Most of us would conduct the analysis with Monte Carlo simulation, but some would prefer to portray the results to management with the help of a decision tree.

Consider the decision tree in Fig. 3, which depicts a classic problem of success vs. failure for an exploration well. The failure case (“dry hole”) is simple enough, but success is a matter of degree. Yet no one would argue that the four cases listed here are the only actual possibilities. Rather, they are surrogates for ranges of possible outcomes with corresponding probabilities. The four discrete values might have been extracted from a distribution of possible successes. The process of replacing the continuous distribution with discrete values is called discretization.

Suppose we run a Monte Carlo simulation with 1,000 iterations and then examine the database of results in a spreadsheet, sorting the 1,000 values of NPV from small to large and grouping them into categories, perhaps arbitrarily chosen, called uncommercial, small, medium, large, and giant. Within each category, we take the average value and calculate the fraction of values in that range, namely (number of data)/1,000. These are, respectively, the values and the probabilities entered in the decision tree. Clearly, each value is now a surrogate for some range. We do not really believe that there are only five possible outcomes to the choice to drill.
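A sketch of this binning procedure follows. The simulated NPVs here are a hypothetical stand-in (a mixture of dry holes and lognormal successes) for a real Monte Carlo output, and the category boundaries are arbitrary, as noted above:

```python
import random

random.seed(42)

# Stand-in for a Monte Carlo output: 1,000 NPV iterations (hypothetical
# mixture of dry holes at a fixed negative NPV and lognormal successes).
npv = [-10.0 if random.random() < 0.3 else random.lognormvariate(3.0, 1.0)
       for _ in range(1000)]

# Category boundaries chosen arbitrarily, as the text notes.
bins = [("uncommercial", -1e9, 0.0), ("small", 0.0, 10.0),
        ("medium", 10.0, 40.0), ("large", 40.0, 100.0), ("giant", 100.0, 1e9)]

branches = []
for name, lo, hi in bins:
    members = [v for v in npv if lo <= v < hi]
    if members:
        prob = len(members) / len(npv)            # fraction in this range
        value = sum(members) / len(members)       # surrogate value for range
        branches.append((name, prob, value))

# The (probability, value) pairs become the chance-node branches.
assert abs(sum(p for _, p, _ in branches) - 1.0) < 1e-9
```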

### Conditional probability and Bayes’ Theorem in decision trees

Fig. 4 shows a simple decision tree for the selection between setting pipe and drilling ahead when approaching a zone of possible overpressure. Encountering the overpressured zone causes a “kick,” the probability of which is 0.2. The values in this case are costs, so we want to minimize the root-node cost.

The values represent estimated costs for three things: setting pipe (\$10,000), controlling the overpressure without the protection of casing (\$100,000), and controlling it with protection (\$25,000, including the cost of setting pipe). Working in thousands of dollars, the expected values of the two chance nodes are 0.2 × 100 + 0.8 × 0 = 20 and 0.2 × 25 + 0.8 × 10 = 13. Therefore, we decide to set pipe at an expected cost of \$13,000 rather than drill ahead at an expected cost of \$20,000.
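In code, the comparison is one line per branch (costs in thousands of dollars, as in the text):

```python
p_kick = 0.2           # probability of encountering overpressure
cost_control = 100.0   # control a kick without casing protection
cost_protected = 25.0  # control a kick with casing set (pipe cost included)
cost_pipe = 10.0       # set pipe and encounter no kick

drill_ahead = p_kick * cost_control + (1 - p_kick) * 0.0       # 20.0
set_pipe = p_kick * cost_protected + (1 - p_kick) * cost_pipe  # 13.0

# Cost tree, so the optimal branch is the minimum.
decision = "set pipe" if set_pipe < drill_ahead else "drill ahead"
print(decision, set_pipe)  # set pipe 13.0
```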

When a decision tree has a second chance node following a first, the probabilities on the later node’s branches are conditional probabilities. Thus, in Fig. 5, the probabilities for Failure B and Success B are really P(~B|A) and P(B|A) because these events occur after Success A has occurred. Bayes’ Theorem therefore comes into play, and the user must exercise care not to violate the laws of conditional probability, as Example 1 below illustrates. First, we restate the result.

Bayes’ Theorem: P(B|A) = P(A|B) × P(B)/P(A), where P(A) = P(A&B1) + P(A&B2) + ... + P(A&Bn) and B1, B2, ..., Bn are mutually exclusive and exhaustive.
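A sketch of the theorem as code, with the total-probability expansion used to compute P(A). The prior 0.6 and likelihood 0.7 echo the value-of-information example below, while the 0.2 likelihood is an illustrative placeholder:

```python
def bayes(p_a_given_b, p_b, p_a):
    """P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

def total_probability(likelihoods, priors):
    """P(A) = sum of P(A|Bi) * P(Bi) over exclusive, exhaustive Bi."""
    return sum(l * p for l, p in zip(likelihoods, priors))

# Two states B1, B2 with priors 0.6 and 0.4, and illustrative likelihoods
# P(A|B1) = 0.7, P(A|B2) = 0.2.
p_a = total_probability([0.7, 0.2], [0.6, 0.4])  # 0.42 + 0.08 = 0.5
p_b1_given_a = bayes(0.7, 0.6, p_a)              # 0.42 / 0.5 = 0.84
```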

### Value of information

We are often faced with a problem of assessing uncertainty (in the form of some state of nature) and its consequences with limited data. When the stakes are high, it may be possible to postpone the decision, invest some resources, and obtain further information (from some sort of diagnostic tool) that would make the decision more informed. Here are some typical states of nature we try to assess. [[Decision tree analysis#Example 2: value of information|See example below]]

• Will a prospect be commercial or noncommercial?
• Will a target structure have closure or no closure?
• Will our recent discovery yield a big, medium, or small field?
• Will we need only one small platform, or either one big platform or two small platforms?
• Is the oil field a good or a marginal waterflood prospect?
• Does the zone ahead of the drill bit have abnormal or normal pressure?

Some corresponding types of information are

• Pilot flood—prospect: good/marginal.
• 3D seismic—closure: likely/unlikely/can’t tell.
• 3D seismic—hydrocarbon: indicated/not indicated.
• Well test—productivity: high/moderate/low.
• Delineation well—platform needs: big/small.
• Wireline logs—pressure: high/normal/low.

### Solving the value of information decision tree

Before we can solve the expanded decision tree, we must fill in the remaining probabilities in the lower portion, which are calculated with Bayes’ Theorem. First, the probability of each seismic outcome follows from the law of total probability:

P(A1) = P(A1|B1) × P(B1) + P(A1|B2) × P(B2)

Similarly,

P(A2) = P(A2|B1) × P(B1) + P(A2|B2) × P(B2)

and

P(A3) = P(A3|B1) × P(B1) + P(A3|B2) × P(B2)

Next, we calculate the conditional probabilities on the branches following each seismic outcome:

P(B1|Ai) = P(Ai|B1) × P(B1)/P(Ai), for i = 1, 2, 3

and

P(B2|Ai) = 1 – P(B1|Ai)

We leave it to the reader to verify that the expanded decision tree now has a value of \$48.6 million, whereas the original decision tree has a value of \$44 million (= 0.6 × 100 – 0.4 × 40). By definition, the value of information is the difference between the new and old decision tree values: \$48.6 – \$44 = \$4.6 million. We conclude that we should be willing to pay up to \$4.6 million to purchase the 3D seismic interpretation.
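The value-of-information calculation can be sketched as follows. Only P(A1|B1) = 0.70, the 0.6/0.4 priors, and the three NPVs come from the example; the remaining sensitivity-table entries are hypothetical placeholders, so this sketch reproduces the structure of the calculation rather than the \$48.6 million figure:

```python
# States: B1 = closed, B2 = not closed. Signals: A1 = closure likely,
# A2 = inconclusive, A3 = closure unlikely. NPVs in millions of dollars.
priors = {"B1": 0.6, "B2": 0.4}
npv = {"success": 100.0, "failure": -40.0, "divest": 10.0}

# Sensitivity table P(Ai|Bj). Only P(A1|B1) = 0.70 comes from the text;
# the other entries are hypothetical placeholders.
sens = {"B1": {"A1": 0.70, "A2": 0.20, "A3": 0.10},
        "B2": {"A1": 0.10, "A2": 0.20, "A3": 0.70}}

# Without information: drill if EV(drill) beats divesting.
ev_drill = priors["B1"] * npv["success"] + priors["B2"] * npv["failure"]
base = max(ev_drill, npv["divest"])  # 44.0, as in the text

# With information: for each signal, update beliefs via Bayes' Theorem
# and take the better act; weight by the probability of the signal.
ev_with_info = 0.0
for a in ("A1", "A2", "A3"):
    p_a = sum(sens[b][a] * priors[b] for b in priors)  # total probability
    p_b1 = sens["B1"][a] * priors["B1"] / p_a          # Bayes' Theorem
    ev_drill_a = p_b1 * npv["success"] + (1 - p_b1) * npv["failure"]
    ev_with_info += p_a * max(ev_drill_a, npv["divest"])

value_of_information = ev_with_info - base
```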

## Decision tree sensitivity

Decision trees also have inputs and outputs. The inputs are the values (typically either NPV or cost) at the end nodes and the probabilities of the various outcomes emanating from the chance nodes. Sensitivity analysis amounts to selecting one of these inputs, letting it vary throughout a range, recalculating the decision tree with each new value, and then plotting the output (the root decision value) as a function of the chosen input, which yields a piecewise-linear graph for each of the root decision options.

For instance, consider the example introduced earlier concerning whether to drill ahead or set pipe as we approach a possibly overpressured zone (see Fig. 4). By varying the chance that the zone is overpressured from 0.1 to 0.5 (around the base-case value of 0.2), we calculate the cost of the two alternatives (Fig. 7) and see that only for a very small chance of overpressure would it be correct to drill ahead; otherwise, setting pipe is the safer, lower-cost choice. Similarly, we could perturb the cost of encountering overpressure from the base-case value of 100 to a low value of 50 and a high value of 200 and obtain a similar graph.
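A one-way sensitivity sweep over P(kick) can be sketched as follows (costs in thousands of dollars); the breakeven probability solves 100p = 25p + 10(1 – p):

```python
# One-way sensitivity for Fig. 4: vary P(kick) and recompute both branches.
def drill_ahead(p):
    return p * 100.0                     # kick-control cost if unprotected

def set_pipe(p):
    return p * 25.0 + (1 - p) * 10.0     # protected control or pipe only

for p in [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]:
    choice = "drill ahead" if drill_ahead(p) < set_pipe(p) else "set pipe"
    print(f"P(kick)={p:.2f}  drill={drill_ahead(p):6.2f}  "
          f"pipe={set_pipe(p):6.2f}  -> {choice}")

# Breakeven where 100p = 25p + 10(1 - p), i.e. p = 10/85, about 0.118.
```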

Finally, one can vary two inputs simultaneously; that is, we could consider all combinations of P(kick) and the cost of a kick. This is called a two-way sensitivity analysis, in contrast to the one-way analysis already described. It is helpful to have software to handle all these cases, which are otherwise tedious. The graph for a two-way sensitivity analysis is difficult to interpret, being a broken plane in three dimensions. Alternatively, we can generate a rectangle of combinations and color-code (or otherwise distinguish) them to indicate which ones lead to the choice of setting pipe.
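A two-way analysis is the same calculation over a grid. This sketch varies P(kick) and the kick-control cost and prints a small text version of the color-coded decision map described above:

```python
# Two-way sensitivity: grid over P(kick) and kick-control cost; mark which
# combinations favor setting pipe. Values in thousands of dollars.
p_values = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50]
kick_costs = [50.0, 100.0, 150.0, 200.0]

grid = {}
for p in p_values:
    for c in kick_costs:
        drill = p * c
        pipe = p * 25.0 + (1 - p) * 10.0
        grid[(p, c)] = "set pipe" if pipe < drill else "drill ahead"

# Render the decision map as a text table (S = set pipe, D = drill ahead).
print("P(kick)  " + "  ".join(f"{c:>5.0f}" for c in kick_costs))
for p in p_values:
    row = "  ".join("    S" if grid[(p, c)] == "set pipe" else "    D"
                    for c in kick_costs)
    print(f"{p:7.2f}  {row}")
```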

In the end, however, sensitivity analysis for decision trees resembles more the deterministic methods of the traditional tornado plots or spider diagrams than it does the more robust sensitivity of Monte Carlo simulation. In fact, software packages often offer these charts to present the results. In spite of the limitations of these methods, it is imperative that anyone using decision trees do a careful job of sensitivity analysis and include those results in any presentation.

## Examples

### Example 1: upgrading a prospect[2]

Suppose that we believe two prospects are highly dependent on each other because they have a common source and a common potential seal. In particular, suppose P(A) = 0.2, P(B) = 0.1, and P(B|A) = 0.6. This is the type of revised estimate people tend to make when they believe A and B are highly correlated. The success of A “proves” the common uncertainties and makes B much more likely.

However, consider the direct application of Bayes’ Theorem: P(A|B) = P(B|A) × P(A)/P(B) = (0.6) × (0.2)/0.1 = 1.2. Because no event, conditional or otherwise, can have a probability exceeding 1.0, we have reached a contradiction that we can blame on the assumptions.

When two prospects are highly correlated, they must have similar probabilities; one cannot be twice as probable as the other. Another way of looking at this is to rearrange Bayes’ Theorem as P(A|B)/P(A) = P(B|A)/P(B), which says that the relative increase in probability is identical for both A and B.
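This consistency check is easy to automate when eliciting probabilities for dependent prospects. The sketch below flags assessments for which the implied reverse conditional probability exceeds 1:

```python
def implied_reverse(p_b_given_a, p_a, p_b):
    """P(A|B) implied by Bayes' Theorem from P(B|A), P(A), and P(B)."""
    return p_b_given_a * p_a / p_b

def consistent(p_b_given_a, p_a, p_b):
    """True if the three assessments can coexist (implied P(A|B) <= 1)."""
    return 0.0 <= implied_reverse(p_b_given_a, p_a, p_b) <= 1.0

# The assessments from the example: P(A) = 0.2, P(B) = 0.1, P(B|A) = 0.6.
print(consistent(0.6, 0.2, 0.1))  # False: implied P(A|B) = 1.2
print(consistent(0.3, 0.2, 0.1))  # True: implied P(A|B) = 0.6
```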

Beyond these precautions about assigning probabilities to event branches, Bayes’ Theorem has another use in decision trees, namely the value of information, one of the most important applications of decision trees.

### Example 2: value of information

Given a prospect, you are faced with a choice: drill, for which the geoscientists give a 60% chance of success, or divest. The chance of success is tantamount to the probability that the structure is closed, all other chance factors (source, timing and migration, reservoir quality) being very close to 1.0. A member of the team suggests acquiring a 3D seismic interpretation before proceeding. He cautions, however, that the seismic interpretation, like others in the past, could yield three possible outcomes: closure likely, closure unlikely, and inconclusive. The extended decision tree, shown in Fig. 6, incorporates these possibilities. Note that the original decision tree (before considering the third option of acquiring information) would have had only two choices and one chance node.

Additional data necessary to do the problem include the mean NPVs (\$100 million for the success case, –\$40 million for the failure case, and \$10 million for divesting) and the sensitivity table (Table 1), which indicates how accurate or reliable the 3D interpretation is in this particular context (for a given geographical/geological environment, with data of certain quality and interpretation by a particular individual/company). The interpretation of the table is P(closure likely|closed) = 0.70 = P(A1|B1), as opposed to the possible misinterpretation that the value 0.70 refers to the conditional probability in the opposite direction, P(B1|A1).

One should be curious about the source of this data. The values for success and failure cases and for divestiture are obtained by routine engineering analysis. The sensitivity table must come in part from the expert doing the interpretation. In a perfect world, these estimates would be backed by extensive empirical data. In reality, the best we can do is to estimate the entries and then do sensitivity analysis with our decision tree.

Speaking of perfect, there is a special case worth noting, namely when the information is “perfect,” which corresponds to the information in Table 2. The entries in the lower left and upper right corners of the sensitivity table are called, respectively, false negatives [P(A3|B1)] and false positives [P(A1|B2)]. They both measure inaccuracy of the prediction device.

## Nomenclature

A = one event, or one of a set of mutually exclusive and exhaustive Bayesian-type events

B = another event, or one of a set of mutually exclusive and exhaustive Bayesian-type events

P = probability of an event, dimensionless

## References

1. Clemen, R.T. and Reilly, T. 2000. Making Hard Decisions with Decision Tools Suite. Boston, Massachusetts: Duxbury Press.
2. Murtha, J.A. 2001. Risk Analysis for the Oil Industry. Supplement to Hart’s E&P (August 2001) 1–25. http://www.jmurtha.com/downloads/riskanalysisClean.pdf.

## Noteworthy papers in OnePetro

Gatta, S.R. 1999. Decision Tree Analysis and Risk Modeling To Appraise Investments on Major Oil Field Projects. Presented at the Middle East Oil Show and Conference, Bahrain, 20-23 February 1999. SPE-53163-MS. http://dx.doi.org/10.2118/53163-MS