
# Estimating cost and time

Estimating capital, one of the main ingredients for any cash flow calculation, is largely in the domain of the engineering community. Petroleum engineers are responsible for drilling costs and are often involved with other engineers in estimating costs for pipelines, facilities, and other elements of the infrastructure for the development of an oil/gas field.

All practicing engineers have heard horror stories of cost and schedule overruns, and some have even been involved directly with projects that had large overruns. Why did these overruns occur, and what could have been done to encompass the actual cost in the project estimate? Overruns can result from:

• inefficiencies,
• unscheduled problems and delays,
• changes in design or execution, or
• a host of other reasons

The upstream oil/gas industry is a risky business. One inherently risky operation that we routinely undertake is drilling and completing a well. Thus, it should come as no surprise that estimating the total cost and time of a drilling prospect is a common application of uncertainty analysis, principally Monte Carlo simulation.

Cost models fall into the general class of aggregation models—we add line-item costs to get a total cost. These line items are specified as ranges or probability distributions, and the total cost is then a sum of the line items.

### Simple authorization for expenditure (AFE) model

Table 1 shows a probabilistic AFE for drilling a single well. The line items are described by symmetric triangular distributions. There are two subsections of the model, each with a cost subtotal.

• The first cost subtotal comprises the cost of goods and services (consumables and tangibles) that are not time-dependent.
• The second cost subtotal represents the rig cost (i.e., the time-dependent costs attributable to accomplishing each phase).

The two ultimate outputs are Cost Total (the sum of the two subtotals) and Rig Time.

The user enters estimates for the minimum, most likely, and maximum for each estimated line item. In the top portion, the user enters estimates for line-item costs. In the bottom portion, the user enters estimates for the activity times. The costs associated with these tasks are then calculated as the time to complete the task multiplied by the rig day rate.

Assumptions include:

• Items in the activity portion include all aspects of drilling. Thus, the “9 5/8-in. section” would include any tripping, minor expected delays, running casing, and cementing, in addition to drilling. (See comments on level of detail.)
• There is no correlation between any pair of items.
• The rig day rate is either a constant (if the rig is under contract) or a distribution (no contract).
• The estimate covers only “scheduled events” and does not take into account either change of scope or “trouble time.”

Some of these assumptions make the model simpler to design but less realistic. These shortcomings are easy to overcome, as we address later in this section.
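The structure of this model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the line-item names, ranges, and rig day rate below are hypothetical placeholders, not the values of Table 1, and the rig is assumed to be under contract (constant day rate). Each input is a symmetric triangular distribution and no correlation is applied.

```python
import random
import statistics

# Hypothetical inputs as (min, most likely, max); NOT the values from Table 1.
FIXED_COSTS_USD = {          # goods and services, not time-dependent
    "casing and tubing": (900_000, 1_000_000, 1_100_000),
    "mud and chemicals": (450_000, 500_000, 550_000),
    "wellhead": (180_000, 200_000, 220_000),
}
ACTIVITY_DAYS = {            # scheduled activity times only, no trouble time
    "mobilization": (4, 5, 6),
    "9 5/8-in. section": (18, 22, 26),
    "testing": (8, 11, 14),
    "completion": (9, 12, 15),
}
RIG_DAY_RATE_USD = 100_000   # assumed constant (rig under contract)


def simulate_afe(n_iter=10_000, seed=1):
    """Return lists of sampled rig times (days) and total costs (USD)."""
    rng = random.Random(seed)
    times, costs = [], []
    for _ in range(n_iter):
        # random.triangular takes (low, high, mode), in that order.
        fixed = sum(rng.triangular(lo, hi, ml)
                    for lo, ml, hi in FIXED_COSTS_USD.values())
        days = sum(rng.triangular(lo, hi, ml)
                   for lo, ml, hi in ACTIVITY_DAYS.values())
        times.append(days)
        # Cost Total = non-time-dependent subtotal + rig time * day rate
        costs.append(fixed + days * RIG_DAY_RATE_USD)
    return times, costs


times, costs = simulate_afe()
print(f"mean rig time: {statistics.mean(times):.1f} days")
print(f"mean total cost: ${statistics.mean(costs) / 1e6:.2f} million")
```

Replacing the constant day rate with a sampled value would cover the no-contract case in the assumptions above.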

### Why use triangular inputs?

In our example of a simple AFE model, we chose symmetric triangular distributions for the inputs. Why? Our example came from a location where there were many offset and analogous wells on which to base our cost and time distributions. Many cost engineers are trained to provide a base cost, which is frequently viewed as a most likely value together with a downside and upside (sometimes stated as a plus and minus percentage of the base cost). The triangular distribution is therefore a natural starting point. In practice, many line-item ranges are right-skewed, acknowledging the belief that time and cost have more potential to exceed the base case than to fall short.

Another right-skewed distribution is the log-normal, and it is also popular for line items. One drawback of the log-normal for cost estimates, however, is that it is fully determined by specifying only two points, not three. Although some users take three points and convert to a log-normal, one should be careful with the process. Suppose, for instance, that we are given the three values 30, 60, and 120 for a low, most likely, and high estimate for some line item. We could use the two extreme values as a P2.5 and P97.5 and assume that the 95% range (confidence interval) between them is approximately four standard deviations. The logic is that for normal distributions, this range is about 3.92 (2 × 1.96) standard deviations. For log-normal distributions, there is no equally simple rule for the values themselves, though experimentation would lead to a reasonable estimate. Once the standard deviation is estimated, one other value determines a unique log-normal, and the user typically decides whether the mid-range value will serve as the mode, P50, or mean.
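One way to make the conversion concrete is to apply the 95% range argument to the logarithms of the values, because the logarithm of a log-normal variable is normal. The sketch below does this for the 30 and 120 example. Treating the extremes as P2.5 and P97.5 follows the text, but working in log space is an assumption made here for illustration, not necessarily the procedure a given user or software package follows. The function name is hypothetical.

```python
import math

Z_975 = 1.959964  # standard normal quantile for the 97.5th percentile


def lognormal_from_p2_5_and_p97_5(low, high):
    """Fit a log-normal by treating low and high as the P2.5 and P97.5 values."""
    # ln(X) is normal, so its 95% range spans 2 * 1.96 standard deviations.
    sigma = (math.log(high) - math.log(low)) / (2 * Z_975)
    mu = (math.log(high) + math.log(low)) / 2
    return mu, sigma


mu, sigma = lognormal_from_p2_5_and_p97_5(30, 120)
median = math.exp(mu)                 # P50
mode = math.exp(mu - sigma ** 2)      # most likely value
mean = math.exp(mu + sigma ** 2 / 2)  # expected value
print(f"P50 = {median:.1f}, mode = {mode:.1f}, mean = {mean:.1f}")
```

With these two percentiles the fitted P50 is 60 (the geometric mean of 30 and 120), while the fitted mode is lower and the fitted mean is higher. The third input therefore cannot serve as mode, P50, and mean at once, which is why the choice deserves care.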

### Resulting time and cost estimates

Figs. 1 and 2 show the cumulative distribution of the AFE well time and the corresponding sensitivity chart. Because of the dominance of the time-dependent costs in this particular model, the cumulative distribution for total well cost, Fig. 3, and its corresponding sensitivity graph, Fig. 4, are quite similar to those of the well time. The drilling AFE calculated probabilistically now allows us to report that we are 90% confident that the well will cost between U.S. \$10.1 and \$11.5 million with an expected (mean) cost of \$10.8 million. Similarly, we are 90% confident that the well will take between 58 and 70 days, with an expected (mean) time of 64 days to drill. The sensitivity charts indicate that the driving parameters in both the time and cost to drill this well are the 9 5/8-in. section, the testing phase, and the completion phase. If we wanted to reduce our uncertainty and have the biggest impact on the well time and cost, we would focus our attention (i.e., engineering skills) on those phases.
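Statements such as "90% confident" are read off the simulated output as the P5 and P95 values. The sketch below shows one way to extract them. The sample list is a synthetic stand-in generated from a normal distribution with assumed parameters; it is not the output of the AFE model, and the `percentile` helper is a hypothetical name.

```python
import random
import statistics


def percentile(samples, p):
    """Return the p-th percentile (0 to 100) using linear interpolation."""
    ordered = sorted(samples)
    pos = (len(ordered) - 1) * p / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


# Synthetic stand-in for simulated well times (days); NOT the model output.
rng = random.Random(7)
well_days = [rng.gauss(64, 3.6) for _ in range(10_000)]

p5, p95 = percentile(well_days, 5), percentile(well_days, 95)
mean_days = statistics.mean(well_days)
print(f"90% interval: {p5:.0f} to {p95:.0f} days, mean {mean_days:.0f} days")
```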

## Handling problems

As previously mentioned, one of the assumptions in this AFE model, as it stands, is that no unscheduled or problem events are included. In reality, a well is rarely, if ever, drilled without encountering one or more unscheduled events. The event may impact the cost, the schedule, or both. Because we want the model to be as realistic as possible, we must include the possibility of these unexpected events in our model.

### Mechanics of modeling problems

A simple method of handling various problems encountered in drilling is to introduce a discrete variable that takes on the value zero when no problem occurs and the value one when there is a problem. We assign the probability that the variable equals one, that is, the probability that the problem occurs on any given iteration. Either a binomial distribution (with a single trial) or a general discrete distribution may be used.

Table 2 shows the modified drilling AFE worksheet with two rows inserted to accommodate this modification. In the first row (row 26), we have a cell for the probability of occurrence of the problem—in this instance, stuck pipe—and another cell for a binomial distribution that references the problem’s probability. The probability of having stuck pipe in this example is 30%, obtained from our experience with similar wells in this area. What if we had no data? We would assign a probability based on our expert engineer’s opinions.

The second row contains a new line item for the time needed to free the stuck pipe. In cell F27, we multiply the values sampled from the two distributions (logically equivalent to an “if” statement) to get either zero (in case the binomial distribution returns zero, signifying no stuck pipe in that iteration) or some value between 5 and 20 days. The corresponding cost in cell G27 is like any other formula in that column, except in this case it takes the value zero some of the time.
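The two inserted rows can be sketched as follows. The 30% probability and the 5 to 20 day range come from the text. The shape of the time distribution (triangular with a most likely value of 10 days) and the rig day rate are assumptions made here for illustration, since the worksheet values are not reproduced in this section.

```python
import random

P_STUCK = 0.30                         # probability of stuck pipe (from the text)
FREE_DAYS_MIN, FREE_DAYS_MAX = 5, 20   # time to free the pipe (from the text)
FREE_DAYS_MODE = 10                    # hypothetical most likely value
RIG_DAY_RATE_USD = 100_000             # hypothetical day rate


def sample_stuck_pipe(rng):
    """Return (days, cost) added by the stuck-pipe event in one iteration."""
    # Binomial with a single trial: 1 if the problem occurs, else 0.
    occurs = 1 if rng.random() < P_STUCK else 0
    free_days = rng.triangular(FREE_DAYS_MIN, FREE_DAYS_MAX, FREE_DAYS_MODE)
    # The product plays the role of an "if": zero days when no problem occurs.
    days = occurs * free_days
    return days, days * RIG_DAY_RATE_USD


rng = random.Random(11)
results = [sample_stuck_pipe(rng) for _ in range(20_000)]
problem_days = [days for days, _ in results]
frac_no_problem = sum(1 for d in problem_days if d == 0) / len(problem_days)
print(f"iterations with no stuck pipe: {frac_no_problem:.1%}")
```

Adding the returned days and cost to the scheduled totals in each iteration gives the modified AFE outputs.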

### Effect of including one or more potential problems

Fig. 5 shows the probability density function (PDF) for the resulting AFE well time estimate, when the potential problem of having stuck pipe is included. Notice that while the graph appears right-skewed, which is more in keeping with our experience (i.e., wells are more likely to have overruns), the graph is actually bimodal. In 70% of the cases, we do not have stuck pipe and everything goes as planned. In 30% of the simulations, we have stuck pipe, and there is an associated time and cost uncertainty in recovering from that problem. What has happened to the sensitivity diagram (Fig. 6)? Now the primary driver is whether we get stuck or not. Maybe it is time to look at the alternative drilling fluid system or the new technology that can get us through the whole section quicker, thus significantly reducing our chances of getting stuck.

## Considerations

### Handling correlation among line items

In many cases, when one line-item cost is high, other line-item costs are likely to be high. Steel price changes, for instance, can cause simultaneous changes in several line items of a cost estimate. In such cases the user can assign a correlation coefficient to appropriate pairs of line-item distributions. The level of correlation is largely subjective unless one has data. For example, if we track average unit prices for two items on a weekly or monthly basis, we can use the CORREL function in Excel to calculate the correlation coefficient. When data are not available, one method is to try two or three correlation coefficients (say 0.3, 0.6, 0.9) and examine the impact on the model outputs. For cost models, any positive correlation increases the standard deviation of the outputs; correlation does not affect the mean.
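The effect of correlation on a sum can be checked without simulation, because the variance of a sum is the sum of the variances plus twice the covariance of every pair. The sketch below applies this with a single common correlation coefficient for all pairs, which is a simplifying assumption. The means and standard deviations are hypothetical line items, not values from the AFE tables.

```python
import math


def total_mean_and_std(means, stds, rho):
    """Mean and standard deviation of a sum with common pairwise correlation rho."""
    total_mean = sum(means)                 # unaffected by correlation
    variance = sum(s ** 2 for s in stds)
    for i in range(len(stds)):
        for j in range(i + 1, len(stds)):
            variance += 2 * rho * stds[i] * stds[j]   # covariance terms
    return total_mean, math.sqrt(variance)


means = [100.0, 250.0, 400.0]   # hypothetical line-item means
stds = [10.0, 30.0, 50.0]       # hypothetical line-item standard deviations

for rho in (0.0, 0.3, 0.6, 0.9):
    total_mean, total_std = total_mean_and_std(means, stds, rho)
    print(f"rho={rho:.1f}  mean={total_mean:.0f}  std={total_std:.1f}")
```

The mean stays the same for every coefficient, while the standard deviation rises from the uncorrelated value toward the plain sum of the individual standard deviations as the coefficient approaches one.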

### Central limit theorem effects

Cost models follow the pattern of any aggregation model: the outputs tend to have relatively narrow ranges compared to the inputs. As a rule of thumb, summing N similar line items yields a total whose coefficient of variation is smaller than that of a single item by a factor of √N. The central limit theorem (see Language of Risk Analysis and Decision Making) says this reduction is exactly true when the distributions are identical, normal, and uncorrelated. In practice, the rule is surprisingly accurate, provided that one or two very large items do not dominate the sum. Also, the rule tends to lose accuracy when several items are highly positively correlated, because the resulting increase of extreme input values tends to spread out the results. The results of an aggregation model tend to be skewed rather than normal when the model includes events simulated using binomial or discrete distributions, such as those used for problem events.
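The rule of thumb is easy to check numerically. The sketch below sums N identical, uncorrelated triangular line items and compares the coefficient of variation of the total with that of a single item. The triangular range (80, 100, 120) is a hypothetical choice for illustration.

```python
import random
import statistics


def cv_of_total(n_items, n_iter=20_000, seed=3):
    """Coefficient of variation of the sum of n_items identical triangular items."""
    rng = random.Random(seed)
    # random.triangular takes (low, high, mode), in that order.
    totals = [sum(rng.triangular(80, 120, 100) for _ in range(n_items))
              for _ in range(n_iter)]
    return statistics.stdev(totals) / statistics.mean(totals)


cv_single = cv_of_total(1)
for n in (1, 4, 16):
    cv_n = cv_of_total(n)
    print(f"N={n:2d}  CV={cv_n:.4f}  "
          f"CV(1)/CV(N)={cv_single / cv_n:.2f}  sqrt(N)={n ** 0.5:.2f}")
```

The ratio in the third column should track the square root of N closely, within sampling error, for these identical and uncorrelated inputs.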

### Level of detail

Historically, deterministic drilling-AFE models might have hundreds of line items, in part for accounting purposes. Monte Carlo AFE models, however, tend to have a few dozen items. Construction cost estimating models can be even more detailed. One operator had a work breakdown structure (WBS) with 1,300 lines. While it is possible to transform such a detailed model into a Monte Carlo model, the drivers (the most sensitive variables) tend to be 20 or fewer. Therefore some users have two models: one very detailed and the other a coarser, consolidated version. Keep in mind that one reason for doing risk analysis is to identify key inputs and then try to manage them. Many times the risk analysis model will be designed to optimize use of historical data while allowing the user to track a meaningful level of detail as the project progresses.

Software continues to improve, but very large, highly detailed models can be difficult to manage. For instance, there is usually some limit on the size of a correlation matrix, yet the models with hundreds of line items will necessitate incorporating correlation (otherwise, the central limit theorem effects will reduce the resulting distribution’s standard deviation to an unrealistically low level of uncertainty). The most popular Monte Carlo application packages are add-ins to Excel, which has a limit on matrix size (256 columns as of spring 2002).

## Summary

Capital expenditure and budgeting models, such as cost and time models, are good examples of aggregation models. In these models, we must address the following:

• what type of distributions to use to describe the input parameters
• what correlation exists among the input parameters
• what level of complexity or detail is appropriate to the model
• what problem events to incorporate

The results we obtain from these models allow us to plan and budget based on a range of outcomes; the sensitivity charts focus our attention on the drivers where our risk-management and risk-mitigation skills are best applied. A simple drilling AFE model (with each activity finishing before the next one begins) was used to illustrate these risk analysis concepts. More complex time and cost models, such as those with concurrent tasks, can also be solved with more complicated spreadsheet models or other existing software.