You must log in to edit PetroWiki. Help with editing

Content of PetroWiki is intended for personal use only and to supplement, not replace, engineering judgment. SPE disclaims any and all liability for your use of such content. More information

PEH:Geologically Based, Geostatistical Reservoir Modeling

Jump to navigation Jump to search

Publication Information


Petroleum Engineering Handbook

Larry W. Lake, Editor-in-Chief

Volume VI – Emerging and Peripheral Technologies

H.R. Warner Jr., Editor

Chapter 2 – Geologically Based, Geostatistical Reservoir Modeling

By Richard L. Chambers and Jeffrey M. Yarus, Landmark Graphics Corp.

Pgs. 59-119

ISBN 978-1-55563-122-2
Get permission for reuse


Reservoir characterization encompasses all techniques and methods that improve our understanding of the geologic, geochemical, and petrophysical controls of fluid flow. It is a continuous process that begins with the field discovery and all the way through to the last phases of production and abandonment.

Reservoir modeling is the final step in the reservoir-characterization process, and consists of building an upscaled geologic model for input to the fluid-flow numerical simulator. Dynamic reservoir simulation is used to forecast ultimate hydrocarbon recovery on the basis of a given production scheme, or to compare the economics of different recovery methods. Conducting a dynamic flow simulation requires several input data types. The high-resolution geologic model (HRGM), for example, uses a grid-size specification; a geometric description of bounding surfaces, faults, and internal bedding geometries; a 3D distribution of permeability and porosity; and relative permeability and capillary pressure/saturation functions or tables. Other necessary information could include fluid pressure/volume/temperature (PVT) properties, well locations, perforation intervals, production indices, production or injection rates, and/or limiting production or injection pressures.

This chapter describes geostatistical reservoir-modeling technologies that depart from traditional deterministic modeling methods, and examines closely the numerous solutions that satisfy the constraints imposed by the data. Using these tools, we can assess the uncertainty in the models, the unknown that inevitably results from never having enough data.

Geostatistical terminology can be confusing. For example, many authors use the terms stochastic modeling, probabilistic modeling, and conditional simulation interchangeably. For simplicity's sake, this chapter also uses these three terms as equivalent, although strictly speaking, in geostatistics, a stochastic model is considered conditional when it honors the measured data statistics and the spatial model.

The material presented here is not heavily mathematical, and we have purposely taken some liberties with notation and terminology to make it easier for a novice or nonexpert to understand the concepts. There are no theoretical derivations or formal proofs given. Although mathematical formalism is kept to a minimum, the presentation is not simplistic. General equations and matrix notation are used when appropriate.


Geostatistics Defined

Earth-science data exhibit spatial connectivity to greater and lesser degree. As the distance between two data points increases, the similarity between the two measurements decreases. Geostatistics is a rapidly evolving branch of applied statistics and mathematics that offers a collection of tools for understanding and modeling spatial variability. Spatial variability includes scales of connectivity (heterogeneity) and directionality within data sets. Geostatistical methods also allow us to quantify and assess the reliability of the models we generate.

Origin of Geostatistics

Geostatistics originated in the mining industry. In the early 1950s, when "classical" statistics were found unsuitable for estimating disseminated ore reserves, D.G. Krige, a South African mining engineer, and H.S. Sichel, a statistician, developed a new estimation method.[1][2] The French engineer Georges Matheron expanded on Krige's innovative concepts and formalized them within a single framework, and coined the word "kriging" in recognition of Krige's work. [3] Although the kriging technique originally was developed for solving ore-reserve estimation problems, with the advent of high-speed computers in the 1970s, it spread to many other areas of earth science; however, it was not until the mid-to-late 1980s that geostatistical techniques were used to any extent in the petroleum industry, though their acceptance and use has grown steadily and significantly since.

Role of Geostatistics in Reservoir Characterization

The enormous upfront expense of developing heterogeneous reservoirs, and the desire to increase ultimate recovery has spurred oil companies to develop and use innovative reservoir-characterization techniques. Geostatistics is one of the many recent technologies often incorporated into the reservoir-characterization process.

Since the late 1980s, geostatistical techniques have become an accepted technology for characterizing petroleum reservoirs, especially when incorporating 3D seismic data. The resultant numerical descriptions often are put into a fluid-flow simulator. Use of geostatistics necessitates the cooperation between the geoscience and reservoir-engineering disciplines, allowing each to contribute fully in the process of building the reservoir model. This is quite different from past approaches, in which mathematical formalization often was left to the reservoir engineers. The multidisciplinary approach, coupled with improved technology for reservoir modeling, ensures that important geologic characteristics are not overlooked in the process.

Traditional geology is qualitative, based soundly on classification schemes and descriptions associated with physical phenomena. In the normal course of reservoir modeling, such qualitative geologic models are transformed into numerical models, though often by a reservoir engineer, rather than by a geologist. If the geologic model is precise, such a transformation presents no problem; however, in the past, the numerical models tended to bear little resemblance to the geologic models on which they were based. The differences commonly were caused by discipline-related interpretation and typically were economically pragmatic. Reservoir models were and continue to be expensive to produce, such that simulating a reservoir at a very fine resolution is impractical. To reduce computer simulation time (ergo cost), the geologic model is coarsened to a more manageable number of grid nodes.

But drastically reducing the size of a reservoir model has ramifications. If the heterogeneity, or complexity, of the geology is oversimplified, the integrity of simulation results can be affected. A coarser initial representation may be appropriate and adequate for a relatively simple reservoir, but with a complex reservoir, it can yield misleading simulation results.

To prevent this problem, history-matching techniques are used to fine-tune the coarser engineering model. Porosity, permeability, and other parameters may be adjusted until the fluid-flow simulation matches the observed well performance, pressures, and flow rates from production tests. If any of these three conditions is matched, the model is assumed to be reasonable, although not unique. Although this model may match history up to a point in time, it can be a poor predictor of future production.

Reservoir simulation brings a kind of closure to a study by providing the economics and development plan, but the production forecast it provides often is inaccurate. Fortunately, the geostatistical approach to reservoir modeling has been in place long enough that the efficacy of the geostatistical procedure can be assessed in terms of production history after the simulation. The improvement to flow-simulation results from using geostatistically built models has been noted not only personally by the authors, but also by others, as well. [4][5][6][7] Frequently, these predictions were overly optimistic, largely because reservoirs were considerably more heterogeneous and compartmentalized than their reservoir models presumed them to be.

Such consistent deviation from prediction indicates the presence of an undefined parameter whose exclusion from the model created a bias. In pointing out that it frequently has been the case that individual wells have not performed as predicted and infill-drilling patterns have not behaved as predicted, we are not suggesting that there are no good reservoir models, nor do we mean to be overly critical. Instead, we are challenging our industry to be more innovative by taking advantage of current technology to preserve heterogeneity in the upscaled models that are input to the reservoir simulator. Compared to the early use of the geostatistical approach, more-recent efforts have included progressively more geology and concern for the consistency of the geology and the observed physical parameters. The results of these efforts certainly validate the need to incorporate more characteristics on the basis of our understanding of the depositional environments represented in the reservoir.

The idea that reservoirs are heterogeneous is not new. Using a relative scale of heterogeneity tied to the original depositional environments, Tyler and Gholston[8] and Tyler and Finley[9] have shown that a substantial amount of mobile hydrocarbons often are left behind in reservoirs of varying heterogeneity. Weber,[10] Srivastava,[11] King and Mansfield, [12] Botton-Dumay et al., [13] and Srinivasan and Caers[14] were pioneers in bed-level evaluation of the effects of heterogeneity on hydrocarbon recovery.

Geostatistically derived reservoir modeling is perhaps the most successful means of improving performance predictions in heterogeneous reservoirs. It is successful because understanding the heterogeneity that exists in the interwell space inherently is a statistical problem that can be quantified. It is not the only approach, nor is it useful in all cases, but it is a rigorous approach that has proved beneficial in the face of many real conditions and practical considerations involved in modeling petroleum reservoirs.

The goal of geostatistically derived modeling is to construct a more realistic model of reservoir heterogeneity using methods that do not simply average reservoir properties. Like the traditional deterministic approach, it preserves both the measured (hard) data where they are known and the interpretative (soft) data whenever they are informative; however, unlike the deterministic approach, geostatistics provides scientists with numerous plausible results (realizations). The degree to which the various models differ is a reflection of the unknown, a measurement of the uncertainty. Some of the realizations may challenge the prevailing geologic wisdom and almost certainly will provide a group of economic scenarios ranging from optimistic to pessimistic.

Having more than one result to upscale and analyze in the flow simulator changes the paradigm of traditional reservoir analysis, though, and such a change is necessary because heterogeneity in dynamic data is not readily apparent when using the traditional method. Srinivasan explains the problem of dynamic data in this way, "Only limited information pertaining to the heterogeneity of the permeability field is present in the dynamic data. Such information must be extracted using calibration methods, so that reservoir models then can be constrained to that calibrated information. The methodology matches the history data in a probabilistic sense (acknowledging other unknowns, such as relative permeability) and can be used to make robust predictions of the future production."*


Personal communication between Jeffery Yarus and Sanjay Srinivasan, U. of Texas (2002)

Back to Basics: Classical Statistics and Its Role in Geostatistical Modeling

A fundamental step in any scientific investigation is the quantitative-description stage. This is particularly true today of the geologic sciences, which in the past had depended largely on qualitative description. Until the facts are gathered and described quantitatively, analysis of their causes is premature. Statistics works with quantities of data, not with a single datum, and so requires those data to be in manageable form. Organized data are the clearest data. Thus, much of statistics deals with the organization, presentation, and summary of information.

Both the computation of classical statistical measures (e.g., mean, mode, median, variance, standard deviation, and skewness), and graphic data representation (e.g., histograms and scatter plots) commonly are used to understand the nature of data sets in a scientific investigation—including a reservoir study. A distinguishing characteristic of earth-science data sets (e.g., for petroleum reservoirs), though, is that they contain spatial information, which classical statistical descriptive methods cannot adequately describe. Spatial aspects of the data sets, such as the degree of continuity—or conversely, heterogeneity—and directionality are very important in developing a reservoir model. Analysis of spatially rich data is within the domain of geostatistics (spatial statistics), but a foundation in classical statistics and probability is prerequisite to understanding geostatistical concepts.

Sampling also has proved invaluable in thousands of studies, but it, too, can lead to statistical insufficiencies and biases. So, when can a sample be trusted? The answer depends on how the sample was selected. As discussed more fully later in this section, classical statistics assumes that each observation in the data set is independent of the others or is random. That is, it assumes that the samples (e.g., porosity measurements from a core or from logs) are from a larger, theoretical parent population in which each selected sample has the same chance as any other of being included in the sample group. Petroleum-geologic samples (e.g., well data) are not random, primarily for two reasons. First, they are oversampled in areas that are conducive to oil and gas production. Second, the samples themselves are tied to a coordinate system and so are related in geographic space. Thus, the use of a classical-statistical approach is problematic because usually samples are biased and have an underlying dependence. For a sample to be trusted, the bias must be adjusted and the spatial dependency accounted for.

Measurement Systems

A quantitative approach requires more than a headlong rush into the data, armed with a computer. Because conclusions from a quantitative study are based at least in part on inferences drawn from measurements, the geoscientist and reservoir engineer must be aware of the nature of the measurement systems with which the data are collected. There are four measurement scales—nominal, ordinal, interval, and ratio—each more rigorously defined than the one before it. The nominal and ordinal scales classify observations into exclusive categories. The interval and ratio scales involve determinations of the magnitude of an observation and so are the ones we normally think of as "measurements." [15] All four of these systems are used in reservoir descriptions.

Nominal Scale. The nominal scale classifies observations into mutually exclusive categories of equal rank, such as "red," "green," or "blue." Symbols (e.g., A, B, C, or numbers) often are used, as well. In geostatistics, for example, when predicting lithofacies occurrence, we often code lithofacies as 1, 2, and 3 for sand, siltstone, and shale, respectively. Within this code, or scale, there is no connotation that 2 is "twice as much" as 1, or that 3 is "greater than" 2. Furthermore, a lithofacies value such as 2.5 has no meaning at all.

Ordinal Scale. In an ordinal scale, observations are ranked hierarchically. A classic example of an ordinal scale in the geosciences is the Mohs hardness scale. Although the ranking scale extends from one to ten, the step from 1 to 2 is not equal in magnitude to that from 9 to 10. Thus, the Mohs hardness scale is a nonlinear scale of mineral hardness. In the petroleum industry, too, kerogen types are based on an ordinal scale that reflects the stages of organic diagenesis.

Interval Scale. The interval scale is so named because the width of successive intervals remains constant. A common example of an interval scale is temperature. The increase in temperature between 10 and 20°C is the same as the increase between 110 and 120°C. An interval scale does not have a natural zero or a point where the magnitude is nonexistent, and so it is possible to have negative values; however, in the petroleum industry, some reservoir properties are based on an interval scale measured along continuums for which there are practical limits. It would be impossible, for example, to have negative porosity, permeability, or thickness, or porosity greater than 100%.

Ratio Scale. Ratios not only have equal increments between steps, but also have a zero point. Ratio scales are the highest form of measurement. All types of mathematical and statistical operations are performed with them. Many geologic measurements are done on a ratio scale because they have units of length, volume, mass, and so forth. A commonly used ratio in the petroleum industry is the net-to-gross ratio of pay and nonpay.

For most of this chapter discussion, we will be concerned mainly with the analysis of interval and ratio data. Typically, no distinction is made between the two, and they may occur intermixed in the same problem. For example, a net-to-gross map is a ratio, whereas porosity and permeability measurements are on an interval scale.

Samples and Sample Populations

Statistical analysis is built around the concepts of "populations" and "samples," and implicitly assumes that the sampling is random and unbiased. A population is a well-defined set of elements (either finite or infinite), which commonly are measurements and observations made on items of a specific type (e.g., porosity or permeability). A sample is a subset of elements taken from the population. Furthermore, there are finite and infinite (or parent) populations. A finite population might consist of all the wells drilled in the Gulf of Mexico during the year 2001, for example, whereas the parent population would be all possible wells drilled in the Gulf of Mexico in the past, present, and future (albeit a practical impossibility).

Each reservoir is unique and completely deterministic. Thus, if all the information were available, there would be no uncertainty about the reservoir. Unfortunately, though, our sample data set offers us a sparse and incomplete picture of the reservoir. Furthermore, the sampling program (drilling wells) is highly biased, at least spatially, in that we generally do not drill wells randomly. Instead, we drill in locations that we believe are economically favorable, and thus anomalous. Because the entire reservoir will not be examined directly, we never will know the true population distribution functions of the reservoir properties.

Even when collected at the borehole, the data might not be representative because certain observations are purposely excluded, and this produces a statistical bias. Suppose, for example, we are interested in the pore volume (PV) of a particular reservoir unit for pay estimation. Typically, we use a threshold or a porosity cutoff when making the calculation, thus deliberately and optimistically biasing the true PV to a larger volume. If the lower-porosity rocks are not oil saturated, this might not be a bias, but without certainty of such, the PV estimate is considered biased. Both the statistical insufficiencies caused by sparse, irregular well spacing and the biases present in data acquisition reduce our ability to accurately and precisely define reservoir heterogeneity enough to ensure a surprise-free production history.

Sparse and biased sampling presents a major challenge. The biased sample population often is used as the conditioning data during construction of a geostatistical reservoir model. Thus, the assumptions made about the population distribution function influence the results; if the assumptions are incorrect, then the modeling results are highly suspect. With these limitations in mind, our task is to best estimate the reservoir properties while minimizing the effects of our uncertainty. To do this, we use a variety of statistical tools to understand and summarize the properties of the samples and make inferences about the entire reservoir.

Probability and Uncertainty

Probability is a mathematical concept that allows predictions to be made in the face of uncertainty. The probabilistic approach in this chapter defines two types of uncertainty that are associated with small-scale inherent variability, commonly is associated with relatively small (meters-length) scales. The first type is referred to as measurement error. It is irreducible and generally cannot be perfectly explained. For example, in terms of reservoir geology, it could be the small-scale randomness that is inherent in the reservoir, such as the product of the nonlinear processes of deposition and erosion. For example, the probability of finding a sediment bundle containing one depositional microstructure is simply the ratio of the volume of that microstructure to the volume of the reservoir. If this ratio does not vary systematically across the reservoir, then the volume of the depositional microstructure—say, ripple-laminated sandstone—can be predicted, albeit with the same type of uncertainty as is associated with a coin toss. That is, given a probability of 0.2, the volume represented by this microfacies would be expected to be 0.2 times the local volume of the reservoir; however, we expect the real volume to vary within the limits of our probabilistic estimate. At any point in the reservoir, the random depositional processes will permit the true volume to vary within limits tied to our probability model.

The second type of uncertainty is the small-scale geologic variability, and it stems from an incomplete sampling of reservoir topology, which can lead to an incomplete knowledge of the connectivity of flow units. The problem here is related to the nature of the flow paths (e.g., the connectivity of permeable and impermeable elements within the reservoir). It is not sufficient to know the probability that a permeable bed will occur in a borehole, nor is it enough to know the bed's average thickness. We also need to predict the existence and location of "choke points," permeability restrictions or barriers within the flow paths. These choke points constitute infinitesimal reservoir volume and seldom are documented by borehole data or seismic procedures. Yet, they can be critical to understanding flow because they connect local or even regional barriers. We know of their existence from well tests and pressure data and from our knowledge of outcrop or field analogs.

For both types of uncertainty, it is assumed that there is an underlying population that exactly defines the system of interest. In the case of small-scale inherent variability, the proportion of ripple drift lamination at any location is a fixed constant, whereas the proportion within the reservoir is those constant values integrated over the entire reservoir volume. Likewise, the degree of variability in this ratio across all possible locations can be calculated. The data from all possible locations can be assembled into a cumulative frequency distribution. This rarely is accomplished, but when it is, the results are termed a "census" and the knowledge of the population of interest is considered exhaustive.

In the absence of a census, frequency distributions can be approximated using mathematical formulae. Each distribution is completely definable in terms of a few constants (parameters). A Gaussian distribution, for instance, is completely defined by two parameters, population mean (M) and standard deviation (σ). Varying one or the other will produce a family of Gaussian distributions that vary in location on the real number line and in range.

Most statistical practice seeks to determine the parameters of a distribution without the cost and effort of a census, and does so by estimating the parameters on the basis of a relatively small set of carefully collected observations (valid sample). An unbiased sample will produce an unbiased estimate of the population parameters; however, it cannot be known with certainty whether any set of sample parameters is identical to the values of the population parameters. Additionally, a collection of such estimates (i.e., statistics) from repeated sampling will be centered on the values of the parameters. The spread of values around the parametric value commonly is an inverse function of the number of observations in a sample.

Not all frequency distributions are Gaussian, and the functions that generate estimates of their parameters therefore are different. The Gaussian distribution is special for several reasons, though. First, it is the distribution generated as the product of a simple nonlinear process (e.g., the velocity distribution in turbulent flow) and so is encountered often. Second, the distribution of sample means, whatever the nature of the population distribution, tends to approach the Gaussian as the number of samples increases. Third, statisticians have shown that for a wide range of non-Gaussian distributions, statistical inference of population parameters is robust in that failure of a Gaussian assumption by a surprisingly wide amount does not lead to large errors; however, there exist families of such "pathological" distributions for which common statistical inference never can be robust.

An assumption of an underlying population distribution is attractive because the number of observations needed to estimate the population parameters often is quite small. In addition, procedures exist to determine sample size as a function of the needed precision of the estimate of the parametric values. With this in mind, we now can discuss statistical inference.

Statistics may be described as the science of summarizing large volumes of information using a small set of parameters. Inferential statistics is the science of distinguishing the probable from the possible. [16] In petroleum engineering, for example, reserve estimations often are described probabilistically as proven, probable, or possible (P90, P50, or P10, respectively). The simplest kind of statistical inference is whether two samples are likely to have derived from the same population. The farther apart the sample means are (keeping σ constant), the smaller the chance that they were drawn from the same population. Importantly, though, the likelihood that this is so can be inferred, and probability is what enables the statistician to use information from samples to make such inferences or to describe the population from which the samples were obtained.

Sampling to characterize a frequency distribution is based on a set of assumptions, one of the most important of which is that the samples are mutually independent. If the assumption is violated, statistical inferences can be wrong. For example, traditional statistical inference becomes problematic when samples taken near one another tend to have more similar values than do samples taken farther apart. As the distance between sample locations increases, this dependence decreases until at some threshold distance, the samples become statistically independent. Under such circumstances, the data are said to be spatially correlated. The ability to draw statistical inference from spatially correlated data is the central premise of a subdiscipline known as spatial statistics, or geostatistics. [17]

A geostatistical method known as conditional simulation can provide probabilistic information about reservoir properties (e.g., gross rock volume and total and recoverable reserves) as a probability distribution function (pdf). The conditional simulation results can be summarized in a specific way to determine the probability regarding some aspect of the reservoir (e.g., of exceeding, or not, an economic threshold). These concepts are covered later in this chapter in Sec. 2.2.17.

Random Variables and Probability Distributions

One of the tasks faced by geoscientists and reservoir engineers is that of estimating a property at a location where there has been no measurement. This requires a model of how the phenomenon behaves at unsampled locations. Without a model, one only has the sample data, and no inference can be made about the values at unsampled locations. The underlying model and its behavior, then, are essential elements of the statistical and geostatistical framework. The geostatistical method clearly identifies the basis of its models, [18] whereas many other estimation methods (e.g., linear regression, inverse distance, or least squares) do not. Furthermore, as with traditional statistics, random variables and their probability distributions are the foundation of the geostatistical method.

Why a Probabilistic Approach? Deterministic models are applicable only when the processes that generated the data are known in enough detail that an accurate description of the entire population can be made from only a few sample values. Unfortunately, though, very few earth science processes are understood so well. Although we know the physics or chemistry of the fundamental processes (e.g., depositional mechanisms, tectonic processes, and diagenetic alterations), the variables we study in earth science data sets are the products of a vast number of complex interactions that are not fully quantifiable. For the great majority of earth-science data sets, we must accept some uncertainty about how the attribute behaves between sample locations. [18] Thus, a probabilistic approach is required. The random function model concept introduced in this section and the next recognizes this fundamental uncertainty and provides the tools not only to estimate values at unsampled locations, but also to measure the reliability of such estimates.

Random Variable Defined. A random variable is a numerical function defined over a sample space, whose values are generated randomly according to some probabilistic mechanism. [16][18] Throwing a die, for example, produces values randomly from the set {1, 2, 3, 4, 5, 6}. A coin toss also produces numbers randomly. If we designate "heads" as zero and "tails" as one, then we can draw randomly from the set {0, 1}. The set of outcomes and their corresponding probabilities is known as the probability law or the probability distribution. There are two classes of random variables, and their distinction is based on the sample interval associated with the measurement. The two classes are the discrete random variable and the continuous random variable. [15]

A discrete random variable is easily identified by examining the number and nature of the values it assumes. If the variable can assume only a finite or a countable infinity of values, it must be discrete. In most practical problems, discrete random variables represent count or classified data, such as point counts of minerals in a thin section or in a facies classification. The die throw and coin toss generate discrete random variables. The probability distribution of a discrete random variable is a formula, table, or graph that provides the probability associated with each value of the discrete random variable. There are four common discrete random variable probability distributions: binomial, negative binomial, Poisson, and hypergeometric.

Continuous random variables are associated with sample spaces that represent the infinitely large number of sample points contained on a line interval. The probabilistic model for the frequency distribution of a continuous random variable uses a mathematically defined curve, usually smooth, that is called the pdf (or probability distribution function). Although these distributions assume a variety of shapes, the curves for many random variables observed in nature approximate a bell shape. A variety of terms commonly are used to describe this bell-shaped curve. Practitioners could say that such curves are bell-shaped, Gaussian, or normal in their distribution. The terms are synonymous, and they informally refer only to the shape of the distribution. Most of the variables used in reservoir modeling (e.g., porosity, permeability, thickness) are continuous random variables, so it is important to describe their pdf.

Frequency Distributions of Continuous Variables. Frequency distributions of continuous random variables follow a theoretical pdf that can be represented by a continuous curve that can have a variety of shapes; however, rather than displaying the functions as curves, the distributions most often are displayed as histograms constructed from the data. Many statistical methods, including some geostatistical ones, are based on the frequent supposition that random variables follow a normal distribution. The central limit theorem (CLT) is the foundation of the normal pdf and warrants some discussion.

The CLT states that under general conditions, as the sample size increases, the sums and means of samples drawn from a population of any distribution will approximate a normal distribution. [16][17] The significance of the CLT is twofold. First, it explains why some measurements tend to approximate a normal distribution. Second, and more importantly, it simplifies and makes more precise the use of statistical inference. Many algorithms used to make estimations or simulations require knowledge about the pdf. If we can predict its shape or behavior accurately using only a few descriptive statistics that are representative of the population, then our estimates that are based on such predictions should be reliable. If the CLT is correct, then by knowing only the sample m and sample σ, the true pdf can be recreated precisely.

But the difficulty with the CLT and with most approximation methods is that we must have some idea of how large the sample size must be for the approximation to yield useful results. Unfortunately, we find ourselves in a circular-reasoning scenario. There is no clear-cut way to know the proper number of samples because knowing that depends on knowing the true population pdf in advance; hence, we assume the CLT is correct, and fortunately, as a practical matter, it does tend to behave well, even for small samples.

Properties of the Normal Distribution. The histogram of a normal distribution is symmetrical about the mean. Therefore, the mean, median, and mode of the normal distribution occur at the same point. This histogram is referred to as the normal frequency distribution. The following percentages of the total area of the normal frequency distribution lie within these limits:

  • m ± σ contains 68.26% of the data.
  • m ± 2 σ contains 95.46% of the data.
  • m ± 3 σ contains 99.73% of the data.

Directly calculating any portion of the area under the normal curve requires an integration of the normal distribution function. Fortunately, for those of us who have forgotten our calculus, this integration is available in tabular form. [17][19]

Application of the Normal Distribution. The normal frequency distribution is the most widely used distribution in statistics. There are three important applications of this distribution. [17]

  1. To determine whether, in fact, a given sample is normally distributed or not before applying certain tests. Most geostatistical simulation methods require that the data have a normal distribution (this is discussed more fully later in the chapter.). If they do not, the simulation results can be inaccurate and a transformation is required. To determine whether a sample comes from a normal distribution, we must calculate the expected frequencies for a normal curve of the same m and σ, and then compare them.
  2. To test underlying hypotheses about the nature of a phenomenon being studied.
  3. To make reliable predictions about the phenomenon. For geoscientists, this produces a better or an unbiased estimation of reservoir properties between the well data.

Data and Data Analysis

Often, the goal of a project is to provide a general description and analysis of a data set, and this can be done using classic statistical tools in a process commonly known as exploratory data analysis (EDA). EDA is an important precursor to a geostatistical reservoir-characterization study, which may include interpolation, or simulation and uncertainty assessment. Unfortunately, though, in many reservoir studies today (including routine mapping of attributes), EDA tends to be overlooked. It is absolutely necessary to understand the reservoir data fully, and doing so will be rewarded with much-improved results.

There is no single set of prescribed steps in EDA; one should follow one's instincts in explaining the behavior of the data. By using various EDA tools, not only will you gain a clearer understanding of your data, but you also will discover possible sources of errors. Errors are easily overlooked, especially in large data sets and when computers are involved, because we tend to become detached from the data. A thorough EDA fosters an intimate knowledge of the data, so that suspicious results are more easily noticed.

A number of excellent textbooks offer a more thorough discussion of EDA, [15][16][17][18][20] though here we will provide a only brief review of classic statistical methods it uses. These methods generally fall under the categories of univariate data analysis, multivariate data analysis, and normal-score transform.

Univariate Data Analysis

There are several ways to summarize a univariate (single-attribute) distribution. Often, simple descriptive statistics are computed, such as the sample mean and variance, and plotted on a corresponding histogram; however, such univariate statistics are very sensitive to extreme values (outliers) and, perhaps more importantly, do not provide any spatial information. Spatial information is the heart of a geostatistical study—a reef and a delta, for example, can have identical univariate statistical profiles, but the geographic distribution of their petrophysical properties will be completely different.

Frequency Tables and Histograms. Histograms (frequency distributions) are graphic representations that are based on a frequency table that records how often data values fall within certain intervals or classes. It is common to use a constant class width for a histogram, so that the height of each bar is proportional to the number of values within that class. When data are ranked in ascending order, they can be represented as a cumulative frequency histogram, which shows the total number of values below certain cutoffs, rather than the total number of values in each class.

Summary Statistics. The summary statistics are the univariate statistics as represented graphically by the histogram. They are grouped into four categories: measures of location, measures of spread, measures of shape, and the z-score statistic.

Measures of Location. The measures of location provide information about where the various parts of the data distribution lie, and are represented by the following:

  • Minimum—The smallest value.
  • Maximum—The largest value.
  • Mean—The arithmetic average of all data values. The mean is quite sensitive to outliers, and can be biased by a single, erratic value.
  • Median—The midpoint of all observed data values, when arranged in ascending or descending order. Half the values are above the median, and half are below. This statistic represents the 50th percentile of the cumulative frequency histogram and generally is not affected by an occasional erratic data point.
  • Mode—The most frequently occurring value in the data set. This value falls within the tallest bar on the histogram.
  • Quartiles—Each of the quarter-points of all observed data values. Quartiles represent the 25th, 50th and 75th percentiles on the cumulative frequency histogram.

Measures of Spread. Measures of spread describe the variability of the data values, and are represented by variance, standard deviation, and interquartile range. Variance is the average squared difference of the observed values from the mean. Because variance involves squared differences, it is very sensitive to outliers. Standard deviation is the square root of the variance, and often is used instead of variance because its units are the same as those of the attribute being described. Interquartile range is the difference between the upper (75th percentile) and lower (25th percentile) quartiles. Because this measure does not use the mean as the center of distribution, it is less sensitive to outliers.

Measures of Shape. All statistical data analyses require an assumption about the nature of the population probability distribution however, to assume this blindly can be dangerous. To determine that the data are consistent with the assumed distribution, one of several numerical indicators can be used. One approach is to use the method of moments. Moment measures are defined in the same way as moments in physics—the mean can be defined as the first moment about the origin, the standard deviation as the first moment about the mean (or the second about the origin), and so forth.

For its definition, the Gaussian distribution only requires the values of the first two moments. All higher moments have the value of zero or are constant for all members of the Gaussian family. The third Gaussian moment has been called "skewness" and the fourth "kurtosis." The skewness, for example, is equal to the averaged cubed difference between the data values and the mean, divided by the cubed root of the standard deviation. In all Gaussian distributions, the skewness = zero and the kurtosis = 3.0.

Because skewness is a cubic function, the computed values may be negative or positive, and which sign they are can indicate how the sample values depart from a Gaussian assumption. A positive value indicates that the distribution departs by being asymmetric about the mean and that it contains too many large values. The resultant histogram is asymmetric, with an elongated tail for the higher values. Positive skewness denotes that the center of gravity of the distribution (the mean) is above than the 50th percentile (the median). Negative skewness indicates the reverse (i.e., that the center of gravity is below the 50th percentile).

Often, a few large outliers will distort a statistical analysis, and frequently these represent errors in data entry; however, when there are a few such outliers among thousands of correct values, their existence cannot be surmised from the skewness value. Historically, data analysts have used the coefficient of variation (the ratio of σ to m) as a further check for the existence of outliers. A rule of thumb is that when the coefficient of variation exceeds the value of unity, the data should be checked for outliers and corrected, if necessary.

The z-score statistic transforms the data values into units of standard deviation and rescales the histogram with a mean of zero and a variance of 1.0. The z-score is a statistic used to screen for outliers or spurious data values. Absolute score values greater than a specified cutoff (e.g., 2.0 to 2.5 standard deviations) lie beyond the expected range of the zero mean. Statistically, such data are outliers and should be investigated carefully to determine whether they are erroneous data values or whether they represent a local anomaly in the reservoir property.

The z-score rescaling does not transform the shape of the original data histogram. If the histogram is skewed before the rescaling, it retains that shape after rescaling. The x-axis of the rescaled data is in terms of ± standard deviation units about the mean of zero.

Multivariate Data Analysis

Univariate statistics deals with only one variable at a time, but frequently we measure many variables on each sample. Working with multiple variables requires the use of multivariate data statistics. If we want to express the relationships between two variables (e.g., porosity and permeability), we do so through the regression study and correlation analysis. In a regression study, we estimate the relationship between two variables by expressing one as a linear (or nonlinear) function of the other. In correlation analysis we estimate how strongly two variables vary together. It is not always obvious which one—regression or correlation—should be used in a given problem. Indeed, practitioners often confuse these methods and their application, as does much of available statistics literature, so it is important to clearly distinguish these two methods from one another. [17] Looking at the purpose behind each will help make the distinction clear.

Regression Study. In regression analysis, the purpose is to describe the degree of dependency between two variables, X and Y, to predict Y (the dependent variable) on the basis of X (the independent variable). The general form of the equation is


where a = the Y-intercept; b = the slope of the function; X = the independent variable whose units are those of the X variable; and Y = the dependent variable whose units are those of the Y variable. Generally, b is known as the regression coefficient, and the function is called the regression equation. [17]

Correlation Analysis. Correlation analysis is similar to regression, but is less rigorous and is used to determine generally whether variables are interdependent; however, correlation analysis makes no distinction between dependent and independent variables, and one is not expressed as a function of the other. The correlation coefficient r is a statistic measuring the strength of the relation (linear or nonlinear) between all points of two or more variables. Its value lies between +1 (perfect, positive correlation) and –1 (perfect, inverse correlation). A value of zero indicates a random relation (no correlation). The square of the correlation coefficient r2, known as R-squared (coefficient of determination), is a measure of the proportion of the variation of one variable that can be explained by the other. [17]

In its classical form, regression analysis is strictly used to estimate the value at a single point. In this respect, it often is used incorrectly in the petroleum industry. The misuse is due to the requirement of sample independence not being recognized, a prerequisite for regression analysis. [17] It would be inappropriate to apply the regression equation spatially, when the data by their very nature are dependent. For example, the value at a given well can be highly correlated to a value in a nearby well. Indeed, the result of implementing regression analysis spatially can lead to highly erroneous results.

For example, seismic attributes often are used to estimate reservoir properties in the interwell region on the basis of a correlation between a property measured at the well (e.g., porosity) and a seismic attribute (e.g., acoustic impedance). Let us say that during regression and correlation analyses for these properties, we find that there is a –0.83 correlation between well-derived porosity and seismic acoustic impedance. Because of this strong correlation, we proceed with deriving the regression equation—well porosity = ab (seismic acoustic impedance)—to transform and then map our 3D seismic acoustic-impedance data into porosity, not recognizing that we have applied a point estimation method as a spatial estimator. Although the results may appear fine, b imparts a spatial linear bias (trend) in the estimates during the mapping process. This bias becomes apparent in an analysis of the residuals. Particularly unsettling is the misapplication of regression analysis to the mapping of permeability from porosity, a common practice! The topic of biasing is revisited in the Kriging Estimator subsection of Sec. 2.2.14, where we find kriging to be the spatial extension of regression analysis.

Covariance. The correlation coefficient represents a normalized covariance. Subtracting the mean and dividing the result by the standard deviation normalizes the covariance for each measurement. The transformed data then have a mean of zero and a standard deviation of unity. This data transformation restricts the range of the correlation coefficient to between –1 and +1, and sometimes it is more useful to use the raw (untransformed) data to calculate the relationship between two variables, so that the range of the correlation coefficient is unrestricted. The (untransformed) covariance formula is


where Xi and Yi = the measured values of variables X and Y, respectively, whose units are those of X and Y and whose i varies between the first and last measurements; mx and my = the sample means of X and Y, respectively; n = the number of X and Y data pairs; and Covx,y = the covariance of the variables X and Y.

The covariance is greatly affected by extreme pairs (outliers). This statistic also forms the foundation for the spatial covariance, which measures spatial correlation, and for its alternative construct, the variogram, which measures spatial dissimilarity. Rather than computing the covariance between two properties, we compute a statistic on one property measured at different locations.

Normal-Score Transform

Many statistical techniques assume that the data have an underlying Gaussian (normal) distribution. Geologic data usually do not, though, and typically require a numerical transformation to achieve one. The transformed data are used for some geostatistical analyses, and can be reconfigured to their original state in a back transform, if done correctly. Thus, it is a temporary state and is used for the convenience of satisfying the Gaussian assumption, when necessary. We can define zi as any raw data value (having any units or dimensions) at any location. If zi is transformed such that its distribution has a standard normal histogram of zero mean and unity variance, the transformed value is designated as yi . Such a transform is referred to as a normal-score transform, and the yi -values are called normal scores. The transform process is described in detail throughout the literature. [21][22][23]

The normal-score transform can transform any data-distribution shape into the Gaussian form. Once the data are transformed, subsequence data analysis, modeling, interpolation, and geostatistical simulation are performed in the transformed space. As previously mentioned, the final step requires a back-transformation into the original data space; however, the normal-score transform is an ad hoc procedure and is not underlain by a full panoply of proofs and theorems. Using this transform is justified only insofar as the back-transformation will recover the original data. The transform process becomes more accurate as the original data-distribution approaches the Gaussian. Sensitivity analysis has shown that the transform is robust for a variety of unimodal distributions, even those that are very different from the Gaussian. Pathological distributions include those that may be polymodal, with null frequencies common in the intermodal values. Such distributions actually are mixtures of unimodal distributions, and each mode should be transformed independently.

Some practitioners are uncomfortable with the amount of data manipulation involved in normal-score transformations. Yet, most of us would not hesitate to perform a logarithm transform of permeability data, for example, because doing so makes it easier to investigate the relationship between permeability and porosity—its justification is simply its mathematical convenience. The same is true of a normal-score transformation. First, parametric geostatistical simulation assumes that the data have a Gaussian distribution because of the nature of the algorithms. Also, as the earlier discussion on the normal probability distribution pointed out, if we know the mean and the variance, we have a perfectly predictable model (the data histogram), which makes interpolation and simulation easier. As long as no significant data are lost in the back-transformation, the process is benign.

Geostatistics: Into the Domain of the Spatially Correlated Variable

Consider the two images in Fig. 2.1, which for discussion purposes we can consider to be different porosity maps. Map A has a nearly random appearance, with only a hint of northwest/southeast preferential alignment of porosity. Map B shows a higher degree of continuity, with the axis of maximum correlation oriented northwest/southeast. Visually, these maps look quite different, but the descriptive-statistical measures, such as the mean and variance, are identical for each. This simple example illustrates the fact that classical statistical analysis cannot fully describe the nature of the data, especially when the data have a distinct, organized pattern. The main difference between classical statistics and geostatistics is the assumption of spatial dependency. That is, the location of with respect to one another plays an important role in the analysis, modeling, and estimation procedures.

Almost all the variables of interest in the petroleum industry (e.g., porosity, permeability, facies, saturation, net-to-gross, volumes) are the product of a number of complex physical and chemical processes that impose spatial dependency on the reservoir rocks. That is, they display distinct geographic patterns of continuity when mapped. Understanding and modeling the scales of continuity and directional information contained in the data is important for efficient hydrocarbon production. [24][25] Attributes that exhibit spatial continuity are called regionalized variables (RV) , and their spatial continuity can be described by a statistic called the semivariogram. The introduction of the semivariogram into an estimation algorithm has resulted in what now is called kriging. [1][2][26][27][28][29]

Properties of the RV and Random Functions

We have seen that two data sets can have the same univariate statistics, yet have very different spatial properties (Fig. 2.1). The complex attributes we deal with in the petroleum industry can be described by random functions that are combinations of regionalized and random variables. Regionalized variable theory is based on the statistics of the RV, [3][30][31] which differs from ordinary scalar random variables in its spatial continuity, yet still possesses the usual distribution statistics, such as mean and variance. The RV also differs in that it has a defined location. Two realizations (measurements) of an RV that differ in spatial location display in general a nonzero correlation; however, successive realizations of an ordinary scalar random variable are uncorrelated. [29] Therefore, RVs and spatial correlation analysis are used to quantify the distance- and direction-related spatial properties in a sample data set.

Semivariograms and Covariance. The semivariogram (informally and commonly known as the variogram or the experimental variogram) is a statistical measure of the rate of change with distance, for attributes that vary in space. [32] The formula for calculating the experimental variogram (Eq. 2.3) involves terms that depend on measurements at specific locations, namely zi and zi+h. Unlike the mean value of a data set, which is a single value, the variogram is a continuous function of distance, calculated from discrete measurements between pairs of points whose separation distance h falls within a given distance interval called a lag. The lag is a vector, involving not only the magnitude of the separation, but also the azimuth of the line through each data pair. For a given azimuth, the squared difference of the RV is calculated for each pair in a given lag. The average value for each lag then is calculated and plotted on a graph of the mean-squared difference against the lag intervals. As we shall see later, the variogram is required in many of the geostatistical methods for prediction or simulation away from control points.

Given a sample of observations, and provided that the mean is constant as a function of h, an unbiased estimator of the variogram is


where RTENOTITLE = the mean-squared difference between two measured variables whose separation interval is equal to a distance vector RTENOTITLE; np = the total number of sample pairs; RTENOTITLE = the measured value of a regionalized variable z at location RTENOTITLE, where i varies between the first and last measurements; and RTENOTITLE = the measured value of a regionalized variable RTENOTITLE at a location that is RTENOTITLE distance away, where i varies between the first and last measurements, with the same units as for the RV. Now compare Eq. 2.3 to Eq. 2.2, which computes the traditional covariance statistic.

Fig. 2.2 shows the anatomy of an experimental variogram. The variogram is a measure of dissimilarity with distance at each lag before reaching a constant value (the sill). The distance h at which the unbiased estimate γ(h) reaches the sill is called the range or the scale. If the variogram does not appear to go through the origin, but instead shows a discontinuity and intersects the ordinate, the value of γ(h) at the intersection is called the nugget.

In practice, the experimental variogram can be calculated and modeled, but it is implemented in the kriging algorithm using the covariance function in most software programs. If a covariance exists, the variogram and the covariance are related by


where RTENOTITLE = the variance defined as the mean-squared difference between pairs of measured variables whose separation interval is equal to a distance vector RTENOTITLE, with the same units as for the measured variables squared; Cov(0) = the minimum covariance in the covariance function; and CovRTENOTITLE = the mean covariance value between pairs of variables whose separation interval is equal to a distance of vector RTENOTITLE, with the same units as for the measured variables, squared.

The covariance can be viewed as an inverse variogram. As such, the covariance function measures increasing similarity (autocovariance) with distance, rather than dissimilarity with distance.


In general, statistics relies on some replication notation, whereby estimates can be derived and the variation and uncertainty of the estimate can be understood from repeated observations. In spatial analysis and estimation, the idea of stationarity is used to obtain the necessary replication. Stationarity is a property of the random function model, not of the underlying spatial distribution. In its strict sense, it requires the mean value of an RV to be constant between samples and independent of location. The four degrees of stationarity considered important in geostatistics are strict stationarity, second-order stationarity, the intrinsic hypothesis, and quasi-stationarity. [18][33]

Second-order stationarity assumes that the mean and covariance are the same between pairs of points that fall within the same separation interval, no matter which two points are chosen. Thus, in second-order stationarity, the covariance is dependent only on the distance between two points, and not on the location. Intrinsic stationarity assumes that the expected values of the mean and variance (variogram) are invariant with respect to location. The intrinsic hypothesis is sufficient for most geostatistical studies. Quasi-stationarity occurs when a trend can be seen at long separation intervals, and so the covariance is smaller than the scale of the trend and there is local stationarity. Second-order and intrinsic stationarity are necessary assumptions for achieving replication to estimate the dependence rules, which then allows us to make predictions and assess uncertainty. It is the spatial information particularly in these two degrees of stationarity—the similar distance between any two points in a given lag—that provides the replication. [18][33]

The equation for computing the experimental variogram (Eq. 2.3) involves terms that depend on locations [ RTENOTITLE and RTENOTITLE] that occur inside the field of the regionalized variable z. The averaging generally cancels out the dependency on location, such that the dependency is based solely on the distance h. This is an assumption, though, rather than a fact—geostatistics does not have a test to verify this assumption. Strict application of the variogram requires a constant mean. A gentle, systematic variation in the mean value, such as the increase in temperature with depth, is called a drift or a trend. A regionalized variable that exhibits a drift is termed "nonstationary"; conversely, a stationary regionalized variable is drift-free. Proper variogram computation requires the removal of the drift. There are several ways to do this, [32][34][35] but those methods are beyond the scope of this chapter.

Structural Analysis

Structural analysis (also called spatial-continuity analysis) is the computation and modeling of the patterns of spatial dependence that characterize a regionalized variable. This amounts to the study of the experimental variogram. Rarely is structural analysis the goal of a study; rather, it is the necessary first step before modeling the regionalized variable with kriging or conditional simulation techniques. Ultimately, both of these techniques will require covariance information that is supplied by the structural analysis. There are two main steps to performing structural analysis. First, compute the experimental measures of continuity (variogram), accounting for anisotropy and azimuth, and then model the experimental variograms with a continuous function.

Computing the Experimental Variogram. If data are sampled on a regular grid, then the calculation search strategy for data pairs is simple. Unfortunately, though, wells rarely are drilled on a regular grid, and so to extract as much information as possible, we search for pairs of wells in lag intervals (discussed above), rather than along a simple vector. Identifying the best lag interval sometimes is frustrating but generally is an iterative process through which much is learned about the data. Several excellent texts are available on this subject. [18][25][32][33]

Modeling the Experimental Variogram. The experimental variogram is calculated only along specific interdistance vectors that correspond to angular and distance bins. To use the experimental variogram, kriging and conditional simulation applications require a model of spatial dependency. This is because the kriging system of equations requires knowledge of the covariance function for all possible distances and azimuths, and because the model smoothes the experimental statistics and introduces spatial information.

Variogram modeling is not a curve-fitting exercise in the least-squares sense. Least-squares fitting of the experimental variogram points cannot ensure a function that yields a kriging variance ≥ 0, a condition known as positive definiteness. [18][33] Only a limited number of positive definite functions are known to fit the shapes of experimental variograms. Those most often used in commercial software are the spherical, exponential, Gaussian, and linear. A combination or nesting of functions is used to model complex experimental variograms.

Nugget Effect. As mentioned previously, often the experimental variogram shows a discontinuity at the origin, which is termed the "nugget effect" (see Fig. 2.2). The discontinuity is a manifestation of a relative high variance at the first lag. It is caused by irreducible measurement error inherent in the data and by small-scale geologic variability that is due to incomplete sampling of the reservoir topology. [18][33] It is our observation in reservoir geostatistics that the nugget effect is almost entirely due to small-scale geologic variability. The occurrence of a nugget effect is important and can be indicative of continuities that are smaller than the average well spacing (tantamount to the shortest lag interval). It is important to model the nugget, if present, because it will influence both kriging and conditional simulation. The latter will add more variance near wells, and the former will show more smoothing near wells.

Spatial Cross-covariance Analysis. Until now, only a single variable has been considered for spatial analysis (e.g., comparing porosity values to other nearby porosity values). The study of spatial relationships between two or more different variables requires the use of a cross-correlation statistic that defines the degree to which one variable is capable of explaining the behavior of another. The cross-variogram model is useful when performing cokriging or conditional cosimulation (e.g., integrating well and seismic data). The cross-variogram equation (Eq. 2.5) compares paired points that represent different variables, as in the case of the traditional covariance statistic (Eq. 2.2). Like the variogram, the cross-variogram is a continuous function of h. [18][33]

For example, if you want to estimate porosity from seismic acoustic impedance using a multivariate form of kriging, then it is necessary to compute and model the variograms for both attributes, as well as the cross-variogram. Consider that if RTENOTITLE = the RV measured at location RTENOTITLE; RTENOTITLE = the RV measured at RTENOTITLE plus some separation distance RTENOTITLE; RTENOTITLE = the secondary attribute measured at location RTENOTITLE; and RTENOTITLE = the secondary attribute measured at RTENOTITLE plus some separation distance RTENOTITLE, then the variograms of the primary and secondary attributes have the form of Eq. 2.3, whereas the cross-variogram between the primary and secondary attribute RTENOTITLE is estimated by


where RTENOTITLE = the cross-variance between two random variables z and t for a given distance vector RTENOTITLE, with the same units as for the measured variable squared; RTENOTITLE = the distance vector between pairs of points whose units are in terms of the coordinate system; np = the total number of sample pairs; RTENOTITLE = measured value of the regionalized variable at a location where RTENOTITLE varies between the first measurement and the last; RTENOTITLE = measured value of a regionalized variable z at a location RTENOTITLE distance away where RTENOTITLE varies between the first and last measurements, with the same units as for the regionalized variable; RTENOTITLE = a second regionalized variable at a location where RTENOTITLE varies between the first measurement and the last; and RTENOTITLE = the measured variable t at a location RTENOTITLE distance away where RTENOTITLE varies between the first and last measurements, with the same units as for the regionalized variable.

Unlike the variogram (covariance), the cross-variogram (cross-covariance) can take on negative values. This is observed when two variables are inversely correlated and have a negative correlation coefficient, such as in the porosity and acoustic impedance example given in this subsection.

Support Effect and Data Integration

Interestingly, geostatistics was not developed originally to solve interpolation problems, but to address what is called the "support effect." Support is the volume on which a sample is measured. Some attributes we measure can be considered point measurements, in that there is a location for each sample, such as well data. Others, such as well-test permeability, are measured over a volume and with the well location taken as the center of volume. A change in any of the characteristics of a support defines a new RV. Thus, an additional precaution in structural analysis is to make certain that the data for estimating the variogram relate to the same support. In general, larger supports tend to reduce variability, producing variograms with smaller sills and larger ranges. [32][36]

The support effect tends to be overlooked when combining information that comes from variables measured over different volumes (e.g., when combining well measurements and seismic attributes) or core and well-test permeabilities. Ignoring the support effect can impart a systematic bias to estimates. There are several procedures available to account for a change in support, and doing so is critical when integrating data measured by different methods (e.g., core data vs. wireline data vs. seismic attributes). Using the cross-variance model in a cokriging or cosimulation system is one way to provide estimates or simulated values that help to account for the support effects. [32][37][38] In general, geostatistical laws for managing support are well documented and need to be applied more rigorously in reservoir modeling, where it is often neglected. [38][39]

Kriging and Cokriging

Whether by hand or by computer, contouring data requires the use of an interpolation procedure. Many algorithms are used in computer mapping, all of which require criteria to be satisfied. Quite often, though, a computer-generated map is unsatisfactory because it does not look real—that is, it does not depict geology as we envision—and so usually requires editing. The geoscientist working by hand interpolates between data points, draws connecting contours, and intuitively smoothes and biases the isolines to construct a map on the basis of a geologic model.

A variety of commonly used mapping algorithms can be used for computer interpretation. The weights used during the interpolation usually are based on the distance of each control point (sample value) from the target location (grid node). Control points closer to the target receive the larger weights; however, if the data exhibit strong anisotropy, it does not necessarily hold true that the closest control point should receive the greatest weight. Rather, more distant control points along the axis of maximum correlation should have greater influence on the interpolated value. [25]

Kriging and cokriging are geostatistical techniques used for interpolation (mapping and contouring) purposes. Both methods are generalized forms of univariate and multivariate linear regression models, for estimation at a point, over an area, or within a volume. They are linear-weighted averaging methods, similar to other interpolation methods; however, their weights depend not only on distance, but also on the direction and orientation of the neighboring data to the unsampled location.

Kriging's practical strength as an interpolation method lies in its ability to capture anisotropy of the underlying geologic variables through the spatial covariance model, yielding maps that look more geologically plausible. From a statistical perspective, there is an additional strength when used under conditions of multivariate normality. The kriging algorithm has been given the acronym BLUE (best linear unbiased estimator). The fact that it is a linear, unbiased estimator is not unique. Essentially all the interpolation algorithms used for mapping have these attributes. It is the statement "best" that is intriguing—kriging is considered to be best because under these multivariate normality conditions, it minimizes the error variance associated with the estimate. That is to say, for the set of neighboring data used in an estimation, the set of weights calculated with kriging guarantees that no other method will generate a lower estimation error The "best" statement is clarified further by showing kriging's connection with the condition of unbiasedness. Unbiasedness is assumed for all the interpolation algorithms, and means simply that, when mathematically interpolating, we expect to overestimate as often as underestimate. Thus, we can visualize the error in estimation as a bell-shaped curve with a mean of zero. It is this assurance of a balanced distribution of error variance that is as narrow as possible that has earned kriging the accolade "best."

Kriging Estimator. Several types of kriging methods are available, and they are distinguishable by how the mean value is determined and used during the interpolation process. [21][29][33][40][41] The four most commonly used variations are:

  • Simple kriging—The global mean is known (or can be supplied by the user) and is held constant over the entire area of interpolation.
  • Ordinary kriging—The local mean varies and is re-estimated on the basis of nearby (or local) data values.
  • Kriging with an external drift—The shape of the map is described by a secondary variable called the drift term (e.g., seismic data, gravity data, magnetic data, and Landsat data). Although this method uses two variables, only one covariance model is required and the shape comes from a related 2D or 3D attribute that guides the interpolation of the primary attribute known only at discrete locations. A typical application is time-to-depth conversion, where the primary attribute (depth at the wells) acquires its shape from the secondary attribute, the external drift (two-way-travel time known on a 2D grid).
  • Indicator kriging—Estimates the probability of a discrete attribute at each grid node (e.g., lithology, productivity) and requires a binary coding of the attribute.

General Krigin System of Equations. The unknown value z0* is a linear combination of N values of a regionalized variable z(ui→):


where z0* = the value at an unsampled location to be estimated from a linear combination of n values of a regionalized variable zi, with the same units as for the regionalized variable; λi = the weight of the regionalized variable zi at a given location, unitless; and zi = the regionalized variable at a given location, with the same units as for the regionalized variable. The λi values are determined according to three criteria: the weights sum to 1.0; the estimate is unbiased; and the estimation variance is minimized.

The kriging system is a set of N+1 linear equations with N+1 unknowns. The system of equations generally is written in terms of covariance and is the result of minimizing the variance:


for all i = 1, n. In Eq. 2.7, C0i = the matrix notation for the covariance between a sample at a given location and the unsampled location (target), with the same units as for the regionalized variable, squared; Cij = the covariance between two measured samples at given locations, where i and j are the indices of the sample pair and vary between the first and last measurements, with the same units as for the regionalized variable, squared; λi = the undetermined weight assigned to a given sample, and for which i is the index of that sample and varies between the first and last measurements, unitless; and μ = a Lagrange multiplier. In matrix shorthand, the formula is written as


where C = covariance matrix constructed from measured sample pairs in a given neighborhood, with the same units as for the regionalized variable, squared; λ = the vector of undetermined weights for measured samples within a neighborhood, unitless; and c = the vector of covariances constructed from measured samples in a given neighborhood to the target location, with the same units as for the regionalized variable, squared. In addition, the kriging variance is written as


where RTENOTITLE = the kriging variance, whose units are in terms of the regionalized variable, squared; C00 = the sill of the variogram or the value of the covariance at a lag beyond which the covariance no longer changes (usually a value close to zero), with the same units as for the regionalized variable, squared; λi = the undetermined weight assigned to a given sample and varies between the first and last measurements, unitless; C0i = the covariance between a sample at a given location and the unsampled location (target), with the same units as for the regionalized variable, squared; and μ = a Lagrange multiplier.

Kriging Estimation Variance. As mentioned in the early part of this section, the estimation variance (Eq. 2.9) computed by all kriging programs provides the narrowest confidence interval about the estimate and thus produces the "best" estimate, but only under conditions of multivariate normality; however, if the distribution of data values departs from multivariate normality (a frequent occurrence), the kriging variance might not be precise and might only represent a measure of the relative goodness of the estimate. [42] The estimation error is more reliably obtained by post-processing conditional simulations, which is discussed later.

Search Ellipse. Computer-mapping programs must be instructed in how to gather and use control points during interpolation. Most modelers who are familiar with computer mapping know that this involves designing a search ellipse, or search neighborhood. The "unique neighborhood" ("global neighborhood") is the simplest type, uses all the data, and has an infinite radius. A "moving neighborhood" is a search strategy that uses only a portion of the total number of control points. Typically, the modeler must specify the radius length, the number of sectors, and the number of control points per sector. Most traditional mapping programs allow the user to specify only one radius defining a circular (isotropic) ellipse; however, during variographic analysis, we often find that the spatial model requires an anisotropic covariance function. Therefore, the search ellipse should be designed with radii lengths that are similar to the correlation scales (or their relative ratios), with its longest axis aligned with the direction of maximum correlation. [25]

Model Cross-validation. Cross-validation is a statistical method for testing the integrity of the covariance function and the neighborhood design. The procedure uses the kriging algorithm and allows us to compare estimated values at the measured value locations, just as one computes residuals between predicted and observed values in regression or analysis of variance. [18][21][32][33] The procedure is as follows:

  • For each sample in the data set, compute a kriged estimate at the same location, using the covariance function and neighborhood, but ignoring that sample value during re-estimation. Thus, each sample value of the data set has a re-estimated value and a kriging variance.
  • From this information, create a variety of displays. One useful display is a histogram of the standardized re-estimation error, which is the re-estimated value minus the measured values, divided by the kriging variance. A histogram that is symmetrical about its zero mean indicates a nonbiased model, which ensures that anywhere in the mapped area, interpolated values have an equal chance of being over or under the true value.
  • It also is instructive to cross-plot and compute the correlation between the standardized re-estimation errors and the re-estimated values. There should be no correlation; if there is, the results are biased. Unbiasedness cannot be ensured using traditional regression methods because of the slope term b in the equation.

Cokriging Estimator. Traditional regression methods only use data available at the target location and fail to use existing spatial correlations from secondary-data control points and the primary attribute to be estimated. [41] Cokriging methods are used to take advantage of the covariance between two or more regionalized variables that are related, and are appropriate when the main attribute of interest (well data) is sparse, but related secondary information (seismic) is abundant. Geostatistical-data-integration methods yield more-reliable reservoir models because they capitalize on the strengths of both data types.

The mutual spatial behavior of RVs is known as co-regionalization. Cokriging requires the same conditions to be satisfied as kriging does, but demands more variography, modeling, and computation time. Interestingly, though, the support effect is handled in cokriging through the cross-covariance function, such that many different types and scales of data are integrated easily. [18][21][33][40][41][43]

The common cokriging methods are multivariate extensions of the kriging system of equations, and use two or more additional attributes. The four common cokriging methods are simple, ordinary, collocated, and indicator.

Simple cokriging uses a multivariate spatial model and a related secondary 2D or 3D attribute to guide the interpolation of a primary attribute known only at well locations. The mean is specified explicitly and assumed to be a global constant.

Ordinary cokriging is similar to simple cokriging in that the mean is assumed constant, but it differs in that rather than being specified globally, the mean is estimated locally within each set of neighborhood control points.

Collocated cokriging can be simple or ordinary. It uses all the primary data, as before, but also includes one or more secondary-data values that have locations different than for the well data.

Indicator cokriging can be simple or ordinary, and estimates the probability or proportion of a discrete attribute at each grid node (e.g., lithology or productivity), and requires binary coding of the primary and secondary attributes. A modified form of indicator cokriging is indicator collocated cokriging.

General Cokriging System of Equations. The unknown value RTENOTITLE is a linear combination of N values of two or more regionalized variables. The general equation for two-variable cokriging for which input data are located only at wells is


where RTENOTITLE = the estimate at the grid node; λi = the undetermined weight assigned to the primary sample zi and varies between 0 and 100%; zi = the regionalized variable at a given location, with the same units as for the regionalized variable; tj = the secondary regionalized variable that is co-located with the primary regionalized variable zi, with the same units as for the secondary regionalized variable; and βj = the undetermined weight assigned to tj and varies between 0 and 100%.

The estimate is unbiased, with the estimation variance minimized, and requires the following:

  • The spatial covariance model of the primary attribute (e.g., well data).
  • The spatial covariance model of the secondary attribute (e.g., seismic data).
  • The spatial cross-covariance model of primary and secondary attributes (e.g., well and seismic data).

Collocated cokriging's advantage comes from its use of a secondary variable that is sampled at many locations, but is not necessarily coincident with the primary variable, as previously mentioned. One potential problem with any of the collated cokriging methods is that the secondary data can be overabundant relative to the primary variable. This generally is the case when a seismic attribute is used as the secondary variable, with relatively sparse well data used as the primary variable. To circumvent this problem, a reduced form of collocated cokriging is used, whereby the equation retains only the covariance of the primary variable zi and the seismic samples (or other secondary data) that are coincident only with the target node (variable t0 , below).[21]

This reduced form of the collocated cokriging estimator that retains only the covariance of the secondary data at the target node is written as


where RTENOTITLE = the estimate at the grid node; λi = the undetermined weight assigned to zi and varies between 0 and 100%; zi = the regionalized variable at a given location, with the same units as for the regionalized variable; t0 = the secondary regionalized variable located at the target location (grid node), with the same units as for the secondary variable; and β0 = the undetermined weight assigned to t0 and varies between zero and 100%.

The full version of the collocated cokriging estimator, as implemented by practitioners at the Centré de Géostatistiques (France),[44] is written as


where RTENOTITLE = the estimate at the grid node; λi = the undetermined weight assigned to zi and varies between 0 and 100%; zi = the regionalized variable at a given location, with the same units as for the regionalized variable; tj = the secondary regionalized variable co-located with zi, with the same units as for the secondary regionalized variable; βj = the undetermined weight assigned to tj and varies between 0 and 100%; t0 = the secondary variable located at the target location, with the same units as for the secondary variable; an β0 = the undetermined weights assigned to t0 and varies between 0 and 100%.

This form of the collocated cokriging estimator requires knowledge of the secondary variable tj at all primary-data locations zi, and at all grid nodes t0 being estimated. Both methods:

  • Require only the simple covariance model of the primary attributes zi. When primary data are sparse, the covariance model often is derived from the covariance model of the densely sampled secondary attribute tj.
  • Use all primary data (and all secondary data, for the full version located at the control points), according to the search criteria and the secondary data attribute located at the target grid node during estimation.
  • Incorporate the Markov-Bayes assumption,[21] which says that if the secondary attribute covariance model is assumed to be proportional to the primary attribute covariance model, then the correlation coefficient and the ratio of the secondary variance to the primary variance transforms a univariate covariance model into a multivariate covariance model.

The corresponding cokriging system for determining the weights is the same as that of the general kriging system. As with kriging, there is an associated estimation variance for each interpolated value, and the same precautions hold true.[33]

Static-Data Integration

So far, the discussion on cokriging has covered the geostatistical techniques available for data integration when a correlation exists between two or more attributes. Three factors are important for effective data integration[36][45]:

  • The scale factor (support effect)—Each type of measurement has a scale factor associated with it that is related to the volume on which the measurement is made. These factors are microscale [e.g., scanning electron microscope (SEM), thin section], macroscale (e.g., core, wireline, cross-well seismic), and megascale (e.g., well tests, flow meters, surface seismic). Each measures different scales of reservoir heterogeneity.
  • Measurement environment—Different methods provide measurements in the different environments. Core analyses typically are performed under ambient temperature and pressure, for example, whereas wireline methods provide indirect measurements at reservoir conditions.
  • Measurement type—Reservoir property measurements are either direct (e.g., core analyses) or indirect (e.g., wireline, well tests).

Before attempting any type of data integration, it is important to understand the nature of the information obtained for each type of attribute measured. Reconciling the three integration factors above is not easy, and there is no procedure or set of absolute rules to follow to do so. Thus, generating a single, consistent value is not straightforward, especially when integrating static and dynamic information, such as core and well-test permeabilities.

Cosentino[36] suggested that the best way to integrate data is stepwise, progressively integrating data from the smallest to the largest scale. An example of this approach is the following four-step process used by many investigators to integrate wireline and seismic data. [46][47][48][49][50][51][52][53][54] The basic idea is to use some attribute from the seismic volume (e.g., acoustic impedance) to guide the interpolation of a measured reservoir property (e.g., porosity).

  1. Calibration. Well data are high-resolution, depth-related, local information, whereas 3D seismic-volume data are spatially dense but vertically lower-resolution, time-related information. These two data types require both vertical and areal calibration, which typically are carried out using synthetic seismograms.
  2. Choosing the seismic attribute. Seismic processing and interpretation packages offer a plethora of seismic attributes of different natures: point or averaged-over-a-time windows that are based on amplitude, time, or a complex trace. Techniques are available to invert a 3D seismic amplitude cube into acoustic impedance, an attribute that is based on sonic velocity and rock density. The main objective in choosing a seismic attribute is to identify the one that works best as a predictor for the attribute of interest. Take care, though, because it is not unusual to find false correlations, ones that have no physical basis for the relationship. The probability of finding a false correlation increases with the number of seismic attributes considered and is inversely proportional to the number of data control points.[55]
  3. Prediction. In the prediction step, the areal distribution of the variable of interest is mapped by integrating the well data and the seismic attribute, using either linear or nonlinear regression models or preferably a geostatistical method such as collocated cokriging.
  4. Cross-validation. Cross-validation involves systematically removing wells from the data set, one by one, and re-estimating their values on the basis of the model selected. Although it is not always performed, cross-validation does provide a means of validating the contribution of the secondary information, which improves the prediction.

This stepwise approach can be used to integrate many data types. For example, it could be applied to core and wireline data. Cores are sparsely sampled relative to wireline logs. First, and importantly, depth-register the core data and logs. Next, select a well-log measurement that correlates to the core measurement, e.g., core-measured porosity, neutron density, or sonic transit time. For the best geologic representation, it is best to establish correlations on the basis of sequences and lithology. Once correlations are established, use regression or geostatistical methods to integrate the data. With the cokriging approach, core data are the primary data and the wireline measurements are the secondary information. As in the previous example, results can be cross-validated to demonstrate the contribution of the wireline data.

Static- and Dynamic-Data Integration

Regardless of the method, the HRGM typically is generated on the basis of static data only; however, this often presents a problem in the reservoir-simulation phase because modifications to the petrophysical properties are required for history matching. Conditioning of stochastic reservoir models with production data has been the focus of considerable research because of the perceived potential of such data integration. [36][56][57][58][59] A priori constraint and a posteriori constraint are the two main lines of this research.

A priori constraint is the traditional approach to integrating well tests into geostatistical models. In it, the well-test data are entered directly into the numerical processing as input information to generate a random permeability field that honors the average well-test data around the wells. In a posteriori constraint, the stochastic model is modified after its generation and is forced to honor the well-test data. This is an inverse problem, where the 3D petrophysical distribution is perturbed until it satisfies some posterior constraint.

Conditional Simulation and Uncertainty Estimation

Geostatistical simulation is well accepted in the petroleum industry today as a method for characterizing heterogeneous reservoirs. It often is preferable to traditional interpolation approaches, in part because it captures the heterogeneous character observed in many petroleum reservoirs and provides more accurate hydrocarbon reserve estimates. Geostatistical simulation methods preserve the variance observed in the data, instead of just the mean value, as in interpolation. Their stochastic approach allows calculation of many equally probable solutions (realizations), which can be post-processed to quantify and assess uncertainty.

Many practitioners are suspicious of stochastic methods—and even reject them outright—because natural processes that form reservoirs are not random. But geostatistical stochastic simulation is not a coin-toss experiment. Furthermore, while it is true that reservoirs are not products of random processes, it also is true that they have attributes that cause them to behave as if they were random. For example, physical and chemical processes modify reservoir characteristics from their original state, confounding our ability to make predictions even when we understand the processes. Such changes cause behavior that can be captured by stochastic principles. [18][24][60]

Recall that kriging is a deterministic method whose function has a unique solution and does not attempt to represent the actual variability of the studied attribute. The smoothing property of any interpolation algorithm replaces local detail with a good average value; however, the geologist and reservoir engineer are more interested in finer-scaled details of reservoir heterogeneity than in a map of local estimates of the mean value. Like the traditional deterministic approach, stochastic methods preserve hard data where known and soft data where informative. Unlike the deterministic approach, though, it provides geoscientists and reservoir engineers with many realizations. The kriged solution is the average of numerous realizations, and the variability in the different outcomes is a measure of uncertainty at any location. Thus, the standard deviation of all values simulated at each grid node is the quantification of uncertainty. [24][60]

What Do We Want From a Simulation?. Which simulation method we choose depends on what we want from a stochastic-modeling effort and—to great extent—the types of available data. Not all conditional simulation studies need a "Cadillac" method. For many, a "Volkswagen" serves the purpose well. Among the reasons for performing stochastic simulation, four important ones are: (1) to capture heterogeneity; (2) to simulate facies or petrophysical properties, or both; (3) to honor and integrate multiple data types; and (4) to quantify and assess uncertainty.

Principles of Stochastic Modeling. In general, conditional simulation requires that the basic input parameters—the spatial model (variograms) and the distribution of sample values (cumulative distribution function, or cdf)—remain constant within a given geologic interval and/or facies, from realization to realization. Typically, the structural and stratigraphic model (major structural surfaces and the discretized layers between them) remains fixed. Because each realization begins with a different, random seed number, each has a unique "random walk," or navigational path through the 3D volume. The random walk provides the simulation algorithm with the order of cells to be simulated, and is different from realization to realization; therefore, the results are different at unsampled locations, producing local changes in the distribution of facies and petrophysical properties in the interwell space. Note that selection of the same random seed always will reproduce the same random walk. This characteristic is for computational convenience. In practice, multiple realizations are performed at or close to the geologic scale, and not necessarily at the flow-simulation scale.

There are two basic categories of conditional simulation methods: pixel-based and object-based. Pixel-based methods operate on one pixel at a time. Object-based methods operate on groups of pixels that are connected and arranged to represent genetic shapes of geologic features. Pixel-based methods can be used with either continuous or categorical data, whereas object-based methods are used only with categorical data.

Pixel-Based Model. A pixel-based model assumes that the variable to be simulated is a realization of a continuous (Gaussian) random function. Using the spatial model, search ellipse, and control data, a pixel-based method simulates values grid node by grid node. Some of the most popular pixel-based algorithms are: turning bands, sequential Gaussian, sequential indicator, truncated Gaussian, and simulated annealing. Each method can produce a range of realizations that capture the uncertainty of an RV, and so the method choice here will be based on the goals and on data types and their availability. The pixel-based method works best in the presence of facies associations that vary smoothly across the reservoir, as often is the case in deltaic or shallow marine reservoirs. No assumption is made about the shape of the sedimentary bodies. This method is preferred when the net-to-gross ratio is high.[36]

Object-Based (Boolean) Model. The algorithms for a Boolean model generate spatial distributions of sedimentary bodies (channels, crevasse splays, reefs, etc.) whose parameters (orientation, sinuosity, length, width, etc.) can be inferred from the assumed depositional model, seismic data, outcrops, and even well-test interpretations. The object-based method simulates many grid nodes at one time, superimposing geometries (e.g., sheets, discs, or sinusoids) onto a background that typically is a shaly lithofacies. The method used for object modeling is referred to as the marked-point process. [61][62][63] This method generally works best with a low net-to-gross ratio and widely spaced wells.[36]

It is difficult to say a priori which type of method, pixel- or object-based, is best. Although we have observed that it is common for practitioners to have a personal bias toward one method or the other, the basis for such preference often is not well founded. For example, method selection often is based on familiarity with the procedure or what is available in a given software package. Additionally, we have observed that geologists tend to prefer object-based methods because they often produce realizations that appear "crisp" and realistic (e.g., deltas look like deltas and channels look like channels). Engineers tend toward pixel-based methods because they require less descriptive input and often are computationally faster than object-based methods. In fact, both methods are computationally sound and offer unique characteristics.

From a practical point of view, the methods can be combined to achieve an effective model. For example, the practitioner could model a transition sequence from offshore to onshore with a pixel-based method, and then superimpose a channel system from a delta using an object-based method.

To help in selecting the appropriate methods, we can offer the following for consideration:

  1. Pixel-based methods are more forgiving in that they require fewer assumptions about the data. As such, the error variance generated from a set of realizations generally will be higher than with object-based modeling. We surmise that pixel-based models create a larger space of uncertainty and therefore are more likely to "capture" the correct solution, even if the initial conceptual geologic model is incorrect.
  2. Object-based models work best when the data density and net-to-gross ratio are low. A set of object realizations will generate a lower error variance than that from a pixel-based model, and thus can be said to have a smaller space of uncertainty. When the conceptual geologic model is strongly supported by the data and is well understood, the method is highly successful; however, because more assumptions about the data are required, the resultant realizations are less forgiving (i.e., if the original conceptual model is wrong, there is little chance of it successfully capturing the correct solution).

Stochastic-Simulation Methods. There are several theoretically sound and practically tested conditional-simulation approaches, and choosing one can be bewildering and daunting for a novice to stochastic methods. parametric-simulation techniques assume that the data have a Gaussian distribution, so that a transform of the data typically is prerequisite. Note that indicator data do not undergo such a transform. Furthermore, data transformations are not required in the object-based method, where only indicator data are required. The steps of parametric simulation are:

  • Perform a normal-score data transform from the z -space to the y -space.
  • Compute and model the variogram (covariance) on the y- normal scores.
  • Perform multiple simulations of the y- normal scores on a grid or within a volume.
  • Back-transform the simulated y- normal scores to the simulated z -values.
  • Post-process the multiple simulations to assess uncertainty.

Turning-Bands Simulation. In turning bands simulation (one of the earliest simulation methods), first the data are kriged and then unconditional simulations are created using a set of randomly distributed bands, or lines. The general procedure is as follows:

  1. Raw data values are kriged to a regular grid.
  2. Numerous random lines (bands) with various azimuths are generated around a centroid located at the grid or volume center. The modeler controls the number of lines.
  3. Unconditional simulations of normal-score transformed data are performed along each line using the transformed-data histogram and variogram.
  4. Values along the lines are linearly interpolated to grid nodes—the more lines, the less interpolation.
  5. Unconditional interpolated values are back interpolated to well locations.
  6. Unconditional interpolated values and well locations are kriged.
  7. The grid of unconditional interpolated values is subtracted from the grid of unconditional results in step 5. This creates a residual map with a value of zero at the well locations.
  8. The residuals are back interpolated from the y-space to the z-space.
  9. The back-interpolated residuals from step 8 are added to the original kriged map from step 1.
  10. The result is a grid or volume of data values that reproduce both the mean of the raw data and the variance.

For a more complete explanation, see Mantoglou and Wilson. [64]

Sequential Simulation. Three sequential-simulation procedures use the same basic algorithm for different data types: sequential Gaussian simulation (SGS) simulates continuous variables, such as petrophysical properties; sequential indicator simulation (SIS) simulates discrete variables, using SGS methodology to create a grid of zeros and ones; and Bayesian indicator simulation (a newer form of SIS) allows direct integration of seismic attributes, and uses a combination of classification and indicator methods.

As described in the literature, the general sequential simulation process is[21][23]:

  1. Perform a normal-score transformation of the raw data.
  2. Randomly select a node that is not yet simulated in the grid.
  3. Estimate the local conditional probability distribution function (lcpd) for the residuals at the selected node. The residuals can be calculated by subtracting the grid of an unconditional simulation from a kriged grid of the unconditional values sampled at the geographic coordinates of the wells.
  4. Create a newly simulated value by adding together the randomly drawn residual value and the mean of the transformed data.
  5. Include the newly simulated value in the set of conditioning data, within a specified radius of the new target location. This ensures that closely spaced values have the correct short-scale correlation.
  6. Repeat until all grid nodes have a simulated value.

As with turning-bands simulation, each time a new random walk is defined, a new and different result will occur. In this case, though, the lcpd is updated continually by the previously simulated values.

Truncated Gaussian Simulation. Developed by Institut Français du Pétrole and the Centré de Géostatistiques, [65][66][67] the truncated Gaussian algorithm simulates lithofacies directly by using a set of cutoffs that partition Gaussian field. The cutoffs commonly are generated from facies proportions calculated from well data. One simple method for doing this is by calculating the vertical proportion curve. A vertical proportion curve is a stacked bar diagram that represents the percentage of all facies in all wells in the study area. The proportion of each facies is calculated layer by layer, where a layer is a subdivision of the reservoir unit being modeled. Truncated Gaussian simulation involves first generating a continuous variable, and then applying cutoffs (termed the Gaussian thresholds) during the simulation. This method works exceptionally well with transitional facies, such as those from foreshore to upper shoreface to lower shoreface to offshore.

Simulated-Annealing Simulation. The simulated-annealing simulation method is borrowed from metallurgy. In metallurgy, when fusing two pieces of metal, the attachment zone is heated to a temperature at which the molecular structure can be rearranged. As the metal cools again, the molecular structure changes and a bond forms where the two pieces of metal are joined. In transferring this idea to stochastic modeling, one produces an initial realization, introduces some particular conditions (new pieces of "metal" to be fused), then "heats" and "cools" it to rearrange the pixels (or objects) to match (band) the particular conditions introduced.

The simulated-annealing simulation method constructs the reservoir model through an iterative, trial-and-error process, and does not use an explicit random-function model. It can be used as the basis for both pixel- and object-based simulation and in either case the simulated image is formulated as an optimization process. [14][60] For example, our desired result might be an image of a sand/shale model with a 70% net-to-gross ratio, an average shale length of 60 m, and average shale thicknesses of 10 m. The starting image has pixels (objects) arranged randomly, with the correct global proportion of sand and shale, but with an incorrect net-to-gross relationship that stems from the completely random assignment of the sand and shale. In addition, the average shale lengths and widths are not correct. During the computation, the annealing algorithm modifies the initial image by swapping information from node to node, and determining whether or not an individual swap improves the realization. This method allows some swap "improvements" to be rejected to prevent the occurrence of a "local minimum," a well-known problem with annealing techniques. The swapping process continues until a final image is produced that matches the statistics of the input data.

The simulated-annealing process produces excellent results, but can be inefficient because millions of perturbations may be required to arrive at the desired image. Nevertheless, the availability of faster computers with more memory are making simulated-annealing simulation methods more attractive.[14][60] They are particularly desirable for integrating dynamic data, such as production histories and well tests, to ensure history matching from any realization.

Boolean simulation methods create reservoir models based on objects that have a genetic significance, rather than building up models one elementary node or pixel at a time. [14][61][62][63][68][69] To use Boolean methods, select a basic shape for each depositional facies that describes its geometry. For example, you might model channels that look sinuous in map view and half-elliptical in cross section, or deltas that look like triangular wedges in map view. The modeler must specify the proportions of shapes in the final model and choose parameters that describe the shapes. Some algorithms have rules describing how geologic bodies are positioned relative to each other. For example, can the objects cross each other like braided streams, or attach like splays and channels? Do objects repulse or attract, or must there be a minimum distance between the shapes?

Once the shape distribution parameters and position rules are chosen, the six-step simulation is performed:

  1. Fill the reservoir model background with a lithofacies, such as shale.
  2. Randomly select a starting point in the model.
  3. Randomly select one lithofacies shape and draw it according to the shape, size, and orientation rules.
  4. Check to see whether the shape conflicts with any control data or with previously simulated shapes. If it does, reject it and go back to step 3.
  5. Check to see whether the global lithofacies proportions are correct. If they are not, return to step 2.
  6. Use pixel-based methods to simulate petrophysical properties within the geologic bodies.

If control data must be honored, objects are simulated at control points first, before simulating the interwell region. Be sure that there are no conflicts with known stratigraphic and lithologic sequences at the well locations.

Boolean techniques currently are of interest to the petroleum industry, and geologists find object modeling particularly satisfactory because the shapes created are based on geometries and facies relationships that have been measured, and because the images look geologically realistic. Criticisms of Boolean modeling are that it requires a large number of input parameters and that a priori knowledge is needed to select the parameter values. Furthermore, in the past, Boolean-type algorithms could not always honor all the control data because the algorithms are not strict simulators of shape, and often require changes to the facies proportions of object sizes to complete a given realization; however, new technology has greatly alleviated this problem. A number of research/academic institutions and vendors continue to research better ways to implement these algorithms.

Uncertainty Quantification and Assessment. As discussed earlier, stochastic models allow quantification of uncertainty related to the geologic description. An infinite number of possible realizations are obtained simply by modifying the seed number. Comparing a sufficiently large number of these realizations then will provide a measurement of the uncertainty associated with the assumed geologic model.

Since the late 1990s, real modeling experiences have generated questions concerning the workflow, aimed at capturing and developing a working definition for uncertainty. Currently, for example, the accepted implementation of stochastic modeling involves four general steps: Produce multiple realizations of the fine-scaled model by changing the seed number; rank the results on the basis of some criteria; upscale the P10, P50, and P90 results; and flow-simulate the above three solutions to capture the range of the uncertainty.

One common criticism of the above workflow is that the actual space of uncertainty is much larger than that explored by the variability of the random function. This concept often is overlooked; we tend to identify uncertainty on the basis of stochastic simulations that fix all input parameters, and simply to change the seed value from simulation to simulation. Our focus so far has been on understanding the random function and the uncertainty around it; now we will turn our attention to other important uncertainty around it, we now turn our attention to other important uncertainties that deal with the geologic model. There are at least five sources of uncertainty in a typical reservoir model[14][24][36][70][71][72][73]:

  1. Uncertainty in data quality and interpretation—Basic data-measurement errors generally are ignored and the data are treated as error-free when modeling the reservoir. The same holds true for interpreted data, such as potential errors in picking seismic time horizons.
  2. Uncertainty in the structural model—The structural model virtually always is created using a deterministic approach (e.g., when seismic time horizons are converted to depth using an uncertain velocity model, then treated as fixed surfaces in the geologic model.) The structural model is one of the largest sources of uncertainty, and greatly affects volumetric calculations.
  3. Uncertainty in the stratigraphic model—Uncertainty in the stratigraphic model is related to the reliability of sequence determination and correlation through the wells.
  4. Uncertainty in the stochastic model choice and its parameters—If the same geologic sequence were modeled using different algorithms, each stochastic simulation method would yield different results and explore a different part of the space of uncertainty; however, the space sampled by the different algorithms would overlap significantly.
  5. Uncertainty from multiple realizations—The uncertainty reported in most stochastic-modeling studies usually is from multiple realizations. Compared to the other sources for error, it often is small (but not always).

The above sources of uncertainty are listed in decreasing order of significance. It should be somewhat intuitive that change in the data, structural model, or sequence stratigraphic model is likely to have a greater impact on the reservoir model than will changing a single parameter of a given random function; however, it is important to understand that when building a geologic model, all levels of uncertainty must be accounted for to achieve an accurate assessment of the space of uncertainty. The next section further defines the order of uncertainty and the relationship between scenarios and realizations.

Realizations, Scenarios, and the Space of Uncertainty. To account for the different sources of uncertainty, we can classify uncertainty into three orders on the basis of the degree of impact on the reservoir model.

  • First-order uncertainty stems from major changes in modeling assumptions, such as changing the data or the structural model, testing of different depositional models, or changing the petrophysical model.
  • Second-order uncertainty is caused by small changes to the parameter of the random function (e.g., the sill, range, or model of the variogram), to the cdfs, or to the stochastic-model choice.
  • Third-order uncertainty results from changes only in the interwell space that are caused by the algorithm selected with its parameterization, and by changing the seed number from realization to realization. (parameters that control first- and second-order uncertainties remain fixed.) We refer to the larger changes in uncertainty that are represented by the first and second order as scenarios, whereas we call the third order changes realizations.

Quite often, scenario modeling plays a key role early in the field development phase, when data are sparse and operators hold fundamental differences of opinion about the appropriate conceptual depositional model to use. For example, they might differ on whether the system is wave-dominated fluvial deltaic or tidal-dominated. Although both are deltaic, the overprint of the oceanic processes changes the strike of dominant sand bodies, making it either parallel to the coast (wave-dominated) or perpendicular to the coast (tidal-dominated). Note that each scenario model will require its own set of realizations. Thus, measuring the total space of uncertainty will require that multiple scenarios and their respective realizations be constructed and their ranges of uncertainty be pooled together. This may cause a great deal of computational effort, but it often is a mistake to assume that by modeling the scenarios without their respective realizations, the critical spectrum of uncertainty will be captured.

Static Displays of Uncertainty. The most common way to visualize uncertainty is as a static view, through summary statistics maps prepared from a suite of realizations. Several types of displays are conventionally are used for this purpose. These include maps of mean and median; maps of spread; uncertainty/probability/risk maps; isoprobability maps; and display of multiple realizations.

Maps of Mean and Median. Mean and median maps are based on the average and median of a given number of conditional simulations. At each cell, the program computes the average or median value for the values at that location from all simulations. When the number of simulations is large, the map converges to the kriged solution. Fig. 2.3 shows the mean and median maps computed from 200 sequential Gaussian simulations (200 SGS) of net pay. Mean and median maps are identical when they are based on the statistics of an infinite number of simulations.

Maps of Spread. Spread is most commonly displayed as a map of the standard deviation (error) at each grid cell, computed from all input maps (Fig. 2.4).

Maps of Uncertainty, Probability, and Risk. The probability of meeting or exceeding a user-specified threshold at each grid cell is displayed using maps of uncertainty, probability, and risk, in which grid-cell values range from 0 to 100%. Fig. 2.5 illustrates schematically how such a map is generated during post-processing. Such maps are used to assess the risk (uncertainty) on the basis of an economic criterion. For example, we might determine from recent drilling whether a well is commercial or noncommercial, on the basis of the probability of encountering 8 m of net pay. In Fig. 2.5, the vertical straight line at 8 m represents the threshold, whereas the left-hand and right-hand curved lines represent the probability distributions of values simulated at two grid nodes for proposed well locations. The left-hand curve shows only a 35% chance of encountering 8 m or more of net pay at its well location, but the right-hand curve shows its location has a 75% chance of meeting this economic criterion. During post-processing, the modeler fixes the threshold, and the program computes the probability of meeting or exceeding it. Fig. 2.6 shows risk maps for thresholds of 8 m and 16 m of net pay. Such maps are very useful for identifying well locations and probable nonreservoir areas.

Isoprobability Maps. Rather than holding the threshold constant, sometimes it is preferable to freeze the probability value and create maps that correspond to local quantiles or percentiles. Such maps are known as isoprobability maps, and their grid-cell values relate to the attribute, rather than to probability. Fig. 2.7 shows a probability plot, from which isoprobability maps are created. In this example, the uncertainty assessment involves modeling net pay. An isoprobability map created at the tenth percentile shows that net pay has a 10% chance of being thick, because the modeled variable is meters; this represents a pessimistic view of the hydrocarbon potential. Conversely, a ninetieth percentile is an optimistic picture of the hydrocarbon potential, showing that there is only a 10% chance that the net pay is very thin. Fig. 2.8 shows isoprobability maps of net pay created for P10 and P90.

Multiple Realizations. Another common format for illustrating uncertainty is simply to display several possible realizations that as a group represent the broad spectrum of outcomes (Fig. 2.9).

Dynamic Displays of Uncertainty. In his discussions on the use of animated (dynamic) displays of the realizations, Srivastava[74] emphasizes that, like a well-produced animated cartoon, a visually effective dynamic display of uncertainty should present the realizations in a gradual and logically successive and informative way (i.e., not simply in random succession). The key to a successful dynamic rendering of uncertainty is finding a way to show all the realizations. Separate displays of realizations would be a step in the right direction, but workstation screen space is limited, as is the patience of the modeler scrolling through the images. The scrolling process could be automated, presenting the realizations one at a time in rapid succession, but successive realizations might differ considerably in local detail, and create a flickering display that would be more irritating than illuminating.[74]

To make an appealing dynamic display, each realization must be treated as if it were a frame from a movie clip, with each successive frame showing minimal differene from the last, so that the brain interprets the minor transitions as gradual movement. Srivastava[74] suggests that it is possible to create an acceptable animation with as few as 10 frames/second.

For example, were we to create realizations of a structural top and make an animated display, the movie clip might show a perpetually undulating dome-shaped feature. The surface would appear fixed at well locations, and thus stable. Moving away from the well control, the surface would appear to swell and heave; small depressions would grow into large, deep holes and valleys. Small bumps would grow into hills and then into larger hills with ridges, and then would shrink again. In several seconds of animation, the modeler could see several hundred realizations, and the variability (uncertainty) in the suite of realizations would show in the magnitude of changes over time. The modeler instantly would recognize areas that show little movement (more certainty) vs. those that wax and wane (greater uncertainty).

In making dynamic displays, though, one must overcome the fact that most geostatistical simulation algorithms have no way to produce a series of realizations that show minimal differences, which animation requires. Recall that to generate a new simulated result, we select a new seed number for the random number generator and rerun the program. But any change in the seed number, however small, produces unpredictable changes in the appearance of the resulting simulation, which means that consecutive seed numbers, for example, could produce simulations that are quite different in their local variability. There is no way, then, to predict which seed numbers will produce similar-enough realizations to construct consecutive frames for the animation. One way we overcome this problem is with a simulation technique known as probability field simulation (P-field simulation), although this technique has its own advantages and disadvantages.

P-field simulation is a conditional simulation technique developed by Froidevaux[75] and Srivastava.[76] The advantage of P-field simulation is that it is ideally suited to the problem of uncertainty animation. It sets up a matrix of probabilities with dimensions that are identical to the 2D or 3D project grid. The spatial model controls the pattern of the probabilities on the matrix; that is, a value of high probably most likely will be adjacent to another high value, and such values could be arranged along the direction of continuity if the variogram is anisotropic. To generate a new realization, one only needs to shift the values in the probability matrix by one row or one column. It is not necessary to generate a new random seed and a subsequent random walk. The result is a small, incremental change from the previous realization. Interestingly, any conditional simulation method that uses unconditional simulation as an intermediate step can be configured to produce a set of ordered realizations that show a progression of small changes from one to the next and that can be animated.[76]

If given viable techniques, geoscientists and reservoir engineers will be able to design and plan more at the workstation. Consider how much more effective a reservoir-development team could be using such techniques—creating a dynamic display for a well in the thickest part of the reservoir, for example, and discovering from the animation that there exists no predictably economically viable net pay. Such techniques could bring reality to the reservoir-modeling process and ensure that planning does not take place on a single, arbitrary model that cannot duplicate reality.[74]

A Geostatistical Reservoir-Modeling Workflow

The final step in the reservoir-characterization process, reservoir modeling, consists of building multiple HRGMs, and upscaling and performing flow simulations.

The HRGM integrates multidisciplinary data. The reservoir architecture is built using depth-converted seismic horizons and stratigraphic data, and the geometries and facies of the depositional environments are simulated within this framework, using information from boreholes, cores, seismic lines, and outcrops. Petrophysical properties (porosity φ, permeability k, and water saturation Sw) are distributed within the appropriate facies. The high-resolution models may contain tens of millions of grid cells and require upscaling before flow simulation.

Part of the reservoir-modeling process can use geostatistical methods that consider the spatial nature of geologic data. Specifically, geostatistical reservoir characterization allows for valid construction of a pdf of hydrocarbon volumes and other key reservoir properties. From such distributions, proven, probable, and possible scenarios (P10, P50, and P90) can be selected and upscaled for presentation to full-field fluid-flow simulators for engineering analysis.

This section outlines the data requirements and steps necessary to create an HRGM that uses geostatistical technology, for input to a fluid-flow simulator. Creating such a model involves integrating the structural, stratigraphic, and petrophysical model into a 3D numerical representation of the reservoir. The high-resolution model typically must be upscaled before importing it to the fluid-flow simulator.

Basic Elements of a Reservoir-Characterization Study

The result of reservoir characterization is the creation of the shared-earth model.[77] This type of model is important in four ways: It is a central part of the reservoir-characterization team's work; it ensures cross-disciplinary data consistency; it allows each discipline to measure how its own interpretation fits with other specialty models; and it leads to a more-consistent global model.

The shared-earth model provides for efficient updating of the critical information necessary for 3D modeling. Exploration and production both benefit from such cross-validation and data integration. The ten basic elements (steps) of the shared-earth model are:

  1. Basic interpretation.
  2. Premodeling organization.
  3. Data preparation and formatting.
  4. EDA.
  5. 3D structural model.
  6. 3D sedimentary model.
  7. 3D petrophysical model.
  8. Upscaled 3D dynamic model.
  9. Flow simulation.
  10. Model assumptions iteration and updating.

Basic Interpretation. At the basic interpretation stage, the discipline expert interprets the primary data, whereas the geologist and geophysicist collaborate on the structure model and sequence definition. The petrophysicist, geologist, and reservoir engineer also decide on how to determine petrophysical properties.

Premodeling Organization. From the premodeling organization step onward, the reservoir modeling requires a multidisciplinary team approach. Premodeling organization involves determining project goals and then designing a workflow (Fig. 2.10) to monitor the progress of the reservoir study. The workflow provides a system of checks and balances that ensures that the necessary data are ready at the appropriate times in the project. It also guarantees that an integrated approach is followed, because each step requires the interaction of multiple disciplines.

Data Preparation and Formatting. Data preparation and formatting is critical to the accuracy of the results and often is extremely time consuming because different software packages import/export data in different formats. The data-preparation process does serve as a quality-control step, though—incomplete, inaccurate, or missing data yield poor results.

EDA. A key step in any study is EDA. In this step, quality control of the data is critical because the relationships between key variables and general data characteristics are identified using various tools, including both classical and geostatistical methods.

3D Structural Modeling. The 3D structural model (Fig. 2.11) shows the larger framework of the reservoir, and consists of two primary elements, the bounding surfaces and the faults. At this stage, there is no volume between the bounding surfaces. Seismic surfaces generally are converted to depth and adjusted for the well tops of the key marker surfaces (e.g., sequence boundaries, parasequence boundaries, and maximum flooding surfaces). Important aspects of fault-building are: (1) fault geometry; (2) fault-to-fault relations; (3) that fault-to-bounding-surface contacts are a perfect match (this prevents later problems during flow simulation); and (4) that the modeling is restricted to those faults that directly impact fluid flow.

3D Sedimentary Modeling. The 3D sedimentary model has two main elements: the definition of the internal stratigraphic layering (bedding geometry) and the definition of the facies. In this step, the sedimentary model must be defined in terms of sequence stratigraphy.

Stratigraphic Model. Once the 3D structural framework has been created and the sequences (reservoir units) identified, the internal bedding geometries are defined within each sequence. Proportional bedding (Fig. 2.12) assumes an equal number of layers everywhere within the sequence, regardless of layer thickness. parallel bedding is a layering scheme that is parallel with other reservoir surfaces (Fig. 2.12) (e.g., parallel to the top or base) or to an internal or external surface marker. Combinations of these layering schemes allow the geologist to depict the depositional bedding geometries more realistically. The layering schemes define lines of correlation inside the model and are used to laterally connect facies and, ultimately, the petrophysical properties.

Facies Model. So far, the 3D stratigraphic model has depicted the structural configuration and internal layering, but the volume still is empty (Fig. 2.13). The next step is to model the facies and simulate their 3D spatial distribution. Facies are defined from cores either as electrofacies (i.e., based on rock properties) or as depositional facies, and are coded using discrete integer values. Each sequence and its associated facies and petrophysical properties are modeled independently of the other sequences. The modeling honors the vertical and lateral facies relationships with the depositional environment. The three data required for facies simulation (Fig. 2.14) are facies codes along the well; porosity and permeability, where available; and markers that indicate the well depths that correspond to the structural surface used to define the overall geometry. Whether a pixel-based (Fig. 2.15) or Boolean (Fig. 2.16) simulation method is chosen depends highly on the data and the depositional environments. Facies modeling is not mandatory, and some studies bypass it, proceeding directly to the simulation of petrophysical properties.

3D Petrophysical Modeling. After facies modeling, the petrophysical properties (net-to-gross, φ, k, Sw) are assigned on a facies-by-facies basis, using the sedimentary model as a template. Figs. 2.15 and 2.16 show the porosity distribution within a pixel simulation and Boolean simulation, respectively. Volumetrics are computed once the petrophysical properties have been simulated.

Upscaled 3D Dynamic Modeling. Up to this phase, successive models were built by adding information to the previous one. The high-resolution petrophysical model often has many millions of grid cells. Current software and computer limitations for the simulation require us to simplify (upscale) the high-resolution model before going to the flow simulator. The upscaling takes into account the coarsening of the grid (x,y) dimensions (Fig. 2.17 ) and defines stratigraphic layering, sequence-by-sequence. Upscaling the grid geometry also upscales the petrophysical properties.

Flow Simulation. Flow simulation is an important step in the shared-earth model, and is the process through which the model assumptions are iterated and updated. The next section discusses the iteration and updating of model assumptions, but as a topic, flow simulation itself is beyond the scope of this chapter. For a complete discussion of numerical-reservoir simulation, see Chap. 17 in the Reservoir Engineering and Petrophysics volume of this Handbook.

Model Assumption Iteration and Updating. It is unlikely that a history match will be achieved on the first flow simulation. A global history might be matched, but locally, wells are unlikely to match the pressure and production history. At this point, it is necessary to revisit the model assumptions, for which the reservoir engineer's input is needed. From looking at the flow-simulation results, the reservoir engineer can offer valuable insight into which parameters are the most sensitive to flow, and how to tune the parameters. Rather than adjusting the relative permeability curves to match history, it may be better to change the modeling parameters and generate an updated reservoir model. Local adjustments may provide a history match at a point in time, but the model might still be a poor predictor of future performance.

Benefits of an Integrated 3D Reservoir Model

In today's economy, a model of sufficient detail is required to make the best reservoir-management decisions, accounting for uncertainty, toward the most efficient recovery of hydrocarbons. Six motivating factors for integrated 3D reservoir modeling are:

  • The need for reliable estimates of gross rock volume and original hydrocarbons in place, which are important for determining the economics of producing the reservoir, determining production facility requirements, ranking development opportunities of alternative reservoirs, and allocating equity shares with partners.
  • That a good reservoir model is invaluable in selecting well locations and well designs (vertical, horizontal, multilateral), and in assessing the number of wells needed to produce the reservoir economically.
  • The need to assess bypassed pay potential and the value of infill drilling.
  • That the integration of all static and dynamic data in a consistent framework ensures a better model.
  • That modern portfolio management includes risk assessment. A stochastic-modeling method helps quantify uncertainty in the HRGMs.
  • That flow simulation and production performance is not based on the probable (P50) scenario. Geostatistical methods allow us to test several scenarios and select realizations representing the P10, P50, and P90 outcomes, for example.

Practical Considerations and Rules of Thumb

Tables 2.1 through 2.4 summarize the practical considerations for each of the major topics discussed so far in this chapter.

Geologic and Reservoir-Engineering Issues

Reservoir modeling involves several geologic and engineering elements, though these actually are difficult to categorize strictly as either geologic or engineering because of the cause/effect relationship they have with one another. For example, the modeling scale traditionally is thought of as a geologic element, but it affects the amount of upscaling required, and so becomes an engineering element, as well. Likewise, stochastic-modeling methods provide many plausible images of the reservoir, thus generating multiple realizations and scenarios, an operation generally performed by the geoscientist. Ranking and selecting these realizations and scenarios are the final steps before going to the flow simulator and are performed as a joint effort, but a stochastic-modeling study puts onto the reservoir engineer the additional burden of history-matching multiple models, which can be a major undertaking. Thus, the modeling team would be wise to select a limited appropriate set of models for this effort.

Modeling Scale

Geologists want the highest-resolution geologic model possible, much to the dismay, though, of the reservoir engineer tasked with creating it. Consider, for example, a 5 × 8 km reservoir that is 400 m thick. If the geologist decides to create grid cells that are 50 × 50 m horizontally and 1 m vertically, the resultant 3D grid will have more than 6.5 million cells. Although this is not an especially large stochastic model, it is larger than most reservoir engineers are willing to flow-simulate. Thus, the high-resolution 3D grid is coarsened and the petrophysical properties upscaled to a few-hundred-thousand-cell dynamic model whose size is more compatible with the flow simulator.

Stochastic modeling at a coarser scale often is suggested by reservoir engineers, who tend to consider such an approach as equally valid and far more practical than creating an HRGM and then upscaling it before flow simulation. The argument for coarsening embraces the idea that upscaling decimates the geologic and petrophysical detail, and so questions the need to model at a scale finer than that of the flow-simulation grid to begin with. Furthermore, the upscaling process is fraught with assumptions, and because not all upscaling techniques are equal, they can bias the results to the selected method.

The results of these two approaches are not equivalent, and the volume support issue at least partly can explain the concerns about performing a conditional simulation at too coarse a scale. A coarse-scale simulation may save time, but it relies on a priori knowledge about vertical layering (e.g., predefined flow units), the optimum horizontal cell size, and the petrophysical property distributions, parameters that neither should be predefined arbitrarily nor based solely on volumetric and material-balance calculations. Unfortunately, the high-resolution stochastic-modeling approach usually will increase the cycle time of a reservoir study because there is more work to be done. Constructing a stochastic model at too coarse a resolution often has proved inaccurate. It can imply a blind assumption that the geologic detail in a higher-resolution model is unnecessary. That said, there is a limit to the capabilities of a flow simulator, and an overly high-resolution model serves no one's interest. The key is to strike a balance that keeps the project objectives clearly in mind.

The most advantageous workflow uses an appropriate fine-scale model as a guide when defining the flow units and constructing the flow-simulation grid. Both approaches undoubtedly will decrease or "smooth" the existing heterogeneity, but modeling first at a finer scale can produce a more informative, upscaled grid that preserves the critical heterogeneity. Considering the example above regarding the size of the model, a typical flow-simulation grid cell easily could contain 450 000 m 3 (300 × 150 × 10 m) of rock. It is unrealistic to think that such a volume of heterogeneous rock could be represented adequately by a single value of porosity and one permeability value in each of the x, y, and z domains. It would be prudent to optimize the upscaled flow grid on a detailed geologic model where coarser cells could be used for nonreservoir layers and finer cells used in key reservoir layers, where the effects of heterogeneity are important. Note, however, that this does not mean that finer-scale models are the norm—detailed models must be justified.

Regridding and Upscaling

Regridding and upscaling generally are considered part of the current workflow for reservoir characterization as a way of coarsening a 3D grid for numerical reservoir simulation, which makes the flow-simulation computational process achievable in a reasonable time frame; however, with increased computer power and innovative approaches to flow simulation, upscaling may not be an issue in the near future. During the 1990s, the model size of flow-simulation projects grew from 50,000 cells to more than five million because of the availability of faster computers and parallel-processing technology, and there is little doubt that this trend will continue. Additionally, fast streamline simulators capable of handling million-node models or more are becoming very popular. Though they are somewhat more limited than full-field flow simulators, they are sufficient for resolving many reservoir questions.

Multiple Simulations and Scenarios

Stochastic-modeling methods provide many plausible images of the reservoir. Recall that realizations are the result of sampling the uncertainty by changing only the seed number from simulation to simulation, whereas scenarios reflect major changes in the assumptions about the depositional model or the structural framework. Thus, each scenario can have multiple realizations, with the possibility of generating hundreds of models that honor the available data.

Ranking the Stochastic Models

Obviously, no company can afford the time or expense to history-match all the realizations generated in a stochastic-modeling study, nor is it necessary to do so. The primary reason for creating all these models is to quantify uncertainty in the geologic model to make better reservoir-management decisions. The fast streamline simulators offer a means to screen and rank realizations relatively quickly on the basis of some agreed-upon criteria. Once the realizations are ranked, the simulations most closely corresponding to, for example, a P10, P50, and P90 are upscaled and imported to the flow simulator, so that flow simulation and production performance no longer are based only on the most likely (P50) scenario. The P10 and P90 results provide a measure of uncertainty in future production performance and are error bars on the P50 model. Narrow error bars offer more confidence in the predicted performance, but wide error bars indicate more uncertainty and more potential risk.

Volume Support

Data in the petroleum industry comes from a variety of sources, measured across many different scales, e.g., core permeability vs. well-test permeability, or seismic data vs. well data. In practice, such data often are integrated without regard to the vast differences in their measurement scales, which is problematic. An excellent example of this is the traditional calibration of core porosity to log-derived porosity. Core-plug measurements of porosity often are aligned with log data over a common interval by using a mathematical adjustment, such as some form of linear or nonlinear regression. In this example, the assumption is that the core data are more precise because porosity is measured in the laboratory on a small rock volume. Although the procedure is mathematically possible, it is not necessarily appropriate because it ignores the issue of support, the rock volume on which porosity is measured, which should make any such comparison suspect, particularly when data are being interpolated. In this case, the mathematical calibration procedure is tantamount to shifting, stretching, and squeezing the data to achieve a better fit. In other physical sciences, such as ore mining, computations of variables measured on a different support are not performed unless adjustment is made for volume support because not doing so can lead to very costly errors in ore reserve estimates. In the petroleum industry, though, the change of support typically is not addressed.

Consider another example of the volume-support effect when estimating porosity in a typical grid cell using a common computer gridding algorithm. The size of a 2D grid cell often is determined using a rule of thumb of one well per grid cell. A grid mesh consisting of 50-m2 grid cells would be considered a fine grid mesh, and interpolating porosity values from boreholes over such a fine mesh would not be given a second thought. The depth of investigation of a neutron-porosity log is approximately 0.08 m, and the area of resolution around the borehole is approximately 0.02 m2. During an interpolation of a porosity measurement over an area of rock of 0.02 m2, the porosity value is implicitly assumed to be the same as for area of 2500 m2. With a grid cell of 300 × 150 m, the assumption extends over an area of 45 000 m2. This problem becomes increasingly more dramatic in 3D.

Geostatistics attempts to combine appropriately data that have been measured at different scales, using a calibration method that categorizes covariables as hard data and soft data. These terms often are used informally, their difference generally being relative, tied to the degree of interpretation required to derive the data values and their scale of measurement. In the earlier example regarding core-plug measurements of porosity, the core porosity is the hard datum and the log porosity is the soft datum. Well data, too, are considered hard data, whereas seismic data are soft data. There are two good reasons for calibration[78]: First, it forces the proponent of any piece of information to document its origin and its relevance to the modeling effort, and second, it allows the impact of that information on the final reservoir forecast to be assessed through sensitivity analysis or by using geostatistical stochastic simulation.

In practice, the hard data are honored exactly in the numerical model, ignoring measurement error, whereas the soft data are honored less precisely and serve as a guide during the interpolation or simulation process outside the range of influence of the hard-data values. The degree to which the soft data are honored depends partially on the strength of their correlation to the hard data. The support of the soft data is assumed the same as in the traditional linear (or nonlinear) regression method. The degree of influence by the soft data affects only the weights used in the estimation or simulation procedure and is a function of a cross-covariance model that considers correlation, distance, and orientation. The scale estimated or simulated through the calibration process is that of the hard data.

Most geostatistical software packages can take into account hard-data measurement errors, but soft-data errors typically are much more difficult to measure and quantify. For example, calibration data often are so sparse that a proper calibration is impossible; for these cases, the calibration is borrowed from an analog. Any a priori decision to freeze the uncalibrated data must be made by all members of the reservoir-modeling team.

Geostatistical Technology Into the Next Decade

Geostatistics is a rapidly evolving branch of applied statistics and mathematics for creating realistic models and quantifying uncertainty. There is much discussion among users and vendors on how to advance this technology. Some of the key issues are soft-data integration, uncertainty quantification, advances in computer technology, and the use of intelligent workflow managers (IWMs).

Soft-Data Integration

Integrating soft data (e.g., seismic attributes or well-test data) into the reservoir model is possible using geostatistical methods, but the results are not always satisfactory. There are no reliable methods to integrate seismic attributes in true 3D, mainly because of the low vertical resolution of the seismic information. Selecting appropriate variables from the plethora of seismic attributes is both overwhelming and confusing. Most of the attributes are highly correlated simply because they are derivatives of one another, but there is no guarantee that their correlation with a reservoir property is meaningful. In the future, additional tools will become available to screen and rank seismic attributes, perhaps making linear and nonlinear combinations, for use in reservoir modeling. Other techniques will be developed to help us better understand the relationships between reservoir properties and seismic attributes. Also likely are advances in static- and dynamic-data integration. The earlier in the modeling process that dynamic data are integrated, the easier it is for the reservoir engineer to make a history match.

Uncertainty Quantification

Uncertainty can be quantified through stochastic simulation, but to avoid underestimating the uncertainties, this approach must be used with an understanding of the modeling assumptions. Many uncertainty results have been arrived at simply by calculating the variability between realizations from fixed parameters, neglecting uncertainties in the modeling parameters themselves. Such an approach is sure to lead to an optimistic view of uncertainty, though. The Great Uncertainty Study shows that it is possible to run "brute force" conditional simulations incorporating all uncertainty[6] This study goes to the extremes of quantifying uncertainty on structural, sedimentary, and fluid parameters, and combining global realizations into 58 global 3D realizations that are upscaled and flow-simulated. It also takes into account model parameter uncertainty, using Monte-Carlo sampling of some of the input parameters, e.g., net-to-gross of each layer for each realization. Dubrule[24] questions whether this approach is destined to become an industry standard. The answer is probably "no" because it is important to understand the assumptions behind a geostatistical uncertainty analysis. Not all parameter modifications have the same impact on the amount of uncertainty captured. As previously discussed, first-order parameter changes have the greatest impact on uncertainty, whereas realizations (third order) probably measure the smallest space of uncertainty. In the future, a suite of tools is likely to be developed to help evaluate parameter sensitivities and their impact on sampling the space of uncertainty. Once the key parameters are determined, perturbations of them will generate a suite of simulations for uncertainty analysis.

Advances in Computer Technology

Computer [especially personal computer (PC)] technology continues to advance rapidly. The advent of 2.0-gigahertz and higher microprocessors having two gigabytes and more of random access memory (RAM) have caused PCs to rival the UNIX world, and at much less expense. Such advances have encouraged most reservoir-modeling-software vendors to port their code to run under the Microsoft Windows NT, Microsoft XP, or Linux operating systems. Some of the newer companies offer geostatistical reservoir-modeling software that operates only on a PC. These trends are likely to continue.

Parallel Processing. Several vendors offer parallel fluid-flow numerical-simulation code, but this technology seems not currently to be used to its fullest capabilities. Faster, less expensive computers should advance the use of parallel-processing technology, not only for fluid-flow simulation, but also for generating a significant number of 3D stochastic models to quantify uncertainty. This technology will accelerate the ranking, selection, and upscaling of the multiple realizations.

Large Stochastic Models. As computing power increases and parallel processing becomes the norm, larger and larger models will be flow-simulated. Possibly, the need to upscale eventually will be bypassed.

Faster History Matching. When coupled with the use of computers that use parallel processing, flow simulating more geologically realistic models on the basis of integrated static and dynamic data should accelerate the history-matching process.


For novice modelers, the introduction to geostatistical-modeling software can be an overwhelming experience, so that they tend to choose the path of least resistance—accepting recommended defaults—when creating a stochastic model. Some commercial software has reasonably well-thought-out workflow managers to assist the user; however, what we should see in the future is IWMs. The IWM will interview the user, asking questions about the quantity, quality, and types of data, and assumptions about the depositional environments. The interview will lead the user through a series of panels recommending various options. These options may then lead to other questions.


Geostatistics is a powerful technology toolbox for reservoir characterization, and stochastic modeling clearly is not a simple game of tossing a coin for predicting what is present in the interwell space. Furthermore, numerical flow simulation and production performance are not based on the "most likely" (P90) scenario, and geostatistical methods allow us to test several scenarios and to select realizations representing, for example, the P10, P50, and P90 outcomes.

A good reservoir model is invaluable in selecting well locations and well designs (vertical, horizontal, and multilateral) and in assessing not only the number of wells needed to produce the reservoir economically, but also the bypassed pay potential and the value of infill drilling. A model of sufficient detail is required to make the best reservoir-management decisions, accounting for uncertainty, for the most efficient recovery of hydrocarbons.

Developing an integrated 3D reservoir model answers this requirement because it provides a reliable way to estimate the gross rock volume and original hydrocarbons in place to determine the economics of producing the reservoir, determine production facility requirements, rank development opportunities of alternative reservoirs, and allocate equity shares with partners. Modern portfolio management includes risk assessment, and a stochastic-modeling method helps to quantify uncertainty in the HRGMs. Using geostatistical reservoir-modeling technologies to integrate all static and dynamic data in a consistent framework ensures a better model.


a = the Y-intercept
b = the slope of the function
c = the vector of covariances constructed from measured samples in a given neighborhood to the target location, with the same units as for the regionalized variable, squared
C = the covariance matrix constructed from measured sample pairs in a given neighborhood, with the same units as for the regionalized variable, squared
Cij = the covariance between two measured samples at a given location, where i and j are the indices of the sample pairs and vary between the first and last measurements; units are those of the regionalized variables, squared
RTENOTITLE = the mean covariance value between pairs of values whose separation interval is equal to a distance of vector RTENOTITLE; units are those of the measured variable, squared
Cov(0) = the minimum covariance in the covariance function; units are those of the measured variables, squared
Covx,y = covariance (untransformed) of variables X and Y
C00 = the sill of the variogram or the value of the covariance at a lag beyond which the covariance no longer changes (usually a value close to zero), with the same units as for the regionalized variable, squared
C0i = the matrix notation for the covariance between the unsampled location and a neighbor
h = separation distance
RTENOTITLE = the lag or distance vector between pairs of points whose units are terms of the coordinate system
k = permeability, md or darcies
mx = sample mean of X, units are those of the X variable
my = sample mean of Y, units are those of the Y variable
M = population mean
n = the total number of samples
np = the total number of sample pairs
N = generic nomenclature referring to a total number of items, such as samples or equations in a matrix
r = correlation coefficient
r2 = coefficient of determination
Sw = water saturation, the percentage of the total fluid that is attributable to water; fraction or percent
ti = a second regionalized variable at a location where i varies between the first and last measurements
RTENOTITLE = the measured variable ti at a location RTENOTITLE distance away, where i varies between the first and last measurements, with the same units as for the regionalized variable
tj = the secondary regionalized variable co-located with the primary regionalized variable at a given location, with the same units as for the secondary regionalized variable
t0 = the secondary variable located at the target location, with the same units as for the secondary regionalized variable
X = the independent variable whose units are those of the X variable
Xi = the measured value of variable X, with i varying between the first and last measurements; units are those of the X variable
yi = data value in transformed space at a specific location
Y = the dependent variable whose units are those of the Y variable
Yi = the measured value of variable Y, with i varying between the first and last measurements; units are those of the Y variable
z = the regionalized variable (primary attribute)
RTENOTITLE = the measured value of a regionalized variable z at location RTENOTITLE, where i varies between the first and last measurements
RTENOTITLE = the measured value of a regionalized variable RTENOTITLE at a location RTENOTITLE distance away from RTENOTITLE
RTENOTITLE = the value at an unsampled location to be estimated from a linear combination of n values of a regionalized variable RTENOTITLE; units are those of the regionalized variable
βj = the undetermined weight assigned to tj , and varies between 0 and 100%
β0 = the undermined weight assigned to t0, and varies between 0 and 100%
RTENOTITLE = the mean-squared difference between two measured variables whose separation interval is equal to a distance vector RTENOTITLE
RTENOTITLE = the cross-variance function between two random variables z and t for a given distance vector, RTENOTITLE,with the same units as for the measured variable, squared
λ = the vector of undetermined weights for measured samples within a neighborhood, unitless
λi = the weight of the regionalized variable zi at a given location i, unitless
μ = a Lagrange multiplier
σ = standard deviation
RTENOTITLE = the kriging variance, with units that are in terms of the regionalized variable, squared
φ = porosity, fraction or percent


We would like to acknowledge the following individuals whose help greatly enhanced the depth and clarity in this chapter. Their contributions are immeasurable and we are very grateful for their time and effort. These are Robert Ehrlich, Salt Lake City, Utah; Mohan Srivastava, Toronto, Canada; Don Myers, Tucson, Arizona; and Sanjay Srinivasan, University of Calgary, Calgary. We also thank the book editors, Hal Warner and Larry Lake, who kept us honest, in line, and more or less on time. Finally, we owe a huge debt of gratitude to Amanda Van Beuren, wordsmith par excellence. Without her, our text may have gone undeciphered for centuries.


  1. 1.0 1.1 Krige, D.G. 1951. A Statistical Approach to Some Basic Mine Evaluation Problems on the Witwatersrand. J. Chem. Metall. Min. Soc. 52: 119.
  2. 2.0 2.1 Sichel, H.S. 1952. New methods in the statistical evaluation of mine sampling data. Trans. of the London Inst. of Mining & Metallurgy 61: 261-288.
  3. 3.0 3.1 Matheron, G. 1962. Traite de Geostatistique Appliquee, tome 1, 111. Paris, France: Editions Technip.
  4. Deutsch, C.V. and Meehan, D.N. 1996. Geostatistical Techniques for Improved Reservoir Management: Methodology. Hart's Petroleum Engineer Intl. (March): 21.
  5. Beattie, C.I., Mills, B.R., and Mayo, V.A. 1998. Development Drilling of the Tawila Field, Yemen, Based on Three-Dimensional Reservoir Modeling and Simulation. Presented at the SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, 27-30 September 1998. SPE-49272-MS.
  6. 6.0 6.1 Lia, O. et al. 1997. Uncertainties in Reservoir Production Forecasts. AAPG Bulletin 81 (5): 775.
  7. King, M.J. and Mansfield, M. 1999. Flow Simulation of Geologic Models. SPE Res Eval & Eng 2 (4): 351-367. SPE-57469-PA.
  8. Tyler, N. and Gholston, J.K. 1988. Heterogeneous Submarine Fan Reservoirs, Permian Spraberry Trend, West Texas. Report of Investigations No. 171, Texas Bureau of Economic Geology, Austin, Texas, 37.
  9. Tyler, N. and Finley, R.J. 1991. Architecture Controls on the Recovery of Hydrocarbons from Sandstone Reservoirs. SEPM Concepts in Sedimentology Paleontology 3: 1–5.
  10. Weber, K.J. 1986. How Heterogeneity Affects Oil Recovery. Reservoir Characterization, L.W. Lake and H.B. Carroll Jr. ed., 487-544. Orlando, Florida: Academic Press.
  11. Srivastava, R.M. 1994. An Overview of Stochastic Methods for Reservoir Simulation. Stochastic Modeling and Geostatistics, J.M. Yarus and R.L. Chambers ed., Ch. 3, 3-16. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  12. King, M.J. and Mansfield, M. 1997. Flow Simulation of Geologic Models. Presented at the SPE Annual Technical Conference and Exhibition, San Antonio, Texas, 5-8 October 1997. SPE-38877-MS.
  13. Botton-Dumay, R., Cogrel, Y.M., Massonnat, G.J. et al. 1997. Realistic Methodology for Permeability Modelling Used for Conserving Heterogeneity During Assisted History Matching-Applied to a Turbiditic Reservoir Field Case. Presented at the SPE Annual Technical Conference and Exhibition, San Antonio, Texas, 5-8 October 1997. SPE-38677-MS.
  14. 14.0 14.1 14.2 14.3 14.4 Srinivasan, S. and Caers, J. 2000. Conditioning reservoir models to dynamic data - A forward modeling perspective. Presented at the SPE Annual Technical Conference and Exhibition, Dallas, Texas, 1-4 October 2000. SPE-62941-MS.
  15. 15.0 15.1 15.2 Davis, J.C. 1986. Statistics and Data Analysis in Geology, second edition. New York City: John Wiley & Sons.
  16. 16.0 16.1 16.2 16.3 Mendenhall, W. 1971. Introduction to Probability and Statistics, Belmont, California: Wadsworth Publishing Co.
  17. 17.0 17.1 17.2 17.3 17.4 17.5 17.6 17.7 17.8 Sokal, R.R. and Rohlf, J.F. 1969. Biometry, San Francisco, California: W.H. Freeman and Co.
  18. 18.00 18.01 18.02 18.03 18.04 18.05 18.06 18.07 18.08 18.09 18.10 18.11 18.12 Isaaks, E.H. and Srivastava, R.M. 1989. An Introduction to Applied Geostatistics, Oxford, UK: Oxford University Press.
  19. Hald, A. 1952. Statistical Tables and Formulas, Table I. New York City: John Wiley & Sons, Inc.
  20. Koch, G.S. Jr. and Link, R.F. 1981. Statistical Analysis of Geological Data, 850. New York City: Dover Publications, Inc.
  21. 21.0 21.1 21.2 21.3 21.4 21.5 21.6 Deutsch, C.V. and Journel, A.G. 1998. GSLIB: Geostatistical Software Library and User's Guide, second edition. Oxford, UK: Oxford University Press.
  22. Olea, R.A. 1991. Geostatistical Glossary and Multilingual Dictionary. Oxford, UK: Oxford University Press.
  23. 23.0 23.1 Deutsch, C.V. 2002. Geostatistics Reservoir Modeling. Oxford, UK: Oxford University Press.
  24. 24.0 24.1 24.2 24.3 24.4 Dubrule, O. 1998. Geostatistics in Petroleum Geology. AAPG Course Note Series #38, 52. Tulsa, Oklahoma: AAPG.
  25. 25.0 25.1 25.2 25.3 Chambers, R.L., Yarus, J.M., and Hird, K.B. 2000. Petroleum Geostatistics for the Nongeostatistician—Part 1. The Leading Edge (May): 474.
  26. Watermeyer, G.A. 1919. Applications of the Theory of Probability in the Determination of Ore Reserves. J. Chem. Metall. Min. Soc. 19: 97.
  27. Truscott, S.J. 1929. The Computation of the Probable Value of Ore Reserves from Assay Results. Trans. Inst. Min. Metall. 38: 482.
  28. de Wijs, H.J. 1951. Statistics of Ore Distribution. Part 1: Frequency Distribution of Assay Values. Geologie en Mijnbouw 13: 365.
  29. 29.0 29.1 29.2 Henley, S. 1981. Nonparametric Geostatistics. Essex, UK: Elsevier Applied Science Publishers.
  30. Matheron, G. 1963. Principles of Geostatistics. Economic Geology 58 1246.
  31. Matheron, G. 1970. Random Functions and Their Application in Geology. Geostatistics, A Colloquium, D.F. Merriam ed., 79-88. New York City: Plenum.
  32. 32.0 32.1 32.2 32.3 32.4 32.5 Olea, R.A. 1994. Fundamentals of Semivariogram Estimation, Modeling, and Usage. Stochastic Modeling and Geostatistics, ed. J.M. Yarus and R.L. Chambers, Ch. 3, 27-35. Tulsa, Oklahoma: AAPG Computer Applications in Geology.
  33. 33.0 33.1 33.2 33.3 33.4 33.5 33.6 33.7 33.8 33.9 Hohn, M.E. 1999. Geostatistics and Petroleum Geology, second edition. Amsterdam, The Netherlands: Kluwer Academic Publishers.
  34. Olea, R.A. 1975. Optimum Mapping Techniques Using Regionalized Variable Theory. Series on Spatial Analysis, Ch. 3, 137. Lawrence, Kansas: Kansas State Geological Survey.
  35. Christakos, G. 1992. Random Fields Models in the Earth Sciences. San Diego, California: Academic Press.
  36. 36.0 36.1 36.2 36.3 36.4 36.5 36.6 Cosentino, L. 2001. Integrated Reservoir Studies. Paris, France: Institut Français du Pétrole Publications, Editions Technip.
  37. Clark, I. 1979. Practical Geostatistics. London, England: Applied Science Publishers.
  38. 38.0 38.1 Journel, A.G. and Huijbregts, C.J. 1978. Mining Geostatistics. London, England: Academic Press.
  39. Tran, T.T.B. 1996. The Missing Scales and Dirst Simulation of Block Effective Properties. Journal of Hydrology 182: 37.
  40. 40.0 40.1 Wackernagel, H. 1995. Multivariate Geostatistics: An Introduction with Applications. Berlin, Germany: Springer-Verlag.
  41. 41.0 41.1 41.2 Journel, A.G. 1989. Fundamentals of Geostatistics in Five Lessons. Short Course in Geology, Vol. 8, 40. Washington, DC: AGU.
  42. Journel, A.G. 1986. Geostatistics: Models and Tools for Earth Sciences. Math. Geol. 18: 119.
  43. Doyen, P.M., Psaila, D.E., and Strandenes, S. 1994. Bayesian Sequential Indicator Simulation of Channel Sands From 3-D Seismic Data in The Oseberg Field, Norwegian North Sea. Presented at the SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, 25-28 September 1994. SPE-28382-MS.
  44. Bleines, C. et al. 2001. ISATIS. Avon Cedex, France: Geovariances, Avon Cedex.
  45. Ahmed, U., Crary, S.F., and Coates, G.R. 1991. Permeability Estimation: The Various Sources and Their Interrelationships. J Pet Technol 43 (5): 578-587. SPE-19604-PA.
  46. Doyen, P.M. 1988. Porosity from seismic data: A geostatistical approach. Geophysics 53 (10): 1263–1275.
  47. Doyen, P.M. and Guidish, T.M. 1992. Seismic Discrimination of Lithology and Porosity, a Monte Carlo Approach. Reservoir Geophysics: Investigations in Geophysics, ed. R.E. Sheriff, Ch. 7, 243-250. Tulsa, Oklahoma: SEG.
  48. Xu, W., Tran, T.T., Srivastava, R.M. et al. 1992. Integrating Seismic Data in Reservoir Modeling: The Collocated Cokriging Alternative. Presented at the SPE Annual Technical Conference and Exhibition, Washington, D.C., 4-7 October 1992. SPE-24742-MS.
  49. Araktingi, U.G. et al. 1993. Integration of Seismic and Well Log Data in Reservoir Modeling. Reservoir Characterization III, B. Linville ed., 515-554. Tulsa, Oklahoma: PennWell Publishing.fckLR↑ Chambers, R.L., Zinger, M.A., and Kelly, M.C. 1
  50. Chambers, R.L., Zinger, M.A., and Kelly, M.C. 1994. Constraining Geostatistical Reservoir Descriptions with 3-D Seismic Data to Reduce Uncertainty. Stochastic Modeling and Geostatistics, ed. J.M. Yarus and R.L. Chambers, Ch. 3, 143-158. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  51. Bashore, W.M. et al. 1994. Importance of a Geological Framework and Seismic Data Integration for Reservoir Modeling and Subsequent Fluid-Flow Predictions. Stochastic Modeling and Geostatistics, ed. J.M. Yarus and R.L. Chambers, Ch. 3, 159-176. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  52. Wolf, D.J., Withers, K.D, and Burnaman, M.D. 1994. Integration of Well and Seismic Data Using Geostatistics. Stochastic Modeling and Geostatistics, ed. J.M. Yarus and R.L. Chambers, Ch. 3, 177-200. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  53. Burns, C.S. 1997. Integrating 3D Seismic into the Reservoir Model, and Its Impact on Reservoir Management. Presented at the Latin American and Caribbean Petroleum Engineering Conference, Rio de Janeiro, Brazil, 30 August-3 September 1997. SPE-38996-MS.
  54. Behrens, R.A. and Tran, T.T. 1998. Incorporating Seismic Data of Intermediate Vertical Resolution Into 3D Reservoir Models. Presented at the SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, 27-30 September 1998. SPE-49143-MS.
  55. Kalkomey, C.T. 1997. Potential Risks When Using Seismic Attributes as Predictors of Reservoir Properties. The Leading Edge (March): 247.
  56. Alabert, F.G. 1989. Constraining Description of Randomly Heterogeneous Reservoirs to Pressure Test Data: A Monte Carlo Study. Presented at the SPE Annual Technical Conference and Exhibition, San Antonio, Texas, 8-11 October 1989. SPE-19600-MS.
  57. Deutsch, C.V. 1993. Conditioning Reservoirs to Well Test Information. Geostatistics Troia, ed. A.O. Soars, 505. Dordrecht, The Netherlands: Kluwer Academic Publishers.
  58. Deutsch, C.V. and Journel, A.G. 1994. Integrating Well Test-Derived Effective Absolute Permeabilities in Geostatistical Reservoir Modeling. Stochastic Modeling and Geostatistics, ed. J.M. Yarus and R.L. Chambers, Ch. 3, 131-142. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  59. Blanc, G., Guerillot, D., Rahon, D. et al. 1996. Building Geostatistical Models Constrained by Dynamic Data - A Posteriori Constraints. Presented at the European 3-D Reservoir Modelling Conference, Stavanger, Norway, 16-17 April 1996. SPE-35478-MS.
  60. 60.0 60.1 60.2 60.3 Chambers, R.L., Yarus, J.M., and Hird, K.B. 2000. Petroleum Geostatistics for the Nongeostatistician—Part 2. The Leading Edge (June): 592.
  61. 61.0 61.1 Haldorsen, H.H. and Damsleth, E. 1990. Stochastic Modeling (includes associated papers 21255 and 21299 ). J Pet Technol 42 (4): 404-412. SPE-20321-PA.
  62. 62.0 62.1 Tyler, K., Henriquez, A., and Svanes, T. 1994. Modeling Heterogeneities in Fluvial Domains: A Review of the Influences on Production Profiles. Stochastic Modeling and Geostatistics, ed. J.M. Yarus and R.L. Chambers, Ch. 3, 77-90. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  63. 63.0 63.1 Hatløy, A.S. 1994. Numerical Modeling Combining Deterministic and Stochastic Methods. Stochastic Modeling and Geostatistics, ed. J.M. Yarus and R.L. Chambers, Ch. 3, 109-120. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  64. Mantoglou, A. and Wilson, J.W. 1982. The Turning Bands Methods for Simulation of Random Fields Using Line Generation by a Spectral Method. Water Research 18 (5): 1379.
  65. Ravenne, C. and Beucher, H. 1988. Recent Development in Description of Sedimentary Bodies in a Fluvio Deltaic Reservoir and Their 3D Conditional Simulations. Presented at the SPE Annual Technical Conference and Exhibition, Houston, Texas, 2-5 October 1988. SPE-18310-MS.
  66. Mathieu, G. et al. 1993. Reservoir Heterogeneity in Fluviatile Keuper Facies: A Subsurface and Outcrop Study. Subsurface Reservoir Characterization from Outcrop Observations, ed. R. Eschard and B. Doligez, 145-160. Paris, France: Technip Publication.
  67. Matheron, G., Beucher, H., de Fouquet, C. et al. 1987. Conditional Simulation of the Geometry of Fluvio-Deltaic Reservoirs. Presented at the SPE Annual Technical Conference and Exhibition, Dallas, Texas, 27–30 September. SPE-16753-MS.
  68. Dubrule. O. 1989. A Review of Stochastic Models for Petroleum Reservoir. Geostatistics, ed. M. Armstrong, 493-506. Amsterdam, The Netherlands: Kluwer Publishers.
  69. Wang, J. and MacDonald, A.C. 1997. Modeling Channel Architecture in a Densely Drilled Oilfield in East China. Presented at the SPE Annual Technical Conference and Exhibition, San Antonio, Texas, 5-8 October 1997. SPE-38678-MS.
  70. Goovaerts, P. 1999. Impact of the Simulation Algorithm, Magnitude of Ergodic Fluctuations and Number of Realizations on the Space of Uncertainty of Flow Properties. Stochastic Environmental Research and Risk Assessment 13 (3): 161.
  71. Goovaerts, P. 2006. Geostatistical Modeling of the Spaces of Local, Spatial, and Response Uncertainty for Continuous Petrophysical Properties. Stochastic Modeling and Geostatistics: Principles, Methods, and Case Studies, Volume II , ed. T.C. Coburn, J.M. Yarus and R.L. Chambers. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  72. Journel, A.G. and Ying, Z. 2001. The Theoretical Links Between Sequential Gaussian, Gaussian Truncated Simulation, and Probability Field Simulation. Mathematical Geology 33: 31.
  73. Wingle, W.L. and Poeter, E.P. 1993. Uncertainty Associated with Semi Variograms Used for Site Simulation. Ground Water 31: 725.
  74. 74.0 74.1 74.2 74.3 Srivastava, R.M. 1995. The Visualization of Spatial Uncertainty. Stochastic Modeling and Geostatistics, ed. J.M. Yarus and R.L. Chambers, Ch. 3, 339-346. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  75. Froidevaux, R. 1992. Probability Field Simulation. Geostatistics Troia 1992 ed. A. Soares. Proc., Fourth Geostatistics Congress, Kluwer Academic Publishers, Dordrecht, The Netherlands, 73.
  76. 76.0 76.1 Srivastava, R.M. 1992. Reservoir Characterization With Probability Field Simulation. Presented at the SPE Annual Technical Conference and Exhibition, Washington, DC, 4–7 October. SPE-24753-MS.
  77. Journel, A.G. 1995. Geology and Reservoir Geology. Stochastic Modeling and Geostatistics, ed. J.M. Yarus and R.L. Chambers, Ch. 3, 19-20. Tulsa, Oklahoma: AAPG Computer Applications in Geology, AAPG.
  78. Samson, P., Casaux, J.-M., Cavailles, B. et al. 1998. 3D Modeling form Seismic to Fluid Flow Simulation: A Case Study. Presented at the SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, 27-30 September 1998. SPE-49293-MS.

Additional Reading

Abry, C.G. 1975. Geostatistical Model for Predicting Oil: Tatum Basin, New Mexico. AAPG Bulletin 59: 2111.

Armstrong, M. et al. 2003. Plurigaussian Simulation in the Geosciences. New York City: Springer-Verlag.

Chilès, J.-P. and Delfiner, P. 1999. Geostatistics: Modeling Spatial Uncertainty New York City: John Wiley & Sons.

Clark, I. 1977. Practical Kriging in Three Dimensions. Computers & Geosciences 3 (1): 173.

Cressie, N.A.C. 1993. Statistics for Spatial Data. New York City: John Wiley & Sons.

Datta-Gupta, A., Lake, L.W., and Pope, G.A.: "Characterizing Heterogeneous Permeable Media with Spatial Statistics and Tracer Data Using Sequential Simulated Annealing," Mathematical Geology (1995) 27, 763.

Delfiner, P., Delhomme, J.P., and Pelissier-Combescure, J. 1983. Application of Geostatistical Analysis to the Evaluation of Petroleum Reservoirs With Well Logs. Presented at the SPWLA 24th Annual Logging Symposium, Calgary, Canada, July, 37.

Deutsch, C.V. and Srinivasan, S. 1996. Improved Reservoir Management Through Ranking Stochastic Reservoir Models. Proc., SPE/DOE Tenth Symposium on Improved Oil Recovery, Tulsa, 2, 105.

Dubrule, O. and Haldorsen, H.H. 1986. Geostatistics for Permeability Estimation. Reservoir Characterization, ed. L.W. Lake and H.B. Carroll Jr., 223-247. San Diego, California: Academic Press, Inc.

Goovaerts, P. 1997. Geostatistics for Natural Resources Evaluation, 496. Oxford, UK: Oxford University Press.

Guerillot, D.R. and Morelon, I.F. 1992. Sorting Equiprobable Geostatistical Images by Simplified Flow Calculations. Presented at the SPE Annual Technical Conference and Exhibition, Washington, D.C., 4-7 October 1992. SPE-24891-MS.

Haas, A. and Noetinger, B. 1999. Stochastic Reservoir Modelling Constrained by Well Test Permeabilities. Geostatistics Wollongong 1996, 501-511, ed. E.Y. Baafi and N.A. Schofield. Dordrecht: The Netherlands: Kluwer Academic Publishers.

Hird, K.B. and Kelkar, M.G. 1994. Conditional Simulation Method for Reservoir Description Using Spatial and Well-Performance Constraints. SPE Res Eng 9 (2): 145-152. SPE-24750-PA.

Jensen, J.L., Lake, L.W., Corbett, P.W.M., and Goggin, D.J. 2000. Statistics for Petroleum Engineers and Geoscientists, second edition. New York City: Elsevier Science.

Jensen, J.L. et al. 1996. Permeability Semivariograms, Geological Structure, and Flow Performance. Mathematical Geology 28: 419.

Journel, A.G. 1983. Nonparametric Estimation of Spatial Distributions. Mathematical Geology 15: 445.

Journel, A.G. 1990. Geostatistics for Reservoir Characterization. Presented at the SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, 23-26 September 1990. SPE-20750-MS.

Krige, D.G. 1984. Geostatistics and the Definition of Uncertainty. Inst. Min. Met. Trans. 93: Sect. A, A41.

Lake, L.W., Scott, A.J., and Kocurek, G.A. 1986. Reservoir Characterization for Numerical Simulation. Final Report, DOE/BC/10744-8, Bartlesville Project Office. Washington, DC: US DOE.

Lake, L.W., Carroll, H.B., and Wesson, T. 1991. Second International Reservoir Characterization Conference. Reservoir Characterization II, ed. L.W. Lake, H.B. Carroll Jr., and T.C. Wesson, 478-491. San Diego, California: Academic Press.

Matheron, G. 1971. The Theory of Regionalized Variables and Its Application, 5. Fountainebleau, France: Les Cahiers du Centre de Morphologie Mathématique de Fontainebleau, Fountainebleau, France.

Meyers, D.E. 1983. Estimation of Linear Combinations and Co-Kriging. Mathematical Geology 15 (5): 633.

Meyers, D.E. 1984. Co-Kriging: New Developments. Geostatistics for Natural Resources Characterization, Part 1, ed. G. Verly et al., 295-305. Dordrecht, The Netherlands: Reidel, Dordrecht, The Netherlands.

Olea, R.A. 1977. Measuring Spatial Dependence with Semivariograms. Series on Spatial Analysis, 3. Lawrence, Kansas: Kansas Geological Survey.

Olea, R.A. 1999. Geostatistics for Engineers and Earth Scientists Norwell, Massachusetts: Kluwer Academic Publishers.

Oliver, D.S. 1996. On Conditional Simulation to Inaccurate Data. Mathematical Geology 28: 811.

Seifert, D. and Jensen, J.L. 2000. Object and Pixel-Based Reservoir Modeling of a Braided Fluvial Reservoir. Mathematical Geology 32: 581.

Srivastava, R.M. 1990. An Application of Geostatistical Methods for Risk Analysis in Reservoir Management. Presented at the SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, 23-26 September 1990. SPE-20608-MS.

Thiele, M.R., Rao, S.E., and Blunt, M.J. 1996. Quantifying Uncertainty in Reservoir Performance Using Streamtubes. Mathematical Geology 28: 843.

Yarus, J.M. and Chambers, R.L. 1995. Stochastic Modeling and Geostatistics: Principles, Methods, and Case Studies. Tulsa, Oklahoma: AAPG.


This list of geostatistical terminology selected represents only the most commonly encountered terms. For a more thorough listing of geostatistical terminology, consult Olea's Geostatistical Glossary and Multilingual Dictionary22 Because we have defined the terms as we use them while teaching geostatistics, our definitions may be slightly different from Olea's.

Admissibility(of semivariogram models) - For a given covariance model, the condition in which the kriging variance is ≥0. Also known as positive definite. A semivariogram model is said to be admissible if it does not generate negative variances under any conditions.

Anisotropy - Covariance models that have major and minor ranges of different distances (correlation scale or lengths). There are two types of anisotropy: geometric anisotropic covariance models have the same sill but different ranges, whereas zonal anisotropic covariance models have the same correlation ranges, but different sills.

Autocorrelation - Computation of a spatial covariance model for regionalized variable, measuring a change in variance (variogram) or correlation (correlogram) with distance and/or azimuth.

Biased estimates - Estimates for which there is a correlation between the standardized errors and the estimated values (see Cross-validation ) or for which a histogram of the standardized errors is skewed. Either condition suggests a bias in the estimates, so that one area of the map may always show estimates that are higher (or lower) than expected.

Block kriging - Making a kriging estimate over an area. For example, to estimate the average value at a grid cell, the grid cell is discretized into subcells, a kriging estimate is made for each subcell, and then these are averaged together to produce a single value. This final value is placed at the original grid node.

Cokriging - The process of estimating a regionalized variable from two or more variables by using a linear combination of weights obtained from models of spatial autocorrelation and cross-correlation. Cokriging is the multivariate version of kriging.

Conditional simulation - A geostatistical method to create multiple equally probable images of a regionalized variable on the basis of a spatial model. It is conditional only when the actual control data are honored. Conditional simulation is a variation of conventional kriging or cokriging. By relaxing some of the kriging constraints (e.g., minimized square error), conditional simulation can reproduce the variance of the control data. The final "map" captures the heterogeneity and connectivity that most likely is present in the reservoir. Post-processing conditional simulation produces a measure of error (standard deviation) and other measures of uncertainty, such as isoprobability and uncertainty maps.

Correlogram - A measure of spatial dependence (correlation) of a regionalized variable over some distance. The correlogram also can be calculated with an azimuthal preference.

Covariance - The sill minus the variogram model (or zero minus the correlogram). The kriging system uses covariance, rather than the variogram or correlogram values, to determine the kriging weights λ.

Co-regionalization - The mutual spatial behavior between two or more regionalized variables.

Cross-correlation - The computation of a spatial cross-covariance model between two regionalized variables. This provides a measure of spatial correlation between the two variables.

Cross-validation - A procedure to check the compatibility between a data set and its spatial model and neighborhood design, as well as to check for biased estimates caused by poor model and/or neighborhood design.

Drift - For data that contain a trend, a short-scale trend at the size of the neighborhood (i.e., the mean changes regularly at the neighborhood scale).

Estimation variance - The kriging variance at each grid node. This is a measure of global reliability, not a local estimation of error.

External drift - A geostatistical linear-regression technique that uses a secondary regionalized variable (e.g., a seismic attribute) to control the shape of the final map created by kriging or simulation. External drift uses a spatial model of covariance.

Geostatistics - the statistics of spatially (or temporally) correlated data.

h-scatterplot - A bivariate plot on which zi and zi+h, the pairs for the value for separation distance h , are plotted as the two axes. The shape and tightness of the cloud of points is related to the value of the variogram for h.

Histogram - A plot that shows the frequency or number of occurrences of data (y-axis) in size classes of equal width (x-axis).

Isoprobability map - A map created by post-processing conditional simulations that shows the value of the regionalized variable at a constant probability threshold, e.g., at the 10th, 50th (median), or 90th percentile. Isoprobability maps provide a level of confidence in the mapped results.

Kriging - A method for calculating estimates of a regionalized variable that uses a linear combination of weights obtained from a model of spatial correlation. Kriging is the univariate version of cokriging.

Kriging variance - See Estimation variance.

Lag - A distance parameter used during computation of the experimental covariance model. The lag distance typically has a tolerance of plus or minus one-half the initial lag distance.

Moving neighborhood - A search neighborhood designed to use only a portion of the control data point during kriging or conditional simulation.

Nested variogram model - A linear combination of two or more variogram (correlogram) models (e.g., a short-range exponential model combined with a longer-range spherical model).

Nonconditional simulation - A simulation method that does not use the control data during the simulation process and that frequently is used to observe the behavior of a spatial model and neighborhood design.

Nugget effect - A feature of the covariance model by which the experimental points that define the model do not appear to intersect at the origin. The nugget model shows constant variance at all ranges, but often is modeled as zero-variance at the control point (well location).

Ordinary cokriging - The formal term for the kriging method discussed in this chapter, in which the local mean varies and is re-estimated on the basis of the control points in the search neighborhood ellipse (moving neighborhood).

Outliers - Statistically, data points that fall outside approximately ±2.5 standard deviation of the mean value of the sample population. Outliers can be the result of bad data values or local anomalies.

Point kriging - Making a kriging estimate at a specific point (e.g., at a grid node or a well location).

Positive definite - See Admissibility (of semivariogram models).

Random function - A function that describes the spatial behavior of a regionalized variable. The random function has two components: (1) a regional structure manifesting some degree of spatial autocorrelation and lack of independence in the proximal values of zi; and (2) a local, random component.

Random variable - A variable created by a random process whose values follow a probability distribution, such as a normal distribution.

Range - The distance at which the variogram reaches the sill or the correlogram reaches zero correlation. Also known as the correlation range or the correlation scale.

Realizations - The products generated from conditional simulation. Realizations are equally probable, and they preserve the variance expressed by the regionalized variable. The sum of many realizations approaches the kriged solution.

Regionalized variable - A variable that has some degree of spatial autocorrelation and lack of independence in the proximal values of zi .

Risk map - See Uncertainty map.

Scenarios - Models that represent the first or second order in modeling assumptions. First-order changes represent changes to the input, data, structural model, or stratigraphic model. Second-order changes represent changes to the modeling parameters, such as the variogram sill, range, or random function.

Simple kriging - A kriging method in which the global mean is constant over the entire area of interpolation and is based on all the control points used in a unique neighborhood (or in which the mean is supplied by the user).

Semivariogram - A measure of spatial dependence (dissimilarity or increasing variability) of a regionalized variable over some distance. The semivariogram also can be calculated with an azimuthal preference. The semivariogram commonly is called the variogram. See also Correlogram .

Sill - The level of variance where the variogram reaches its correlation range. The variance of the sample population is the theoretical sill of the variogram.

Stationarity - Most simply put, the condition in which the data do not exhibit a trend. This implies that a moving-window average shows homogeneity in the mean and variance over the study area.

Stochastic modeling - A term used interchangeably with Conditional simulation , although not all stochastic-modeling applications use control data.

Unique neighborhood - A neighborhood search ellipse that uses all available data control points. The practical limit is 100 control points. A unique neighborhood is used with simple kriging.

Uncertainty map - A map created by post-processing conditional simulations. A threshold value is selected (e.g., 8% porosity), and the uncertainty map shows at each grid node the probability that porosity is either above or below the chosen threshold.

Variogram - See Semivariogram.

Weights - Values determined during an interpolation or simulation that are multiplied by the control data points in the determination of the final estimated or simulated value at a grid node. For geostatistical applications, the weights λ must sum to unity for a condition of unbiasedness.

SI Metric Conversion Factors

in. × 2.54* E + 00 = Cm
M × 3.048* E – 01 = ft


Conversion factor is exact.