In parts I and II of this paper series, rigorous tests for equality of stochastic processes were proposed. These tests provide objective criteria for deciding whether two processes differ, but they provide no information about the nature of those differences. This paper develops a systematic and optimal approach to diagnosing differences between multivariate stochastic processes. Like the tests, the diagnostics are framed in terms of vector autoregressive (VAR) models, which can be viewed as a dynamical system forced by random noise. The tests depend on two statistics, one that measures dissimilarity in dynamical operators and another that measures dissimilarity in noise covariances. Under suitable assumptions, these statistics are independent and can be tested separately for significance. If a term is significant, then the linear combination of variables that maximizes that term is obtained. The resulting indices contain all relevant information about differences between data sets. These techniques are applied to diagnose how the variability of annual-mean North Atlantic sea surface temperature differs between climate models and observations. For most models, differences in both noise processes and dynamics are important. Over 40 % of the differences in noise statistics can be explained by one or two discriminant components, though these components can be model dependent. Maximizing dissimilarity in dynamical operators identifies situations in which some climate models predict large-scale anomalies with the wrong sign.

Comparing time series is a common task in climate studies, yet there are few rigorous criteria for deciding whether two time series come from the same stationary process. Such comparisons are vital in climate studies, particularly for deciding whether climate models produce realistic simulations. Part I of this study

The starting point of our approach is the objective criterion derived in Part II for deciding whether two multivariate time series come from the same process. This criterion assumes that each multivariate time series comes from a vector autoregressive (VAR) model. Part II derived the likelihood ratio test for equality of VAR parameters. The associated test statistic, called a

When applied to diagnose dissimilarity in noise processes, discriminant analysis yields a procedure that is similar to principal component analysis and other related techniques for diagnosing variability

These new analysis techniques are applied to diagnose how variability of annual-mean North Atlantic sea surface temperature (SST) differs between observations and model simulations. It is found that differences in noise processes and differences in dynamics both play a significant role in explaining the overall differences. The components that dominate these differences are illustrated for a few cases. This paper closes with a summary and discussion.

This section briefly reviews the procedure derived in Part II for deciding whether two time series come from the same stochastic process. This procedure considers only stationary, Gaussian processes. Such processes are described completely by a mean vector and time-lagged covariance matrices. Standard tests for differences in these quantities

Typically, AR parameters characterize dynamics. For instance, some stochastic models are derived by linearizing the equations of motion about a time mean flow and parameterizing the eddy–eddy nonlinear interactions by a stochastic forcing and linear damping

The model parameters can be estimated by the method of least squares, and the distributions of the estimates are asymptotically normal

Inferences based on the above regression model are identical to inferences based on the model

As discussed in Part II of this paper series, the bias-corrected likelihood ratio for testing

MLEs and unbiased estimates are asymptotically equivalent. Hence, asymptotic theory

Note that covariance matrices

If

Table summarizing the hypotheses considered in the hierarchical test procedure.

Testing significance requires knowing the sampling distributions of

However, if

Given the above restriction, the hypotheses in our problem are summarized in Table

If

Discriminant analysis begins by considering the linear combination of prewhitened variables,

When examining the leading eigenvalue

It can be shown that the variates

If

Although we have decomposed

To further clarify the decomposition Eq. (

SVD has been used in past studies of empirical dynamical models

We now apply the above procedure to diagnose dissimilarities in internal variability of annual mean North Atlantic SST (NASST;

In Part II, we applied the difference-in-VAR test to pre-industrial control runs from phase 5 of the Coupled Model Intercomparison Project

For dimension reduction, NASST fields were projected onto the leading Laplacian eigenvectors in the Atlantic basin. The number of Laplacian eigenvectors

To assess model adequacy, we performed the multivariate Ljung–Box test

For observations, we used version 5 of the extended reconstructed SST data set

The deviance statistics

First, we decompose the deviance as

Range of noise variance ratios for maximizing the noise deviance

Percent of deviance

Regression patterns between local SST and the discriminant associated with the largest model-to-observation variance ratio. Only regression patterns associated with significant maximum variance ratios are shown. Each regression pattern is scaled to have a maximum absolute coefficient of one and is then plotted using the same color scale (red and blue are opposite signs). The CMIP5 model and percent of deviance

Same as Fig.

To diagnose differences in noise covariances, we perform the discriminant analysis described in Sect.

It is worth mentioning that a significant extreme ratio is not always expected when a significant deviance is detected. Discriminant analysis is designed to compress as much deviance as possible into the fewest number of components. The extent of compression depends on the data. If little compression is possible (e.g., the deviances are spread uniformly in phase space), then the most extreme ratio may not be significant, but nevertheless the overall deviance may be significant because individual deviances accumulate. The fact that in some CMIP5 models the most extreme noise variance ratio is insignificant, despite the noise deviance being significant, simply means that the deviance cannot be compressed significantly into a single component.

The percent of deviance

To diagnose the spatial structure, we derive regression coefficients in which the observed local SST is the independent variable and

The leading difference-in-dynamics singular value for each CMIP5 model. The 5 % significance threshold under

We now turn to diagnosing differences in AR parameters. Equality of AR parameters is rejected when

The initial condition

Same as Fig.

The leading difference-in-dynamics singular value

This paper proposed techniques for optimally diagnosing differences between serially correlated, multivariate time series. The starting point is to assume each time series comes from a vector autoregressive model and then test the hypothesis that the model parameters are equal. The likelihood ratio test for this hypothesis was derived in Part II of this paper series, which leads to a deviance statistic that measures the difference in VAR models. However, merely deciding that two processes differ is not completely satisfying because it does not tell us the nature of those differences. For instance, a VAR can be viewed as a dynamical system driven by random noise, but do differences arise because of different dynamics, different noise statistics, or both? We show that the deviance between VAR models can be decomposed into two terms, one measuring differences in noise statistics and another measuring differences in dynamics (i.e., differences in AR parameters). Under suitable assumptions, the two terms are independent and have asymptotic chi-squared distributions, which leads to straightforward hypothesis testing. Then, the linear combination of variables that maximizes each deviance is obtained. One set of combinations identifies components whose noise covariances differ most strongly between the two processes. The other set of combinations effectively finds the initial conditions that maximize the difference in response from each VAR model.

The proposed method was applied to diagnose differences between annual-mean North Atlantic sea surface temperature variability between climate models and observations. The results show that differences in dynamics and differences in noise covariances both play a significant role in the overall differences. In fact, only one model was found to be consistent with observations. However, this consistency is not robust. In further analyses (not shown), we changed the half of observations used to diagnose differences and found that this model no longer was consistent with observations. In fact, no model was found to be consistent with both halves of the observational period. For half the models, the leading discriminant of noise variance explains 40 % or more of the deviance in noise covariances. For most models, the leading discriminant underpredicts the noise variance, and the associated spatial structures are highly model dependent. In contrast, models that have leading discriminants with too much noise variance tend to have similar spatial structures.

Discriminant analysis was also applied to identify differences in dynamics, or more precisely, differences in AR parameters. These discriminants can be interpreted as finding suitably normalized initial conditions that maximize the difference in response from the two VAR models. Examples from this analysis revealed dramatic errors in the 1-year predictions of some models. For instance, the VAR model trained on HadGEM2-ES gives realistic 1-year predictions at the mid to low latitudes of the North Atlantic but are much too persistent at the high latitudes. The VAR model trained on MIROC-ESM produces 1-year predictions with the wrong sign in the North Atlantic.

Although previous climate studies have diagnosed differences in variability using discriminant techniques

Diagnosing differences in dynamics leads to generalized SVD, which differs from the SVD used in previous studies to diagnose dynamics of linear inverse models

One reason for diagnosing differences in stochastic processes is to gain insight into the causes of the differences. For instance, dissimilarities in noise processes indicate differences in the one-step prediction errors of the associated VAR models. Comparison of prediction errors may give clues as to the locations or processes that dominate the errors, which may be helpful for improving a climate model. Similarly, differences in dynamics indicate differences in the memory or in the evolution of anomalies between observations and climate models. The diagnostics derived here may give clues as to the physical source of the differences, which in turn may give insight into how to improve climate models.

This Appendix proves the key results of decomposing

Because the eigenvectors are distinct and form a complete basis set,

Importantly, the above decomposition also decomposes the deviance. To see this, note that Eqs. (

In this Appendix, we prove that the matrix

Starting from the definition Eqs. (

This Appendix proves the key results of decomposing

Now, consider the Rayleigh quotient Eq. (

Comparing Eqs. (

Just as the column vectors of

In this section, we prove that the eigenvalues from Eqs. (

Similarly, substituting transformed quantities into the eigenvalue problem Eq. (

R codes for performing the statistical test described in this paper are available at

The data used in this paper are publicly available from the CMIP archive

Both authors participated in the writing and editing of the manuscript. TD performed the numerical calculations.

The contact author has declared that neither they nor their co-author has any competing interests.

The views expressed herein are those of the authors and do not necessarily reflect the views of this agency. Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank the World Climate Research Programme’s Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups for producing and making available their model output. For CMIP the U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison provides coordinating support and led development of software infrastructure in partnership with the Global Organization for Earth System Science Portals.

This research has been supported by the National Oceanic and Atmospheric Administration (grant no. NA20OAR4310401).

This paper was edited by Christopher Paciorek and reviewed by Raquel Prado and one anonymous referee.