This is an open access article under the terms of the

Additional Supporting Information may be found in the online version of this article.

Supporting Information


We have taken advantage of the release of version 2 of the Global Ocean Data Analysis Project data product (Olsen et al. ) to refine the locally interpolated alkalinity regression (LIAR) code for global estimation of total titration alkalinity of seawater (*A*_{T}), and to extend the method to also produce estimates of nitrate (N) and in situ pH (total scale). The updated MATLAB software and methods are distributed as Supporting Information for this article and are referred to as LIAR version 2 (LIARv2), locally interpolated nitrate regression (LINR), and locally interpolated pH regression (LIPHR). Collectively they are referred to as locally interpolated regressions (LIRs). Relative to LIARv1, LIARv2 has an 18% lower average *A*_{T} estimate root mean squared error (RMSE), improved uncertainty estimates, and fewer regions in which the method has little or no available training data. LIARv2, LINR, and LIPHR produce estimates globally with skill comparable to or better than that of the regional alternatives used in their respective regions. LIPHR pH estimates have an optional adjustment to account for ongoing ocean acidification. We have used the improved uncertainty estimates to develop LIR functionality that selects the lowest‐uncertainty estimate from among the possible estimates. Current and future versions of LIR software will be available on GitHub at

The locally interpolated alkalinity regression (LIAR) method and software were developed to estimate *A*_{T} globally from other measurable seawater properties (Carter et al. 2016*b*). The original application for the method was to provide *A*_{T} estimates as a second carbonate system parameter for use with data from the emerging network of biogeochemical floats that measure pH (Johnson and Claustre ; Johnson et al. ; Wanninkhof et al. ). However, LIAR may also prove useful for studies or models that require a climatological *A*_{T} baseline with limited variability, or deviations from such a baseline (e.g., Carter et al. 2016*a*).

Locally interpolated nitrate regression (LINR) and locally interpolated pH regression (LIPHR) are primarily intended to provide cross‐comparisons for nitrate (N) and pH sensor measurements that can be used to assess potential float sensor errors or measurement drift. Profiling biogeochemical floats cannot typically be retrieved for sensor recalibration, so it is important to have independent means of assessing problems that may arise during or after float deployment. A common approach is to recalibrate sensors against known atmospheric, surface, or climatological concentrations (Takeshita et al. ; Bushinsky et al. ; Plant et al. ), but such reference values are not always available for N and pH. As an alternative, LINR and LIPHR are designed to provide estimates in the stable 1000–2000 m depth range of the ocean. All three locally interpolated regressions (LIRs) also have secondary scientific applications whenever *A*_{T}, N, or pH estimates are desirable and some seawater property information is available.

By default, LIRs are unable to capture changes in the relationships between the estimated properties and the predictor properties. One example of such an unresolved change is ocean acidification (OA), the effect of continually increasing ocean storage of anthropogenic carbon dioxide (CO_{2}) on seawater pH. LIPHR contains an option to adjust for the effects of OA on pH, but we expect OA‐induced pH changes to make LIPHR estimates less skillful over time even when this adjustment is used, because the adjustment does not account for regional or temporal variations in the rate of OA. All three LIRs are expected to be most skillful at reproducing measurements below the ocean surface, where the effects of OA and other changes are smaller, or for estimates made close in time and space to the measurements used to train the LIRs. Another limitation of these algorithms is that they break down whenever the relationships between the predictors and the estimated properties become significantly nonlinear. One region where this limitation would be expected to diminish estimate skill is the margins of O_{2}‐deficient zones, where both denitrification and aerobic respiration can be important.

Regressions for estimating pH, N, and *A*_{T} have been reported numerous times. *A*_{T} regressions are the most common (e.g., Millero et al. ; Lee et al. ; Alin et al. ; Bostock et al. ; Sasse et al. ; Velo et al. ; McNeil and Sasse ), regressions for pH are less frequently reported (e.g., Juranek et al. ; Alin et al. ; Williams et al. ), and nitrate regressions are rarer still (e.g., Williams et al. ; Supporting Information). The LIRs presented here improve on earlier versions with respect to global applicability, ease of use, and the ability to scale uncertainty estimates based on input uncertainties. Critically, they also produce estimates that reproduce pH measurements at least as skillfully as earlier versions. The bulk of the improvement results from the larger quantity and span of data available through the Global Ocean Data Analysis Project version 2 (GLODAPv2) data product (Olsen et al. ) than were available to train earlier methods. A similar, recently developed method is the “carbonate system and nutrients concentration from hydrological properties and oxygen using a neural‐network” (CANYON) approach (Sauzède et al. ). CANYON was also trained using the GLODAPv2 data product and can estimate pH, *A*_{T}, silicate (Si), N, total dissolved inorganic carbon (*C*_{T}), and *p*CO_{2} globally from O_{2}, temperature, salinity (*S*), latitude, longitude, depth, and day of year. We expect the LIRs proposed here to provide estimates complementary to those from CANYON for most applications, and note that the LIRs do not require O_{2} and temperature as measurement inputs.

In the remainder of this article, we describe version 2 of the LIAR software (LIARv2) in the context of the improvements relative to version 1 (LIARv1: Carter et al. 2016*b*), and extend the LIR approach to nitrate and in situ total scale seawater pH estimates with LINR and LIPHR. Particular attention is paid to new procedures required to address complications with extending the LIR framework to pH measurements.

As with LIARv1, the LIR methods developed here use regression coefficients that are determined at each location on a 5° latitude and longitude grid with 33 depth surfaces (44,957 total locations). Each set of regression coefficients is determined using a robust multiple linear regression (MLR) of the subset of measurements from the global training data set that fall within a volume defined by latitude, longitude, and depth/density windows around the grid coordinates (the same grid used by Carter et al. 2016*b*). The windows used are 5° for latitude, … kg m^{−3} for potential density, or 50 m for depth (whichever is more inclusive). The dimensions of these windows are iteratively scaled by a factor equal to the iteration number until at least 100 measurements are selected to train each regression. When generating estimates, the LIAR software then interpolates between the regression coefficients specific to these grid locations to obtain coefficients at the arbitrary locations where the user desires estimates. LIARv2 works with 16 different combinations of the predictor variables: salinity *S*, potential temperature *θ*, nitrate N, apparent oxygen utilization (AOU), and silicate (Si). LINR uses the same combinations as LIAR, with phosphate P in place of N in the eight regressions that include N. LIPHR uses the same predictors as LIAR but also includes depth (*z*) in meters as a predictor, to allow for the effects of pressure on in situ pH. The specific combinations of variables used are indicated in the “LIARv2,” “LIPHR,” and “LINR” sections. A full description of the LIARv1 method is provided by Carter et al. (2016*b*). In this update, we focus on how LIARv2, LIPHR, and LINR adapt and improve upon the LIARv1 methods.
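The iterative window expansion can be sketched as follows. This is a minimal illustration, not the LIR code: it assumes a hypothetical `Obs` record, uses only the 5° latitude and 50 m depth windows stated above, and substitutes a placeholder 5° longitude window because the actual longitude window is not given here. The potential‐density criterion is omitted.

```python
from dataclasses import dataclass


@dataclass
class Obs:
    """Hypothetical training observation (not the GLODAPv2 file format)."""
    lat: float
    lon: float
    depth: float


def lon_distance(lon_a: float, lon_b: float) -> float:
    """Absolute longitude difference in degrees, accounting for wrap-around."""
    return abs((lon_a - lon_b + 180.0) % 360.0 - 180.0)


def select_training(obs, lat0, lon0, depth0,
                    lat_win=5.0, lon_win=5.0, depth_win=50.0,
                    min_count=100, max_iter=50):
    """Scale every window by the iteration number k until >= min_count points fall inside.

    lon_win is a placeholder assumption; the density window is not modeled.
    Returns (k, selected_observations).
    """
    for k in range(1, max_iter + 1):
        selected = [o for o in obs
                    if abs(o.lat - lat0) <= k * lat_win
                    and lon_distance(o.lon, lon0) <= k * lon_win
                    and abs(o.depth - depth0) <= k * depth_win]
        if len(selected) >= min_count:
            return k, selected
    raise ValueError("not enough observations even after max_iter expansions")
```

For example, with 50 observations at the grid point and 60 more 8° of latitude away, the first pass finds only 50 points, so the windows are doubled on the second pass and all 110 are selected.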

In some instances where spectrophotometric pH measurements are unavailable, we use in situ total scale pH calculated from *A*_{T} and *C*_{T}. These calculations were made with carbonate constants from Lueker et al. (), borate dissociation coefficients from Dickson (), total borate from Lee et al. (), and the HF dissociation constant (KF) from Perez and Fraga (). Calculations were performed using the CO2SYS for MATLAB routine by van Heuven et al. ().

The primary improvement in LIARv2 relative to LIARv1 stems from the regression coefficients having been re‐estimated using the GLODAPv2 data product. All measured and calculated values in GLODAPv2 were used except those from 161 cruises (40,303 measurements) that had *A*_{T} quality control (QC) adjustments of ± 10 *μ*mol kg^{−1} or greater, were flagged as poor data, or were not quality controlled for *A*_{T} (Olsen et al. ). The new training data set comprises 236,852 *A*_{T} measurements and *A*_{T} values calculated from other CO_{2} system parameters, 211,704 of which had the property measurements required for training all 16 regressions (Fig. ). The LIAR test data set omits the 2279 calculated *A*_{T} values that are included in the training data set. We use the coefficient re‐estimation strategy of Carter et al. (2016*b*) to allow overlap between our training and test data sets without compromising the validity of the assessments (described in the “Assessment” section).

LINR regression coefficients were estimated using 684,475 N measurements, 569,761 of which had the associated property measurements required for training all 16 regressions. This training data set comprises all GLODAPv2 data product N measurements except those from 187 cruises that had multiplicative adjustments greater than 10%, that were not quality controlled, or that were flagged as having poor quality measurements. GLODAPv2 QC protocols changed reported negative N values to 0 *μ*mol kg^{−1}, and the LINR code does likewise. The LINR test data set is identical to the training data set.
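The cruise‐level screening rules described for the *A*_{T} and N training data, and the clipping of negative N values, can be expressed compactly. The sketch below is illustrative only: the `CruiseQC` record and its fields are hypothetical (they do not mirror the GLODAPv2 file layout), and we assume that a "10%" multiplicative adjustment means a factor differing from 1 by more than 0.10.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CruiseQC:
    """Hypothetical per-cruise QC summary (not the GLODAPv2 file layout)."""
    expocode: str
    at_adjustment: Optional[float]  # additive A_T adjustment (umol/kg); None = not QC'd
    at_flagged_poor: bool
    n_factor: Optional[float]       # multiplicative N adjustment; None = not QC'd
    n_flagged_poor: bool


def keep_for_at(c: CruiseQC) -> bool:
    """Drop cruises with |A_T adjustment| >= 10 umol/kg, poor flags, or no A_T QC."""
    return (c.at_adjustment is not None
            and abs(c.at_adjustment) < 10.0
            and not c.at_flagged_poor)


def keep_for_n(c: CruiseQC) -> bool:
    """Drop cruises with N factors more than 10% from 1, poor flags, or no N QC."""
    return (c.n_factor is not None
            and abs(c.n_factor - 1.0) <= 0.10
            and not c.n_flagged_poor)


def clip_nitrate(values: List[float]) -> List[float]:
    """Set negative nitrate values to 0, mirroring the GLODAPv2 convention."""
    return [max(v, 0.0) for v in values]
```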

There are several additional difficulties in constructing a consistent data product for training LIPHR that originate from changes in ocean pH and in pH measurement practices over time. Dealing with these inconsistencies requires understanding several adjustments that we and others (Olsen et al. ) have made to pH measurements and estimates. We list these adjustments here and explain them in this section and the next:

- … *A*_{T}, and *C*_{T} measurements based on deep crossovers (Olsen et al. ). We do not use these adjustments for pH, though we do use them for *A*_{T} and *C*_{T}.
- … *A*_{T} and *C*_{T}. They are detailed below.
- … *A*_{T} and *C*_{T}. This adjustment is also detailed below.

The primary additional difficulty for pH stems from the variety of ways pH is measured or calculated, as well as the evolution of accepted best practices for pH measurement over the decades for which GLODAPv2 contains data. GLODAPv2 contains a mixture of pH calculated from carbonate system measurements, pH measured using electrodes, and pH measured spectrophotometrically. Also, although the spectrophotometric pH method has been used since the early 1990s, Yao et al. () revealed that impurities in the indicator dye used can significantly bias spectrophotometric pH measurements, and Liu et al. () subsequently published calibration equations that allow seawater pH measurements to be made using purified m‐cresol purple dye. Others (Carter et al. ; Patsavas et al. ; Williams et al. ) have since shown that measurements with purified dyes appear to have an (unexplained) broadly consistent‐but‐pH‐dependent discrepancy from the pH calculated from combinations of *A*_{T}, *C*_{T}, and *p*CO_{2} whether calculated at in situ or laboratory conditions (Fig. c). This pH dependent discrepancy is not unique to a single pH sample handling approach, as it exists for both manual and automated pH measurements. It exists also for multiple carbonate constant sets (Carter et al. ). It exists for multiple characterizations of the properties of purified dyes: there is a small pH‐dependent discrepancy between spectrophotometric pH obtained from various sets of purified dye coefficients (Liu et al. ; DeGrandpre et al. ), but the discrepancy (ranging from ∼ 0.006 at a pH of 8.2 to ∼ 0.002 at a pH of 7.4) is too small to account for the differences between calculated pH and pH measured with purified dyes. The pH‐dependent pH discrepancy is less apparent for electrode pH measurements (Fig. a) and impure dye measurements (Fig. b) considered collectively across many cruises. 
However, there are many strongly differing discrepancy relationships visible when impure dye measurements are considered on a cruise‐by‐cruise basis (*see* Supporting Information Figures), with some discrepancies increasing and some decreasing with pH. It should be noted that Fig. c includes no measurements from the subset of research groups that produced impure dye measurements showing a relationship between the pH discrepancy and pH with a negative slope.

A second complication arises in the GLODAPv2 data product QC process. This data product relies on deep crossovers to obtain measurement adjustments intended to bring measurements from various cruises in line with one another. However, the variety of pH‐dependent pH discrepancies found in various cruises casts doubt on the comparability of deep‐ocean pH measured on different cruises. Adjustments based on forcing an agreement at depth between pH distributions obtained with different approaches could therefore create, exacerbate, or inadequately capture discrepancies at the surface.

Our approach to these challenges is to first divide the data into three subsets and then apply linear adjustments to the first two subsets to make them comparable to the third. The first subset is the earlier measurements made with impure dyes. The second subset is pH calculated from *A*_{T} and *C*_{T}. These two subsets collectively comprise the majority of the GLODAPv2 data product. The third subset is the subset of the GLODAPv2 data product where pH was measured with purified dyes. We augment the purified dye subset with 11 cruises conducted too recently to appear in the GLODAPv2 data product (Expocodes: 096U20160108, 096U20160426, 29HE20130320, 318M20130321, 320620140320, 320620151206, 33AT20120324, 33RO20150410, 33RO20150525, and 33RO20161119). We further add data from two recent cruises measured with impure dye to the impure‐dye subset (33RO20130803, 33RO20131223). Data from one additional recent cruise using purified dyes along the I09N transect (33RR20160208) is withheld from the pH training data set entirely to provide a completely independent assessment (“Example section”). Linear pH‐dependent adjustments (

After applying

Our use of the purified‐dye adjustment (adjustment 3) reflects our need for a consistent training data product and not any confidence that purified dye measurements are necessarily more accurate representations of the “true” seawater pH than pH calculations. The apparent pH‐dependent pH discrepancy remains an unresolved challenge to our knowledge of the carbonate system. Our strategy is to let LIPHR users decide whether pH estimates consistent with purified dye measurements, or with pH calculated using the carbonate chemistry coefficients of Lueker et al. (), are more appropriate for their own applications. LIPHR therefore includes an optional counter‐adjustment for adjustment 3 (… *A*_{T} and *C*_{T}. Broadly, we recommend the default “purified dye estimates” without this counter‐adjustment when pH is the parameter of interest, and “calculation‐pH estimates” with this counter‐adjustment when LIPHR estimates are used as one of two constraints to estimate another carbonate system parameter. Whichever is used, the user should be aware of this mismatch in our understanding of carbonate system chemistry.
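A linear pH‐dependent adjustment and its optional counter‐adjustment can be illustrated with paired samples. This is a hypothetical sketch: we assume the discrepancy between a target pH (e.g., purified‐dye values) and a source pH (e.g., pH calculated from *A*_{T} and *C*_{T}) is modeled as a least‐squares line in the source pH. The article's actual adjustment equations and coefficients are not reproduced here, and all function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def fit_adjustment(ph_source, ph_target):
    """Model the discrepancy (target - source) as a + b * source pH."""
    discrepancy = [t - s for s, t in zip(ph_source, ph_target)]
    return fit_line(ph_source, discrepancy)


def apply_adjustment(ph, a, b):
    """Shift a source-style pH onto the target scale."""
    return ph + a + b * ph


def counter_adjustment(ph_adjusted, a, b):
    """Exact inverse of apply_adjustment: recover the source-style pH."""
    return (ph_adjusted - a) / (1.0 + b)
```

Because the adjustment is linear, the counter‐adjustment is its exact algebraic inverse, so switching between the two kinds of estimate loses no information.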

In total, the LIPHR training data set consists of 51,325 impure‐dye measurements (adjusted with … *C*_{T}) or total seawater titration alkalinity (*A*_{T}) GLODAPv2 adjustments greater than ± 10 *μ*mol kg^{−1}, or that were flagged as having poor quality pH measurements. When viable pH measurements and calculations were both available for a sample, only the pH measurements were included. We also omitted data from seven cruises (Expocodes: 49K619990523, 49HG19950414, 49HG19940413, 49HG19930807, 49HG19930413, 33RR19971202, 318M19940327), either because they came from series of cruises with large and variable GLODAPv2 adjustments or because their calculated and measured pH values did not agree to within a root mean squared (RMS) difference of ± 0.03 or an average difference of ± 0.015. A full list of cruises and how they were classified is provided in Supporting Information.
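The cruise‐level agreement test between measured and calculated pH might be sketched as below. We assume that a cruise is retained only if both the RMS difference (≤ 0.03) and the magnitude of the mean difference (≤ 0.015) are within the stated limits; the function name is hypothetical.

```python
import math


def cruise_ph_agrees(measured, calculated, rms_max=0.03, mean_max=0.015):
    """True if measured and calculated pH agree within both RMS and mean limits.

    NaN pairs are skipped; a cruise with no valid pairs is treated as failing.
    """
    diffs = [m - c for m, c in zip(measured, calculated)
             if not (math.isnan(m) or math.isnan(c))]
    if not diffs:
        return False
    mean = sum(diffs) / len(diffs)
    rms = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rms <= rms_max and abs(mean) <= mean_max
```

Note that a uniform 0.02 offset passes the RMS limit but fails the mean limit, which is why both criteria are checked.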

Johnson et al. () find that recent profiling float sensor pH measurements are significantly lower than most nearby pH stations in the GLODAPv2 record, and that these disagreements are largest in the better‐ventilated surface ocean. LIPHR includes an optional adjustment (on by default) to reflect these expected effects of OA on modern and future seawater pH (adjustment 4). For this adjustment, the rate of pH change (

This is a regression between the reconstruction error (… ^{th} day of 2020 would be represented by ∼ 2020.55). This regression has been performed for the reconstructions of 10 subsets of the GLODAPv2 data product, separated at every 10^{th} percentile of potential density (*σ*_{θ}) (Fig. ). If the OA adjustment is enabled in the LIPHR code,
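The OA rate estimation can be sketched as follows, assuming the reconstruction error is defined as measured minus estimated pH and that sample times are expressed as decimal years (year plus the elapsed fraction of that year). Samples are binned into 10 potential‐density subsets at the 10th‐percentile boundaries, and a least‐squares line of error versus decimal year is fit in each. How the fitted trend is then applied to an estimate (`oa_adjust` below, which adds the fitted error for the requested time) is our assumption, not the LIPHR formula.

```python
import bisect
import statistics
from datetime import date


def decimal_year(d: date) -> float:
    """Year plus the fraction of that year elapsed at the start of day d."""
    start = date(d.year, 1, 1)
    days_in_year = (date(d.year + 1, 1, 1) - start).days
    return d.year + (d - start).days / days_in_year


def fit_trend(times, errors):
    """Least-squares line of error vs. time, returned as (mean_t, mean_err, slope)."""
    n = len(times)
    mt = sum(times) / n
    me = sum(errors) / n
    stt = sum((t - mt) ** 2 for t in times)
    if stt == 0:
        raise ValueError("a density subset needs samples from more than one time")
    ste = sum((t - mt) * (e - me) for t, e in zip(times, errors))
    return mt, me, ste / stt


def oa_trends(sigma_theta, times, errors):
    """Fit one trend per potential-density decile; returns (cut_points, trends)."""
    cuts = statistics.quantiles(sigma_theta, n=10)  # 9 cut points -> 10 subsets
    groups = [([], []) for _ in range(10)]
    for s, t, e in zip(sigma_theta, times, errors):
        ts, es = groups[bisect.bisect_right(cuts, s)]
        ts.append(t)
        es.append(e)
    return cuts, [fit_trend(ts, es) for ts, es in groups]


def oa_adjust(estimate, sigma, t, cuts, trends):
    """Hypothetical application: add the fitted (measured - estimated) error at time t."""
    mt, me, slope = trends[bisect.bisect_right(cuts, sigma)]
    return estimate + me + slope * (t - mt)
```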

The OA pH change rates we find here are consistent with previous estimates (e.g., Feely et al. ). These simplistic OA adjustments may be poor estimates of the impacts of OA on seawater pH generally because they treat all water of a given density identically despite strong regional differences in the degree of water mass ventilation and C_{anth} storage. Nevertheless, we believe the optional adjustment is useful for LIPHR pH estimates made in the coming decades, and note that including the adjustment decreases mean estimate bias by 85% and RMSE by ∼ 51%. Due to the progressive effects of OA, we contend this adjustment will be yet more important for modern estimates than for our test data set. Limited experimentation suggested additional cruises would be needed to adequately constrain regional differences in this adjustment. The LIPHR code therefore contains an option for users to input

The LIRs generate uncertainty estimates for each property estimate returned. As with LIARv1, uncertainty estimates (

*E* terms refer to the RMS uncertainties as assessed in the “Assessment” section. … *A*_{T}, N, and pH measurement uncertainties in our data product, and is assumed to be constant at 2.8 μmol kg^{−1} for *A*_{T}, 0.3 μmol kg^{−1} for N, and 0.005 for pH (Olsen et al. ). *U*_{j} are the
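Because the LIR uncertainty equation is not reproduced here, the following is only a generic first‐order illustration of how input uncertainties can be scaled into an estimate uncertainty: each predictor uncertainty *U*_{j} is weighted by the magnitude of its regression coefficient, and the weighted terms are combined in quadrature with an equation error term *E*. The function and argument names are hypothetical, and the treatment of the measurement‐uncertainty term in the actual software may differ.

```python
import math


def propagate_uncertainty(coefficients, input_uncertainties, equation_error):
    """Generic first-order propagation: sqrt(sum((c_j * U_j)**2) + E**2).

    coefficients: regression coefficients for each predictor at this location.
    input_uncertainties: user-supplied uncertainty U_j for each predictor.
    equation_error: an RMS error term E for the regression itself.
    """
    if len(coefficients) != len(input_uncertainties):
        raise ValueError("one uncertainty is needed per predictor coefficient")
    var = sum((c * u) ** 2 for c, u in zip(coefficients, input_uncertainties))
    return math.sqrt(var + equation_error ** 2)
```

For instance, coefficients (0.5, 2.0) with input uncertainties (2.0, 1.0) and *E* = 2.0 give √(1 + 4 + 4) = 3.0. Larger coefficients amplify input errors, which is one reason over‐fit equations tend to carry larger uncertainties.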

One difficulty with LIRs is choosing among up to 16 possible estimates. We have added optional (on by default) functionality to all LIR routines that automatically picks the estimate with the smallest estimated uncertainty from among all estimates that can be generated from the suite of input predictor data provided by the user. This feature is intended in part to address a limitation of the method, namely that some LIR equations have too many terms (i.e., are over‐fit) for some of the > 2 million combinations of predicted variables, predictor variables, and grid locations. Over‐fitting leads to larger‐magnitude regression coefficients due to “variance inflation.” Larger‐magnitude coefficients (… *A*_{T} estimate RMSE improvement with this feature increased from 3% to 10% after simulated errors were applied to AOU (these were normally distributed offsets with a mean of 0 and a standard deviation of 5 *μ*mol kg^{−1} O_{2}).
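The per‐sample selection can be sketched as below. This is an illustration under assumptions, not the LIR routine: each sample is represented by per‐equation lists of estimates and uncertainties, NaN marks an equation whose predictors were not supplied, and the function name is hypothetical. For scale, 3 LIRs × 16 equations × 44,957 grid locations gives 2,157,936 combinations, consistent with the "> 2 million" figure.

```python
import math


def pick_min_uncertainty(estimates, uncertainties):
    """Choose the lowest-uncertainty estimate among those that could be computed.

    estimates and uncertainties are per-equation lists for one sample; NaN marks
    an equation whose predictors were unavailable. Returns
    (equation_number, estimate, uncertainty) with 1-based equation numbers,
    or None if no equation could be evaluated.
    """
    best = None
    for eq, (est, unc) in enumerate(zip(estimates, uncertainties), start=1):
        if math.isnan(est) or math.isnan(unc):
            continue
        if best is None or unc < best[2]:
            best = (eq, est, unc)
    return best
```

Ties resolve to the lower‐numbered equation, because only a strictly smaller uncertainty replaces the current choice.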

Estimate bias and RMS errors are calculated in the same way as the error estimates provided by Carter et al. (2016*b*), except using the subsets of the GLODAPv2 data product and additional cruises specified as “test data” sets in the “Data products used to train and test LIRs” section. These values are presented as “bias (± RMSE).” The bias is the mean residual for the assessment and can be positive or negative. LIR bias estimates are small compared to RMSE at the global level, suggesting the LIR estimates are appropriately centered on the measured values. However, bias grows (in an absolute sense) as the number of measurements averaged decreases, so the bias estimates are presented alongside RMSE as potentially useful indicators of how correlated LIR errors are within various regions. Bias estimates are also useful when comparing assessments from various algorithms. In particular, lower biases for LIPHR than for other pH algorithms highlight the importance of the OA adjustment and the dye‐impurity‐related adjustments applied to the training data set. An important feature of the error estimation method is that a separate set of regression coefficients is estimated for each data point in our test data sets, without using any data from the cruise that produced that particular test value. Data from the same cruise are omitted to avoid underestimating error by including in the training data set numerous measurements made close in time and space to the test measurement.
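The leave‐one‐cruise‐out assessment can be illustrated with a deliberately simplified model. In this sketch a one‐predictor least‐squares line stands in for the local robust MLR (an assumption made for brevity), residuals are defined as estimate minus measurement, and a new fit is made for every test point using only data from other cruises. The function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_cruise_out(cruises, x, y):
    """Bias and RMSE of estimates made without any data from the test point's cruise."""
    residuals = []
    for i in range(len(y)):
        train = [j for j in range(len(y)) if cruises[j] != cruises[i]]
        a, b = fit_line([x[j] for j in train], [y[j] for j in train])
        residuals.append((a + b * x[i]) - y[i])  # estimate minus measurement
    bias = sum(residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return bias, rmse
```

If one cruise carries a systematic offset, its points are predicted only from the other cruises, so the offset shows up in the bias and RMSE instead of being absorbed by the fit.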

The updates to LIAR decreased the overall reconstruction errors (… *A*_{T} regressions in the literature (many are compared in Carter et al. 2016*a,b*) and Table shows LIARv2 does somewhat better still. CANYON *A*_{T} estimates reproduce our entire test data set with errors of −0.2 (± 5.4) *μ*mol kg^{−1}, while LIARv2 (Regression 7) has errors of −0.1 (± 5.1) *μ*mol kg^{−1}. These errors are slightly smaller, at −0.5 (± 5.2) *μ*mol kg^{−1} for CANYON and 0.2 (± 4.4) *μ*mol kg^{−1} for LIARv2, when limited to the open ocean test regions used by Sauzède et al. ().

Interestingly, regression 3 (*S*, *θ*, AOU, and Si) slightly outperforms regression 1 (*S*, *θ*, N, AOU, and Si) on average, and there is little difference between the error estimates for the various equations for *A*_{T}. This suggests that regression 1 and possibly others are over‐fitting *A*_{T} in places (this observation does not hold true if we include the test data in the training data). See “Minimum uncertainty estimates” section for how the LIR minimum‐uncertainty functionality automatically avoids using over‐fit relationships despite this.
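The 16 predictor combinations follow from always including *S* and optionally including each of *θ*, N, AOU, and Si (2^{4} = 16). In the sketch below, the ordering is chosen so that regressions 1 and 3 match the compositions stated above; the numbering of the remaining regressions is our assumption and may not match the software. LINR substitutes P for N, and LIPHR appends depth *z*.

```python
from itertools import product


def lir_predictor_sets(kind="LIAR"):
    """Return the 16 predictor tuples for "LIAR", "LINR", or "LIPHR".

    S is always used; theta, Si, N (P for LINR), and AOU are toggled, with
    AOU toggling fastest. The numbering beyond regressions 1 and 3 is assumed.
    """
    if kind not in ("LIAR", "LINR", "LIPHR"):
        raise ValueError(f"unknown LIR: {kind}")
    sets = []
    for theta, si, nut, aou in product((True, False), repeat=4):
        preds = ["S"]
        if theta:
            preds.append("theta")
        if nut:
            preds.append("P" if kind == "LINR" else "N")
        if aou:
            preds.append("AOU")
        if si:
            preds.append("Si")
        if kind == "LIPHR":
            preds.append("z")  # depth in meters, for pressure effects on in situ pH
        sets.append(tuple(preds))
    return sets
```

Index 0 of the returned list corresponds to regression 1 (*S*, *θ*, N, AOU, Si) and index 2 to regression 3 (*S*, *θ*, AOU, Si); exactly eight of the sixteen sets include the nutrient term.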

LIPHR pH estimates reconstruct the test pH data set well (Table ; Fig. ). We separately estimate error between 1000 m and 2000 m as these estimates are more likely to be used to compare with float data (Table ).

LIPHR estimates compare well to the few published pH regression estimates. Williams et al. () designed regression estimates for south of 45°S between 2006 and 2017 and between 0 m and 2100 m depth. For the subset of our data product within these bounds, omitting their S04P and P16S training cruises, their published regressions have errors of −0.006 (± 0.017) and −0.006 (± 0.016), while similar LIPHR regressions (6 and 7, respectively) have errors of −0.001 (± 0.010) and −0.001 (± 0.011). Williams et al. () also report a regression for estimates in the same region but trained specifically for estimates between 1000 m and 2100 m depth, the depth range most useful for assessing biogeochemical profiling float sensor performance. For the relevant subset of our test data product, their algorithm has errors of −0.001 (± 0.005), while LIPHR regression 7 has errors of 0.002 (± 0.005). LIPHR (also regression 7) estimates have errors of 0.004 (± 0.014) in the California Current Ecosystem‐specific window of 114°W to 124°W, 27°N to 36°N, and 15–500 m depth after 1994, where the algorithm from Alin et al. (), which uses temperature and O_{2} measurements, generates estimates with errors of −0.008 (± 0.015). CANYON pH estimates reproduce our entire test data set with errors of 0.009 (± 0.017), while LIPHR (Regression 7) has errors of 0.000 (± 0.010). At mid depths (1000–2000 m), these errors are 0.013 (± 0.017) for CANYON and 0.000 (± 0.006) for LIPHR. The CANYON error estimates are the same at this precision when the GLODAPv2 adjustments are retained.

LINR estimates also reproduce the test data product well (Table ; Fig. ). Williams et al. () provide an N estimation algorithm specific to the Pacific sector of the Southern Ocean south of 45°S between 1000 m and 2100 m. This algorithm has errors of 0.42 (± 0.65) *μ*mol kg^{−1} for the portion of our data product in the target region for this regression. LINR (Regression 7) has errors of −0.11 (± 0.45) *μ*mol kg^{−1} for this same subset. CANYON nitrate estimates reproduce our entire test data set with errors of −0.01 (± 0.89) *μ*mol kg^{−1}, while LINR (Regression 7) has errors of −0.02 (± 0.86) *μ*mol kg^{−1}. These errors are slightly smaller, at 0.03 (± 0.66) *μ*mol kg^{−1} for CANYON and −0.02 (± 0.65) *μ*mol kg^{−1} for LINR, when limited to the open ocean test regions used by Sauzède et al. ().

With the changes to the error estimation strategy noted in the “Update to uncertainty estimation” section, the overall standard error estimates provided by the software are now greater than or equal to the test data set reconstruction error for 76% of the data product for LIARv2, 75% for LIPHR, and 80% for LINR. For perfectly estimated, normally distributed RMS uncertainties, this number would be 68%. With LIARv1, this was true for 87% of the data product.
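The 68% benchmark is the probability that a normally distributed error falls within one standard deviation of zero, erf(1/√2) ≈ 0.683. The sketch below computes the empirical coverage statistic described above alongside this ideal value; the function name is hypothetical.

```python
import math


def coverage(errors, uncertainties):
    """Fraction of samples whose uncertainty estimate is >= the absolute error."""
    if len(errors) != len(uncertainties) or not errors:
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum(1 for e, u in zip(errors, uncertainties) if u >= abs(e))
    return hits / len(errors)


# Ideal coverage for a correctly sized 1-sigma uncertainty on normal errors:
# P(|X| <= sigma) = erf(1 / sqrt(2)), approximately 0.683.
IDEAL_COVERAGE = math.erf(1.0 / math.sqrt(2.0))
```

Coverage above the ideal value (such as the 75–80% reported for the LIRs) indicates uncertainty estimates that are somewhat conservative; coverage below it would indicate uncertainties that are too small.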

Example LIAR, LIPHR, and LINR estimates are derived from hydrographic measurements from the 2016 occupations of the I09 section in the Indian Ocean by the Global Ocean Ship‐based Hydrographic Investigations Program (GO‐SHIP) (Fig. ). These estimates provide an independent validation when compared with the measurements made along the cruise, because data from these cruises were included in neither the test nor the training data sets for the LIRs. The LIRs reproduce the measurements well, with errors of −0.6 (± 4.2) *μ*mol kg^{−1} for *A*_{T}, 0.001 (± 0.008) for pH, and 0.14 (± 0.32) *μ*mol kg^{−1} for N. LIPHR errors increase to −0.014 (± 0.017) when the OA adjustment is omitted.

Climatological distributions of carbonate parameters from LIAR *A*_{T} and LIPHR pH, or calculated from this pair of properties, may be of interest and could be calculated simply for the measurement‐dense World Ocean Atlas climatology (Locarnini et al. ; Zweng et al. ; Baranova ) or similar products. Such a regression‐based climatology, like the *A*_{T} climatologies created by Lee et al. () and used by Takahashi et al. (), would be one step further removed from the measurements than gridded climatologies such as those provided by Lauvset et al. () and Key et al. (). However, it would have the advantage of being based on property measurements (such as O_{2}, *S*, and temperature) that are more numerous, more broadly distributed in space and time, and less seasonally biased than carbonate measurements.

With LIAR and LIPHR, it is now possible to estimate two parameters of the carbonate system and thus, in principle, to provide a complete carbonate system description. While measurements would be preferable for most applications, this pair of algorithms allows additional context to be added to historical data products.

As Velo et al. () pointed out, regressions can be potentially powerful tools for data QC. An algorithm that uses many measured properties to estimate many other measured properties and then assesses the various residuals may provide a fast method for identifying apparent outliers and interesting anomalies in property measurement sets. Such automated measures designed to assist human‐QC efforts may be of increased importance as growing sensor networks increase the quantity of data being produced relative to the amount of human‐effort available for data QC.
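A residual‐based screening of the kind suggested above might look like the following sketch. The dictionary layout, property names, and the threshold of three times the estimate uncertainty are hypothetical choices for illustration, not a recommended QC standard.

```python
def flag_samples(measured, estimated, uncertainty, k=3.0):
    """Return {sample_index: [property names]} for residuals exceeding k * uncertainty.

    Each argument maps a property name (e.g. "AT", "pH") to a per-sample list;
    all three dictionaries are assumed to share keys and list lengths.
    """
    flags = {}
    for prop, meas in measured.items():
        for i, (m, e, u) in enumerate(zip(meas, estimated[prop], uncertainty[prop])):
            if abs(m - e) > k * u:
                flags.setdefault(i, []).append(prop)
    return flags
```

Samples flagged for several properties at once are more likely to reflect a sampling or analytical problem, whereas a single flagged property may point to an interesting anomaly worth human review.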

The OA rate estimation strategy used (Eq. ) provides a means to incorporate a large number of measurements that are disparate in space and time into unified global trend estimates. This framework could perhaps be applied to the low‐signal‐to‐noise scientific questions of whether long‐term trends are occurring in *A*_{T} (cf. Carter et al. 2016*a*), N, or O_{2} relative to other measured parameters.

We are grateful to Joshua Plant, Tanya Maurer, and Kenneth Johnson for beta‐testing and helping us improve the code, and to Henry Bittig for sharing the CANYON code with us. We are grateful to Marta Álvarez for help classifying cruises by measurement practices. We are grateful to Kelly Kearney for advice regarding function input parsing and accommodating new code options. The GLODAPv2 and repeat hydrography measurements used are available for download from the Carbon Climate Hydrographic Data Office (

None declared.