Iterative ensemble filters and smoothers are now commonly used for geophysical models. Some of these methods rely on a factorization of the observation likelihood function to sample from a posterior density through a set of “tempered” transitions to ensemble members. For Gaussian‐based data assimilation methods, tangent linear versions of nonlinear operators can be relinearized between iterations, thus leading to a solution that is less biased than a single‐step approach. This study adopts similar iterative strategies for a localized particle filter (PF) that relies on the estimation of moments to adjust unobserved variables based on importance weights. This approach builds off a “regularization” of the local PF, which forces weights to be more uniform through heuristic means. The regularization then leads to an adaptive tempering, which can also be combined with filter updates from parametric methods, such as ensemble Kalman filters. The role of iterations is analyzed by deriving the localized posterior probability density assumed by current local PF formulations and then examining how single‐step and tempered PFs sample from this density. From experiments performed with a low‐dimensional nonlinear system, the iterative and hybrid strategies show the largest benefits in observation‐sparse regimes, where only a few particles contain high likelihoods and prior errors are non‐Gaussian. This regime mimics specific applications in numerical weather prediction, where small ensemble sizes, unresolved model error, and highly nonlinear dynamics lead to prior uncertainty that is larger than measurement uncertainty.

This research is aimed at improving data assimilation methodology for sparsely observed nonlinear problems, such as convection and tropical cyclones in Earth's atmosphere, but it has broad appeal outside of weather prediction. The general goal of the paper is to introduce new methodology and examine its utility for low‐dimensional problems that mimic weather applications. These findings provide a transparent look at new data assimilation techniques and motivate ongoing studies that examine potential benefits for real weather applications.

**Funding information** National Oceanic and Atmospheric Administration, Grant/Award Number: NA20OAR4600281; National Science Foundation, United States, Grant/Award Number: AGS1848363

Particle filters (PFs) are sequential Monte Carlo methods that can solve data assimilation problems characterized by non‐Gaussian error distributions for prior model variables or measurements (Doucet *et al*., 2001). From a geoscience perspective, PFs contain several theoretical properties that make them attractive for research and environmental prediction. Namely, they preserve dynamical balances during data assimilation update steps; they require no special treatment for nonlinear measurement operators or non‐Gaussian errors; and they provide an elegant solution to the underlying Bayesian filtering problem. Recent efforts applying PFs for geophysical models have resulted in “localized” PFs, which approximate a given data assimilation application as a large set of loosely coupled problems that can be solved independently using relatively small ensembles—an approach long used for ensemble Kalman filters (EnKFs). These efforts resulted in a large variety of filtering and smoothing methods, including those introduced by Bengtsson *et al*. (2003), Poterjoy (2016), Penny and Miyoshi (2016), Poterjoy and Anderson (2016), Lee and Majda (2016), Robert and Künsch (2017), Chustagulprom *et al*. (2016), and Morzfeld *et al*. (2018); see Van Leeuwen *et al*. (2019) for a more exhaustive list of PF approaches designed for high‐dimensional applications. Though localization delivers a potentially transformative strategy for implementing PFs for high‐dimensional systems, this technique alone is often inadequate for real geophysical applications. For example, the conditions for PF weight collapse identified in past studies (e.g., Bengtsson *et al*., 2008; Bickel *et al*., 2008; Snyder *et al*., 2008) still hold within the neighborhood of observations for localized PFs. Therefore, filter degeneracy is still inevitable for dynamical systems that are observed by many accurate, collocated measurements, or when observation‐space prior statistics are heavily biased.

Data assimilation problems in atmospheric science that are hypothesized to benefit from PF methodology—because of strong nonlinearity in model dynamics and measurement operators—also tend to be characterized by the problems already listed herein. For example, data assimilation for convective weather regimes introduces many challenges for Gaussian‐based filters, owing to the non‐Gaussian errors needed to quantify model and measurement uncertainty and nonlinearity in the underlying system dynamics and observing systems (Posselt *et al*., 2014; Posselt, 2016). At the same time, prior and posterior statistics are often mischaracterized, owing to the large computational cost of generating many high‐resolution ensemble members or the presence of unknown error sources in physical parametrization schemes for turbulence, cloud microphysics, land‐surface processes, and so on. Data assimilation for tropical cyclones presents a similar set of challenges, except the most abundant measurements come from radiometers onboard satellites. In general, these characteristics present difficulties for numerous research and prediction problems, including many outside of weather forecasting.

This study introduces new methodology for the local PF described in Poterjoy (2016) and revised in Poterjoy *et al*. (2019), which targets the type of data assimilation problems just discussed. The new approaches rely on a factorization of particle weights to draw samples from the localized posterior density through successive iterations. They borrow from past studies that introduce iterations for EnKFs and smoothers to cope with mildly nonlinear model dynamics and measurement operators (Emerick and Reynolds, 2012; 2013). In the context of PFs, these iterations serve a different purpose. Each iteration uses a regularization coefficient on particle weights to maintain a threshold effective ensemble size. For filters that update particles to match the first two posterior moments (e.g., Feng *et al*., 2020), accurate particle updates can be achieved without higher‐order statistics. In addition to showing significant benefits over non‐iterative formulations of the local PF, the method presents a natural framework for extending the local PF into a hybrid PF–EnKF, as proposed by past studies (Frei and Künsch, 2013; Chustagulprom *et al*., 2016; Robert and Künsch, 2017; Grooms and Robinson, 2021).

This article is organized in the following manner. Section 2 discusses the mathematical framework for localized PFs, including a derivation of the posterior probability density function (pdf) assumed by the Poterjoy *et al*. (2019) local PF. Calculations of this pdf are needed to illustrate the behavior of the local PF with and without iterations. Section 3 discusses the collapse of localized PFs to provide motivation for regularized, iterative, and hybrid approaches, which are discussed in Section 4. Section 5 shows numerical experiments performed using a bivariate problem and the 40‐variable dynamical system of Lorenz (1996). The last section discusses major findings from this study.

This section introduces a subset of localized PFs, which behave similarly to the filter introduced in Poterjoy (2016). The primary goal of this section is to provide context for the new filtering methodology introduced in Section 4, which requires a clear definition of the posterior density certain localized PFs attempt to sample from. We direct readers to Farchi and Bocquet (2018) for a comprehensive review of strategies recently adopted for this purpose.

For geophysical data assimilation, the state vector representing variables for a dynamical system is treated as a random variable $$$ \mathbf{x}\in {\mathbb{R}}^{N_x} $$$ with the underlying goal of estimating various properties of the probability density $$$ p\left(\mathbf{x}\right) $$$. The time propagation of $$$ \mathbf{x} $$$ is approximated through a prediction model $$$ {\mathbf{x}}_{t+1}=M\left({\mathbf{x}}_t\right)+{\eta}_t $$$, where $$$ {\eta}_t $$$ is an additive model error. The probability density for $$$ \mathbf{x} $$$ is updated periodically to reflect new information from incomplete and noisy measurements $$$ \mathbf{y}\in {\mathbb{R}}^{N_y} $$$, where the mapping between model variables and observations is given by $$$ \mathbf{y}=H\left[\mathbf{x}\right]+\epsilon $$$. The error $$$ \epsilon $$$ is the combined error from instruments on observing systems, representativeness error, and imperfect estimates of $$$ H\left(\right) $$$ itself. When estimating the probabilistic evolution of $$$ \mathbf{x} $$$, we are often interested in the filtering density $$$ p\left({\mathbf{x}}_t|{\mathbf{y}}_{0:t}\right) $$$, which is the conditional probability of $$$ \mathbf{x} $$$ at the current time given all current and past measurements. By drawing samples from this density and passing them through $$$ M\left(\right) $$$, we can estimate various parts of $$$ p\left({\mathbf{x}}_{t+\tau }|{\mathbf{y}}_{0:t}\right) $$$ for some forecast time $$$ \tau >0 $$$. This framework is used to formulate sequential Monte Carlo filters, such as EnKFs and PFs, thus providing the cornerstone of probabilistic environmental prediction systems such as those used for weather forecasting (e.g., Houtekamer and Zhang, 2016).
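A minimal sketch of this forecast–analysis cycle is given below, using a plain bootstrap importance weighting for the update step; the model $$$ M $$$, operator $$$ H $$$, and error settings are hypothetical placeholders rather than any configuration used in this study:

```python
import math
import random

random.seed(1)

N_e = 100                      # ensemble size
particles = [random.gauss(0.0, 1.0) for _ in range(N_e)]

def M(x):
    """Toy nonlinear prediction model (illustrative only)."""
    return x + 0.1 * math.sin(x)

def H(x):
    """Identity measurement operator for this sketch."""
    return x

obs_err_var = 0.5 ** 2
y = 0.8                        # a single noisy measurement of the state

# Forecast step: propagate each particle and add model error eta_t.
particles = [M(x) + random.gauss(0.0, 0.1) for x in particles]

# Analysis step: weight each particle by the Gaussian likelihood p(y | x).
weights = [math.exp(-0.5 * (y - H(x)) ** 2 / obs_err_var) for x in particles]
total = sum(weights)
weights = [w / total for w in weights]

# A posterior estimate (the filtering mean) from the weighted sample.
posterior_mean = sum(w * x for w, x in zip(weights, particles))
```

In a full PF, the weighted sample would next be resampled and passed through $$$ M\left(\right) $$$ again to continue the cycle.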

The local PF attempts to sample from a posterior that assumes a gradual decoupling of marginal state variables displaced spatially, similar to EnKFs used for high‐dimensional geophysical applications. Derivations of this probability density are avoided in past studies, primarily because it is not needed to draw samples from approximate forms of this density. Given the nature of this study, we provide a transparent examination of the localized probability density.

Consider the case where we are given a prior density $$$ p\left({\mathbf{x}}_t\right) $$$ and need to approximate the conditional density $$$ p\left({\mathbf{x}}_t|{\mathbf{y}}_t\right) $$$. For simplicity, ignore time indices and consider the two‐dimensional problem: $$$ \mathbf{x}={\left[u\kern0.3em v\right]}^{\mathrm{T}} $$$, where $$$ v $$$ is observed directly with measurement $$$ y $$$. The sequential Monte Carlo framework requires that we sample from the conditional probability distribution $$$ p\left(\mathbf{x}|y\right) $$$. One way of conceptualizing this problem is to consider the factorization
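For completeness, this factorization takes the standard form (a sketch consistent with the setup above, in which $$$ y $$$ measures $$$ v $$$ alone):

```latex
p(u, v \mid y) \;=\; p(u \mid v, y)\, p(v \mid y) \;=\; p(u \mid v)\, p(v \mid y),
```

where the second equality holds because $$$ y $$$ depends on the state only through $$$ v $$$, so $$$ u $$$ is conditionally independent of $$$ y $$$ given $$$ v $$$.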

Multiple filters that are widely used in public community software packages adopt the serial algorithm described in Anderson and Collins (2007), which exploits the factorization described earlier herein. Examples include the NCAR Data Assimilation Research Testbed (Anderson *et al*., 2009) and the NOAA Gridpoint Statistical Interpolation (Shao *et al*., 2016) system used for operational weather prediction in the United States. By construction, the Poterjoy (2016) local PF follows the same algorithmic framework, so it can be implemented easily for a broad selection of geophysical models. Therefore, it is important to examine how the filter solves the bivariate data assimilation problem introduced in this section.

First, consider the delta function approximation of the prior pdf made by PFs:
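For reference, this approximation takes the standard form (e.g., Doucet *et al*., 2001), with equal prior weights of $$$ 1/{N}_{\mathrm{e}} $$$ assigned to the prior particles $$$ {\mathbf{x}}^n $$$:

```latex
p(\mathbf{x}) \;\approx\; \frac{1}{N_{\mathrm{e}}} \sum_{n=1}^{N_{\mathrm{e}}} \delta\!\left(\mathbf{x} - \mathbf{x}^{n}\right).
```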

For many geophysical problems, it is practical to assume state variables displaced by a critical physical distance have independent errors. For example, we can write

For data assimilation applications targeted by this research, such as weather prediction, the decoupling must be specified carefully to maintain true cross‐variable error dependencies that come from physical processes depicted by the dynamical model. Not doing so can result in artificial dynamical imbalances during model integration (Kepert, 2009; Greybush *et al*., 2011), which reduce predictive skill. This challenge is complicated by the realization that spatial structure of error correlations tends to be anisotropic and dependent on the underlying flow (e.g., Poterjoy and Zhang, 2011). Furthermore, errors do not often exhibit abrupt spatial changes that would permit the use of Equation (4) alone for applications of this type. As is true for localized EnKFs, large correlations will inevitably need to be modulated by localization to reduce sampling error. Under these circumstances, localization can introduce a bias in the resulting posterior estimate, but the bias is generally smaller than what would be obtained without localization (for small $$$ {N}_{\mathrm{e}} $$$), which is why localization is useful for many applications. Therefore, an appropriate decoupling should reflect a combination of $$$ {p}_{\mathrm{pf}}\left(u,v|y\right) $$$ and $$$ {p}_i\left(u,v|y\right) $$$. For example, we can choose a conditional probability density of the form $$$ p\left({u}^m|{v}^n\right)={\delta}_{mn}{\rho}_{u,v}+\left(1/{N}_{\mathrm{e}}\right)\left(1-{\rho}_{u,v}\right) $$$, where $$$ {\rho}_{u,v} $$$ is a “localization” coefficient controlling a coupling between the two variables. Note that this choice resorts back to the two limiting cases when $$$ {\rho}_{u,v}=0 $$$ and $$$ {\rho}_{u,v}=1 $$$. The resulting posterior density for $$$ u $$$ and $$$ v $$$ is then approximated using the localized density $$$ {p}_{\mathrm{l}}\left(u,v|y\right) $$$ given in Equation (5).
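The limiting behavior of this conditional density can be verified directly. In the sketch below, the observation‐space weights are illustrative placeholders, and the joint posterior is assembled as $$$ p\left({v}^n|y\right)p\left({u}^m|{v}^n\right) $$$:

```python
# Sketch of the localized joint posterior implied by the conditional
# density p(u^m | v^n) = delta_mn * rho + (1/N_e)(1 - rho).
# The observation-space weights w are illustrative placeholders.
N_e = 4
w = [0.55, 0.25, 0.15, 0.05]          # posterior weights for v^n

def joint_posterior(w, rho):
    n = len(w)
    return [[w[j] * ((rho if i == j else 0.0) + (1.0 - rho) / n)
             for j in range(n)] for i in range(n)]

P = joint_posterior(w, rho=0.8)

# The joint density normalizes to one for any rho.
total = sum(sum(row) for row in P)

# Marginalizing over u recovers the original weights for v, while the
# marginal for u is rho * w^m + (1 - rho) / N_e, which becomes uniform
# (1/N_e) as rho -> 0 and matches the weights for v as rho -> 1.
marg_v = [sum(P[i][j] for i in range(N_e)) for j in range(N_e)]
marg_u = [sum(P[i][j] for j in range(N_e)) for i in range(N_e)]
```

Note that all $$$ {N}_{\mathrm{e}}^2 $$$ entries of the joint density are non‐zero when $$$ 0<{\rho}_{u,v}<1 $$$, consistent with the increase in plausible model states discussed below.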

The choice for $$$ {p}_{\mathrm{l}}\left(u,v|y\right) $$$ introduced in Equation (5) leads to the posterior approximation adopted by Poterjoy *et al*. (2019).

To extend the bivariate example to multiple state variables and observations, the Poterjoy (2016) and Poterjoy *et al*. (2019) filters assimilate all observations at a given time serially and use a joint observation–model space localization that is identical to the approach discussed in Anderson and Collins (2007). This strategy assumes observations have independent errors and well‐defined physical locations; the latter assumption allows observation‐space priors to be treated as augmented state variables and updated alongside all other variables in $$$ \mathbf{x} $$$. Appendix A discusses potential ways of extending Equation (5) for $$$ {N}_y>1 $$$ and $$$ {N}_x>2 $$$.

PFs designed to operate within pre‐existing local ensemble transform Kalman filter frameworks (e.g., Penny and Miyoshi, 2016; Potthast *et al*., 2019) adopt a sliding window localization identical to Hunt *et al*. (2007). This strategy is also effective at providing the desired spatial decoupling, but results in slightly different formulations for the conditional distributions specified earlier herein. The two methods have similar properties for bivariate applications, which will be explored through simple examples in the following sections. We also note that joint observation–model space localization and sliding window localization are suboptimal for assimilating non‐local observations, such as satellite radiance. Readers are encouraged to review Lei and Anderson (2014) for a discussion on the positive and negative aspects of different localization methodologies.

We can visualize how PFs approximate $$$ {p}_{\mathrm{l}}\left(\mathbf{x}|\mathbf{y}\right) $$$ for a bivariate application to demonstrate the influence of $$$ \rho $$$ on the assumed posterior pdf. Consider the example shown in Figure 1a, where 10 particles are sampled from $$$ p\left(u,v\right) $$$ and reweighted based on $$$ p\left(y|u,v\right) $$$, where $$$ y $$$ measures $$$ v $$$ directly. The value of each sampled particle is indicated in the $$$ u $$$–$$$ v $$$ plane shown in Figure 1 by black circles. The red markers in this figure indicate particle weights, which are proportional to the posterior joint probability of $$$ u $$$ and $$$ v $$$, with larger markers corresponding to a larger weight. Each axis also shows blue markers, which indicate the marginal posterior probability of both $$$ u $$$ and $$$ v $$$.

Figure 1a shows the sample‐estimated $$$ {p}_{\mathrm{l}}\left(u,v|y\right) $$$ when $$$ {\rho}_{u,v}=1 $$$, which is equivalent to $$$ {p}_{\mathrm{pf}}\left(u,v|y\right) $$$. Note that the marginal posterior probabilities for $$$ u $$$ and $$$ v $$$ are the same for particles with like indices—an assumption that becomes relaxed for $$$ {\rho}_{u,v}<1 $$$. Figure 1b,c shows how the sample‐estimated pdf changes as $$$ {\rho}_{u,v} $$$ decreases to 0.8 and 0.0. The marginal weights for the observed variable $$$ v $$$ remain the same, but they become equal to $$$ 1/{N}_{\mathrm{e}} $$$ for $$$ u $$$ to reflect the decoupling. As shown in Figure 1, localization leads to non‐zero probability for all combinations of sampled $$$ u $$$ and $$$ v $$$ values while reducing the joint probability of solutions with matching indices. As a result, the number of plausible model states increases from 10 to 100. The new solutions that arise from localization are linear combinations of the original particles. The resulting pdf is less likely to collapse onto the original samples (marked by circles), thus partially solving the filter degeneracy issues discussed in the previous section. At the same time, there is no guarantee that samples drawn from parts of the pdf that represent combinations of particles will satisfy the model equations used to produce particles (Van Leeuwen, 2009; Farchi and Bocquet, 2018). Most notably, forming combinations of particles comprised of geophysical quantities, such as wind, temperature, and pressure, will ultimately yield large discontinuities between grid points, regardless of whether $$$ \rho $$$ is modeled using a spatially smooth function. Though this problem is inevitable for localized PFs, strategies exist for reducing discontinuities; some are discussed in the following subsection.

Drawing samples from $$$ {p}_{\mathrm{l}}\left(\mathbf{x}|\mathbf{y}\right) $$$ becomes non‐trivial for even moderately sized problems, which is why several forms of localized PFs have been proposed in recent years. In this section, we discuss two approaches that have been used for high‐dimensional geophysical applications. Before discussing these strategies, it is important to note that methodology already exists for drawing particles from $$$ {p}_{\mathrm{l}}\left(u,v|y\right) $$$ via a set of sequential sampling steps for each variable (Metref *et al*., 2014). This approach can be expanded for $$$ {N}_x>2 $$$ and $$$ {N}_y>1 $$$, but it is computationally prohibitive for high dimensions. Though spatial discontinuities still remain following this sampling, the dependence of marginal samples on neighboring marginal samples should yield the smallest possible discontinuities.

To sample from $$$ {p}_{\mathrm{l}}\left(\mathbf{x}|\mathbf{y}\right) $$$, Penny and Miyoshi (2016) form a transform matrix for each variable in $$$ \mathbf{x} $$$, which resembles an independent sampling at each physical grid point for models with a spatial dimension. Starting with a transform matrix populated by binary coefficients, they apply a smoothing operator that turns the coefficients into continuous values between 0 and 1, thus reducing discontinuities in posterior samples. To achieve stable results for weather applications, Robert *et al*. (2018) and Potthast *et al*. (2019) apply a similar form of localized PF update, but on a coarse model grid, which is then extrapolated to the native domain after data assimilation, thus reducing discontinuities in the updated particles.

Poterjoy (2016) proposes a localized PF that uses the strategy outlined in Anderson and Collins (2007) for performing joint observation‐ and model‐space updates in parallel. This approach processes observations serially to perform an observation‐space update via bootstrap sampling of particles, followed by a model‐space update that preserves the non‐localized PF solution when $$$ \rho =1 $$$ and the prior solution when $$$ \rho =0 $$$. By performing the observation‐space update first, posterior particles can be adjusted in a manner that is consistent with resampled particles in the vicinity of observations. For the remaining variables, the filter uses an approximation to the more general sampling strategy outlined in Metref *et al*. (2014). When $$$ 0<\rho <1 $$$, this filter only guarantees that the first two moments of marginal quantities in $$$ {p}_{\mathrm{l}}\left(\mathbf{x}|\mathbf{y}\right) $$$ are matched by posterior particles.

For a single observation, the equation used by Poterjoy (2016) and Poterjoy *et al*. (2019) for state updates is given by

The update strategy adopted by Poterjoy (2016) and Poterjoy *et al*. (2019) is similar to the rank histogram filter of Anderson (2010), but with Equation (7) replacing a Kalman filter update. We refer readers to Poterjoy *et al*. (2019) for a full algorithmic description of this filter along with an exhaustive list of variable definitions.

The computational costs of the Poterjoy (2016) and Poterjoy *et al*. (2019) filters are much smaller than strategies that can sample perfectly from the localized posterior discussed in Section 2 (Metref *et al*., 2014), thus making them affordable for high‐dimensional problems such as weather prediction (e.g., Poterjoy *et al*., 2021). Nevertheless, serial filters are more costly than filters that perform an independent sampling step for each variable, though the latter require additional steps to model marginal dependence. The fitting of posterior particles to only the first two moments for unobserved quantities also presents problems for highly non‐Gaussian applications (see Section 5), which is why past versions of this filter use an additional probability mapping step after resampling. Poterjoy (2016) and Poterjoy *et al*. (2019) refer to this step as kernel density distribution mapping (KDDM).

Bengtsson *et al*. (2008), Bickel *et al*. (2008), and Snyder *et al*. (2008) outline conditions for collapse of the standard PF with Gaussian errors. Here, collapse refers to situations where importance weights equal zero for all but one particle, and the standard PF is the bootstrap PF with the prior as the proposal density. To prevent weight collapse, Snyder *et al*. (2008) conclude that $$$ {N}_{\mathrm{e}} $$$ must increase exponentially with $$$ Var\left[{\sum}_{j=1}^{N_y}{V}_j^n\right] $$$, where $$$ {V}_j^n=-\ln \left[p\left({y}_j|{\mathbf{x}}^n\right)\right] $$$, and the variance is taken over the proposal density—or prior density in this case. For example, $$$ Var\left[{\sum}_{j=1}^{N_y}{V}_j^n\right] $$$ would be large if a dense network of accurate measurements were assimilated for a dynamical system containing many variables with independent errors.
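This scaling can be illustrated with a short numerical sketch (all settings are illustrative): each particle carries independent unit‐variance prior errors in every observed component, observations are taken as perfect measurements of a zero truth, and the likelihood is Gaussian. The effective ensemble size $$$ {\left[\sum {\left({w}^n\right)}^2\right]}^{-1} $$$ (Liu and Chen, 1998) collapses toward 1 as $$$ {N}_y $$$ grows:

```python
import math
import random

random.seed(0)

def n_eff(log_w):
    """Effective ensemble size of normalized weights proportional to exp(log_w)."""
    m = max(log_w)                      # subtract max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    s = sum(w)
    return 1.0 / sum((v / s) ** 2 for v in w)

N_e = 50
obs_err_var = 0.25
results = {}
for N_y in (1, 10, 100):
    log_w = []
    for _ in range(N_e):
        # Independent unit-variance prior errors in each observed component;
        # the innovation equals the error since truth and obs are zero here.
        errs = [random.gauss(0.0, 1.0) for _ in range(N_y)]
        log_w.append(-0.5 * sum(e * e for e in errs) / obs_err_var)
    results[N_y] = n_eff(log_w)
```

With these settings, `results[100]` sits near 1, reflecting the collapse predicted for dense, accurate observation networks.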

Localization provides one mechanism for reducing weight collapse. For example, Poterjoy (2016) chooses a marginal probability density for variable $$$ {x}_j $$$ in $$$ \mathbf{x} $$$ that expands Equation (6) for multiple observations with independent errors:

For geophysical applications with dense networks of observations, $$$ {\rho}_{i,j} $$$ can be insufficient for preventing weight collapse. Most noticeably, the arguments put forth by Bengtsson *et al*. (2008), Bickel *et al*. (2008), and Snyder *et al*. (2008) still hold when $$$ {\rho}_{i,j}=1 $$$, which is problematic for assimilating very accurate measurements or many collocated measurements using localized PFs. In addition, the dynamical system of interest places a minimum bound on length scale parameters used to characterize $$$ {\rho}_{i,j} $$$. This factor is important for global weather and ocean prediction models, where large spatial error correlation structures—commensurate with the Rossby radius of deformation—may dictate the minimum length scale for localization functions. Therefore, alternative strategies are needed to maintain the stability of PFs when $$$ {\rho}_{i,j} $$$ provides an insufficient reduction in $$$ Var\left[{\sum}_{i=1}^{N_y}{V}_{i,j}^n\right] $$$.

This section introduces multiple strategies for overcoming the obstacles discussed in Section 3. Each of these methods builds naturally off one another, following in sequence from regularization to hybridization of PFs with parametric filters. They also resemble strategies that have already been adopted for Gaussian filters and smoothers, as well as different forms of PFs.

Before discussing new methodology, it is important to note that EnKFs suffer the same fate as PFs for small $$$ {N}_{\mathrm{e}} $$$ (e.g., Morzfeld *et al*., 2017), which motivates the use of prior and posterior error variance inflation techniques. Approaches to variance inflation can take multiple forms. Common strategies include multiplying ensemble perturbations by a factor greater than unity (Anderson and Anderson, 1999), adding random noise to posterior samples (Houtekamer and Mitchell, 2005), or relaxing a portion of the posterior update to ensemble perturbations back to the prior (Zhang *et al*., 2004; Whitaker and Hamill, 2012). These strategies, as well as combining PFs with other filters (e.g., Stordal *et al*., 2011; Frei and Künsch, 2013), applying tempered transitions and particle flow methodology (e.g., Del Moral *et al*., 2006; Daum and Huang, 2011), or modifying the transition density between observation times (Van Leeuwen, 2010), are often used to prevent weight collapse for PFs. Likewise, PFs often adopt “pre‐” and “post‐regularization” schemes (Musso *et al*., 2001), which are comparable to prior and posterior additive inflation algorithms used in the geophysical modeling community for EnKFs. Readers are encouraged to review Farchi and Bocquet (2018) for comparisons of these methods using low‐dimensional numerical experiments with PFs.
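As a concrete example of one such variance‐maintenance strategy, the relaxation approach of Zhang *et al*. (2004) blends posterior ensemble perturbations back toward prior perturbations about the analysis mean. The sketch below uses illustrative values and a blending weight of 0.5:

```python
# Relaxation-to-prior-perturbations sketch (after Zhang et al., 2004):
# posterior perturbations are blended with prior perturbations to retain
# ensemble spread, while the analysis mean is preserved.
def relax_to_prior(prior, posterior, alpha):
    n = len(prior)
    pm = sum(prior) / n                  # prior ensemble mean
    am = sum(posterior) / n              # analysis (posterior) ensemble mean
    return [am + alpha * (p - pm) + (1.0 - alpha) * (a - am)
            for p, a in zip(prior, posterior)]

# Illustrative four-member ensemble with spread reduced by the update.
prior = [1.8, 2.6, 3.1, 2.2]
posterior = [2.40, 2.55, 2.70, 2.45]
relaxed = relax_to_prior(prior, posterior, alpha=0.5)
```

Setting `alpha=0` recovers the raw posterior, while `alpha=1` restores the full prior spread about the analysis mean.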

More relevant to the current study, Poterjoy *et al*. (2019) discuss mechanisms for artificially increasing posterior error variance in a PF framework—in a manner that does not introduce large deviations from potentially non‐Gaussian error approximations by the filter. They propose a strategy that uses observation error inflation to broaden the region of high posterior probability, using “effective sample size” as a metric for determining the amount of inflation. Effective sample size is a heuristic measure of the degrees of freedom in a weighted sample and is defined by $$$ {N}_{\mathrm{eff}}={\left[\sum {\left({w}^n\right)}^2\right]}^{-1} $$$ (Liu and Chen, 1998). The observation error inflation acts as a regularization strategy, similar to techniques long used in the machine‐learning community to prevent the overfitting of data (Moody, 1991). The very nature of this approach means that it does not change the shape of posterior distributions in the same manner as the multiplicative and additive inflation methods listed in the previous paragraph. Poterjoy *et al*. (2019) adopted this form of regularization to maintain filter stability in experiments applying the local PF for a real weather event using $$$ {N}_{\mathrm{e}}=36 $$$.

As an alternative to Poterjoy *et al*. (2019), we propose applying regularization directly to marginal weights, which take into account the cumulative effect of measurements on posterior density calculations at a given grid point. This strategy has a number of desirable properties, which will be discussed in the following subsections.

First, recall that for independent Gaussian errors, weight collapse at the $$$ j $$$th grid point occurs as $$$ Var\left[{\sum}_{i=1}^{N_y}{V}_{i,j}^n\right] $$$ becomes large. The most straightforward means of enforcing an upper bound on this term is to multiply $$$ {V}_{i,j}^n $$$ by a coefficient $$$ {\beta}_j $$$, where $$$ 0\le {\beta}_j\le 1 $$$. This strategy is equivalent to raising particle weights to a power of $$$ {\beta}_j $$$ so that $$$ {\omega}_j^n\propto {\prod}_{i=1}^{N_y}{\left({\hat{\omega}}_{i,j}^n\right)}^{\beta_j} $$$. As discussed in Poterjoy *et al*. (2019), a natural choice of $$$ {\beta}_j $$$ is one that changes dynamically as a function of $$$ {N}_{\mathrm{eff}} $$$, which requires precomputing particle weights at each grid point and numerically solving for the largest $$$ {\beta}_j $$$ that gives $$$ {N}_{\mathrm{eff}}\ge {N}_{\mathrm{eff}}^t $$$, where $$$ {N}_{\mathrm{eff}}^t $$$ is a specified threshold value. In the absence of localization, the methods proposed in Poterjoy *et al*. (2019) and the current article are equivalent to inflating the measurement error variance for all observations by a factor of $$$ 1/{\beta}_j $$$.
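A minimal sketch of this search is given below, assuming a simple bisection solver and illustrative log‐weights (the numerical method used by Poterjoy *et al*., 2019, may differ):

```python
import math

def n_eff(log_w, beta):
    """Effective ensemble size for weights proportional to exp(beta * log_w)."""
    scaled = [beta * lw for lw in log_w]
    m = max(scaled)                      # subtract max for numerical stability
    w = [math.exp(s - m) for s in scaled]
    tot = sum(w)
    return 1.0 / sum((v / tot) ** 2 for v in w)

def solve_beta(log_w, n_eff_target, tol=1e-10):
    """Largest beta in [0, 1] with n_eff >= n_eff_target, found by bisection.

    Assumes n_eff decreases monotonically as beta grows, which holds for
    this log-linear family of weights.
    """
    if n_eff(log_w, 1.0) >= n_eff_target:
        return 1.0                       # no regularization needed
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if n_eff(log_w, mid) >= n_eff_target:
            lo = mid                     # threshold satisfied: push beta up
        else:
            hi = mid
    return lo

# Hypothetical per-grid-point log-weights for N_e = 5 particles.
log_w = [0.0, -2.0, -5.0, -9.0, -14.0]
beta_j = solve_beta(log_w, n_eff_target=3.0)
```

Because $$$ {\beta}_j=0 $$$ yields uniform weights with $$$ {N}_{\mathrm{eff}}={N}_{\mathrm{e}} $$$, a solution always exists whenever the threshold does not exceed $$$ {N}_{\mathrm{e}} $$$.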

To understand how the new approach differs from Poterjoy *et al*. (2019), recall that the observation error inflation strategy of Poterjoy *et al*. (2019) first computes a set of $$$ \beta $$$ coefficients $$$ \left\{{\tilde{\beta}}_1,\dots, {\tilde{\beta}}_{N_y}\right\} $$$, where each $$$ {\tilde{\beta}}_i $$$ is calculated for the observation‐space weights $$$ \left\{{w}_i^1,\dots, {w}_i^{N_{\mathrm{e}}}\right\} $$$ that go into the calculation of $$$ {\hat{\omega}}_{i,j}^n $$$ in Equation (8). Following this step, the final inflation coefficient assigned to the $$$ i $$$th observation takes into account $$$ \tilde{\beta} $$$ from nearby observations via

This subsection introduces an adaptive tempering approach that builds off of the regularization discussed in the previous subsection. As mentioned already, tempering (Neal, 2001) is another mechanism for reducing filter degeneracy for PFs. In general, tempering exploits a factorization of the likelihood to break the particle update step into a sequence of smaller update steps, thus allowing for mutations between iterations; see Van Leeuwen *et al*. (2019) for an extended discussion on the use of tempering for PFs. Iterative approaches of this type are already used extensively to form ensemble filters and smoothers that sample from non‐Gaussian posteriors via a set of intermittent linear updates to prior particles (e.g., Zupanski, 2005; Sakov *et al*., 2012; Emerick and Reynolds, 2012; Emerick and Reynolds, 2013; Bocquet and Sakov, 2014; Stordal and Elsheikh, 2015; Evensen, 2018). These strategies sometimes include a periodic relinearization of dynamical models and measurement operators about more accurate reference solutions, similar to incremental formulations of four‐dimensional variational data assimilation methods (Courtier *et al*., 1994), or generating multiple ensemble predictions each iteration to better estimate temporal covariance.

Dubinkina and Ruchi (2019) recently applied tempering for the ensemble transform PF (Reich, 2013), demonstrating advantages over regularization for problems with non‐additive model errors. When applied to the Poterjoy *et al*. (2019) local PF, tempering serves a different purpose, which is to reduce errors introduced by updating unobserved variables based on the first two moments alone. In this section, we discuss two different forms of tempering. The first approach is to apply a factorization of the likelihood, as in past studies. That is, let $$$ p\left(\mathbf{x}|\mathbf{y}\right)\propto p\left(\mathbf{x}\right)p{\left(\mathbf{y}|\mathbf{x}\right)}^{\alpha_1+{\alpha}_2+\cdots +{\alpha}_{N_k}} $$$, where $$$ {\sum}_{k=1}^{N_k}{\alpha}_k=1 $$$. The tempering introduces an extra recursion to the posterior update; that is,
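One simple way of adapting the exponents reuses the effective‐ensemble‐size criterion from the regularization step: each $$$ {\alpha}_k $$$ is chosen as the largest admissible exponent before the stage weights degenerate, continuing until the exponents sum to one. The sketch below makes the simplifying assumption that the particle set is unchanged between stages (no resampling or mutation steps appear), and all numerical values are illustrative:

```python
import math

def stage_neff(log_lik, alpha):
    """Effective ensemble size for stage weights proportional to
    exp(alpha * log_lik), assuming uniform weights entering the stage."""
    scaled = [alpha * lw for lw in log_lik]
    m = max(scaled)
    w = [math.exp(s - m) for s in scaled]
    tot = sum(w)
    return 1.0 / sum((v / tot) ** 2 for v in w)

def temper_schedule(log_lik, n_eff_target):
    """Adaptive schedule {alpha_1, ..., alpha_Nk} with sum(alpha) = 1.

    Each alpha_k is the largest exponent (capped by the remaining budget)
    keeping stage_neff >= n_eff_target; assumes stage_neff decreases
    monotonically with alpha, which holds for this log-linear family.
    """
    assert n_eff_target <= len(log_lik)
    alphas, used = [], 0.0
    while used < 1.0:
        remaining = 1.0 - used
        if stage_neff(log_lik, remaining) >= n_eff_target:
            alphas.append(remaining)     # final stage uses what is left
            break
        lo, hi = 0.0, remaining
        for _ in range(60):              # bisection for the stage exponent
            mid = 0.5 * (lo + hi)
            if stage_neff(log_lik, mid) >= n_eff_target:
                lo = mid
            else:
                hi = mid
        alphas.append(lo)
        used += lo
    return alphas

# Hypothetical log-likelihoods for five particles at one tempering cycle.
alphas = temper_schedule([0.0, -4.0, -8.0, -16.0, -30.0], n_eff_target=3.0)
```

In a full implementation, the particles would be resampled (and, for the local PF, updated via Equation 7) after each stage, so the log‐likelihoods would be recomputed before choosing the next $$$ {\alpha}_k $$$.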

Because tempering has no benefits when $$$ {\rho}_{i,j}=1 $$$ in the local PF, we introduce an additional mixing coefficient for particles, which increases the contribution of prior perturbations in the particle update equation. This is achieved by multiplying $$$ {\mathbf{r}}_1 $$$ by a scalar $$$ \gamma $$$ in Equation (7), where $$$ \gamma $$$ is specified to be a constant between 0 and 1. The prior weighting vector $$$ {\mathbf{r}}_2 $$$ is then recalculated to preserve the marginal posterior error variance for a given $$$ \gamma $$$; see Appendix C for details. The resulting particles are then recentered about the posterior mean. Typical values for $$$ \gamma $$$ range from 0.3 to 1, with $$$ \gamma $$$ tending toward 1 (i.e., no mixing) for larger $$$ {N}_{\mathrm{e}} $$$. It is important to mention that particles removed during resampling are the most affected by mixing, since the order determined for selected particles guarantees that surviving particles maintain the same index. The tempering strategy, combined with mixing, also does not completely replace the need for regularization, since choosing too small an $$$ {N}_k $$$ can easily lead to local weight collapse.

The tempering provides additional benefits for PFs that fit posterior moments, such as the Poterjoy (2016) and Poterjoy *et al*. (2019) filters. Suppose particle weights are regularized, so that $$$ {N}_{\mathrm{eff}} $$$ is close to $$$ {N}_{\mathrm{e}} $$$, and the bootstrap resampling step leads to one particle being removed and replaced by another. The resulting particles are all assigned equal weights of $$$ 1/{N}_{\mathrm{e}} $$$ following the sampling. Equivalently, one can assign equal importance weights to all prior particles except the pair being removed or sampled twice; in this case, the removed particle would have a weight of zero and the duplicated particle would have a weight of $$$ 2/{N}_{\mathrm{e}} $$$. Knowledge of the first moment alone (estimated using importance weights) is then sufficient for identifying how the removed particle should be updated to provide the same solution as the standard bootstrap PF. Likewise, the local PF update can be separated into a sequence of sampling and merging steps, which only fit the first two moments, as in Equation (7). In doing so, the resulting update achieves a set of particles that more closely resembles samples from the localized PF density introduced in Equation (5). This property of tempering is illustrated through numerical simulations in Section 5.

Similar to iterative methods adopted for Gaussian filters, the $$$ {N}_k $$$ needed for a given problem depends on a variety of factors, such as how well prior densities are approximated by a Gaussian. For example, we find little to no benefit of tempering for Lorenz (1996) experiments performed using observation networks that are dense in space and time, as discussed in Section 5.2. These experiments yield small prior uncertainty between successive observation times, which is approximated well by a Gaussian. Likewise, each iteration can introduce additional noise into the solution, which may negate some of the benefits of tempering for large $$$ {N}_k $$$.

A more efficient way of introducing iterations is to replace likelihood tempering with a tempering over marginal weights. This strategy expands the particle weight regularization in Section 4.1 into a posterior tempering strategy. Recall that the regularized local PF uses the marginal weight equation $$$ {\omega}_j^n\propto {\prod}_{i=1}^{N_y}{\left({\hat{\omega}}_{i,j}^n\right)}^{\beta_j} $$$, where $$$ {\beta}_j $$$ enforces a minimum $$$ {N}_{\mathrm{eff}} $$$ for weights. Following the assimilation of $$$ \mathbf{y} $$$ with regularization, we can repeat the particle updates, calculating $$$ {\beta}_{j,k} $$$ for each iteration $$$ k $$$ based on $$$ {N}_{\mathrm{eff}}^t $$$, and stopping the process when $$$ {\sum}_{k=1}^{N_k}{\beta}_{j,k}\ge 1 $$$. For the last iteration, $$$ {\beta}_{j,{N}_k} $$$ must be set to $$$ 1-{\sum}_{k=1}^{N_k-1}{\beta}_{j,k} $$$ to ensure that the exponents sum to 1. The resulting method has properties similar to likelihood tempering, except the number of iterations is determined adaptively while enforcing the requirement that each set of marginal particle weights has $$$ {N}_{\mathrm{eff}}\ge {N}_{\mathrm{eff}}^t $$$. Therefore, no additional regularization methods are needed. As in the likelihood tempering approach, expanding regularization to include an adaptive set of iterations (so that $$$ {\sum}_{k=1}^{N_k}{\beta}_{j,k}=1 $$$) does not change the asymptotic behavior of the filter, as localization is relaxed for large sample sizes. It also has the desirable property of iterating only over subsets of variables in $$$ \mathbf{x} $$$; that is, additional iterations may not be needed for portions of a model domain characterized by low variance in particle weights.
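The adaptive tempering loop can be sketched as follows. This is a simplified illustration with hypothetical weights: in the real filter the marginal weights are recomputed after each transition, whereas here the log‐weights are held fixed, and `find_beta` is our stand‐in for the regularization calculation of Section 4.1.

```python
import numpy as np

def n_eff(w):
    """Effective ensemble size of (possibly unnormalized) weights."""
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

def find_beta(logw, neff_target, iters=60):
    """Largest beta in (0, 1] such that N_eff(w**beta) >= neff_target, by bisection."""
    if n_eff(np.exp(logw)) >= neff_target:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if n_eff(np.exp(mid * logw)) >= neff_target:
            lo = mid      # condition holds: beta can be pushed larger
        else:
            hi = mid
    return lo

rng = np.random.default_rng(1)
w0 = rng.dirichlet(np.full(10, 0.2))              # hypothetical skewed weights
logw = np.log(np.maximum(w0, 1e-300))             # floor avoids log(0)

# Tempered transitions: accumulate beta_k until the exponents sum to 1.
neff_target, betas, total = 6.0, [], 0.0
while total < 1.0 - 1e-12:
    b = min(find_beta(logw, neff_target), 1.0 - total)   # cap the final exponent
    betas.append(b)
    total += b

assert abs(sum(betas) - 1.0) < 1e-9
```

Each exponent satisfies the effective‐ensemble‐size constraint by construction, and the final exponent is capped so the sequence terminates with the exponents summing to exactly 1.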

Regularization and tempering strategies provide a natural framework for combining the local PF with alternative filters when appropriate. For example, a recent study by Grooms and Robinson (2021) used an EnKF to update particles following an initial PF sampling step—exploiting the same likelihood factorization of past studies. This idea takes advantage of the fact that the posterior pdf is better approximated by a Gaussian than the prior when observation likelihoods are Gaussian (Morzfeld and Hodyss, 2019). A partial update by a PF can adjust particles to resemble samples from a Gaussian, thus making the EnKF an appropriate choice for the remaining update.

We apply a near‐identical strategy to that of Grooms and Robinson (2021) for the iterative local PF, which requires only minor changes to the algorithm. First, the stopping criterion for iterations is altered so that $$$ {\sum}_{k=1}^{N_k}{\beta}_{j,k}={\kappa}_{\mathrm{max}} $$$ for $$$ 0\le {\kappa}_{\mathrm{max}}\le 1 $$$. Note that setting $$$ {\kappa}_{\mathrm{max}}<1 $$$ results in a regularization of the local PF, similar to the method discussed in Section 4.1. Following the initial set of local PF iterations, the last adjustment is performed using an EnKF with the measurement error variance inflated by the factor $$$ 1/\left(1-{\kappa}_{\mathrm{max}}\right) $$$. The coefficient $$$ {\kappa}_{\mathrm{max}} $$$ operates as a hybrid parameter, and the scheme approaches the iterative local PF in the limit $$$ {\kappa}_{\mathrm{max}}\to 1 $$$ and the EnKF in the limit $$$ {\kappa}_{\mathrm{max}}\to 0 $$$.
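The inflation factor $$$ 1/\left(1-{\kappa}_{\mathrm{max}}\right) $$$ follows from the fact that a Gaussian likelihood raised to the power $$$ 1-{\kappa}_{\mathrm{max}} $$$ is, up to a normalization constant, a Gaussian likelihood with its error variance inflated by that factor. A quick numerical check (all values hypothetical):

```python
import numpy as np

# For a Gaussian likelihood, raising to the power (1 - kappa) is equivalent
# to inflating the observation error variance by 1/(1 - kappa).
y, r, kappa = 0.7, 0.05, 0.5
x = np.linspace(-2.0, 2.0, 401)

pow_lik = np.exp(-0.5 * (y - x) ** 2 / r) ** (1.0 - kappa)
inf_lik = np.exp(-0.5 * (y - x) ** 2 / (r / (1.0 - kappa)))

assert np.allclose(pow_lik, inf_lik)
```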

The state‐space regularization, tempering, and hybrid local PF–EnKF methods are summarized in Algorithm 1. Note that because the three methods share many of the same calculations, they can be expressed compactly in a single algorithm with switches for determining which operations should be called by each filter. To satisfy an exit criterion, each iteration also requires a running sum of $$$ {\beta}_{j,k} $$$, which is represented by $$$ {\kappa}_j $$$ in the algorithmic description.

This section explores the behavior of regularization, tempering, and hybrid strategies for the Poterjoy *et al*. (2019) local PF through numerical simulations. For the first set of experiments, we use a simple bivariate problem to illustrate how tempering alters the adjustment of particles compared with a single‐step approach. The second set of experiments uses the Lorenz (1996) model to compare the performance of regularized, tempered, and hybrid formulations of the local PF to past local PF configurations and the ensemble square‐root filter of Whitaker and Hamill (2002). These experiments use simulated measurements to target observation collection scenarios with varying degrees of spatial and temporal density.

We first form a bivariate data assimilation problem to examine the impact of different local PF sampling strategies. Though numerous challenges exist for localizing ensemble filters for high‐dimensional problems, the current demonstration focuses more narrowly on a case where marginal dependence exists across a pair of variables. As discussed in Section 2, many geophysical data assimilation problems require a careful decoupling of variables, owing to gradual spatial transitions in error correlations. Therefore, localization must be able to modulate the dependence across variables in a controllable manner. For an application of this size, we can use Equation (5) from Section 2 to calculate and visualize the localized posterior pdf. This sample‐estimated pdf is adopted for illustrative purposes, as it approximates the posterior density the candidate methods are trying to sample from.

Continuing the notation from past sections, let $$$ \mathbf{x}={\left[u\kern0.3em v\right]}^{\mathrm{T}} $$$ and consider the case where $$$ v $$$ is observed directly by measurement $$$ y $$$ with error $$$ \epsilon \sim N\left(0,0.1\right) $$$; that is, the observation error is drawn from a normal distribution with zero mean and variance 0.1. We compare four different algorithms for sampling from $$$ {p}_{\mathrm{l}}\left(u,v|y\right) $$$: (1) the single‐step Poterjoy *et al*. (2019) filter; (2) the single‐step Poterjoy *et al*. (2019) filter with KDDM; (3) direct KDDM resampling; and (4) the iterative Poterjoy *et al*. (2019) filter. We do not show results for the regularized local PF in these comparisons, since this method represents one iteration of the iterative local PF. Likewise, the tempered transitions only use the local PF update, instead of the EnKF step discussed with regard to the hybrid filter (Section 4.3); when localization is used, the bivariate test problem has a multi‐modal posterior density, which violates the Gaussian posterior assumption required for the EnKF step to be beneficial. We also draw special attention to the direct KDDM sampling algorithm (experiment 3). When multiple observations are present, this is the only algorithm of the four that bypasses serial updates in favor of a less costly independent update to $$$ u $$$ and $$$ v $$$. As the kernel density estimates (KDEs) used for KDDM approach a zero bandwidth, this method becomes identical to performing an independent bootstrap resampling on sorted particles for each variable. Increasing the bandwidth leads to a smoothing of posterior samples in a manner that is similar to the non‐serial localized PF sampling strategies discussed in Section 2. For these experiments, we use a bandwidth that is equal to the posterior standard deviation for each variable, which is a robust option for real weather applications (Poterjoy *et al*., 2019).

For the prior, we draw 1,000 particles from a bimodal distribution by sampling half from $$$ N\left({\mathbf{x}}_1,0.1\mathbf{I}\right) $$$ and half from $$$ N\left({\mathbf{x}}_2,0.1\mathbf{I}\right) $$$, with $$$ {\mathbf{x}}_1={\left[-1,-1\right]}^{\mathrm{T}} $$$, $$$ {\mathbf{x}}_2={\left[1,1\right]}^{\mathrm{T}} $$$, and $$$ \mathbf{I} $$$ the identity matrix. The large sample size helps illustrate the asymptotic behavior of the localized PF density approximation in Section 2. We acknowledge that localization is not needed for this problem when $$$ {N}_{\mathrm{e}}=1,000 $$$—and even induces bias in the posterior density—but adopt this approximation primarily to examine how various localized filters attempt to sample from the density assumed when localization is introduced. For visualization purposes, it is more practical to calculate the prior and posterior pdfs using Gaussian KDEs instead of Dirac delta functions, which allows for smooth contours in figures. In doing so, we choose kernels with a standard deviation of 0.2 for all pdfs. For reference, Figure 2a–c shows prior samples and KDEs for $$$ p\left(u,v\right) $$$ and the true (non‐localized) $$$ p\left(u,v|y\right) $$$, as well as marginals for each density. For all subsequent figures in this section, we plot the localized posterior density $$$ {p}_{\mathrm{l}}\left(u,v|y\right) $$$, which is the posterior density resulting from a decoupling of $$$ u $$$ and $$$ v $$$ using $$$ {\rho}_{u,v}=0.75 $$$. This posterior differs from the true posterior in that dependence between $$$ u $$$ and $$$ v $$$ is modified in a manner typically used to treat sampling error during data assimilation—as such, it is always biased whenever true dependence exists, which is inevitable for geophysical models. Without localization, the resulting marginals are close to Gaussian (Figure 3a), since particles would only retain large weights in the positive mode.
The resulting decoupling leads to non‐zero posterior probability in the upper left portion of the domain (Figure 4a) as the directly observed variable retains the same posterior marginal, but the unobserved variable retains both modes—albeit with less density in the negative one. We remind readers that the sole purpose of this exercise is to examine how different filters sample from the localized density instead of the true posterior because the true posterior cannot be obtained in practice.
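The prior sampling and KDE construction described above can be sketched as follows (a minimal version with our own kernel‐density routine; the mode locations, mode covariance, and kernel bandwidth match the values quoted in the text):

```python
import numpy as np

rng = np.random.default_rng(2)
Ne = 1000

# Bimodal prior: half the particles from each Gaussian mode.
m1, m2, var = np.array([-1.0, -1.0]), np.array([1.0, 1.0]), 0.1
X = np.vstack([rng.normal(m1, np.sqrt(var), (Ne // 2, 2)),
               rng.normal(m2, np.sqrt(var), (Ne // 2, 2))])

def kde(points, grid, bw=0.2):
    """2-D Gaussian kernel density estimate evaluated at a set of grid points."""
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / bw ** 2).sum(1) / (points.shape[0] * 2 * np.pi * bw ** 2)

# Evaluate the KDE at the two modes and the saddle point between them.
grid = np.array([[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]])
dens = kde(X, grid)
assert dens[0] > dens[1] and dens[2] > dens[1]   # modes are denser than the saddle
```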

We first examine how the single‐step Poterjoy *et al*. (2019) filter draws samples from $$$ {p}_{\mathrm{l}}\left(u,v|y\right) $$$. By construction, this method properly samples from $$$ p\left(v|y\right) $$$ as indicated by the markers in Figure 3b. For the unobserved variable, the method draws numerous particles between the two modes in $$$ p\left(u|y\right) $$$, thus failing to maintain consistency with the analytical solution. This example shows the limitations of fitting only two moments during the update of unobserved variables, thus motivating the additional probability mapping procedure discussed in Poterjoy (2016). When KDDM is performed following the single‐step update (Figure 4), the suboptimal update is corrected; that is, particles that fall in the low probability region between the two modes are shifted into the modes. The success of this approach, however, requires applying the sampling and merging process (i.e., Figure 3) prior to the additional adjustment for $$$ p\left(u|y\right) $$$. The sampling step ensures particles are properly ordered before their marginals are mapped into posterior KDEs through KDDM.

Figure 5 demonstrates the outcome of applying KDDM directly to adjust prior particles based on marginal posterior pdfs alone. This strategy is similar to the one‐dimensional transport map or anamorphosis technique discussed in Farchi and Bocquet (2018) for updating particles based on marginal quantities alone. They find this approach to give reasonably accurate solutions for the Lorenz (1996) model, using the observation network examined in their study. Particle updates from KDDM provide a consistent sampling from each marginal (Figure 5b,c). The sampling for unobserved variable $$$ u $$$, however, is not explicitly conditioned on the sampling performed for $$$ v $$$. Instead, $$$ p\left(u|v\right) $$$ is determined by how particles are sorted prior to sampling and by parameters that enforce smoothness, such as the kernel bandwidth in KDDM. As a result, KDDM and other transform strategies that operate on posterior marginals independently will lead to a suboptimal sampling from the multivariate posterior. As discussed in Poterjoy (2016), KDDM preserves quantiles when adjusting particles, which allows the method to retain marginal dependence when transforming particles. In this case, dependent structure represented by prior particles persists through the posterior update, which induces artificial bimodal behavior within the first posterior mode (Figure 5a). This behavior is also commonly observed in deterministic ensemble square‐root Kalman filters when the prior pdf is multi‐modal (e.g., Poterjoy *et al*. 2017, figure 4).
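A one‐dimensional version of this quantile‐preserving mapping can be sketched as below. This is not the exact KDDM implementation of Poterjoy (2016); it is a generic illustration that maps particles through a prior kernel CDF and the inverse of a weighted posterior kernel CDF, with hypothetical weights and bandwidth:

```python
import numpy as np
from math import erf, sqrt

def kernel_cdf(z, centers, weights, bw):
    """CDF of a weighted Gaussian-kernel density estimate, evaluated at scalar z."""
    return sum(w * 0.5 * (1.0 + erf((z - c) / (bw * sqrt(2.0))))
               for c, w in zip(centers, weights))

def kddm_1d(x, w_post, bw=0.3):
    """Map particles so prior-KDE quantiles are preserved under the posterior KDE."""
    n = len(x)
    w_prior = np.full(n, 1.0 / n)
    q = np.array([kernel_cdf(xi, x, w_prior, bw) for xi in x])    # prior quantiles
    grid = np.linspace(x.min() - 3 * bw, x.max() + 3 * bw, 2000)
    cdf = np.array([kernel_cdf(g, x, w_post, bw) for g in grid])  # posterior CDF
    return np.interp(q, cdf, grid)                                # invert by interpolation

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 100)                 # hypothetical prior particles
w = np.exp(-0.5 * (0.5 - x) ** 2 / 0.5)       # hypothetical importance weights
w /= w.sum()
xa = kddm_1d(x, w)
# Mapped particles approximately reproduce the weighted (posterior) mean.
assert abs(xa.mean() - np.sum(w * x)) < 0.25
```

Because each variable is mapped independently, any dependence between variables is inherited from the ordering of the prior particles, which is the source of the suboptimal multivariate sampling discussed above.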

Lastly, Figure 6 shows the results of updating particles through a series of tempered transitions. Particle weights are forced to satisfy $$$ {N}_{\mathrm{e}\mathrm{ff}}=0.8{N}_{\mathrm{e}} $$$, which means only a small number of particles are removed when sampling from $$$ p\left(v|y\right) $$$ during each iteration. The update equation in Equation (7) then becomes sufficient for producing particles that satisfy non‐Gaussian properties of the localized posterior, despite being derived using the first two moments. For reasons discussed in Section 4, iterations allow the local PF to provide samples from a non‐Gaussian posterior without a need for an additional KDDM step (Figure 6a).

For the next set of experiments, we adopt the Lorenz (1996) dynamical model to explore the behavior of the new filters in a sequential data assimilation framework. This model consists of $$$ {N}_x $$$ equally spaced variables defined on a periodic domain, which are governed by a set of $$$ {N}_x $$$ differential equations:

As discussed in Section 1, the current study is motivated by geophysical data assimilation problems characterized by biased prior errors but accurate measurements, which are known to yield challenges for PFs, including ones that adopt localization. To reproduce this problem, we simulate numerous observation networks for the Lorenz (1996) model, each differing in spatial and temporal density, to compare the performance of the new filters. We vary the measurement density by simulating sets of observations that are spaced every one, two, and four grid points; that is, $$$ {N}_y=40 $$$, $$$ {N}_y=20 $$$, and $$$ {N}_y=10 $$$, respectively. We also vary the forecast integration time ($$$ T $$$) between measurements, using $$$ T=0.05 $$$, $$$ T=0.10 $$$, $$$ T=0.20 $$$, and $$$ T=0.30 $$$. Measurement errors are sampled from $$$ N\left(0,0.05\right) $$$, thus providing highly accurate observation networks, which can induce particle weight collapse as described in Section 3. For simplicity in notation, we refer to the regularized local PF as “local RPF,” the iterative local PF as “local IPF,” and the hybrid iterative local PF as the “local IPF–EnKF.” We also perform EnKF experiments using the Whitaker and Hamill (2002) square‐root filter with the Anderson (2007) adaptive state‐space prior inflation. This filter and inflation method are commonly used for geoscience applications, thus providing a suitable benchmark for comparing new methodology.
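For reference, the Lorenz (1996) dynamics take the standard form $$$ \mathrm{d}{x}_i/\mathrm{d}t=\left({x}_{i+1}-{x}_{i-2}\right){x}_{i-1}-{x}_i+F $$$ on a periodic domain. A minimal integration sketch follows; the forcing $$$ F=8 $$$ and the fourth‐order Runge–Kutta time step are common choices for this model, not necessarily the exact configuration used in our experiments:

```python
import numpy as np

def l96_tendency(x, F=8.0):
    """Lorenz (1996) tendencies on a periodic domain:
    dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = l96_tendency(x)
    k2 = l96_tendency(x + 0.5 * dt * k1)
    k3 = l96_tendency(x + 0.5 * dt * k2)
    k4 = l96_tendency(x + dt * k3)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Integrate Nx = 40 variables over one observation interval T = 0.05.
x = 8.0 * np.ones(40)
x[0] += 0.01                       # small perturbation off the unstable fixed point
for _ in range(10):                # T = 0.05 with dt = 0.005
    x = rk4_step(x, 0.005)
assert x.shape == (40,) and np.all(np.isfinite(x))
```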

We perform two sets of experiments: one using $$$ {N}_{\mathrm{e}}=40 $$$ particles and another using $$$ {N}_{\mathrm{e}}=120 $$$ particles. Each experiment consists of 11,000 observation times, with the first 1,000 times designated for spin‐up. For localization, we adopt a Gaussian‐shaped localization function with a width controlled by a radius of influence (ROI). We rigorously tune all filter parameters, namely ROI, $$$ {N}_{\mathrm{eff}} $$$, $$$ \gamma $$$, and adaptive inflation parameters needed for the EnKF, to provide the smallest root‐mean‐squared error (RMSE) for a given observation network. To reduce the parameter space, we fix the hybrid parameter $$$ {\kappa}_{\mathrm{max}} $$$ at 0.5 for local IPF–EnKF experiments. This choice is motivated by the comparable performance of the local IPF and EnKF for the given observation networks. Filter sensitivity to this parameter will be explored in subsequent studies for real problems. We also refer to Kurosawa and Poterjoy (2021) for an examination of the local IPF–EnKF for nonlinear measurement operators and its comparison with iterative ensemble and variational smoothers.

We summarize numerical results from all Lorenz (1996) experiments in Figures 7 and 8, which show the RMSE of each method as a function of $$$ {N}_y $$$ and $$$ T $$$. In general, we find the relative performance of each method to vary substantially with observation density. For observation networks that are dense in both space and time, the EnKF and local IPF–EnKF produce the smallest errors, owing to the prior distributions remaining close to Gaussian over time. The advantage of the EnKF declines when $$$ {N}_y $$$ is decreased to 10 but $$$ T $$$ remains small, as the prior spread increases and becomes less Gaussian. More notably, the regularized local PF configurations show a sharper decline in skill as $$$ T $$$ increases, regardless of the spatial density of measurements. The decline in performance is less dramatic for experiments performed with $$$ {N}_{\mathrm{e}}=120 $$$, owing to the reduced need for regularization. This result is expected, as fewer particles obtain high likelihoods for the accurate measurements simulated in these experiments, which is most detrimental when $$$ {N}_{\mathrm{e}} $$$ is small. We also note that Poterjoy *et al*. (2019) show the original non‐regularized Poterjoy (2016) filter diverging for sparse observation networks of this type. Regularization helps stabilize these methods by artificially decreasing the impact of potentially accurate measurements. Despite this deficiency, the newly proposed regularization strategy (local RPF) provides comparable RMSEs to the observation error inflation introduced by Poterjoy *et al*. (2019). The new method, however, has the added benefit of being naturally extended to the adaptive tempering strategy used for the local IPF experiments.

When $$$ {N}_y $$$ is small and $$$ T $$$ is large, the local IPF provides improvements over the non‐iterative methods and yields RMSEs that are more comparable to the EnKF. The number of iterations, though chosen adaptively through $$$ {N}_{\mathrm{eff}}^t $$$, remains low for the experiments performed in this study. The maximum number of iterations for a given marginal does not exceed five for any observation network, with the exception of cycles performed during the “spin‐up” period. This algorithm also outperforms the EnKF when model error is introduced (Figure 7d). We hypothesize that the improvements over EnKF for the model error experiments follow mostly from the mixing strategy (controlled by parameter $$$ \gamma $$$), which increases diversity in particles between successive sampling steps. We find further improvements when the local IPF is extended to include an EnKF step during iterations—thus forming the hybrid local IPF–EnKF. This algorithm outperforms all methods tested here and is especially beneficial in experiments with very sparse observations and model error (i.e., Figure 7c,d). We also note that the performance of the hybrid filter does not decline substantially between the $$$ {N}_{\mathrm{e}}=120 $$$ and $$$ {N}_{\mathrm{e}}=40 $$$ experiments, thus showing greater value for real geophysical data assimilation problems, which use limited numbers of particles.

In addition to comparing posterior RMSEs, we perform a rank histogram verification (Anderson, 1996; Hamill and Colucci, 1996; Talagrand *et al*., 1997) on posterior particles generated over one set of $$$ {N}_{\mathrm{e}}=40 $$$ experiments for a regime where RMSEs are below the measurement uncertainty, but non‐Gaussian priors are expected to have a role in interpreting the results. These experiments use a perfect model and an equally spaced observation network defined using $$$ {N}_y=20 $$$ and $$$ T=0.2 $$$ (Figure 7b). Assimilating measurements from this observation network poses challenges because of nonlinearity in the model dynamics and data sparsity, which can be isolated in perfect‐model experiments. With $$$ {N}_{\mathrm{e}}=40 $$$, either regularization (for PFs) or prior inflation (for EnKF) is needed to retain stable solutions, which are expected to degrade the probabilistic skill of the ensemble filters. Furthermore, we extend these experiments to include $$$ {10}^6 $$$ observation times to increase the reliability of the rank histogram verification.
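The rank‐histogram calculation can be sketched as follows: at each verification time, the rank of the truth among the sorted particles is tallied, and a calibrated ensemble yields a near‐uniform histogram over $$$ {N}_{\mathrm{e}}+1 $$$ bins. The synthetic example below (ensemble and truth drawn from the same distribution) is an idealized stand‐in for the experiment output:

```python
import numpy as np

def rank_of_truth(ensemble, truth, rng):
    """Rank of the truth within the sorted ensemble (0..Ne), ties broken randomly."""
    below = np.sum(ensemble < truth)
    ties = np.sum(ensemble == truth)
    return below + (rng.integers(ties + 1) if ties else 0)

rng = np.random.default_rng(4)
Ne, cycles = 40, 20000
ranks = np.empty(cycles, dtype=int)
for t in range(cycles):
    truth = rng.normal()
    ens = rng.normal(size=Ne)      # perfectly calibrated ensemble: same distribution
    ranks[t] = rank_of_truth(ens, truth, rng)

hist = np.bincount(ranks, minlength=Ne + 1)
# A calibrated ensemble produces a near-uniform histogram over Ne + 1 bins;
# U-shaped histograms instead indicate underdispersion.
assert hist.size == Ne + 1
assert hist.max() / hist.min() < 2.0
```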

Figure 9 shows the rank‐histogram verification for a pair of observed and unobserved variables, $$$ {x}_1 $$$ and $$$ {x}_2 $$$, respectively. In general, the regularized PFs and EnKF produce rank histograms that exhibit larger frequencies near the periphery of the sorted particles, indicating an underdispersion of particles for these methods. This result is expected, as configurations that yield the smallest RMSEs tend to coincide with the smallest regularization needed to prevent filter divergence over the course of the experiments. Though not shown, the EnKF tends to produce flatter histograms for denser observation networks and highly U‐shaped histograms for sparse networks. Therefore, we attribute the underdispersion to Gaussian assumptions made by the EnKF and adaptive inflation. For the regularized PFs, the shape of the histograms tends to remain similar to those shown in Figure 9, which is suboptimal but consistent across experiments.

The local IPF and IPF–EnKF algorithms yield nearly uniform histograms. We also find these results to be less sensitive to choices of $$$ {N}_{\mathrm{eff}} $$$, since the two methods use this parameter to dictate the number of tempered transitions, rather than regulate the update. We also draw attention to the ability of the local IPF–EnKF method to maintain a flat histogram for the current observation network. This result indicates that the local PF is effective at shifting particles into an approximate Gaussian before applying the EnKF step for this particular application.

The theoretical benefits of PFs motivate their application for a large range of numerical weather prediction applications. These applications include high‐impact weather events, such as severe convective storms and tropical cyclones, which are guided by highly nonlinear dynamics and observed by remote‐sensing platforms on radars and satellites. In addition to posing challenges for Gaussian‐based data assimilation techniques, these applications are also characterized by large sampling errors in ensemble estimates for prior probability densities, which limit the direct use of PFs—even with localization.

This study introduces new regularization, tempering, and hybrid approaches for the local PF of Poterjoy (2016) and Poterjoy *et al*. (2019), which aim to improve the effectiveness of this filter for high sampling‐error regimes. Regularization reduces the impact of observations on particle weights, which directly prevents weight collapse when few or no particles contain high likelihoods. Because regularization is achieved by raising particle weights to a power less than one, this strategy permits the assimilation of observations multiple times in the same manner as iterative data assimilation strategies that rely on a factorization of the likelihood function. Likewise, the use of an alternative—possibly parametric—filter during the last iteration of the tempered local PF is trivial. Following past studies (e.g., Frei and Künsch, 2013; Chustagulprom *et al*., 2016; Grooms and Robinson, 2021), we use this factorization to form a local PF–EnKF hybrid.

To examine the behavior of iterations for the local PF, we derive the posterior density that results from localization and form a bivariate data assimilation problem that produces a multivariate bimodal posterior. Through numerical simulation, we examine how different methods draw samples from a multivariate non‐Gaussian density, which is a known challenge for filters currently used for high‐dimensional applications. The serial local PF introduced in Poterjoy (2016) solves multivariate problems of this type by first drawing samples for the marginal posterior density in observation space and then updating unobserved model variables using a mix of sampled and prior particles. Because the update of unobserved variables only considers the first two moments, we find this strategy to be suboptimal for the bimodal application. Likewise, a sampling strategy that draws quasi‐independently from marginals to approximate samples from the full multivariate posterior density is also found to be suboptimal. For the iterative local PF, the single‐step update is replaced with a series of smaller updates, which we find to force appropriate transitions of particles between modes, despite only fitting two moments. This benefit follows from the limited degrees of freedom that exist when only a small number of particles need to be adjusted during each iteration.

We compare the filtering performance of the regularized, iterative, and hybrid local PF with a past implementation of the local PF (Poterjoy *et al*., 2019) and a square‐root EnKF (Whitaker and Hamill, 2002) using the 40‐variable model of Lorenz (1996). From these experiments, the iterative filter yields the largest improvements over regularized local PF strategies when observations are sparse in space and time. The number of prior particles with large likelihoods decreases considerably when prior error variance becomes much larger than measurement error variance—thus producing the low observation error regime that motivates the current study. Furthermore, we find the local IPF–EnKF hybrid to outperform both the iterative local PF and EnKF for nearly all measurement networks examined. The performance gains come in the form of reduced posterior RMSEs, as well as more uniform rank histograms, which demonstrate the benefits of this approach for applications characterized by non‐Gaussian prior densities but approximately Gaussian posterior densities.

Though the iterative strategy introduced in this study brings considerable performance gains to the local PF, it also adds to the computational complexity. The number of iterations is determined adaptively in the algorithm through an effective ensemble size parameter, making it difficult to estimate how many additional calculations are required. This aspect of the algorithm is further complicated by allowing the number of iterations to vary across model variables. Therefore, data assimilation problems characterized by biased priors in significant portions of the state space will require more iterations—and more computational resources—than those with low error. Nevertheless, savings can be achieved by batching observations geographically or ignoring observations that yield very small changes to particle weights. These approaches have already been applied for weather forecasting applications of the local PF and will be discussed in a future study.

Funding for this work was provided by a U.S. National Science Foundation CAREER Award #AGS1848363 and NOAA grant #NA20OAR4600281. The author is thankful for comments and edits provided by two anonymous reviewers.

Calculations of the localized posterior density are not tractable for large $$$ {N}_x $$$, but further insight into the role of localization for higher dimensions can be obtained by considering the case where the variables in $$$ \mathbf{x} $$$ are conditionally independent. This assumption is not as restrictive as those made by filters that sample independently from posterior marginals, but it is likely not appropriate for complicated posterior pdfs, such as those with multiple modes. For this exercise, we will extend Equation (5) for $$$ {N}_y>1 $$$ and $$$ {N}_x>2 $$$. It is sufficient to consider the case of $$$ {N}_y=1 $$$ and $$$ {N}_x=3 $$$, as the posterior pdf calculation for successive observations would inherit the weighted delta‐function approximation of the posterior marginals obtained from the previous observations in the sequence.

First, let $$$ \mathbf{x}={\left[s\kern0.3em u\kern0.3em v\right]}^{\mathrm{T}} $$$, where $$$ v $$$ is directly observed by $$$ y $$$ and $$$ s $$$ and $$$ u $$$ are unobserved state variables. Specify localization coefficients for random variable pairs {$$$ s,v $$$} and {$$$ u,v $$$} to be $$$ {\rho}_{s,v} $$$ and $$$ {\rho}_{u,v} $$$, respectively, and assume conditional independence between $$$ s $$$ and $$$ u $$$ given $$$ v $$$. The posterior pdf for the three‐variable problem can then be expressed as a product of three marginals:

Note that the third line of Equation (A1) holds for the standard PF, because knowledge of how $$$ u $$$ is sampled provides no new information for $$$ s $$$ when provided with knowledge of how $$$ v $$$ is sampled. By adopting a PF approximation for $$$ p\left(v|y\right) $$$ and using the strategy introduced in Section 2 for localizing the remaining conditional distributions in Equation (A1), we can write

Equation (A2) can be integrated with respect to $$$ s $$$ and $$$ v $$$ or $$$ u $$$ and $$$ v $$$ to arrive at the same form of posterior marginal as in Equation (6) for each unobserved variable. Note that this form requires localization coefficients to be specified only between an observation‐space variable $$$ v $$$ and unobserved model variables—as is the case for other ensemble filters that perform a joint observation–model space localization.

We can further show that $$$ p\left(s|u\right) $$$ adopts the same formulation as other localized conditionals, but with $$$ {\rho}_{s,u}={\rho}_{s,v}{\rho}_{u,v} $$$. Therefore, dependence across variables is modulated by products of localization coefficients under assumptions of conditional independence. To begin, consider the localized PF approximation for $$$ p\left(\mathbf{x}\right) $$$:

We can then integrate Equation (A3) with respect to $$$ v $$$ to get the following expression for the localized joint pdf for $$$ s $$$ and $$$ u $$$:

These steps result in the same equation that would be obtained if one were to modulate the conditional density $$$ p\left(s|u\right) $$$ using a localization coefficient of $$$ {\rho}_{s,u}={\rho}_{s,v}{\rho}_{u,v} $$$. This exercise has practical importance, as it can be used to compute estimates of the covariance matrix for the localized posterior, which is needed for some filters (e.g., Morzfeld *et al*., 2018).

For large applications, where calculations of $$$ \left\{{\omega}_j^1,\dots, {\omega}_j^{N_{\mathrm{e}}}\right\} $$$ may require $$$ O\left(1{0}^3\right) $$$–$$$ O\left(1{0}^5\right) $$$ products of small numbers, special care is needed to avoid underflow errors. In practice, the log form described in previous sections for $$$ {V}_{i,j}^n $$$ should be used for weights. We also recommend pre‐multiplying $$$ \ln \left({\hat{\omega}}_j^n\right) $$$ by a first‐guess $$$ {\beta}_j $$$ before transforming back through an exponential function to estimate a final $$$ {\beta}_j $$$. Following these steps, calculations of $$$ {\omega}_j^n $$$ require evaluating $$$ {\sum}_{i=1}^{N_y}\ln \left({\hat{\omega}}_{i,j}^n\right) $$$ at each grid point.

To reduce computation cost, we recommend multiplying $$$ {\hat{\omega}}_{i,j}^n $$$ by $$$ {N}_{\mathrm{e}} $$$ and applying a first‐order Taylor‐series expansion of the logarithm about $$$ {N}_{\mathrm{e}}{\hat{\omega}}_{i,j}^n=1 $$$:

The local PF adopts an additional mixing strategy in Equation (7) for increasing particle diversity between iterations. This approach operates by multiplying $$$ {\mathbf{r}}_1 $$$ by a user‐specified parameter $$$ \gamma $$$, where $$$ 0\le \gamma \le 1 $$$, and recalculating $$$ {\mathbf{r}}_2 $$$ to fit the posterior variance for each variable. Ignoring indices for state variables and considering a single marginal variable $$$ x $$$, we start by setting

where prime terms indicate the posterior mean has been removed and $$$ {\sigma}^2 $$$ is the posterior variance estimated from importance weights. One solution for $$$ {r}_2 $$$ that satisfies this equation is the positive root of

The $$$ {r}_2 $$$ found from this equation is used in the particle update step, and the posterior particles are recentered about the posterior mean estimated from importance weights.
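Under the update form implied above, where posterior perturbations are the combination of $$$ \gamma {r}_1 $$$ times sampled perturbations and $$$ {r}_2 $$$ times prior perturbations, the variance condition reduces to a quadratic in $$$ {r}_2 $$$. The sketch below is our own generic reconstruction (the exact coefficients depend on Equation (7), and all variable names are ours) and takes the positive root:

```python
import numpy as np

def solve_r2(p1, p2, r1, gamma, sigma2):
    """Positive root of the quadratic making var(gamma*r1*p1 + r2*p2) = sigma2.

    p1: sampled-particle perturbations; p2: prior perturbations (means removed).
    """
    a = np.mean(p2 ** 2)
    b = 2.0 * gamma * r1 * np.mean(p1 * p2)
    c = (gamma * r1) ** 2 * np.mean(p1 ** 2) - sigma2
    disc = b ** 2 - 4.0 * a * c
    return (-b + np.sqrt(disc)) / (2.0 * a)      # positive root

rng = np.random.default_rng(6)
p1 = rng.normal(0.0, 1.0, 200); p1 -= p1.mean()  # hypothetical perturbations
p2 = rng.normal(0.0, 1.0, 200); p2 -= p2.mean()
r1, gamma, sigma2 = 0.8, 0.5, 0.6

r2 = solve_r2(p1, p2, r1, gamma, sigma2)
post = gamma * r1 * p1 + r2 * p2
assert np.isclose(np.mean(post ** 2), sigma2)    # posterior variance is preserved
```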