This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

Climate impact models often require unbiased point‐scale observations, but climate models typically provide biased simulations at the grid scale. While standard bias adjustment methods have been shown to generally perform well at adjusting climate model biases, they cannot overcome the gap between grid‐box and point scale. To overcome this limitation, combined bias adjustment and stochastic downscaling methods have been developed. These methods, however, are single‐site methods and cannot represent spatial dependence. Here we propose a multisite stochastic downscaling method that can be applied to bias‐adjusted climate model output for generating spatially coherent time series of daily precipitation at multiple stations, conditional on the driving climate model. The method is based on a transformed truncated multivariate Gaussian model and can also be used to downscale to a full field at finer‐grid resolution. An evaluation for stations across selected catchments in Austria demonstrates the good performance of the stochastic model at representing marginal, temporal and spatial aspects of daily precipitation, including extreme events.

We develop a stochastic multisite method for downscaling bias‐adjusted climate model precipitation to the station scale. The method is based on a multisite transformed truncated Gaussian model. The method performs well for marginal, temporal and spatial aspects including extreme events.

**Funding information** Austrian Climate Research Programme, Grant/Award Number: B464795; European Cooperation in Science and Technology, Grant/Award Number: CA17109; Swiss Environment Agency, Grant/Award Number: Extremhochwasser Schweiz ‐ Grosse Einzugsgebiete

Models used to simulate the impacts of climate change often require unbiased meteorological input data at the station scale (Maraun and Widmann, 2018; Roessler *et al*., 2019). But climate models are typically biased compared to observations and provide grid‐box average values. Even state‐of‐the‐art regional climate models (RCMs) usually have a horizontal resolution not finer than some 10 km (Giorgi *et al*., 2009; Jacob *et al*., 2014). The spatial–temporal variability at station and grid‐box scale might be rather different, in particular in complex terrain and for highly variable elements such as precipitation.

Bias adjustment methods such as quantile mapping are widely applied as a pragmatic tool to adjust climate model biases in a postprocessing step. Furthermore, these methods are often used in an attempt to downscale, that is, to bridge the scale gap between the model output and the station scale (Maraun, 2016; Chen *et al*., 2018; Li and Babovic, 2019). But even though many bias adjustment methods generally perform well with skilful predictors (Teutschbein and Seibert, 2012; Dosio, 2016; Gutiérrez *et al*., 2019; Maraun *et al*., 2019b; Doblas‐Reyes *et al*., 2021), current methods cannot add unresolved small‐scale variability. Instead, they rather inflate grid‐box variability (Maraun, 2013). As a result, spatial correlation lengths are grossly overestimated (Widmann *et al*., 2019), areas of heavy precipitation and dry areas are too large (Maraun, 2013) and long‐term trends can be inflated (Maraun, 2013; Maurer *et al*., 2014; Maraun *et al*., 2019a; Casanueva *et al*., 2020). Moreover, local processes and feedbacks that might modify the local climate change signal are not captured (Maraun *et al*., 2017).

Trend‐preserving adjustment methods have been developed that address the problem of modified trends with varying degrees of success (Li *et al*., 2010; Haerter *et al*., 2011; Hempel *et al*., 2013; Pierce *et al*., 2015; Switanek *et al*., 2017; Lange, 2019). Wong *et al*. (2014) developed a bias adjustment method that addresses the issue of unresolved small‐scale variability: in a combined bias adjustment and stochastic downscaling step, nonresolved small‐scale variability is stochastically generated. This approach, however, relies on climate model output being in synchrony with observations, a requirement that is not fulfilled by standard climate model simulations (Eden *et al*., 2012; 2014). Therefore, approaches have been proposed to separate the bias adjustment from the stochastic downscaling (Volosciuk *et al*., 2017; Lange, 2019): the bias adjustment is carried out between climate model output and gridded observations at the same spatial scale. A stochastic model is calibrated between the gridded and station observations and applied to downscale the adjusted model output to the station scale. A limitation of these models is that they are single‐site models and generate noise independently for different locations at subgrid scales (large‐scale structure is imprinted by the driving model). As a result, spatial dependence between multiple sites is in general misrepresented. But for many applications, a realistic representation of spatial dependence is crucial. For instance, distributed hydrological models require spatially covarying fields of precipitation as input to simulate runoff realistically.

Different ways forward have thus been proposed to generate spatially coherent subgrid fields from bias‐adjusted model output. One research strand is based on resampling observations, such as the localized constructed analogs (LOCA) method (Pierce *et al*., 2014). The method resamples and scales observed daily fields to best match bias‐adjusted model output at large scales. The matching with large‐scale modelled fields imprints a coherent large‐scale spatial structure, but as the resampling is carried out individually for each location, subgrid spatial coherence is degraded. A possible alternative strand employs stochastic models, borrowed from multisite weather generators (Onof *et al*., 2000; Ferraris *et al*., 2003; Wilks, 2012; Chen *et al*., 2018; Maraun and Widmann, 2018; Li and Babovic, 2019). Yuan *et al*. (2019) developed a first implementation of such a model for daily temperature based on the Gaussian distribution and using an ARMA process to model temporal dependence.

Here we present a similar and easy to implement stochastic downscaling model for daily precipitation based on a truncated and transformed multivariate Gaussian distribution (Bárdossy and Plate, 1992; Maraun and Widmann, 2018). The model simulates spatially coherent daily precipitation fields conditional on gridded daily precipitation data. As temporal dependence of precipitation is much weaker than for temperature, it is not explicitly modelled but rather imprinted via the precipitation predictors. We apply the method to rain gauges across eight river catchments in Austria and present a thorough evaluation.

In section 2 we briefly present the data used in this study. Section 3 provides a detailed discussion of the new modelling approach. The application and evaluation of this approach is presented in section 4, followed by the discussion and conclusions in sections 5 and 6.

We apply our downscaling model to gridded precipitation data in order to generate multiple simulations of station data in eight different catchments across Austria. The stations that lie within the bounds of these catchments and their names are shown in Figure 1. The precipitation station data are obtained from the Austrian meteorological service (ZAMG) and from the platform *et al*., 2015) provided by the Austrian meteorological service (ZAMG). The observed gridded data cover the period 1961–2010. To test our methodology for bias‐adjusting and downscaling climate model simulations, we employed the EURO‐CORDEX (Jacob *et al*., 2014; 2020) simulation of DMI‐HIRHAM5 (Christensen *et al*., 1998), driven with the EC‐EARTH simulation r3i1p1 (Hazeleger *et al*., 2010) from the Coupled Model Intercomparison Project Phase 5 (CMIP5; Taylor *et al*., 2009) according to the representative concentration pathway RCP4.5. Also here, we use data over the period 1961–2010. This simulation is transient, that is, whereas the forcings are consistent with observations (apart from the last 5 years which follow the RCP4.5), internal variability on time scales from days to multiple decades is not in synchrony with observed variability. This setting is typical of climate change impact studies.

As we did not have complete temporal coverage for most stations, through a quality control effort, we located the greatest number of stations within each catchment that had the maximum number of days that overlap. This was necessary to effectively calculate our observed covariance matrices which are required for our methodology. The data coverage changes as a function of basin and month, and this ranges from 367 days of coverage for Pitten in July (367/(31 days × 50 years) = 24% of the days have data for all of our stations of interest) to 992 days of coverage for Obere Mur in January (64% coverage). As a result, we ultimately use between 367 and 992 days of data to estimate the respective covariance matrices in contrast to approximately 1,500 days, which is the total number of possible samples in a given month over the 50 year period of our observations.
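The coverage figures quoted above can be checked with a few lines of arithmetic. The sketch below only re-derives the two percentages given in the text; the function name is hypothetical and not part of the study's code.

```python
# Coverage fraction = days with data at all stations of interest, divided by
# the possible days in that calendar month over the 50-year record.
YEARS = 50


def coverage_fraction(days_with_all_stations: int, days_in_month: int) -> float:
    """Share of possible days on which every selected station has data."""
    return days_with_all_stations / (days_in_month * YEARS)


pitten_july = coverage_fraction(367, 31)        # 367 / 1550
obere_mur_january = coverage_fraction(992, 31)  # 992 / 1550
print(f"{pitten_july:.0%}, {obere_mur_january:.0%}")
```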

As sketched in the introductory discussion, we develop a two‐step approach to provide bias‐adjusted climate model simulations, downscaled to a finer resolution. For the actual implementation of our model, several choices had to be made regarding bias adjustment and model structure. We present the method as we implemented it and discuss the main choices in section 5.

Because the stochastic downscaling model laid out below is calibrated on observed predictors and predictands, it is a perfect prognosis (PP) model (Maraun and Widmann, 2018). Typically, PP models take their predictors from large‐scale variables from the free atmosphere, such that the PP assumption, namely that the predictors are simulated realistically and free of bias, is valid. Gridded precipitation, however, is a regional‐scale surface variable which is typically strongly biased in climate models and not realistically simulated by GCMs. Therefore, bias‐adjusting the climate model output prior to the stochastic downscaling is usually required (step 1). This step can in principle be conducted with any sensible bias adjustment approach. Here, we apply scaled distribution mapping proposed by Switanek *et al*. (2017), separately for each calendar month. Refer to section 5 for a discussion of the assumptions underlying bias adjustment. The focus of this study is the development of a stochastic model for the second step, that is, the downscaling of gridded data to spatially coherent fields or multiple sites.

Gridded precipitation—such as precipitation simulated by a climate model—represents an area average of the subgrid precipitation field. One way of simulating realistic subgrid precipitation fields given gridded precipitation (step 2) is to disaggregate the gridded fields, for example, by so‐called random cascade models (Schertzer and Lovejoy, 1987; Thober *et al*., 2014; Maraun and Widmann, 2018).

Here, we pursue a conceptually different way by assuming that the subgrid precipitation field stochastically varies according to a multivariate probability distribution conditional on the gridded precipitation field. We treat precipitation at a station *i* ∈ (1,…,*N*_{Y}) (the predictand) as a random variable *Y*_{i} with realizations (i.e., station observations or stochastic simulations) *y*_{i}. Similarly, precipitation at a grid box *k* ∈ (1,…,*N*_{X}) (the predictor, either gridded observations or climate model output) is a random variable *X*_{k} with realizations *x*_{k}. We join different stations and grid boxes into vectors $\mathbf{Y}={\left({Y}_{1},\dots ,{Y}_{{N}_{Y}}\right)}^{T}$ and $\mathbf{X}=\left({X}_{1},\dots ,{X}_{{N}_{X}}\right),$ respectively. Vectors of station and gridded observations are equivalently given as $\mathbf{y}={\left({y}_{1},\dots ,{y}_{{N}_{Y}}\right)}^{T}$ and $\mathbf{x}=\left({x}_{1},\dots ,{x}_{{N}_{X}}\right)$. Time dependence is indicated, if required, by an additional index written as *t* ∈ (1,…,*T*).

In the following we will develop a stochastic model for **Y** conditional on **X** that can be used to downscale climate model output. The overall model allows for describing the local wet‐day probabilities, intensities and the overall intersite dependence.

We calibrate the model on observations *y*_{i}(*t*) and *x*_{k}(*t*) observed at times *t*, stations *i*, and grid boxes *k*, respectively. This stochastic model can then be used to simulate station precipitation fields *y*^{sim}(*t*) by downscaling given gridded precipitation fields *x*^{given}(*t*), for example, simulated with RCM data. As the parameters of the model may have a seasonal cycle, we calibrate a model for each calendar month separately. For the sake of simplicity, we do not introduce a notation specifying the individual months.

To model the intersite‐dependence, we employ a transformed multivariate Gaussian distribution, conditional on the predictors, similar to the model developed by Bárdossy and Plate (1992). The underlying assumption is that the intersite‐dependence is fully determined by transforming the individual precipitation variables to univariate Gaussian variables, and modelling the dependence between these variables solely by means of their covariance matrix. Thus, the simulation procedure is as follows:

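To simplify the notation, we combine all *N*_{X} + *N*_{Y} precipitation variables into an *N* = *N*_{X} + *N*_{Y}‐dimensional vector **P**,

$$\mathbf{P}={\left({X}_{1},\dots ,{X}_{{N}_{X}},{Y}_{1},\dots ,{Y}_{{N}_{Y}}\right)}^{T}. \qquad (1)$$

Each row of **P** corresponds to a different precipitation variable, the first *N*_{X} to a predictor, the following *N*_{Y} to a predictand.

We transform each variable *P*_{i} of **P** into an approximately Gaussian‐distributed variable ${\tilde{P}}_{i}$. Assuming that local and area average precipitation *Y*_{i} and *X*_{k} approximately follow a gamma distribution, the required transformation is very well approximated by the third root (Yang *et al*., 2005). Thus, we obtain the elements of the transformed predictor vector $\tilde{\mathbf{X}}$ simply as

$${\tilde{X}}_{k}={X}_{k}^{1/3},\quad k\in \left(1,\dots ,{N}_{X}\right). \qquad (2)$$

Equivalently, the elements of the transformed predictand vector $\tilde{\mathbf{Y}}$ read

$${\tilde{Y}}_{i}={Y}_{i}^{1/3},\quad i\in \left(1,\dots ,{N}_{Y}\right). \qquad (3)$$

The overall transformed precipitation vector $\tilde{\mathbf{P}}$ then reads

$$\tilde{\mathbf{P}}={\left({\tilde{X}}_{1},\dots ,{\tilde{X}}_{{N}_{X}},{\tilde{Y}}_{1},\dots ,{\tilde{Y}}_{{N}_{Y}}\right)}^{T}. \qquad (4)$$

We assume that the vector $\tilde{\mathbf{P}}$ of these individual variables follows a multivariate Gaussian distribution, such that the dependence is fully specified by the covariance matrix

$${\Sigma }_{ij}=\mathrm{cov}\left({\tilde{P}}_{i},{\tilde{P}}_{j}\right),\quad i,j\in \left(1,\dots ,N\right). \qquad (5)$$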

This covariance is estimated for each calendar month separately to account for the seasonal cycle. The covariance matrix describes the inter‐dependence between the transformed predictors at different grid boxes, the inter‐dependence between the transformed predictands at different stations, and the inter‐dependence between transformed predictors and predictands. We estimate the entries of the covariance matrix from observations of the transformed predictor and predictand values ${\tilde{x}}_{k}\left(t\right)$ and ${\tilde{y}}_{i}\left(t\right)$.
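As a minimal sketch of the calibration step described above, the following Python snippet applies the third‐root transform and estimates the joint covariance matrix with numpy. The data are synthetic stand‐ins (the array sizes, the gamma parameters and all variable names are assumptions for illustration only), and the sketch does not address how dry days or missing values are handled in the actual study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_x, n_y = 500, 2, 3   # hypothetical: 2 grid boxes, 3 stations

# Synthetic stand-in for paired observations of one calendar month:
# rows are days, columns are variables (predictors first, then predictands).
common = rng.gamma(shape=0.6, scale=5.0, size=(n_days, 1))
p_obs = common * rng.gamma(shape=2.0, scale=0.5, size=(n_days, n_x + n_y))

p_tilde = np.cbrt(p_obs)                # third-root transform of every variable
sigma = np.cov(p_tilde, rowvar=False)   # (N, N) covariance, N = n_x + n_y

# Blocks of the covariance matrix described in the text.
sigma_xx = sigma[:n_x, :n_x]   # predictor-predictor dependence
sigma_yx = sigma[n_x:, :n_x]   # predictand-predictor dependence
sigma_yy = sigma[n_x:, n_x:]   # predictand-predictand dependence
```

In practice one such matrix would be estimated per calendar month and catchment, using only days on which all selected stations report data.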

In our model we make two key assumptions: first, we assume that precipitation including the extreme tail is well approximated by a conditional gamma distribution; and second, that the intersite dependence is fully determined by the covariance matrix of the transformed precipitation field. We will discuss these assumptions in section 5.

The purpose of the stochastic model is to simulate spatially coherent station precipitation fields ${y}_{i}^{\mathrm{sim}}\left(t\right)$ conditional on given grid box precipitation values ${x}_{k}^{\text{given}}\left(t\right)$. To explain the simulation procedure, we first explain how one can use the stochastic model to simulate random predictor and predictand fields with inter‐dependencies according to the covariance matrix Σ_{ij}.

This is performed by simulating a correlated Gaussian random field ${\tilde{p}}_{i}\left(t\right)$ at times *t*, and transforming it back using the inverse of Equation (3). To simulate the random field, we employ the Cholesky decomposition of the covariance matrix, separately for each calendar month. The Cholesky matrix is a lower triangular matrix **L**, defined such that **LL**^{T} = **Σ** (where **L**^{T} is the transposed matrix of **L**) and can be calculated using standard mathematical software. In our case we have used the Python package **numpy**.

First, we simulate a random field of independent standard Gaussian distributed values **r**(*t*) = (*r*_{1}(*t*),…,*r*_{N}(*t*))^{T}. By applying the matrix **L** at each day *t* to the vector **r**(*t*) it is possible to simulate a field of multivariate Gaussian samples ${\tilde{p}}_{i}\left(t\right)$ with prescribed covariance **Σ** (and *μ*_{i} = 0) as

$$\tilde{\mathbf{p}}\left(t\right)=\mathbf{L}\,\mathbf{r}\left(t\right). \qquad (6)$$

This approach corresponds to randomly sampling each variable ${\tilde{p}}_{i}\left(t\right)$ as a linear combination of the previous variables ${\tilde{p}}_{1}\left(t\right),\dots ,{\tilde{p}}_{i-1}\left(t\right)$. The parameters of the linear combination are given by the lower triangular Cholesky matrix **L**. In particular, these parameters are the same that one would obtain by fitting a multivariate linear regression model between the dependent variable ${\tilde{P}}_{i}$ and the independent variables ${\tilde{P}}_{1},\dots ,{\tilde{P}}_{i-1}$, using a least‐squares estimator.
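The unconditional sampling step can be illustrated with a short numpy sketch. The 3 × 3 covariance matrix below is a hypothetical example, not a matrix from the study; the snippet only checks that multiplying independent standard normal draws by the Cholesky factor reproduces the prescribed covariance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical covariance matrix of three transformed precipitation variables.
sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])

L = np.linalg.cholesky(sigma)   # lower triangular factor, L @ L.T equals sigma

n_days = 200_000
r = rng.standard_normal((sigma.shape[0], n_days))  # independent N(0, 1) values
p_tilde = L @ r                 # one correlated Gaussian column per day

sample_cov = np.cov(p_tilde)    # variables are in rows here
print(np.round(sample_cov, 2))  # close to sigma for a long enough sample
```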

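To simulate precipitation fields conditional on predictors, we modify the unconditional sampling methodology. The conditioning allows for obtaining the downscaled predictands ${\tilde{\mathbf{y}}}^{\mathrm{sim}}\left(t\right)$ at time *t* from the given predictors ${\tilde{\mathbf{x}}}^{\text{given}}\left(t\right)$. From Equations (4) and (6) it follows that

$$\left(\begin{array}{c}\tilde{\mathbf{x}}\left(t\right)\\ \tilde{\mathbf{y}}\left(t\right)\end{array}\right)=\left(\begin{array}{cc}{\mathbf{L}}_{XX}& \mathbf{0}\\ {\mathbf{L}}_{YX}& {\mathbf{L}}_{YY}\end{array}\right)\left(\begin{array}{c}{\mathbf{r}}_{X}\left(t\right)\\ {\mathbf{r}}_{Y}\left(t\right)\end{array}\right), \qquad (7)$$

where **L**_{XX} (*N*_{X} × *N*_{X}) and **L**_{YY} (*N*_{Y} × *N*_{Y}) denote the lower triangular diagonal blocks of **L**, **L**_{YX} (*N*_{Y} × *N*_{X}) denotes the lower left block, and **r**_{X}(*t*) and **r**_{Y}(*t*) are the first *N*_{X} and last *N*_{Y} elements of **r**(*t*).

In our case, ${\tilde{\mathbf{x}}}^{\text{given}}\left(t\right)=\left({\tilde{x}}_{1}^{\text{given}}\left(t\right),\dots ,{\tilde{x}}_{{N}_{X}}^{\text{given}}\left(t\right)\right)$ is given as (transformed) predictor vector. Our aim is to constrain the simulation on this vector. That is, instead of randomly sampling a vector **r**_{X}(*t*), we need to calculate ${\mathbf{r}}_{X}^{\text{given}}\left(t\right)$ from the given ${\tilde{\mathbf{x}}}^{\text{given}}\left(t\right)$. In fact, this vector can easily be calculated by solving the first *N*_{X} equations of the linear system in Equation (6) (note that **L** is a lower triangular matrix). Solving Equation (6) by forward substitution yields

$${r}_{i}^{\text{given}}\left(t\right)=\frac{1}{{L}_{ii}}\left({\tilde{x}}_{i}^{\text{given}}\left(t\right)-\sum_{j=1}^{i-1}{L}_{ij}\,{r}_{j}^{\text{given}}\left(t\right)\right),\quad i\in \left(1,\dots ,{N}_{X}\right). \qquad (8)$$

The simulation of the predictands ${\tilde{\mathbf{y}}}^{\mathrm{sim}}\left(t\right)$ at time *t* is then obtained by simulating an *N*_{Y}‐dimensional sample of uncorrelated standard normal values ${\mathbf{r}}_{Y}^{\mathrm{sim}}\left(t\right)$, taking the given ${\mathbf{r}}_{X}^{\text{given}}\left(t\right)$ obtained via Equation (8), and evaluating the last *N*_{Y} rows in Equation (6) (i.e., those related to the predictands) as

$${\tilde{\mathbf{y}}}^{\mathrm{sim}}\left(t\right)={\mathbf{L}}_{YX}\,{\mathbf{r}}_{X}^{\text{given}}\left(t\right)+{\mathbf{L}}_{YY}\,{\mathbf{r}}_{Y}^{\mathrm{sim}}\left(t\right). \qquad (9)$$

We get the complete time series of the simulated (transformed) predictands ${\tilde{\mathbf{y}}}^{\mathrm{sim}}\left(t\right)$ by repeating the simulation in Equation (9) for all days *t*.

All simulated predictand values ${\tilde{y}}_{i}^{\mathrm{sim}}\left(t\right)$ are transformed back to ${y}_{i}^{\mathrm{sim}}\left(t\right)$ by inverting Equation (3) as

$${y}_{i}^{\mathrm{sim}}\left(t\right)={\left({\tilde{y}}_{i}^{\mathrm{sim}}\left(t\right)\right)}^{3}. \qquad (10)$$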

Finally, the resulting field is truncated, that is, we set ${y}_{i}^{\mathrm{sim}}\left(t\right)=0$ if the simulated value is negative.
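The conditional simulation described above (solve for the predictor part of the random vector, draw fresh noise for the predictands, transform back, truncate) can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name, the block names `L_xx`, `L_yx`, `L_yy` and the example covariance matrix are assumptions, and `x_tilde_given` is assumed to be expressed on the same (transformed) scale as the data from which the covariance matrix was estimated.

```python
import numpy as np


def simulate_conditional(sigma, x_tilde_given, n_x, rng):
    """Sample station precipitation given transformed predictor values.

    sigma          (N, N) covariance, predictors in the first n_x rows/columns
    x_tilde_given  (n_x, T) transformed predictor values, one column per day
    """
    L = np.linalg.cholesky(sigma)
    L_xx, L_yx, L_yy = L[:n_x, :n_x], L[n_x:, :n_x], L[n_x:, n_x:]
    # Predictor part of the random vector, fixed by the given predictors.
    r_x = np.linalg.solve(L_xx, x_tilde_given)
    # Fresh independent noise for the predictand part.
    r_y = rng.standard_normal((L_yy.shape[0], x_tilde_given.shape[1]))
    y_tilde = L_yx @ r_x + L_yy @ r_y   # correlated transformed predictands
    y = y_tilde ** 3                    # invert the third-root transform
    return np.where(y < 0.0, 0.0, y)    # truncate negative values to zero


# Hypothetical example: one grid box, two stations, three days.
sigma = np.array([[1.0, 0.6, 0.3],
                  [0.6, 1.0, 0.5],
                  [0.3, 0.5, 1.0]])
x_tilde_given = np.array([[1.0, -0.5, 2.0]])
y_sim = simulate_conditional(sigma, x_tilde_given, n_x=1,
                             rng=np.random.default_rng(7))
```

Calling the function repeatedly with different random generators yields an ensemble of station fields that share the same predictors but differ in their subgrid noise.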

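The statistical properties of the simulated multidimensional time series will typically deviate from those of the observed time series. These differences may partly be a random effect and thus represent local internal climate variability, and partly be biases resulting from the model structure. Depending on the purpose of the application, one may therefore wish to adjust the remaining discrepancies with three additional steps, separately for each calendar month.

First, a quantile‐based multiplicative bias adjustment function is applied to the simulated values at each station *i* ∈ (1,…,*N*_{Y}). In the event that the validation period of record is of a different length than the calibration period, the quantile‐based multiplicative bias adjustment function is first interpolated to the desired length of the validation record.

Second, the correlation structure is adjusted. Calculate the observed cross correlation matrix **C** of **P**^{obs} and the simulated cross correlation matrix **R** of **P**^{sim}, where the **X**^{sim} component of **P**^{sim} (i.e., the gridded predictor field) is identical to **X**^{obs}. Then, decompose matrix **C** via singular value decomposition (SVD) such that

$$\mathbf{C}={\mathbf{U}}_{1}{\mathbf{S}}_{1}{\mathbf{V}}_{1}^{T}, \qquad (11)$$

where **S**_{1} is the diagonal matrix of singular values and **U**_{1} = **V**_{1} if **C** has non‐negative eigenvalues. Similarly, decompose matrix **R** such that

$$\mathbf{R}={\mathbf{U}}_{2}{\mathbf{S}}_{2}{\mathbf{V}}_{2}^{T}, \qquad (12)$$

where **S**_{2} is the diagonal matrix of singular values and **U**_{2} = **V**_{2} if **R** has non‐negative eigenvalues. Calculate the square root matrix of the decomposition of **C** as

$${\mathbf{C}}^{1/2}={\mathbf{U}}_{1}{\mathbf{S}}_{1}^{1/2}{\mathbf{V}}_{1}^{T}, \qquad (13)$$

and of **R** as

$${\mathbf{R}}^{1/2}={\mathbf{U}}_{2}{\mathbf{S}}_{2}^{1/2}{\mathbf{V}}_{2}^{T}. \qquad (14)$$

Then, **F** is defined as

$$\mathbf{F}={\mathbf{C}}^{1/2}{\left({\mathbf{R}}^{1/2}\right)}^{-1}. \qquad (15)$$

And finally, we use

$${\mathbf{P}}^{\text{rcsim}}=\mathbf{F}\,{\mathbf{P}}^{\mathrm{sim}} \qquad (16)$$

to obtain the **Y**^{rcsim} component of **P**^{rcsim}, which is our simulated predictand values where the bias in the correlation matrix has been effectively removed.

Third, the mean of the recorrelated simulation at each station is adjusted to the observed mean, using the mean ${\overline{y}}_{i}$ of the observed *y*_{i}(*t*) and the mean ${\overline{y}}_{i}^{\text{rcsim}}$ of the simulated ${y}_{i}^{\text{rcsim}}\left(t\right)$.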
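The correlation adjustment via SVD‐based matrix square roots can be sketched in numpy as below. The sketch rests on two assumptions that are not spelled out in the text as it stands: that the adjustment matrix takes the form **F** = **C**^{1/2}(**R**^{1/2})^{−1}, and that it is applied to standardised series (zero mean, unit variance per variable). The function names and the example matrices are hypothetical.

```python
import numpy as np


def sqrt_via_svd(m):
    """Square root of a symmetric positive definite matrix via its SVD."""
    u, s, vt = np.linalg.svd(m)
    return u @ np.diag(np.sqrt(s)) @ vt


def recorrelate(p_sim, c_target):
    """Map standardised series (variables in rows) onto correlation c_target."""
    r_sim = np.corrcoef(p_sim)
    # Assumed form of the adjustment matrix: F = C^(1/2) (R^(1/2))^(-1).
    f = sqrt_via_svd(c_target) @ np.linalg.inv(sqrt_via_svd(r_sim))
    return f @ p_sim


rng = np.random.default_rng(3)
raw = rng.standard_normal((3, 5000))
raw[1] += 0.8 * raw[0]   # give the synthetic "simulation" some correlation
p_sim = (raw - raw.mean(axis=1, keepdims=True)) / raw.std(axis=1, ddof=1, keepdims=True)

c_target = np.array([[1.0, 0.5, 0.2],   # hypothetical observed correlations
                     [0.5, 1.0, 0.4],
                     [0.2, 0.4, 1.0]])
p_rc = recorrelate(p_sim, c_target)
print(np.round(np.corrcoef(p_rc), 3))
```

Because the transformation is linear, it changes every series it is applied to; how the predictor rows are kept fixed in the study is not covered by this sketch.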

In the prior section, we outlined a methodology to produce stochastic simulations of precipitation at the station scale conditioned by observed precipitation at the gridded scale (a so‐called perfect predictor setting). Our aim is to apply this methodology to climate model simulations, both in present and future climate.

In principle, one could directly transfer the stochastic model calibrated to observed predictor and predictand data to climate model output. In general, however, climate model output is biased compared to observational data. A key element of our two‐step approach is therefore an adjustment of biases in the RCM‐simulated gridded‐precipitation predictors. The underlying assumption of this adjustment is further discussed in section 5. Marginal distributions are adjusted using scaled distribution mapping (Switanek *et al*., 2017). However, the spatial dependence of RCM‐simulated precipitation is typically also biased compared to real‐world precipitation fields. Simply replacing the predictor component of the observed covariance matrix with the RCM predictor covariance matrix would in general produce local precipitation fields with a biased spatial structure. When the RCM overestimates the observed covariance, the spatial extent of events is systematically too large, and vice versa. This bias in the size of events has adverse implications for phenomena that are aggregated over space and time, such as streamflow discharge.

Prior to using Equation (8) to calculate ${\mathbf{r}}_{X}^{\text{given}}\left(t\right)$, we therefore adjust biases in the RCM predictor correlation matrix. This adjustment is performed using Equations (11)–(16) where, in this case, **C** is our observed cross correlation matrix for the predictors only, and **R** is our RCM cross correlation matrix. When the RCM underestimates/overestimates the observed covariance, the spatial extent of events is systematically too small/large. The closer the original RCM correlation matrix is to the observed predictor correlation matrix, the smaller the adjustment to the original RCM data will be. In case of very large discrepancies, one should reassess whether the chosen climate model is at all fit for representing grid‐scale precipitation variability. Once the correlation bias of the RCM has been adjusted, we can continue with Equation (8) to find ${\mathbf{r}}_{X}^{\text{given}}\left(t\right)$ from our recorrelated RCM precipitation predictor data ${\tilde{\mathbf{x}}}^{\text{given}}\left(t\right)$.

In a climate change context, one needs to assume time invariance of the chosen statistical model (Wilby *et al*., 2004; Maraun and Widmann, 2018). This assumption is further discussed in section 5. Note that in this study, we only apply and evaluate the model under present climate conditions.

A key aspect of this paper is to assess the performance of the downscaling methodology. We conduct the evaluation in two different settings: First, we apply a perfect predictor setting, where we use observed gridded precipitation as predictors. To assess out‐of‐sample performance, we use a split‐sample cross‐validation. Half of the days are randomly chosen to calibrate the model and used to simulate the other half of the days in an evaluation period. In this setting, we fully isolate the performance of the stochastic downscaling model. Second, we use RCM‐simulated precipitation to represent a typical application setting. Here, the performance at simulating local precipitation fields does not only depend on the stochastic downscaling model, but also on the performance of the driving RCM. In both settings, a total of 50 simulations of downscaled precipitation station data are generated independently for each month and each catchment.
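The split‐sample design can be sketched as a random partition of day indices. The helper below is hypothetical; the number of days (992, the Obere Mur January sample size quoted in section 2) is used only as an example.

```python
import numpy as np


def split_sample(n_days, rng):
    """Randomly assign half of the day indices to calibration, the rest to evaluation."""
    perm = rng.permutation(n_days)
    half = n_days // 2
    return np.sort(perm[:half]), np.sort(perm[half:])


rng = np.random.default_rng(42)
calib_idx, eval_idx = split_sample(992, rng)
# The model would be calibrated on the days in calib_idx and then used to
# simulate the days in eval_idx, which are never seen during calibration.
```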

A number of diagnostics are used to assess the performance of the proposed downscaling methodology, specifically for temporal and spatial dependence as well as extreme events. These aspects are measured using the following indicators: (a) mean, (b) consecutive dry days, which is the average number of days between precipitation events above a threshold (in our case, 1 mm), (c) Pearson correlation coefficients between all pairs of predictors and predictands (Spearman correlation is covered in Appendix), (d) tail dependence, which is the probability that when one predictor/predictand is extreme another predictor/predictand will also be extreme (we calculate tail dependence using the 98% nonexceedance level), and (e) joint wet day probability, the probability that when one predictor/predictand exceeds a threshold another predictor/predictand also exceeds this threshold (we use 1 mm).
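Three of these indicators can be sketched as follows. The implementations are one plausible reading of the verbal definitions above, not the study's code: "consecutive dry days" is interpreted as the mean length of runs of days at or below the wet‐day threshold, and both pairwise indicators are computed as conditional exceedance probabilities. All function names are hypothetical.

```python
import numpy as np


def mean_dry_spell_length(series, wet_threshold=1.0):
    """Mean length (days) of runs of consecutive days at or below the threshold."""
    dry = np.asarray(series) <= wet_threshold
    padded = np.concatenate(([False], dry, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)    # first day of each dry run
    ends = np.flatnonzero(edges == -1)     # first day after each dry run
    lengths = ends - starts
    return lengths.mean() if lengths.size else 0.0


def conditional_exceedance(a, b, thr_a, thr_b):
    """Estimate P(b > thr_b | a > thr_a) from paired daily values."""
    a, b = np.asarray(a), np.asarray(b)
    cond = a > thr_a
    return np.mean(b[cond] > thr_b) if cond.any() else np.nan


def tail_dependence(a, b, q=0.98):
    """Probability that b exceeds its q-quantile given that a exceeds its own."""
    return conditional_exceedance(a, b, np.quantile(a, q), np.quantile(b, q))


def joint_wet_probability(a, b, wet_threshold=1.0):
    """Probability that b is wet given that a is wet."""
    return conditional_exceedance(a, b, wet_threshold, wet_threshold)


example = [0, 0, 5, 0, 3, 0, 0, 0]        # dry runs of length 2, 1 and 3
print(mean_dry_spell_length(example))     # mean of (2, 1, 3)
```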

Here, we present evaluation results for the perfect‐predictor setting, where we simulate observed station precipitation conditioned on observed gridded precipitation. We begin with an illustration of model performance for selected stations, followed by a more comprehensive assessment for all considered catchments.

Figure 2 illustrates the random sampling of station precipitation (predictands), conditional on a set of gridded precipitation values (predictors), for the Obere Mur catchment in July. While the predictors are identical for both days (top/bottom rows), the station values are randomly sampled. Yet the values are not sampled individually for each station, but correlated in space.

Figure 3 further illustrates the observed (shading) and simulated (dots) pairwise dependence for the Ill Sugadin catchment in the month of July. The top row shows the predictor/predictand dependence for selected pairs, the bottom row the dependence between selected station pairs. By visual inspection, the distribution of simulated values matches well the bivariate histogram of observed values, indicating a good performance at representing both predictor–predictand and spatial dependence.

In Figure 4, we illustrate the univariate performance for each station in the Pitztal catchment in July. The quantile–quantile (QQ) plots indicate whether different quantiles of the simulated time series deviate systematically from the corresponding observed quantiles. While the randomness of station precipitation creates random deviations from the main diagonal, the QQ‐plots of all 50 simulations scatter randomly around the main diagonal, indicating that the simulated univariate distributions are essentially unbiased.

In the following we present an overall evaluation across all catchments and calendar months, using the diagnostics introduced in section 3.5. To begin with, we illustrate these diagnostics for the Obere Mur catchment in January in Figure 5. The corresponding indicators are derived for a randomly chosen validation period for one simulation. Visually, we find good agreement between the simulated and observed indicators for this example.

Figures 6–11 present summary plots for these diagnostics across all catchments and calendar months and all 50 simulations. Figure 6 shows the mean bias of the average simulated station precipitation amounts with respect to observed station precipitation. Given the random components of the model, the indicators for any one given simulation cannot be seen as statistically robust. Therefore, we average the indicators (mean in the case of Figure 6) across the 50 simulations at each station. We see in Figure 6 the mean bias of our model for our eight catchments and for the 12 months of the calendar year. Averaged across the catchments and the months, there is approximately a 1% overestimation of our mean. Intuitively, a modest overestimation makes sense, given the positive skewness of daily precipitation data. We can simulate precipitation extremes that reside further out into the tails of our distributions, and as a result, that can lead to a modest overestimation of our mean with respect to observations.

In the following we evaluate temporal dependence. We do not consider dependence in precipitation amounts, but focus on precipitation occurrence. Day‐to‐day dependence in the occurrence process can be measured by transition probabilities (see Appendix A). For climate impact studies, long dry spells are particularly relevant. Figure 7, which is conceptually identical to Figure 6, therefore shows the performance at representing consecutive dry days. Even though our model does not explicitly simulate temporal dependence at the station scale, the simulated indicators closely match the observed indicators. This result demonstrates that our assumption, that the dependence in the predictor time series imprints a reasonable dependence at the station scale, is justified.

Figures 8–10 are similar to Figure 6, but summarize bivariate diagnostics. All simulated indicators are found to closely reproduce the observed indicators for Pearson correlation (Figure 8), tail dependence (Figure 9) and joint wet day probability (Figure 10). As the Pearson correlation is explicitly modelled in our stochastic downscaling model by the covariance matrix, the result for this indicator is expected. Notably, even though neither tail dependence nor joint wet day probability are explicitly modelled, our model performs well also for these indicators.

In Figure 11, we use QQ plots to show how well the distribution of catchment averaged precipitation is simulated. We compare the simulated versus observed precipitation, where the precipitation is averaged across all simulations and across all stations for each catchment in each month. We do not see any systematic over‐ or underestimation of any parts of the simulated distribution with respect to observations, indicating that our stochastic model well represents the overall multisite dependence across a full catchment, also for high quantiles.

Here, we present evaluation results for simulations conditioned on RCM predictors. The stochastic model is calibrated to observed predictors as discussed in section 3.4, but applied to precipitation simulated by the DMI‐HIRHAM5 RCM, driven with the EC‐EARTH simulation r3i1p1. This setting represents the typical application in a climate change context. In this setting, the performance at representing observed station characteristics not only depends on the stochastic downscaling model, but also on the performance of the driving RCM. Thus the results are specific to the chosen RCM and should be interpreted as an illustration only. Here, we do not show mean biases as these are calibrated and thus trivially match observations.

Examples of different observed covarying time series are shown alongside two simulated covarying time series in Figure 12. One can see good agreement in how the observed and simulated time series covary. We should point out that there are approximately twice as many simulated values (for each simulation) as there are observations. This is due to the quality control discussed in section 2; we do not have 100% data coverage in our observed station data in the 1961–2010 period, while we do have 100% data coverage in the RCM.

For consecutive dry days (Figure 13), we do see various systematic over‐ and underestimations. These are visible in the plots and further quantified by the black numbers in the bottom right of the subplots. These discrepancies are substantially larger than those for the perfect predictor setting (Figure 7), indicating that RCM biases deteriorate the representation of this indicator. This hypothesis is confirmed by an analysis of the corresponding discrepancies in the predictor data, that is, the ratio between consecutive dry days in the RCM‐simulated and observed gridded precipitation. For example, one can see the black and red numbers, respectively, are 1.20 and 1.21 for Ill Sugadin in June. This means that our bias‐corrected RCM overestimated the number of consecutive dry days by 21% (as a spatial average at the grid level), and as a result we observe a 20% overestimation in the consecutive dry days in the simulations (as a spatial average at the station level). The correlation between the black and red numbers across the eight catchments and 12 months amounts to *r* = 0.91 (*p* < .001), demonstrating that much of the discrepancy at the station scale can be explained by corresponding discrepancies of the driving RCM at the grid cell scale.
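The comparison of station‐level and grid‐level discrepancies can be illustrated with a short sketch in plain Python. It assumes that the indicator is the longest dry spell and that a wet day has at least 0.1 mm; both are assumptions for illustration, since the exact indicator definition is not restated here. The function names and the ratio values in the usage lines are hypothetical.

```python
import math


def longest_dry_spell(precip, wet_threshold=0.1):
    """Length of the longest run of days below the (assumed) wet-day threshold."""
    longest = current = 0
    for value in precip:
        current = current + 1 if value < wet_threshold else 0
        longest = max(longest, current)
    return longest


def discrepancy_ratio(simulated, reference, wet_threshold=0.1):
    """Simulated divided by reference dry-spell length; 1.0 means no discrepancy.

    Raises ZeroDivisionError if the reference series contains no dry day.
    """
    return (longest_dry_spell(simulated, wet_threshold)
            / longest_dry_spell(reference, wet_threshold))


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical ratios for four catchment-month combinations (illustrative only).
station_ratios = [1.20, 0.95, 1.10, 1.02]  # simulated stations vs observed stations
grid_ratios = [1.21, 0.97, 1.08, 1.00]     # RCM grid cells vs observed grid cells
r = pearson(station_ratios, grid_ratios)
```

A value of `r` close to one indicates that station‐scale discrepancies follow those of the predictor data, which is the argument made above.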

Similarly to Figure 13, Figures 14–16 show the simulated pairwise indicators with respect to observations: Pearson correlation, tail dependence and joint wet day probability, respectively. The Pearson correlation between pairs of station time series is explicitly modelled and additionally bias‐adjusted for the RCM simulations. Thus, the close agreement between model and observations (Figure 14) is expected by construction. Tail dependence (Figure 15), however, is neither modelled explicitly nor bias‐adjusted in the RCM predictor data. Accordingly, the representation is substantially degraded compared to that in the perfect predictor setting (Figure 9), which reflects again the relevance of RCM biases in representing tail dependence. This reasoning is confirmed, again, by a comparison of the station discrepancies (black numbers in each panel) with the corresponding predictor discrepancies (correlation between black and red numbers: *r* = 0.89, *p* < .001). The same argument holds for joint wet day probabilities (Figure 16, correlation between black and red numbers: *r* = 0.89, *p* < .001).

As in Figure 11, Figure 17 depicts QQ plots to show how well the distribution of catchment‐averaged precipitation is simulated, but now for RCM‐conditioned simulations. Despite the systematic discrepancies in the representation of pairwise tail dependence and joint wet day probability (Figures 15 and 16), total catchment precipitation is reasonably well represented, demonstrating the overall skill of our stochastic downscaling model at representing spatial precipitation patterns from dry days to heavy rain showers, even with RCM predictor data.

A main motivation to use a (transformed) truncated Gaussian model without explicitly simulating the temporal dependence structure was to provide a simple to implement and computationally cheap model to be readily applicable in practice. Additionally, we applied several bias adjustment steps to provide approximately unbiased climate model simulations for impact modelling. Here we discuss the assumptions underlying these aims.

First, when transforming precipitation data to a normal distribution using the third root, we implicitly assume that precipitation is well approximated by a conditional gamma distribution (Yang *et al*., 2005). Accordingly, when simulating station precipitation from a normal distribution without conditioning on predictors, the resulting values after backtransformation would by construction be gamma distributed (Katz, 1977; Wilks, 2010). In this case, the tail of the distribution would not be explicitly modelled, for example, to represent a heavy tail. This choice would in general result in a misrepresentation of simulated heavy precipitation events. But in our downscaling model, station precipitation is simulated conditional on predictors representing area average precipitation. As demonstrated and consistent with findings by Yang *et al*. (2005), the combination of the (approximate) gamma distribution with the influence of the predictors yields realistic magnitudes of extreme rainfall.

Second, all (residual) spatial dependence is modelled by the covariance matrix. In particular no specific model for tail dependence has been implemented. In principle, this may limit the performance, because precipitation typically does exhibit tail dependence: if precipitation at one site is extreme, the probability that precipitation at neighbouring sites is extreme is increased. An unconditional multivariate Gaussian model is not able to simulate such dependence. But again, the conditioning on predictors imprints an overall tail dependence on the resulting simulations: if area average precipitation is extreme, the probability that neighbouring stations within the area exhibit extreme precipitation is simultaneously increased.
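The role of the conditioning can be illustrated with a minimal two‐station sketch in plain Python. It is not the calibrated model: we assume a latent Gaussian variable on the cube‐root scale whose mean depends linearly on the cube root of the area‐average predictor, residuals with a fixed correlation, and negative latent values interpreted as dry days. The function name and the parameters `a`, `b`, `sigma` and `rho` are hypothetical and their values are not fitted.

```python
import random


def simulate_two_stations(predictor, a=-0.5, b=1.0, sigma=0.5, rho=0.6, rng=None):
    """Simulate daily precipitation at two stations given area-average precipitation.

    predictor: non-negative area-average precipitation per day (mm).
    a, b, sigma, rho: hypothetical parameters of the latent Gaussian model.
    """
    rng = rng or random.Random(0)
    series = []
    for p in predictor:
        mean = a + b * p ** (1.0 / 3.0)  # shared latent mean on the cube-root scale
        e1 = rng.gauss(0.0, 1.0)
        # second residual correlated with the first (correlation rho)
        e2 = rho * e1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        z1 = mean + sigma * e1
        z2 = mean + sigma * e2
        # negative latent values become dry days; cubing undoes the third-root transform
        series.append((max(z1, 0.0) ** 3, max(z2, 0.0) ** 3))
    return series


# Hypothetical predictor series: five dry days followed by five wet days.
simulated = simulate_two_stations([0.0] * 5 + [8.0] * 5)
```

Because both stations share the predictor‐dependent mean, days with a large area average raise the latent values at both sites at once, which is how the conditioning can induce joint extremes even though the residuals themselves are Gaussian.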

Third, temporal dependence is not explicitly modelled, neither the dependence of precipitation occurrence nor that of precipitation amounts. In general, one may therefore expect an underestimation of temporal dependence, in particular for long dry spells. But also here, the conditioning on predictors imprints a realistic temporal dependence in the occurrence process, including long dry spells. Dependence in precipitation amounts has not been evaluated here. A possible additional evaluation could consider the auto‐correlation function (Chen *et al*., 2018; Li and Babovic, 2019), although the correlation mixes information about occurrence and amounts, and is strongly dominated by the occurrences.

Fourth, the method involves bias adjustment of climate model predictors and the resulting simulations at several steps. In fact, bias‐adjusting the climate model output is a core element of the two‐step approach (Volosciuk *et al*., 2017). While bias adjustment is widely used and has often proven to be a sensible pragmatic choice, its appropriateness ultimately depends on the chosen approach, the underlying climate model errors, and the characteristics to be adjusted (Maraun, 2016; Maraun *et al*., 2017; Doblas‐Reyes *et al*., 2021). Improper bias adjustment may introduce artefacts and even deteriorate the raw simulations (Maraun *et al*., 2017; Doblas‐Reyes *et al*., 2021). Adjusting marginal distributions by univariate methods such as quantile mapping is often recommended, in particular when critical thresholds are relevant (Dosio, 2016). In our case we use scaled distribution mapping (Switanek *et al*., 2017) to ensure that RCM‐simulated long‐term trends are not altered. Adjusting biases in the spatial dependence structure requires more care, as any such adjustment will change the temporal dependence structure (Maraun, 2016). This limitation holds both for the adjustment of the covariance matrix of RCM precipitation and for the recorrelation of the final simulated time series. Here, advantages and disadvantages should be carefully balanced for the individual application (François *et al*., 2020). Overall, we recommend a careful selection of the driving climate models, based in particular on circulation biases relevant for the region and phenomena of interest (Maraun *et al*., 2017; 2021). One could in principle also adjust for climate model biases in the representation of temporal and tail dependence, but here the appropriateness is even more questionable (Maraun, 2016; François *et al*., 2020; Doblas‐Reyes *et al*., 2021).

Additionally, our model has limitations due to the choice of gridded precipitation as predictor of local precipitation. In principle, the raw climate model climate change signal may not be representative of the local climate change signal, because it may not represent subgrid effects (Maraun *et al*., 2017). In our model, the signal in local precipitation changes takes its information purely from the gridded model precipitation, and thus inherently cannot account for such subgrid changes. This limitation may be relevant when convection plays an important role as a rainfall generating process, such as in localized thunderstorms (Kendon *et al*., 2016), although the effect is mainly important at the subdaily time scale.

We develop and apply a two‐step bias adjustment and spatial stochastic downscaling method. The main purpose of this method is to generate unbiased local spatial precipitation fields, consistent with a driving climate model. The core element of this method is a truncated Gaussian model, which describes the dependence between coarse‐resolution precipitation fields used as predictors and local rain gauges (or fine‐resolution grid points) used as predictands, as well as the spatial dependence between these local gauges. The method is conceptually similar to the single‐site method proposed by Volosciuk *et al*. (2017), but crucially models also spatial dependence. We evaluate this model for rain gauges in selected river catchments of Austria.

With observed predictors, the model performs excellently for a range of diagnostics representing marginal, temporal and spatial aspects for moderate and extreme precipitation. The good performance holds in particular also for those quantities that are not explicitly calibrated, namely the number of consecutive dry days, tail dependence and joint wet day probabilities. Total catchment precipitation is well represented across the full observed distribution including extreme intensities. Using RCM‐simulated precipitation as predictors, the performance is still reasonable, but in particular the representation of consecutive dry days, tail dependence, and joint wet day probabilities is deteriorated because of biases in the corresponding predictor data. However, even in this setting the overall catchment precipitation is again well represented across the full observed distribution including extreme intensities. These findings demonstrate the suitability of our model for use in climate change impact studies, where spatial dependence across a wide range of intensities is important.

The method solves important problems in providing localized spatially coherent precipitation fields and optimally combines the strengths of dynamical downscaling and statistical modelling: regional climate models used for dynamical downscaling explicitly represent mesoscale atmospheric dynamics, but are usually biased compared to observational data. Importantly, the resolution of these models is also usually limited to coarser than 10 km, such that localized orography and extreme events are not well represented. Classical so‐called perfect prognosis statistical downscaling approaches may be calibrated against local observations, but do not represent mesoscale dynamics and may thus miss important drivers of long‐term changes. Bias adjustment has been designed to pragmatically remove biases in climate model output, but fails to bridge major scale gaps between coarse‐resolution model output and local target variables (Maraun, 2013; Maraun *et al*., 2017; Maraun and Widmann, 2018). The method presented here may be applied to (regional or global) climate model output, adjusts for biases and enables downscaling to finer spatially coherent fields. It shares conceptual ideas with the method for temperature by Yuan *et al*. (2019), but is designed for generating precipitation fields. The method can be considered a conditional weather generator and shares all the advantages of this approach. In particular it can be used to create ensembles of time series conditional on coarse scale observations or climate model output. As such the method can generate unobserved local and spatially coherent precipitation events to inform climate risk assessment and management. In general we are convinced that two‐step approaches combining bias adjustment and stochastic downscaling are a powerful avenue for providing localized climate change scenarios for impact modelling.

Of course more complex alternatives to our truncated Gaussian model could be used to better represent spatial extreme events. Similarly, one could envisage directly modelling temporal dependence by including precipitation on previous days as predictors (Yang *et al*., 2005). Such an extension could be required if, for a given application, temporal dependence was determined by local water recycling rather than by large‐scale weather. Finally, one could implement other and more complex bias adjustment methods, but one should always keep the limitations of these methods in mind (Maraun *et al*., 2017). An interesting research strand could be the use of nonprecipitation predictors to represent subgrid changes.

We acknowledge funding from the Austrian Climate Research Programme (DEUCALIONII, B464795), the Swiss Environment Agency (project “Extremhochwasser Schweiz – Grosse Einzugsgebiete”) and the University of Graz. D.M. and E.B. acknowledge the European COST Action DAMOCLES (CA17109). Observational data have been provided by the Austrian meteorological service (ZAMG) and the platform

**Matthew Switanek:** Data curation; formal analysis; investigation; methodology; software; validation; visualization; writing – original draft. **Douglas Maraun:** Conceptualization; funding acquisition; project administration; supervision; writing – original draft. **Emanuele Bevacqua:** Methodology; writing – review and editing.

Figure 18 shows the representation of the temporal dependence of precipitation occurrence by our model as measured by day‐to‐day transition probabilities. For the four diagnostics, the estimated mean absolute error across all simulations, calendar months and stations is about 1% or smaller.
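Day‐to‐day transition probabilities can be estimated by counting consecutive‐day pairs. The sketch below, in plain Python, assumes a wet‐day threshold of 0.1 mm and a gap‐free daily series; the threshold, the function name and the treatment of missing data are assumptions for illustration and not necessarily those used for Figure 18.

```python
def transition_probabilities(precip, wet_threshold=0.1):
    """Estimate P(state tomorrow | state today) for dry/wet states.

    Returns a dict keyed by (today, tomorrow), e.g. ("wet", "dry").
    Assumes a complete daily series; gaps would need to be handled separately.
    """
    states = ["wet" if value >= wet_threshold else "dry" for value in precip]
    counts = {(a, b): 0 for a in ("dry", "wet") for b in ("dry", "wet")}
    for today, tomorrow in zip(states, states[1:]):
        counts[(today, tomorrow)] += 1

    probabilities = {}
    for today in ("dry", "wet"):
        total = counts[(today, "dry")] + counts[(today, "wet")]
        for tomorrow in ("dry", "wet"):
            # undefined if the series never visits the conditioning state
            probabilities[(today, tomorrow)] = (
                counts[(today, tomorrow)] / total if total else float("nan")
            )
    return probabilities
```

The four entries of the returned dictionary correspond to the four diagnostics; the absolute difference between the simulated and observed values, averaged over simulations, calendar months and stations, gives a mean absolute error of the kind reported above.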

In Figures 19 and 20 we illustrate the sensitivity of our dependence analysis to the choice of Spearman correlation versus Pearson correlation, which was used in the main paper.
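The difference between the two measures can be made concrete with a short sketch in plain Python. Spearman correlation is computed here as the Pearson correlation of ranks, with tied values (such as the many zeros on dry days) receiving their average rank. The function names are hypothetical and the example values are made up for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Made-up example: station y is a monotone but nonlinear function of station x.
x = [0.0, 0.0, 1.0, 2.0, 3.0]
y = [0.0, 0.0, 1.0, 8.0, 27.0]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

In this example the rank correlation equals one because the relationship is monotone, whereas the Pearson correlation is lower because it measures linear association and is sensitive to the largest values.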