This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Broadband (spectrally‐integrated) radiation calculations are dominated by the expense of spectral integration, and many applications require fast parameterizations for computing radiative flux. Here we describe a novel approach using a linear weighted sum of monochromatic calculations at a small set of optimally‐chosen frequencies. The empirically‐optimized quadrature method is used to compute atmospheric boundary fluxes, net flux profiles throughout the atmosphere, heating rate profiles, and top‐of‐the‐atmosphere forcing by CO_{2}, in the longwave for clear skies. We evaluate the method against two modern correlated *k*‐distribution models and find that we can achieve comparable errors with 32 spectral points. We also examine the effect of minimizing different cost functions, and find that in order to accurately represent heating rates and CO_{2} forcing, these quantities must be included in the cost function.

Quantifying the way radiation flows through the atmosphere is computationally expensive, and most applications require fast approximation. In this paper, we develop an empirically‐optimized quadrature method that can compute quantities of interest in climate science and weather prediction such as radiative flux throughout the atmosphere, heating rates, and forcing by carbon dioxide. The method uses a simple optimization algorithm, combined with a linear model, to achieve accuracy and computational cost of similar order of magnitude to the modern correlated *k*‐distribution models. Depending on the focus of the application, the algorithm can easily be adapted to prioritize desired quantities.

Spectrally integrated radiative quantities can be estimated from a small weighted set of monochromatic calculations

Tradeoffs between computational cost and accuracy are commensurate with correlated *k*‐distribution models

Optimization priorities can be tailored to a specific applications

Radiation is the mechanism by which the Earth system achieves energy balance; radiative heating and cooling within the atmosphere also play a crucial role in setting the temperature profile of the atmosphere (Manabe & Wetherald, 1967). The strength of the interactions between radiation and the atmosphere depends strongly on frequency because the optical properties of gases, fundamentally determined by their molecular structure along with local temperature and pressure, vary rapidly in frequency by many orders of magnitude. (The optical properties of condensed materials such as clouds and aerosols vary much more slowly with frequency and are not considered here.)

The optical properties of the gases in the atmosphere can be determined at given frequencies based on detailed spectroscopic information describing each absorption line for each species (e.g., HITRAN, Gordon et al., 2022). The spectrally‐integrated (broadband) exchange of radiation can be determined from this information to near‐arbitrary accuracy by direct integration at very high spectral resolution. Results from line‐by‐line models taking this approach (along with high angular resolution) may be considered reference calculations (Pincus et al., 2020).

Highly detailed calculations are, however, impractical in applications such as the calculation of narrow‐band radiances in satellite channels. In these contexts the radiative transfer integral is approximated using parameterizations. One such optimization parameterizes radiative transfer via linear regression (or, more recently, neural networks, Stegmann et al., 2022) against atmospheric and surface conditions, such as the The Joint Center for Satellite Data Assimilation's Community Radiative Transfer Model (Johnson et al., 2023). In the spectral dimension, radiances can be predicted by sparsely sampling frequencies and integrating via a weighted mean, thereby reducing the number of monochromatic radiative transfer calculations required (Buehler et al., 2010; Moncet et al., 2008).

In dynamical models of the atmosphere, radiation calculations must similarly be fast and accurate. The state‐of‐the‐art method for reducing spectral resolution is the correlated *k*‐distribution (CKD) algorithm (Fu & Liou, 1992; Goody et al., 1989; Lacis & Oinas, 1991), which sorts frequencies according to their absorption coefficient *k* and determines the average at some set of quadrature points. The sorting and averaging creates a much smoother integral that can be evaluated with just tens or hundreds of quadrature points, as compared to the millions of points required by line‐by‐line models. These parameterizations seek to minimize the number of these quadrature points, called *g*‐points, to limit computational cost (Hogan & Matricardi, 2022).

There is, however, no theory underlying *k*‐distributions, so developers must make a range of somewhat arbitrary choices (Hogan & Matricardi, 2020a). Moreover, the common experience is that the absorption coefficients determined with any algorithm need to be tuned to match reference calculations (Hogan & Matricardi, 2022; Pincus et al., 2019), making the link between the reference and approximate calculations somewhat obscure.

Here we describe an approximation to spectral integration in the longwave consisting of a weighted sum of monochromatic fluxes computed at a small set of discrete wavenumbers. We optimize both the members of this set and the linear weights to find the best estimate of the broadband integrated flux. The idea builds on approaches for computing narrow‐band radiances originally developed for remote sensing applications (Buehler et al., 2010; Moncet et al., 2008) to sample the spectrum sparsely rather than averaging absorption coefficients. While our work is in the same spirit as de Mourgues et al. (2023), we take a more general approach and treat a wider range of applications including heating rates, boundary fluxes, and CO_{2} forcing. Our data is discretized more finely in the spectral dimension and more sparsely in the vertical, which prompts us to develop a cost function that will allow us to extrapolate the integration to atmospheres of different structures. We seek not only to develop a novel method, but also to understand any inherent properties and tradeoffs of the approach. To this end, we explore the relationship between the quantities targeted in the optimization and individual error profiles, test the effect of the vertical discretization of the training data set, and compare the performance of the empirically‐optimized quadrature method to state‐of‐the‐art correlated‐*k* methods. We find that weighted, sampled monochromatic calculations can predict fluxes, heating rates, and CO_{2} forcing with error‐cost relationships commensurate with modern correlated‐*k* models, while being flexible to the needs of a given application.

The net broadband radiative flux at height *z* in the atmosphere, *F*(*z*), is given by integrating monochromatic fluxes across all wavenumbers:*ν* denotes wavenumber and *F*_{ν}(*z*) is monochromatic net flux. We develop the method for the longwave in clear skies, so we disregard scattering.

Rather than calculating the line‐by‐line integral directly, we seek to compute the broadband net flux from a small subset of representative wavenumbers. This breaks the problem into two parts:

Optimally select the representative wavenumbers, and

Compute the net integrated flux from this subset.

Inspired by approaches originally developed for remote sensing applications (Buehler et al., 2010; Moncet et al., 2008), we select the representative subset via a combinatorial optimization algorithm called simulated annealing (Kirkpatrick et al., 1983), and calculate the net flux by performing a weighted sum of monochromatic fluxes found at the representative wavenumbers. In other words, we rewrite our problem as:*w*_{ν} is the weight corresponding to wavenumber *ν* in the sparse subset *S*, and *F*_{ν}(*z*) is the monochromatic flux found at wavenumber *ν* at height *z*. Though this is similar to the formulation of *k*‐distributions (e.g., Equation 3 in Lacis and Oinas (1991)), here we are sampling monochromatic fluxes rather than averaging absorption coefficients. Furthermore, we sample across the entire longwave spectrum without imposing a band structure (cf. the Full‐Spectrum Correlated‐*k*, Hogan, 2010; Modest & Zhang, 2002).

In order to spectrally sample *F*_{ν∈S} and choose *w*_{ν}, we need accurate, finely‐resolved line‐by‐line spectra at arbitrary frequencies, as well as corresponding integrated fluxes and heating rates throughout the atmosphere. Atmospheric profiles used for training must be independent and provide a representative sample of conditions for a given applications.

We use the set of atmospheric conditions developed for the Correlated *K*‐Distribution Model Intercomparison Project (CKDMIP; see Hogan & Matricardi, 2020a). The CKDMIP profiles are chosen from reanalysis data to represent the mean state as well as minimum and maximum extremes in water vapor concentration, ozone concentration, and temperature for the present day climate. We use software provided by CKDMIP to generate 100 independent present‐day, clear‐sky reference flux profiles, half for training and half for testing, integrated in four hemispheric angles. Heating rates are computed from fluxes using a simple finite difference. Finally, we perturb the CO_{2} concentration of the 100 profiles to generate reference CO_{2} forcing values for the last glacial maximum (LGM) (180 ppm), pre‐industrial (PI) (280 ppm), present day (∼415 ppm), 2× PI (560 ppm), 4× PI (1,120 ppm), 8× PI (2,240 ppm) (Hogan & Matricardi, 2020a), and 16× PI (4,480 ppm) for extreme paleoclimate scenarios (Bice & Norris, 2002).

Given a subset of spectral points *S* of predetermined size |*S*|, we must obtain a set of weights such that the weighted sum of fluxes at those points predicts the broadband net flux across the spectrum. The weights *w*_{ν} are found by minimizing an objective function subject to several constraints.

Our first constraint is that all *w*_{ν} ≥ 0, since quadrature weights are always positive. Positive weights lend the method stability, as they prevent monochromatic fluxes from canceling (Buehler et al., 2010; Moncet et al., 2008). Second, the weights are normalized so that they sum to the length of the spectral domain, a general property of integration:

Next, we must identify an optimal subset at which to compute the predicted flux. We use simulated annealing, a stochastic optimization algorithm that can combinatorially evaluate different subsets in order to find a minimum in a given cost function (Kirkpatrick et al., 1983). The algorithm is guaranteed to converge asymptotically to a global minimum in cost function with probability 1 (Dekkers & Aarts, 1991), and can do so at much lower computational cost than a brute force approach for a large state space.

Buehler et al. (2010) provides a useful tutorial on algorithm implementation, and we follow most of their choices, except as follows. In our implementation of the algorithm, as in theirs, optimization is organized in sets of moves toward a minimum that the cost function value is averaged over (Buehler et al., 2010). However, our blocks consist only of 100 moves of which at least one must be accepted; otherwise, the algorithm is considered to have converged to a minimum and is terminated.

Because the optimization process is stochastic, we perform 16 independent optimizations for each subset size and plot the standard deviation away from the mean error in our results. We refer to a given set of wavenumbers and corresponding weights as a model.

As we are interested in optimizing multiple radiative quantities—fluxes, heating rates, and top‐of‐the‐atmosphere CO_{2} forcing—we take inspiration from multi‐objective optimization in the formulation of our cost function (Yang, 2014). Given multiple optimization targets *F* (flux), *H* (heating rate), and *c* simultaneously is to minimize their weighted sum:

Each *f*_{i} implicitly has the inverse units of its corresponding **c** to make the cost function unitless. Choosing the values of the weights *f*_{i} is considered a subjective task with many valid solutions (Yang, 2014). In our case, the approach is as follows: first, we fix *f*_{2} = 1 and *f*_{3} = 0 as we vary *f*_{1}, to focus on the tradeoff between heating rates and fluxes. For each value of *f*_{1}, we fit linear weights to 100 random subsets of size 32 at every fifth vertical level for each of the 50 training atmospheres. The maximum relative (percent) error across vertical levels is recorded for each of the 100 iterations. The mean of these 100 iterations is plotted, where any random subset that produces an ill‐conditioned problem (noninvertible data matrix) is ignored. The results of this experiment are shown in Figure 1a. The gray line indicates the parameter value *f*_{1} = 0.15 that is chosen, as it occurs where heating rate errors level off.

In Figure 1b, the same procedure is repeated while holding *f*_{1} = 0.15 and *f*_{2} = 1 fixed and varying *f*_{3}. We choose parameter *f*_{3} = 1 in order to minimize increase in heating rate and flux errors.

We illustrate the method by training the algorithm only on the outgoing atmospheric boundary fluxes: the upwelling flux at the top of the atmosphere and the downwelling flux at the surface. These fluxes can be useful for diagnostic calculations such as radiative forcing or the cloud radiative effect, and can help understand the Earth's energy balance. The simulated annealing and linear program both minimize the cost function

Fluxes at the boundaries of the atmosphere can be computed accurately with relatively sparse spectral information (Figure 2). For various representative subset sizes, the best flux and weight configurations are recorded and the root mean squared error (RMSE) away from reference calculations,*n* indexes the atmospheric profiles and *m* indexes vertical levels (here, just the boundaries), is plotted. This is the same as the cost function up to a constant multiplier, which serves to take the average of the error across atmospheric profiles and vertical levels rather than the sum. Additionally, using data submitted to CKDMIP (Hogan & Matricardi, 2020a) and available online, our results are plotted alongside metrics from modern CKD models, the 27 *g*‐point ecCKD (Hogan & Matricardi, 2022), which is trained on the CKDMIP training data, and the reduced 128 *g*‐point RRTMGP (Pincus et al., 2019).

First, to separate the contributions of simulated annealing and linear programming, weights are computed via a Riemann sum (defined by the Euclidian distance between wavenumbers), performed over the quadrature points during optimization, rather than by the linear program. The pink curve shows that for a simple diagnostic calculation, the Riemann sum can perform at around 1 W/m^{2} error with 32 spectral points, similarly to correlated‐*k* models, highlighting the effectiveness of simulated annealing at choosing an appropriate representative subset. The purple and blue curves show the training and testing error, respectively, using both simulated annealing and linear weight optimization. Here, eight spectral points allow fluxes at the boundaries of the atmosphere to be computed to within 1 W/m^{2} (1% accuracy), comparable to ecCKD and RRTMGP. With more than 16 spectral points, errors in testing data decrease much less rapidly with spectral detail than do errors in training data, indicating overfitting to the training data set. However, the difference between the Riemann sum error and the RMS testing error of the model that optimizes Equation 2 is at close to 1 W/m^{2} throughout. This highlights the importance of optimizing both the linear weights and the representative spectral subset.

Boundary fluxes control the total energy exchanged by the Earth and the atmosphere; flux profiles throughout the atmosphere set heating rates and hence the diabatic forcing of circulation. Simultaneous prediction of fluxes and heating rate can be achieved by specifying a cost function that contains both quantities, that is,*ℓ*^{2} norm of the differences between estimated and reference heating rate and net flux with weights *f*_{i} across all included training profiles and vertical levels, formulated similarly to Equation 2 in Hogan and Matricardi (2022). *F*_{est} is defined as the dot product between the weights and the monochromatic fluxes, as in Equation 1, and the heating rate is computed using this predicted flux; for each vertical level *m* in a given training profile:*S* = 3,600 ⋅ 24 s/day is a scale factor; *g* = 9.81 m/s^{2} is the acceleration due to gravity; *c*_{p} = 1,004 J/kg K is the specific heat at constant pressure for dry air; and *p*^{m} is pressure provided by the CKDMIP data set at vertical level *m*, given in Pa. The heating rate is obtained in units of K/day. Finally, *f*_{1} = 0.15 and *f*_{2} = 1 are empirically derived hyperparameters that balance the optimization of heating rates and fluxes, as discussed in Figure 1. This cost function is used both to choose the optimal spectral subset as well as to find the associated weights.

We train the model on every fifth vertical level of the CKDMIP data set to predict vertically‐resolved fluxes for these applications. The algorithm requires random access to a large (∼170 GB) data set, so this is a practical limitation, but also allows us to treat some levels as out‐of‐sample, which can help us understand the impact of the vertical discretization of the atmosphere on the accuracy of the algorithm. In other words, we insist that our algorithm learns to integrate across the spectral dimension independently of any implicit vertical structure.

In Figure 3, we show the profiles of absolute RMSE (W/m^{2}) of flux predicted by the minimization of Equation 3 for each subset size in order to examine this vertical dependence. We compare to ecCKD on the training plot as it is trained on CKDMIP data, and to RRTMGP on the testing plot, as it is not. Both here and in Figure 4, pressure is plotted linearly in CKDMIP model level because training levels were chosen linearly in model level. The figure demonstrates good generalizability of the method in the vertical as well as to out‐of‐sample atmospheric profiles. Throughout the vertical profile, errors remain of a constant order of magnitude. Extrusions between training levels become more pronounced in larger subset sizes, indicating overfitting. Errors in the testing set are of similar order of magnitude to those in the training set. With 32 spectral points, we achieve errors within 0.3 W/m^{2} (about 0.1% relative error), comparable to the correlated‐*k* models.

Calculating heating rates from predicted fluxes with accuracy is difficult, especially at high altitudes, where the small pressure difference will magnify any errors in the flux. Furthermore, since heating rate is a derivative of flux computed by a simple finite difference, the heating rate error will be determined by the difference in flux error at adjoining vertical levels, rather than by the error in the fluxes themselves. Thus, to produce good estimates of heating rate profiles, we include heating rates in the cost function (Equation 3). Without this term, the optimization produces large heating rate errors in the stratosphere and above, even for large subset sizes.

Figure 4 shows the absolute RMSE in heating rates, in units of K/day. Similarly as in Figure 3, levels we trained on are marked with crosses. In panels (a) and (b), heating RMSE is shown only for the levels in the training set. Though errors are relatively large for small subset sizes, the errors for 32 and 64 wavenumbers are commensurate with CKD models throughout the atmosphere, maintaining errors of <0.2 K/day (approaching 1%).

In panels (c) and (d), heating rates computed from predicted fluxes at all levels (as in Figure 3) are shown. Here, for small subset sizes (≤16 representative wavenumbers), heating rate errors are larger, especially high in the atmosphere; in the layers immediately above the ground, the errors subset sizes 4 and 8 reach as high as 8 and 5 K/day, respectively, due to the fine pressure discretization in these levels. Even for 16 predictors, where flux errors are commensurate with correlated‐*k* models in Figure 3 at about 1 W/m^{2} (1%) error throughout the atmosphere, the heating rate error is large where pressure thickness is small. For 64 predictors, the heating rate errors of our model are smaller than 0.5 K/day for both training and testing throughout most of the atmosphere. However, the dependence of accuracy on vertical discretization is highlighted here. The method extrapolates fluxes to out‐of‐sample levels well because the relationship between the monochromatic fluxes and the broadband integral remains fixed. This is not true of heating rates when training is performed on certain pressure levels and extrapolation to different pressure levels is attempted.

The top‐of‐the‐atmosphere radiative forcing is defined as the difference in outgoing longwave radiation (OLR) between two states. This quantity is crucial for studying how Earth's energy balance will evolve with changing atmospheric composition. As with heating rates, it is not sufficient to predict the OLR in the two states; we must also minimize their difference. We adjust our cost function accordingly:*f*_{1} = 0.15 and *f*_{2} = 1 are as in Equation 3, and *f*_{3} is set to 1, as described by Figure 1. Here, we use the difference between 8× PI CO_{2} and present‐day conditions in the cost function, and extrapolate to forcing between present‐day and the scenarios used in CKDMIP: LGM, PI, 2× PI, and 8× PI (Hogan & Matricardi, 2020a). We also extrapolate to 16× PI (4,480 ppm), to test whether our model can be useful in extreme paleoclimate scenarios (Bice & Norris, 2002).

Figure 5 shows the RMSE for different subset sizes across all the CO_{2} scenarios. Our method displays highest errors in extrapolations to very low and very high CO_{2} scenarios, such as the LGM and the extreme paleoclimate scenario. With 64 spectral points, we can achieve about 0.2 W/m^{2} (<10%) error across scenarios, which is similar to the correlated‐*k* models. Training on just the 2,240 ppm scenario is able to maintain errors within 10% for all climate scenarios for 32 and 64 spectral points, including extreme high and low CO_{2} concentrations.

The choice of climate scenario to train on is somewhat arbitrary; training on any perturbation of CO_{2} yields similar results. This is demonstrated in Figure 6, where the RMSE in CO_{2} forcing is plotted for three independent optimizations trained on the various climate scenarios, each with 32 spectral points. The 8× PI scenario is the same as Figure 5. There are minima in the RMS at the scenarios trained on, and scenarios that are further out‐of‐sample yield higher errors. For example, when training on the LGM or PI CO_{2} concentrations, the dramatically‐different 16× PI scenario is predicted with up to 1 W/m^{2} error. However, the quadrature method remains flexible to different applications, as predictions for all scenarios remain within about 10% error for 32 spectral points.

One of our aspirations is to create a framework in which optimization goals can be explicitly and transparently specified. For instance, for a weather model, it may be imperative to predict fluxes and heating rates well, but forcing by CO_{2} may not be a concern. Adding terms to the cost function increases optimization times. It also results in an error tradeoff, as ensuring that multiple quantities are equally optimized yields larger errors in each individual quantity.

Figure 7 highlights the importance of including quantities of interest in the cost function. In panel (a), the mean RMSE of outgoing fluxes is shown across subset sizes. While the model trained only on the outgoing fluxes (dark red) outperforms all others, on average, for almost all subset sizes, the others are not far behind. In panel (b), we examine the mean RMSE of the predicted flux. The tradeoff of adding extra quantities to the cost function is demonstrated—the model trained only on fluxes (red) is consistently more accurate than all others, and adding heating rates (orange) or CO_{2} (yellow) to the cost function decreases slightly accuracy for all subset sizes. As shown above, including the flux derivative in the cost function is imperative in accurately reproducing heating rates. In panel (c), the mean RMSE of heating rates is shown for different models. The models trained on heating rates (orange and yellow) achieve 10% error, on average, with 32 spectral points, while the model trained only on fluxes (red) exhibits larger errors. Finally, in panel (d), we explore forcing by CO_{2}. Here, it is clear that in order to reproduce this TOA radiative forcing, we must include it in the cost function. Although including CO_{2} forcing in the cost function increases errors in the flux profiles and boundary fluxes, these increases are small.

Though simulated annealing is a stochastic optimization algorithm agnostic to the physics of the Earth system, we hope to draw some physical understanding from the wavenumber subsets and associated weights that are chosen. Figure 8 shows the location of subset members on an example OLR spectrum for three independent optimizations. Bars are shown below the spectrum noting major features: the atmospheric water vapor window as well as absorption by water vapor, carbon dioxide, and ozone. On these bars, the dots indicate where subset members lay on the spectrum for three 16‐member models trained on heating rates and fluxes (Equation 3). The size of the dots is proportional to the value of the monochromatic flux at that wavenumber multiplied by its linear weight. This quantifies how much a given subset member contributes to the integrated broadband flux. The colors correspond to the three different models.

The optimization chooses wavenumbers that are distributed across the spectrum, spanning absorption by all three major gases as well as the water vapor window. Where the Planck curve is small, such as the tails of the spectrum, few points are chosen, and those that are tend not to be “important.” The largest contributions tend to come from the water vapor window, followed by the main CO_{2} and H_{2}O bands.

Next, the amount of absorption captured by each subset member is quantified through optical depth. For each wavenumber in a given subset *S*, the pressure level at which optical depth *τ* = 1 is reached from the top of the atmosphere is recorded. The wavenumbers are then sorted by this pressure level, averaged across training atmospheric profiles and independent optimizations, and then plotted in Figure 9. The colorful lines correspond to an optimization minimizing error in both heating rates and fluxes (Equation 3), while the accompanying gray curves result from optimizing only fluxes.

As the size of the subset increases, the members span more of the atmosphere—the pressure of the highest‐altitude (lowest‐pressure) optically‐thick level decreases. Furthermore, more levels that radiate at higher pressures in the atmosphere are included: the slope of the lines below 100 hPa decreases. Finally, the models trained on only fluxes span less of the atmosphere and contain more wavenumbers that radiate from higher pressures on average. This explains why the model trained only on fluxes cannot predict heating rates at low pressures with accuracy, and further justifies our inclusion of heating rates in the cost function. It also proposes a relationship between the vertical discretization of the training atmospheres and the number of subset members required to accurately predict heating rates.

In this paper, we demonstrate that a small set of spectral points can be used to predict flux profiles, heating rate profiles, and CO_{2} forcing with accuracy comparable to CKD methods. The empirically‐optimized quadrature method selects an optimal subset of spectral points at which to compute a weighted sum to estimate the broadband net flux integral. We apply the method to predicting fluxes throughout the vertical column and test its generalizability to out‐of‐sample atmospheric profiles. We can achieve accuracy to within 1 W/m^{2} for vertically‐resolved net fluxes with 16 spectral points while avoiding overfitting.

We show that in order to reproduce heating rates accurately, especially high in the atmosphere, we must optimize not only on fluxes, but also on the difference between fluxes at consecutive height levels. Including heating rates in the cost function yields errors of less than 0.2 K/day throughout the atmospheric column for subsets of 32 points. Similarly, in order to explore applications in climate science, we include forcing by CO_{2} in the cost function. We find that by training only on an 8× PI scenario, we can extrapolate to reproduce top‐of‐the‐atmosphere forcing within 10% (<1 W/m^{2}) with 32 or 64 spectral points across various scenarios. The ability to optimize quantities of interest lends flexibility to different applications to the method.

While our approach performs well on the present‐day clear‐sky scenarios provided in CKDMIP, there are several limitations to this initial demonstration. First, we have chosen to focus on clear‐sky scenarios, as many *k*‐distributions are initially developed using only clear skies (e.g., Pincus et al., 2019). The optical properties of clouds vary only weakly with wavenumber so we anticipate that training on cloudy profiles would not change the accuracy or the flexibility of our quadrature method. However, as with choosing clear‐sky profiles, choosing an appropriate set of cloudy training profiles is a challenge, and so we leave it out of the scope of this work.

We have also limited our training and testing data to the CKDMIP sets, which commits us to certain atmospheric condition and vertical discretization choices. Choosing training atmospheres that will represent diverse conditions and generalize well is an open question for many statistical and machine learning applications, and we use only 50. Furthermore, memory constraints (the total size of the spectral training data set is about 170 GB) also prevent us from training on all model levels in the vertical. Though this allows us to highlight the importance of the vertical discretization of the data set, using a finer grid would certainly improve both flux and heating rate predictions. It may also require more spectral points to parameterize the extra information.

Finally, though we know many greenhouse gases exert a forcing on the climate, we focus only on CO_{2}. Since we fit one set of weights with a linear regression, there is a limit on how many related quantities we can optimize at once before we formulate an ill‐conditioned problem. In order to predict forcing by different gases, or improve our CO_{2} forcing estimates across different scenarios, we will have to consider how best to include many different optimization targets.

We thank Stefan Buehler for many helpful discussions about this work. We would also like to thank two anonymous reviewers and the associate editor for their helpful feedback. This work was supported by the US National Science Foundation (award AGS 19‐16908), the National Oceanic and Atmospheric Administration (award NA200AR4310375), and the Vetlesen Foundation.

Model output data and the Jupyter notebook for creating figures shown in this paper are publicly available at Czarnecki et al. (2023). The line‐by‐line code and data is freely available from the CKDMIP project Hogan and Matricardi (2020b).