Expanding NEON biodiversity surveys with new instrumentation and machine learning approaches

A core goal of the National Ecological Observatory Network (NEON) is to measure changes in biodiversity across the 30-yr horizon of the network. In contrast to NEON's extensive use of automated instruments to collect environmental data, NEON's biodiversity surveys are almost entirely conducted using traditional human-centric field methods. We believe that the combination of instrumentation for remote data collection and machine learning models to process such data represents an important opportunity for NEON to expand the scope, scale, and usability of its biodiversity data collection while potentially reducing long-term costs. In this manuscript, we first review the current status of instrument-based biodiversity surveys within the NEON project and previous research at the intersection of biodiversity, instrumentation, and machine learning at NEON sites. We then survey methods that have been developed at other locations but could potentially be employed at NEON sites in the future. Finally, we expand on these ideas in five case studies that we believe suggest particularly fruitful future paths for automated biodiversity measurement at NEON sites: acoustic recorders for sound-producing taxa, camera traps for medium and large mammals, hydroacoustics and remote imagery for aquatic diversity, expanded remote and ground-based measurements for plant biodiversity, and laboratory-based imaging for physical specimens and samples in the NEON biorepository. Through its data science-literate staff and user community, NEON has a unique role to play in supporting the growth of such automated biodiversity survey methods, as well as demonstrating their ability to help answer key ecological questions that cannot be answered at the more limited spatiotemporal scales of human-driven surveys.


INTRODUCTION
As it enters its full operational stage, the National Ecological Observatory Network (NEON) is poised to deliver approximately 180 openly available data products from 81 field sites across the United States throughout a 30-yr time horizon. The overarching goal of NEON is to "enable the research community to ask and address scientific questions regarding biological responses to environmental change from a regional to continental scale" (Tornow 2019). NEON received initial funding from the National Science Foundation's (NSF) Major Research Facilities and Construction budget and is currently affiliated with the NSF Division of Biological Infrastructure. In part because of these relationships, many ecologists associate NEON primarily with its extensive field infrastructure and automated sampling instruments.
Given its associations with instrumentation and biological monitoring, many ecologists may assume that many of the instruments at NEON sites are used to measure some aspect of biodiversity. However, there are currently only two NEON data products that are categorized as measuring biodiversity (i.e., fall within the Organisms, Populations, and Communities theme) using a NEON sensor instrument (i.e., fall within the Aquatic or Terrestrial Instrument System). These are the land-water interface images and phenology images, both of which are collected by the PhenoCam Network (Fig. 1). All other biodiversity data collected and shared by NEON are gathered the "old-fashioned way," through direct human surveys or physical specimen collection.
While human-driven field methods will remain at the core of ecological research, ecologists are increasingly complementing traditional survey methods with automated instrument-based approaches for studying biodiversity (Kitzes and Schricker 2019). Examples that will be familiar to many ecologists include wildlife cameras, acoustic recorders, and satellite imagery. Such automated or technology-driven biodiversity surveys often include two components: a physical sensor able to record data in the field and a machine learning model that can use these data to detect an organism or trait of interest. The growth of such surveys in recent years has been greatly facilitated by the decreasing costs of sensor hardware and the availability of large, labeled data sets, which can be used to train machine learning models. While challenges remain for both the hardware and software components of automated biodiversity surveys, many such methods are now broadly available to ecologists.
We recognize that traditional field survey methods will remain critical for biodiversity research, including for training and validating machine learning models. However, research that employs instrumentation and machine learning to study biodiversity has several potential advantages that can complement and expand the scope of existing human-driven biodiversity research methods. First, automated methods have the potential to reduce the labor and logistical costs of biodiversity sampling. This is particularly true when sensors can be left in place for long periods that would otherwise require repeated human surveys. Second, automated methods have the potential to radically increase both the spatial and temporal extent of available biodiversity data. Tree crown identification from remote imagery is a clear example of the former (Pouliot et al. 2002, Weinstein et al. 2019), while long-term deployment of camera traps for mammal surveys provides a relatively well-known example of the latter (Swanson et al. 2015, McShea et al. 2016). This large-scale sampling has the potential to increase the power of biodiversity analyses as well as open new research questions that would require both fine-grain and large-extent data to answer. Third, because many sensors can be deployed easily for long time periods, they are better able to study events that are difficult to detect in the field, such as the appearance of a rare species or phenological events that occur at uncertain times. Fourth and finally, as compared to human-based field observations, sensor-driven data collection provides a digital record of the environment at a field site. In the short term, these records can be reviewed by multiple researchers, reducing observer effects and providing a means of verifying observations. In the long term, these records form a type of digital museum specimen that can be analyzed by future generations of ecologists using new methods to answer new questions.
Our goal in this commentary is to explore possible directions for expanding research at the intersection of biodiversity, instrumentation, and machine learning at NEON sites. We focus specifically on cases where a combination of a remotely operated instrument and a machine learning method could take the place of a human observer who would otherwise be needed to measure a biodiversity variable in the field. We divide our discussion into three sections. First, we begin by reviewing existing work at the intersection of biodiversity, instrumentation, and machine learning that has already taken place at NEON sites. Second, we conduct a broad survey of research at this intersection that has been conducted at locations other than NEON field sites. Third, we present five brief case studies that explore specific directions that we believe could be particularly fruitful areas for future research at NEON sites. These case studies are (1) acoustic field recorders for birds, bats, anurans, and insects, (2) wildlife cameras for mammals, (3) hydroacoustic sensors and remote sensing for aquatic biodiversity, (4) expanded remote and ground-based measurements for plant biodiversity, and (5) laboratory-based analysis of the NEON physical specimen collections. We conclude with a summary of the challenges and promises of expanding research in this area at NEON sites.
Throughout this manuscript, we define "biodiversity" broadly to include the number, composition, distribution, or interactions of species, or groups of species at higher taxonomic levels, as well as their traits (Díaz et al. 2006). We note, however, that the majority of previous and existing biodiversity research focuses on species richness, occupancy, or population sizes, and many future prospects for research also concentrate in these areas. We define an "instrument" as an automated sensor that records ambient data from the environment without requiring a human operator to view or record each observation as it is made. There are many definitions of "machine learning" (see in particular Chollet and Allaire 2018), but for the purpose of this manuscript, we define machine learning broadly as any quantitative method that can learn to predict variables using data automatically collected by an instrument, thus replacing a human observer who would otherwise be needed to measure these variables.
In our discussions below, we specifically restrict our focus to approaches that use machine learning models to predict biodiversity variables from the properties of organisms themselves, such as visual appearance or sound production, as measured directly by an instrument. The goal is to specifically consider the use of instruments and machine learning to predict biodiversity variables from a data stream that measures attributes of biodiversity, not to predict biodiversity variables based on measurements of variables that may be correlated with biodiversity. Common examples include predicting the presence of a species whose image or vocalization is captured in a photo or audio recording. Conversely, we do not consider approaches such as species distribution modeling or other methods that predict biodiversity based on environmental variables, regardless of how these correlated variables are measured. We similarly do not consider cases in which machine learning methods are applied to perform statistical inference, where the primary goal is to understand the relationship between biodiversity and other variables that may be measured by remote instruments. Given our focus on instrumentation and data streams, we do not devote substantial effort to discussing "omics" methods that can be applied to measuring biodiversity variables from physical samples collected in the field and analyzed in a laboratory setting. We discuss such approaches briefly in our example of laboratory-based imaging below.

WHAT HAS BEEN DONE AT NEON?
We began by reviewing research that has already been completed at NEON sites at the intersection of biodiversity, instrumentation, and machine learning. We used a two-step process to review the literature in this area. First, we searched Google Scholar using the keywords "machine learning" and "national ecological observatory network." All papers located under this search used instrument-based data. Second, we reviewed the titles and abstracts of entries in the list of papers and publications curated on the NEON website (National Ecological Observatory Network, n.d.) as of October 2019 for relevant publications. These publications fell into two broad categories: (1) research focused on measuring plant species diversity, particularly tree diversity, from aerial sensor data, and (2) research using data gathered by the PhenoCams currently deployed at NEON sites.
The vast majority of the work using instrument data to survey biodiversity at NEON sites has used aerial sensors to measure aspects of plant biodiversity. Laser-based systems, for example, have been used to estimate structural diversity (LaRue et al. 2020). Specific metrics such as tree height, crown height, canopy cover, vegetation area, and the structure of the canopy and understory have been measured using terrestrial or aerial laser scanning and/or high-resolution orthophotos (Kosmala et al. 2016). In other studies, individual tree crowns have been delineated from imaging spectroscopy and LiDAR (Light Detection and Ranging) data using convolutional neural networks or rule-based region growing (e.g., watershed and local-maxima filtering approaches) (Pouliot et al. 2002, Zhou et al. 2018b, Dalponte et al. 2019, Marconi et al. 2019a, McMahon 2019, Weinstein et al. 2019).
A variety of approaches have been used to measure species diversity by identifying individual plant species from NEON Airborne Observation Platform data. Krause (2015) leveraged imaging spectroscopy, full-waveform LiDAR, and high-resolution true-color aerial imagery to characterize distinct vegetation communities. Pixel-level approaches dominate the existing literature, with species classifications generated from imaging spectroscopy and LiDAR data using machine learning methods such as neural networks, support vector machines, and random forests (Nia et al. 2015, Anderson 2018, Zhou et al. 2018a, Dalponte et al. 2019, Huesca et al. 2019, McMahon 2019, Sumsion et al. 2019). "Multiple instance" methods have accounted for imprecision in species labels and/or mixed membership of species within pixels (Jiao et al. 2018, Zou et al. 2019). Convolutional neural networks that account for spatial context have also been developed for plant species classification from RGB and imaging spectroscopy data (Fricker et al. 2019).
The unique variety of data available at NEON sites has provided opportunities to compare biodiversity measurements derived from different sensors. Recent research has focused on estimating different indices from spectral features to monitor changes in vegetation across scales and landscapes, including calculating the contribution of individual plots or communities to spectral beta diversity and of individual spectral bands to alpha, beta, and gamma spectral diversity (Laliberté et al. 2020), identifying the most important wavelengths for leaf chemical composition and nutrients (Wang et al. 2020), and predicting foliar traits in a variety of ecosystems and biomes (Marconi et al. 2019b, Chadwick et al. 2020, Wang et al. 2020). Other approaches have used Airborne Observation Platform-generated data products, in combination with other satellite-generated data, to estimate vegetation indices such as leaf area density (Kamoske et al. 2019) and monitor changes in vegetation cover and other ecophysiological dynamics (Kampe et al. 2010, Baishali 2019, and references therein).
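To make the spectral diversity idea above concrete, the sketch below computes a simple spectral alpha diversity for a plot as the mean squared distance of pixel spectra to their spectral centroid, in the spirit of the sum-of-squares decomposition used by Laliberté et al. (2020). The function name and array shapes are our own illustrative assumptions, not NEON or published code.

```python
import numpy as np

def spectral_alpha_diversity(pixels):
    """Spectral alpha diversity of one plot: mean squared Euclidean
    distance of each pixel's spectrum to the plot's mean spectrum
    (the spectral centroid).

    pixels : (n_pixels, n_bands) array of reflectance values.
    """
    centroid = pixels.mean(axis=0)                      # mean spectrum
    sq_dist = np.sum((pixels - centroid) ** 2, axis=1)  # per-pixel distance
    return float(sq_dist.mean())
```

Per-band or per-plot contributions to beta and gamma diversity follow from partitioning the same sums of squares across plots and wavelengths.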
Aside from aerial sensors, the other NEON instruments that have been used for biodiversity measurement are NEON's PhenoCams (Fig. 1). These cameras, which are currently installed at many NEON sites, take digital images of a fixed location every half hour, with the photographs made available on the NEON data portal. PhenoCam images provide high-quality phenology transition dates that agree well with direct field observations, as well as vegetation indices that agree with satellite remote sensing (Richardson et al. 2018). In addition, automatic species identification in PhenoCam images has been achieved using a multiscale classifier trained on a data set with known species labels (Almeida et al. 2014). These data have also been the focus of citizen science projects. In the Season Spotter project supported by NEON, citizen science volunteers have detected reproductive phenology events, demarcated tree canopies, and validated phenology transition dates (Kosmala et al. 2016). Such citizen science projects can provide valuable training data for automated processing of complex images using machine learning algorithms. For example, a deep convolutional neural network, integrated with crowdsourced classification, has been shown to accurately detect snow cover, potentially transforming PhenoCams into global "snow detectors" (Kosmala et al. 2018).
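The PhenoCam vegetation indices mentioned above are typically simple band ratios. As one hedged illustration (a minimal sketch, not the PhenoCam Network's processing code), the widely used green chromatic coordinate (GCC) can be computed from an RGB image array as follows:

```python
import numpy as np

def green_chromatic_coordinate(image):
    """Green chromatic coordinate, GCC = G / (R + G + B), averaged
    over the image (in practice, over a fixed region of interest).

    image : (H, W, 3) RGB array.
    """
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    gcc = g / np.maximum(r + g + b, 1e-9)  # guard all-black pixels
    return float(gcc.mean())
```

Tracking GCC through time for a fixed region of interest yields the greenness curves from which phenology transition dates are extracted.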

WHAT OTHER APPROACHES ARE AVAILABLE?
Our next goal was to identify examples of research at the intersection of biodiversity, instrumentation, and machine learning that has been conducted outside of NEON sites.To be included as an example, the research described in a manuscript had to involve the measurement of a biodiversity variable using a data stream from an instrument that was processed in an automated fashion using a machine learning model.
We conducted a systematic search of Web of Knowledge in June 2020 using the search terms: ("regression tree" OR "decision tree" OR "gradient boosting" OR "machine learning" OR "neural network" OR "support vector machine" OR "deep learning" OR "computer vision" OR "gaussian process" OR "convolutional neural network" OR "generative adversarial" OR "supervised learning" OR "unsupervised learning" OR "deep feedforward" OR "multilayer perceptron" OR "autoencoder" OR "representation learning" OR "boltzmann machine" OR "deep belief network" OR "random forest" OR "linear discriminant") AND (classif…). We further restricted the search to journals in the categories of Ecology or Biodiversity Conservation to focus on manuscripts involving biodiversity variables. This search yielded an initial set of 2248 abstracts, and a manual review of these abstracts confirmed that 231 included an explicit description of a biodiversity variable, sensor or instrument, and machine learning approach (Appendix S1: Table S1).
While this set of 231 abstracts does not exhaustively cover all manuscripts at the intersection of biodiversity, instrumentation, and machine learning, we believe that it provides a reasonable and systematically sampled set of manuscripts at this intersection. Overall, the results confirm the observation from the previous section that the majority of research at the intersection of biodiversity, instrumentation, and machine learning has used aerial imagery to study vegetation (107 manuscripts). The vast majority of the manuscripts focused on terrestrial communities, with only 22 of 231 sampled manuscripts studying aquatic or marine biodiversity. The most common ground-based sensors were acoustic recorders and field cameras (38 and 19 manuscripts, respectively). Of the 209 abstracts that stated a particular machine learning method, 63 described using neural networks, either alone or in combination with other classification methods, while the remainder used other longer-established methods such as decision or regression trees, random forests, or support vector machines.
We then combined these abstracts with a set of manuscripts identified by the authors that meet the criteria described above. We summarize our knowledge of the breadth of biodiversity, instrumentation, and machine learning research in Table 1. This table provides a short description and example manuscripts across 10 unique categories of sensors and 38 unique combinations of sensors and biodiversity variables. We expand on specific examples from this table in the section below.

WHAT MIGHT BE DONE AT NEON IN THE FUTURE?
From the set of research ideas identified in Table 1, we selected five examples that we believe represent particularly fruitful areas for future research at NEON sites. Below, we present five case studies that describe the potential for automated measurement of (1) bird, bat, anuran, and insect diversity using acoustic recorders; (2) medium to large mammal diversity using wildlife cameras; (3) aquatic vegetation, invertebrate, and fish diversity with aquatic sensors; (4) plant biodiversity with expanded remote and ground-based measurements; and (5) characteristics of physical specimens in the NEON biorepository with laboratory-based imaging.
Each case study presents a brief review of work in a particular area along with recommendations for the placement and use of instruments and machine learning methods across the NEON sites. In all cases, the goal of the recommended work is to increase the quantity and diversity of biodiversity measurements available to the community of scientists using NEON data. The availability of these data will support a variety of specific ecological research questions that require such measurements, particularly those involving biodiversity distributions at large scales and changes to these distributions in space and time. Possible pathways for financial support of this new work are discussed below in the Conclusion section.

Surveying birds, bats, anurans, and insects with acoustic recorders
Acoustic recorders are autonomous sensors that record the soundscape, defined as the collection of sounds that emanate from landscapes, including those from biological (biophony), geophysical (geophony), and human (anthrophony) sources (Pijanowski et al. 2011a, b). For the purposes of biodiversity surveys, the most important components of these soundscapes are the signals made by vocal wildlife, particularly birds and bats but also anurans, insects, and other mammals (Blumstein et al. 2011, Stowell and Sueur 2020). Automated recording devices may be used to sample continuously in time and over greater spatial and temporal scales than conventional surveys (Darras et al. 2019). These devices can be efficient tools for baseline biodiversity surveys, long-term ecological monitoring, exploring phenological and distributional shifts in response to global change, uncovering cryptic biodiversity, and rapid biodiversity assessments (e.g., Sueur et al. 2008, Obrist et al. 2010, Frommolt and Tauchert 2014, Sugai et al. 2019). More detailed surveys can be conducted using a time-synchronized array of microphones to localize individual vocalizing animals, enabling assessment of animals' abundance, habitat use, and behavior (Rhinehart et al. 2020). Autonomous recording can generate enormous amounts of data, which many researchers outside of NEON analyze using convolutional neural networks and other machine learning models (e.g., Bermant et al. 2019, Lostanlen et al. 2019, LeBien et al. 2020).
Both commercial and open-source acoustic recorders and analytical tools are now widely available for wildlife surveys, with costs ranging from several thousand dollars for high-end commercial bat detectors to less than $50 for the increasingly popular AudioMoth open-source recorder (Hill et al. 2017). Given that no acoustic data are currently being collected at NEON sites, in the short term we recommend 200-h, single-recorder pilot deployments that store audible-frequency data in a compressed format to generate initial pilot data on NEON soundscapes. In the medium term, we suggest that inexpensive AudioMoth (or similar) recorders be spread across the breeding bird points or other permanent survey points on a 3- to 5-yr sampling horizon to explore hardware and software needs for longer-term sampling. The data generated from even a large network of such recorders would be within the capacity of NEON to store and serve to users. Standard acoustic products (e.g., long-term spectral averages and soundscape/acoustic indices that do not involve machine learning biodiversity classification) could be produced and released immediately from such data, while active research into acoustic species classification continues. Longer term, higher-quality omnidirectional microphone hardware could be deployed across sites, covering both aquatic and terrestrial habitats, to record audible and ultrasonic signals. Machine learning models could then be developed to classify all species of interest as well as to characterize anthropogenic sounds in these recordings.
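Standard acoustic products of the kind suggested above can be computed without any species classifier. As one hedged example, the sketch below implements a simplified version of the Acoustic Complexity Index over a precomputed magnitude spectrogram; the array shape and normalization are our own assumptions, and production soundscape packages differ in details such as temporal blocking.

```python
import numpy as np

def acoustic_complexity_index(spectrogram):
    """Simplified Acoustic Complexity Index: for each frequency bin,
    sum the absolute changes in intensity between adjacent time frames,
    normalize by the bin's total intensity, then sum across bins.

    spectrogram : (n_freq_bins, n_time_frames) magnitude array.
    """
    change = np.abs(np.diff(spectrogram, axis=1)).sum(axis=1)  # per-bin flux
    total = spectrogram.sum(axis=1)                            # per-bin energy
    return float(np.sum(change / np.maximum(total, 1e-12)))
```

A steady tone or constant hum contributes nothing (zero index), while modulated biophony such as birdsong raises the index, which is why such indices serve as classifier-free soundscape summaries.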

Surveying mammals with wildlife cameras
Wildlife cameras, or camera traps, are field cameras that use a sensor, typically a passive infrared motion detector, to trigger image capture when an animal passes. Recently, deep learning algorithms (mostly convolutional neural networks, recurrent neural networks, and restricted Boltzmann machines), along with other traditional machine learning models such as support vector machines, boosting algorithms, and multiple kernel learning, have been employed for species classification purposes (Chen et al. 2014, Duhart et al. 2019). Identification accuracy of these deep learning systems varied, but in many cases was as high as 98% (Tabak et al. 2019, Willi et al. 2019). The Snapshot Serengeti project has developed models that not only identify species but also report the number of individuals, the presence of young, and behavior (Norouzzadeh et al. 2018).
To deploy such camera-trap networks at NEON sites, the first steps will be to design standard protocols for camera deployment and develop methodologies for data curation (Forrester et al. 2016, Young et al. 2018). Once a camera-trap network is deployed and images are collected, the next necessary step of refining, retooling, and validating existing image classification methods will likely require a significant investment. In particular, species identification accuracy of existing camera-trap classifiers is lower for both small- and medium-sized mammals (e.g., Burton et al. 2015, Tabak et al. 2019), and thus in order to apply these methods across all taxa of interest, NEON will need to improve automated classification for these species. Camera image data sets can easily become unbalanced, leading outputs of the machine learning models to become biased toward high-frequency classes (Norouzzadeh et al. 2018, Shahinfar et al. 2020) and requiring more refined approaches for the identification of rare species. Additionally, models trained at certain NEON sites may be less applicable to sites in different ecoregions, as identification accuracy can decline due to the presence of previously unseen species. Therefore, we recommend developing extensive training data sets using images from existing wildlife-camera networks and locale-specific mechanisms (transfer learning) to update the classifiers. Recorded media should ideally be delivered to a NEON online repository via a Wi-Fi/GSM network or optical cables, backed up by periodic manual retrieval of SD cards from individual cameras. The possibilities for real-time image classification should also be explored, as these approaches are currently in development (Duhart et al. 2019).
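One common mitigation for the class-imbalance problem noted above is to weight the training loss by inverse class frequency, so that rare species contribute as much learning signal as common ones. The sketch below is our own minimal illustration, not a NEON protocol or any specific published pipeline:

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Per-class loss weights inversely proportional to class frequency,
    normalized to mean 1, to counteract bias toward common species.

    labels : integer class labels for the training images.
    """
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    counts = np.maximum(counts, 1.0)      # guard classes with no images
    weights = 1.0 / counts
    return weights * n_classes / weights.sum()
```

These weights would then be passed to a weighted cross-entropy loss during classifier training or fine-tuning; oversampling rare classes is a common alternative.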

Surveying aquatic species with hydroacoustic sensors and remote sensing
Two broad categories of sensors have been used to study biodiversity in aquatic, coastal, and marine ecosystems. The first category of sensors includes sonar technology and underwater acoustic recording devices that collect hydroacoustic data. Sonar technology has been used to create waterborne fish observatories that generate video-like imagery or screenshots, even in high-turbidity, low-light environments. Coupled with semi-automated image processing algorithms, these sonar systems have been employed for stock assessment of commercially important fish, documenting fish aggregates and individual fish sizes, and monitoring the biomass, abundance, and distribution of invasive aquatic fauna (Wolff and Badri-Hoeher 2014, Uranga et al. 2017, McCann et al. 2018, Vatnehol et al. 2018). Machine learning has shown promising success in feature extraction and background elimination, resulting in high detection accuracies (78-88%) for marine and coastal fish (Moniruzzaman et al. 2017, Sung et al. 2017, Salman et al. 2019). Underwater autonomous, passive recording devices (e.g., the Digital SpectroGram Recorder system and Brüel and Kjær hydrophones) have been developed as soundscape-based rapid biodiversity assessment tools to identify freshwater and marine fish, aquatic mammals, waterfowl, and amphibians (Tonolla et al. 2010, Ruppé et al. 2015).
The second category of sensors includes optical, infrared, and LiDAR sensors on airborne or spaceborne platforms (e.g., Sentinel, RapidEye, WorldView) that collect remote imagery. Such remote sensing data have been used to identify high-productivity fisheries grounds (Klemas 2012, Jalali et al. 2015, Sebastiá-Frasquet et al. 2019), monospecific stands of aquatic or wetland vegetation (Santos et al. 2009), the presence of common reed (Phragmites australis; Husson et al. 2014, Vihervaara et al. 2017), the abundance and distribution of macroalgae, aquatic vascular plants, algal stands, and sessile aquatic invertebrates (Bortone et al. 2000, Vihervaara et al. 2017), and fish activities (Geronimo et al. 2018). In floodplain wetlands, dual-wavelength co- and cross-polarized Synthetic Aperture Radar imagery has been effective in mapping both woody and herbaceous vegetation (Hess and Melack 2003).
Currently, all aquatic biodiversity surveys at NEON sites are conducted with standard, labor-intensive methods such as electrofishing and netting. Future automated monitoring of aquatic biodiversity at NEON sites might target (1) fish counts, especially for migratory species; (2) functional and trait diversity of algae and aquatic/semi-aquatic plants; (3) invasive species in aquatic and wetland environments; and (4) the overall aquatic soundscape. This could be achieved by combining standard hydroacoustic sensing methods with remote sensing data generated by the Airborne Observation Platform. Available remotely collected data could be augmented with full-waveform bathymetric LiDAR, optical, radar, and thermal IR sensors, as well as drones with automated flight paths to follow the course of rivers and survey the entire acreage of wetlands and lentic systems.

Surveying plant biodiversity with expanded remote and ground-based measurements
LiDAR, multispectral imaging, and imaging spectroscopy data from airborne and unmanned aerial vehicle (UAV) sensors can be used to quantify different components of plant biodiversity, including species diversity, foliar traits, and functional diversity (Asner et al. 2017, Schneider et al. 2017, Verrelst et al. 2019, Laliberté et al. 2020). Functional diversity is an important component of biodiversity that characterizes the variability of functional traits within a community (Díaz and Cabido 2001). Functional diversity is of particular interest because it can integrate the variation in plant traits that affect ecosystem properties and stability (Díaz and Cabido 2001, Hooper et al. 2005). Remotely sensed measures of functional diversity have been associated with variation in rates of carbon gain in natural systems (Schweiger et al. 2018, Durán et al. 2019), but these linkages are yet to be explored across NEON sites. NEON infrastructure can be used to monitor spatial and temporal changes in functional diversity, but current methods (e.g., field-based and AOP approaches) are hampered by the spatial mismatch between pixel size and the size of individual plants. To bridge this scale mismatch, it is necessary to deploy sensors onboard towers and drones to sample at multiple scales and to develop approaches for scaling up from field plots to airborne pixels (Marvin et al. 2016).
Machine learning algorithms have been shown to provide accurate estimates (60-90% accuracy) of plant properties, including species identity and structural and leaf chemical traits. A wide range of approaches have been used, including random forests, support vector machine classifiers, and, more recently, deep networks and multi-sensor fusion (Fassnacht et al. 2016, Du and Zare 2019, Ma et al. 2019, Nezami et al. 2020, Pleșoianu et al. 2020, Aguirre-Gutiérrez et al. 2021). Despite the potential to produce large-scale biodiversity surveys, these approaches are often trained on few sites and show limited transferability across regions and time (Meerdink et al. 2019), undermining their use for large-scale, generalizable, and scalable applications. The resulting combination of unbalanced, biased, and limited training data, along with limited co-registered field data, restricts these applications to a reduced number of biodiversity facets (e.g., taxa or trait quantities) and uses.
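To make the trait-estimation workflow above concrete, the sketch below uses a deliberately simple k-nearest-neighbor regressor in place of the random forests and support vector machines cited in the literature; all function names, array shapes, and the choice of estimator are illustrative assumptions rather than any published method.

```python
import numpy as np

def knn_trait_estimate(train_spectra, train_traits, query, k=3):
    """Estimate a foliar trait for a query pixel spectrum as the mean
    trait value of its k nearest training spectra (Euclidean distance).

    train_spectra : (n_samples, n_bands) array of labeled spectra.
    train_traits  : (n_samples,) array of measured trait values.
    query         : (n_bands,) spectrum to predict.
    """
    distances = np.linalg.norm(train_spectra - query, axis=1)
    nearest = np.argsort(distances)[:k]         # indices of k closest spectra
    return float(train_traits[nearest].mean())
```

The transferability problem described above appears here directly: if the training spectra come from one site and the query spectrum from another, the nearest neighbors may be spectrally close but ecologically unrepresentative.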
NEON provides a potential opportunity to overcome these challenges and build algorithms applicable at a continental scale. NEON collects leaf traits, species identities, and structural attributes for hundreds of plant species within the footprint of LiDAR, spectroscopic images, and orthophotos. Integrating these data has already allowed yearly surveys of hundreds of millions of individual trees and their traits (Weinstein et al. 2020). Annual campaigns at NEON sites also provide a unique opportunity to evaluate temporal changes in these different aspects of biodiversity, improve predictions over time, and forecast how plant communities will respond to global changes.
To build on the work that has already occurred at NEON sites, we suggest that it will be important to build "human-in-the-loop" frameworks that support ongoing interactions between NEON field crews and the community of data scientists developing new techniques to better survey vegetation properties from space. These interactions could support flexible and targeted field sampling, directing effort to areas where trait and species predictions show higher uncertainty and to data types that can most reduce errors in aligning remote sensing to field data. Specifically, we suggest recording crown diameter and canopy position for a larger number of stems, and recording crown boundaries from the field directly on the remote sensing images (Graves et al. 2018). Similarly, we suggest collecting "one-time" samples of field data in locations of interest within the NEON Airborne Observatory Platform footprint to validate remote sensing surveys from previous years. This information could help improve confidence in matching pixel-to-crown labels, support the use of remote sensing surveys for ecological analyses, and guide future efforts to develop better methods.
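The uncertainty-targeted sampling step of such a human-in-the-loop framework can be sketched as a simple active learning selection: rank candidate plots by the predictive entropy of a model's species probabilities and send field crews to the most uncertain ones. The probabilities below are hypothetical model outputs, not NEON data.

```python
import numpy as np

def most_uncertain_plots(class_probs, k):
    """Rank candidate plots by the predictive entropy of the model's
    per-plot species probabilities and return the k most uncertain."""
    p = np.clip(np.asarray(class_probs, dtype=float), 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return np.argsort(entropy)[::-1][:k]  # highest entropy first

# Hypothetical softmax outputs for five plots over three species.
probs = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.34, 0.33, 0.33],   # highly uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
    [0.90, 0.05, 0.05],
])
print(most_uncertain_plots(probs, k=2))  # → [1 2]
```

Field identifications from the selected plots would then be fed back as labeled examples, closing the loop between crews and model developers.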

Characterizing physical specimens with laboratory-based imaging
Each year, the NEON Biorepository at Arizona State University receives, archives, and facilitates the use of over 110,000 samples and specimens collected from NEON sites. Among these are whole-organism vouchers, bulk organismal samples, and tissue samples suitable for image analysis using machine learning techniques. Machine learning algorithms for studying specimen images are designed to detect the presence and location of a relevant object in an image and then classify the object into a set of predefined categories or measure its traits. Convolutional neural networks, linear discriminant analysis, and support vector machines are most commonly used to accomplish these goals. Outside of NEON, these methods have been applied to identify taxa (Favret and Sieracki 2016, Carranza-Rojas et al. 2017, Schuettpelz et al. 2017, Raitoharju et al. 2018), score phenological characters (Lorieul et al. 2019), and classify species interactions (Meineke et al. 2019).
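The classification step of this pipeline can be sketched with a support vector machine on flattened pixel features. The 16x16 "specimen images" below are synthetic stand-ins whose two "taxa" differ in where a bright marking appears; production pipelines typically classify convolutional neural network features rather than raw pixels, but the train-and-validate pattern is the same.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

def make_images(n, bright_rows):
    """Generate n synthetic grayscale specimen images with a bright
    taxon-specific marking in the given rows, flattened to vectors."""
    imgs = rng.uniform(0, 0.2, size=(n, 16, 16))
    imgs[:, bright_rows, :] += 0.6      # taxon-specific marking
    return imgs.reshape(n, -1)

X = np.vstack([make_images(100, slice(2, 6)),     # "taxon A": upper marking
               make_images(100, slice(10, 14))])  # "taxon B": lower marking
y = np.repeat([0, 1], 100)

# Linear SVM classifier, evaluated with 5-fold cross-validation.
scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```

As the surrounding text notes for bulk invertebrate samples, the labels `y` here are the expensive part in practice: they must come from taxon-specific experts, and any misidentifications propagate directly into the trained model.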
As many natural history collections and research groups have sufficient imaging equipment and computing power to conduct these studies, machine learning can already be used to analyze images of NEON samples. For example, researchers funded by an NSF Macrosystems Biology grant are currently using neural networks, support vector machines, and regression trees to predict the taxonomic identity and body mass of invertebrates collected in NEON pitfall traps (Blair et al. 2020; M. Kaspari and M. Weiser, personal communication). In addition to extending these macro- and community-level ecology applications to other taxonomic groups collected by NEON, researchers could use machine learning techniques developed for human tissue samples to identify blood-borne parasites or cell morphologies in NEON tissue samples (Jones et al. 2009, Narayanan et al. 2019). Once successfully designed by other researchers, machine learning image analyses could potentially allow NEON itself to publish additional data products without further investment in field instrumentation or labor. Regardless of whether these efforts are conducted by external researchers or NEON scientists, all images and methods can and should be made accessible and discoverable in association with the relevant sample occurrence records in the NEON Biorepository data portal.
Perhaps the most fruitful application of machine learning to physical specimens will be in gaining insight into the species composition and intraspecific variation within NEON's extensive collections of bulk, unidentified terrestrial and aquatic invertebrate samples. However, generating training data sufficient for the identification of individuals within these information-rich samples will require substantial and diverse taxonomic expertise. Taxon-specific experts will be needed, not only to provide training identifications, but also to determine the level of taxonomic resolution that is appropriate for each group, given that (1) a large proportion of invertebrate species are undescribed, (2) many taxa are not readily distinguishable based on external characters, (3) any misidentifications in the training set will propagate into the algorithm output, and (4) rare species not included in the training data will be lumped into more common taxa.
Although it is not a focus of this manuscript, we note that physical specimens could also be analyzed using approaches other than imaging, including -omics methods. Similar to image analysis techniques, machine learning applications to the analysis of -omic sequencing data (select methods reviewed in Libbrecht and Noble 2015, Soueidan and Nikolski 2015) could be used to describe organismal and functional components of many NEON samples: bulk animal, plant, plankton, and microbial samples; aquatic and terrestrial eDNA samples; and microbiomes that can be extracted from whole-organism vouchers. NEON already provides shotgun metagenomic and metabarcoding (e.g., CO1 DNA) sequences associated with some of these samples (Brumfield et al. 2020), but many applicable analyses and relevant sample types remain to be explored. It is important to note that when the goal is to identify taxonomic groups within bulk organismal samples, rather than to delimit Operational Taxonomic Units through numerical taxonomy, many of the considerations that apply to image analysis of understudied groups also apply to genomic techniques.

CONCLUSION
The combination of inexpensive hardware and advances in classification methods has launched a new era of remote sensing of biodiversity. Instrument-collected imagery and audio from a variety of sensors are now available at scales that would have seemed impossible just a few years ago. The Snapshot Serengeti project, for example, has now collected more than 7 million camera-trap images of 40 mammalian species (Swanson et al. 2015), while the Australian Acoustic Observatory (A2O) includes approximately 400 continuously operating acoustic recorders that are expected to collect 2 petabytes of audio data over a five-year period (Australian Acoustic Observatory, n.d.). The continued development of accurate and efficient machine learning classification methods will further unlock these large data sets to answer questions about species diversity, abundance, traits, and distributions at large spatial and temporal scales.
Despite its broad focus on biodiversity and instrumentation, NEON has not yet begun producing data products that take advantage of sensor-collected data and machine learning methods to evaluate biodiversity patterns and trends. We have described above several particularly fruitful avenues for future progress toward this goal at NEON sites. Many of these ideas build on methods that are already in widespread use outside of NEON and, from a technical perspective, could be relatively easily adopted at NEON sites given sufficient financial resources. While the core and relocatable sites might present slightly different opportunities and constraints for sensor deployment, we believe that the majority of this work can be completed relatively uniformly across NEON sites. Through its uniquely data science-literate staff and user community, NEON has a particularly important role to play in supporting the growth of such automated biodiversity survey methods, as well as demonstrating their utility for answering basic and applied ecological questions.
Any deployment of new sensors across NEON sites will result in large volumes of additional data that will need to be appropriately managed, archived, and shared. The NEON staff and user community have prior experience managing large data sets, particularly those generated by the Airborne Observatory Platform, which can inform policies and procedures for new data sets of similar size. NEON also has existing standards for product documentation, including metadata, field protocols, and quality reports, that could be extended to cover new data types. Any expansions or modifications to existing infrastructure and standards associated with new biodiversity surveys could be informed by discussions with NEON Technical Working Groups that cover Data Standards, LiDAR, Terrestrial Instrument Data QA/QC, and other related topics.
Perhaps the most significant constraint on the addition of new automated biodiversity surveys at NEON sites will be the availability of financial resources to support both sensor deployment and data management infrastructure. The National Science Foundation remains perhaps the most likely source of funds to support new research projects at the scales envisioned above. Several co-authors have had discussions with staff at NEON and the National Science Foundation about adding new biodiversity sensors and data products through either standard grants to individual PIs or additional funds directed to NEON itself. NEON's Assignable Assets Program, which allows individual investigators to pay for NEON staff time to assist in deploying field equipment, could play a particularly important role in enabling individual investigators to collect new sensor data across multiple NEON sites. We note that such collaborations between NEON and external researchers are highlighted in the current NEON Strategic Engagement Plan, which includes an objective to "Facilitate NEON-related research through scientific engagement" (National Ecological Observatory Network 2019).
We conclude with four broad observations on the role that NEON can play in the future of automated biodiversity surveys. First, NEON has an ability to create "de facto" standards for data collection, metadata, analysis, and sharing by developing protocols for the management of remotely collected data across its sites. Such standards should be developed in close collaboration with the Technical Working Groups and other academic and non-academic scientists working at this intersection. Second, and relatedly, the broad coverage of the NEON network, together with the taxonomic expertise of its staff, provides an opportunity for newly developed hardware and algorithms to be validated with ground-truth data across a variety of communities. The ability to test the performance of automated species classification from camera-trap imagery, for example, would be greatly enhanced by the availability of such imagery from many distributed sites with a variety of habitat conditions and biological communities. Third, the NEON data platform is uniquely poised to both permanently archive and broadly distribute such sensor-collected data and the data products derived from these sensor streams. Such sharing could greatly help to democratize the use of remotely sensed biodiversity data, making it accessible to faculty, students, and other scientists across a variety of institutions and academic backgrounds. Finally, and perhaps most importantly, NEON can continue to serve as a focal point for building social networks of scientists actively engaged in this research, providing a forum and common platform for the sharing of knowledge, tools, and collaborative project ideas.

Fig. 1. Examples of (a) phenology image and (b) land-water interface image from the Smithsonian Conservation Biology Institute and Posey Creek NEON sites, both taken at 12:00 PM EST on 1 November 2019 (National Ecological Observatory Network 2020).

Table 1. Survey of research areas at the intersection of instrumentation and biodiversity.

Table 1. (Continued.)
Notes: Within each of these combinations, many machine learning approaches are possible. † Satellite, drone, airborne observation platform. ‡ Still or video imagery from towed underwater cameras or remotely operated vehicles. § Sonar imagery.