# **Weather stations for biodiversity: a comprehensive approach to an automated and modular monitoring system**

*Edited by:*

J. Wolfgang Wägele and Georg F. Tschan


**Cover design** by Georg F. Tschan: *Hoplia coerulea* (Drury, 1773) (Coleoptera: Scarabaeidae), male individual sitting on a grass leaf. Photograph taken in July 2021 along the River Jonte in the Lozère department, southern France. The beetle can be locally abundant, especially during the mating season. Its dispersal ability, however, is limited, and the species only occurs at a few scattered, isolated sites in Southwest Europe. Due to anthropogenic influence along the watercourses, many populations are disappearing, and in parts of the species' range it is categorised as endangered.

This is an open access book distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

First published: 2024

ISBN (e-book): 978-619-248-123-0

ISBN (paperback): 978-619-248-122-3

DOI: 10.3897/ab.e119534

Pensoft Publishers, 12 Prof. Georgi Zlatarski Str., 1111 Sofia, Bulgaria. www.pensoft.net








# **Acknowledgements**

The editors, in the name of all authors and other participants in the project, would like to express their thanks to the German Federal Ministry of Education and Research (BMBF) for funding and to both the DLR and the VDI/VDE-IT for administering the AMMOD project.

Furthermore, we are grateful to the personnel at the three test sites for allowing access, helping with the maintenance, and facilitating our visits: the Melbgarten of the University Botanic Gardens in Bonn, the Ecological research station in Britz, and the Energieberg Georgswerder site in Hamburg.

Last but not least, special thanks go to all the colleagues who helped us with the administration of the project, as well as to the various students, all of whom are rarely seen or mentioned but have been essential to its success.

J. Wolfgang Wägele and Georg F. Tschan

# **Preface**

Nobody would doubt that it is good to have weather stations. They provide measurements of precipitation and temperature, and often also of humidity and wind speed. Tens of thousands of weather stations worldwide form a Global Observing System that has proven indispensable for weather forecasting, prediction of extreme events such as floods or droughts, and monitoring of climatic changes.

Weather and climate are things that everyone experiences in their everyday lives: Will it rain tomorrow? I had better bring a raincoat. Will it be hot in two days? I had better bring a pair of sandals.

The situation is less clear when it comes to biodiversity. Coined in 1988 by world-famous biologist Edward O. Wilson, the term "biodiversity" is used to describe "the greatest wonder on the planet" – the diversity of life on Earth in all its facets. While we are still struggling to discover how many species there are on Earth, we are already losing thousands of them forever at unprecedented rates, much higher than the background extinction rate. Yet, we are lacking "weather stations" to continuously assess the status of biodiversity on the planet. How can we take conservation decisions, how can we manage landscapes more sustainably, if we don't know anything about the ups and downs of biodiversity? Nobody remembers how many birds were singing in the 18th century, or how many bumblebees were buzzing, how many butterflies were in the air.

Advances in technology now make it possible to change all this. Even if the past is hidden in the dark or in natural history collections, we can still make a difference in the future – showing, hopefully, how biodiversity recovers from anthropogenic pressure and how we can shape a biodiversity-friendly future.

The present book, edited by Wolfgang Wägele and Georg Tschan, provides an as yet unprecedented collection of technological advances that could form the basis for future biodiversity weather stations. The book describes a range of approaches that could make biodiversity assessment as easy as measuring temperature or rainfall in a reliable, reproducible manner. Of course, quite a few steps remain, but the combination of approaches shown in the book is worth careful consideration when designing the next generation of Global Observing Systems.

The spectrum of approaches covered is simply amazing: from assessing plant diversity using volatile organic compounds and pollen traces, to automated insect trapping systems, multi-channel bioacoustics, depth-aware visual monitoring and non-destructive DNA metabarcoding, the authors provide insights into the most advanced approaches to assess biodiversity using as much automation as possible. The book concludes with a description of a base station integrating the inputs from multiple sensors, including a web-based data portal for long-term data management.

Clearly, we still don't have weather stations for biodiversity. But it can be hoped that the contents of this book will serve as a basis for eventually arriving there, so that measuring biodiversity will one day be as straightforward as measuring temperature. The greatest wonder on the planet deserves it.

Sincerely,

Christoph Scherber, Head of the Centre for Biodiversity Monitoring and Conservation Science, Vice Director of the Leibniz Institute for the Analysis of Biodiversity Change

# **Contributors**

Paul Baggenstoss • Fraunhofer FKIE, Fraunhoferstr. 20, D-53343 Wachtberg, Germany

Sarah J. Bourlat • Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Museum Koenig, Adenauerallee 127, D-53113 Bonn, Germany https://orcid.org/0000-0003-0218-0298

Karl-Heinz Frommolt • Museum für Naturkunde – Leibniz Institute for Evolution and Biodiversity Science, Invalidenstr. 43, D-10115 Berlin https://orcid.org/0000-0002-5157-7358

Birgit Gemeinholzer • Institute of Biology, Botany, University of Kassel, Heinrich-Plett-Str. 40, D-34132 Kassel, Germany https://orcid.org/0000-0002-9145-9284

Frank Oliver Glöckner • Alfred Wegener Institute, Am Handelshafen 12, D-27570 Bremerhaven, Germany https://orcid.org/0000-0001-8528-9023

Timm Haucke • Institute of Computer Science 4, University of Bonn, D-53115 Bonn, Germany https://orcid.org/0000-0003-1696-6937

Olaf Jahn • Museum für Naturkunde – Leibniz Institute for Evolution and Biodiversity Science, Invalidenstr. 43, D-10115 Berlin https://orcid.org/0000-0001-7936-033X

Ameli Kirse • Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Museum Koenig, Adenauerallee 127, D-53113 Bonn, Germany

Ivaylo Kostadinov • GFBio – German Federation for Biological Data, Unicom 2 Haus 2-4, Mary-Somerville-Str. 2–4, D-28359 Bremen, Germany https://orcid.org/0000-0003-4476-6764

Frank Kurth • Fraunhofer FKIE, Fraunhoferstr. 20, D-53343 Wachtberg, Germany

Kathrin Langen • Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Museum Koenig, Adenauerallee 127, D-53113 Bonn, Germany https://orcid.org/0000-0002-4519-2689

Mario Lasseck • Museum für Naturkunde – Leibniz Institute for Evolution and Biodiversity Science, Invalidenstr. 43, D-10115 Berlin

Sascha Liedtke • ION-GAS GmbH, Konrad-Adenauer-Allee 11, D-44263 Dortmund, Germany

Florian Losch • Nees Institute for Biodiversity of Plants, University of Bonn, D-53115 Bonn, Germany https://orcid.org/0000-0001-7519-440X

Deniss Marinuks • Jacobs University Bremen gGmbH, Campus Ring 1, D-28759 Bremen, Germany https://orcid.org/0000-0003-2350-0449

Mario Paja • Hamburg University of Technology, Institute of High-Frequency Technology, Denickestr. 22, D-21073 Hamburg, Germany

Krzysztof Piotrowski • IHP GmbH – Innovations for High Performance Microelectronics / Leibniz-Institut für innovative Mikroelektronik, Im Technologiepark 25, D-15236 Frankfurt (Oder) https://orcid.org/0000-0002-7231-6704

Hanna Raus • Institute of Biology, Botany, University of Kassel, Heinrich-Plett-Str. 40, D-34132 Kassel, Germany https://orcid.org/0009-0009-0493-6600

Lukas Reinhold • Hamburg University of Technology, Institute of High-Frequency Technology, Denickestr. 22, D-21073 Hamburg, Germany

Alice M. Scherges • Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Museum Koenig, Adenauerallee 127, D-53113 Bonn, Germany https://orcid.org/0009-0002-0824-7991

Pierre-Louis Sixdenier • University of Erlangen–Nuremberg, Department of Computer Science, Chair of Computer Science 12 (Hardware-Software-Co-Design), Cauerstr. 11, D-91058 Erlangen

Volker Steinhage • Institute of Computer Science 4, University of Bonn, D-53115 Bonn, Germany https://orcid.org/0000-0002-3172-3645

Stephanie J. Swenson • Institute of Biology, Botany, University of Kassel, Heinrich-Plett-Str. 40, D-34132 Kassel, Germany https://orcid.org/0000-0002-7550-6693

Georg F. Tschan • Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Museum Koenig, Adenauerallee 127, D-53113 Bonn, Germany https://orcid.org/0000-0002-5108-9602

Wolfgang Vautz • ION-GAS GmbH, Konrad-Adenauer-Allee 11, D-44263 Dortmund, Germany https://orcid.org/0000-0002-6193-3766

Domenico Velotto • MARUM – Center for Marine Environmental Sciences, University of Bremen, D-28359 Bremen, Germany https://orcid.org/0000-0002-8592-0652

Maximilian Weigend • Nees Institute for Biodiversity of Plants, University of Bonn, D-53115 Bonn, Germany https://orcid.org/0000-0003-0813-6650

Benjamin Werner • Museum für Naturkunde – Leibniz Institute for Evolution and Biodiversity Science, Invalidenstr. 43, D-10115 Berlin https://orcid.org/0009-0001-2796-012X

J. Wolfgang Wägele • Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Museum Koenig, Adenauerallee 127, D-53113 Bonn, Germany

Vera M. A. Zizka • Centre for Biodiversity Monitoring and Conservation Science, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Museum Koenig, Adenauerallee 127, D-53113 Bonn, Germany

# **1 Introduction**

J. Wolfgang Wägele, Georg F. Tschan

# 1.1 Towards a multisensor station for automated biodiversity monitoring

Biodiversity is one of the most valuable resources of our planet. There are more than 10 million extant species, and most of these are still unknown to science (Mora et al. 2011; Locey and Lennon 2016). The biosphere retains a wealth of hitherto untapped genetic resources, which are relevant to food production, medicine, bioenergy production, and life-supporting ecosystem functions. These resources are currently deteriorating, with consequences for the quality of life of future generations. The ongoing steady loss of species is irreversible and leads to an impoverished world (Ehrlich and Pringle 2008; Dirzo et al. 2014) that will not recover its species richness within the next 5 million years (e. g. Benton and Harper 2009). Lost species will never evolve again; after regeneration, the biological world will be different.

More than 30 years ago, large-scale destruction of habitats and loss of biodiversity alarmed researchers and policy makers (Ripple et al. 2017). Goals to protect biodiversity were defined by the *Convention on Biological Diversity* (1992). The 2010 goals of the European Parliament, later renewed for 2020, had little effect (e. g. van Swaay et al. 2013). One important reason for the implementation failure is the lack of comprehensive and reliable biodiversity data (Mihoub et al. 2017). In analogy to climate research, such data are needed as a basis for making informed decisions, to analyze causes of local extinctions, to provide evidence for trends, to model scenarios that explain changes, to predict future processes, and to assess the effectiveness and usefulness of conservation interventions on a scientific basis.

The disappearance of a large part of the insect fauna in Central Europe (Hallmann et al. 2017) went undetected for many years, despite the efforts of public nature conservation institutions to document trends of biodiversity. Taxon specialists have been observing the dwindling numbers of single taxa (e. g. Laussmann et al. 2009; Van Dyck et al. 2009; Schuch et al. 2012; Brooks et al. 2012; Goulson et al. 2015); however, until 2017 it was not obvious that all flying insects are affected. Comprehensive monitoring of insects did not exist and has still not been established, even though scientists have frequently called for a broad, institutionalized, systematic monitoring of habitats and populations (Balmford et al. 2003; Field et al. 2007; Hausmann et al. 2020).

Unfortunately, there are six major obstacles impeding biodiversity monitoring at the species level:


The workload implies high costs (e. g. Gardner et al. 2008) and forces ecologists to select indicator taxa and representative sampling sites (e. g. Herzog et al. 2016). However, the most significant obstacle is the taxonomic impediment. Even when sampling campaigns are well planned and executed, the samples have limited value if the majority of species cannot be identified. This difficulty arises mainly because taxonomists are scarce and specialized in selected taxa. Furthermore, the sorting and curation of samples requires time and resources. Describing the regional diversity for only a few taxa (e. g. bees) or limiting identifications to higher taxa (e. g. 'Chironomidae') is insufficient: because even closely related species can react differently to environmental changes, negative trends of single species and climate-related changes in their geographical distribution will be overlooked (e. g. Bommarco et al. 2011; Ekrem et al. 2007; Elbrecht et al. 2016; Homburg et al. 2019; Janzen et al. 2009; Köhler et al. 2005; Schuch et al. 2012; Smith et al. 2006).

Large-scale and long-term automated monitoring of biodiversity (as established for climate research) does not exist, among other reasons because the required technology is not yet available. The technical prerequisites, however, are in place. It is therefore crucial to adapt existing technologies for the development of automated, reliable, and verifiable biodiversity monitoring. Similar to climate research, we need 'weather stations for species monitoring' in addition to remote sensing.

It is possible to construct automated multisensor stations for monitoring species diversity (**AMMOD**s) using available technologies (see Wägele et al. 2022). These technologies include bioacoustic sensors, tailored imaging systems, automated image analysis, DNA barcoding, analyses of volatile organic compounds (VOCs), and discriminators that distinguish specimens by their inherent characteristics (such as movement and velocity) using artificial intelligence approaches or multivariate discriminators (Figure 1). Thus, various classes of organisms present in a locality can be detected simultaneously and with high temporal resolution.

The goal of the first phase of the AMMOD project, funded by the German Federal Ministry of Education and Research (BMBF), was to build single prototypes of automated sensors and base stations, and to develop data workflows. After this proof of concept, we intended to build a small network of stations with more cost-efficient hardware. Here we summarize results from the first three years of the project. Unfortunately, the project came to a halt because of changed priorities for the use of public funds, a consequence of the Russian invasion of Ukraine and the subsequent support of the Ukrainian military by other countries, including Germany.

We hope that the detailed descriptions of technologies, laboratory procedures and data workflows included herein will help those who have the chance to develop networks of biodiversity monitoring stations.

# 1.2 The test sites

Since the technologies had to be tested in the field, three test sites were selected together with all project participants. These sites had to meet the following requirements:


It should be noted that the last point, 'Biodiversity', although central to the project's objective, was of less importance during the pilot phase, because the technological, logistical and informatics basis had to be developed first. For biodiversity research, significantly more stations need to be set up, using the AMMOD technique and guided by clearly defined *ecological* questions. The above requirements also clearly limited the potential sites where the stations could be assembled. However, in collaboration with local research institutes, we were able to select three test sites that could be reached relatively easily (by bicycle and public transport) during the project. That the locations were well chosen became particularly evident during the COVID-19 pandemic, when access was severely limited. The three test sites were:


The three test sites are geographically well distributed across Germany (Figure 2A). Both urban (sites **①** and **③**) and rural areas (site **②**) are covered. On an overall map of Germany, the sites are located in the Atlantic and Continental zones of the biogeographical regions as defined and used by the European Union to classify Natura 2000 sites (European Commission 2021; Figure 2C). The degree of fragmentation of the German Natura 2000 sites can be seen in Figure 2B.

# 1.2.1 Melbgarten, Bonn

The *Melbgarten* is a branch of the University Botanical Gardens and is located on the Venusberg in Bonn (**①** in Figure 2). The site partially borders a nature reserve and has a slight slope towards the west (LANUV NRW 2013). At the end of the Second World War, the area of the Melbgarten was donated by the city of Bonn to the university to establish experimental facilities (Botanic Gardens Bonn 2023). Completely enclosed by a fence, with power supply and internet access, the garden area is well suited for experiments with sensitive electronic equipment. The site was used to set up a fully equipped prototype of AMMOD (Figure 3). Due to budgetary constraints, the Melbgarten site remained the only site where all modules were operated simultaneously. Thus, the stationary GC-IMS device (Smellscapes module) was not set up at the other sites, but visits with the mobile version of the device were made to the other locations.

**Figure 1.** Schematic view of the AMMOD experimental setup. **①** Bioacoustics, microphones of the four-channel recording device (Chapters 6 and 7). **②** 'Smellscapes', stationary module for the detection of volatile organic compounds (VOCs) emitted by plants and fungi (Chapter 2). **③** 'Moth scanner', photographic imaging and identification of mostly nocturnal flying insects (not presented here). **④** Camera trap, representing the automated visual monitoring of vertebrate animals, especially mammals (Chapter 8). **⑤** 'Base station', the communication and processing block for the sensor data and (partly) the electricity supply (Chapter 9). **⑥** Data management, the connection of the base station to the data portal (Chapter 10). **⑦** Automated 'Malaise' trap for catching flying insects (Chapters 4 and 5). **⑧** Automated pollen trap (Chapter 3). The material from both the insect and the pollen trap is subsequently analysed in the laboratory using metabarcoding techniques. – Graphic design and creation: J. W. Wägele.

For this test phase of the AMMOD design, an external power supply was planned, as the actual consumption of electricity first had to be assessed before a completely autonomous supply would be possible. However, after an initial site visit in December 2019, the first component to be permanently installed in July 2020 was an automated Malaise trap (Barcoding module) that fully autonomously collects insect samples with the help of two solar panels (Figure 4). Power and internet cables specifically for the project were laid later that year, and could be used from within the control cabinet from November 2020.

**Figure 2.** (**A**) The AMMOD test sites in **①** Bonn, **②** Britz and **③** Hamburg. (**B**) Natura 2000 sites in Germany and neighbouring countries. The high degree of fragmentation is evident. (**C**) Biogeographical regions and urban areas in relation to the sites. – Maps: G. F. Tschan, created with QGIS (QGIS Development Team 2020); projected coordinate system, ETRS89 (EPSG: 25832). Data sources: EEA (2021), Mundialis (2021) and Natural Earth (2021).

**Figure 3.** The Melbgarten site in November 2020, looking west. The control cabinet of the base station is approximately in the centre of the picture, to the left of it the automated insect trap (but without the Malaise net) and its solar panels can be seen. – Photograph: G. F. Tschan.

Between 2020 and 2022, all sensors were successively installed at the Melbgarten site (Malaise and pollen trap, Smellscapes module, wildlife cameras and animal sound recorders). Many of the components received updates during the course of the project, especially the acoustics and visual monitoring devices, as well as the base station. Some devices such as the pollen trap were still in operation in 2023 (Figure 5).

An important contribution to the understanding and subsequent acceptance of the AMMOD technologies was outreach, more specifically the explanation of the devices used on-site to interested visitors. Since the Melbgarten site is not open to the public, specific events were used to promote the project locally in 2022 and 2023, when the collections could be visited during the 'Spring Festival' (cf. Botanic Gardens Bonn 2023). In addition, some of the AMMOD equipment was presented to the public during the University of Bonn's 'Biodiversity Day' in the same years. The AMMOD project and the Melbgarten site were also featured on German public-service television in February 2023 (ZDF 2023).

**Figure 4.** The automated Malaise trap of the Metabarcoding module, July 2020. Note the two solar panels, each facing in a different direction. The garden's greenhouse can be seen in the background on the right. – Photograph: G. F. Tschan.

**Figure 5.** The automated pollen trap of the Metabarcoding module, February 2022. At the beginning and end of the sampling season, when sunlight was too weak for an adequate supply via the solar panels, the external power supply was used. – Photograph: G. F. Tschan.

# 1.2.2 Britz, Ecological research station of the Thünen Institute near Eberswalde

The ecological research station Britz of the Thünen Institute for Forest Ecosystems (Thünen 2023) was used as the second AMMOD experimental site (**②** in Figure 2). The site near Eberswalde covers an area of 4 hectares, is surrounded by pine forest and comprises different forest structures and open areas (Figure 6). It is a Level II plot of the nationwide forest monitoring program. Through a cooperation agreement between the Museum für Naturkunde Berlin and the Thünen Institute, use of the site for the AMMOD project was made possible. The Thünen Institute also contributed relevant data from the forest monitoring to the AMMOD project. The research station has an internet connection on site (both LAN and WLAN), which could be used for the project. In addition, the site is completely enclosed by a fence and Thünen Institute staff are on site during the week.

Due to the relative proximity of the Museum für Naturkunde Berlin, project personnel could visit the site regularly. In contrast to the Melbgarten site, however, not all devices were permanently deployed here. Instead, the bioacoustic sensors received special attention at Britz (Figure 7).

# 1.2.3 Energieberg Georgswerder, Hamburg

Through the Museum of Nature Hamburg (formerly CeNak), contact was established with the operators of the so-called '*Energieberg*' (literally meaning 'energy mountain') in Georgswerder, which is located northeast of Hamburg's Wilhelmsburg quarter (Hamburg 2023). The Energieberg is a former landfill site that was covered with trees and undergrowth and had already been used by the Museum of Nature for setting up Malaise traps (**③** in Figure 2). The site provides an area enclosed by a fence where a base station could be placed and connected to the electricity grid. Its central location offered a convenient infrastructure to test the first prototypes. Accessibility to the Energieberg, mobile phone reception and power supply could be ensured.

**Figure 6.** Habitat structures at the ecological research station Britz. (**A**) Coniferous forest. (**B**) Open space. – Photographs: K.-H. Frommolt.

**Figure 7.** Autonomous acoustic recording unit at Britz. Sounds in the audible range are recorded in repeating cycles of 5 minutes of recording followed by a 10-minute pause. – Photograph: K.-H. Frommolt.

The industrial environment of the Energieberg, on the other hand, posed a challenge for the sensors. In particular, the acoustic sensors could only be used to a limited extent due to the noise pollution and were also only used for short time periods. The site was not commissioned for the project until the first experiences and results from the Melbgarten site were available, which was at the end of 2021. A base station and some of the sensors were then set up. The site was used for the project until May 2023.

# **References**



D, Hausmann A, Kitching I, Lafontaine D, Landry J, Lemaire C, Miller J, Miller J, Miller L, Miller SE, Montero J, Munroe E, Green SR, Ratnasingham S, Rawlins J, Robbins R, Rodriguez J, Rougerie R, Sharkey M, Smith MA, Solis MA, Sullivan JB, Thiaucourt P, Wahl D, Weller S, Whitfield J, Willmott K, Wood DM, Woodley N, Wilson J (2009) Integration of DNA barcoding into an ongoing inventory of complex tropical biodiversity. Molecular Ecology Resources 9: 1–26. https://doi.org/10.1111/j.1755-0998.2009.02628.x


# **2 Smellscapes: automated monitoring of volatile organic compounds in ambient air**

Florian Losch, Sascha Liedtke, Wolfgang Vautz, Maximilian Weigend

#### **Abstract**

Plant volatile organic compounds (pVOCs) are emitted by plants into the atmosphere. The emissions are influenced by a variety of abiotic and biotic factors (e. g. herbivory, drought and heat) and can therefore provide information about the physiological status of plants within an ecosystem. However, ambient air is a complex and humid mixture, and the concentrations of pVOCs are very low. Thus, highly sensitive and selective analytical tools are required for continuous monitoring. In the AMMOD project, we installed an ion mobility spectrometer with coupled gas chromatographic pre-separation combined with an in-line preconcentration system in the field (ppq-tec-GC-IMS). This allowed automated monitoring with minimal maintenance, and the device proved robust in the field. Based on this, annual courses of emissions could be analysed from the ambient air with high time resolution, revealing a clear seasonal course of emissions. Furthermore, 15 compounds have already been identified in reference experiments and assigned to plant origin, including typical green leaf volatiles such as (*Z*)-3-hexenyl acetate or (*Z*)-3-hexen-1-ol and monoterpenes such as α-pinene, β-pinene and camphene.

In addition to seasonal changes, the temporal resolution was sufficient to record detailed diurnal concentration differences of individual volatile compounds. In our data, especially monoterpenes such as α-pinene showed a maximum in the morning hours, while other substances showed an early afternoon peak. Furthermore, correlations with abiotic factors could also be identified by comparing the data with weather data, whereby temperature seems to be the main driver.

### 2.1 Introduction

#### 2.1.1 Volatile organic compounds

The volatile organic compounds (VOC) released by living organisms carry information about their identity and physiological status. Kesselmeier and Staudt (1999) gave a broad overview of various VOC patterns characteristic for different plant species and the influence of light and temperature on their expression. The emissions of VOC are influenced by various abiotic and biotic factors, such as seasonal variation (Bracho-Nunez et al. 2013; Bourtsoukidis et al. 2014; Brilli et al. 2016), air pollution (Li et al. 2017), weather (Šimpraga et al. 2011; Niinemets and Sun 2014; Llusia et al. 2015) and even herbivore activities (Clavijo McCormick et al. 2012; Erb et al. 2015; Bouwmeester et al. 2019).

VOC emissions from vegetation (plant volatile organic compounds, pVOC) are primarily determined by the taxonomic composition, the relative abundance of individual taxa and their phenology (Kigathi et al. 2019). Taxonomic composition and abundance of taxa change slowly and minor changes may be difficult to trace, especially for rare taxa. However, major components of the vegetation and especially their phenology (bud break, flowering, fruiting, leaf fall, attack of pest insects, effects of pesticides, ploughing) should leave a strong imprint on VOC patterns. Highly resolved VOC patterns would thus provide a uniquely detailed picture of the reaction of plant communities to short-term weather patterns on an hourly rather than daily or weekly basis (as typical of phenological calendars). These data could be invaluable for understanding and interpolating ecosystem reactions in view of the complex and poorly understood reactions of different plant species to changing weather patterns in times of global climate change (Peñuelas and Llusià 2003; Yuan et al. 2009).

Plant abundance and phenology are of crucial importance for the consumers in a habitat and have a dramatic influence on the entire food chain. Thus, detailed real-time data on vegetation development would permit conclusions on ecosystem integrity and trends. A simultaneous documentation of anthropogenic compounds released into the ambient air (e. g. pesticides) would be of particular interest for understanding ecosystem reactions to xenobiotics. However, concentrations of environmental VOC (eVOC; the sum of the VOCs that are in the ambient air at a given point in time), or more precisely of plant VOC (pVOC), in ambient air are in the lower ppb down to the ppt range. The expected mixture of compounds is extremely complex and humid, and could be influenced by biogenic and anthropogenic emissions. Therefore, a sensitive and selective analytical tool is required for continuous monitoring of characteristic volatiles. The most common and well-established analytical technique is mass spectrometry combined with GC pre-separation (GC-MS), or GC combined with flame ionisation detectors (GC-FID). However, broader time-resolved measurements of VOC fluxes using GC-MS or GC-FID often rely on highly sophisticated and extremely costly experimental set-ups, e. g. the Amazon Tall Tower Observatory or the PROPHET tower (Yáñez-Serrano et al. 2018; Fischer et al. 2021). Therefore, a rapid and automated analytical tool providing the necessary high selectivity and sensitivity is the ideal alternative.

#### 2.1.2 Ion mobility spectrometry

Ion mobility spectrometry (IMS) is an analytical tool for detecting traces of ionised molecules in the gas phase after separating them by size and shape. It is an extremely sensitive method that can detect molecules in the low ppb to ppt concentration range.

In general, a measurement can be divided into two phases: the ionisation phase and the separation phase. In the first phase, molecules are ionised in the ionisation chamber. Most commonly this is achieved via a β-radiation source, which ionises the drift gas flushing through the spectrometer, thus forming the so-called reactant ions in the ionisation region. If a sample with other gas-phase compounds is introduced into the ionisation region, analyte molecules are ionised mainly by proton or charge transfer. In the second phase, the ionised molecules are separated. For this, a Bradbury-Nielsen ion grid periodically admits clouds of ions into the drift region, where they are accelerated in a weak electric field towards the detector. During their drift from the grid to the detector, the ions collide with the molecules of the drift gas counterflow (Figure 1).

Collision frequency depends on the size and shape of the ions and thus influences drift velocity. Drift velocity is determined by measuring the drift time of the particular ions over a known drift length, and the ion mobility is obtained by normalization to the electric field strength. Further normalization of the measured drift time to the position of the reactant ion peak (RIP) yields reproducible relative ion mobilities, which are independent of instrumentation and environmental parameters such as temperature and air pressure (Vautz et al. 2009).
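The relations described above can be illustrated with a short calculation. The following Python sketch is illustrative only: the drift length (5.61 cm) and field strength (300 V cm⁻¹) are the values reported for the instrument in Section 2.2.2, whereas the drift times, the function names and the reduction to standard conditions (273.15 K, 1013.25 hPa) are assumptions made for the example and do not represent the instrument software.

```python
# Illustrative sketch: from a measured drift time to ion mobility values.
# All drift times below are made-up example numbers.

DRIFT_LENGTH_CM = 5.61    # drift length reported for the instrument
FIELD_V_PER_CM = 300.0    # field strength reported for the instrument


def ion_mobility(drift_time_s, length_cm=DRIFT_LENGTH_CM, field_v_cm=FIELD_V_PER_CM):
    """Ion mobility K in cm^2 V^-1 s^-1: drift velocity divided by field strength."""
    drift_velocity = length_cm / drift_time_s    # cm s^-1
    return drift_velocity / field_v_cm


def reduced_mobility(k, temperature_k, pressure_hpa):
    """Reduced mobility K0: K normalised to 273.15 K and 1013.25 hPa."""
    return k * (273.15 / temperature_k) * (pressure_hpa / 1013.25)


def relative_drift_time(drift_time_s, rip_drift_time_s):
    """Drift time relative to the reactant ion peak (RIP) of the same spectrum."""
    return drift_time_s / rip_drift_time_s


# Hypothetical spectrum: RIP at 7.3 ms, analyte peak at 9.5 ms.
t_rip, t_analyte = 7.3e-3, 9.5e-3
k_analyte = ion_mobility(t_analyte)
k0_analyte = reduced_mobility(k_analyte, temperature_k=343.15, pressure_hpa=1000.0)
print(f"K  = {k_analyte:.3f} cm^2/(V s)")
print(f"K0 = {k0_analyte:.3f} cm^2/(V s)")
print(f"relative drift time = {relative_drift_time(t_analyte, t_rip):.3f}")
```

Because the analyte and the RIP are recorded in the same spectrum, changes in temperature or pressure affect both drift times in a similar way, which is why the ratio is the more robust quantity for comparisons between measurements.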

**Figure 1.** Schematic illustration of an ion mobility spectrometer. Analytes are ionized and then accelerated in an electric field towards the detector. The drift time is influenced by the collision frequency of analytes with an opposing drift gas.

Such separation and detection can be operated in the positive and negative ion mode just by changing the polarity of the electric field. Thus, alternating analysis of positive and negative ions is carried out. For details of ionisation and separation in ion mobility spectrometry see Eiceman et al. (2016).

Coupling fast gas-chromatographic (GC) pre-separation to IMS provides the characteristic retention time as an additional measure for the identification of the analyte and furthermore avoids clustering of different analytes in very complex mixtures in the ionisation and drift region of the IMS. Analysis time of a full GC-IMS run is in the range of 10–30 minutes, depending on the experimental setup of the GC. Additionally, innovative MEMS-based (microelectromechanical systems) in-line pre-concentration systems are applied. This further increases the sensitivity by ca. 1–3 orders of magnitude without increasing the total analysis time and allows automation of pre-concentration and analysis without separating these steps in time and space, enabling a high sample throughput (Liedtke et al. 2019; Vautz et al. 2018).

# 2.1.3 Aim & scope

After construction of the stationary GC-IMS for continuous measurements and optimization for the detection of VOCs in the ambient air, the prototype was implemented for long-term, time resolved measurements in the study area. Furthermore, a mobile version of the GC-IMS with a similar experimental setup was provided for measurements in the lab or in the field, e. g. to characterize different plants and plant parts directly. Concomitantly, a reference database was compiled. This includes an inventory of frequently occurring species in the Melbgarten, as well as a substance database for identification and characteristic emission patterns of common plant species. The aim is to use this reference database to clearly identify plant-related volatiles and, if possible, to assign signals from the long-term measurements to individual plants. In addition, trends are to be analyzed via the time-resolved measurements with regard to seasonality, diurnal and weekly variations and the influence of particular weather conditions.

The resulting initial questions were "Are there differences in the emission patterns of different plants?", "Can we detect pVOC signals in the ambient air?", "Can we assign detected signals to plant sources?", "Are there substances that show a diurnal/weekly cycle?", "How do the emission data correlate with abiotic factors (temperature, humidity)?".

In the following, we want to go into more detail about the methods and the structure of the station, as well as the first results of long-term time resolved measurements.

# 2.2 Material and methods

## 2.2.1 Sampling location and plant community

The sampling site is located at the Melbgarten (Bonn, Germany) at 50.71297°N, 7.09035°E (Figure 2). The Melbgarten is an experimental station of Bonn University Botanic Gardens. The site is surrounded by a forested area (Melbtal) with typical European tree species (e. g. *Fagus*, *Quercus*, *Acer*) and pastures. The nearest major road is located approximately 150 m west of the monitoring station and is shielded by vegetation. Built-up areas are situated some 100 m north and 150 m west of the station. Due to the distance to roads and human settlements, anthropogenic influences on emissions are expected to play a subordinate role. The station is placed on a sloping meadow with herbaceous plants and scattered trees. This meadow is mowed once a year (late June). A large part of the area is dominated by Poaceae (Gramineae) and *Urtica dioica*. In addition, there are various ecosystem-typical representatives of plants from various families, with Asteraceae, Lamiaceae and Fabaceae contributing most to the diversity of the plant community (Appendix 6.1). *Sorbus* and *Sambucus nigra* are the typical trees. The meadow borders on a lawn, which is cut at regular intervals (ca. every four weeks).

**Figure 2.** Detailed map of the immediate vicinity of the Melbgarten (yellow area) and the localization in the greater Bonn area. Base map: OpenStreetMap (OSM). Additional layers: Geoportal NRW (Naturschutzgebiete, Straßen, Wasserfläche, Bebauung).

### 2.2.2 Instrumentation

Two identical ion mobility spectrometers are used for the measurements in the *Smellscapes* module. The first device is designated for stationary use and is installed in the monitoring station at the Melbgarten. The second device is additionally equipped with batteries and is used for reference measurements in the laboratory or mobile measurements in target areas.

The ion mobility spectrometers are ppq-tec-GC-IMS from ION-GAS GmbH (Dortmund, Germany) based on hardware provided by STEP GmbH (Pockau-Lengefeld, Germany). The default sample loop was substituted with a MEMS-based in-line pre-concentration chip filled with Carbograph 4 as adsorbent (CNR-IMM, Bologna, Italy, for detailed description see Liedtke et al. 2019) for increased sensitivity. We used an MXT-200 capillary column (30 m × 0.53 mm, 1.5 µm coating) for GC pre-separation, which is operated isothermally at 80 °C, with a carrier gas flow (filtered air from internal gas circuit) of 21 mL min⁻¹.

The internal gas flow is purified via a filter system. In total, the GC-IMS has two external filters (molsieve and activated carbon) and four larger internal filters filled with molsieve 9 Å. Filters need to be replaced after approx. 12–18 months. The filters are regenerated externally by STEP GmbH.

Ionization is carried out with a tritium source of β-radiation (100 MBq). Drift length of the IMS is 5.61 cm. The drift tube is operated isothermally at 70 °C at a field strength of 300 V cm⁻¹. The instrument provides an automatic polarity switch between measurements, which allows measurements to be made in both positive and negative modes.

The ppq-tec-GC-IMSs have built-in computers that permit on-site control, but also permit remote access via the internet (e. g. via TeamViewer). Data is stored on a built-in 256 GB hard disk. A user manual for detailed technical information and operating instructions of the ppq-tec-GC-IMS is available from the Nees-Institute and ION-GAS GmbH.

## 2.2.3 Housing and sample transfer

To protect the GC-IMS from harmful influences (weather, vandalism etc.), it is housed in a safety cabinet (SciCab12, Figure 3). The sample inlet is located at a height of approx. 2 m outside this cabinet. To prevent contamination of the GC-IMS filters with pollutants such as pollen or other microparticles, the ambient air is initially filtered using a PTFE round filter (diameter 47 mm) with a pore size of 6 µm and a thickness of 0.1 mm (RCT-SVX from RCT Reichelt Chemietechnik GmbH & Co., Germany). The sample inlet and the GC-IMS are connected through a PFA tube (diameter ¼ inch). A suction pump maintains an ambient air flow of 500 mL min⁻¹. The GC-IMS is connected to this constant flow of ambient air via a T-junction.

**Figure 3.** Stationary GC-IMS in the safety cabinet (**A, B**). The safety cabinet is located on a green area in the Melbgarten (**C**). The samples are taken at a height of approximately 2 m (**D**).

### 2.2.4 Sampling and time-resolved monitoring

Time-resolved measurements of VOCs have been performed at the sampling site in the Melbgarten since 2021-03-22 (including pauses for optimization and maintenance). The main intervals of measurements were from 2021-06-15 to 2021-08-13 and from 2022-03-22 ongoing.

The samples for the GC-IMS are taken through the T-junction at a sampling rate of ~ 60 mL min⁻¹ from the constant flow of ambient air. For the long-term monitoring, a total sample volume of 1000 mL was used for the enrichment (sampling time approx. 20 minutes, enrichment temperature = 43 °C). Subsequently, substances were released via heat desorption (50 °C for 5 s, then temperature ramp to 290 °C in 5 s, 290 °C for 12 s) and transferred to the GC-IMS for separation and detection. To eliminate carry-over, the pre-concentrator chip is purged at 300 °C for 2 minutes prior to each measurement. Furthermore, the IMS and GC should be baked out at regular intervals to prevent the accumulation of impurities. For this purpose, there is an internal bake-out procedure in which the IMS and GC column are heated above the usual sample temperature for 24 hours.

The duration of the GC pre-separation is 25 min, followed by rapid mobility separation and detection in the IMS (a few ms). After each measurement, the polarity of the IMS switches automatically for the next measurement to cover a wider range of compounds (positive and negative ionisation modes address different substance classes). Combined with a short sampling break (3 min) after each measurement, we are able to record one measurement per hour. In-line enrichment and analysis are fully automated and run 24/7. The data generated, including telemetry data, are directly uploaded into the AMMOD-cloud and are additionally stored on the built-in hard disk.
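The time budget of one measurement cycle can be checked with simple arithmetic. The sketch below uses only the durations and flow rates stated in this section and assumes, as a simplification, that all steps run strictly one after another; steps that are not quantified in the text (e. g. cooling of the chip or data upload) are not included. With the nominal flow of 60 mL min⁻¹ the calculated sampling time is somewhat below the approx. 20 minutes quoted above, and the whole cycle stays within the one-hour interval.

```python
# Rough time budget for one automated measurement cycle (illustrative only).

SAMPLE_VOLUME_ML = 1000.0    # enrichment volume for long-term monitoring
SAMPLE_FLOW_ML_MIN = 60.0    # approximate sampling rate at the T-junction
PURGE_MIN = 2.0              # pre-concentrator purge before each measurement
DESORPTION_S = 5 + 5 + 12    # hold, ramp and final hold of the heat desorption
GC_RUN_MIN = 25.0            # GC pre-separation
BREAK_MIN = 3.0              # sampling break after each measurement


def sampling_minutes(volume_ml, flow_ml_min):
    """Time needed to draw a fixed sample volume at a given flow rate."""
    return volume_ml / flow_ml_min


def cycle_minutes(flow_ml_min=SAMPLE_FLOW_ML_MIN):
    """Sum of the sequential steps, assuming that none of them overlap."""
    return (PURGE_MIN
            + sampling_minutes(SAMPLE_VOLUME_ML, flow_ml_min)
            + DESORPTION_S / 60.0
            + GC_RUN_MIN
            + BREAK_MIN)


for flow in (55.0, 60.0, 65.0):
    print(f"{flow:.0f} mL/min: sampling {sampling_minutes(SAMPLE_VOLUME_ML, flow):.1f} min, "
          f"cycle {cycle_minutes(flow):.1f} min")
```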

# 2.2.5 In-house reference database

For signal identification we generated a reference database of common pVOCs and some characteristic anthropogenic compounds. Substances of interest were selected based on literature research and GC/MS reference measurements. Measurements of pure reference substances were conducted for validation. Furthermore, some substances could be identified by comparing *Kovats* retention indices, determined against a series of n-ketones, in combination with GC/MS reference measurements. However, this is only suitable for less complex mixtures of volatile compounds.
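For isothermal separations, a Kovats-type retention index is obtained by log-linear interpolation between the two members of a homologous series that bracket the analyte. The following Python sketch shows this calculation under stated assumptions: the index scale is taken as 100 × carbon number of the reference ketone, retention times are corrected for a dead time, and all retention times in the example are invented. It is not the procedure implemented in the evaluation software.

```python
import math


def kovats_index(t_analyte, references, dead_time=0.0):
    """Isothermal Kovats-type retention index by log-linear interpolation.

    references: list of (carbon_number, retention_time) for the homologous
    series; the analyte must elute between two of them. All adjusted
    retention times (retention time minus dead time) must be positive.
    """
    adjusted = sorted((t - dead_time, n) for n, t in references)
    t_adj = t_analyte - dead_time
    for (t_lo, n_lo), (t_hi, n_hi) in zip(adjusted, adjusted[1:]):
        if t_lo <= t_adj <= t_hi:
            fraction = (math.log(t_adj) - math.log(t_lo)) / (math.log(t_hi) - math.log(t_lo))
            return 100.0 * (n_lo + (n_hi - n_lo) * fraction)
    raise ValueError("analyte elutes outside the range of the reference series")


# Made-up retention times (s) for a C4 to C7 n-ketone series and a 20 s dead time.
ketones = [(4, 40.0), (5, 72.0), (6, 135.0), (7, 260.0)]
print(round(kovats_index(100.0, ketones, dead_time=20.0), 1))
```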

The database contains signals for 26 aliphatic hydrocarbons, 27 aromatic compounds, 27 monoterpenes and two nitrogen-containing compounds. Most of these substances are biogenic, but they also include some predominantly anthropogenic substances such as toluene or xylenes. A list of all the substances identified is provided in Appendix 6.2 (Table A2).

# 2.2.6 Targeted measurements of characteristic plants

Reference measurements of individual plants were carried out in the laboratory and in the field. For measurements in the field, a means of transport (pram) was converted as required and equipped with additional storage area, suspension and fixing options (Figure 4). An additional battery pack makes autonomous operation for up to eight hours possible.

The sample inlet is positioned directly in front of the relevant plant part (e. g. the flower) and an appropriate sample volume (between 10 and 100 mL, depending on pVOC concentration) is taken for enrichment and subsequent analysis. In the case of unknown or unpredictable samples, it is advisable to proceed slowly from low sample volumes to higher sample volumes in order to avoid overloading the GC-IMS.

Alternatively, the plant parts of interest may be enclosed with inert material (e. g. an oven roasting bag) and the samples are taken from the enclosed volume. This static headspace is particularly beneficial for plant parts that are difficult to reach or for low emitting parts (e. g. foliage), as volatile compounds accumulate in the enclosed gas space. Under laboratory conditions, higher concentrations of pVOCs compared to environmental measurements were expected. Therefore, the valve opening times of the PreConcentrator could be reduced; this has no influence on ion mobility and retention time, but reduces signal intensity and thus avoids overloading the GC-IMS. The particular setup with regard to sensitivity was adapted to the specific plants and measurement conditions. Blank measurements and background measurements are conducted prior to sample measurements, to identify pre-concentrator background and background signals from ambient air.

**Figure 4.** Exemplary images for measurements of flower volatiles in the genus *Narcissus* in the field (**A**) and in the laboratory (**B**).

Based on the procedure described above, over 40 samples of typical plants or plant parts, respectively, were already analysed. This included commonly distributed plants such as elder (*Sambucus nigra*), beech (*Fagus sylvatica*), maple (*Acer pseudoplatanus*), hornbeam (*Carpinus betulus*), birch (*Betula papyrifera*), different species of rowan (*Sorbus* spp.), walnut (*Juglans regia*), cornel (*Cornus mas*), different species of cherry (*Prunus* spp.), apple (*Malus sylvestris*), Scots pine (*Pinus sylvestris*), snowdrop (*Galanthus* spp.), ramsons (*Allium ursinum*), dandelion (*Taraxacum* spp.), daisy (*Bellis perennis*), daffodil (*Narcissus*), crocus (*Crocus*), lilac (*Syringa vulgaris*), herb-Robert (*Geranium robertianum*), creeping thistle (*Cirsium arvense*), clover (*Trifolium* spp.) and greater stinging nettle (*Urtica dioica*). Furthermore, differences in floral scents within individual genera were investigated using the example of *Narcissus*.

#### 2.2.7 Data evaluation

*IONysos*, a custom-made software developed by ION-GAS GmbH, Dortmund, Germany, is used for processing, evaluation and visualization. In a first step, drift times are normalized to obtain the relative ion mobility and a Gaussian smoothing algorithm is applied to reduce noise. Next, the retention times of each measurement are aligned on the basis of a ubiquitously occurring signal with known retention time, allowing comparison between different measurements. This is necessary, as small changes in gas flow can lead to small fluctuations in retention time. After smoothing, normalization of drift time and alignment of retention times, emission patterns are visualized as two-dimensional heatmaps. Substance-specific signal position is determined by the retention time in the GC and the relative ion mobility from the IMS. The peak volume of individual signals permits semi-quantitative statements on the abundance of individual compounds (Figure 5). Individual signals can then be selected manually.
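The processing steps named above (smoothing, retention time alignment, peak volume) can be sketched in a few lines of Python. This is a simplified illustration and not the *IONysos* implementation, whose internals are not described here: the alignment is assumed to be a constant shift, the peak volume is taken as the sum of intensities in a rectangular window, and all function names and example numbers are hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian weights for smoothing."""
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma=1.0, radius=3):
    """Gaussian smoothing of one spectrum; edges are handled by clamping."""
    kernel = gaussian_kernel(sigma, radius)
    last = len(signal) - 1
    return [
        sum(w * signal[min(max(i + k - radius, 0), last)] for k, w in enumerate(kernel))
        for i in range(len(signal))
    ]


def align_retention_times(retention_times, observed_ref, known_ref):
    """Shift all retention times so the reference signal sits at its known position."""
    shift = known_ref - observed_ref
    return [t + shift for t in retention_times]


def peak_volume(heatmap, rt_range, dt_range):
    """Sum of intensities in a rectangular window (rows: retention, columns: drift)."""
    (r0, r1), (d0, d1) = rt_range, dt_range
    return sum(sum(row[d0:d1]) for row in heatmap[r0:r1])


# Tiny invented example: a noisy spectrum, two retention times and a 3 x 3 heatmap.
print(smooth([0.0, 0.1, 0.0, 5.0, 0.2, 0.0, 0.1]))
print(align_retention_times([100.0, 200.0], observed_ref=101.5, known_ref=100.0))
print(peak_volume([[1, 2, 3], [4, 5, 6], [7, 8, 9]], rt_range=(0, 2), dt_range=(1, 3)))
```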

Substances are identified by a comparison of the retention times and relative ion mobilities with those of reference substances in our database. Unidentified substances are characterised by retention time and relative ion mobility and the peak data is stored for future identification. Peak volumes of the signals detected can be calculated across all measurements for semi-quantitative evaluation.
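Such a comparison amounts to a lookup in two dimensions with tolerance windows. The sketch below shows one possible way to do this in Python; the tolerance values, the scoring rule and the reference entries are placeholders chosen for illustration and are not taken from the database described here.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    name: str
    retention_time_s: float
    relative_mobility: float


def identify(rt_s, rel_mobility, references, rt_tol_s=3.0, mob_tol=0.01):
    """Return the closest reference inside both tolerance windows, else None."""
    candidates = [
        r for r in references
        if abs(r.retention_time_s - rt_s) <= rt_tol_s
        and abs(r.relative_mobility - rel_mobility) <= mob_tol
    ]
    if not candidates:
        return None
    # Rank candidates by the sum of deviations, each scaled by its tolerance.
    return min(
        candidates,
        key=lambda r: (abs(r.retention_time_s - rt_s) / rt_tol_s
                       + abs(r.relative_mobility - rel_mobility) / mob_tol),
    )


# Placeholder reference entries (positions are invented, not measured values).
references = [
    Reference("alpha-pinene", 150.0, 1.22),
    Reference("eucalyptol", 210.0, 1.30),
]

match = identify(151.0, 1.221, references)
print(match.name if match else "unidentified")
```

Signals without a match would keep their retention time and relative ion mobility as identifiers, in line with the handling of unidentified substances described above.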

**Figure 5.** Exemplary visualization of the raw data as a 2-dimensional heat map (left) and in a 3-dimensional plot (right) using the example of the emission pattern of green daffodil (*Narcissus viridiflorus*). Three main components could be identified based on the retention time and the relative drift time as eucalyptol (1), benzyl acetate (2) and phenethyl acetate (3).

Furthermore, telemetry data (e. g. GC temperature, IMS temperature, gas flow, etc.) is collected in addition to the actual measurements. This permits (remote) monitoring of the performance of the instrument.

#### 2.2.8 Database design

The collected data is stored in a database based on MySQL (Oracle, Texas), a relational database management system (RDBMS). The database is composed of different tables. These tables contain a list of species from the Melbgarten (SpeciesList), a list of reference substances (RefComp), information on sampled plants (samplestatus), emission patterns of sampled plants (emissionpattern) and information on the equipment used (device). These tables are linked to each other via so-called keys, allowing information from different tables to be accessed quickly (Figure 6).

Individual plant names are recorded with an ID in the *SpeciesList* table, together with taxonomic information. Data on substances that have already been identified as well as unknown signals are stored in the *RefComp* table. An ID is assigned to each signal or substance, together with the corresponding retention time and reduced ion mobility. For identified substances, there is additional information on substance name, CAS number, retention index, molecular mass and signal type (monomer or dimer).

Device-specific information is stored in the device table. Since only two devices are currently in use, there are only these two entries here.

**Figure 6.** Scheme of the relational database for plant emission patterns. The database contains various parameters that make it possible to identify the presence of certain signals in species that have already been measured individually.

The last two tables cover the sample status, which contains information on the reference measurements performed (e. g. plant part, plant status, polarity of the measurement, number of identified signals, origin of the sample), and the actual emission patterns recorded. The table emissionpattern contains information about the relative abundance of individual signals in reference measurements, but also information about the nominal signal strength.

Information can thus be extracted from the measurements employing different queries, while the keys guarantee database integrity. The aim is to make this database publicly available. The basic framework and functionality have already been implemented, but the expansion with measurements is still ongoing.
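To illustrate how keys link the tables and support such queries, the following Python sketch builds a reduced version of the schema. Several points are assumptions: an in-memory SQLite database stands in for MySQL so that the example is self-contained, only the table names are taken from the description above, and all column names and inserted values are hypothetical placeholders.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL here; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces keys only when asked
conn.executescript("""
CREATE TABLE SpeciesList (
    species_id INTEGER PRIMARY KEY,
    scientific_name TEXT NOT NULL
);
CREATE TABLE RefComp (
    signal_id INTEGER PRIMARY KEY,
    substance_name TEXT,              -- NULL for signals not yet identified
    retention_time_s REAL NOT NULL,
    reduced_mobility REAL NOT NULL
);
CREATE TABLE device (
    device_id INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);
CREATE TABLE samplestatus (
    sample_id INTEGER PRIMARY KEY,
    species_id INTEGER NOT NULL REFERENCES SpeciesList(species_id),
    device_id INTEGER NOT NULL REFERENCES device(device_id),
    plant_part TEXT,
    polarity TEXT
);
CREATE TABLE emissionpattern (
    sample_id INTEGER NOT NULL REFERENCES samplestatus(sample_id),
    signal_id INTEGER NOT NULL REFERENCES RefComp(signal_id),
    relative_abundance REAL,
    PRIMARY KEY (sample_id, signal_id)
);
""")

# Placeholder rows; retention times, mobilities and abundances are invented.
conn.execute("INSERT INTO SpeciesList VALUES (1, 'Narcissus viridiflorus')")
conn.execute("INSERT INTO RefComp VALUES (1, 'eucalyptol', 200.0, 1.30)")
conn.execute("INSERT INTO RefComp VALUES (2, NULL, 350.0, 1.55)")
conn.execute("INSERT INTO device VALUES (1, 'mobile GC-IMS')")
conn.execute("INSERT INTO samplestatus VALUES (1, 1, 1, 'flower', 'positive')")
conn.execute("INSERT INTO emissionpattern VALUES (1, 1, 0.6)")
conn.execute("INSERT INTO emissionpattern VALUES (1, 2, 0.1)")

# Which sampled species and plant parts emit a given substance?
rows = conn.execute("""
    SELECT s.scientific_name, st.plant_part, e.relative_abundance
    FROM emissionpattern AS e
    JOIN samplestatus AS st ON st.sample_id = e.sample_id
    JOIN SpeciesList AS s ON s.species_id = st.species_id
    JOIN RefComp AS r ON r.signal_id = e.signal_id
    WHERE r.substance_name = ?
""", ("eucalyptol",)).fetchall()
print(rows)
```

With the foreign keys in place, an emission pattern entry that refers to a non-existent sample or signal is rejected by the database, which is the integrity guarantee mentioned above.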

## 2.2.9 Meteorological data

Meteorological data were provided initially by the Meteorological Institute of Bonn University, whose weather station is located in Endenich, approx. 2.5 km from the sampling site. Later on, they were obtained directly from the weather station in the Melbgarten, which was implemented during the AMMOD project. The data considered here include temperature, air pressure, relative humidity, precipitation and global radiation, as well as wind data (wind direction and wind speed).

The year 2022 was an overall warm year with low precipitation. The summer months of July and August, as well as March and April were particularly dry. Even in October and November there were still high temperatures with rather little rainfall. The rainiest month was September, the driest month was August. The coldest month was March, the warmest August (Table 1).

The conditions in the summer of 2021 differed considerably from those of the following year (Table 2): temperatures were lower and the amount of precipitation was higher. This applies in particular to the month of July, with its heavy rain events (more than 100 L m⁻² in a single day).

### 2.2.10 Statistical analysis

**Table 1.** Monthly summary of weather conditions in Bonn for the measurement series in 2022.

**Table 2.** Monthly summary of weather conditions in Bonn for the summer months of 2021. Long-term measurements in that year were limited to June to August.

Conventional descriptive parameters (e. g. mean, median, standard deviation) are calculated for the eVOC and weather data. Signal intensities are plotted against time for time-resolved immission patterns. Correlations are identified using Pearson correlation coefficient.

The *Shapiro-Wilk test* and *Levene's test* are used to test for normal distribution and homogeneity of variance, respectively. For non-normally distributed data, we use the non-parametric *Kruskal-Wallis test* as significance test for differences between two or more groups and the *Wilcoxon test* for pairwise comparisons. Statistical tests and data evaluation are conducted with the open source software R (version 4.0.4; R Core Team 2023).
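For readers unfamiliar with the Kruskal-Wallis test, the following sketch computes its H statistic from ranks, including the usual correction for ties. It is written in plain Python purely for illustration and uses invented numbers; the analyses reported here were carried out in R, and the p-value (obtained by comparing H with a χ² distribution with k − 1 degrees of freedom) is deliberately left to a statistics package.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction; returns (H, degrees of freedom).

    Not defined if all pooled values are identical (the tie correction is zero).
    """
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    # Average rank for each distinct value (ties share the mean of their ranks).
    counts = Counter(pooled)
    rank_of, next_rank = {}, 1
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = next_rank + (c - 1) / 2.0
        next_rank += c
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    ) - 3.0 * (n_total + 1)
    tie_correction = 1.0 - sum(c ** 3 - c for c in counts.values()) / (n_total ** 3 - n_total)
    return h / tie_correction, len(groups) - 1


# Invented signal intensities for three groups (e. g. three times of day).
morning = [1.0, 2.0, 3.0]
noon = [4.0, 5.0, 6.0]
evening = [7.0, 8.0, 9.0]
h_stat, dof = kruskal_wallis_h(morning, noon, evening)
print(f"H = {h_stat:.2f}, df = {dof}")
```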

# 2.3 Results

# 2.3.1 Automation & telemetry

Key points in the recording of long-term data are reliable automation and system stability to generate a high throughput of comparable measurements. We are able to achieve an automated measurement for environmental volatiles using GC-IMS by combining in-line enrichment and subsequent GC-IMS analysis. Minimal maintenance is required. However, sensor failure can still lead to occasional measurement gaps. Therefore, the functionality of the device needs to be monitored at regular intervals. Critical performance parameters of the GC-IMS are drift, sample and carrier gas flow, detector temperature, GC temperature, preconcentration temperature (PreCon-Chip), baseline, RIP position and RIP width at half height (WHM).

The mean sample gas flow during the measurement series was 60.74 ± 2.8 mL min⁻¹ and the mean carrier gas flow was 21.53 ± 0.72 mL min⁻¹ across measurements for both polarities (Figure 7). Gas flows were largely constant with only minor fluctuations. IMS temperature and the GC temperature were essentially constant (Figure 7). The mean GC temperature was 79.91 ± 0.61 °C (target value 80 °C) and the mean IMS temperature was 69.94 ± 0.06 °C (target value 70 °C). The minimal variations result from small fluctuations around the set point caused by technical constraints of the temperature control.

**Figure 7.** Telemetry data of gas flow rates, GC and IMS temperatures over a selected time period. Carrier gas flow and temperatures in the GC and IMS are very stable. The sample gas flow is subject to greater fluctuations, which can be attributed to the viscosity of air at different temperatures.

The mean enrichment temperature for the PreCon-Chip was 38.1 ± 3.93 °C. Enrichment temperature was slightly sensitive to ambient temperature, especially at high ambient temperatures. This can cause problems, especially in summer. Conversely, the enrichment temperature was broadly stable at low ambient temperatures. The mean reduced inverted ion mobility of the RIP was 0.446 ± 0.002 Vs cm⁻² for negative polarity and 0.487 ± 0.002 Vs cm⁻² for positive polarity, with a mean WHM of 0.20 ± 0.01. The baseline was constant at 4.97 V. We thus find that all critical parameters are reasonably constant around the target values. Ion mobilities and retention times can be aligned to known signals and a comparability of measurements over longer time periods is ensured.
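A routine check of such telemetry values can be automated in a simple way. The Python sketch below summarises a parameter as mean and standard deviation and flags readings outside a tolerance window around the target. The target values for GC temperature, IMS temperature and carrier gas flow are those given in the methods; the tolerance widths, the parameter names and the example readings are assumptions made for illustration.

```python
from statistics import mean, stdev

# parameter: (target value, tolerated absolute deviation); tolerances are assumptions
TARGETS = {
    "gc_temp_c": (80.0, 2.0),
    "ims_temp_c": (70.0, 0.5),
    "carrier_flow_ml_min": (21.0, 2.0),
}


def summarise(values):
    """Mean and sample standard deviation of one telemetry parameter."""
    return mean(values), stdev(values)


def out_of_tolerance(records, targets=TARGETS):
    """Return (record index, parameter, value) for every reading outside its window."""
    flagged = []
    for i, record in enumerate(records):
        for name, (target, tolerance) in targets.items():
            if name in record and abs(record[name] - target) > tolerance:
                flagged.append((i, name, record[name]))
    return flagged


# Invented telemetry readings; the third one has a GC temperature drop.
records = [
    {"gc_temp_c": 79.9, "ims_temp_c": 69.9, "carrier_flow_ml_min": 21.5},
    {"gc_temp_c": 80.3, "ims_temp_c": 70.0, "carrier_flow_ml_min": 21.6},
    {"gc_temp_c": 76.5, "ims_temp_c": 69.9, "carrier_flow_ml_min": 21.4},
]
gc_mean, gc_sd = summarise([r["gc_temp_c"] for r in records])
flagged = out_of_tolerance(records)
print(f"GC temperature: {gc_mean:.2f} +/- {gc_sd:.2f} degC")
print(flagged)
```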

# 2.3.2 Selective measurements of common plants

We were able to record emission data for different plants and observed differences between the detected pVOC patterns. However, there were also plants where we could not detect volatile compounds under undamaged conditions (e. g. *Trifolium pratense* or *Convolvulus arvensis*). Individual substances could be identified on the basis of the retention time and the relative ion mobility (Figure 8).

There are distinct differences in the emission patterns of flowers and vegetative tissues. Flowers in particular stand out due to their diversity and, in some cases, high intensity of individual signals. Benzaldehyde, benzyl alcohol, (*E*)-β-ocimene, methyl salicylate, linalool, methyl benzoate or lilac aldehydes could be identified as typical components of floral odours. We also found as yet unidentified substances, e. g. in different species from the genus *Sorbus* (Figure 9). Unique combinations of substances were often observed in the flower fragrances, even if individual substances are very common. There can also be large differences in the overall signal intensity. Some flowers have no, or only weak emissions (e. g. *Galanthus*). Others, such as *Sambucus nigra* or *Ligustrum vulgare*, have strong emissions. Emission patterns of flowers can be species-specific, as we have shown for species from the genus *Narcissus*. However, there is often signal overlap, and sometimes differences are only seen in nuances such as the absence or presence of an individual specific substance or a different abundance of the same substances (Losch et al. 2023).

**Figure 8.** Emission patterns of different plant species or plant parts, respectively, showing characteristic emissions of certain compounds.

**Figure 9.** Comparison of emission pattern. (**A**) Characteristic patterns of different plants and plant parts, respectively. (**B**) Comparison of leaf emission patterns of *Fagus sylvatica* for undamaged and damaged leaves.

For vegetative tissue or green plant parts (e. g. leaves), respectively, little or no emissions were observed under undamaged conditions and there is hardly any diversity in the composition of the substances. 6-methyl-5-hepten-2-one (e. g. *Fagus sylvatica*, *Acer pseudoplatanus*, *Sambucus nigra*, *Quercus robur* etc.) or (*Z*)-3-hexenyl acetate (*Urtica dioica*) were detected as characteristic substances of green plant parts. *Pinus sylvestris* represents a notable exception with several typical monoterpenes such as α-pinene, camphene or β-pinene detected. These common components of essential oils are widespread in, e. g. conifers. Damaged leaves, on the other hand, show significantly stronger emissions of substances such as (*Z*)-3-hexen-1-ol or (*Z*)-3-hexenyl acetate. In *Sambucus nigra*, even aromatic substances such as benzyl alcohol were emitted by damaged leaf tissue. In general, a comparatively small number of common and widespread substances are released from mechanically damaged leaf tissue.

#### 2.3.3 Detection of VOCs in ambient air using GC-IMS

During the first measurement phase in the summer of 2021 (n = 539 for positive ions and n = 541 for negative ions), a total of 64 positive ions and 32 negative ions were detected, amounting to a total of 96 unique signals. The measurement series for 2022 was noticeably longer (n = 2152 for positive ions and n = 1913 for negative ions) and 146 unique signals could be detected. The overall number of ambient VOCs present is likely higher, but some signal overlap compromises resolution. Despite these limitations, considerable sensitivity and selectivity could be demonstrated for both negative and positive ions. For positive ions, we observed some superimpositions for highly volatile compounds (retention time < 50 s), due to similar characteristics in retention time and ion mobility. Some of these signal clusters could not be resolved. Many short-chain compounds (< 6 C-atoms) such as ethanol, ethylene, acetone, methanol and also the ubiquitous isoprene are included in this region. Separation was sufficient to identify individual signals at retention times > 50 s. Negative ionisation provides overall fewer and weaker signals, but selectivity is higher due to the lower degree of superimposition. Only very few substances elute after 300 s, with the last peaks observed at ca. 700 s (both polarities). While we were not able to identify substances in the negative mode, we could identify 15 substances amongst the detected signals for positive ionisation, including monoterpenes (MTs) such as α-pinene, β-pinene, camphene, DL-limonene, p-cymene, (*E*)-β-ocimene and eucalyptol, as well as aromatic compounds (e. g. benzaldehyde, anisole, p-cresol) and aliphatic compounds such as 6-methyl-5-hepten-2-one or (*Z*)-3-hexen-1-ol. Surprisingly, we failed to identify any typical anthropogenic compounds (e. g. xylene or toluene). Floral volatiles were also less strongly represented than expected.

#### 2.3.4 Time-resolved monitoring of VOCs

The data obtained in this way contain information on changes over time such as weekly averages for the sum of signal intensities, but also for individual substances (Figure 10). The sum of the signal intensities reveals information about the general course of variations, while individual substances give a more fine-grained insight into seasonal variation. For 2021 and 2022, the temporal course of total signal intensity partly reflects the prevailing weather conditions, with the highest signal intensities in June (warmest month). For monoterpenes such as α-pinene we observed deviating temporal courses with irregular spikes.

In general, the measurement period of 2022 covers a wide range of seasonal variation and thus allows a more detailed view compared to the rather short series of 2021. June was again the month with the highest average signal intensities. However, patterns across the summer months are less divergent, which may be due to the overall less variable weather patterns in 2022.

# 2.3.5 Weekly and diurnal variation of VOCs in seasonal intervals

The data can also be utilized to look at weekly or diurnal variation, respectively. In principle, intra-week variation could be an indication of the anthropogenic origin of substances, because emission of air pollutants tends to be higher on weekdays (e. g. NO<sub>x</sub> and VOCs from traffic). Especially for the first period (March/April), there appear to be differences between weekends and weekdays (Figure 11). However, we observed no statistically significant differences in concentration across the weekdays (p > 0.05). This applies to the sum of all signals, but also to individual substances such as α-pinene, indicating an essentially non-anthropogenic, i. e., biogenic origin of the substances detected.

**Figure 10.** Time resolved data as weekly average for the total signal intensity (Positive & negative ions) and α-pinene for the period of 2022.

**Figure 11.** Weekly variation of total signal intensity and α-pinene during three seasonal intervals for 2022.

The diurnal changes can provide information about the emission behaviour of plants. Fine-grained analysis revealed that signal intensities across both polarities display significant diurnal patterns, which are particularly pronounced during the main period of activity from May to August 2022 (Figure 12). Overall signal intensity reaches a maximum in the afternoon and a minimum in the early morning hours (χ² = 65.615, *p* < 0.001). Conversely, for some compounds, such as α-pinene, we observed a significant diurnal variation (χ² = 114.38, *p* < 0.001) with a maximum in the early morning hours and a minimum towards the afternoon.
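Weekly averages, weekday comparisons and diurnal profiles all rest on the same operation: grouping time-stamped intensities by a property of the timestamp and averaging within each group. A minimal Python sketch of this grouping is given below. The function names and the example values are hypothetical, and the sketch covers only the averaging step, not the significance tests.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def group_means(measurements, key):
    """Average the intensities of (timestamp, intensity) pairs per key(timestamp)."""
    buckets = defaultdict(list)
    for timestamp, intensity in measurements:
        buckets[key(timestamp)].append(intensity)
    return {group: mean(values) for group, values in sorted(buckets.items())}


def by_hour(timestamp):
    return timestamp.hour


def by_weekday(timestamp):
    return timestamp.isoweekday()  # 1 = Monday ... 7 = Sunday


def by_iso_week(timestamp):
    return tuple(timestamp.isocalendar())[:2]  # (ISO year, ISO week)


# Invented total signal intensities (arbitrary units).
data = [
    (datetime(2022, 6, 13, 5), 8.0),
    (datetime(2022, 6, 13, 15), 20.0),
    (datetime(2022, 6, 14, 5), 10.0),
    (datetime(2022, 6, 14, 15), 24.0),
    (datetime(2022, 6, 20, 15), 30.0),
]

diurnal = group_means(data, by_hour)
weekly = group_means(data, by_iso_week)
peak_hour = max(diurnal, key=diurnal.get)
print(diurnal, weekly, peak_hour)
```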

**Figure 12.** Diurnal variation of total signal intensity and α-pinene within the three seasonal intervals for 2022.

# 2.3.6 Correlation of eVOC/pVOC concentration with meteorological data

Correlations with abiotic factors can also be investigated, in addition to time patterns. A comparison of pVOC data to meteorological data is relevant since meteorological factors are expected to affect pVOC-emissions.

Over the entire measurement period of 2022 (n = 2152), we obtained two more or less separate clusters with regard to the correlation to temperature, which reflect the abovementioned seasonality quite well. Almost all substances show a positive correlation to temperature across the year, which is reflected by total signal intensity. We find a strong positive correlation (Pearson correlation) to temperature, with R = 0.57 (*p* < 0.001) for the total signal intensity for positive ions and with R = 0.57 (p < 0.001) for negative ions (Figure 13). However, the positive correlation between temperature and concentrations is less pronounced for monoterpenes such as α-pinene (R = 0.35, p < 0.001). We only have good data for June, July and August (n = 539) from the year 2021. Here, significant correlations could also be observed with R = 0.67 (p < 0.001) for the total signal intensity. Again, α-pinene showed a considerably weaker correlation with temperature (R = 0.27, p < 0.001).
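For orientation, the Pearson coefficient and the associated t statistic can be computed as in the following Python sketch. The temperature and intensity values are invented; only the combination R = 0.57 with n = 2152 in the last lines is taken from the text. The resulting t value is large, which is consistent with the very small p-values reported for these sample sizes.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r ** 2))


# Invented example: air temperature (degC) and total signal intensity (a. u.).
temperature = [8.0, 12.0, 15.0, 19.0, 24.0, 30.0]
intensity = [3.1, 3.0, 4.2, 4.0, 5.5, 5.1]
print(f"r = {pearson_r(temperature, intensity):.2f}")

# Values reported above for total signal intensity in 2022.
print(f"t = {t_statistic(0.57, 2152):.1f}")
```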

**Figure 13.** Correlation of total signal intensity and signal intensity of α-pinene with temperature for the measurement period of 2021 (**A**) and 2022 (**B**).

# 2.4 Discussion

### 2.4.1 Robustness of measurements

Overall, we achieved good results in terms of the robustness of the measurements. The measurements were automated and maintenance was minimal, in particular as remote quality control was available. Ion mobility and the retention time from gas chromatographic pre-separation are essential for the identification or comparability of substances across a series of measurements. These parameters are heavily influenced by the temperature of the GC pre-separation, detector temperature and carrier gas flow. It was therefore important that these parameters were successfully kept sufficiently constant over the course of the measurement series. The automated normalization of ion mobility to the RIP (or RIN for negative polarity) also worked excellently and no deviations were observed.

Fluctuations occurred mainly for the enrichment temperature of the Pre-Con-chip. For the adsorption of VOCs, the chip is cooled to approx. 40 °C using air cooling. Especially at very high and very low ambient temperatures, a deviation from the target temperature can occur, as the temperature gradient is no longer sufficient for cooling. This is reflected in an enrichment temperature of ~ 50 °C during the hottest days in June (ambient temperature 39 °C). In theory, a higher enrichment temperature is associated with lower performance for adsorption. However, we observed no drops in signal intensity due to high temperatures, indicating that sufficient adsorption of volatiles also takes place at higher temperatures and/or that the effect of increased emissions of volatiles at higher temperature overcompensates the effect of reduced adsorption efficiency. In order to further improve reproducibility of the measurements, a cooling system could be implemented in the station in future.

Slight fluctuations were also observed in the sample gas flow. This is related to the viscosity of gases at different temperatures and air pressures. As the sample volume is fixed to 1000 mL, this factor only affects the duration of sample collection and should not affect signal intensity. In summary, the continuous stationary field measurements can be considered a complete success in technical terms.
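
The argument that flow fluctuations change only the sampling duration follows from the relation duration = volume / flow. A minimal sketch, assuming hypothetical flow rates (the text states only the fixed sample volume of 1000 mL):

```python
SAMPLE_VOLUME_ML = 1000.0  # fixed sample volume stated in the text


def sampling_duration_min(volume_ml, flow_ml_per_min):
    """Time needed to draw a fixed gas volume at a given flow rate."""
    if flow_ml_per_min <= 0:
        raise ValueError("flow rate must be positive")
    return volume_ml / flow_ml_per_min


# Hypothetical flow rates in mL/min; the actual values are not given in the text
for flow in (95.0, 100.0, 105.0):
    minutes = sampling_duration_min(SAMPLE_VOLUME_ML, flow)
    print(f"flow {flow:.0f} mL/min: {minutes:.2f} min for {SAMPLE_VOLUME_ML:.0f} mL")
```

The sampled volume, and with it the amount of analyte reaching the pre-concentrator, stays the same in every case; only the collection time shifts.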

### 2.4.2 GC-IMS for time resolved detection of VOCs

We have here demonstrated for the first time a novel method for automatic long-term monitoring of volatile organic compounds in environmental air using a ppq-tec-GC-IMS as alternative to existing monitoring techniques such as PTR-MS, zNose® or relaxed eddy accumulation (Tholl et al. 2006; Sarkar et al. 2020b; Sarkar et al. 2020a). We were able to fully automate data collection with minimal maintenance and without additional consumables (e. g. adsorbent). The pre-concentrator chip permits in-line sampling and analysis at a high level of automation and with good time-resolution (one measurement per hour). The high sensitivity obtained by the in-line enrichment was sufficient to detect a total of 146 signals (109 in the positive ion mode and 37 in the negative mode).

The actual number may well be higher, but a clear assignment of additional individual signals is difficult, especially for highly volatile substances, due to superimpositions. The technique, as currently implemented, thus reaches its technical limits especially for substances with < 6 C atoms (including isoprene). The time resolution of the measurement permits the identification of seasonal changes, a correlation to abiotic factors (weather) and, in the long term, even interannual trends. Furthermore, it was possible to record time-resolved changes of pVOC immission and to observe even more fine-grained diurnal variation of certain signals.

#### 2.4.3 Differentiation of biogenic/plant related VOCs

Based on the parameters (retention time and ion mobility), it was possible to identify certain signals that can be directly assigned to a biogenic or plant origin, respectively. These substances comprise typical plant monoterpenes such as α-pinene, β-pinene, camphene, DL-limonene and eucalyptol. In addition, aromatic compounds such as benzaldehyde or *p*-cresol, but also aliphatic compounds such as 6-methyl-5-hepten-2-one were identified. The location of the monitoring station and the fact that we found concentrations to be essentially uniform across the week suggest that the vast majority of volatiles detected can be considered biogenic. However, the substances that could be identified represent more or less ubiquitous plant volatiles and cannot at present be assigned to any particular plant species or plant group. Eucalyptol represents a notable exception and can be assigned to the emissions of *Juglans regia* at our specific sampling site with reasonable certainty.

Furthermore, anthropogenic air pollution, in particular from traffic, would usually be expected to have a significantly lower impact at weekends (Blanchard et al. 2008). The same applies to seasonality, where anthropogenic emissions from heating and transport would be expected to increase during the colder months. It remains unclear whether this would lead to lower intensities of certain VOCs due to less input into the environment, or to an increase in all signal intensities due to lower NO<sub>x</sub> input from e. g. traffic. However, neither case could be observed.

# 2.4.4 Outlook

We have obtained promising first results regarding the seasonal and diurnal variation of volatile organic compounds in ambient air, permitting the identification of trends. Many signals have not yet been chemically identified; the identification of additional signals and their inclusion in the reference database are therefore major issues that need to be addressed in future. Because GC-IMS records the physical signal parameters (retention time and ion mobility), currently unidentified substances can still be identified at a later date. An expansion of the sampling of plants from the target area to identify characteristic emitters will hopefully reveal characteristic source species for particular signals, permitting a clearer biological interpretation of the patterns observed.

However, one of the most important points should be the evaluation of the raw data. At present, the spectra are visually revised and the signals segregated and identified. With more than 4000 measurements per year from a single sampling locality, the volume of data generated is enormous. If this is extrapolated over several years, and possibly with an increasing number of stations, a visual approach for comparing individual emission patterns does not appear to be expedient. We are therefore planning to automate pattern recognition in cooperation with AI specialists.

The statistical analysis of the selected data only scratches the surface of what is possible. In fact, the interplay of abiotic and biotic factors can involve a complex network of interactions (Figure 14). Anthropogenic substances, e. g. from traffic or agriculture, also influence the composition of volatiles. Overarching all of this are the prevailing weather conditions, which influence both biogenic and anthropogenic inputs as well as their distribution (through wind). Therefore, more sophisticated statistical analyses and models should be considered in the future in order to obtain more information from the data.

**Figure 14.** Various influencing factors that can affect measured VOC concentrations.

#### 2.5 Conclusion

During the project, we were able to show that a GC-IMS is suitable for automated, continuous and time-resolved pVOC monitoring in the ambient air. Concentrations can be recorded in great detail and some of the signals could already be successfully assigned to plant sources. Significant progress was made in building the reference database, although at present it shows a strong bias towards floral volatiles. In the ambient air, however, these floral volatiles were found to be only minor components, with other volatile compounds dominating.

All in all, it could be demonstrated that the sensitivity of the PreCon-GC-IMS is sufficient to detect VOCs even at the very low concentrations found in ambient air. The first clear seasonal and daily variations could be identified with statistically significant results. This is an encouraging starting point for future studies into the interaction between phytosphere and atmosphere, inviting a range of investigations on plant physiology and ecosystem function.

# **References**

Blanchard C, Tanenbaum S, Lawson D (2008) Differences between Weekday and Weekend Air Pollutant Levels in Atlanta; Baltimore; Chicago; Dallas–Fort Worth; Denver; Houston; New York; Phoenix; Washington, DC; and Surrounding Areas. Journal of the Air & Waste Management Association 58(12): 1598–1615. https://doi.org/10.3155/1047-3289.58.12.1598



Yuan J, Himanen S, Holopainen J, Chen F, Stewart CN (2009) Smelling global climate change: mitigation of function for plant volatile organic compounds. Trends in Ecology & Evolution 24(6): 323–331. https://doi.org/10.1016/j.tree.2009.01.012

# **Appendix**

### A.1 Plant list Melbgarten


**Table A1.** List of plants that can be found in direct vicinity of the measuring station.




# A.2 Reference substance database


**Table A2.** Reference substance database for signal identification.



Florian Losch, Sascha Liedtke, Wolfgang Vautz, Maximilian Weigend


<sup>a</sup>M = Monomer, D = Dimer, T = Trimer. Cluster formation at high concentrations of certain compounds.

# A.3 Abbreviations


# **3 Plant metabarcoding of volumetric air samplers and malaise traps**

Hanna Raus, Stephanie J. Swenson, Birgit Gemeinholzer

#### **Abstract**

The development of high-throughput, high-quality techniques to monitor plant biodiversity and plant-insect interactions has become critical with the documented decline in biodiversity and abundance of both plants and insects. Metabarcoding has the potential to outperform conventional methods for large-scale biomonitoring due to lower cost, potential for automation, and lack of technician bias. Here we present the optimization of plant trace monitoring within the AMMOD stations via metabarcoding, with two different collection methods: airborne pollen from Hirst traps and plant traces from the preservative ethanol of Malaise traps. A new wind pollen trap was developed that allows autonomous operation in the field over a longer period than previous models and offers more options for sampling intervals. Recommendations for laboratory processing are made, and an optimized data analysis workflow is described.

# 3.1 Introduction

In the age of accelerated changes to biodiversity due to anthropogenic influences and climate fluctuations, increased monitoring of biodiversity, as well as increased speed in data retrieval are critical. Traditional methods of pollen identification as well as monitoring plant-insect interactions require expert knowledge with a large amount of training and are extremely time consuming. Pollen of several plant taxa can only be identified to family or genus level, such as Poaceae, a major allergen, for which the pollen is usually only identifiable to family level with morphological characters. In addition, monitoring plant-insect interactions traditionally has involved many hours in the field with direct observation.

Airborne pollen monitoring via the Hirst spore trap is used worldwide. In Germany, the *Stiftung Deutscher Polleninformationsdienst* (PID) has been using this method since 1983 to monitor pollen and fungal allergens. Malaise traps are commonly used across the world to collect flying insects and to monitor their populations. Both the Malaise and the Hirst traps were optimized for single sample intervals and require human intervention after each interval. Optimizations for multiple autonomous sampling in the field and a considerable extension of the necessary service intervals for metabarcoding purposes were addressed in this project.

Here, we have developed and optimized tools and workflows for metabarcoding of plants. Metabarcoding allows for automated high throughput species identification, even though the methods require further development, and at present it is still necessary to transport the samples to a laboratory for processing. Future miniaturization in laboratory equipment is emerging, such as Nanopore sequencing with a MinION sequencer, which is quite small in size (105 mm × 23 mm × 33 mm) and can be attached to a laptop for sequencing outside of traditional molecular labs. With further developments, it is conceivable that in a few years monitoring and species identification can be automated in the field. In this study we developed, tested, and optimized preparatory steps for long term, continuous monitoring of plants and insects via passive collection and DNA metabarcoding.

#### 3.2 Material and methods

#### 3.2.1 Collection site considerations

Collection sites for the airborne traps should be in an open habitat, as bushes and trees, as well as boulders, buildings, streets and railroads in the near vicinity can lead to air turbulence and will affect the pollen collection. The trap should be equipped with a built-in spirit level and height-adjustable feet to level the device even on uneven ground and make it easier to turn depending on the wind direction. When a trap is not able to rotate in the direction of the wind, it results in non-optimal catch conditions, which influence the comparability of the results. The trap is designed to operate in all weather conditions in the field and can be powered by a solar panel and battery. However, the currently available solar modules do not yet guarantee year-round continuous operation and therefore require connection of the trap to a power grid. The control module and the battery are located in a lockable compartment of the trap, where a GPS tracker could also be attached. To reduce rodent infestation, all openings were reduced in size compared to the original prototype.

# 3.2.2 Pollen trap sampler setup and programming

We made changes to the design of the Hirst spore trap (Hirst 1952), which has been used to monitor airborne pollen across Germany by the *Stiftung Deutscher Polleninformationsdienst* (PID) since 1983. The newly designed pollen trap, called the 'A1 Volumetric Air Sampler' (Khan et al. 2022; Figure 1), can work independently in the field for an extended period as it collects pollen in 24 microcentrifuge tubes with adjustable time intervals. The 0.2 ml microcentrifuge tubes are clicked into a forward and backward rotating platform that can switch from one tube to another by electronic timing (Figure 2). This allows interval flexibility, and sampling can be carried out over hourly, daily or weekly intervals. In addition, the time at which the fan actively sucks pollen into the trap can be time-controlled.

**Figure 1.** The A1 Volumetric Air Sampler.

**Figure 2.** Interior of the A1 Volumetric Air Sampler, with the cover removed, revealing the carousel for 24 sample tubes (Photograph: Gulzar Khan, test site Britz).

The sampling intervals for the wind-dispersed pollen were coordinated with the sampling of the Malaise (insect) trap to compare airborne pollen with the pollen transported by insects (fan suction time: 45 min/h; fan cycle: 75 %; fan duration: 1350; day mode only; sampling period: 7 days per tube).
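
The arithmetic behind these settings can be sketched as follows. The start date and the function names are assumptions for illustration; the sketch is not the sampler's control software, and it ignores the 'day mode only' setting because the daily operating hours are not specified here.

```python
from datetime import date, timedelta

N_TUBES = 24           # carousel capacity stated in the text
DAYS_PER_TUBE = 7      # sampling period per tube stated in the text
FAN_MIN_PER_HOUR = 45  # fan suction time stated in the text


def duty_cycle(fan_minutes_per_hour):
    """Fraction of each hour during which the fan is running."""
    return fan_minutes_per_hour / 60


def tube_schedule(start, n_tubes=N_TUBES, days_per_tube=DAYS_PER_TUBE):
    """Return (tube number, first day, last day) for every tube."""
    schedule = []
    for i in range(n_tubes):
        first = start + timedelta(days=i * days_per_tube)
        last = first + timedelta(days=days_per_tube - 1)
        schedule.append((i + 1, first, last))
    return schedule


start = date(2022, 4, 1)  # hypothetical deployment date
schedule = tube_schedule(start)

print(f"fan duty cycle: {duty_cycle(FAN_MIN_PER_HOUR):.0%}")
print(f"unattended operation: {N_TUBES * DAYS_PER_TUBE} days")
for tube, first, last in (schedule[0], schedule[-1]):
    print(f"tube {tube:2d}: {first} to {last}")
```

The 45 min/h suction time corresponds to the stated fan cycle of 75 %, and 24 tubes at 7 days each allow 168 days of operation between service visits.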

# 3.2.3 Pollen Metabarcoding: Quality assurance to avoid contamination

DNA extraction and PCR were performed in a laboratory designed to minimize external contamination. Best practice recommendations for highest quality assurance in the laboratory are the following:


• Personal protective equipment: We recommend the use of disposable ME DI-INN overalls (Carl Roth GmbH + Co. KG, Karlsruhe, Germany), shoe covers (VWR, Radnor, USA) and disposable gloves (Th. Geyer GmbH & Co. KG, Renningen, Germany; Figure 3). The disposable gloves are changed several times while working in the laboratory, and the disposable overalls and shoe covers are replaced after each day.

# 3.2.4 Pollen Metabarcoding: DNA extraction from plant traces in Malaise traps

Following removal of insects, the preservative ethanol from the Malaise trap samples was vacuum filtered using a cellulose nitrate (CN) membrane (GVS Filter Technology, Sanford, USA; diameter 47 mm and 0.22 µm pore size, Figures 4, 5). After filtration, the CN filter was cut in half and each part was placed in a separate 2 ml SafeSeal micro tube (Sarstedt AG & Co. KG, Nümbrecht, Germany). One part of the filter paper was used for DNA isolation; the other part can be used as a voucher or for an additional DNA isolation. Short-term storage is recommended at a temperature of −20 °C until further use. We recommend −80 °C or cryo-storage for long-term storage of vouchers or DNA.

DNA isolation was performed with the NucleoMag Plant kit (Macherey-Nagel, Düren, Germany) for all samples and two DNA extraction blanks. Prior to DNA isolation, three sterile tungsten carbide beads (Qiagen, Hilden, Germany; diameter 3 mm) and 600 µl of lysis buffer MC1 were added to the filter paper in the microcentrifuge tube. These samples were disrupted in a Retsch MM400 bead mill (Retsch GmbH, Haan, Germany) for 2.5 minutes at a frequency of 30 Hz. 10 µl of Proteinase K (20 mg/ml) and 5 µl of RNase A were added to the ground material and the samples were incubated at 60 °C for 45 minutes with constant shaking in a ThermoMixer C (Eppendorf, Hamburg, Germany; 400 rpm). The incubation step was followed by centrifugation of the samples (12,000 × g; 10 min). 400 µl of the cleared lysate were transferred to a clean 2 ml microcentrifuge tube and 400 µl of binding buffer MC2 and 10 µl of magnetic beads were added. In the further steps of the manufacturer's protocol, only 25 % of the specified amounts of MC3, MC4, 80 % ethanol and MC5 were used. The washing step with ethanol was performed twice. 35 µl of MC6 elution buffer (55 °C) was added to the samples and incubated at room temperature for 5 minutes. After the samples had been placed on the magnets, up to 30 µl was removed as the final elution volume.

**Figure 3.** Active laboratory work by a person wearing personal protective equipment at the laminar flow cabinets in the clean rooms.

**Figure 4.** Vacuum filtration system for the preservative ethanol from the Malaise trap insect samples. (Compare with Figure 5).

**Figure 5.** The cellulose nitrate membrane following vacuum filtration (see Figure 4).

# 3.2.5 Pollen Metabarcoding: DNA extraction from pollen material from the wind pollen trap

For DNA isolation of pollen from air, we recommend the use of the DNeasy PowerSoil Pro Kit (Qiagen, Hilden, Germany), which seems to overcome the humic acid components as well as other PCR inhibitors contained in these sample types. Some changes to the standard protocol are recommended. The Power Beads provided in the kit were transferred from the Power Beads Pro Tubes to the 2 ml micro tube containing the airborne pollen. This step is necessary because it is not possible to transfer the collected pollen directly. After adding 800 µl of the lysis buffer CD1, the tubes were placed in a Retsch MM400 bead mill for 2.5 minutes at a frequency of 30 Hz. A subsequent centrifugation step was carried out at 15,000 × g for 5 minutes. All other steps were performed as described in the manufacturer's protocol, except for the elution step; here only 50 µl of the C6 solution was used.

## 3.2.6 PCR: Barcode choice

Several publications have addressed the quality of plant-specific barcoding regions (CBOL Plant Working Group 2009; Hollingsworth et al. 2011; Li et al. 2014; Kolter and Gemeinholzer 2020). Plastid markers like *mat*K, *rbc*L, *trn*H-*psb*A and others are mostly uniparentally inherited in the plant kingdom and have comparatively slow mutation rates (Birky 1988) and therefore primarily provide higher taxonomic identifications (Vijayan and Tsou 2010). The *rbc*L gene currently has the best taxonomic coverage in reference databases (Kolter and Gemeinholzer 2020); however, many sequences lack a voucher and have only limited metadata, which falls short of the gold standard for metabarcoding purposes. The non-coding plastid DNA regions often vary in fragment length throughout the plant kingdom (Palmer et al. 1988; Clegg et al. 1991) and comprehensive reference databases have not yet been established.

Here, we used the nuclear internal transcribed spacer region (ITS) for plant identification, which has strongly conserved primer binding sites in the adjacent rRNA genes. Kolter and Gemeinholzer (2021) and others showed that this region between the 18S, 5.8S and 28S rRNA genes provides a high probability of correct metabarcoding identification at the genus or species level throughout the plant kingdom. However, because it is a bi-parentally inherited multicopy region of the nuclear genome that is usually subject to concerted evolution, identification could be ambiguous in hybrids, polyploids, and apomictic species complexes, as well as in recently strongly radiating groups. Recommendations are to use both the ITS1 and the ITS2 region with the relatively uninformative 5.8S rRNA gene in between for plant identification; however, Illumina MiSeq sequencing only covers regions up to 350 bp (e. g. China Plant BOL Group et al. 2011; Kolter and Gemeinholzer 2021). Therefore, we recommend using either the ITS1 or the ITS2 region. The ITS1 region is very informative but varies in length, especially for gymnosperms, and therefore cannot be used for airborne pollen. For this reason, we used the ITS2 region for airborne pollen as well as Malaise trap plant trace identification. The ITS2 reference database has good species coverage, but with a large proportion of single individuals with no geographical coverage of genetic diversity. Kolter and Gemeinholzer (2021) tested various proposed primer combinations on mixed mock communities and evaluated the identification ability of ITS2. Following the recommendations of these authors, we used the following primer combination: ITS-3p62plF1 (ACBTRGTGTGAATTGCAGRATC) and ITS-4unR1 (TCCTCCGCTTATTKATATGC).
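
Both primers contain degenerate IUPAC bases (B, R, K), so each stands for several concrete oligonucleotides. The sketch below expands the two primer sequences given above into regular expressions and locates them in a read. The helper names and the toy read are assumptions for illustration; in the workflow described here, primer trimming is done with Cutadapt, not with this code.

```python
import re

# IUPAC nucleotide codes and the bases they stand for
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FORWARD = "ACBTRGTGTGAATTGCAGRATC"  # ITS-3p62plF1, as given in the text
REVERSE = "TCCTCCGCTTATTKATATGC"    # ITS-4unR1, as given in the text


def primer_to_regex(primer):
    """Turn a degenerate primer into a regular expression pattern."""
    return "".join(
        base if len(IUPAC[base]) == 1 else f"[{IUPAC[base]}]"
        for base in primer.upper()
    )


def reverse_complement(seq):
    """Reverse complement of a sequence, keeping IUPAC codes degenerate."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def degeneracy(primer):
    """Number of concrete sequences a degenerate primer represents."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def extract_insert(read, forward=FORWARD, reverse=REVERSE):
    """Return the sequence between both primers, or None if they are absent."""
    pattern = re.compile(
        "^" + primer_to_regex(forward) + "(.*)"
        + primer_to_regex(reverse_complement(reverse)) + "$"
    )
    match = pattern.match(read.upper())
    return match.group(1) if match else None


# Toy read: forward primer + made-up insert + reverse-complemented reverse primer
toy_read = "ACCTAGTGTGAATTGCAGAATC" + "GGGTTTAAACCC" + "GCATATCAATAAGCGGAGGA"
print(primer_to_regex(FORWARD), degeneracy(FORWARD))
print(extract_insert(toy_read))
```

The reverse primer binds the opposite strand, which is why its reverse complement is searched for at the 3' end of a forward-oriented read.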

#### 3.2.7 PCR chemicals and conditions

PCR allows for the amplification of specific barcoding regions as well as the tagging of individual samples with unique labels, as metabarcoding is achieved via parallel sequencing of multiple samples in one run (Lundberg et al. 2013; Schnell et al. 2015). The primers carry a long nucleotide tail, which consists of a flow cell binding adapter, a binding site for sequencing primers and a label for sample identification. These fusion PCR primers can be applied to the samples using one-step or two-step PCR (Lundberg et al. 2013). The 2-step PCR is the most cost-effective multiplexing method and has higher consistency and taxa detection efficiency than the 1-step approach, although PCR always introduces bias in the laboratory step, making quantity estimation unreliable. In addition, the 2-step PCR is highly robust against PCR-inhibiting substances (e. g. humic acids and tannins), since performing the PCR twice dilutes these inhibitors. The advantages of 1-step PCR are a shorter processing time, lower susceptibility to cross-contamination and a reduced PCR bias. This approach is therefore suitable for metabarcoding of clean samples where detecting the maximum number of species is not relevant. For more detailed and larger biodiversity assessments, 2-step PCR is the better choice (Zizka et al. 2019), which is why we recommend using it.

PCR was performed with three replicates per sample. Two DNA extraction blanks and two PCR blanks were added to every 96 well PCR plate. The Canadian Centre for Barcoding PlatinumTaq Protocol (Ivanova and Grainger 2007) was used, with the addition of 0.3 µl of BSA (10 ng/µl) and 1.25 µl of 50 % DMSO in a total volume of 12.5 µl per reaction. Initial denaturation was performed at 95 °C for 3 minutes, followed by 30 cycles of denaturation at 95 °C for 30 seconds, annealing at 50 °C for 30 seconds, and extension at 72 °C for 45 seconds. The final extension was performed at 72 °C for 8 minutes.
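
As a plausibility check, the cycling programme and the additive concentrations can be recomputed from the values given above. This is a sketch only: the class and function names are assumptions, ramp times between temperatures are ignored, and the 12 additional cycles run by the sequencing provider are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


# Programme as described in the text
INITIAL = Step("initial denaturation", 95, 3 * 60)
CYCLE = (
    Step("denaturation", 95, 30),
    Step("annealing", 50, 30),
    Step("extension", 72, 45),
)
FINAL = Step("final extension", 72, 8 * 60)
N_CYCLES = 30


def hold_time_seconds(initial, cycle, n_cycles, final):
    """Sum of all programmed hold times (ramping not included)."""
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


def final_concentration(stock_conc, added_volume_ul, total_volume_ul):
    """Dilution of an additive in the reaction: C2 = C1 * V1 / V2."""
    return stock_conc * added_volume_ul / total_volume_ul


total_s = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"programmed hold time: {total_s} s ({total_s / 60:.1f} min)")

# 1.25 µl of 50 % DMSO and 0.3 µl of BSA (10 ng/µl) in a 12.5 µl reaction
print(f"DMSO: {final_concentration(50, 1.25, 12.5):.1f} %")
print(f"BSA: {final_concentration(10, 0.3, 12.5):.2f} ng/µl")
```

The concentration units are taken from the text as stated; the calculation only applies the dilution relation to them.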

After PCR, 5 µl of each of the three PCR replicates of a sample were combined to a total volume of 15 µl and purified with 1.5 µl Exonuclease 1 (Thermo Fisher Scientific, Waltham, USA) with 37 °C incubation for 30 minutes followed by 80 °C incubation for 15 minutes. Illumina sequencing was performed at LGC Genomics GmbH (Berlin) on a MiSeq (2 × 300 bp) platform with 12 additional PCR cycles. For the additional PCR cycling, MyTaq™ Red Mix polymerase (Meridian Bioscience, Cincinnati, USA) was used; the programme consisted of three cycles with a low annealing temperature (15 sec 96 °C, 30 sec 50 °C, 90 sec 70 °C), followed by nine cycles with increased annealing temperature (15 sec 96 °C, 30 sec 58 °C, 90 sec 70 °C).

## 3.2.8 NGS-Sequencing

We recommend paired-end sequencing on a MiSeq or NextSeq platform (2 × 300 bp, Illumina, San Diego, USA), depending on the degree of multiplexing and the required read depth. This sequencing technology currently has the highest sequence accuracy and quality scores. Long-read sequencing can also be performed with a MinION (Oxford Nanopore Technologies, Oxford, United Kingdom), which has the benefit of being small and transportable, but due to the high error rates, species-level identification via metabarcoding is currently complicated. Another long-read sequencing platform is the Revio or Sequel platform (Pacific Biosciences, Menlo Park, USA). This PacBio sequencing platform has a shorter sequencing time than the MinION, but the sequencing equipment is very expensive, and the sequences have high error rates. In addition, this sequencing platform is significantly larger than the MinION and therefore cannot be taken into the field (Hu et al. 2021). Innovations in sequencing technology could lead to further developments in this area.

# 3.2.9 Metabarcoding data pipeline

The sequencing data were demultiplexed by index sequences into three different files, and primers and index sequences were trimmed with Cutadapt (Martin 2011). Read quality was then checked with FastQC (Andrews 2012). After quality control, the R package DADA2 (Callahan et al. 2016) was used to estimate the error rate, followed by denoising with the error profile using a pseudo-pooling function. Forward and reverse reads were merged using DADA2 (Callahan et al. 2016). Chimeras from denoised and merged read pairs were removed using the removeBimeraDenovo function of the DADA2 package (Callahan et al. 2016). The resulting amplicon sequence variant (ASV) table was used for ASV identification by comparison with the PLANiTS2 database (Banchi et al. 2020), performed with the DADA2 assignTaxonomy function (Callahan et al. 2016). Currently, the PLANiTS database is the most comprehensive source of curated, reliable and updated ITS reference sequences. However, this may change in the future as initiatives (e. g. NORBol and ABol) aim to create national reference databases. Fungal contamination was confirmed and removed by performing a local BLASTn search (Camacho et al. 2009) of the plant sequences against a custom fungal ITS BLAST database. The final species list was thoroughly checked to remove potential laboratory contaminations or wrongly assigned sequences. ASVs with ambiguous species identifiers received only genus-level identifiers. ASVs from the DNA extraction and PCR blanks were used to calculate a relative abundance threshold; ASVs below this threshold were removed.
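
The blank-based filtering step at the end of the pipeline can be sketched as follows. The exact threshold definition is not specified above, so this example assumes one possible interpretation: the threshold is the largest share of the run's total reads that any single ASV reaches in the blanks, and an ASV is dropped from a sample when its relative abundance in that sample falls below this value. All names and read counts are hypothetical.

```python
def blank_threshold(blanks, samples):
    """Largest share of all reads in the run that one ASV reaches in the blanks.

    ASSUMPTION: one possible threshold definition, not necessarily the
    authors' exact rule.
    """
    run_total = sum(
        sum(counts.values())
        for counts in list(blanks.values()) + list(samples.values())
    )
    if run_total == 0:
        raise ValueError("run contains no reads")
    reads_in_blanks = {}
    for counts in blanks.values():
        for asv, n in counts.items():
            reads_in_blanks[asv] = reads_in_blanks.get(asv, 0) + n
    return max(reads_in_blanks.values(), default=0) / run_total


def filter_samples(samples, threshold):
    """Keep only ASVs whose within-sample relative abundance reaches the threshold."""
    filtered = {}
    for name, counts in samples.items():
        total = sum(counts.values())
        filtered[name] = {
            asv: n for asv, n in counts.items()
            if total > 0 and n / total >= threshold
        }
    return filtered


# Hypothetical read counts per ASV (NOT data from the study)
samples = {
    "tube_01": {"ASV1": 9000, "ASV2": 990, "ASV3": 10},
    "tube_02": {"ASV1": 4000, "ASV4": 5980, "ASV3": 20},
}
blanks = {
    "extraction_blank": {"ASV3": 40},
    "pcr_blank": {"ASV3": 20},
}

threshold = blank_threshold(blanks, samples)
cleaned = filter_samples(samples, threshold)
print(f"threshold: {threshold:.5f}")
print(cleaned)
```

In practice this step would operate on the ASV table produced by DADA2 in R; plain Python dictionaries are used here only to keep the sketch self-contained.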

### 3.3 Results and discussion

Several studies have demonstrated metabarcoding to provide identifications of pollen to a lower taxonomic level than traditional morphological methods, and identification of plant traces from digested food material or plant fragments is nearly impossible with morphological characters (Richardson et al. 2015; Macgregor et al. 2019; Bänsch et al. 2020; Swenson et al. 2021; Polling et al. 2022; Kilian et al. 2023). As part of the project, we optimised the plant metabarcoding pipeline for plant traces from Malaise traps as well as from the air. The collection methods, initial laboratory processing and DNA extraction methods are different for these sample types, but the same workflow can be used for PCR, sequencing and data analysis.

We have developed a volumetric air sampler that is powered via an external connection or a solar cell and battery, and that delivers samples compatible with historical pollen counts. There are several passive pollen traps worldwide that do not require electricity, for example the Durham pollen trap (Durham 1928) or the Tauber trap (Tauber 1974; further devices are reviewed, for example, in Giesecke et al. 2010). However, these devices collect different amounts of pollen, airborne pollutants and non-pollen contaminants, so they were not considered for use in the context presented here. The A1 Volumetric Air Sampler (Khan et al. 2022) is active both with and without wind and has a controlled extraction of airborne particulates. The newly integrated, automatically operating control system makes it possible to address completely new questions, for instance day-night cycles or temperature-controlled pollen release in plants, about which very little is currently known. Since mice had nested in the first prototype over the winter, all openings on the device were reduced in size; however, regular checks and additional rodent deterrents are recommended.

In the workflow presented here, molecular processing and sequencing of samples must be performed in the laboratory. However, analysis of environmental samples in the field will be feasible in the near future. Miniaturization of next generation sequencing (NGS) in the field using the MinION (Oxford Nanopore Technologies) is already possible. However, nanopore sequencing with the MinION currently still struggles with high error rates in sequences, especially those with a high GC bias (e. g. Delahaye and Nicolas 2021), which is an issue in several plant families. For this reason, we implemented the higher-quality MiSeq sequencing technology, which can only be performed in the lab. Portable thermal cyclers for on-site PCR experiments (e. g. miniPCR, https://www.minipcr.com/portable-pcr-testing-the-minipcr-for-dna-sequencing-in-the-field/) and also DNA extraction kits are already available on the market, but these are not yet optimized for plant traces and pollen.

Caution must be taken in pollen metabarcoding and we stress the importance of a sterile laboratory infrastructure. Contamination from airborne pollen is extremely high in insect-mediated pollen studies (e. g. Bell et al. 2019), especially in the spring and summer months; however, this is nearly unavoidable due to its pervasiveness in the environment. For this reason, we recommend environmental collection blanks to aid in accounting for insect-mediated pollen collection that is due to passive contact. With the approach presented here, the risk of contamination can be significantly reduced, but the additional costs for optimal infrastructure and laboratory conditions are high.

The best practice recommendations during laboratory procedures presented here are critical due to the high sensitivity of the sequence detection method. Bell et al. (2017), Swenson and Gemeinholzer (2021) and Kolter and Gemeinholzer (2021) have recommended the use of DNA extraction and PCR negative controls for the study of plant-pollinator interactions due to the frequency of contamination and false positive reads. The use of sterile laboratory plastic, filtered pipette tips, and regular and careful DNA decontamination is strongly recommended.

For the analysis of metabarcoding data, complete reference databases are needed, both regarding the taxa represented and replicates of these, in order to accurately identify all species in a given sample. Universally available databases that are frequently updated would increase the comparability and utility for plant metabarcoding projects worldwide.

Despite optimized metabarcoding workflows and reference databases, species-level identification will not be possible for some rapidly evolving plant groups due to hybridization, polyploidization, apomixis and adaptive radiation. The incorporation of additional molecular DNA barcodes poses an assignment problem and is expensive. For species complexes, this is unlikely to be a solution. Due to the highly variable size of plant genomes, whole genome sequencing (WGS) or reduced complexity WGS is also currently unrealistic. However, using deep learning to complement metabarcoding results with existing vegetation information on local occurrences and biological processes is promising, but still requires the development of algorithms and the crosslinking of already existing databases. Furthermore, complementing metabarcoding with multispectral flow cytometry could enable not only species identification but also abundance estimation (Dunker et al. 2021).

Further development of plant metabarcoding is promising and also serves to achieve UN development goals by monitoring not only species and ecosystems, but also genetic diversity. Swenson and Gemeinholzer (2021) demonstrated that Red List species and neophytes can be detected for conservation purposes via metabarcoding of Malaise trap plant traces. There is a growing awareness that biodiversity monitoring is becoming increasingly important not only in nature conservation, but also on agricultural land and in urban environments (Perino et al. 2022).

The technology developed here can be used to detect changes in biodiversity and to set up early warning systems that provide a basis for greater precision as technology develops. Plant-insect interactions and aerial pollen flight times can be monitored in a high-throughput process with unprecedented accuracy and speed, yielding insights into biodiversity interactions that were previously not obtainable.

# **References**



works of caraway (*Carum carvi* L.). Molecular Ecology 32(13): 3702–3717. https://doi.org/10.1111/mec.16943


post-2020: Closing the gap between global targets and national-level implementation. Conservation Letters 15: e12848. https://doi.org/10.1111/conl.12848


# **4 Non-destructive DNA extraction and metabarcoding of arthropod bulk samples: a step-by-step protocol**

Vera M. A. Zizka, Kathrin Langen, Ameli Kirse, Alice M. Scherges, Sarah J. Bourlat

#### **Abstract**

In the AMMOD project, flying insects are collected using Malaise traps equipped with automated bottle changers (see Chapter 5). Insect catches are identified using DNA metabarcoding, a promising tool for biodiversity assessments, especially targeting highly diverse groups such as the arthropods. Non-destructive DNA isolation methods are highly desirable for the preservation of sample integrity and subsequent morphological analysis of specimens. In this chapter, we present a comprehensive step-by-step laboratory protocol for the non-destructive DNA extraction of insect bulk samples from lysis buffer followed by all subsequent amplicon library preparation steps required for the sequencing of insect bulk samples on Illumina platforms.

#### **Keywords**

DNA isolation, lysis buffer, fixative, sample integrity, non-destructive, metabarcoding, amplicon library preparation

## 4.1 Introduction

While the morphological identification of every individual specimen in complex arthropod bulk samples is extremely time consuming and even impossible for certain taxonomic groups, DNA metabarcoding represents a fast and reliable tool, enabling high-resolution biodiversity assessments. The method is based on the bulk isolation of genomic DNA and the subsequent amplification of a specific marker gene fragment from mixed samples (Taberlet et al. 2012). For the arthropods, a fragment of the mitochondrial cytochrome oxidase 1 (COI) DNA barcode region is typically used (see Bruce et al. 2021 for barcode markers commonly used for metabarcoding in various organism groups). So far, most studies targeting terrestrial arthropods destroy the sample by homogenization, which prevents retrospective analysis of specimens (Marquina et al. 2019; Zenker et al. 2020). However, retrospective analysis of samples can be an important tool to improve metabarcoding resolution, describe new species, complement reference databases by single specimen barcoding or apply further innovative techniques (e. g. image recognition) for the calculation of biodiversity estimates (Carew et al. 2018; Nielsen et al. 2019; van Klink et al. 2022; Wührl et al. 2022). While the isolation of DNA from sample fixatives ensures sample integrity and has been successfully applied to bulk samples of freshwater invertebrates (Zizka et al. 2019), inconsistent results have been reported for terrestrial arthropod samples, e. g. for highly diverse Malaise trap samples. As an alternative, arthropod mixtures can be incubated in specific lysis buffer solutions, ensuring a mild lysis of samples and allowing DNA release from the specimens into the solution. Different non-destructive protocols have been tested, including varying buffer compositions (Giebner et al. 2020; Ji et al. 2020; Kirse et al. 2021). Here, we present a detailed step-by-step protocol for a non-destructive approach which produces results comparable, in terms of species number and composition, to those of destructive extraction from sample tissue (Kirse et al. 2023).

### 4.2 Laboratory protocol

The metabarcoding workflow introduced here is a two-step PCR protocol based on Bourlat et al. 2016 and allows the processing of samples in a 96-well plate format as described in Elbrecht and Steinke 2018. The workflow can be adjusted to smaller sample numbers using e. g. single reaction kits. All reusable materials need to be thoroughly cleaned with DNA AWAY or 10 % bleach and UV irradiated to remove all DNA traces before each use. Insects are size sorted into two size fractions (< 4 mm, ≥ 4 mm) using a wire mesh sieve prior to DNA extraction and library preparation to ensure detection of smaller sized specimens.

# 4.2.1 Materials and reagents

#### 4.2.1.1 Size sorting of insect samples


#### 4.2.1.2 DNA extraction reagents


#### 4.2.1.3 Gel electrophoresis


#### 4.2.1.4 Amplicon library: two-step PCR protocol for Illumina platforms

	- forward primer fwhF2 TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG **GGDACWGGWTGAACWGTWTAYCCHCC** (Vamos et al. 2017)
	- reverse primer Fol\_degen\_rev GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG **TANACYTCNGGRTGNCCRAARAAYCA** (Yu et al. 2012)
	- i7: CAAGCAGAAGACGGCATACGAGAT TCGCCTTA GTCTCGTGGGCTCGG
	- i5: AATGATACGGCGACCACCGAGATCTACAC CTCTCTAT TCGTCGGCAGCGTC


#### 4.2.1.5 Library normalisation


### 4.2.1.6 Library Pooling


### 4.2.1.7 Concentration measurement


# 4.2.1.8 Solid Phase Reversible Immobilisation (SPRIselect) size selection


#### 4.2.1.9 Final library check


# 4.2.2 Methods

#### 4.2.2.1 Size sorting of insect samples

(See Note 3)


### 4.2.2.2 DNA extraction

	- 1st centrifugation step: centrifuge at 4100 g for 2 min to pellet the cell debris.


## 4.2.2.3 Agarose gel check


# 4.2.2.4 Amplicon library preparation: two-step PCR protocol for Illumina platforms

PCR 1 (amplicon PCR):


reaction volume of 25 µl. To save resources, half reaction volumes can also be used.


PCR 2 (index PCR): Incorporation of Illumina index adapters:


#### 4.2.2.5 Library normalisation

(See Note 5)


#### 4.2.2.6 Library pooling

(See Note 6)

1. Pool 10 µl of each sample into one pool. If you have several plates, first pool the 96 samples of each plate separately.

2. If the sample number exceeds 96 and samples are processed in plate format, final library concentration can be improved if concentration is measured for each plate separately. Pooling of plates can then be conducted equimolarly.
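
The equimolar pooling of plate pools mentioned in step 2 can be sketched as follows. This is a minimal illustration and not part of the published protocol: it assumes double-stranded DNA with an average mass of 660 g/mol per base pair, and the function names, the fragment length, the concentrations and the target amount are hypothetical values chosen for the example.

```python
def ng_per_ul_to_nm(conc_ng_per_ul: float, fragment_bp: int) -> float:
    """Convert a dsDNA mass concentration (ng/µl) to molarity (nM).

    Assumes an average mass of 660 g/mol per base pair.
    """
    return conc_ng_per_ul * 1e6 / (660.0 * fragment_bp)


def equimolar_volumes(conc_ng_per_ul: dict, fragment_bp: int, fmol_per_pool: float) -> dict:
    """Volume (µl) of each plate pool that contributes the same amount in fmol."""
    volumes = {}
    for name, conc in conc_ng_per_ul.items():
        molarity_nm = ng_per_ul_to_nm(conc, fragment_bp)  # 1 nM equals 1 fmol/µl
        volumes[name] = fmol_per_pool / molarity_nm
    return volumes


# Hypothetical example: two plate pools with measured concentrations in ng/µl.
plate_conc = {"plate_1": 4.0, "plate_2": 8.0}
volumes = equimolar_volumes(plate_conc, fragment_bp=450, fmol_per_pool=50.0)
```

In this example the more concentrated pool contributes half the volume of the other, so both plates enter the final library with the same number of molecules.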

#### 4.2.2.7 Size selection and purification

Proceed with a left side size selection of the sample pool using magnetic beads at a ratio of 1 : 0.7 (PCR product : beads) to remove primer dimers (SPRIselect, Beckman Coulter) using a slightly modified protocol (see below). Depending on the sample and fragment size, other PCR product to beads ratios may be considered (see Users Guide SPRIselect, Beckman Coulter).

Preparation steps:


#### Clean up:


16. Pipet the size selected solution into one new safe-lock tube.

#### 4.2.2.8 Library concentration


# 4.3 Notes

**Note 1:** The primer pair used here, from Vamos et al. 2017 and Yu et al. 2012, targets a 313 bp long fragment of the mitochondrial cytochrome oxidase 1 (COI) DNA barcode region, typically used for the detection of arthropods. This 313 bp fragment is long enough for accurate taxonomic assignment and short enough to allow overlap of fragments during paired-end merging after 2 × 250 bp sequencing on Illumina platforms.
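
The overlap argument in Note 1 can be checked with simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, quality trimming is ignored, and whether the 313 bp figure includes the gene-specific primers is not stated here, so both cases are computed. The primer lengths are taken from the gene-specific parts of the primer sequences listed in section 4.2.1.4.

```python
FWD_PRIMER = "GGDACWGGWTGAACWGTWTAYCCHCC"  # fwhF2, gene-specific part
REV_PRIMER = "TANACYTCNGGRTGNCCRAARAAYCA"  # Fol_degen_rev, gene-specific part


def paired_end_overlap(fragment_bp: int, read_length_bp: int) -> int:
    """Number of bases covered by both reads of a pair (0 if the reads do not meet)."""
    overlap = 2 * read_length_bp - fragment_bp
    # The overlap can be neither negative nor longer than the fragment itself.
    return max(0, min(overlap, fragment_bp))


# Assumption 1: the sequenced fragment is 313 bp.
overlap_without_primers = paired_end_overlap(313, 250)

# Assumption 2: both gene-specific primers are sequenced in addition to the 313 bp.
overlap_with_primers = paired_end_overlap(313 + len(FWD_PRIMER) + len(REV_PRIMER), 250)

# For comparison: a 2 x 150 bp kit would not bridge a 313 bp fragment.
overlap_150 = paired_end_overlap(313, 150)
```

Under the first assumption 187 bp are covered by both reads, under the second 135 bp, whereas 2 × 150 bp reads leave no overlap at all.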

**Note 2:** Different Illumina indexing strategies exist, including unique dual indexing and combinatorial indexing. In general, unique dual indexing, which requires distinct, unrelated index sequences for each of the i5 and i7 index reads, will allow the highest accuracy and allow detection of index hopping (https://support.illumina.com/bulletins/2018/08/understanding-unique-dualindexes--udi--and-associated-library-p.html). A list of Illumina Nextera i5 and i7 indexes is available here (https://support-docs.illumina.com/SHARE/AdapterSeq/Content/SHARE/AdapterSeq/Nextera/DNAIndexesNXT.htm).

**Note 3:** Here, insects are size sorted into two size fractions (< 4 mm, ≥ 4 mm) using a wire mesh sieve prior to DNA extraction and library preparation to ensure detection of smaller sized specimens. Mesh size and number of fractions needed will depend on the range of insect sizes in the sample which is highly variable depending on sampling location (temperate versus tropical). Size fractions can be either processed separately, or pooled again at the lysate stage in a ratio favouring the small size fraction and thereby enriching the sample with the smaller specimens (Elbrecht et al. 2021).

**Note 4:** Note on the use of experimental controls: In general the use of DNA extraction negative controls as well as positive controls is recommended. See Bruce et al. 2021, Chapter 6.3 for details on all types of experimental controls, including field, laboratory and sequencing controls. Positive controls used in metabarcoding should consist of mock communities of known species composition. Ideally, the mock community species should not be expected to occur in the samples being processed so that potential cross contamination can be detected (e. g. tropical species can be used, where temperate samples are analysed). For the layout of negative controls on the 96 well plate, a good example is shown in Elbrecht and Steinke 2018. All positive and negative controls should be sequenced and included in subsequent bioinformatic analysis steps.

**Note 5:** Normalisation of the samples is required to ensure equal sequencing depth across samples. Formerly, this was done manually, but this process can be highly simplified for large sample numbers using a limited binding capacity solid phase kit in plate format. The SequalPrep kit (Thermo Fisher Scientific) yields up to 25 ng of DNA per well when starting with at least 250 ng of PCR product.

**Note 6:** Most metabarcoding projects are sequenced on Illumina platforms due to low error rates and long read lengths. The number of samples that can be pooled depends on throughput of the selected Illumina sequencing platform and desired sequencing depth for each sample (e. g. species rich samples with higher biomass should have a higher sequencing depth, of at least 200,000 reads per sample). Similarly, amplicon length will determine which sequencing kit should be used (e. g. MiSeq 250PE, HiSeq 150PE or 250PE). Currently, the NovaSeq 6000 system provides the highest sequencing depth and throughput of all Illumina platforms, and is set to replace the HiSeq 2500 which is now discontinued (https://www.illumina.com/systems/sequencing-platforms/novaseq/specifications.html).
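
The relationship between platform throughput, sequencing depth and the number of samples per run described in Note 6 can be sketched as below. The function names are hypothetical and the run output of 20 million reads is an invented example value, not a specification of any Illumina kit; the actual output should be taken from the manufacturer's specifications. Controls are counted as libraries because they are sequenced alongside the samples (see Note 4).

```python
def libraries_per_run(reads_per_run: int, reads_per_library: int) -> int:
    """Number of libraries that reach the desired depth in one sequencing run."""
    if reads_per_library <= 0:
        raise ValueError("reads_per_library must be positive")
    return reads_per_run // reads_per_library


def samples_per_run(reads_per_run: int, reads_per_library: int, n_controls: int) -> int:
    """Libraries left for field samples after reserving depth for the controls."""
    return max(0, libraries_per_run(reads_per_run, reads_per_library) - n_controls)


# Hypothetical run output; the depth of 200,000 reads per sample follows Note 6.
n_samples = samples_per_run(reads_per_run=20_000_000,
                            reads_per_library=200_000,
                            n_controls=4)
```

With these example values, 100 libraries fit on the run, leaving room for 96 samples next to four controls.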

# 4.4 Challenges and recommendations

While sample integrity is fully preserved when DNA is isolated from sample fixative (e. g. through ethanol filtration or evaporation), the incubation time in lysis buffer solution directly influences how well the sampled specimens are preserved. After two to four hours of incubation, no decrease in sample integrity was observed, but the abdomens of especially small specimens with soft body structures started to dissolve after eight hours of lysis (Kirse et al. 2023). Incubation time of samples in lysis buffer should therefore be as short as possible, which can also counteract overrepresentation of large taxa in final sequencing read allocation (Iwaszkiewicz-Eggebrecht et al. 2022). In addition, it is recommended to avoid subsampling of the original lysis buffer volume in the metabarcoding workflow, since the highest diversity estimates were reached when DNA was isolated from the total lysis buffer solution. Similar patterns have been observed for destructive protocols from arthropod bulk samples, showing an increase in diversity estimates if more ground tissue subsamples are processed (Zizka et al. 2022). However, avoiding subsampling of the initial sample volume results in higher costs per sample, and applicability should be evaluated for individual studies. In general, non-destructive DNA isolation from lysate solution is associated with higher costs than destructive extraction from homogenised tissue as well as from sample fixatives such as ethanol. Reducing the volume of Proteinase K per sample has already been successfully tested (Kirse et al. 2021) and would be one option to decrease overall costs in this protocol.

# 4.5 Application and outlook

The non-destructive DNA metabarcoding protocol from commercial lysis buffer solution presented here yields biodiversity estimates (species number and composition) of a complex arthropod mixture that are comparable to those from destructive extraction from homogenised sample tissue (Kirse et al. 2023). Given appropriate incubation times, the protocol ensures sample integrity, which is a prerequisite for retrospective analysis. Additionally, the protocol circumvents the handling of finely homogenised tissues, thereby ensuring a lower risk of cross-contamination through tissue dust particles, which can be a problem in DNA-based biodiversity assessments handling samples with low DNA concentration. Sample integrity is also a key factor for the applicability of DNA metabarcoding to monitoring schemes where large sample numbers are processed at high frequencies and samples may be archived for long-term time series. While preservation of the original samples is ensured with this protocol, high costs still hinder applicability to large-scale monitoring schemes. While reducing Proteinase K in the lysis solution is one attempt to reduce costs (Kirse et al. 2021), the effect on DNA degradation in the sample and resulting biodiversity estimates still needs to be tested. As mentioned earlier, the highest diversity estimates were obtained when the total lysate volume was processed in the metabarcoding workflow. Aside from increased cost, this also increases processing time if the samples are processed in plate format (96 samples in parallel), as it necessitates a centrifugation step with sample volumes > 20 ml. The processing of samples in plate format should however be standard, as it is more compatible with large-scale sampling campaigns such as in the AMMOD project where sample collection is automated. Processing samples in plate format also allows the implementation of laboratory automation, such as the use of DNA extraction and liquid handling robots (Buchner et al. 2021). As an alternative to this protocol, DNA isolation from sample fixative (e. g. ethanol) through filtration is an option, although previous studies show that this non-destructive approach yields different community compositions than tissue-based extraction in complex arthropod bulk samples (Marquina et al. 2019). Further research and protocol adjustments (variable filtering steps) could lead to more accurate biodiversity estimates.

# **References**



# **5 Development of an automated Malaise trap multisampler**

Ameli Kirse, Mario Paja, Lukas Reinhold, Sarah J. Bourlat, Wolfgang J. Wägele

#### **Abstract**

The development and refinement of automated biodiversity assessment technologies could be a game changer for the identification of drivers of ongoing insect declines. In the AMMOD ("Automated Multisensor Stations for Monitoring of BioDiversity") project, experts from various disciplines joined forces to develop an autonomous multi-sensor station for the assessment of biodiversity, which can provide species occurrence data across trophic levels and taxonomic groups (Wägele et al. 2022). Insects are one of the most diverse groups on earth and are routinely used as bioindicators for the health of ecosystems. Using metabarcoding, laboratory workflows can be automated, processing hundreds of samples in parallel and resulting in the rapid identification of thousands of insects. However, insect collection in the field is still the limiting factor, and requires, depending on study design, large capacities in human resources associated with high costs. Long-term monitoring studies are therefore often hindered by staff and budget shortages, especially in remote, inaccessible, and rural areas. Here we present an automated Malaise trap multisampler, designed for the autonomous capture of flying insects over a period of up to six months without human intervention. The AMMOD multisampler can collect up to 12 individual bulk samples at programmable intervals and is equipped with a communication system which keeps the user informed about the status of the system.

## 5.1 Introduction

#### 5.1.1 Background

In times of climate change and biodiversity declines the protection of our ecosystems is of major importance. Insects, one of the most diverse groups on earth, are indispensable for the function of ecosystems as they provide fundamental services such as pollination (Badenes-Pérez 2022), decomposition of biomass (Marschalek and Deutschman 2022) and pest control (Campbell et al. 2017). Therefore, data on composition, shifts in the structure or decline of local insect communities allow us to draw conclusions about the health of the associated habitats (Brown 1997). Permanent, long-term biodiversity monitoring sites could provide the long time series needed for this and serve as early-warning systems for a decrease in ecosystem health. However, traditional biodiversity surveys are time consuming, costly and highly human resource intensive. In Germany, a dramatic loss of insect diversity advanced unnoticed for many years due to the lack of thorough insect monitoring (Hallmann et al. 2017).

Metabarcoding, a genetic approach developed in the last 10 years with the emergence of next-generation sequencing, enables the rapid identification of several thousand specimens in parallel, partly circumventing the increasing shortage of taxonomic experts. Despite these technological advances in the lab, permanent biodiversity surveys still require human intervention to capture specimens for monitoring on a regular basis. Many of the traditionally used insect traps like Malaise traps, vane traps and pitfall traps are usually emptied in a bi-weekly cycle, depending on weather conditions, season and chosen fixative, to ensure specimen quality. Especially in rural areas or in extreme environments, which are often only poorly accessible, permanent insect surveys are often excessively difficult or virtually impossible, unless they rely on a team of local volunteers to empty the traps. Here, we present a prototype for an automated insect trap which can collect up to 12 bulk samples without human intervention.

As a passive flight interception trap, the Malaise trap has a broad target spectrum, including all insects actively flying through the habitat but also some ground dwelling taxa (Geiger et al. 2016). Among others, the most commonly found taxa in Malaise trap catches from Germany are the Diptera, Hymenoptera, Lepidoptera, Hemiptera and Coleoptera (Geiger et al. 2016; Kirse et al. 2021). The first Malaise trap was designed in 1934 by the Swedish hymenopterist René Edmund Malaise, who was based at the Riksmuseum in Stockholm (Malaise 1937). Over the years he modified the structure and design of the trap several times to allow for newly arising purposes (Uhler et al. 2022). Today, several types of Malaise traps are available, which differ in colour and mesh size (Uhler et al. 2022). One of the most commonly used Malaise trap types is the bi-coloured Townes trap (Townes 1962; Matthews and Matthews 1983). The set-up resembles a white mesh tent which is separated into two halves along its central axis by a black mesh panel. The black mesh functions as a barrier for flying insects, which try to escape by flying upwards towards the brighter white mesh roof until they reach the apex of the trap. The apex consists of a reversed bottle (from here on referred to as trapping bottle) which the insects enter through a 10 cm² side opening. From here, insects are directed towards a collection bottle, which is connected to the lower outlet of the trapping bottle. The collection bottle is filled with a preservative fluid (e. g. ethanol or propanediol) to ensure the best possible preservation of the morphology and DNA of specimens. Depending on habitat and insect activity the collection bottle should be exchanged every one to two weeks.

## 5.1.2 The automated AMMOD multisampler

The automated AMMOD multisampler replaces the traditionally used collection bottle. The AMMOD multisampler has a housing measuring 66 cm × 66 cm × 35.5 cm which is mounted onto a tripod stand (Nedo, heavy-duty Aluminium Tripod, Ref.-No. 200204). The height of the tripod stand must be manually adapted to the height of the apex of the trap, which should be around 180–190 cm above ground. Here, we used a Malaise trap which was designed by the Entomological Society Krefeld (Hallmann et al. 2017), and is very similar to the bi-coloured Townes model (Townes 1962) but differs in two important features: First, the black mesh is slightly extended so that the Malaise trap can be tightly anchored to the ground, preventing insects from escaping between ground and mesh (Hallmann et al. 2017). Secondly, the trapping bottle is at a slightly angled position to lower the risk of water running into the collecting bottle on rainy days. To connect the AMMOD multisampler with the Krefeld Malaise trap, a 3D printed plug-in connection adapter sits on top of the AMMOD multisampler box (Figure 1). The counterpart adapter can be screwed onto the trapping bottle (S65 thread) before being plugged onto the connection adapter.


**Figure 1.** Setup of a Townes Malaise trap (Krefeld model) equipped with an automated AMMOD multisampler.

The AMMOD multisampler consists of six subsystems: (1) the rotary mechanism, (2) the user interface, (3) the external sensors and modules, (4) the stopping mechanism, (5) the energy supply and (6) the microcontroller (Figure 2).

#### 5.2 The microcontroller

The entire system is controlled by an Arduino Mega 2560 microcontroller which is connected to each of the five other subsystems. Like all microcontrollers the Arduino comes with a central processing unit (CPU), input/output (I/O), memory, and peripherals. The programming language used is 'Arduino language', which is based on the C/C++ programming languages. The Arduino board is comparatively cheap and comes with the freely available development tool 'Arduino Integrated Development Environment' (IDE) used to write code and upload it to the board. The Arduino IDE can be freely downloaded from https://arduino.cc/en/software and is available for all major platforms (Windows, Linux and Mac). Another great advantage of Arduino is the wide variety of available libraries, boards and extensions, making the system modular and easily adaptable for future requirements.

**Figure 2.** The electronic set-up of the AMMOD-multisampler consists of six subsystems: (1) rotary mechanism, (2) stopping mechanism, (3) external sensors and modules, (4) user interface (5) energy supply and the microcontroller which connects all subsystems.

The Arduino Mega 2560 was chosen because of its low power consumption, paired with a high number of pins enabling the connection of several modules. Additionally, it comes with a pull-up resistor, allowing the connection of low-power sensors and modules. In contrast to other microcontrollers, the Arduino Mega 2560 supports up to 12 V DC input power due to a built-in voltage regulator (see section 5.7, 'Energy supply').

# 5.2.1 Controller code

The code of the controller is separated into 4 main sections (Figure 3): (1) Core functions, (2) Data log, (3) Sampling programs and (4) human machine interface (HMI).

• The core function consists of the global functions, declarations, and libraries which are used throughout the code and determine the most basic functionalities of the controller such as display refresh rate, turning on/off the display when there is no human interaction etc.


In addition to the controller code, there are two further programs which are used to assist the development.


# 5.3 The rotary mechanism

The rotation plate system (RPS) is installed under the upper cover plate. The rotation plate system consists of three matching round plexiglass plates which are vertically aligned and fixed by a steel pole through the centre of the plates. The lower and the middle plate are located approximately 10.5 cm from each other (from here on referred to as lower RPS). Thirteen plexiglass dividers ensure equal distance and alignment of the two plates. The upper plate is located approximately 2.5 cm above the middle plate and is fixed with plexiglass dividers, again to ensure equal distance between the two plates (from here on referred to as upper RPS).

**Figure 3.** Flowchart visualizing main code functionalities.

The lower RPS is divided into 13 equal-sized sections by the plexiglass dividers, each providing a screw-in connector consisting of a S65 thread fitting for commercial wide mouth bottles (Kautex 1000 ml) and an opening towards the upper cover plate. Openings are located on a circle around the centre of the rotation plate with equal distance to each other. When rotating, each opening is pushed under the connection adapter, forming a passage between the trapping bottle and associated collection bottle. To prevent insects from escaping through the small gap between the upper side of the rotation plate and lower side of the lid plate, openings are surrounded by Teflon foil on the upper side. A total of 12 collection bottles can be screwed into the multisampler using screw-in connector positions 1 to 12. Position 13 should remain empty to ensure that insects can escape from the trap when the AMMOD multisampler is in standby mode (e. g. between, before, and after sample trials).

The upper RPS is surrounded by a toothed belt which is driven by a 24 V DC electric wiper motor. The motor is controlled by an H-bridge which is connected to the microcontroller. As the system uses a 12 V battery for power supply, the motor operates at half speed, allowing enough time for signal processing between the stopping mechanism sensors (see section 5.4) and the microcontroller.

# 5.4 The stopping mechanisms

To align trapping and collection bottle two sensors are implemented: (1) a stopping switch and (2) a magnetic switch (Figure 4). The stopping switch (push button) is tripped by the rotation plate. Each bottle position is exactly marked by a protruding convex curvature on the rotation plate's margin. While turning, the stopping switch is tripped by the next of the 13 protrusions of the rotation plate. The exact bottle number is simultaneously assessed by the magnetic switch. The magnetic switch is positioned underneath the lower rotation plate. Each bottle position is marked with a unique combination of four magnets embedded on the bottom side of the lower rotation plate. While rotating, the magnetic switch assesses the magnet combination and interprets it as a binary number. If the bottle number (signal sent by magnetic switch) and position (stop switch) match the microcontroller bottle request, the required collection bottle is aligned to the trapping bottle and the system will stop.

**Figure 4.** Inside view of the AMMOD automated multisampler.
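
The stopping logic described in this section can be illustrated with a short sketch. It is written in Python for readability and is not the Arduino firmware: the function names, the bit order (most significant magnet first) and the mapping of binary codes to bottle positions are assumptions. Four magnets give 16 possible codes, which is enough to label the 13 positions.

```python
N_POSITIONS = 13  # 12 collection bottles plus one empty standby position


def decode_position(magnet_bits) -> int:
    """Interpret four magnet readings (most significant bit first) as a binary number."""
    if len(magnet_bits) != 4:
        raise ValueError("expected exactly four magnet readings")
    value = 0
    for bit in magnet_bits:
        value = (value << 1) | (1 if bit else 0)
    return value


def should_stop(requested_position: int, magnet_bits, stop_switch_pressed: bool) -> bool:
    """Stop only if the plate sits exactly on a position and it is the requested one."""
    if not 1 <= requested_position <= N_POSITIONS:
        raise ValueError("requested position out of range")
    return stop_switch_pressed and decode_position(magnet_bits) == requested_position


# Hypothetical readings: code 0101 (position 5) while the stop switch is tripped.
stop_now = should_stop(5, [0, 1, 0, 1], stop_switch_pressed=True)
```

Requiring both signals means that a correct magnet code read between two protrusions does not stop the plate, which mirrors the combination of stop switch and magnetic switch described above.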

# 5.5 Additional sensors and modules

# 5.5.1 Additional sensors

To allow for a wider variety of research questions, but also to monitor the system itself, the AMMOD multisampler is equipped with several sensors. The sensors can be divided into two categories: environmental sensors and system sensors.

#### 5.5.1.1 Environmental sensors

The environmental sensors periodically collect air and soil temperature, relative soil moisture, relative air humidity, and illuminance. All sensors are located outside of the control system box but are physically connected to the control system with (long) cables. On the one hand, this allows for more flexibility in the choice of sensor position; on the other hand, it bears the risk that wildlife gets entangled in the cables. Therefore, sensor position should be thoroughly considered. While two different sensors are implemented for soil moisture (Capacitive Soil Moisture Sensor v.1.2) and soil temperature (DS18B20), a single sensor measures air temperature and humidity (DHT-22). However, this sensor is not waterproof and should under all circumstances be located beneath the AMMOD multisampler. To measure illuminance, a BH1750 light sensor has been implemented. To fulfil its purpose the sensor must be exposed to sunlight, although it is not waterproof. To avoid sensor failure, the sensor is placed inside an acrylic glass box on top of the multisampler.

#### 5.5.1.2 System sensors

The system sensors are implemented to monitor system-related data, to check for possible errors and to ensure the quality of the measured environmental data. In detail, two parameters are monitored: controller enclosure data and system stand.

The system stand data indicate whether the system is in an upright position. As the AMMOD multisampler is comparatively heavy and mounted to a tripod stand, strong winds could topple the multisampler or the tripod can tilt in softened rain-sodden soils. The ADXL335 accelerometer is attached to the underside of the controller box lid. The sensor measures its orientation in three-dimensional space and sends three acceleration values to the microcontroller. If the sensor position changes, the communication system is triggered, an error SMS is sent to the user, and the running sampling program is immediately ended. The importance of the sensor should not be underestimated as a collapse of the system will result in data loss and could possibly damage the system.
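
One way to turn three acceleration values into an upright check is to compare the current gravity vector with a reference recorded at installation. The sketch below illustrates this idea in Python; it is not the firmware's actual implementation, and the function names, the reference vector and the 15° threshold are hypothetical.

```python
import math


def angle_between_deg(a, b) -> float:
    """Angle in degrees between two three-dimensional acceleration vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("acceleration vector must not be zero")
    cos_angle = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding errors
    return math.degrees(math.acos(cos_angle))


def has_tilted(reference, current, max_tilt_deg: float = 15.0) -> bool:
    """True if the orientation deviates from the reference by more than the threshold."""
    return angle_between_deg(reference, current) > max_tilt_deg


# Hypothetical readings in units of g: upright at installation, later lying on its side.
upright = (0.0, 0.0, 1.0)
toppled = (1.0, 0.0, 0.0)
alarm = has_tilted(upright, toppled)
```

Comparing against a stored reference avoids any assumption about how the sensor is oriented under the lid; only the change in direction of gravity matters.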

The second important system sensor monitors the enclosure conditions. The controller box houses the highly sensitive electronic control unit which connects and partly contains the 6 subsystems of the AMMOD multisampler. To monitor ambient temperature and humidity in the controller box a BME280 air sensor is used. The sensor monitors operating conditions and informs the user in the case of exceptionally high temperature which could possibly damage the system and the samples. Additionally, the sensor monitors unexpected increases in humidity. Although the controller box is waterproof, it cannot be guaranteed that no plug or cable fitting becomes loose over time, allowing water to enter the box.

#### 5.5.2 External modules

In addition to a real-time clock (RTC), which provides the current time and date, the system comes with an SD card module and a communication module. The SD card module, equipped with an 8 GB SD card, is used like a hard disk drive to log information about trap performance in a simple text file (log file). The log file is an important tool, as it allows the user to keep track of sensor data as well as system failures. As soon as the system is turned on, booting information is collected, including local time, machine ID, UTC time, and geographical parameters (longitude, latitude). From the start of a new sampling cycle (program start), the measured sensor values and the collection bottle in use are logged every 5 minutes. Additionally, the log file keeps track of system failures, which are indicated with an "E" next to any questionable sensor value.

The implemented communication module operates on European frequencies and consists of a plug-and-play Arduino shield. The shield comes with a GPRS/LTE and a GNSS antenna. Like all external modules, the communication module is placed inside the metallic AMMOD multisampler box, which can reduce signal quality. To improve reception, an external antenna is installed outside the AMMOD multisampler and connected via an antenna cable, using an adapter connector to the SMA cable extender. The GNSS module can use navigation satellites from networks other than GPS. Here, it is used to record the geolocation as well as the exact UTC time. GPRS is used to send SMS notifications to the users, e. g. in case of an error. After inserting a new SIM card into the shield, the user has to activate the modem and register on the network via the AT command prompt.

# 5.6 User Interface

The AMMOD multisampler is capable of performing time-controlled, temperature-controlled and also illuminance-controlled sampling cycles. The user can directly program the trap on-site by using either a manual or a serial connection.

## 5.6.1 Manual use

The main menu provides four different submenus: 'Manual Program', 'Auto Program', 'Config Data' and 'Info'. Using the built-in 1×5 keypad, the user can choose between menus. The initial screen shows the current date and time on which future programs will be based. If this information is incorrect, the user can change the settings in the 'Config Data' menu. The user can also retrieve further information from the system by selecting the 'Info' menu. This menu contains the AMMOD multisampler ID and the readings of the implemented sensors: air temperature, air humidity, soil temperature, soil moisture, and light intensity.

The 'Auto Program' menu provides two built-in standard programs which the user can choose from: (1) ambient illuminance program and (2) soil temperature program. Collection bottles will be turned under the trapping bottle depending on either measured illuminance or soil temperature. Thus, the trap can be used to monitor insect activity patterns depending on abiotic factors (Table 1). The standard settings can easily be changed in the program code and adapted to the required purposes. After selecting the desired built-in program, the user will be asked to provide the planned sampling starting date and time.

Next to built-in programs users have the possibility to design their own time-controlled program in the 'Manual Program' menu. This way, the sampling interval can be easily adapted to the respective research question.


**Table 1.** Selectable sensor-dependent standard settings for automated sampling programs. The user can choose from four standard sampling programs for which the conditions and associated sampling bottles have been predefined. Depending on the program, the sampling bottle is chosen based on measured illuminance, soil temperature, relative air humidity, or relative soil moisture.

First the user must enter the planned starting time and date. In the next menu the user is prompted to enter the total number of intervals (number of bottle changes) including pauses (position 13). In the last step each interval must be exactly defined in terms of duration lasting from 15 minutes to up to 99 days. After entering data for the last interval, the system will turn position 13 under the trapping bottle before starting with the program at the defined starting time.
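
The structure of such a time-controlled program (a starting time followed by a sequence of interval durations) can be sketched in a few lines of Python. This is an illustration only, not the trap's firmware: the function name and the example values are assumptions, and only the permitted duration range (15 minutes to 99 days) is taken from the text.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(days=99)


def build_schedule(start, durations):
    """Return (interval_number, begin, end) tuples for consecutive intervals."""
    schedule = []
    begin = start
    for number, duration in enumerate(durations, start=1):
        if not MIN_INTERVAL <= duration <= MAX_INTERVAL:
            raise ValueError(
                f"interval {number}: duration {duration} is outside 15 min to 99 days"
            )
        end = begin + duration
        schedule.append((number, begin, end))
        begin = end  # the next interval starts where this one ends
    return schedule


# Hypothetical program: two 6-hour intervals followed by a 1-day interval.
plan = build_schedule(
    datetime(2024, 6, 1, 6, 0),
    [timedelta(hours=6), timedelta(hours=6), timedelta(days=1)],
)
for number, begin, end in plan:
    print(number, begin.isoformat(), end.isoformat())
```

Validating every duration before the program starts mirrors the constraint stated above and avoids discovering an invalid interval in the middle of a sampling campaign.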

#### 5.6.2 Serial communication

Depending on the number of collection intervals, defining the sampling program manually can be very time consuming. With the graphical user interface (GUI) ArduinoGUI.exe, a desired sampling program can be defined in advance. The GUI is only available for Windows machines on which the Arduino IDE is installed. To use all functions of the serial communication, the microcontroller must be connected to the Windows machine via the USB ports of the two systems. As in the 'Info' and 'Config Data' menus of the built-in manual control system, the user can read information and change data via serial communication by clicking on 'Info' and 'Config Data', respectively (Figure 3).

Additionally, the 'Create Program' menu of the GUI can be used to generate a program file (.txt format) in advance. The text file contains a data string consisting of the defined starting date and time, the number of intervals, and the duration of each sampling interval. The text file can be loaded into the ArduinoGUI in the 'Start Program' menu by choosing the option 'Manual Program' and transferred to the microcontroller by clicking 'Transfer Data' as soon as a connection between the Windows machine and the microcontroller is established.

Next to the 'Manual Program' option, the user can choose the option 'Automated Program'. In contrast to the built-in manual control system, the user can choose among four different predefined programs, each controlled by a different sensor: (1) Illuminance Program, (2) Soil Temperature Program, (3) Soil Moisture Program, and (4) Air Humidity Program. The automated multisampler will choose the collection bottle according to the measured light intensity (lux), soil temperature (°C), soil moisture (%), or air humidity (%), respectively. Again, the user only has to define the sampling starting date and time, choose the program, and click on 'Transfer Data'. Thus, insects flying at a defined time interval, daylight intensity, ambient temperature, or humidity can be collected automatically over longer collection periods.
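
The principle behind the sensor-controlled programs is a lookup from a measured value to a bottle position. The following Python sketch illustrates this with a sorted list of range boundaries. The boundaries and bottle positions shown are hypothetical placeholders; the actual values are those predefined in Table 1 and in the program code.

```python
from bisect import bisect_right

# Hypothetical illuminance boundaries in lux (NOT the values from Table 1).
LUX_BOUNDARIES = [10, 1000, 10000]
# One hypothetical bottle position per range:
# below 10 lx, 10 to <1000 lx, 1000 to <10000 lx, 10000 lx and above.
BOTTLE_POSITIONS = [1, 2, 3, 4]


def bottle_for(value, boundaries=LUX_BOUNDARIES, positions=BOTTLE_POSITIONS):
    """Return the bottle position assigned to the range containing `value`."""
    return positions[bisect_right(boundaries, value)]


for lux in (5, 10, 500, 50000):
    print(lux, "lx -> bottle", bottle_for(lux))
```

The same function works for soil temperature, soil moisture, or air humidity if other boundaries and positions are passed in.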

With either program (manual or automated), the system will turn position 13 under the trapping bottle and remain on standby until the defined starting time.

# 5.7 Energy supply

As already mentioned, the system is based on a low-power microcontroller which comes with internal pull-up resistors and a voltage regulator, meaning that the system can be run on a 12 V DC battery. At night and in the absence of sunlight, the system relies on power from the rechargeable battery. To allow for continuous operation, the AMMOD multisampler is equipped with an integrated automatic battery charger, which is connected to two solar panels, each providing up to 30 W under perfect conditions.

# 5.8 Outlook

The trap presented here has been used successfully in various field experiments. The system enables new study designs, e. g. continuous tracking of insect activity in relation to illuminance and temperature, or the study of circadian rhythms. The AMMOD sampler can thereby contribute significantly to our understanding of the causes underlying variation in insect activity patterns, which in the long term could provide the basis for models calculating the effects of climate change.

# **6 Bioacoustic data acquisition and species recognition**

Benjamin Werner, Olaf Jahn, Mario Lasseck, Karl-Heinz Frommolt

#### **Abstract**

This chapter presents a solution for automated long-term acoustic monitoring. The four-channel sensor is based on available commercial components. The configuration allows the recording of acoustic signals in the field. It is suitable for on-site data preprocessing and transmission to a base station for in-depth analysis.

The classifier BirdID-Europe254 is presented. It provides probability values for 254 common European bird species in sound recordings and audio streams. Furthermore, we describe a web interface for further analysis of raw classification results. By setting species-specific thresholds for recording sites, it is possible to obtain information on the presence–absence and acoustic activity patterns of species in large audio datasets. The tool also supports manual validation of classification results by providing full or random samples of audio snippets with confidence values above the selected threshold.

# 6.1 Introduction

Numerous animal species use acoustic signals to communicate with each other, especially for the delimitation of territories and in the context of reproductive behaviour (Bradbury and Vehrencamp 1998). The sounds, which in some cases can be heard over several hundred metres, provide a good basis for non-invasive monitoring of the presence of species. In terrestrial habitats, birds and bats have been particularly well studied.

First attempts to use acoustic recordings to document nocturnal bird migration date back to the 1950s (Graber and Cochran 1959). However, only digital recording techniques and analysis methods made it possible to quantitatively assess nocturnal bird migration based on flight calls (Farnsworth 2005; Hill and Hüppop 2008).

The use of autonomous sound recorders for continuous acoustic monitoring began in the 1990s. This method is of particular importance for bats. Not surprisingly, in the period 1992–2018, 50 % of all publications on passive acoustic monitoring (PAM) were related to bats (Sugai et al. 2019). However, most stationary long-term acoustic surveys focus on the audible range. An impressive example is the Australian Acoustic Observatory (Roe et al. 2021). In most PAM studies energy-efficient, battery-powered devices are used and audio recordings are usually made in mono or stereo. Stereo recordings facilitate the analysis of the recordings by listening because some spatial separation of sound sources is achieved (Rempel et al. 2005).

Four-channel recordings were used for bittern monitoring (Frommolt and Tauchert 2014). The configuration allowed the direction of calling birds to be estimated and multiple individuals to be tracked and distinguished. A similar setup was used for long-term monitoring of wetland birds (Frommolt 2017). Here the directionality of the microphones allowed the observer to listen in several directions. Other researchers used MEMS-microphone arrays for sound source separation of bat and bird vocalisations (Suzuki et al. 2018; Hochradel et al. 2019).

Various attempts have been made to transfer acoustic recordings wirelessly to a remote server, e. g. in the ARBIMON project (Aide et al. 2013). In a recent project, Raspberry Pi microcomputers were used to record, analyse, and transmit audio recordings (https://www.birdweather.com). An overview of the use of PAM for estimating bird density was provided by Pérez-Granados and Traba (2021).

Algorithms for automated analysis of animal sound recordings have undergone rapid development in recent decades. Encouraging results have spawned an active field of research (Towsey et al. 2012; Ross and Allen 2014; Kahl et al. 2019; Kahl et al. 2021). An important impulse has also been given by several national and international research projects dedicated to this topic (Ganchev et al. 2012; Fagerlund and Laine 2014; Potamitis et al. 2014).

In recent years, deep-learning methods based on training artificial neural networks have been gaining importance and increasingly replace classical pattern recognition and machine learning methods like logistic regression, support vector machines, hidden Markov models and decision trees; see Priyadarshani et al. (2018) for a comprehensive overview of conventional methods. This trend can be observed in submissions to annual large-scale identification challenges, such as the LifeCLEF bird identification task (Goëau et al. 2018; Kahl et al. 2019; Kahl et al. 2021). Currently, the most successful approach to identify large numbers of different audio events or individual species is based on training convolutional neural networks (CNNs) on spectrograms, a visual 2-D representation of the acoustic signal. Other, less common methods use a combination of recurrent and convolutional neural networks (e. g. Çakir and Virtanen 2017), replacement of CNNs with transformers (Puget 2021), or work directly on raw audio data (Kong et al. 2019).

Deep-learning methods have several advantages over conventional methods. Single models can be trained in an almost end-to-end fashion where features are learned directly from the data rather than being hand-crafted by experts. They also show a significant performance increase if large numbers of species or classes of audio events need to be identified. A recent review on deep-learning methods for computational bioacoustics is given by Stowell (2022).

# 6.2 Material and methods

# 6.2.1 Sensor

The sensor components have been selected with the following goals in mind:


Since the area to be monitored grows quadratically with the range, we decided to use a high-quality sensor array, even though this solution is more cost-intensive than standard audio recording devices. The main advantage of using a single long-range sensor over a larger number of close-range sensors is that significantly less data needs to be stored for virtually the same information. It also facilitates maintenance and data collection in the field.
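
As a rough illustration of this scaling, assume an idealised circular detection area around the sensor (real detection ranges depend on terrain, vegetation, and the species concerned). The monitored area $A$ for a range $r$, and the effect of increasing the range by a factor $k$, are then:

$$
A(r) = \pi r^{2}, \qquad \frac{A(k\,r)}{A(r)} = k^{2}
$$

Under this assumption, a sensor with twice the range ($k = 2$) covers the same area as four sensors with the original range.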

# 6.2.2 Hardware

To obtain a good spatial resolution, we chose a setup with four cardioid microphones arranged in a plane, each microphone oriented in one of the four cardinal directions. The plate above the microphones protects them from rain (Figure 1). Low-noise microphones and audio interfaces enabled recording of audio signals at the greatest possible distance (Table 1). In this study, we operated three four-channel audio sensors, two at Britz (BRITZ) near Eberswalde, Brandenburg, and one at Melbgarten (MBG), Bonn, North Rhine-Westphalia. At Britz we also tested two types of ultrasonic microphones. In addition to the AMMOD sensors, we continuously operated one commercial SM4 (Wildlife Acoustics) stereo recorder each at Britz (BRITZ01) and Melbgarten (MBG01).

As a computing platform we chose the widely used Raspberry Pi (Figure 2). It offers a large community with good documentation and support. The option to run Python supports easy implementation of new features. It also has enough computing power to perform preliminary on-site data analysis. However, a Raspberry Pi 4 with at least 4 GB RAM is required for this purpose. We also recommend using a StromPI 3 with Battery Shield. The expansion shield allows setting date and time after rebooting even without an internet connection. This is mandatory for stable and correct file naming. It also allows bridging short power failures and proper shutdown of the system in the event of a longer power outage. In addition, we added an external USB WiFi adapter (TP-Link TL-WN722N WLAN adapter) to connect an external directional antenna. This allows the sensor to be placed at a greater distance from the base station. A robust low-power external 1 TB SSD hard disk (SanDisk) was used for storage.

**Figure 1.** Four-channel audio sensor BRITZ02. The ultrasound sensor of BRITZ03 was placed at the tower in the background.


**Table 1.** Details on hardware components of the AMMOD bioacoustic modules.

**Figure 2.** Audio interface connected to Raspberry Pi.

# 6.2.3 Software

#### 6.2.3.1 Settings

#### **Recording Operation**

The control software was implemented in Python as system services based on the Raspbian operating system. All settings were made via the 'config.yaml' file. An example configuration file can be found in 'config-default.yaml'. Setup instructions are available in the repository (https://code.naturkundemuseum.berlin/tsa/ammod-acoustic-sensor).

#### **ammod-audio.service**

Takes care of audio recording and, if requested, also analyses the recordings. The recording regime can be adjusted to the needs of the project, including continuous recording. Likewise, the recording schedule can be adjusted to the time of sunrise and sunset.

#### **ammod-autossh.service**

A small service that starts 'autossh' to enable the 'ssh' connection to the server, providing administrator access to the sensor for maintenance. The server address, port, and user can be changed in the service file. The Readme file in the repository provides details on the setup.

#### **ammod-connection.service**

Monitors the stability of the internet connection. If the connection fails, it tries to re-establish it by restarting the network adapter. This process helps to maintain the connection to the base station in case of very weak or strongly fluctuating WLAN signals.
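
The logic of such a connection watchdog can be sketched as follows. This is not the code of 'ammod-connection.service'; the probe target (a public DNS server), the interface name 'wlan0', and all function names are assumptions made for illustration. The check and the restart action are passed in as functions so that the decision logic can be exercised without touching the network.

```python
import socket
import subprocess


def is_online(host="8.8.8.8", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (assumed probe target)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_adapter(interface="wlan0"):
    """Bring the interface down and up again (requires root; 'wlan0' is an assumption)."""
    subprocess.run(["ip", "link", "set", interface, "down"], check=False)
    subprocess.run(["ip", "link", "set", interface, "up"], check=False)


def watchdog_step(check=is_online, restart=restart_adapter):
    """Run one check and restart the adapter only if the check fails."""
    if check():
        return "online"
    restart()
    return "restarted"
```

In a service, `watchdog_step()` would be called in a loop with a pause between checks, for example once per minute.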

#### **ammod-send-report.service**

Sends a daily report to one or more email addresses of your choice. The report contains a list of recently recorded files and the most important system health information. The configuration is done via 'config.yaml'.

#### **ammod-log.service**

Logs system temperature and internet connection status.

#### **ammod-base-station-client.service**

Generates the JSON files needed for the AMMOD Cloud, both for telemetry data and audio recording data. The former are provided to the base station for transfer via the CoAP API. The service is also configured in the 'config.yaml' file. Note that the serial number and device ID for the AMMOD Cloud must be set.

#### **On-site inference**

If on-site data analysis is desired, a 64-bit system is required, such as a Raspberry Pi 4 with at least 4 GB of RAM. Species-specific thresholds can be set in a separate YAML file; see the example configuration in the 'species\_threshold\_example.yaml' file. The classifier results are stored in the result folder of the connected USB storage. The inference results for each recording are stored as a JSON file with the following structure:

**Figure 3.** An example of the JSON file structure.

#### 6.2.3.2 Classification model

The classifiers for bird species identification are based on models described in Lasseck (2018, 2019). Most audio recordings for model training were retrieved from the Animal Sound Archive at the Museum für Naturkunde (https://www.animalsoundarchive.org) and Xeno-Canto (www.xeno-canto.org). The model used by AMMOD, BirdID-Europe254, can identify 254 common European bird species. BirdID-Europe254 is available on MfN GitLab with instructions for installation and use in various applications and platforms (Linux, Windows, Raspberry Pi). Git repository: https://code.naturkundemuseum.berlin/tsa/birdideurope254-2103. The Git repository contains detailed information on setup and usage in the Readme document.

#### **Usage**

Run 'inference.py' to analyse audio files. To set the input path, either select a file or a folder with several audio files. If set to a folder, audio files in subfolders are also analysed. Optionally, an output path to save result files may be set, e. g.:

```
python inference.py -i /path/to/audio/folder/or/file -o /path/to/output/folder
```
If no output path is set, result files are saved to the folder of the input path.

Audio analysis can be customised in different ways. In most cases default parameters work across a wide range of application scenarios. Default parameters for inference can be set in the 'config.py' file or changed via command line arguments. To see a list of all arguments use the command (Table 2):

```
python inference.py -h
```


**Table 2.** Complete list of command line arguments to customise BirdID-Europe254.


# 6.2.3.3 Examples for changing inference behaviour with command line arguments

Set start and end time in seconds by passing '-s' and '-e' (or '--startTime' and '--endTime') to select a certain part of the recording for inference, e. g. first 10 seconds:

```
python inference.py -i example/ -o example/ -s 0.0 -e 10.0
```
To mix a stereo or multi-channel audio file to mono before analysing it, pass '--mono'. Alternatively pass a list of channels, so inference is performed only on the selected channels. For instance, to select only the first and last channel of a 4-channel recording use '--channels':

```
python inference.py -i example/ -o example/ --channels 1 4
```

Usually, inference is successively done on 5-second intervals because audio segments of this duration were originally used for training. Optionally segment duration can be set to smaller values (between 1 and 5 seconds). This leads to a higher time resolution of output results and usually to more accurate onset/offset times of detected sound events. Smaller segment durations can also increase identification performance for some species, especially in soundscapes with many different birds calling at the same time. However, in some cases, performance might decrease because certain birds and song types need longer intervals for reliable identification. For example, to set segment duration to 3 seconds (default duration in BirdNET) use argument '-sd' or '--segmentDuration':

```
python inference.py -i example/ -o example/ -sd 3
```
Overlap of analysed segments can be set in percent via '-ov' or '--overlapInPerc'. For instance, to analyse 5-second segments with a step size of two seconds use an overlap of 60 %:

```
python inference.py -i example/ -o example/ -ov 60
```
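
The relation between segment duration, overlap, and step size can be made explicit with a short Python sketch. It is not taken from 'inference.py'; the function name is hypothetical, and the handling of the last, shorter window (here simply truncated at the end of the recording) is an assumption, since the text does not describe it.

```python
def segment_windows(duration, segment=5.0, overlap_percent=0.0):
    """Return (start, end) times in seconds for successive analysis windows."""
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap must be at least 0 and below 100 percent")
    # Step size: the part of each segment that does not overlap with the next one.
    step = segment * (100 - overlap_percent) / 100
    windows = []
    index = 0
    while index * step < duration:
        start = index * step
        windows.append((start, min(start + segment, duration)))
        index += 1
    return windows


# 5-second segments with 60 % overlap give a step size of 2 seconds.
for window in segment_windows(12.0, segment=5.0, overlap_percent=60):
    print(window)
```

A higher overlap increases the time resolution of the results, but also the number of segments to be analysed and therefore the processing time.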

The repository includes three models of different sizes. All models are trained on the same data but differ in the number of layers and parameters. Larger models usually give better identification results but need more computing resources and time for inference. If run on small devices like a Raspberry Pi or in real time, the small model might be the better or even the only option. Results of different models can be combined in a post-processing step (late fusion) to further improve identification performance. The small model uses an EfficientNet B0, the medium model an EfficientNet B2, and the large model an EfficientNet V2 backbone. To switch model size use '-m' or '--modelSize':

```
python inference.py -i example/ -o example/ -m small
```
If input is a folder with several audio files, analysis can be accelerated by preprocessing multiple files in parallel. Use '-f' or '--batchSizeFile' to specify how many files to read and preprocess at the same time. If one or multiple GPUs are used to accelerate inference, the number of batches to be processed in parallel by the GPUs may be passed via '-b' or '--batchSizeInference'. The maximum number of batches depends on the selected model size and the available memory (RAM) of the GPUs. Choose the number of CPU threads to prepare the audio segments in parallel for inference with '-c' or '--nCpuWorkers'. For single short audio files, small values for 'batchSizeInference' and 'nCpuWorkers' should be chosen (if only a single file is passed, 'batchSizeFile' is set to 1 by default). If files with large durations or a folder with many files are passed, batch sizes and number of CPU workers may be set as high as computing resources allow to increase processing speed.

Analysis results can be customised in various ways. Different output and file formats can be selected. Output files have the same name as the original audio file, but with different extensions and/or file types, depending on output type and format. A list of desired output files can be passed via '--fileOutputFormats'. The following formats can be selected:

'raw\_pkl'

Raw results can be saved in a dictionary as a binary (pickle) file for further post-processing in Python. The result dictionary holds information and data accessible via keys, e. g. 'modelId', 'fileId', 'filePath', 'startTime', 'endTime', 'channels', 'classIds', 'classNamesScientific', 'classNamesGerman', 'startTimes', 'endTimes' and 'probs'. With 'startTimes' and 'endTimes' the start and end times for each analysed audio segment can be accessed. Likewise, use 'probs' to access a three-dimensional NumPy array that holds prediction probabilities for all channels, audio segments, and species. It has the shape: [number of channels, number of time intervals, number of species]. Raw results can also be saved as Excel and/or CSV files:

'raw\_excel' / 'raw\_csv'

In Excel files, results for each channel are saved in separate sheets. For CSV, results for each channel are saved in separate files with the channel information added to the filename (e. g. filename\_c1.csv for first channel results). Output files consist of a header line and one row per time interval. Each row has two columns for the start and end time of the audio segment and 254 additional columns, one per species, listing the prediction probability for that time interval. A total of four output formats can be selected for species labels:

'labels\_excel' / 'labels\_csv' / 'labels\_audacity' / 'labels\_raven'

Besides saving raw data, results can also be post-processed and aggregated to allow user-friendly access to the more relevant information on what species was identified at what time within the audio recording. So instead of outputting probabilities for all species and time intervals, labels are created only for those species and time periods where the model's prediction probability exceeds a minimal confidence threshold. Resulting label files can be saved in the following formats: Excel, CSV, Audacity label track and Raven selection table. The following is an example of saving the results as raw data and aggregated labels in Excel format:

```
python inference.py -i example/ -o example/ --fileOutputFormats raw_excel labels_excel
```
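
For the 'raw\_pkl' output described above, the following Python sketch shows how such a result dictionary might be post-processed. To stay self-contained, it builds a small synthetic dictionary instead of loading a real result file; it uses only keys named in the text, and the species, values, file name, and the function 'top\_species' are illustrative assumptions.

```python
import pickle

import numpy as np

# Synthetic stand-in for a result dictionary (three species instead of 254).
result = {
    "classNamesScientific": ["Turdus philomelos", "Erithacus rubecula", "Parus major"],
    "startTimes": [0.0, 5.0],
    "endTimes": [5.0, 10.0],
    # Shape: [number of channels, number of time intervals, number of species]
    "probs": np.array(
        [
            [[0.10, 0.80, 0.05], [0.90, 0.20, 0.10]],
            [[0.20, 0.60, 0.05], [0.70, 0.10, 0.30]],
        ],
        dtype=np.float32,
    ),
}

# A real file would be loaded like this (the file name is hypothetical):
#   with open("recording.pkl", "rb") as handle:
#       result = pickle.load(handle)
result = pickle.loads(pickle.dumps(result))  # round trip in memory instead


def top_species(result, channel=0):
    """Yield (start, end, species, probability) of the best species per segment."""
    probs = result["probs"][channel]
    best = probs.argmax(axis=1)  # index of the highest probability per segment
    for i, species_index in enumerate(best):
        yield (
            result["startTimes"][i],
            result["endTimes"][i],
            result["classNamesScientific"][species_index],
            float(probs[i, species_index]),
        )


for row in top_species(result):
    print(row)
```

Working on the raw array keeps all probabilities available, so thresholds can be changed later without running inference again.
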
The minimum confidence value (prediction probability threshold) necessary to decide if a species was identified can be set by passing a value between 0.01 and 0.99 to '--minConfidence'. Classifications below the threshold are not included in the output label file. Higher confidence values lead to better precision, lower values to better recall rates.

For label files, predictions of multi-channel audio files are pooled or aggregated by taking either the mean or maximum value of each channel. The pooling method can be selected by passing 'mean' or 'max' to '--channelPooling'.
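
The two pooling options correspond to reducing the channel axis of the probability array. The sketch below illustrates this with NumPy. It is an assumption about the principle and not the code of 'inference.py'; the function name and the example values are hypothetical.

```python
import numpy as np


def pool_channels(probs, method="mean"):
    """Collapse the channel axis of a [channels, intervals, species] array."""
    if method == "mean":
        return probs.mean(axis=0)
    if method == "max":
        return probs.max(axis=0)
    raise ValueError("method must be 'mean' or 'max'")


# One time interval, two species, two channels (hypothetical values).
probs = np.array(
    [
        [[0.9, 0.1]],  # channel 1: first species clearly audible
        [[0.3, 0.1]],  # channel 2: same interval, weaker signal
    ]
)
print(pool_channels(probs, "mean"))
print(pool_channels(probs, "max"))
```

In this example, the first species would pass a threshold of 0.75 with 'max' pooling (0.9) but not with 'mean' pooling (0.6). With directional microphones pointing in different directions, the choice of pooling method can therefore affect which detections end up in the label file.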

Species labels are provided for each time interval analysed. With '--mergeLabels' adjacent or overlapping time intervals with labels of the same species are merged.
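
Merging of labels can be pictured as a standard interval merge carried out separately for each species. The following Python sketch shows one way to do this; it is not the implementation behind '--mergeLabels', and the tuple layout (start, end, species) is an assumption.

```python
def merge_labels(labels):
    """Merge adjacent or overlapping (start, end, species) labels per species."""
    merged = []
    last_by_species = {}  # species -> index of its most recent label in `merged`
    for start, end, species in sorted(labels):
        index = last_by_species.get(species)
        if index is not None and start <= merged[index][1]:
            # Touches or overlaps the previous label of this species: extend it.
            previous_start, previous_end, _ = merged[index]
            merged[index] = (previous_start, max(previous_end, end), species)
        else:
            last_by_species[species] = len(merged)
            merged.append((start, end, species))
    return merged


labels = [
    (0, 5, "Turdus merula"),
    (5, 10, "Turdus merula"),
    (5, 10, "Parus major"),
    (8, 12, "Parus major"),
    (15, 20, "Turdus merula"),
]
for label in merge_labels(labels):
    print(label)
```

Labels of the same species separated by a gap, such as the last entry in the example, remain separate.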

How species are named can be customised by passing a name type via '--nameType'. Possible types/languages are: Scientific ('sci'), English ('en'), German ('de'), short identifier ('id'), or index number ('ix').

Output in result files can be further customised by passing additional arguments. To save storage space, the size of binary pickle files can be reduced by passing '--useFloat16InPkl' to store 16-bit instead of 32-bit floats. For float values in text output files, the number of decimal places can be changed via '--outputPrecision'. The columns in raw data text files and the label rows in label files within the same time interval can be sorted in descending order of species prediction confidence by passing '--sortSpecies'. With '--csvDelimiter' select the delimiter used in CSV files.

By default, all 254 species are predicted. The species are listed in the file 'species.csv' (including scientific and common names in different languages). A custom species list can be created to filter output results by modifying the original CSV file (Table 3). This might be useful if the species composition at a given recording site is known or if only a certain subset of species is of interest to your project. In these cases, we recommend making a copy of the original file and removing all rows with species that should not appear in the output label files. Make sure, however, not to change the data in the remaining rows and that there are no empty lines between rows. The path to the modified species file can then be specified via '--speciesPath /path/to/folder/or/file'. If a folder is passed, the custom CSV file needs to be placed in that folder and named 'species.csv'.

The custom species CSV file can also be used to assign individual minimum confidence thresholds to certain species (Table 3). For that purpose, add custom threshold values (between 0.01 and 0.99) at the end of the rows for those species for which a threshold other than the global minimum confidence threshold should apply. In the example shown in Table 3, a minimum confidence threshold of 0.85 is set for the Song Thrush *Turdus philomelos*, while for the other three species the global minimum confidence threshold is used, either the default value of 0.75 from 'config.py' or the value passed via '--minConfidence'. By using individual thresholds, it is possible to control the precision/recall trade-off for each species.
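
The effect of species-specific thresholds can be sketched in Python as follows. The CSV layout used here (scientific name, English name, optional threshold) is a simplified, hypothetical stand-in for the real 'species.csv', which contains more columns; the function names are likewise assumptions. Only the default value of 0.75 and the threshold of 0.85 for the Song Thrush are taken from the text.

```python
import csv
import io

GLOBAL_MIN_CONFIDENCE = 0.75  # default value named in the text

# Hypothetical, simplified layout: scientific name, English name, optional threshold.
CUSTOM_SPECIES_CSV = """\
Turdus philomelos,Song Thrush,0.85
Turdus merula,Common Blackbird
Erithacus rubecula,European Robin
"""


def load_thresholds(text, default=GLOBAL_MIN_CONFIDENCE):
    """Map each species to its own threshold, falling back to the global default."""
    thresholds = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        try:
            thresholds[row[0]] = float(row[-1])  # last field is a custom threshold
        except ValueError:
            thresholds[row[0]] = default  # last field is not a number
    return thresholds


def is_detected(species, probability, thresholds):
    """Return True if the probability exceeds the threshold for this species."""
    return probability > thresholds[species]


thresholds = load_thresholds(CUSTOM_SPECIES_CSV)
print(is_detected("Turdus philomelos", 0.80, thresholds))
print(is_detected("Turdus merula", 0.80, thresholds))
```

With these values, a prediction probability of 0.80 is accepted for the Common Blackbird but rejected for the Song Thrush, which illustrates how the precision/recall trade-off can be tuned per species.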

**Table 3.** Custom species CSV file. Only the species listed are considered in the classification process.


Errors during analysis are saved to an error log file. Its path is specified via '--errorLogPath'. If no path is assigned, the file is named 'error-log.txt' and saved to the output directory.

With '--terminalOutputFormat' the terminal output is controlled. By default, a summary is printed for each input file listing the top three species with the highest confidence scores identified in the entire recording.

# 6.2.4 Recording and analysis pipeline

On site, audio signals are recorded and optionally analysed. The raw data, sensor information, and preliminary results are sent to the base station. Any updates can be made through a remote maintenance connection to the station. A backup of the raw data is also saved on local storage. Depending on the size of the storage and recording schedule, the data is periodically collected and taken to a data centre for further analysis. The in-depth analysis can lead to new insights and improvements, by fine-tuning species-specific detection thresholds or by using larger versions of the classifier models. The results and metadata are uploaded to the AMMOD Cloud (Figure 4).

**Figure 4.** Workflow for audio data acquisition, processing, and storage.

#### 6.2.4.1 Software components

Passive acoustic monitoring projects often result in big collections of audio data that are difficult to handle. To address this challenge, a service with two components was developed to make the process more efficient (Figure 5):


The software components of the service can be found in the Git repository (https://code.naturkundemuseum.berlin/tsa/monitoring-data-analyze-service).

#### 6.2.4.2 Web service

The web service consists of two parts. The front end is accessible via a web browser. It displays all background tasks on the landing page (Figure 6). The left navigation menu lists all imported collections ordered by recording station, year of data acquisition, and classification model used for bird identification. The user can filter them to find the desired collection. All jobs are displayed on the Dashboard view, so the user can see whether a background job is already done or still running.

**Figure 5.** The two components of the Web Service share the database and file system.

**Figure 6.** Landing page of the web service.

Clicking on one of the entries in the navigation menu switches to the view of a collection (Figure 7). Basic statistics (Stats) as well as eligible query parameters are shown for the selected collection.


This view comprises five components.

**Figure 7.** Collection view, showing statistics and eligible query parameters for the selected dataset "BRITZ01: 2019".

	- Selected species can be provided with a database index to speed up queries for these species (Figure 8).
	- Recording period
	- Number of recordings


**Figure 8.** Pane showing all species indexed for faster analysis.


**Figure 9.** Stats pane with basic statistics about the selected audio dataset.



**Figure 10.** Query Parameter pane, offering options for efficient data analysis.

In this pane the user can set parameters for different queries (Figure 10), specifically:

	- "Calculate bin sizes" produces a histogram for all predictions within the selected period and confidence intervals (cf. Figure 24).
	- "Get daily histograms" creates histograms for each day within the selected time interval. Only bins within the selected confidence interval are considered (cf. Figure 23).
	- "Get predictions" generates a CSV file with all predictions within the selected parameters.
	- "Query" counts all prediction windows within the selected parameters and opens the Query Results pane.


**Figure 11.** Query results, offering options to draw samples of positively classified audio snippets for manual validation by an expert.


**Figure 12.** The job section shows the list of running and completed tasks.

#### 6.2.4.3 Setup

The service requires a Linux system with docker and docker-compose installed. The docker-compose file allows easy setup of the service. For detailed instructions, see the Readme file in the repository (https://code.naturkundemuseum.berlin/tsa/monitoring-data-analyze-service/-/blob/main/README.md).

#### 6.2.4.4 Import script

For better management of monitoring data, it is best to organise them by recording location and year. Creating separate collections for each year keeps the number of entries in the database tables manageable, thus making it easier to search and retrieve data.

To import the data, run the import script. The corresponding settings need to be specified in a config file written in YAML format; cf. the template in the repository (https://code.naturkundemuseum.berlin/tsa/monitoring-data-analyze-service/-/blob/main/import/config/config-template.yaml). The following settings are available:

	- ammod: STATION\_"%y%m%d\_%H%M%S"
	- custom: If needed, parse your own filename format: add another parsing function to 'tools.py' and add its name to the dictionary 'parse\_filename\_for\_location\_date\_time\_function\_dict' (https://code.naturkundemuseum.berlin/tsa/monitoring-data-analyze-service/-/blob/main/import/util/tools.py#L75).
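As a rough illustration of what a custom parsing function for the `ammod` filename pattern might look like, consider the minimal sketch below. The function name `parse_station_timestamp`, its return value, and the example filename are hypothetical; the signature actually expected by the import script should be taken from `tools.py` in the repository.

```python
from datetime import datetime
from pathlib import Path


def parse_station_timestamp(filename):
    """Split 'STATION_yymmdd_HHMMSS.ext' into (station, datetime).

    Hypothetical helper: the signature required by the import script
    may differ and should be checked against tools.py.
    """
    stem = Path(filename).stem  # drop directory and file extension
    # rsplit keeps station names intact even if they contain underscores
    station, date_part, time_part = stem.rsplit("_", 2)
    recorded_at = datetime.strptime(f"{date_part}_{time_part}", "%y%m%d_%H%M%S")
    return station, recorded_at


# Example filename invented for illustration
station, recorded_at = parse_station_timestamp("BRITZ01_190519_084500.wav")
print(station, recorded_at.isoformat())
```

Splitting from the right is a design choice that tolerates underscores in station names; filenames that do not follow the pattern raise a `ValueError`, which makes malformed entries visible at import time.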

# 6.3 Results

## 6.3.1 Preliminary inventory of bird species

To date, 104 bird species in Britz and 54 in Melbgarten have been documented by sound recordings. For several other bird species, no reliably identifiable voucher recordings have been found yet (Britz = 21, Melbgarten = 6 species), mostly due to poor signal quality. However, we anticipate that audio signals from numerous other species could be found in the AMMOD soundscape recordings through systematic screening.

The species lists were complemented by means of classical observation. Systematic breeding bird surveys were carried out in 2021 (Melbgarten) and 2022 (Britz). In addition, unscheduled observations up to a distance of 5000 m from the AMMOD stations were taken into account. For Britz, reliable observations by third parties, some dating back several years, were considered in addition to project-related records. In the near vicinity of Melbgarten, i. e. up to a distance of 500 m from the recording site, observations took place in 2021 and 2022. For the wider area of the Melbgarten, reliable observations obtained from 2010 to 2022 were also considered.

In Britz, 49 bird species were detected by territory mapping of a 16-ha area with 7 visits. Together with the unscheduled observations, a total of 74 bird species were detected in the close vicinity of the recording station. Visual records of two additional species were reported by third parties. Only 11 of the species observed within a 500 m radius have not been found in sound recordings to date. Observations within a radius of up to 5000 m revealed 47 additional bird species. Of these, verifiable voucher recordings could not yet be found for 24 species. In total, 139 bird species could be reliably identified for Britz, 16 of them so far only in the sound recordings, but not by classical audiovisual observations.

In the Melbgarten area, 44 bird species were found by territory mapping of a 25-ha plot during 5 visits. Together with the unscheduled observations in 2021 and 2022, the number of species observed within a 500 m radius increases to 46, of which 10 species have not yet been found in sound recordings. Within a 5000 m radius, at least 68 additional bird species have been detected in the last 13 years, 53 of which have not yet been found in the sound recordings. Most of the additional bird species found through classical observation were migrants, roaming non-breeders, and species with territories outside the range of the AMMOD recording site. In total, 117 bird species could be reliably detected in the Melbgarten and its wider vicinity by the end of 2022, three of them so far only in the sound recordings but not by classical audiovisual observations.

### 6.3.2 Four-channel audio recordings

Natural soundscapes are often complex. Four-channel recordings with high-quality directional microphones make it easier for the listener to distinguish between sound sources, since the signal amplitude is highest at the microphone pointing in the direction from which the sound originated (Figures 13, 14).

The difference in signal amplitude between channels can be exploited for computing directional spectrograms (D-SPEC) in which colour represents direction of arrival (DOA) (Figure 15, top); see Baggenstoss et al. (2021) and Chapter 7 for details. Subsequently, spatial processing is used to group nearby time-frequency bins into the same cluster (Figure 15, bottom). The directional clusters can be separated from each other (Figure 16, Table 4). Since in the directional clusters there are significantly fewer overlapping audio signals from different bird individuals and species, sounds emitted by specific individuals can be identified much more easily than in the original spectrogram (cf. Figure 13). However, some of the low-amplitude signals may be lost in the processing procedure.

**Figure 13.** Spectrogram of a four-channel recording. BRITZ02, 13 May 2022, 08:02 h; spectrogram settings – downsampling to 24 kHz, high-pass filter up to 950 Hz, DFT size 512 samples, grid spacing 46.9 Hz, Window Hann, overlap 50 %.

**Figure 14.** Oscillogram of the recording presented in Figure 13. Since each microphone faces a different cardinal direction, the amplitude of audio signals differs considerably between channels.

**Figure 15.** D-SPEC of the recording shown in Figures 13 and 14 before (top) and after spatial processing (bottom).

**Figure 16.** Source separation into directional clusters, exemplarily shown for the recording presented in Figures 13 through 15 (Baggenstoss et al. 2021; Chapter 7 in this book). See Table 4 for details on the sound sources associated with each cluster.


**Table 4.** Sound sources associated with the directional clusters shown in Figure 16. Some clusters contain audio signals of more than one species because the birds sang from roughly the same direction.

# 6.3.3 Ultrasonic recordings

The ultrasound sensor of AMMOD station BRITZ03 was deployed in a small forest glade. It was operated only between 14 and 26 July 2021, since hail destroyed the protective membrane of the microphone, which led to a device failure. A total of 686 minutes were recorded, of which 293 minutes contained bat activity found by manual screening (Figure 17).

**Figure 17.** Spectrogram of an ultrasonic recording made with the microphone Dodotronic Ultramic 384. Besides echolocation and social calls of Leisler's Bat *Nyctalus leisleri* there is a frequency band around 8 kHz, representing stridulation sounds of Great Green Bush-cricket *Tettigonia viridissima*. BRITZ03, 16 July 2021, 21:50 h; spectrogram settings – downsampling to 192 kHz, FFT size 1024, Frame size 50 %, Window Hann, overlap 87.5 %.

Another ultrasound device, BRITZ07, was deployed inside the forest, starting on 1 September 2022. In total, 610 minutes of ultrasound recordings were reviewed manually, of which only 54 minutes contained bat activity. In general, the signals were weaker than those recorded at station BRITZ03 and more affected by reverberation. Figure 18 shows one of the better recordings.

**Figure 18.** Spectrogram of an ultrasonic recording made with a Pettersson M500-384 mounted in a funnel, showing echolocation calls of Common Noctule *Nyctalus noctula*. BRITZ07, 7 September 2022, 21:50 h; spectrogram settings as in Figure 17.

# 6.3.4 Performance assessment

Manually annotated recordings were used for the performance assessment of the classification model BirdID-Europe254. In total 19,844 audio signals were labelled in 99 soundscape recordings, most of which were collected at the AMMOD study sites Britz and Melbgarten. Since it was not possible to reliably identify all the signals to species level, in the end 14,683 annotated signals were available for performance tests (Table 5).

All visible and audible signals were labelled with bounding boxes in the spectrograms, even very faint sounds (Figure 19). The boxes also included the harmonics and reverberations of the signals (Figure 20). Besides species identity and identification level, additional tags were used, e. g. for vocalisation type. Since corresponding labels were not available in the training dataset used for the development of BirdID-Europe254, the latter annotations were ignored in the present work.

**Table 5.** Number of annotated signals per animal class and identification level (ID1 = certain, ID2 = probably correct, ID3 = uncertain). Only those signals that could be safely identified to species level (ID1) were used for validation. The annotated dataset was explicitly not used for building the BirdID-Europe254 classification model.


**Figure 19.** A fully annotated stereo recording used for performance assessments. The bounding box #638, highlighted in red in the lower spectrogram, does not refer to a labelled audio signal, but to the segment shown in Figure 20. BRITZ01, 19 May 2019, 08:45 h.

**Figure 20.** Detail of the spectrogram area highlighted in red in Figure 19. Two 5-second segments are shown to illustrate the length of BirdID-Europe254 classification windows in relation to common bird songs and calls. Since in practice the analysis is computed with a step size of two seconds, we have omitted the respective overlapping classification windows for the sake of simplicity. Spectrogram settings – sample rate 48 kHz, DFT size 1024 samples, grid spacing 46.9 Hz, Window Hann, overlap 50 %. The spectrogram was truncated above 14 kHz because only noise was present in the high-frequency range.

Robust monitoring of bird vocalisations requires reliable species records, making high precision of classification results crucial. However, there is always a trade-off between precision and recall rate (Figures 21, 22). In other words, a higher threshold for confidence may lead to many correct-positive classifications being excluded from analysis, while a lower threshold may result in unacceptably high levels of false-positive classifications. Regarding the performance of BirdID-Europe254 on AMMOD validation data (Figure 21), for some species, such as Mistle Thrush *Turdus viscivorus* and Marsh Tit *Poecile palustris*, performance dropped very fast when using a lower threshold for confidence, while in some other species, namely Great Tit *Parus major*, precision first increased before dropping off again. Only for some species does it seem possible to find an acceptable compromise between precision and recall, e. g. Common Redstart *Phoenicurus phoenicurus*, Common Chaffinch *Fringilla coelebs*, Wood Warbler *Phylloscopus sibilatrix*, and European Robin *Erithacus rubecula*. However, since in the AMMOD validation dataset the number and quality of labelled audio signals differ considerably between species, we suspect that at least some performance results may be biased (see Discussion). Not surprisingly, BirdID-Europe254 performance results look much better when validation is performed mainly with strong and non-overlapping audio signals typical of Xeno-Canto derived datasets (Figure 22). High average precision values are then achieved for most species. The strikingly low values for Eurasian Blackcap *Sylvia atricapilla*, Eurasian Siskin *Spinus spinus*, and Wood Warbler *P. sibilatrix* require further analysis. However, we suspect that these are artifacts due to, for instance, small sample sizes or a data processing problem, rather than true weaknesses in the classification models.
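The trade-off between precision and recall can be made concrete with a minimal sketch. The scores and labels below are invented toy values, not AMMOD data, and the helper `precision_recall` is a hypothetical name used only for this illustration.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when only scores >= threshold count as detections."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is detected
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (invented): classifier confidence per window and whether the
# target species was really present (1) or not (0).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

for threshold in (0.2, 0.5, 0.85):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

In this toy example, raising the threshold from 0.2 to 0.85 increases precision from about 0.57 to 1.00 while recall falls from 1.00 to 0.50, which is the pattern a precision-recall curve traces out across all thresholds.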

**Figure 21.** Precision-recall curves for 24 representative species based on AMMOD validation data, computed for the classification model BirdID-Europe254. The average precision (AP) stated in the legend helps to assign similarly coloured curves to the correct species.

**Figure 22.** Precision-recall curves for 24 representative species based on Xeno-Canto validation data, computed for the classification model BirdID-Europe254.

# 6.3.5 Raw classification results

The raw classification results are the basis for all further analysis with respect to robust acoustic monitoring. In the first step, we made example plots for selected species, showing the distribution of classifications per day and confidence level (Figure 23). Here we chose the example of Common Redstart *Phoenicurus phoenicurus*, a long-distance migrant bird. In Germany, breeding populations usually arrive in April, though a few birds are sometimes already detected at the end of March (Südbeck et al. 2005). Consistent with this background, the onset of singing activity occurred on 15 April 2019. Assuming that the species was mostly absent before that date, almost all classifications in the first half of April 2019 potentially represent false-positive classifications.

**Figure 23.** Raw classifications for Common Redstart *Phoenicurus phoenicurus* at BRITZ01 in April 2019 plotted by date and confidence level. The onset of vocal activity of the local breeding population on 15 April 2019 is clearly visible.

**Figure 24.** (**A–D**) Bin-size distribution of raw classifications for Common Redstart *Phoenicurus phoenicurus* at Britz (BRITZ01) (**A, B**) and Melbgarten (MGB01) (**C, D**) in spring 2022 (1 March to 30 June). X-axis: confidence level with a resolution of 0.02 (2 % steps), y-axis: number of classifications. Top (**A, C**): Bin-size distribution when the entire confidence interval is considered (≤ 0.02 to 1.0). Total number of classifications ‒ BRITZ01, n = 1,639,352; MGB01, n = 1,671,813. Bottom (**B, D**): Zoom-in on bins with confidence levels ≥ 0.2.

In the next step, we plotted the bin-size distribution for selected species, i. e. the number of classifications over bins of confidence values, shown here as an example for Common Redstart (Figure 24A–D). An exponential increase in bin size towards the lower part of the confidence interval is observed at both sites considered, i. e. Britz (BRITZ01) and Melbgarten (MGB01) (Figure 24A, C). This pattern is probably due to the many classification windows in which the target species is not present (false positives). However, when zooming in on bins of higher confidence, there is a striking difference in patterns between sites. At BRITZ01 the increase in bin size towards higher confidence values is a strong indication for the species being present and abundant, while at MGB01 the species seems to be almost completely absent, as confirmed by manual validation and field surveys (Figure 24B, D). That is, at Melbgarten almost all classifications obtained for Common Redstart represent false positives, even those with rather high prediction probabilities.
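A bin-size distribution of this kind can be computed in a few lines. The sketch below is illustrative only: the function `bin_sizes` is a hypothetical helper, not the code of the web service, and the confidence values are simulated rather than taken from AMMOD recordings.

```python
import random


def bin_sizes(confidences, bin_width=0.02, lower=0.0):
    """Count classifications per confidence bin, ignoring values below `lower`."""
    n_bins = round(1.0 / bin_width)
    counts = [0] * n_bins
    for c in confidences:
        if c < lower:
            continue
        # clamp so that a confidence of exactly 1.0 falls into the last bin
        idx = min(int(c / bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


random.seed(1)
# Simulated data: many low-confidence windows (species absent) ...
background = [min(random.expovariate(40.0), 1.0) for _ in range(10000)]
# ... plus, at a site where the species occurs, windows with high confidence
present = background + [random.uniform(0.8, 1.0) for _ in range(300)]

full = bin_sizes(present)               # entire confidence interval
zoomed = bin_sizes(present, lower=0.2)  # zoom-in on confidence >= 0.2
print("first three bins:", full[:3])
print("windows with confidence >= 0.8:", sum(zoomed[40:]))
```

Plotting `full` would show the steep decline at low confidence, while `zoomed` isolates the range in which a rise towards high confidence hints at true presence.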

#### 6.4 Discussion

Passive acoustic monitoring (PAM) is a complex task, and thus accompanied by formidable challenges. With the aim to provide some information on how to address these challenges, we decided to focus our discussion on the lessons learned during the implementation of AMMOD.

# 6.4.1 Importance of interdisciplinarity

Team composition is crucial for the success of projects aiming at the development of hardware and software tools for PAM. Since acoustic monitoring of biodiversity concerns diverse aspects of life sciences, engineering, and logistics, multidisciplinary teams need to closely cooperate for data acquisition, device maintenance, software development, database management, performance assessment of classification models as well as analysis, interpretation, and validation of classification results. For instance, it was only through the continuous exchange of ideas between computer engineers and bioacousticians that the web interface for efficient analysis of raw classification results presented here could be developed.

# 6.4.2 Selection of the recording equipment

Custom-built hardware has some advantages over standard solutions. For example, it provides flexibility regarding the usage of high-quality directional microphones and wireless data transmission. Additional functionality is easily implemented, such as remote-control options and on-site data processing. However, long-term PAM studies must also consider other important aspects such as power supply, protection against vandalism, and the cost of purchasing and maintaining equipment and software. Hence, there are also many arguments in favour of using commercial audio recorders. If standard equipment is chosen, we strongly recommend energy-efficient stereo recorders with weatherproof electret microphones, for two reasons: Compared to mono recordings, stereo recordings provide some spatial information that facilitates data processing in many ways, whether for manual or computer-aided analysis. Equally important, electret microphones produce less self-noise and deliver much better audio quality than MEMS microphones, increasing the range monitored by each sensor unit.

# 6.4.3 Use of four-channel recordings

In bioacoustic research, microphone arrays have been used to localise and track animals in the field (Rhinehart et al. 2020). Recently it was demonstrated that even an array consisting of only four directional microphones supports the separation of animal sound signals that overlap in time but originate from different directions (Baggenstoss et al. 2021) (cf. Figures 13–16, Table 4). Consequently, individual songs and calls can be reliably assigned to a sequence, even if they are superimposed by other sounds. Although further studies are needed for the wide practical application of this method, it clearly has an enormous potential for research on biodiversity, ecology, and animal behaviour. For example, the directional spectrograms may inspire the development of tools for signal annotation as well as for automated identification of species and single individuals. However, a crucial prerequisite for the widespread use would be the availability of weatherproof directional microphones, making sound-wave-deflecting constructions for rain protection superfluous (Baggenstoss et al. 2021; cf. Figure 1). For technical reasons, the development of weatherproof directional microphones is very challenging and thus requires the involvement of hardware engineers with experience in microphone design and acoustics.

#### 6.4.4 Training and validation datasets

A critical bottleneck for the development of robust pattern recognition models has been the availability of training and validation data (Jahn et al. 2017; Morfi et al. 2019). Soundscape recordings contain many acoustic signals that first need to be labelled, identified, and annotated (Figures 19, 20). This is a tedious task and requires expert knowledge and time. As resources are rare, the principal strategy for obtaining labelled audio recordings has been the extraction of audio files from public citizen-science and institutional animal sound collections. For the development of BirdID-Europe254 about 72,000 sound recordings for 254 European bird species were compiled from different sources, including the Animal Sound Archive at the Museum für Naturkunde (https://www.animalsoundarchive.org) and Xeno-Canto (www.xeno-canto.org). However, most of these recordings have only simple annotations, i. e. the identification of the principal species or species lists at best. Many of these recordings contain additional sound signals of competing species and other sound sources that are not annotated. This is problematic, as incomplete annotation of the training data can lead to a significant reduction in recognition performance. Therefore, considering that tens of thousands of manually annotated sound recordings may be required to develop successful recognition models (Kwok 2019), it is advisable to assign more than one bioacoustician to this task, especially if several animal groups such as birds, mammals, and insects are to be studied as part of a single project.

We emphasise that even if fully annotated validation files are available, this does not guarantee objective performance evaluation of classification models. The output may be biased by the effect of long-tailed distribution of audio signals in soundscapes (Horn and Perona 2017). The reason is that few sound classes have many annotations and many classes are represented by only a few examples, particularly rare species and uncommon vocalisation types. Similarly, uneven distribution of sound quality can affect performance assessment results. A detailed analysis of BirdID-Europe254 precision-recall curves for AMMOD validation data revealed that at least some of the strange-looking curves are the result of such biases and do not reflect the actual performance of the classification models (Figure 21). For example, the rapid decrease of precision for the Mistle Thrush *Turdus viscivorus* is caused by the long-tailed distribution regarding signal quality, with many low-amplitude song phrases and very few high-amplitude signals annotated. In other species, strange effects are caused when the validation dataset accidentally contains a higher proportion of rare call and song types than the training dataset. This was likely the case for the Great Tit *Parus major*, for which precision first increased before dropping off again. For both species the precision-recall curves look very different when assessed with Xeno-Canto data (Figure 22), where recordings with high-amplitude signals of commonly heard vocalisation types outnumber recordings of poor quality and rarely heard call and song types. In conclusion, it is not enough to assess classification results only on the basis of allegedly objective validation datasets. Instead, representative random samples of classification results should be drawn to manually determine precision for each target species, recording station, and monitoring period.
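Drawing such a random sample and estimating precision from expert verdicts could be sketched as follows. The function names, the dictionary keys `window` and `confidence`, and all values are hypothetical and serve only to illustrate the procedure; they do not describe the sampling implemented in the web service.

```python
import random


def draw_validation_sample(predictions, min_confidence, sample_size, seed=0):
    """Draw a reproducible random sample of windows at or above a threshold."""
    candidates = [p for p in predictions if p["confidence"] >= min_confidence]
    rng = random.Random(seed)  # fixed seed so the sample can be re-drawn
    return rng.sample(candidates, min(sample_size, len(candidates)))


def estimated_precision(verdicts):
    """Share of sampled windows that the expert confirmed as correct."""
    return sum(verdicts) / len(verdicts) if verdicts else float("nan")


# Toy predictions (invented): one dict per classification window.
predictions = [{"window": i, "confidence": i / 100} for i in range(100)]
sample = draw_validation_sample(predictions, min_confidence=0.8, sample_size=5)

# An expert would listen to each sampled snippet; these verdicts are made up.
verdicts = [True, True, False, True, True]
print(len(sample), estimated_precision(verdicts))
```

Repeating this per species, station, and monitoring period gives a precision estimate that reflects the actual recordings rather than a fixed validation set.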

#### 6.4.5 Interpretation of raw classification results

In multilabel classification, each classification window is assigned a confidence value for the presence of each species included in the model. This means that for a given number of audio recordings, the number of classifications is the same for all species. Because classification windows in which a particular species is highly unlikely to occur receive much lower confidence values than those in which the species is likely to occur, the bin-size distribution typically takes the form of an L lying on its long side (Figure 24A, C). Consequently, the short side of the L contains almost exclusively false-positive classifications, while the long side may also contain correct-positive classifications. To get a better idea of the number of classification windows that may contain correct-positive results, it is necessary to zoom in on the range of higher confidence within the bin-size plots (Figure 24B, D). For Common Redstart *Phoenicurus phoenicurus*, the difference in bin-size distribution between sites where the species regularly occurs (e. g. BRITZ01) and those where it does not (MGB01) is striking. Therefore, based on bin-size plots we anticipate the development of a tool for automated analysis of the presence/absence status of species. To this end, several aspects should be considered:


# **Acknowledgments**

This work was sponsored by the German Federal Ministry of Education and Research (Grant 01LC1903F).

# **References**


Artificial Intelligence: Theories and Applications – 7th Hellenic Conference on AI, SETN 2012, Lamia, Greece, 190–197. https://doi.org/10.1007/978-3-642-30448-4\_24


# **7 Directional Spectrogram (D-SPEC) and Signal Source Separation: software description and operational guide**

Paul Baggenstoss, Frank Kurth

# 7.1 Overview

This software description is based on the technical paper that provides motivation and a complete technical and mathematical description of the processing (Baggenstoss et al. 2021). The correspondence between the software and specific equations from the paper is indicated. A neural-network-based implementation of the directional spectrogram (D-SPEC), which has been developed since the paper was published, is also described. Finally, instructions for use of the software are provided.

# 7.2 D-SPEC Algorithm, Software Description

The purpose of the D-SPEC algorithm is to create a spectrogram with a color dimension that approximates the direction of arrival of the signal. It operates by frequency-domain beam-forming. A beam amplitude response is calculated at each and every time/frequency bin, and then this beam amplitude response is converted to a 3-dimensional color.

# 7.2.1 Parameter determination

The following are the key parameters that affect the calculation of the D-SPEC.


# 7.2.2 Beam-forming setup

#### 7.2.2.1 Beamforming delays

In the setup phase, the delays are calculated based on the array geometry. These time delays are then used to compute the steering vectors. We will use the case of 4 microphones to illustrate the ideas in Figure 1. The code is located in function ammod\_util.cspec(), and in colorspect.m (MATLAB/Octave). The Python code:

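```
# (x, y) positions of microphones 1-4 at the corners of a square with side d
pos =(-d/2, d/2, d/2, d/2, d/2, -d/2, -d/2, -d/2)
pos = np.asarray(pos).reshape((nmic,2))
# look direction of each microphone in radians
am = np.asarray( (-pi/4, pi/4, 3*pi/4, -3*pi/4) )
```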
calculates the positions of microphones 1 through 4, as illustrated in Figure 1.

#### 7.2.2.2 Beam angles

Next, the vector of angles (in radians) for which pre-formed beams will be computed is created. There are na angles, and there will be na pre-formed beams in the beam response:

```
# beamformer beam angles
angs=np.asarray(range(0,360,ang_res))*(pi/180)
maxang=360
na=len(angs)
```

#### 7.2.2.3 Time distances

Next, one imagines sources located at a distance r from the array center (indicated as microphone 0 in Figure 1) at the angles angs[1...na]. The distance from each source to each microphone is computed, and the distance to the array center is subtracted. When computing the beamformer steering vectors, the absolute distance is not important, only the differences between the distances to the microphones. This is why we can subtract the distance to the array center.

This is implemented by the code:

```
# dst[i,ia] is distance from microphone i to a distant point at
# angle ia minus dctr, where dctr is distance to center of array
dst=np.zeros((nmic,na))
r=1000 # target at this range
for ia in range(na):
    ry=np.cos(angs[ia])*r
    rx=np.sin(angs[ia])*r
    dctr = np.sqrt(np.square(rx) + np.square(ry))
    for i in range(nmic):
        dst[i,ia] = np.sqrt(np.square(rx-pos[i,0]) + \
            np.square(ry-pos[i,1])) - dctr
```

**Figure 1.** Array geometry.
#### 7.2.2.4 Steering vectors

The steering vectors are 4-dimensional (4 microphones) vectors that describe (the conjugate of) the complex value of a frequency-domain signal that would appear at the array from a source at a distant location with a specified angle, as illustrated in Figure 1. Each steering vector includes the microphone characteristic (normally cardioid). Naturally, the steering vector depends on angle and frequency, so it has to be computed for each FFT bin and angle. There are n=N/2+1 frequency bins and na angles. The frequency corresponding to each bin is stored in the variable sfrqs[j] for frequency j=1...n. This is all calculated by the code:

```
# set up steering vectors
n=N//2+1
sfrqs=np.asarray( range(n))/N*fs
s=[None]*nmic
for i in range(nmic):
    # ft = (f)requency times delay (t)time
    ft = np.outer(sfrqs, dst[i,:]/c)
    if nmic==4:
        # microphone cardioid pattern
        mr=(1+np.cos(angs-am[i]))/2
        #mr=np.exp( -np.power((angs-am[i])/(np.pi),4))
    else:
        # omnidirec
        mr=np.ones((na,))
    # steering vector
    s[i] = np.exp(1j*2*pi*ft) * mr
```
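The setup steps above can be combined into one self-contained sketch and checked with a synthetic source. The numerical values for `d`, `c`, `fs`, `N`, and `ang_res` are illustrative assumptions, not the settings of the AMMOD software, and the vectorised layout (a single array `s` of shape `(nmic, n, na)` instead of a list) is a choice made for this example only.

```python
import numpy as np

# Illustrative values (assumptions, not taken from the AMMOD software)
d = 0.10        # side length of the microphone square in metres
c = 343.0       # speed of sound in m/s
fs = 48000      # sample rate in Hz
N = 1536        # FFT size (divisible by 3)
ang_res = 3     # beam spacing in degrees

pos = np.array([[-d/2, d/2], [d/2, d/2], [d/2, -d/2], [-d/2, -d/2]])
am = np.array([-np.pi/4, np.pi/4, 3*np.pi/4, -3*np.pi/4])  # look directions
angs = np.deg2rad(np.arange(0, 360, ang_res))
na = len(angs)

# far-field path differences relative to the array centre, shape (nmic, na)
r = 1000.0
src = r * np.stack([np.sin(angs), np.cos(angs)], axis=1)         # (na, 2)
dst = np.linalg.norm(src[None, :, :] - pos[:, None, :], axis=2) - r

n = N // 2 + 1
sfrqs = np.arange(n) / N * fs
mr = (1 + np.cos(angs[None, :] - am[:, None])) / 2                # cardioid gains
# steering vectors for all microphones, FFT bins, and angles: (nmic, n, na)
s = np.exp(1j * 2 * np.pi * sfrqs[None, :, None] * dst[:, None, :] / c) * mr[:, None, :]

# Synthetic single-frequency source on the beam grid: the values seen at the
# microphones are the complex conjugate of the matching steering vector.
ia_true, k = 30, 100             # beam index 30 (90 degrees), FFT bin 100
x = np.conj(s[:, k, ia_true])    # (nmic,)

# beam response at bin k: sum over microphones of signal times steering vector
beam = np.abs(np.sum(x[:, None] * s[:, k, :], axis=0))            # (na,)
print("strongest beam index:", int(np.argmax(beam)), "expected:", ia_true)
```

For four cardioid microphones spaced 90 degrees apart, the squared gains sum to the same value for every angle, so the beam response is largest where the steering vector matches the source direction.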
#### 7.2.2.5 High-pass filtering

High-pass filtering is needed to remove low-frequency noise and is accomplished in the frequency-domain. The coefficients of a Butterworth filter are computed and the frequency response is calculated. This is done by the code:

```
# high-pass filtering
Bf,Af = scipy.signal.butter(2,cfg.fmin/fs*2,btype='high')
w,h=scipy.signal.freqz(Bf, Af, worN=N, whole=True, plot=None)
frsp = np.square(np.abs(h))
```
#### 7.2.2.6 Color map

A color map is initialized. This will be later used to convert beam response to RGB color. It assigns a 3-dimensional RGB color to each of the na beam angles.

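```
# create hsv color map of size equal to number of angles
# (matplotlib.cm.get_cmap was removed in Matplotlib 3.9;
#  matplotlib.colormaps['hsv'] is the equivalent there)
mp = matplotlib.cm.get_cmap('hsv')
co=np.zeros((3,na))
for ia in range(na):
    ct = mp(1.0*ia/na)
    co[:,ia] = ct[0:3]
```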
This code creates a color map array that corresponds to the angle array angs described above, which is of length na and for ang\_res=3 runs from 0 to 357 degrees (but has values in radians). For reference, it can be deduced from the code that red (color 1,0,0) corresponds to 0 degrees, green (color 0,1,0) corresponds to 120 degrees (2\*pi/3 radians), and blue (color 0,0,1) corresponds to 240 degrees (4\*pi/3 radians).
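The angle-to-color correspondence can be checked with a small sketch. It uses the standard-library `colorsys` module as a stand-in for the Matplotlib 'hsv' color map, which is an assumption made to keep the example free of plotting dependencies; the two mappings agree at the three reference angles but are not guaranteed to be identical elsewhere.

```python
import colorsys

import numpy as np

ang_res = 3                 # degrees between beams, as in the text
na = 360 // ang_res

# co[:, ia] is the RGB color assigned to beam angle ia * ang_res degrees
co = np.zeros((3, na))
for ia in range(na):
    co[:, ia] = colorsys.hsv_to_rgb(ia / na, 1.0, 1.0)

for deg in (0, 120, 240):
    print(deg, np.round(co[:, deg // ang_res], 3))
```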

# 7.2.3 Data Processing

Data is read in for a specified time range (tstart,tend) from the .wav files, which contain 4-channel data. The function read\_wave.read\_wav() reads just one channel (microphone) at a time from the specified time range. This time series is passed to the spectrogram function, which calculates a spectrogram with FFT size N and overlap 2/3. The FFT size N must be divisible by 3. High-pass filtering is done by multiplying by the filter frequency response, which is stored in the variable frsp. Some scaling is done so that the Python code has the same values as the MATLAB/OCTAVE code. The spectrogram data for channel ch is stored in the list variable B[ch].
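The relation between FFT size and overlap can be illustrated with a minimal spectrogram sketch. The function below is not the spectrogram routine of the software: the Hann window, the absence of scaling and high-pass filtering, and the test tone are all assumptions made for this example.

```python
import numpy as np


def spectrogram(x, N):
    """Complex spectrogram with FFT size N and 2/3 overlap (hop = N/3)."""
    if N % 3 != 0:
        raise ValueError("N must be divisible by 3 for a 2/3 overlap")
    hop = N // 3
    nseg = 1 + (len(x) - N) // hop
    window = np.hanning(N)  # window choice is an assumption
    frames = np.stack(
        [x[j * hop : j * hop + N] * window for j in range(nseg)], axis=1
    )
    return np.fft.rfft(frames, axis=0)  # shape (N//2 + 1, nseg)


fs, N = 48000, 1536
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 3000.0 * t)      # one second of a 3 kHz test tone
B0 = spectrogram(x, N)
peak_bin = int(np.argmax(np.abs(B0[:, 0])))
print(B0.shape, peak_bin * fs / N, "Hz")
```

With a hop of N/3 samples, each segment shares two thirds of its samples with the next one, which is why N has to be divisible by 3.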

# 7.2.4 Beamforming and Calculating the D-SPEC

The code for beamforming and calculating the D-SPEC is provided in the function ammod\_utils.get\_cspect(). Memory for the D-SPEC output is allocated using the variable I, which has shape (n,NSEG,3), where n is the number of FFT bins and NSEG is the total number of spectrogram time segments (called nseg elsewhere). The third dimension is the RGB color space.

The code loops over the NSEG segments, processing one segment at a time. For a given time step "ismp", the following steps are made:

#### 7.2.4.1 Multiplication by steering vector

The beams are formed for each of the angles angs[i] by taking the inner product of the spectrogram output with the steering vectors, and summing over the microphone channels:

```
for i in range(M):
    bt = B[i][:,ismp]
    xbf = xbf + bt * s[i].T
```
#### 7.2.4.2 Sidelobe suppression

To suppress side-lobes, the magnitude of the resulting complex beam outputs (xbf) is computed and then this is raised to the power 16 by repeatedly squaring. The reasons for this are explained in the paper referred to above (cf. section 7.1), section III-B. The result is stored in the variable b. Then, this is normalized so that it sums to 1 over the na beams and stored in variable bn.
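A minimal sketch of this step, assuming xbf is stored with shape (na, n) as in the loop above; the function name and the small guard against division by zero are additions for illustration.

```python
import numpy as np

def suppress_sidelobes(xbf):
    """Sharpen beam responses; xbf has shape (na, n) (beams x frequency bins)."""
    b = np.abs(xbf)
    for _ in range(4):          # squaring four times gives the power 16
        b = b * b
    # Normalize so that every frequency bin sums to 1 over the na beams.
    bn = b / (b.sum(axis=0, keepdims=True) + np.finfo(float).tiny)
    return b, bn

# Three beams, one frequency bin: a main lobe and two weaker responses.
xbf = np.array([[1.0 + 0j], [0.9j], [-0.5 + 0j]])
b, bn = suppress_sidelobes(xbf)
```

In this toy example a side response at 90 % of the main-lobe magnitude drops to about 19 % after the exponentiation, so the normalized response bn concentrates on the strongest beam.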

#### 7.2.4.3 Conversion to color

There are two approaches to convert the normalized beam response bn into an RGB color. The operator selects the method using the variable cstyp, which can be set in the GUI. The "MAX" approach just finds the location of the maximum response and assigns the color corresponding to that direction (na possible directions). The normal approach (default and recommended) is to form the dot product of bn with the color map, which is described above, resulting in an RGB color. After computing the dot product, the result is multiplied by an estimate of the combined spectrogram amplitude, stored in variable bs. This is needed because the color vector obtained from the inner product with the color map is normalized and has no amplitude information. The final result is stored in the D-SPEC output vector I. This can be directly plotted as a D-SPEC image using MATLAB/OCTAVE or Python.
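Both conversion styles can be sketched in a few lines. The color map below is a stand-in built with the standard HSV conversion (red at 0, green at 120 and blue at 240 degrees); the function name beams\_to\_rgb() and the string used for the default style are hypothetical.

```python
import colorsys
import numpy as np

ang_res = 3
na = 360 // ang_res
# Stand-in color map of shape (3, na): red at 0, green at 120, blue at 240 degrees.
co = np.array([colorsys.hsv_to_rgb(ia / na, 1.0, 1.0) for ia in range(na)]).T

def beams_to_rgb(bn, bs, cstyp="NORMAL"):
    """Map normalized beam responses bn (na, n) to RGB colors (n, 3).

    bs (n,) is an amplitude estimate per frequency bin.
    """
    if cstyp == "MAX":
        rgb = co[:, np.argmax(bn, axis=0)].T    # color of the strongest beam
    else:
        rgb = bn.T @ co.T                       # dot product with the color map
    return rgb * bs[:, None]                    # restore amplitude information

bn = np.zeros((na, 2))
bn[40, 0] = 1.0                  # bin 0: all energy from 120 degrees
bn[0, 1], bn[40, 1] = 0.6, 0.4   # bin 1: mixture of 0 and 120 degrees
bs = np.array([2.0, 1.0])
rgb = beams_to_rgb(bn, bs)
rgb_max = beams_to_rgb(bn, bs, cstyp="MAX")
```

The mixed bin illustrates the difference: the dot product yields a blend of red and green, whereas the "MAX" style reports pure red.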

# 7.3 Clustering Algorithm, Software Description

The purpose of the clustering algorithm is to assign each of the n\*nseg D-SPEC bins to a cluster, with the hope that each cluster should represent a bird individual. This is helped by the tendency of birds, which are territorial in nature, to sing from different directions. The clustering algorithm is implemented by the Python function ammod\_util.cluster\_C(). In the following sections, the code is explained in the order in which it appears in the function. The input variable C corresponds to the D-Spec image variable I in Section 7.2.4 and has dimension (n,nseg,3), where nseg is the same as NSEG.

# 7.3.1 Parameters

Relevant parameters are listed as follows, along with default values.


# 7.3.2 Pre-processing

The following steps are required prior to clustering iterations.

#### 7.3.2.1 Amplitude calculation

The color image C is converted to an amplitude image Cs of dimension (n,nseg) by taking the root-mean-square of C over the color dimension.


### 7.3.2.2 Determining Location indexes

Location indexes are integers that record the time and frequency location of each bin within the D-SPEC, which makes spatial processing more efficient. Once the D-SPEC is thresholded and the bins that passed the threshold are gathered into a single vector, the indexes of these bins are no longer organized as a 2-dimensional array. This makes it difficult to determine the relative positions of two D-SPEC bins. We therefore create the location indexes first and, when thresholding, select the subset of indexes for the bins that exceed the threshold. The D-SPEC matrix C has dimension (n,nseg,3) and the D-SPEC amplitude matrix Cs has dimension (n,nseg). If we combine the first two dimensions, we can view these matrices as having dimension (n\*nseg,3) and (n\*nseg). The vectors ipos and jpos are also of dimension n\*nseg and are equal to the time and frequency bin indexes within the spectrogram. Having these index vectors makes it much more efficient to compute, for example, the Euclidean distance (in bins) between two spectrogram bins. For example, let k, l be two arbitrary locations within the thresholded D-SPEC. Then, the Euclidean distance between k and l is simply

```
d = np.sqrt((ipos[k] - ipos[l])**2 + (jpos[k] - jpos[l])**2)
```
The index vector ijpos points back to the location within the original array, so that initially ijpos[k]=k, etc. This pointer into the original arrays must be kept once the D-SPEC is thresholded.
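The bookkeeping can be illustrated with a small stand-in array. The sketch assumes that ipos holds the time index and jpos the frequency index, and that the arrays are flattened in row-major order; the threshold value is arbitrary.

```python
import numpy as np

n, nseg = 4, 5
Cs = np.arange(n * nseg, dtype=float).reshape(n, nseg)   # stand-in amplitudes

# Frequency and time index of every bin, flattened like Cs.ravel().
# Which of ipos/jpos holds time and which holds frequency is an assumption here.
jpos, ipos = np.indices((n, nseg))
ipos, jpos = ipos.ravel(), jpos.ravel()
ijpos = np.arange(n * nseg)          # pointer back into the flattened arrays

keep = Cs.ravel() > 12.0             # hypothetical amplitude threshold
ipos, jpos, ijpos = ipos[keep], jpos[keep], ijpos[keep]

k, l = 0, 6
dist = np.sqrt((ipos[k] - ipos[l]) ** 2 + (jpos[k] - jpos[l]) ** 2)
```

After compression, positions k and l no longer say where a bin sits in the spectrogram, but ipos, jpos and ijpos still do.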

#### 7.3.2.3 Extracting Hue value

To determine the hue values, the input D-SPEC array C is normalized to produce an array whose color dimension sums to 1. This is then converted to a 'feature' using the function rgb2feat(), but all this does is extract the hue value, which is the first dimension of the HSV color representation. The color 'feature' dimension, dim, is then equal to 1. The result is the hue array Cn, which has dimension (n,nseg,dim), where dim=1. The correspondence of hue to RGB color and direction is




#### 7.3.2.4 Thresholding

It would not be practical to do clustering using all the n\*nseg bins of the D-SPEC. For this reason, only the bins with higher amplitude are clustered. The user can select between two thresholding methods using the variable rel\_thr. If rel\_thr is true, a relative threshold is found by sorting the amplitude values in matrix Cs and choosing the threshold that passes the desired fraction of the bins. The user sets the desired fraction using the threshold parameter thr. For example, if thr=0.1, then 10 percent of the data will be passed. To make this more efficient, the threshold is found using just 1/50 of the bins. If rel\_thr is false, the manually entered threshold thr is used directly as an amplitude threshold. Once the bins that pass the threshold are located, all the important arrays, such as C, Cs, ipos, jpos and ijpos, are compressed accordingly.
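The two thresholding modes can be sketched as follows. How the 1/50 subset of bins is chosen is not stated above; taking every 50th bin is an assumption, as is the function name.

```python
import numpy as np

def amplitude_threshold(Cs, thr, rel_thr=True, stride=50):
    """Return the amplitude threshold used to select D-SPEC bins."""
    if not rel_thr:
        return thr                            # thr is used directly as amplitude
    sample = np.sort(Cs.ravel()[::stride])    # every 50th bin keeps sorting cheap
    cut = int(np.floor((1.0 - thr) * len(sample)))
    return sample[min(cut, len(sample) - 1)]

rng = np.random.default_rng(0)
Cs = rng.random((193, 500))                   # stand-in amplitude image
t = amplitude_threshold(Cs, thr=0.1)
passed = Cs > t
fraction = passed.mean()
```

Because the threshold is estimated from a subsample, the fraction of bins that actually pass is only approximately equal to thr.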

#### 7.3.2.5 Data weights

In clustering the D-SPEC bins, each bin is assigned a weight (higher amplitude bins are considered more important). The variable dwts is the weight value which is calculated as the amplitude Cs raised to the power expfac. Normally, a power less than one is used. Default is expfac=0.5.
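As a short illustration, the amplitude image from Section 7.3.2.1 and the data weights can be computed as below; the random array only stands in for a real D-SPEC.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.random((193, 50, 3))             # stand-in D-SPEC image (n, nseg, 3)
Cs = np.sqrt(np.mean(C ** 2, axis=2))    # RMS over the color dimension
expfac = 0.5
dwts = Cs ** expfac                      # data weights; expfac < 1 compresses range
```

With expfac=0.5, a bin that is 100 times stronger than another receives only 10 times its weight, so loud bins dominate the clustering less.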

#### 7.3.2.6 Initial clustering

The D-SPEC bins that exceeded a threshold are clustered using the k-means clustering method. Clustering is based only on the hue value.
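A plain k-means iteration on scalar hue values might look like the sketch below. It is not the routine used in the software: the initialization by quantiles and the fixed number of iterations are assumptions, and the circular nature of hue (values near 0 and near 1 are both red) is ignored.

```python
import numpy as np

def assign(h, centers):
    """Index of the nearest center for every hue value."""
    return np.argmin(np.abs(h[:, None] - centers[None, :]), axis=1)

def kmeans_1d(h, M, n_iter=20):
    """Plain k-means (Lloyd's algorithm) on scalar hue values h."""
    centers = np.quantile(h, (np.arange(M) + 0.5) / M)   # spread initial centers
    for _ in range(n_iter):
        labels = assign(h, centers)
        for m in range(M):
            if np.any(labels == m):       # leave empty clusters where they are
                centers[m] = h[labels == m].mean()
    return assign(h, centers), centers

rng = np.random.default_rng(1)
hue = np.concatenate([rng.normal(0.2, 0.02, 300), rng.normal(0.7, 0.02, 300)])
labels, centers = kmeans_1d(hue, M=2)
```

Two sources with well-separated directions end up in separate clusters, which then serve as starting points for the Gaussian mixture model.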

## 7.3.2.7 Gaussian Mixture Model (GMM)

Each cluster is assumed to have a separate Gaussian mixture distribution with P components (by default, P = 2); refer to the paper mentioned above (cf. section 7.1), section IV. Assuming there are M clusters, and P components to each cluster, the GMM variables are:


### 7.3.2.8 GMM Initialization

The mean and variance of the data assigned to each cluster by the k-means algorithm are calculated. Then, to initialize the P-component GMM for each cluster, some equally-spaced locations are found within the cluster and these are used as the component means.

### 7.3.2.9 Neighborhoods

The neighborhood of a given D-SPEC bin is the collection of other bins that are near it. Within a neighborhood, the 'other' bins are weighted according to how far away they are. By default, we use an exponential weight w = exp( - d^2/r^2 ), where r is the radius and d is the Euclidean distance in pixels. To make spatial processing more efficient, the neighborhood of each D-SPEC bin (that passed the threshold) and the corresponding weights are computed in advance. Suppose there are nc D-SPEC bins that exceeded the threshold. The neighborhood indexes and weights for D-SPEC bin i are:


# 7.3.3 Clustering algorithm

The clustering algorithm has two phases. The first phase is essentially a Gaussian mixture model estimation algorithm, which is a type of E-M algorithm. The GMM is applied to 1-dimensional features (just hue), so it is unable to use spatial information. Without spatial information, the pixels assigned to a cluster can be distributed randomly across the spectrogram. A second, spatial processing phase is applied once the GMM has converged. In the spatial phase, we look at the pixels around a given pixel. The clusters that these neighboring pixels are assigned to then influence the assignment of the given pixel.
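The idea of the spatial phase can be illustrated with precomputed neighborhoods and a weighted vote. This is a deliberately simplified sketch: the actual algorithm combines the neighborhood information with the GMM likelihoods as described in the paper, whereas the vote below uses the neighbors alone. The function names and the cut-off distance max\_d are hypothetical.

```python
import numpy as np

def build_neighborhoods(ipos, jpos, r, max_d):
    """For each bin, the indexes of nearby bins and their weights exp(-d^2/r^2)."""
    nbr_idx, nbr_wts = [], []
    for i in range(len(ipos)):
        d2 = (ipos - ipos[i]) ** 2 + (jpos - jpos[i]) ** 2
        idx = np.flatnonzero((d2 <= max_d ** 2) & (d2 > 0))   # exclude bin i itself
        nbr_idx.append(idx)
        nbr_wts.append(np.exp(-d2[idx] / r ** 2))
    return nbr_idx, nbr_wts

def spatial_vote(labels, nbr_idx, nbr_wts, M):
    """Reassign each bin to the cluster with the largest weighted neighbor vote."""
    new = labels.copy()
    for i, (idx, w) in enumerate(zip(nbr_idx, nbr_wts)):
        if len(idx):
            new[i] = np.argmax(np.bincount(labels[idx], weights=w, minlength=M))
    return new

# 3 x 3 patch of bins: all in cluster 0 except an isolated bin in the center.
jpos, ipos = (a.ravel() for a in np.indices((3, 3)))
labels = np.zeros(9, dtype=int)
labels[4] = 1
nbr_idx, nbr_wts = build_neighborhoods(ipos, jpos, r=2.0, max_d=1.5)
smoothed = spatial_vote(labels, nbr_idx, nbr_wts, M=2)
```

The isolated center bin is pulled into the cluster of its neighbors, which is the kind of spatial coherence the second phase is meant to enforce.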

#### 7.3.3.1 Variable correspondence

The GMM estimation algorithm is explained in the paper (cf. section 7.1), section IV-B-D. To help understand the code, we provide the correspondence between the mathematical notation in the paper, and some of the variables in the code. Vectors and matrices are shown with indexes that are the size of each dimension.


#### 7.3.3.2 Cluster merging

The clustering algorithm is initialized with a larger number of clusters than required, then un-needed clusters are removed. Two clusters are merged into one if they get too close to one another. Closeness is measured by the function cluster\_dist(). Only one merging can occur per iteration.

# 7.4 Using the D-Spec clustering software

# 7.4.1 File preparation

To use a recording with the D-SPEC software, it needs to be a 4-channel file in .wav format. The optimal sampling rate is 24 kHz. If the file has a 48 kHz sampling rate, a utility program is provided to convert it to 24 kHz. Using MATLAB or OCTAVE, invoke the program dsamp2.m:

>> dsamp2('file1.wav','file1\_ds2.wav')

This will down-sample the file file1.wav and save it as file1\_ds2.wav. To enter the file into the file list, open the file file\_list.txt in an editor and add a text line with 7 fields. Using the example of file1\_ds2.wav, the line looks like

File46 24000 0.27 343 4 "file1\_ds2.wav" "None"

The first entry is a 'nickname', a short string to identify the file. The convention is to use FileXX, and make XX a number that has not already been assigned. The second entry is the file sampling rate. The third entry is the array microphone separation in meters. The fourth is the speed of sound in m/s when the recording was made. The fifth entry is the number of microphones. The sixth entry is the file name in quotes. The last entry is the name of an annotation file in quotes, or 'None' if there is none.
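For scripts that need the same information outside the GUI, a line of the file list can be parsed as sketched below. The field names and the function parse\_file\_list\_line() are hypothetical; how ammod\_gui.py itself reads the list is not described here.

```python
import shlex
from typing import NamedTuple, Optional

class FileEntry(NamedTuple):
    nickname: str
    fs: int                      # sampling rate in Hz
    d: float                     # microphone separation in m
    c: float                     # speed of sound in m/s
    nmic: int                    # number of microphones
    wav_name: str
    annotation: Optional[str]    # None if no annotation file exists

def parse_file_list_line(line):
    fields = shlex.split(line)   # honors the double-quoted file names
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    nickname, fs, d, c, nmic, wav_name, annotation = fields
    return FileEntry(nickname, int(fs), float(d), float(c), int(nmic),
                     wav_name, None if annotation == "None" else annotation)

entry = parse_file_list_line('File46 24000 0.27 343 4 "file1_ds2.wav" "None"')
```

Using shlex.split() rather than a plain whitespace split keeps file names with spaces intact, as long as they are quoted.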

### 7.4.2 Launching the program

The D-SPEC software is launched simply by invoking python on the program ammod\_gui.py. In Linux:

```
$ python ammod_gui.py
```
Launching the software from Windows might differ slightly.

#### 7.4.3 Loading and saving parameters

The graphical interface comes up with default parameter settings. These can be changed by the operator, and saved in a parameter file by specifying the file name and pressing SAVE on the top of the window, or LOAD to retrieve the saved parameters.

#### 7.4.4 Loading data

To load a data set, use the File: drop-down menu to select the file. On the left of the drop-down menu appears the nickname that was entered in the file list. Once this has been selected, the information stored in the file list (microphone separation d, speed of sound c, sample rate fs, number of microphones nmic) is transferred to the parameters displayed in the GUI.

#### 7.4.5 Setting time range

Before computing the D-SPEC, the operator can specify a desired time range (T1, T2). The total time processed at a time can be up to 30 seconds, although the time to process and cluster increases with time range. The time window can be moved forward or backwards by 2 seconds using the (+2, -2) keys.

#### 7.4.6 Computing and Plotting D-SPEC

Once the time range has been set, press the CSPEC button to compute the D-SPEC. The D-SPEC will not be displayed until requested with the PLT button. The brightness (brt) and contrast (contr) can be changed to optimize the display; afterwards, only the PLT button needs to be pressed. Note that these values will be saved or loaded when using the SAVE/LOAD buttons. Note: never 'kill' the D-SPEC graphics window. This will cause errors.

# 7.4.7 D-SPEC parameters

Many of the parameters that affect D-SPEC are pre-determined by the array and recording, but the following can be changed: N, ang\_res, fmin, T1, T2. Some details of these are listed in Section 7.2.1. The FFT size N can be changed experimentally. Generally, for a 24000 Hz sample rate, 384 is a good choice, and this value should be scaled in proportion to the sample rate. The high-pass filter cutoff 'fmin' should be set experimentally for each recording to eliminate low-frequency noise.
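The rule of thumb for N can be written as a small helper. The function name is hypothetical, and rounding to the nearest multiple of 3 is one possible way to satisfy the divisibility requirement from Section 7.2.3.

```python
def suggested_fft_size(fs, ref_fs=24000, ref_N=384):
    """Scale the FFT size with the sample rate and keep it divisible by 3."""
    N = ref_N * fs / ref_fs
    return max(3, 3 * round(N / 3))

sizes = {fs: suggested_fft_size(fs) for fs in (24000, 48000, 16000)}
```

Scaling N with the sample rate keeps the frequency resolution fs/N constant, at 62.5 Hz for the reference values.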

# 7.4.8 Clustering and plotting clusters

Once the D-SPEC has been computed, the clustering can be started by pressing 'Cluster'. Information is displayed in the system console. Once the clustering operation has stopped, the cluster-specific spectrograms can be displayed using the 'PLOT Clusters' button.

# 7.4.9 Clustering parameters

Operator-settable parameters include the following:


#### 7.4.10 Generating and Playing Time-Series

To generate cluster-specific time-series, press the 'Gen Timeseries' button. This causes the files 'cluster1.wav', 'cluster2.wav', etc. to be written. These files can be used for external purposes, such as classifying. Or, they can be played by selecting the desired cluster 'CLUS' and pressing 'Play'.

# 7.5 Neural Network Implementation of D-SPEC

An experimental version of a neural network implementation of D-SPEC (not including clustering) has been developed. For purposes of documentation, this section explains how it works and points to the required software.

# 7.5.1 Overview

NN-DSPEC processing is the same as D-SPEC up to the formation of beams. Instead of forming 120 beams in 3-degree increments, NN-DSPEC operates by creating 32 pre-formed beams in 11.25-degree increments, but is otherwise the same as laid out in Section 7.2.4. The amplitude of these beams is computed, resulting in a beam output with dimension (nseg,n,32). This is a spectrogram with 32-dimensional bins. This 32-dimensional feature is passed to a neural network to convert the 32-dimensional bins to 3-dimensional (RGB) bins. From the perspective of the neural network, it can be seen as just a 32-dimensional feature with nseg\*n samples. The neural network itself is very simple. There are 5 network layers, each of which has 32 neurons except for the last, which has 3 neurons. Each layer has the soft-plus activation function. The network is simply trained to re-create the ground truth, which is an artificial RGB D-SPEC.
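The forward pass of such a network is small enough to write out with NumPy. The sketch below uses random weights and hypothetical function names; the trained parameters come from the PBN Toolkit, whose file format and internal conventions are not reproduced here.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)           # log(1 + exp(z)), numerically stable

def init_layers(rng, sizes=(32, 32, 32, 32, 32, 3)):
    """Random weights for five dense layers: 32 -> 32 -> 32 -> 32 -> 32 -> 3."""
    return [(rng.normal(0.0, 0.1, (m, k)), np.zeros(k))
            for m, k in zip(sizes[:-1], sizes[1:])]

def forward(beams, layers):
    """beams: (nseg, n, 32) beam amplitudes -> (nseg, n, 3) RGB bins."""
    x = beams.reshape(-1, beams.shape[-1])     # nseg*n samples of dimension 32
    for W, b in layers:
        x = softplus(x @ W + b)                # soft-plus on every layer
    return x.reshape(beams.shape[:-1] + (3,))

rng = np.random.default_rng(0)
layers = init_layers(rng)
beams = rng.random((10, 193, 32))              # stand-in beam spectrogram
rgb = forward(beams, layers)
```

Since soft-plus is strictly positive, the network output can be interpreted directly as non-negative RGB intensities.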

# 7.5.2 Software implementation

Since the implementation is spread across multiple experimental programs, we cannot give a complete description here. Instead, we point to the required programs and their purpose, so that the process can be repeated in the future.

#### 7.5.2.1 Conversion to 32-dimensional beam spectrograms

The program 'collect\_calls.py' is a multi-purpose Python program. When run with the task 'circle', it reads in a specified 4-channel recording (the same files that the GUI reads) for a specified time range. It then calculates the spectrogram on all microphones and sums the mic-specific spectrograms to get a 'sum beam'. The 'sum beam' is then converted into an artificial set of 4-channel spectrograms with the signal arriving from a specified direction. From the artificial 4-channel recording, the 32 beams are extracted and saved to a file along with the ground truth, which is the desired RGB color values that correspond to the requested direction (an artificial D-SPEC). This is repeated for 120 directions in 3-degree increments. In the end, there are 120 files, one for each direction. This forms a data set called 'circle32', which is made available to the neural network toolkit (PBN Toolkit). Any data set can be used to produce the 'circle32' training data, but it is a good idea to use a data set with a bird call having a loud wide-band vocalization, so that all frequency bins are excited. We used the file 'BRITZ02\_20210331\_121000\_d2.wav' in the time range 1–4 seconds.

#### 7.5.2.2 Training the neural network

The PBN Toolkit (http://class-specific.com/pbntk) is used to train the neural network using the network model 'cspec32'. The data set 'circle32', which is in streaming format, is read in by the toolkit, and the network is trained to re-produce the ground-truth RGB color. Once this is trained, the parameters of the network are saved to files 'cspec32\_circle32\_lyrX.mat', where 'X' runs from 1 to 5.

#### 7.5.2.3 Using the neural network to produce CSPEC

The program 'collect\_calls.py' is used to complete the process. Using the 'bfm' function, 32-beam responses are created for arbitrary data sets, and the result is converted to D-SPEC using the trained network parameters.

# **References**

Baggenstoss PM, Frommolt KH, Jahn O, Kurth F (2021) Separation of Bird Calls and DOA estimation using a 4-Microphone Array. Paper presented at the 2021 29th European Signal Processing Conference (EUSIPCO), 23–27 Aug. 2021, 166–170. https://doi.org/10.23919/EUSIPCO54536.2021.9616173

# **8 Depth-aware Visual Monitoring**

Timm Haucke, Volker Steinhage

# 8.1 State of the art

The visual monitoring of animals using automatically triggered cameras (so-called *camera traps*) has a long history, even dating back to the late nineteenth century (Kucera and Barrett 2011). Today, mass-produced digital camera traps are a widespread, versatile, and indispensable tool for ecologists (O'Connell, Nichols, and Karanth 2011). Until recently, such camera traps have exclusively been monocular. Monocular cameras capture only a two-dimensional image from a single viewpoint. There are a variety of approaches for this that deal with the detection and tracking of the animals (Schindler and Steinhage 2022), as well as behavioral research (Schindler and Steinhage 2021). However, if we instead reason about the three-dimensional structure of captured scenes, tasks such as distinguishing (camouflaged) animals from the background, localizing animals more accurately, or incorporating camera-animal distances for modeling absolute abundance become much easier (Haucke et al. 2022). There exist a wide variety of techniques to capture this three-dimensional structure. However, most techniques, such as structured light, time-of-flight, or interferometry require an active illumination of the scene by the sensor. In natural environments, such active illumination has two significant drawbacks. First, the illuminator has to be powered both during day- and nighttime for the sensor to be able to infer depth information. Second, during the day, the illuminator must compete with the natural sunlight and be powerful enough to produce a sufficient signal for the sensor. Passive stereo cameras, on the other hand, have no need for an active illumination and infer depth information by simply comparing the distance of features in the images produced by two slightly displaced cameras. This process is called *stereo matching*. There exist a wide array of commercially available passive stereo cameras. 
However, most fail to satisfy important requirements imposed by the wildlife monitoring context, for example by only being optimized for short distances, or by being unable to image in the infrared (important for nighttime observations). A number of recent works have therefore introduced stereo cameras more suited for wildlife monitoring applications. The stereo camera of Xu et al. (2019, 2020) is built around an FPGA (field-programmable gate array) which controls two vertically mounted CMOS (complementary metal–oxide–semiconductor) sensors. A pyroelectric infrared sensor is connected to a micro-controller, which is in turn responsible for powering on the FPGA once motion is detected. The performance of this system is evaluated with respect to absolute size estimation of artificial and human targets. A different approach by Haucke and Steinhage (2021) uses an Intel RealSense D435 stereo camera, which computes stereo correspondence on the camera itself. However, due to the small baseline distance of roughly 5 cm, accuracy is limited at high distances. These earlier devices (Xu et al. 2019, 2020; Haucke and Steinhage 2021) are powered using mains electricity and are not optimized for being energy efficient and powered by battery.

# 8.2 Hardware

In contrast to prior works, we designed a novel **S**tere**O C**ame**RA T**rap for monitoring of biodiv**E**r**S**ity (SOCRATES), which is optimized for:


In the following sections, we first describe our final implementation and how we addressed the requirements above. We then discuss the challenges and limitations of our approach.

# 8.2.1 Design and specification

We first address the stereo camera design (**Cameras and Baseline**, design goals 1 and 3). The raw data produced by the cameras is processed and stored by the **Control** unit (design goals 2 and 3). Weather-resistance (design goal 4) is provided by the **Case**. Infrared **Motion Detection** and **Illumination** facilitate energy efficiency (design goal 2) and operability at night time (design goal 1a). We additionally describe in detail the **Power supply**, how we obtain animal-camera distances using **stereo correspondence** and how the captured data may be transferred using different **Connectivity** options. Table 1 summarizes the bill of materials.

**Cameras and Baseline:** A pair of Raspberry Pi High Quality Cameras were chosen for their cost-effectiveness and the high sensitivity of their Sony IMX477 sensor (Sony Semiconductor Solutions Corporation, no date). Interchangeable lenses allow adaptation to specific scenarios (i. e. shorter focal lengths for closeup scenes, longer focal lengths for more distant objects). Removal of the infrared filter allows sufficient exposure at night using artificial infrared illumination. The cameras have an additional Bayer filter above the sensor, which is usually responsible for filtering different wavelengths to create a color image. We leave this filter intact to not risk damaging the sensor itself. As near-infrared illumination (either from the environment or the illuminator) illuminates all color bands, we do not try to recover any color information and instead average all bands to obtain a grayscale image. The cameras are mounted on a long U-shaped aluminum rail with holes drilled at regular intervals to allow configuration of different baseline distances between both cameras. Both cameras are connected through long ribbon cables to the two MIPI CSI-2 interfaces of an NVIDIA Jetson Nano Developer Kit. Both design aspects, i. e., the interchangeable high quality lenses as well as the configurable baseline construction, allow for adaptation to specific scenarios, namely free fields, feeding places, animal crossings, green bridges, etc., where animals are observable at different distances.

**Control:** We use an NVIDIA Jetson Nano Developer Kit as the central control and storage unit. It is responsible for taking motion detection signals from the PIR sensor, turning on the power to the IR illuminator, capturing, encoding, and archiving image material from the cameras. We decided on the Jetson Nano for the following reasons: (1) compared to most single-board computers, it provides two MIPI CSI-2 interfaces for the two cameras, (2) it provides a powerful GPU that can be used for encoding video efficiently, and (3) it supports a power-efficient hibernation mode. The Jetson Nano uses a 128GB microSDXC card for persistent storage.

**Case:** To make SOCRATES as weather-resistant as possible, most components are placed inside a single weather-proof case. The case is made of 0.8 cm thick birch plywood and is 80 cm wide, 11.6 cm high and 20 cm deep. We decided for a very wide case to be able to adapt the baseline of the stereo camera to different configurations. The front of the case is shielded by a piece of acrylic glass. In the bottom, we add a 4 cm wide, circular hole for ventilation, which is covered from the inside with an insect screen. The battery is mounted via Velcro strip onto a hatch in the bottom of the case, to allow quick replacement. We add two further holes for the wiring of the IR illuminator and motion detector, respectively, both of which are sealed using silicone. The top of the case is sealed using a silicone strip and secured by screws, which can be loosened to take it off for maintenance. All exposed wooden parts are further treated with marine varnish for weather resistance.

**Motion Detection:** Like most camera traps, we use a **p**yroelectric **i**nfra**r**ed (PIR) sensor for detecting motion and thereby triggering capture. We choose an HC-SR501 PIR sensor due to its compatibility with the 3.3 V GPIO pins of the Jetson Nano. We initially mounted the PIR sensor inside the case, just behind the acrylic glass. However, we found that this severely impaired the ability of the sensor to detect any kind of motion outside the case. This is because acrylic glass is opaque around wavelengths of 10 μm (Altuglas International 2000), which corresponds to the body temperatures of most animals. Therefore, we mount the PIR sensor in a separate 3D printed weatherproof casing below the main case.

**Illumination:** We employ a simple 12 W, 850 nm infrared illuminator to ensure properly exposed images at night without disturbing most animals. The illuminator has a weatherproof case and is mounted on the bottom of the main case. The 12 V power supply is switched by a Jetson Nano GPIO pin using an IRLZ44NPBF MOSFET.

**Power supply:** All components are powered by a lithium ion polymer battery, which has a high power density. We employ a battery with a theoretical capacity of 236.8 Wh (16000 mAh at 14.8 V). A generic 4S balancer circuit board provides over-discharge protection. The variable voltage of the battery is then regulated to 5 V for the Jetson Nano and 12 V for the infrared illuminator by Mean Well SCW20A-05 and SCW12A-12 converters, respectively.

**Connectivity:** SOCRATES may transmit the recorded data via three different means: wired ethernet cable, wireless LAN (Edimax EW-7811UN) or cellular connection (Huawei E3372H). If no basestation is available, we use the cellular connection to manually download the captured data. Otherwise, we connect via wireless LAN and the CoAP protocol (Bormann, Castellani, and Shelby 2012) to the *AMMOD Basestation* (Wägele et al. 2022; Sixdenier et al. 2022), which in turn uploads the captured data to the *AMMOD Portal* (cf. section 8.4).


**Table 1.** Bill of materials for SOCRATES, excluding the case.

# 8.2.2 Challenges and limitations

A significant drawback of the custom SOCRATES hardware is the difficulty of producing a high volume of units, which is primarily due to the manually assembled case. Compared to commercially available monocular camera traps, mean power consumption is relatively high at around 1 W, which necessitates relatively short battery exchange intervals of around 8 days. Finally, if possible, the wavelength of the infrared illuminator should be increased to the widespread 940 nm. We did not notice any adverse reaction of observed individuals to the current 850 nm illumination; however, it might scare other species with wider spectral sensitivity.
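The relation between battery capacity, mean power consumption and exchange interval can be checked with a short calculation. It uses only the figures stated in this chapter; the variable names are illustrative.

```python
capacity_wh = 236.8      # theoretical battery capacity from section 8.2.1
mean_power_w = 1.0       # approximate mean power consumption

runtime_days = capacity_wh / mean_power_w / 24.0
usable_fraction = 8.0 / runtime_days    # share used in an 8-day interval
```

The theoretical runtime is just under 10 days, so an exchange interval of around 8 days uses roughly 80 % of the nominal capacity. The remaining margin would plausibly cover effects such as conversion losses and over-discharge protection, although these are not quantified in the text.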

### 8.3 Software

The software of SOCRATES is freely available (github.com/timmh/socrates) and is divided into three parts. The control software is responsible for controlling the camera hardware and runs directly on the device (cf. section 8.2.1). The stereo processing and animal localization run on a dedicated GPU-accelerated server (cf. section 8.4). The following sections describe the role and implementation of each software part in detail.

# 8.3.1 Control

The control software of SOCRATES is responsible for reacting to signals from the PIR motion detector (cf. section 8.2.1), triggering capture, and communicating with the AMMOD basestation (cf. section 8.4). It is implemented in Python and runs on Linux via the NVIDIA JetPack SDK. We furthermore adjust the device tree to allow the PIR to trigger a wake up signal via the GPIO16 pin. This allows the control software to put the SOC into the power-efficient SC7 mode during periods of inactivity and resume once motion is detected. Once motion is detected, video material is encoded on the Jetson Nano's GPU by synchronizing the image streams from the left and right cameras, concatenating them horizontally, and compressing the resulting video of resolution 2 × 1920 × 1080 using the HEVC video codec (Sullivan et al. 2012). This results in a bitrate of roughly 6.7 Mbit/s.

# 8.3.2 Stereo processing

The central goal of SOCRATES is to infer depth information through stereo vision. In the natural world, as well as in computer vision, this is achieved by solving the stereo correspondence problem. To solve the stereo correspondence problem efficiently, the left and right images must be rectified. To obtain an accurate rectification, the intrinsic (internal camera parameters) and extrinsic (rotation and translation between the cameras) parameters have to be obtained by a calibration procedure. For the calibration of the intrinsic parameters, a calibration object (e. g. checkerboard pattern printed on cardboard) has to be captured by the camera(s) to be able to associate 3D points in the scene with 2D points in the resulting image. To obtain the extrinsic parameters, eight or more correspondences between images of points in the projections of both cameras must be established (Longuet-Higgins 1981). We perform both intrinsic and extrinsic calibration using Kalibr (Maye et al. 2013) with a grid of AprilTags (Olson 2011) mounted on a wooden board as the calibration target. During the setup of SOCRATES, the calibration target is manually moved through the scene such that it covers as much as possible of each camera's field of view. After SOCRATES is assembled and calibrated, calibration does not have to be repeated when deployed to different locations, as the calibration depends not on a specific location but only on the camera configuration. Given the intrinsic and extrinsic parameters, we rectify the images of both cameras and compute the disparity of each pixel using the method of Li et al. (2022).
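For a rectified stereo pair, disparity and depth are related by the standard pinhole relation Z = f · B / d, with focal length f in pixels, baseline B in meters and disparity d in pixels. The sketch below applies this relation; the focal length and baseline values are hypothetical and do not describe the actual SOCRATES configuration.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair; NaN where d <= 0."""
    disparity_px = np.asarray(disparity_px, dtype=float)
    depth = np.full(disparity_px.shape, np.nan)
    valid = disparity_px > 0
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth

# Hypothetical values: 1400 px focal length, 0.35 m baseline.
depth = disparity_to_depth([[70.0, 35.0], [0.0, 7.0]],
                           focal_px=1400.0, baseline_m=0.35)
```

Because depth is inversely proportional to disparity, a fixed disparity error translates into a larger depth error at greater distances, which is why a longer baseline helps for distant animals.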

# 8.3.3 Animal localization

To detect and localize animals in the images produced by SOCRATES, we use the mmdetection (Chen et al. 2019) implementation of the Cascade Mask R-CNN (Cai and Vasconcelos 2018) instance segmentation model. To take advantage of the additional depth information obtained via stereo correspondence (cf. section 8.3.2), we fine-tune an Omnivore-L (Girdhar et al. 2022) backbone, which is pre-trained on both color and depth images. Compared to a color-only baseline, incorporation of the depth information yields a substantial improvement in detection accuracy (e. g. by in bounding box; Haucke et al. 2022). The resulting bounding boxes and instance masks may then be combined with the depth images (cf. section 8.5) to sample per-animal distance measurements which can be used for estimating abundance of unmarked animal populations (Haucke et al. 2022).
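One simple way to turn an instance mask and a depth image into a single distance per animal is to take a robust statistic over the masked pixels. The sketch below uses the median, which is an assumption made for illustration; the sampling strategy actually used is described in Haucke et al. (2022).

```python
import numpy as np

def animal_distance(depth_m, mask):
    """Median depth in meters over the valid pixels of one instance mask."""
    values = depth_m[mask & np.isfinite(depth_m)]
    return float(np.median(values)) if values.size else float("nan")

depth = np.full((4, 4), 20.0)                  # background at 20 m
depth[1:3, 1:3] = [[6.0, 6.2], [np.nan, 5.8]]  # animal, one invalid pixel
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
distance = animal_distance(depth, mask)
```

Ignoring non-finite pixels matters in practice, since stereo matching can fail in parts of the mask, for example in underexposed regions.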

# 8.4 Workflows

A central goal of the AMMOD project is to automatically collect all observed data in a central repository (the *AMMOD Portal*, https://data.ammod.de), which will eventually be accessible to biologists and the general public. For SOCRATES, we ensure this by uploading the captured raw data via the CoAP protocol (Bormann et al. 2012) to the *AMMOD Basestation* (Wägele et al. 2022; Sixdenier et al. 2022). The AMMOD Basestation takes the role of scheduling and prioritizing data transfer from different sensors according to the energy available from energy harvesting. Once the raw data is uploaded to the AMMOD Portal, a server runs the instance segmentation and distance estimation workflows. To increase throughput and energy efficiency, the server is equipped with an NVIDIA GPU to accelerate neural network inference. Both methods are packaged as *Docker* images to simplify dependency management and updates. The resulting instance masks and distances are then again uploaded to the AMMOD Portal and are available for further analysis by ecologists. The entire data flow is fully automated and visualized in Figure 1.

**Figure 1.** Fully automatic flow of data from SOCRATES over the basestation to the AMMOD Portal and the expert end users. A GPU server runs the instance segmentation and distance estimation steps, and uploads the results back to the AMMOD portal.

# 8.5 Data quality

**Figure 2.** Samples of the data collected. The photographs on the left show the grayscale image of the left camera, the right image the color-coded depth map obtained using stereo correspondence.

Figure 2 shows some exemplary pairs of near infrared images and corresponding depth maps obtained using SOCRATES and the stereo correspondence approach described by Li et al. (2022). As can be seen, the depth maps generally represent the scene well and clearly highlight the boundaries of the deer. To evaluate the depth maps quantitatively, we employ the temporal quality metric proposed in Vandewalle and Varekamp (2014), which is defined as:

$$E_t = \frac{1}{\left(N_T - 1\right) N_P} \sum_{n=2}^{N_T} \sum_{(x, y)} \left| D(x, y, n) - D\left(x - m_x, y - m_y, n - 1\right) \right|$$

where $N_T$ is the number of frames in the input video, $N_P$ is the number of pixels in a single frame, $D(x, y, n)$ is the scalar disparity at pixel $(x, y)$ at time $n$, and $(m_x, m_y)$ is the optical flow from frame $n$ to frame $n - 1$, calculated using the method of Farnebäck (2003). Using the stereo correspondence approach of Li et al. (2022), we obtain $E_t = 0.4439$, which is on par with the temporal error of the ground truth disparity in Vandewalle and Varekamp (2014).
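The metric can be sketched in a few lines of plain Python. The sketch assumes that the optical flow has already been computed and is passed in per pixel; rounding the motion-compensated position to the nearest pixel and clamping it to the image border are simplifying assumptions of this example, not part of the published definition.

```python
def temporal_error(disparity, flow):
    """Mean absolute motion-compensated disparity change between frames.

    disparity: list of N_T frames; disparity[n][y][x] is a scalar disparity.
    flow:      list of N_T entries; flow[n][y][x] = (m_x, m_y) is the optical
               flow from frame n to frame n - 1 (flow[0] is never read).
    """
    n_frames = len(disparity)
    height, width = len(disparity[0]), len(disparity[0][0])
    total = 0.0
    for n in range(1, n_frames):
        for y in range(height):
            for x in range(width):
                m_x, m_y = flow[n][y][x]
                # Position of this pixel in the previous frame, rounded to the
                # nearest pixel and clamped to the image (sketch assumption).
                prev_x = min(max(round(x - m_x), 0), width - 1)
                prev_y = min(max(round(y - m_y), 0), height - 1)
                total += abs(disparity[n][y][x] - disparity[n - 1][prev_y][prev_x])
    # Normalise by (N_T - 1) frame pairs and N_P pixels per frame.
    return total / ((n_frames - 1) * height * width)
```

A static scene with zero flow yields an error of 0, and a uniform disparity jump of 0.5 between two frames yields an error of 0.5.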

As can be seen in Figure 3, the temporal error is low for the vast majority of observations. As with regular camera traps, some regions in the field of view might be insufficiently lit at night and therefore underexposed in the resulting images. In these regions, insufficient image information is available to perform successful stereo correspondence, leading to the outliers with poor temporal error apparent in Figure 3. One such outlier case is shown in Figure 4. Still, the depth of the well-lit area is correctly inferred.

**Figure 3.** Histogram plot of the mean per-observation temporal stereo correspondence error.

**Figure 4.** Stereo matching inevitably fails in underexposed regions.

# 8.6 Outlook

Due to the manual assembly, SOCRATES is hard to produce in large volumes. We plan to address this by working with commercial manufacturers and modifying existing camera trap models to include an optional wired trigger synchronization cable. By synchronizing the triggers of two or more camera traps in this way, we would be able to build flexible stereo setups from commercially available hardware and benefit from its superior power efficiency. The SOCRATES stereo calibration and correspondence procedures could be adapted to support this new hardware with minimal effort.

# **References**

Altuglas International (2000) Plexiglas – optical & transmission characteristics. https://minplastics.com/wp-content/uploads/2018/06/Acrylic-Plexiglas-Optical-Transmission.pdf [Accessed on: 2023-08-04]


# **9 The Base Station**

Lukas Reinhold, Pierre-Louis Sixdenier, Krzysztof Piotrowski

# 9.1 Introduction

An AMMOD measurement station consists of the base station and a set of sensors that capture and deliver the measurements to the base station. The measurements are then preprocessed at the base station and can be forwarded to the cloud storage. The architecture of an AMMOD measurement station and the interactions between its components are depicted in Figure 1.

As shown in the figure, the sensors are connected to the base station communication and processing block via either a wired or a wireless connection. In the former case, the sensor can also be supplied with energy from the base station's internal power supply. In the latter, the sensor has to be equipped with its own external power supply.

The base station computer is currently deployed as a Raspberry Pi 4 single-board computer running a Debian Linux operating system. Communication with the AMMOD cloud is done via an LTE modem that is connected to the Raspberry Pi via an Ethernet cable and forwards the data collected from the sensors to the Internet.

# 9.2 Power supply

The power supply module developed for the AMMOD project in the first phase consists of three elements (see Figure 2). The main component of the power supply performs the tasks related to energy harvesting and energy management. It provides energy to the consuming part (i. e., the communication and processing components as well as all sensors using this internal power supply) and stores the surplus energy in an energy storage (i. e., a battery). The exact implementation of these two components of the power supply depends on the energy requirements of the specific energy-consuming part; it has to be properly dimensioned. Proper scaling of the energy harvesting component and of the size of the energy storage is crucial for reliable operation of the system and improves system availability despite periods without enough sun or wind. In the current implementation of the AMMOD power supply module, the energy harvesting mainly focuses on PV, but we also carried out experiments and measurements with wind turbines.

**Figure 1.** An AMMOD base station with external sensors, connected via wired (solid lines) or via wireless (dashed lines) connections.

The power supply functions as a sensor as well: its parameters (i. e. drawn currents and telemetry data of the power supply) are monitored and provided to the base station for further processing and maintenance. These measurements can be used to monitor the state of the power supply, such as the state of the battery, its charge state and the power consumption, and to detect potential outages. They may also be used to control other sensors and the base station in order to adapt to the available energy. The power supply sensors are connected to the base station via low-power, long-range communication means that can also be used to connect external sensors that do not require high data throughput. Building on the developed module, research has been carried out on predicting the available energy and the time a given consumer can still run on it (Turchan and Piotrowski 2022).

**Figure 2.** The architecture of the AMMOD power supply.

The currently deployed power supply modules are dimensioned to supply a constant load of up to 10 W. The energy storage options used range between 500 Wh and 2000 Wh, while the PV power is rated at 200 W peak nominal.
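To give a feeling for these figures, the following sketch computes how long a fully charged storage can carry a constant load without any harvesting. It is an idealised back-of-the-envelope estimate: conversion losses, temperature effects and battery ageing are ignored, and the `usable_fraction` parameter is a hypothetical knob for a limited depth of discharge, not a value taken from the AMMOD design.

```python
def autonomy_hours(storage_wh, load_w, usable_fraction=1.0):
    """Hours a full storage can supply a constant load without harvesting.

    storage_wh:      nominal storage capacity in watt-hours
    load_w:          constant load in watts
    usable_fraction: share of the capacity that may be drawn (assumption)
    """
    if load_w <= 0:
        raise ValueError("load must be positive")
    return storage_wh * usable_fraction / load_w


if __name__ == "__main__":
    for capacity in (500, 2000):
        hours = autonomy_hours(capacity, 10)
        print(f"{capacity} Wh at 10 W: {hours:.0f} h ({hours / 24:.1f} days)")
```

At the maximum load of 10 W, the smallest storage option of 500 Wh bridges 50 hours and the largest option of 2000 Wh bridges 200 hours under these idealised assumptions.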

# 9.3 Software

The main piece of software running on the board is written in C++. The code is currently hosted on a private repository on a Gitlab instance. The code can be moved to another repository once the current phase of AMMOD is finished.

The goal of the application is to receive files from the sensors and to upload them to the cloud. Due to the fluctuating power availability, the software can buffer files on a disk until they are transmitted to the cloud. An internal component of the program applies a policy to decide whether or not to transmit data to the cloud, whether or not to compress it, etc. A way to find an optimal policy (along with the dimensions of the hardware components) for a given deployment site is given in Section 9.3.5. Figure 3 gives an overview of the software modules of the base station and their relations.

**Figure 3.** An overview of the software modules of the base station.
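What such a policy might look like can be sketched as a simple threshold rule. The sketch below is purely illustrative: the `Action` names, the two inputs (battery state of charge and buffer fill level, both between 0 and 1) and all threshold values are hypothetical and do not reproduce the policy implemented in the C++ base station software.

```python
from enum import Enum


class Action(Enum):
    BUFFER = "buffer"                      # keep files on disk, send nothing
    UPLOAD = "upload"                      # send files as they are
    COMPRESS_AND_UPLOAD = "compress_and_upload"


def decide(battery_soc, buffer_fill, low_soc=0.3, high_soc=0.7, buffer_limit=0.9):
    """Pick an action from battery state of charge and disk buffer fill level.

    All thresholds are hypothetical defaults for this sketch.
    """
    # Little energy and enough free disk space: postpone the transmission.
    if battery_soc < low_soc and buffer_fill < buffer_limit:
        return Action.BUFFER
    # Plenty of energy: transmit without spending effort on compression.
    if battery_soc >= high_soc:
        return Action.UPLOAD
    # Otherwise reduce the transmitted volume before uploading.
    return Action.COMPRESS_AND_UPLOAD
```

Thresholds of this kind are exactly the sort of policy parameters that an optimisation as described in Section 9.3.5 could tune for a given site.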

# 9.3.1 Software architecture

The software context is divided into modules, each running on its own thread, which can belong to three classes:


# 9.3.2 Communication with sensors

At the time of writing, all sensors are connected to an Ethernet network (except for the sensing functions in the power supply, as already mentioned, and the weather station, which is directly connected via USB). To unify their interfacing, a library was written that handles the transmission to the base station; this library is referred to throughout this chapter as *SensorAPI*. The library uses the Constrained Application Protocol (CoAP), which is similar to HTTP but more lightweight, to transmit the given files to the base station.

The *resource discovery* feature is used for the automated discovery of the sensors, which means they can be added and removed from a site without touching the configuration file. The *observation* feature allows for sensors to trigger a GET request from the base station when a new file is produced. This is more energy-efficient than active waiting, and its embedding in the library is more convenient for a developer integrating a sensor.
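To illustrate how compact CoAP messages are, the following sketch assembles the bytes of a confirmable GET request that registers an observation, following the message layout of RFC 7252 and the Observe option of RFC 7641. It does not use or reproduce the SensorAPI library; the function name and the resource path `files` are hypothetical, and option values longer than 12 bytes are deliberately not handled.

```python
import struct

COAP_VERSION = 1
TYPE_CONFIRMABLE = 0
CODE_GET = 0x01        # request code 0.01
OPTION_OBSERVE = 6
OPTION_URI_PATH = 11


def build_observe_get(message_id, token, path):
    """Build a CoAP GET that registers an observation on the given path."""
    if len(token) > 8:
        raise ValueError("CoAP tokens are at most 8 bytes long")
    # Fixed 4-byte header: version, type, token length, code, message ID.
    first_byte = (COAP_VERSION << 6) | (TYPE_CONFIRMABLE << 4) | len(token)
    packet = bytearray(struct.pack("!BBH", first_byte, CODE_GET, message_id))
    packet += token
    # Observe option with an empty value, i.e. the integer 0 ("register").
    packet.append((OPTION_OBSERVE << 4) | 0)
    previous_option = OPTION_OBSERVE
    for segment in path.strip("/").split("/"):
        value = segment.encode("utf-8")
        if len(value) > 12:
            raise ValueError("extended option lengths are not covered here")
        # Options are encoded as delta to the previous option number.
        delta = OPTION_URI_PATH - previous_option
        packet.append((delta << 4) | len(value))
        packet += value
        previous_option = OPTION_URI_PATH
    return bytes(packet)
```

With a one-byte token and the path `files`, the complete request occupies 12 bytes, which is considerably less than the textual request line and headers of an equivalent HTTP request.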

The library is written in C++ and a Python binding is provided. It is available on a publicly accessible GitLab repository<sup>1</sup>, alongside more in-depth documentation.

# 9.3.3 Configuration file

The base station software is configured via a JSON file that follows a predefined schema. The configuration has three main properties:


<sup>1</sup> https://codesignp211.informatik.uni-erlangen.de:450/ammod/ammod-sensor-coap-api/

An exhaustive description of the configuration file is available in the git repository.

# 9.3.4 Maintenance flow

To access the base station from the Internet for maintenance purposes, we deployed a VPN running on a server accessible from the outside. For security reasons, it is possible to open an SSH connection to a base station only when inside this VPN.

The VPN server is currently hosted on a computer on the FAU campus. In the production phase, it could be hosted alongside the other AMMOD services.

# 9.3.5 Design Space Exploration

From a research-oriented perspective, we tackled the problem of designing a base station (e. g. finding the right battery size, the right PV module, and an optimal energy management policy) by leveraging Design Space Exploration techniques. An overview of this process is shown in Figure 4.

An analytical parametric model of a typical AMMOD deployment site (a base station and its sensors) was designed and run inside a custom simulator.

**Figure 4.** An overview of SIDAM.

The parameters of this model are divided into a site-specific part (e. g. number of sensors, their type, their power consumption, etc.) and a base station part (e. g. the battery size, the size and number of hard drives, etc.). At the end of a simulation, the simulator outputs some metrics of the system: uptime of the base station, total amount of data transmitted, etc. These metrics, alongside the financial cost of the components, can be used as objectives for a Multi-Objective Evolutionary Algorithm (MOEA), which optimises the base station parameters.

Our results show that it can be financially worthwhile to deploy base stations in different cost ranges at different locations. For example, a base station deployed in a sunny region of Germany requires cheaper components than one deployed in a rainy region. We also use our tool to evaluate different energy management policies and to optimise the parameters of such policies. This methodology, which is named SIDAM, and its results were published in Sixdenier et al. (2022).
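The notion of trading off several objectives can be illustrated with the Pareto dominance test on which multi-objective optimisers rely. The sketch below is not the SIDAM implementation and contains no evolutionary algorithm; it only filters a list of already evaluated designs, each described by a hypothetical pair of cost (to be minimised) and uptime (to be maximised).

```python
def dominates(a, b):
    """True if design a is at least as good as b in both objectives
    and strictly better in at least one.

    Designs are (cost, uptime) pairs: lower cost and higher uptime are better.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(designs):
    """Return the designs that are not dominated by any other design."""
    return [d for d in designs if not any(dominates(other, d) for other in designs)]


if __name__ == "__main__":
    # Hypothetical simulation results: (cost in EUR, uptime fraction).
    evaluated = [(400, 0.90), (600, 0.99), (700, 0.95), (400, 0.80)]
    print(pareto_front(evaluated))
```

In this made-up example only the designs costing 400 and 600 remain: the remaining two are each beaten by a design that is cheaper or equally expensive and has a higher uptime.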

## 9.4 Communication links

For the communication links, two areas with different requirements are distinguished: the local links to the sensors and the internet link to the cloud. All the communication links discussed can be found in Figure 5. In this section, the required hardware is discussed as well as its configuration. The software and protocols that were developed to communicate over these links were already discussed in Section 9.3.

#### 9.4.1 Internet connectivity

All the data collected by the sensors and sent to the base station is preprocessed before it is forwarded to the AMMOD cloud. Erroneous data and false positives are discarded and the data is compressed. Depending on the configuration of the base station and the activity of the sensors, many gigabytes of useful data may still be generated each day. All generated files need to be transferred to the AMMOD cloud, where they can be stored for the long term, fully analysed and presented to the public.

**Figure 5.** An overview of the communication links of the base station. Dashed lines signify wireless links.

To transfer this amount of data different technologies can be utilised. Wired solutions like Ethernet or DSL are the most efficient approach but usually not available as the AMMOD stations need to be placed where animals roam freely, not where the man-made infrastructure is best.

The wireless technology that provides the best compromise between data rates, power consumption, availability of parts and long range network coverage within Germany is 4G/LTE. Any Internet of Things (IoT) focused long range technology like LoRa usually does not allow for a high enough data throughput. Satellite communication links are an interesting option to look at for very remote locations where 4G/LTE coverage is insufficient. Those however usually require more expensive hardware, a more complex setup and have a higher power consumption. For the default AMMOD station these are too expensive.

Another alternative is the implementation of a directional radio link between the AMMOD base station and either a 4G base station or a 4G client specifically set up for use in AMMOD in a location with better cell reception, possibly via multiple hops. This is always a more expensive setup than just implementing a 4G modem and antenna in the AMMOD base station itself, but it can be very valuable for a setup in, for example, a remote valley. The directional link itself can be more efficient in transmitting the sending power to the next communication partner, but that is only an advantage if that device does not need to supply a 4G modem with battery power as well.

The huge advantage 4G/LTE has over most competing technologies at the time of writing is the availability of systems. A 4G modem can be bought relatively cheaply and be effectively enhanced with a directional antenna on a mast a few meters high. Such an additional antenna allows for a very efficient transmission of signals, and therefore the required sending power can be greatly reduced. The mast ensures that the antenna is as close as possible to having a direct line of sight to the nearest 4G base station. Such a static setup prevents any mobility of the system, but this is in line with the requirements for the stationary AMMOD base station. A precise orientation of the antenna needs to be done only once, when the station is first assembled.

For the test sites that were set up within the AMMOD project, a few different configurations were examined. The following describes the configuration used at the current test sites. Different hardware can be used in a similar manner to achieve comparable results.

In this example implementation, a RUT240 Industrial Cellular Router by Teltonika Networks is used for the 4G/LTE radio link. It is specifically designed for machine-to-machine communication, provides good interfaces which can easily be accessed and modified externally, and has a low power consumption. The modem accepts a comparatively wide range of DC voltages, which allows it to be used with either a 12 V or a 24 V supply voltage from the base station's power lines. The two external antennas for MIMO-4G communication were exchanged for a WB 23 antenna by Wittenberg, which is shown in Figure 7. This allows for a very directional and efficient radio link. Combined with a 2 m mast, the test sites can be provided with a more than adequate internet connection.

For the test site in Bonn, a mast setup was used that has minimal impact on the setup area and is thus suitable for use in nature reserves. Four ground anchors are used in conjunction with one steel wire each to keep the mast upright. This setup is depicted in Figure 6.

**Figure 6.** The setup of a minimal-impact antenna mast.

### 9.4.2 Sensor links

The sensor links allowing for data transfer between the sensors and the base station need to be very flexible as they are required to work for many different scenarios. The positioning of the sensors is determined by their function. Visual sensors need to have a clear view onto places of interest, for example an animal trail or a den. Audio sensors work best if they are shielded from unwanted noise, and insect traps will be placed where insects are commonly found. Sensors may be placed directly at the base station or 100 m away. There might also be a free line of sight or many obstacles between base station and sensor. Therefore the path between the sensors and the base station will usually be suboptimal for the transfer of data. To provide these data links, we investigated different technologies.

As with the internet link, a wired connection will always provide the best reliability in data transfer and the highest power efficiency compared to wireless solutions. This is always the preferred solution for sensors close to or directly attached to the base station. An Ethernet connection is usually the most commonly available option and can be integrated into many systems easily. While it also allows for power transfer via Power over Ethernet, it does impose a certain amount of overhead and is not really adequate for minimal systems. A lower-level wired link that is commonly used, especially for industrial applications, would be a serial interface like RS-232 or RS-485, over which protocols like Modbus can be transferred. This is a simpler and more efficient approach that does not require full internet capabilities from the sensor but is limited in its bandwidth when compared to the Ethernet standard. This is the preferred option when connecting a close sensor, up to 10 m away from the base station, that does not generate a large data volume. When connecting a sensor that generates high data volumes of 1 GB per day or more, like video sensors, to the base station, a wired connection is usually the only option, as any wireless solution with a high enough data rate also consumes a very high amount of power. In these cases, Ethernet will usually be chosen, as such sensors are also fairly powerful in terms of computational capabilities and Ethernet does not impose much of a burden. Also, any sensors that draw power from the base station need a wired power connection. A data link cable can be laid along the same route, which makes a wired solution realisable with very little effort.

The most powerful wireless technology in terms of data rate, flexibility, and compatibility is WiFi. It does, however, consume more power than other technologies that are more focused on applications within the Internet of Things (IoT). It also has a range limit of 100 m, which may not be enough for some sensors. Bluetooth has the same range limitation; it builds upon a lot of WiFi technologies and does not provide any real benefit for the AMMOD scenario. ZigBee also focuses on short-range communication, usually below 100 m. It is capable of communication over longer distances in good conditions and might be a good fit for specific AMMOD scenarios, but it is limited to low data rates and was not tested further in this AMMOD phase as there are alternatives that offer a better feature set. WiFi is the preferred option for sensors that generate considerable amounts of data, between about 100 MB and 1 GB per day, and need to be placed too far away from the base station to be connected with a cable.

Some sensors are spread out over a large area to collect data optimally. For example, insect traps may be set up across a large field or around a wooded area. In a constellation like this, distances between 100 m and 1 km are common. As these traps actually collect biological samples and do not generate large amounts of data, a long-range, low-throughput wireless link is ideal to realise an efficient link. An approach with multipath propagation might also be interesting to research but is not within the scope of this project. LoRa is a long-range protocol for Internet of Things (IoT) applications which is suitable for the aforementioned scenario. It allows for communication over up to 10 km at data rates of up to 50 kbit per second. It is also widely available. The LoRaWAN network is a wide area network which covers large parts of Germany. It provides an alternative access method to the internet for base stations or standalone sensors which produce only limited amounts of data. It mostly covers areas which are also covered by LTE, but it could be expanded for specific AMMOD scenarios and is considerably cheaper to access than a contract with an LTE service provider. LoRa is the solution of choice for the AMMOD project to connect any sensor that is placed more than 50 m away from the base station.

For the LoRa connectivity of the sensors, different approaches are possible. If the sensor utilises a Raspberry Pi or an Arduino compatible system, there are extension boards available that integrate a LoRa modem on a low level. There are also USB solutions which can be utilised for the base station connectivity and can be an option for the sensors as well, if other methods of integration are not viable.
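The rules of thumb given in this section can be condensed into a small decision function. The thresholds (10 m, 100 MB and 1 GB per day) are taken from the text above; how the cases are ordered, and what happens in situations the text does not cover (for example a low-volume sensor between 10 m and 50 m away), are assumptions of this sketch. Range limits such as the 100 m stated for WiFi are not checked.

```python
def choose_link(distance_m, mb_per_day, powered_by_base_station=False):
    """Suggest a sensor link technology from distance and daily data volume.

    distance_m:              distance between sensor and base station in metres
    mb_per_day:              expected data volume in megabytes per day
    powered_by_base_station: True if a power cable has to be laid anyway
    """
    needs_cable = (
        powered_by_base_station      # data cable can follow the power cable
        or mb_per_day >= 1000        # high-volume sensors, e.g. video
        or distance_m <= 10          # sensor close to the base station
    )
    if needs_cable:
        # A simple serial link suffices for nearby low-volume sensors.
        if distance_m <= 10 and mb_per_day < 100:
            return "RS-485"
        return "Ethernet"
    if mb_per_day >= 100:
        return "WiFi"
    return "LoRa"
```

For instance, a video sensor producing 5 GB per day is assigned Ethernet regardless of its distance, whereas an insect trap 500 m away that only reports a few status messages is assigned LoRa.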

#### 9.4.3 Weather and time data

To analyse the large amount of data collected by all AMMOD systems and correlate this data correctly, a precise time synchronisation over all systems is necessary. That way all deployed sensors as well as all AMMOD base stations assign the correct time stamp to each measurement. The external time reference used within the AMMOD project is derived from the Global Positioning System (GPS). LTE/GSM could also be used as a reference. Their coverage is, however, not as universal in very remote areas. A GPS receiver can also be integrated into every wireless sensor to achieve time synchronisation independent of the AMMOD base station. Sensors directly connected to the base station with a wire can have their clocks updated from the base station through NTP and do not need their own GPS receiver. The GPS modem also provides location data, which can be used to map the layout of an AMMOD site automatically and may be useful in the recovery of lost or stolen equipment.

**Figure 7.** A view of the LTE antenna used (white) and the weather station on its extension arm.

The time synchronisation can be used to connect two or more local measurements. For example, if an animal is detected by a video camera and a microphone, movements and sounds can be correlated if they are registered at the same moment. Data from multiple base stations can also be combined. For example, reactions to a lightning strike may be observed at multiple stations within a certain range. The synchronisation also remedies clock drift, so that events like a sunset can be reliably targeted with scheduled measurement runs.

Just like the time of day, or more practically the position of the sun, weather data is very useful to give context to animal and plant behaviour. Rain and wind for example can greatly influence the flight of insects and birds. Certain temperatures or values of air humidity may trigger the expulsion of pollen which can be detected. Also certain weather conditions may prevent reliable measurements by some types of sensors. Heavy movement of leaves through strong winds may falsely trigger motion sensors for video sensors. The noise of heavy rainfall may drown out any animal sounds detectable by audio sensors. Thus deactivating these sensors in adverse weather can save a lot of power and reduce the amount of unusable data collected.

Data drawn from weather services may also be an option for a minimal AMMOD station where it is not feasible to invest in a local weather station. A dedicated local solution is, however, preferable as it provides a far better spatial and temporal resolution. It should be emphasised that only a local station can actually measure local effects such as the wind direction at a forest boundary or the air temperature and humidity on a river bank.

A good weather station measures all the necessary weather data as well as the GPS data needed for the operation of an AMMOD station. For the test sites of the AMMOD project, the Weather Station Compact WSC11 from Thies Clima was used. It is a compact system that was designed for local weather monitoring. It measures a wide range of values like temperature, humidity, air pressure, wind speed and direction, precipitation, and illumination. A GPS receiver is also integrated. The precision of the data provided and the measurement setup do not allow for use in a meteorological application but are more than sufficient for the kind of context data required.

The WSC11 can be mounted on the same mast as the antennas of the base station. Ideally, the weather station is mounted at the very top, as any components above it may obstruct its sensors for sun and rain detection. Many adaptors and extensions are available from the manufacturer to facilitate this. The WSC11 can provide analogue outputs which can be digitised externally. It can also be controlled and read out via Modbus, which was implemented for the AMMOD project. The WSC11 was bought with the optional, ready-for-use 5-core cable through which the Modbus and power lines run. These can be split in the base station housing and connected to the base station's power supply and an RS-485-to-USB adaptor. The Modbus calls for the base station were implemented as a dedicated library. This way, a modular integration into the base station as described in Section 9.3.2 was realised.
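As an illustration of what a Modbus call over the serial line involves, the following sketch builds a Modbus RTU read request including its CRC-16 checksum. It is a generic example of the protocol framing and not the library developed for AMMOD; the unit address and register address used in the usage note are placeholders and do not correspond to the register map of the WSC11.

```python
import struct


def modbus_crc16(data):
    """CRC-16 as used by Modbus RTU (initial value 0xFFFF, polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_read_request(unit, function, address, count):
    """Build a Modbus RTU request frame for reading `count` registers.

    unit:     address of the device on the bus
    function: function code, e.g. 0x03 (read holding registers)
    address:  first register to read
    count:    number of registers to read
    """
    body = struct.pack(">BBHH", unit, function, address, count)
    # The CRC is appended with its low byte first.
    return body + struct.pack("<H", modbus_crc16(body))
```

Calling `build_read_request(1, 0x03, 0, 1)` yields the eight bytes `01 03 00 00 00 01 84 0A`, which would then be written to the serial port provided by the RS-485-to-USB adaptor.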

## 9.5 Conclusions

As shown above, the power management, the communication links and the software of the base station were implemented so that the AMMOD test sites can be operated to specification. For optimal operation, power efficiency can be increased further by integrating components and evaluating the collected data about power consumption. Alternative communication technologies can be tested to improve the adaptability of the setup to different scenarios. Further field tests are required to standardise the connection links throughout the AMMOD platform. To realise a feature-rich and efficient computational unit as the core of the base station, an FPGA board would be ideal. It can be programmed in software and hardware and would thus enable power-efficient hardware acceleration for sensor data processing, compression, etc., at the site. However, these boards require a waterproof housing that still allows adequate cooling. The availability of such boards was poor for the duration of the project, so this option could not be investigated further.

# **References**


# **10 Data management: connecting the AMMOD base station to the AMMOD data portal**

Domenico Velotto, Ivaylo Kostadinov, Deniss Marinuks, Frank Oliver Glöckner

#### **Abstract**

This chapter provides all the necessary steps to connect the AMMOD base station to the AMMOD data portal. The AMMOD data portal is the web service developed in the first phase of the AMMOD project to meet the project's data management requirements. The AMMOD data portal consists of: (1) a backend for storing and managing the data, (2) a web-based user interface and (3) a Representational State Transfer (REST) Application Programming Interface (API) for programmatic upload, search and download of data. The data is either raw, processed or telemetry. The website provides additional support for data and sensor management tasks such as visualizing the status of the network of deployed sensors and base stations, a dashboard for displaying and plotting the telemetry data, a metadata schema validation tool and access to documentation. The backend and frontend of the AMMOD data portal are containerized and deployed in a cloud environment. The system depends on the third-party web service https://sensor.awi.de, which is highlighted in the requirements section.

# 10.1 Introduction

Since AMMOD stations are designed for modularity and flexibility of both sensors and data processing, a large amount of heterogeneous data is produced. The data management workflow put in place is designed to accommodate the needs of the AMMOD station and sensors developed in the pilot phase. Automated transfer, preservation, lineage and accessibility of the data produced are ensured provided that the requirements explained in Section 10.2.1 are met.

The AMMOD data portal is the cloud-based web service developed as part of the automatization of the AMMOD data management and provides:


Currently, the AMMOD data portal consists of two separate systems, a production system and a staging system. Both run the same code base and offer the same functionality with some significant differences, described below.

- Production system
	- Website https://data.ammod.de.
	- API documentation https://data.ammod.de/api/index.html and base URL: data.ammod.de/api/v1.
	- Dedicated to managing the data collected by the deployed base stations and sensors.
	- Large cloud storage space with database backup in two distinct locations.
	- Base station and sensor-related information is managed via the AWI web service registry accessible at https://sensor.awi.de.
- Staging system
	- Website https://ammod.gfbio.dev.
	- API documentation https://ammod.gfbio.dev/api/index.html and base URL: ammod.gfbio.dev/api/v1.
	- Dedicated to software development, features and data flow integration testing.
	- Small cloud storage space with no backup. In addition, due to space constraints, development tasks, etc., data might be deleted at any time.
	- Mocks of base station and sensor are pre-registered via the staging instance of the AWI web service registry accessible at https://sandbox.sensor.awi.de.

The system is deployed in a cloud environment using docker-compose, which allows it to run as a multi-container application. Backend and frontend each have their own Dockerfile, so that a Docker image can be created and used anywhere. All required environment variables are listed in the repository's readme file.

In the remainder of the chapter, the essential steps needed to use the system as-is are listed. It is highly recommended to comply with these requirements and connect future AMMOD stations to the existing systems.

# 10.2 Material and methods

An AMMOD station acts as a local hub for the network of sensors deployed nearby it. It can be considered a middleware between the sensors and the cloud solution (see Figure 1).

Base station and sensor produced (meta-)data and telemetry files are stored first at the base station before being automatically transferred to the AMMOD data portal. When and how often data is automatically uploaded to the AMMOD data portal is managed by the base station.

Base stations need to be registered in the sensor registry and assigned a valid access token (see Section 10.2.1 for more details) before they can upload data to the AMMOD data portal. Authorized base stations upload (meta-)data and telemetry to the dedicated endpoints of the API. The API returns a response message which is used by the base station for housekeeping and error handling operations. All models of the responses are included in the API documentation.

Please note that in cases where the data is not physically suitable for being stored at the base station and needs to be processed in a lab or facility first, it can be uploaded manually to the AMMOD data portal using the same API.

# 10.2.1 Requirements

The requirements for connecting base stations to the AMMOD data portal are grouped as follows:


**Figure 1.** Schematic representation of an AMMOD station with its hardware and software components.

#### 10.2.1.1 Base station and sensor requirements

Base stations and sensors need to be registered and assigned a unique identifier (deviceID) before the integration with the AMMOD data portal. This is done using the AWI sensor registry, (which instance of the sensor registry is used for which instance of the AMMOD data portal is detailed in the introduction to this chapter). Upon filling up the registration form with all mandatory information of the base station or sensor, the deviceID is assigned by the sensor registry manager. The AMMOD data portal is automatically pulling the base stations and sensors-related information from the registry to visualize all the deployed AMMOD devices in a map and tabular view. Only uploads with valid deviceID are accepted. Figure 2 clarifies this aspects.

It is highly recommended to nominate a sensor registry manager, who is responsible for the registration process and for updating the sensors' metadata. The AMMOD base station operator(s) is/are responsible for the sensors and data of one or more collection sites.

**Figure 2.** Registration, assignment and management of sensor-related information.

The sensor registry manager needs to create an account for the AWI sensor registry. This is currently done by sending a request by e-mail to o2a-support@awi.de. To become familiar with the functionality of the service, it is recommended to visit the tutorial webpage of the AWI web service registry: https://sensor.awi.de/?site=tutorial.

Please refer to Figure 3 and note the following for a correct integration of the base station during registration:


Please refer to Figure 3 and note the following for a correct integration of a sensor during registration:

• The field "Long Name" MUST start with the word "Ammod" (case insensitive).
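The naming rule for the "Long Name" field can be checked before submitting the registration form. The following is a minimal sketch; the function name is hypothetical, and whether the registry additionally requires a word boundary after "Ammod" is not stated here, so only the prefix is tested.

```python
def has_valid_long_name(long_name: str) -> bool:
    """Check that a "Long Name" starts with "Ammod", ignoring case."""
    # casefold() gives a case-insensitive comparison of the prefix.
    return long_name.casefold().startswith("ammod")
```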


**Figure 3.** Example of registered base station with assigned deviceID 7390.



The AMMOD data portal uses the API of the AWI web service registry to retrieve all information on the AMMOD collection. The AMMOD collection has a persistent ID, the "collectionID", which is included in the environment variables of the AMMOD data portal.

An AMMOD base station is considered a non-movable piece of hardware that keeps the same geographical location until it is decommissioned. In contrast, AMMOD sensors can be:


#### 10.2.1.2 Data requirements

Metadata files are mandatory for any data file upload. Each REST API POST request to the metadata endpoint of the AMMOD data portal must contain at least two files, one of which must be the metadata file. Only the telemetry endpoint of the API accepts a single file, which contains the telemetry record of the base station or sensor.
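The file-count rules above can be checked on the client side before a request is sent. The sketch below is illustrative only: the endpoint labels `"metadata"` and `"telemetry"`, the function name and the way the metadata file is identified (passed explicitly by the caller) are assumptions, not part of the AMMOD API.

```python
from typing import List, Optional


def validate_upload(endpoint: str, files: List[str],
                    metadata_file: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the request shape is valid."""
    problems = []
    if endpoint == "telemetry":
        # The telemetry endpoint takes a single telemetry record file.
        if len(files) != 1:
            problems.append("telemetry upload takes exactly one file")
    elif endpoint == "metadata":
        # At least one data file plus its metadata file.
        if len(files) < 2:
            problems.append("metadata upload needs at least two files")
        if metadata_file is None or metadata_file not in files:
            problems.append("one of the files must be the metadata file")
    else:
        problems.append(f"unknown endpoint: {endpoint}")
    return problems
```

For example, `validate_upload("metadata", ["rec.wav", "rec.json"], "rec.json")` returns an empty list, whereas the same call with only `["rec.json"]` reports the missing data file.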

To ease the integration and development process, JSON schemas for the metadata and telemetry files are publicly accessible in the following repository: https://gitlab-pe.gwdg.de/gfbio/ammod-examples-schemas. The repository provides the full schema documentation with examples of data and their metadata. Figure 4 shows the mandatory key-value pairs included in the schema of the metadata file (Figure 4A) and of the telemetry file (Figure 4B). Data providers can add any base station- or sensor-specific key-value pairs to enrich the description of the data in the metadata and telemetry files. Those optional key-value pairs are not checked against any schema, and their validity and correctness are the responsibility of the data providers.
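The distinction between mandatory and optional key-value pairs can be sketched as follows. The set of mandatory keys is passed in by the caller; the keys used in the usage example (`deviceID`, `timestamp`, `habitat`) are placeholders for illustration, and the authoritative list is the one defined in the published JSON schemas.

```python
from typing import Dict, Set, Tuple


def classify_keys(record: Dict[str, object],
                  mandatory: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split the keys of a record into (missing mandatory, optional extras)."""
    present = set(record)
    missing = mandatory - present   # must be added before upload
    optional = present - mandatory  # not schema-checked; provider's responsibility
    return missing, optional
```

A record such as `{"deviceID": 7390, "timestamp": "2023-01-24T12:00:00Z", "habitat": "meadow"}` checked against the placeholder set `{"deviceID", "timestamp"}` has no missing keys and one optional key, `habitat`.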


**Figure 4.** Example of JSON schema for metadata (**A**) and telemetry (**B**).

#### 10.2.1.3 User requirements

The API access of the AMMOD data portal is restricted to authenticated base stations and users. The authentication procedure is token-based.

Once the registration and assignment procedure of a new base station is complete (see Subsection 10.2.1.1, 'Base station and sensor requirements'), the sensor registry manager will provide a token pair (access and refresh token) to the base station operator. By default, base stations receive a token with upload permission only.

Users who want to use the API to upload, search and download data, either via the command line or using the graphical web interface, need to log in to the AMMOD data portal and request the desired permissions. Users can log in to the production and/or staging system using the GFBio Single Sign-On (SSO) service by entering the credentials of:


The portal automatically detects the services for which you do not have permission to interact with the data (a lock is shown on the services where access is not granted). Clicking any of the locked services opens a pop-up message with instructions on how to request the specific access. After the first login, the user appears in the website administration page, and the admin can then set the required permissions. Please note that, as the production and staging systems are two identical but independent systems, logins and permissions are not synchronised between them.

# 10.3 Results

Figure 5 displays the section "Home" of the AMMOD data portal. The section lists the deployed AMMOD base stations in a map view (Figure 5A) and a table view (Figure 5B). The service automatically fetches the metadata of the base stations that have previously been stored in the sensor registry and assigned to the AMMOD collection. The legend shows the number of active (blue pin) and inactive (red pin) base stations as well as the number of devices per sensor type. In the map view, each pin represents the geographical location of an AMMOD station. By clicking a pin, users get a list of the installed sensors and a link to their metadata. In addition, a shortcut search button allows users to retrieve all data collected by the selected base station. In the table view, users can filter the list of deployed sensors according to their specifications.

Figure 6 displays the section "Data and metadata" of the AMMOD data portal. The section provides access to the API UI for the Search (Figure 6A) and Upload (Figure 6B) services. As mentioned in Subsection 10.2.1.3 ('User requirements'), the website admin needs to grant the logged-in user the corresponding permissions before these services can be accessed.

**Figure 5.** AMMOD data portal, Home section. Devices overview in map (**A**) and table (**B**) format. (URL: https://data.ammod.de, accessed 24 January 2023).

The first access to the Search service lists all the data available in the database, paginated at 20 results per page. The list of results can be refined by providing user-defined values for the parameters on the filter page: time frame, location (base station based), data type (raw, processed) and sensors (Figure 6A). With the advanced filter, users can specify a particular key–value pair to search for, e. g. an optional key–value pair defined by the data provider in the metadata file. All entries matching the query are selected and the total size of the download is displayed. When the download button is clicked, the service starts packing the selected data into a zip archive. If the packaging time is on the order of 20 seconds, the user is prompted directly with the zip file; otherwise, a notification e-mail informs the user when the zip file is ready to download. Clicking the share icon copies the filter settings, formatted as a shareable URL.
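The pagination and the two delivery paths of a download can be summarised in a short sketch. It is not the portal's implementation: the function names are hypothetical, and treating 20 seconds as a sharp threshold is an assumption, since the packaging time is only given as an order of magnitude.

```python
import math

PAGE_SIZE = 20                 # results per page, as stated for the Search service
DIRECT_DOWNLOAD_SECONDS = 20   # assumed cut-off for a direct download


def page_count(total_results: int) -> int:
    """Number of result pages needed for a given number of matching entries."""
    return math.ceil(total_results / PAGE_SIZE)


def delivery_mode(packaging_seconds: float) -> str:
    """Return "direct" for quickly packaged archives, otherwise "email"."""
    if packaging_seconds <= DIRECT_DOWNLOAD_SECONDS:
        return "direct"
    return "email"
```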

Users can upload manually collected data using the Upload tool (Figure 6B). The UI accepts single uploads (either one metadata file plus data file, or one telemetry file) or a folder containing multiple uploads (e. g. a series of data files with their metadata). In the latter case, files that do not belong to an upload are automatically discarded. It is important to note that the data requirements (Subsection 10.2.1.2) also apply to manual uploads via the UI.

**Figure 6.** AMMOD data portal – Data and metadata section. Search results from the applied filter settings (**A**) and upload GUI for manual transfer of data to the Cloud (**B**). (URL: https://data.ammod.de, accessed 24 January 2023).

Figure 7 displays the section "Tools" of the AMMOD data portal. The API documentation is presented using the Swagger UI (Figure 7A). Each service lists the implemented HTTP verbs and contains a short description, the list of parameters and the corresponding response codes. Users are encouraged to become familiar with the AMMOD services by testing the API in the interactive console, accessible via the "Try it out" button available for each HTTP verb. The API Tokens tool is a practical UI that lets AMMOD users create and manage additional token pairs according to their needs (Figure 7B). Additional token pairs created with the tool reflect the current status of the user's permissions, e. g. users with only the "upload" permission cannot create token pairs that include the "search" permission.
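The rule for additional token pairs amounts to a subset check on permissions. The following minimal sketch uses the permission names mentioned in this chapter ("upload", "search", "download"); the function name and the rejection of empty requests are assumptions made for illustration.

```python
from typing import Set


def can_create_token(user_permissions: Set[str], requested: Set[str]) -> bool:
    """A new token pair may only carry permissions the user already holds."""
    # An empty request is rejected here; this is an assumption of the sketch.
    return bool(requested) and requested <= user_permissions
```

A user holding only `{"upload"}` can therefore create a token pair for `{"upload"}`, but not one for `{"upload", "search"}`.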

**Figure 7.** AMMOD data portal – Tools section. API documentation in Swagger (**A**) and the user token management interface (**B**). (URL: https://data.ammod.de, accessed 24 January 2023).

# 10.4 Discussion

This section discusses key factors relating to the current limitations of the AMMOD data portal and to possible extensions for its future use.

# 10.4.1 Possible storage space limitations and the NFDI4Biodiversity core storage solution

The storage component of the AMMOD data portal has been sized according to the estimated volume of data generated during the pilot phase of the project. Accordingly, the storage has an initial capacity of 5 TB. This is also backed up in two different locations, giving a total size of 15 TB. These limits are set by the cloud service providers hosting the application. For the runtime of the AMMOD project, the data portal is deployed using the cloud computing services offered by the German Network for Bioinformatics Infrastructure (de.NBI) and the storage at the GWDG. The current storage space is not a hard limit, and the cloud service providers offer the possibility of attaching new volumes to the running instance. Nevertheless, this is not an automated process, and the used space needs to be monitored and managed accordingly.
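The storage figures and the need for monitoring can be expressed in a few lines. This is only a sketch: the 80 % warning threshold and the function names are assumptions chosen for illustration and are not part of the portal's configuration.

```python
PRIMARY_TB = 5      # initial storage capacity stated above
BACKUP_COPIES = 2   # backups in two different locations


def total_footprint_tb(primary_tb: float = PRIMARY_TB,
                       copies: int = BACKUP_COPIES) -> float:
    """Primary storage plus all backup copies."""
    return primary_tb * (1 + copies)


def needs_new_volume(used_tb: float, capacity_tb: float = PRIMARY_TB,
                     threshold: float = 0.8) -> bool:
    """Flag the instance once usage reaches the (assumed) warning threshold."""
    return used_tb / capacity_tb >= threshold
```

With the stated values, `total_footprint_tb()` gives 15, matching the total size quoted above.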

As a long-term solution, the data uploaded to the AMMOD portal will be transferred to the NFDI4Biodiversity core storage. Dedicated APIs to upload data to and download data from the core storage are already prepared and tested. They can be deployed once a stable version of the NFDI4Biodiversity core storage is available.

# 10.4.2 Maintenance of the system and bug/feedback report after the end of the project

Although the system has been tested under operational conditions, bugs and errors can appear unexpectedly at any time. To continue operating the AMMOD data portal in a resource-saving way, the performance monitoring and error tracking application Sentry (https://sentry.io/) is integrated in the AMMOD data portal. Errors are automatically collected and sent to the Sentry instance, and the admin is informed immediately by e-mail when something goes wrong. Moreover, users can report bugs or errors using the "contact us" feature on the AMMOD website, which automatically creates an issue in the software development environment.

# 10.4.3 Scalability of the service and critical aspects

The system has been designed to scale if the number of users and base stations increases. At the moment, the workload in terms of data upload requests is not considered critical. In contrast, the workload in terms of data download requests could increase exponentially if the data were made public. Therefore, users are not limited in the number of files they can download, but each request is limited to a data volume of 1 GB. Once the archive for a download request has been generated and the download link has been sent to the user, the link remains valid for 24 h. Afterwards, the link expires and the archive file is deleted from the server. These limits might change in the future when the AMMOD data portal is integrated with the NFDI Core Storage.
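The two download limits can be sketched as simple checks. The interpretation of 1 GB as 10^9 bytes, the function names and the treatment of a link as expired exactly at the 24 h mark are assumptions of this illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable

MAX_REQUEST_BYTES = 10**9            # 1 GB per request (decimal GB assumed)
LINK_LIFETIME = timedelta(hours=24)  # validity of a download link


def request_allowed(file_sizes: Iterable[int]) -> bool:
    """Any number of files is allowed, as long as their total size fits the limit."""
    return sum(file_sizes) <= MAX_REQUEST_BYTES


def link_is_alive(created_at: datetime, now: datetime) -> bool:
    """A download link is valid for 24 h after the archive was generated."""
    return now - created_at < LINK_LIFETIME
```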

### 10.4.4 Data archival and publication

The AMMOD Data Portal allows data to be shared mainly between project members and, where applicable, authorized collaborators. Only the status and telemetry data of the stations are publicly available. The data portal is neither a long-term archive nor a data publication platform. To make any of the collected scientific data public, it is recommended to deposit it in one or more of the long-term GFBio data centers (https://www.gfbio.org/data-centers). This can be done using the GFBio Data Submission System (https://submissions.gfbio.org).

# **Acknowledgments**

We thank our colleagues from the German Federation for Biological Data (GFBio e.V.) and the Alfred Wegener Institute (AWI), who provided insight and expertise on Research Data Management (RDM), metadata standardization and sensor management.