Monograph
Corresponding author: Kat Bruce (kat@naturemetrics.co.uk) © 2021 Kat Bruce, Rosetta Blackman, Sarah J. Bourlat, Ann Micaela Hellström, Judith Bakker, Iliana Bista, Kristine Bohmann, Agnès Bouchez, Rein Brys, Katie Clark, Vasco Elbrecht, Stefano Fazi, Vera Fonseca, Bernd Hänfling, Florian Leese, Elvira Mächler, Andrew R. Mahon, Kristian Meissner, Kristel Panksep, Jan Pawlowski, Paul Schmidt Yáñez, Mathew Seymour, Bettina Thalinger, Alice Valentini, Paul Woodcock, Michael Traugott, Valentin Vasselon, Kristy Deiner.
This is an open access book distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Bruce K, Blackman R, Bourlat SJ, Hellström AM, Bakker J, Bista I, Bohmann K, Bouchez A, Brys R, Clark K, Elbrecht V, Fazi S, Fonseca V, Hänfling B, Leese F, Mächler E, Mahon AR, Meissner K, Panksep K, Pawlowski J, Schmidt Yáñez P, Seymour M, Thalinger B, Valentini A, Woodcock P, Traugott M, Vasselon V, Deiner K (2021) A practical guide to DNA-based methods for biodiversity assessment. Advanced Books. https://doi.org/10.3897/ab.e68634
This publication is an output from EU COST Action DNAqua-Net (CA 15219 - Developing new genetic tools for bioassessment of aquatic ecosystems in Europe) and would not have been possible without the opportunities for international collaboration provided by the network, supported by COST (European Cooperation in Science and Technology). Therefore, our highest gratitude is due to Florian Leese and Agnès Bouchez who designed and led DNAqua-Net, and to programme managers Alex Weigand, Sarah Kückmann and Charlotte Frie who coordinated it.
In addition to the authors, hundreds of researchers and practitioners from across Europe and further afield have contributed to the body of knowledge synthesised herein. DNAqua-Net workshops have served as the primary mechanism for consolidating knowledge and were particularly valuable for bringing together research scientists with regulators and end-users, which helped to emphasise the practical considerations in the implementation of DNA-based monitoring programmes. Workshops that ultimately fed into this publication were hosted in Germany (Florian Leese; University of Duisburg-Essen, 2017), Bosnia and Herzegovina (Belma Kalamujić; University of Sarajevo, 2017), Hungary (Zoltán Csabai; University of Pécs, 2018), Austria (Michael Traugott; University of Innsbruck, 2018), Portugal (Pedro Beja; CIBIO, 2018), Italy (Stefano Fazi; Water Research Institute IRSA-CNR, 2019) and Cyprus (Marlen Vasquez; Cyprus University of Technology, 2019). The workshops highlighted the collaborative community that has emerged among researchers in this field, enabled to a large degree by programmes like DNAqua-Net, as well as by a strong collective sense that our research has important real-world applications and is building a foundation for the years to come, when we will need every tool in the box to promote the protection and recovery of the natural world.
We are particularly grateful to all those non-expert users of environmental DNA who fed back to us their experiences and challenges in engaging with these new methods and provided wider context as to the practical, logistical and financial constraints of routine monitoring (Iwan Jones, Simon Vitecek, Willie Duncan, Kerry Walsh and Martyn Kelly, to name just a few). These insights have helped to guide and shape research priorities, and we hope that this guide will prove a useful resource for these users as they begin to integrate these new technologies into the suite of tools at their disposal.
DNA-based methods for species detection and identification have revolutionised our ability to assess biodiversity in terrestrial, freshwater and marine ecosystems. Starting from the seminal study that used eDNA to detect invasive American bullfrogs in France (
As the field developed rapidly and the approaches were applied to a wide range of research and monitoring objectives, a high level of methodological variation was introduced at all stages of the workflow (
As environmental practitioners and policy makers increasingly integrate DNA-based methods into routine monitoring applications, including protected species licensing
Thus, emphasis now shifts from fundamental research to robust and efficient application of DNA-based methods for operational use at large scales. This requires that scientific robustness is balanced with consideration of the practical realities faced by environmental managers. Moreover, there is increased need for strong quality assurance in a setting where non-expert field samplers and commercial laboratories are involved with the generation of data that non-specialist decision-makers then rely on to inform potentially costly action (or non-action). This places increased emphasis on robustness, replicability, traceability and ease-of-use, which may not always be the central focus of studies carried out in the academic research environment.
This document aims to summarise the scientific consensus relating to every step of the field and laboratory workflows involved in the most common types of samples and analyses. We do not go into great detail regarding bioinformatics (computational processing of sequence data) and data analysis since these are extensive topics in their own right. We uniquely set the field and lab steps in the context of the practical and logistical constraints faced by environmental managers in terms of cost, logistics, safety, ease-of-use, and quality assurance, highlighting key decisions to be made and the inherent trade-offs associated with the various options. We hope that this will support non-experts, and those new to the field, to navigate the key considerations associated with planning or evaluating monitoring programmes using DNA-based monitoring methods. Additionally, it will aid decision-makers in writing and evaluating tenders and proposals, ensuring that the methods used for a given project are fit-for-purpose and that results are correctly interpreted.
Alongside the many areas of emerging consensus, there remain some areas where further research is still required to balance scientific best-practice with the constraints and priorities of end-users. We hope that by shining a light on the importance of these issues, the research community will be encouraged to address them. More generally, we hope to inspire researchers in this now highly-applied scientific field to consider end-user constraints when designing and implementing research projects. This will help to accelerate uptake by users and maximise the impact of research.
DNA-based bioassessment methods continue to evolve, and there are several emerging technologies that show exciting promise to move beyond even what is possible today. Examples include in-field sequencing using the MinION device from Oxford Nanopore Technologies (
The two main challenges of bioassessment are (1) the detection of species and (2) their correct taxonomic identification. DNA-based monitoring tools can help address both aspects. For small-bodied and species-diverse groups such as benthic macroinvertebrates and diatoms, the monitoring challenge lies not so much in species detection but in the need for rapid, cost-effective and accurate identification of taxa. The best-validated approach for DNA-based biomonitoring of these groups is to follow established sample collection protocols (as outlined in existing standards, e.g. ISO 16665:2014; ISO 10870:2012; CEN/EN 13946:2014; CEN/EN 14407:2014), substituting morphological identification of taxa with metabarcoding and DNA-based taxonomy (
For fish and invasive non-native species, the challenge for monitoring lies principally in detection rather than identification. Conventional fish survey methods (e.g., electrofishing and netting) are labour-intensive, inefficient for community assessment, and often cause harm or stress to the fish (
We first consider field sampling and preservation methods for each of four sample types:
We then outline key quality control checks to be applied to DNA extracts and a framework for positive and negative controls to be integrated into the workflow.
Next we give a detailed overview of the laboratory steps, decisions and trade-offs associated with the two broad approaches to sample analysis:
For completeness, we give a brief overview of the major choices and considerations in bioinformatics processing, focusing on those that materially affect the results obtained.
Finally, we summarise the key factors that influence methodological decision-making and outline key practical recommendations for DNA-based biomonitoring (See Figure
DNA can be captured in various states, and the state in which it is captured influences how it needs to be handled, processed and interpreted. In particular, we make a key distinction between organismal DNA, which is captured in the form of whole organisms, and extra-organismal DNA, which is captured in the absence of the organism it originated from.
While various definitions have been proposed and employed, we define environmental DNA (eDNA) as genetic material that has been isolated from environmental samples such as water, soil or air (
Spatial and temporal interpretations based on detection of species from extra-organismal DNA are complex because the DNA may have travelled away from the point at which it was released from the organism. Extra-organismal DNA is also typically present at very low concentrations in a sample, which makes it highly vulnerable to contamination. Special precautions need to be taken both in the field and the lab in order to mitigate this risk (
In the field, key considerations in working with eDNA include:
In the lab, key aspects of working with eDNA include:
A variety of different water collection strategies provide robust data with good detection probabilities. These range from sampling continuously across the area of the waterbody for a set period of time to pooling subsamples from different point locations into a single merged sample (
As with any ecological survey, robust sampling design prior to field collections is essential to ensure the data obtained are fit for the purpose required. A significant advantage of an eDNA approach is that biological replication - crucial for robust statistical analysis - is easily incorporated into survey design.
Environmental DNA sampling design needs to account for (1) the physical and chemical properties of the matrix from which it is isolated, (2) environmental variability, and (3) the ecology of the target species to be surveyed. While many studies have calculated sampling effort for particular species in given environments using occupancy modelling and allied methods (e.g.
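As a back-of-envelope complement to formal occupancy modelling, the cumulative probability of at least one detection across independent replicates can be used to size a survey. A minimal sketch in Python; the per-sample detection probability is an illustrative input here, not a measured value, and real surveys should estimate it from pilot data or occupancy models:

```python
import math

def replicates_needed(p_sample: float, target: float = 0.95) -> int:
    """Number of independent field replicates needed so that the
    cumulative probability of at least one detection reaches `target`,
    assuming each replicate detects the species with probability
    p_sample and replicates are independent."""
    if not 0 < p_sample < 1:
        raise ValueError("p_sample must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p_sample))

# e.g. a species detected in ~30% of single water samples:
print(replicates_needed(0.3))        # -> 9 samples for 95% confidence
print(replicates_needed(0.3, 0.99))  # -> 13 samples for 99% confidence
```

The independence assumption is optimistic where samples are spatially clustered, so these values are best treated as lower bounds on sampling effort.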
Environmental DNA persistence in space and time is influenced by a multitude of factors. These include season, waterbody size and depth, temperature, stratification, connectivity, substrate, water chemistry, and flow. It is often difficult to tease apart specific effects, especially in natural settings, since combinations of factors will work synergistically or antagonistically to directly or indirectly facilitate degradation of eDNA (
One of the most important aspects of eDNA is its spatial distribution in the environment, which integrates how far eDNA travels from its “point of release” and how well mixed it is in the water column (
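The downstream extent of detectability in flowing water is sometimes approximated as simple advection with first-order decay. A sketch under that assumption; all parameter values below are illustrative only, and real transport distances depend strongly on retention, resuspension and mixing, which this model ignores:

```python
import math

def detection_distance_m(v_m_per_s: float, k_per_h: float,
                         c0: float, c_min: float) -> float:
    """Rough downstream distance (m) over which eDNA stays above a
    detection threshold, treating transport as plug-flow advection
    with first-order decay: C(x) = c0 * exp(-k * x / v)."""
    k_per_s = k_per_h / 3600.0
    return (v_m_per_s / k_per_s) * math.log(c0 / c_min)

# Illustrative values only: 0.2 m/s flow, decay of 0.05 h^-1,
# and a starting concentration 100x the detection limit.
print(f"{detection_distance_m(0.2, 0.05, 100.0, 1.0) / 1000:.0f} km")
```

Even this toy model makes the qualitative point clear: in faster, colder rivers (high v, low k), eDNA can be detected far from its source, which must be kept in mind when making spatial inferences.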
Summary of considerations for sampling eDNA from water in aquatic systems.
To consider | Applies to all types | Lentic | Lotic | Marine |
---|---|---|---|---|
When to sample? | • When the target species is most likely to be in the waterbody, based on what is known about its ecology and life history • During spring and summer, higher bacterial and algal load may interfere with the analyses • To coincide with statutory monitoring | • Consider seasonal thermal stratification patterns; in seasonally stratified lakes, a more efficient sampling strategy can be deployed when the waters are mixed and samples from depth are not required | • Sample during typical flow levels, avoiding low flows/drought and flood conditions • Consider seasonal patterns of migratory species | • Nearshore: consider season. Many fish species move to shallow waters for mating and to deeper waters in the winter, while some species prefer cold deep water in the summer • Consider migration patterns and mating, spawning and breeding sites |
Where to sample? | • Avoid entering the water prior to sampling, to avoid transferring DNA on footwear or clothing, transferring pathogens from other sites, and disturbing the substrate • If it is necessary to enter lotic water, stay downstream of sampling points • Consider sewage pipes, nearshore restaurants and nearby human dwellings, as contaminants from these sources may affect the eDNA results | • Collect samples from around the shoreline/edge of the lake/pond • Also collect from the middle of the lake if there is large variation in depth or if the lake is stratified | • Where possible, collect subsamples across the river width, including flow types such as riffles and pools • Sample at regular intervals along the river network • Consider tributaries, any connected lentic water bodies and changes in elevation | • Sample collection should consider depth profiles, habitat heterogeneity and current/tidal influences • Several depths should be included if trying to capture the full community |
Sample number | • The number of samples should reflect the spatial complexity and size of the system, and access to the area you wish to represent • Also aim to subsample distinct sub-habitats (such as areas of differing flow or vegetation) to reflect the habitat as a whole • To reduce the number of samples, the extent of pooling of subsamples can be increased; however, this reduces the statistical power, detection probability and spatial resolution of the data | • The number of samples will depend on: size of the waterbody and accessibility; water sampling strategy (whether subsamples are merged, and how much water can be passed through a filter); topographic variation of the shoreline - more samples are required in more complex habitats | • Flowing water should be sampled at regular intervals along the river length to ensure collection of eDNA before it degrades or drops out of the water column • Some replication at each site is always recommended to increase confidence in the data | • Depends on the spatial scale of the area you wish to represent • Deeper water will require more samples to cover the different depth zones of the water column • Offshore sampling will need more samples to account for the very high dilution factor |
Sample volume | • The sample volume depends on a number of factors, including turbidity, access to pumps, and on-site or lab filtering protocols. Studies show wide variation in the volumes chosen, from 500 ml to 50 L; however, most studies that filter water process between 500 ml and 5 L per technical sample • Sample volume also depends on sampling strategy; pooled subsamples may require fewer replicates | • Small ponds may require a smaller sampling volume than larger lakes, though the filterable volume may be lower in small ponds due to turbidity - aim to maximise the volume filtered | • eDNA distribution in river systems may be very stochastic and dilute compared to lentic samples • Taking regular samples/subsamples is important | • eDNA in marine systems is very dilute, so maximise the sample volume to be representative of the environment • Good results have been obtained with 2-5 L samples, but this depends on the target taxa (microbial taxa usually require smaller volumes) |
Turbidity | • Turbidity causes a number of problems while sampling (e.g., increased filtering time) and can also cause inhibition | • Freshwater samples can often be turbid. Avoid disturbing the substrate when sampling. Consider using a prefilter or a larger pore size, and avoid sampling after rainfall or during algal bloom events | • (As for lentic waters) | • Usually less of a problem in marine waters, although some inshore areas can become turbid due to coastal run-off and wave action disturbing the seafloor |
Chemical, physical and biotic factors influence the persistence of eDNA in the environment by affecting the rate at which it is degraded. Faster degradation reduces the time window for species detection, which carries advantages and disadvantages. On the one hand, it gives less opportunity for the DNA to travel long distances from the point of release, giving somewhat greater precision in temporal and spatial inference. On the other hand, sampling may need to occur more frequently or with greater spatial sampling effort to fully characterise communities.
Factors that affect the rate of eDNA degradation include:
It is also important to note that different subcellular components degrade at different rates, so detectability may vary according to the gene region targeted for analysis. Most commonly-used mitochondrial gene fragments (e.g. COI, 12S and 16S) will persist in the environment longer than ribosomal DNA fragments (e.g. 18S) due to the more resilient structure of the mitochondria once cells start to degrade. However, the higher abundance of ribosomal genes may offer a better alternative for localized monitoring under certain conditions (
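Where a decay rate or half-life has been estimated for a given system, simple first-order decay gives a rough sense of the detection window. A sketch assuming purely exponential decay; the half-life and concentration values are illustrative, and published estimates vary widely across systems:

```python
import math

def remaining_fraction(t_h: float, half_life_h: float) -> float:
    """Fraction of eDNA remaining after t_h hours, assuming simple
    first-order (exponential) decay with the given half-life."""
    k = math.log(2) / half_life_h
    return math.exp(-k * t_h)

def detection_window_h(half_life_h: float, start_conc: float,
                       detection_limit: float) -> float:
    """Hours until the concentration falls below the detection limit
    under the same first-order decay model."""
    k = math.log(2) / half_life_h
    return math.log(start_conc / detection_limit) / k

# Illustrative: a 24 h half-life and a start concentration 50x the limit
print(round(remaining_fraction(48, 24), 3))  # -> 0.25 after two half-lives
print(f"{detection_window_h(24, 50.0, 1.0):.0f} h detection window")
```

In practice decay curves are often biphasic (a fast-degrading fraction plus a more persistent one), so single-half-life estimates like this are only a first approximation.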
Factors related to the biology and ecology of the target organism(s) can also affect detection probability. These include:
Behavioural factors interact with life-history and physiology to affect the amount of DNA released by a given species at any one time. Animals have been documented to release more eDNA when they are stressed, when they are active, and when they are warm (
Taking these factors into account allows estimation of how detection probability is likely to vary temporally and spatially for a given species, and more intensive sampling regimes may be required when conditions or timing are not optimal for detection, or when the species’ behavioural or physiological traits mean that it is likely to be underrepresented in aquatic eDNA. Table
Factors that could be expected to reduce detection probability for aquatic eDNA.
Category | Factor | Reason for possible lower detection | Mechanism | Counteracting factors | Recommendations |
---|---|---|---|---|---|
Habitat properties | Cold water | Reduced eDNA production and reduced mixing in lakes | Reduced activity of some animals; stratification of water column | Greater persistence (accumulation) of eDNA through reduced microbial activity; some groups have higher detection probability in cold water | • Increase sampling effort if targeting species that are likely to be less active in cold water • Collect samples from different depths if a thermocline is likely to be present |
| Warm water | Faster degradation of eDNA | Increased microbial activity | Increased activity of some animals in warmer conditions leads to greater production of eDNA | • Increase sampling effort in warm water, especially if the target is expected to be rare or transient and is not expected to be more active in warmer conditions |
| Large water volume | Reduced eDNA concentration | Dilution | | • Increase sampling effort in line with water body size |
| Low pH | Faster eDNA degradation | Positively charged enzymes | | • Consider increasing sampling effort in acidic environments |
| High nutrient inputs | Faster eDNA degradation | Microbial activity | | • Consider increasing sampling effort in water bodies with high nutrient input |
Target species properties | Exoskeleton | Reduced eDNA production | Physical barrier to eDNA release | Moulting, release of gametes | • Greater sampling effort needed for arthropods • Sample during and after the breeding season, when juveniles are growing |
| Ectothermic | Reduced eDNA production | Lower metabolism and less shedding | Production of mucus or shedding of skin/scales | • Increase sampling effort, especially for reptiles |
| Low activity | Reduced eDNA production | Lower metabolism and less shedding | Greater accumulation of eDNA | • Increase sampling effort when activity is expected to be low |
| Terrestrial latrine | Reduced eDNA input to water | Major source of eDNA lacking | May be more detectable after rainfall | • Increase sampling effort • Try to sample after rainfall |
| Not fully aquatic | Reduced eDNA input to water | Inconsistent release of eDNA | May be seasonal | • Increase sampling effort and align sampling with the species' expected use of aquatic habitats |
Two principal methods have been used for capture of eDNA from water.
Although early studies (e.g.
Note that in certain environments where cellular breakdown happens very fast (e.g. environments of extreme heat or acidity), precipitation may be a more effective method because a greater proportion of the available DNA will be extracellular.
Here, we give a detailed explanation of the reasons for our recommendation to use filtration for eDNA capture because it runs contrary to one of the few cases in which eDNA is currently applied within a regulated monitoring context - detection of great crested newts (Triturus cristatus) in the UK - for which ethanol precipitation is stipulated in the standard protocol.
Sensitivity: Probably the greatest limitation of the precipitation approach is the volume of water that can be processed, because the corresponding volume of ethanol required for precipitation quickly becomes prohibitive in terms of both cost and logistics. For instance, the precipitation-based protocol widely employed in the UK for capturing eDNA of the Great Crested Newt (Triturus cristatus) (
Contamination risk: Precipitation-based methods commonly use multiple separate collection tubes per sample to maximise the volume of water tested, but the DNA must then be combined into a single tube during extraction, which typically involves vortexing to dislodge the DNA from the tube surface and then pouring all the DNA pellets into a single tube. This is an imprecise process compared to other processes used in molecular biology, and poses a high risk of cross-contamination. DNA extraction protocols from filters are typically simpler, quicker and more contained, which both lowers the labour cost per sample and reduces the risk of sample cross-contamination.
Logistics, safety & disposal: The ethanol precipitation approach requires relatively large volumes of ethanol, which is subject to extremely heavy taxation in some countries unless it can be procured under a duty-free licence. It is also a flammable liquid and therefore classed as dangerous goods for transportation purposes (molecular grade ethanol: UN1170, class 3, packing group II), which means that specialist couriers are required, specific packing requirements apply (especially for air transport under IATA regulations), and shipment costs can become high. From a safety perspective, ethanol fumes from a single kit pose a negligible risk in the field during sampling, but pose a much greater risk in the lab, especially when large numbers of samples are processed. This kind of work requires adequately-ventilated laboratory spaces and should ideally take place in fume hoods with extractors. Where testing takes place at large scales, huge volumes of highly flammable ethanol waste are generated, which must be properly stored and disposed of through specialist waste companies at significant cost. All these issues can be avoided by the use of a filtration-based eDNA capture approach.
A wide variety of equipment is used for filtration-based capture of eDNA from water, including different filter membrane materials, pore sizes, and filtration mechanisms, different transportation, storage and preservation methods, and different DNA extraction protocols. Due to the number of variables in a given workflow, it is difficult to robustly assess the importance of the choices made at any particular step. Moreover, there are almost certainly interactions between different elements of the process; for instance, certain membrane materials or filter designs may be best suited to use with particular pore sizes or DNA extraction methods, while others work optimally with a different combination of choices (
Our overarching message is that the first step should be to identify the main constraints for the project (e.g. time, remoteness, budget, availability of equipment), the characteristics of the study system, and the key aspects of the methodology that they affect. Next, optimisation should be carried out to determine the most effective combination of choices in the rest of the workflow, given the identified constraints. Reassuringly, many studies now show that with appropriate workflow optimisation, eDNA analyses are highly robust to different choices made (e.g.
We identify three distinct categories of filters used for eDNA capture (See Figure
Water eDNA filter types. 1: Open filters are exposed to the air during filtration, either in the field or lab (a and b). 2: Housed filters are a membrane placed in a solid unit during filtration (c). Filters from open and housed units need to be removed from the filtration unit and stored in a petri dish or Eppendorf tube until extraction (c and d). 3: Enclosed filters are systems in which the membrane is enclosed within the outer housing (e and f). Extraction is carried out directly from the enclosed filtration unit.
The combination of membrane material and pore size is important in determining how much volume can be filtered through a single unit. A membrane must be hydrophilic for water to pass through easily; note that some membrane materials (e.g. PVDF) can be purchased in both hydrophilic and hydrophobic versions.
Commonly used materials include cellulose nitrate (CN), polyethersulfone (PES), polyvinylidene difluoride (PVDF), glass fibre (GF) and polycarbonate track etched (PCTE). Some membrane materials may introduce constraints in relation to other parameters; for instance, GF membranes cannot be produced with consistent pore sizes due to the matrix-like nature of the material, so the stated pore sizes are nominal only and are rarely below 0.7 μm. PCTE membranes are difficult to incorporate into enclosed filters because the material is weaker than others and cannot withstand the same amount of pressure during filtration without additional support. Cellulose nitrate filter membranes have been found to disintegrate when stored for long periods in ethanol.
Pore size selected for eDNA capture varies substantially but is most commonly below 1 μm. The volume of water that can be passed through a filter membrane generally increases with pore size due to a reduced rate of clogging, but the trade-off is that the smallest particles containing eDNA may pass through the filter. Experiments on fish eDNA (
The most commonly used pore sizes in published studies are 0.22 μm and 0.45 μm. While there are often good reasons for choosing these smaller pore sizes, this is also partly a function of the limited range of commercially available filters with larger pore sizes. For instance, the widely-used Sterivex filters are only available in 0.22 μm or 0.45 μm. Note that if you aim to capture total microbial diversity as well as eDNA from larger organisms, a smaller pore size is advised (generally around 0.2 μm;
In particularly turbid waters, or where a small pore size is required, pre-filtering through a membrane with a larger pore-size can be an effective way of maximising sample volume by removing larger particles of sediment and plant material before passing the water through the main filter for capturing eDNA. This carries a risk of losing some eDNA particles during pre-filtering so it is recommended that DNA is extracted from the pre-filter and processed alongside that from the main filter, at least until it can be determined that results from the main filter are not negatively impacted by the prefiltration.
Many published studies describe collecting water in sealed containers and transporting it to a clean laboratory for filtration using vacuum pumps (
The alternative is to perform filtration on-site, either manually with syringes or hand pumps, or with the aid of a powered pump (vacuum or peristaltic) (Figure
Water eDNA collection and filtration methods. Water collection should cause minimum disruption to the substrate of the water body, either by collecting the sample from the bank (a and b) without entering the water or by collecting the sample upstream of the sampler (c and d). Pairing filtration or sampling in the field with an enclosed filter is recommended to minimise potential contamination. Filtration in the field can be carried out with a peristaltic pump (e) or by using a disposable syringe (f, g and h).
Note that as filtration pressure increases (whether using pump or syringe filtration), there is some evidence of reduced DNA retention on the filter membrane, as more molecules are forced through. However, this effect seems to be offset by the benefits of processing higher water volumes (
The volume of water filtered in published studies ranges from as little as 15 ml to over 100 L, but the most common volumes are between 500 ml and 5 L. There is little consensus on the minimum viable filtration volume, which will depend to some extent on other factors, such as:
While for any given sampling system the volume of water filtered tends to correlate positively with the amount of DNA recovered and detection probability of rare species (
In general, as detection probabilities decrease due to environmental, physical or biological factors, sampling effort should be increased. This can be achieved through both increasing the number of samples collected and increasing the volume of water filtered per sample. Increasing the number of samples is often likely to be more effective in increasing detection probability, and has the added benefit of enabling assessment of frequency or occupancy of species across the replicates. It is also usually easier to achieve given that volume may be restricted by filter clogging, although there are some high-volume sampling systems that enable filtration of much larger volumes of water in a single sample (
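The trade-off between more samples and larger volumes can be explored with a toy model in which eDNA is patchily distributed. Under the assumptions sketched here (each sample hits a DNA-containing patch with some probability, and particle capture within a patch is Poisson; all parameter values are illustrative), splitting a fixed total volume across several samples raises the overall detection probability:

```python
import math

def p_detect(n: int, vol_l: float, conc: float, patch: float) -> float:
    """Detection probability across n samples when eDNA is patchily
    distributed: each sample hits a DNA-containing patch with
    probability `patch`, and within a patch the number of captured
    particles is Poisson with mean conc * vol_l (so the chance of
    capturing at least one particle is 1 - exp(-conc * vol_l))."""
    p_one = patch * (1.0 - math.exp(-conc * vol_l))
    return 1.0 - (1.0 - p_one) ** n

# Same 6 L total water volume: one large sample vs six smaller ones,
# with DNA present at only 50% of sampling points (patch=0.5).
print(round(p_detect(1, 6.0, 0.3, 0.5), 2))  # -> 0.42
print(round(p_detect(6, 1.0, 0.3, 0.5), 2))  # -> 0.57
```

Note that if the water were perfectly mixed (patch = 1), the two strategies would perform identically under this model; the advantage of replication comes from spatial heterogeneity, plus the ability to estimate occupancy across replicates.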
eDNA on filters is preserved for transportation and storage by either freezing, drying, or adding liquid preservative to the filter.
Properties of five commonly-used preservative solutions, with recipes where applicable.
Preservative solution | Recipe | Lyses cells? | Kills microbes? | Preserves RNA? | Practical considerations |
---|---|---|---|---|---|
Ethanol | NA | No | Yes | No | Flammable liquid subject to dangerous goods transport regulations (UN1170, Class 3, packing group II) Can inhibit downstream reactions if samples are not completely dried prior to DNA extraction |
RNAlater | 25 mM Sodium Citrate, 10 mM EDTA, 70 g ammonium sulfate/100 ml solution, pH 5.2 | No | Yes | Yes | DNA extraction can be challenging. Requires specific optimisation |
Longmire’s buffer | 0.1 M Tris-HCl at pH 8.0, 0.1 M EDTA, 0.1 M NaCl, 0.5% w/v SDS | Yes | Aided by addition of (hazardous) sodium azide | No | Precipitates at low temperatures (< 10 °C), but will return to solution if warmed |
Sarkosyl buffer | 100 mM Tris, 100 mM EDTA, 10 mM NaCl, 1% sodium N-lauroylsarcosinate | Yes | Yes | No | Does not precipitate at low temperatures, making it an attractive alternative to Longmire’s |
LifeGuard™ Soil Preservation Solution | Commercially purchased from Qiagen | No | Yes | Yes | Mostly used for small-volume sediment samples (< 1 g). Cost is likely to be prohibitive for larger-volume samples |
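For buffers prepared in-house, the molar recipes in the table convert to weights as molarity × molecular weight × volume. A sketch for Longmire's buffer; the molecular weights assume the common salt forms (Tris-HCl, disodium EDTA dihydrate, plain NaCl), so check the labels of the reagents actually in use:

```python
# Grams of each solid needed per litre of Longmire's buffer, computed
# from the molar recipe. Molecular weights (g/mol) are for the salt
# forms assumed here - verify against the reagent bottle in use.
RECIPE = [
    ("Tris-HCl",             0.1, 157.60),  # (name, mol/L, g/mol)
    ("EDTA (Na2, dihydrate)", 0.1, 372.24),
    ("NaCl",                 0.1, 58.44),
]

def grams_per_litre(molarity: float, mw: float, litres: float = 1.0) -> float:
    """Mass of solid needed: mol/L * g/mol * L."""
    return molarity * mw * litres

for name, molarity, mw in RECIPE:
    print(f"{name}: {grams_per_litre(molarity, mw):.2f} g/L")
print("SDS: 5.00 g/L (0.5% w/v)")  # w/v percentages convert directly
```

Adjust pH to 8.0 after dissolving the Tris, as the recipe in the table specifies.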
Exogenous internal positive control (IPC) DNA can be added to the sample to check that DNA has been adequately preserved. This DNA, which can be purchased commercially along with the qPCR primers and probes needed for amplification, can be added at a well-defined concentration to the filter capsule shortly after filtering. If a liquid preservative is used, the IPC can be pre-mixed into the preservative and efficiently added this way. Note that IPC added to the water prior to filtering may not be captured effectively on the filter, since it is not in the same state as eDNA (predominantly cellular, subcellular or particle-bound).
After DNA extraction, IPC concentration can be quantified using qPCR or ddPCR to check that it is recovered at the expected concentration. Testing should be carried out using the specific sampling and DNA extraction method to be employed, to ensure IPC recovery, and to determine the concentration to be added to the sample and the expected results in the absence of DNA degradation and inhibition.
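The recovery check described above can be sketched in a few lines. This is a hypothetical helper, not part of any validated protocol: the spiked copy number, the minimum acceptable recovery fraction, and the classification messages are all illustrative assumptions.

```python
# Hypothetical sketch: flag samples where the internal positive control (IPC)
# is recovered well below the concentration spiked in before preservation.
# Thresholds and copy numbers are illustrative, not from a validated protocol.

def check_ipc_recovery(expected_copies_ul: float,
                       measured_copies_ul: float,
                       min_recovery: float = 0.5) -> str:
    """Classify IPC recovery from a qPCR/ddPCR quantification."""
    if measured_copies_ul <= 0:
        return "fail: IPC not detected (degradation or strong inhibition)"
    recovery = measured_copies_ul / expected_copies_ul
    if recovery < min_recovery:
        return f"warn: low recovery ({recovery:.0%}); check preservation/inhibition"
    return f"pass: recovery {recovery:.0%}"

print(check_ipc_recovery(1000, 820))  # pass: recovery 82%
print(check_ipc_recovery(1000, 300))  # warn: low recovery (30%); ...
```

The expected concentration used for comparison should come from the method-specific testing described above, not from the nominal spike-in amount alone.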
We recommend that commercial IPCs are used where possible, and this should always be the case for analyses carried out in a regulatory or management context. If custom controls are to be used in a research setting, the DNA used as IPC should be completely absent from the study system and should not interfere with the DNA of target organisms and/or downstream applications. If the IPC is designed to be analysed alongside the target group in a metabarcoding analysis, it should be ensured that there are no primer mismatches, as these could reduce efficiency of recovery in some circumstances.
eDNA extraction protocols from filters can be based on a number of different commercial DNA extraction kits (e.g. Qiagen DNeasy PowerWater or PowerSoil kits, Macherey-Nagel NucleoMag Water kits), custom column-based methods (
Note that commercial kits are not without health and safety concerns: they must be handled appropriately and their waste disposed of properly. For example, the most widely used kits contain guanidine thiocyanate or guanidine hydrochloride, which react with sodium hypochlorite (i.e. bleach) to produce chloramines, chlorine and hydrogen cyanide gases. Because bleach is commonly used as a decontamination agent in most laboratories, this presents a major health risk if laboratory personnel are not properly trained to avoid contact between commercial extraction kit liquids and bleach (e.g. if they wipe the bench top with bleach rather than a mild detergent after an extraction procedure).
Initial steps in extracting DNA from filters must be optimised according to the type of filter and preservation strategy used and the biological targets for analysis.
First, the target group(s) must be considered in the selection of a lysis method. Chemical lysis is sufficient for extraction of animal DNA, but mechanical lysis is required for disrupting cell walls of some unicellular groups such as diatoms. Since mechanical lysis cannot easily be applied to enclosed filters, open or housed filters are recommended if you plan to target such groups in your samples, and a DNA extraction that includes bead beating (a type of mechanical lysis using beads to break down cell walls) or equivalent is required.
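The decision logic above can be summarised in a small helper. This is purely illustrative: the group names and the simplified mapping from target group to lysis set-up are assumptions for the sketch, not an exhaustive rule set.

```python
# Illustrative decision helper for choosing a lysis strategy, following the
# guidance above: chemical lysis suffices for animal DNA, while hard-walled
# unicellular groups (e.g. diatoms) also need mechanical disruption, which in
# turn rules out enclosed filters. Group names are simplified assumptions.

def lysis_strategy(target_groups: set) -> dict:
    """Return a suggested lysis set-up for the groups targeted in a sample."""
    hard_walled = {"diatoms", "other unicellular algae"}
    needs_mechanical = bool(target_groups & hard_walled)
    return {
        "lysis": "chemical + bead beating" if needs_mechanical else "chemical",
        "filter_format": "open or housed" if needs_mechanical
                         else "any (incl. enclosed)",
    }

print(lysis_strategy({"fish", "diatoms"}))
# suggests bead beating and an open/housed filter
```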
Second, the preservative solution and filter type will influence the lysis procedure. It is vital to know whether the storage solution has already lysed the cells: if so, much of the DNA will be in the solution itself, which must then be carried through into the extraction rather than discarded.
An additional consideration is that not all extraction methods will release DNA bound to particles such as clay. If it is suspected that much of the DNA in a sample is particle-bound (e.g. in highly turbid waters), then kits optimised for soil extraction (e.g. Qiagen PowerSoil) or lysis buffers containing trisodium phosphate may be needed to release adsorbed DNA (
Organic compounds co-extracted from the sample along with the DNA may inhibit downstream PCR reactions. This particularly affects samples from turbid waters and small water bodies containing lots of rotting leaves, which introduce tannins and other dissolved organic compounds to the water. Other sources of organic material that may cause inhibition include faeces from livestock (e.g. cattle;
Extraction efficiency and the presence of inhibitors can be assessed by including internal positive control DNA in the lysis buffer and checking via qPCR that it is recovered in the expected quantity after DNA extraction. This is discussed in more detail in Section 6.2 below.
Key takeaways:
Priorities for future research:
Well-established methods already exist for capturing aquatic macroinvertebrates and terrestrial arthropods by passive trapping (e.g. Malaise traps for flying insects, pitfall traps for ground invertebrates) or active sampling (e.g. kick sampling for benthic macroinvertebrates in streams). By and large, the use of molecular methods for species identification does not demand development of new sampling approaches for these groups - although it should be noted that methods yielding clean samples with minimal detritus are highly preferred - so we do not go into detail about sampling methods here.
The first point at which DNA- and morphology-based assessment processes for bulk samples differ is in the preservation of bulk samples for storage and transportation. However, there is currently no clear consensus on the ideal preservation and storage strategy. Fixation in formaldehyde, which is commonly used to preserve bulk samples for morphological processing, is incompatible with DNA-based analyses and should be avoided at all costs.
Research studies commonly use ethanol for sample preservation. Ethanol works as a preservative by replacing water in biological tissues, but the water drawn out of the tissues dilutes the ethanol, reducing its effectiveness for long-term storage. The dilution effect decreases as the ethanol-to-sample volume ratio increases, so it is important to add ethanol at a volume at least double that of the sample, and it is standard practice to replace the ethanol for long-term storage of samples. However, while ethanol is certainly an effective preservative, it also poses considerable difficulties for use in routine biomonitoring. Pure, undenatured ethanol can be expensive to purchase in countries where alcohol duties are applied, requires special storage conditions, and is difficult to transport because it is a flammable liquid subject to dangerous goods regulations. This also raises health and safety concerns for many organisations that carry out fieldwork. Moreover, the large volumes of ethanol required for preserving bulk macroinvertebrate samples are expensive to dispose of correctly, and the sample needs to be completely dried prior to DNA extraction because ethanol residues interfere with the extraction chemistry.
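The dilution effect described above is simple mixing arithmetic, which can be made concrete with a back-of-envelope calculation. The volumes below are purely illustrative, and the calculation ignores second-order effects such as volume contraction on mixing.

```python
# Back-of-envelope check of the dilution effect: water drawn out of the
# preserved tissue mixes with and dilutes the ethanol. Volumes are
# illustrative; mixing non-idealities are ignored.

def final_ethanol_pct(ethanol_ml: float, sample_water_ml: float,
                      ethanol_purity: float = 1.0) -> float:
    """Approximate final ethanol concentration (%) once tissue water mixes in."""
    ethanol_volume = ethanol_ml * ethanol_purity
    total_volume = ethanol_ml + sample_water_ml
    return 100 * ethanol_volume / total_volume

# A 1:1 ratio of absolute ethanol to a watery sample drops towards ~50%...
print(round(final_ethanol_pct(100, 100), 1))   # 50.0
# ...whereas a 2:1 ratio keeps it near 67%, hence "at least double the volume".
print(round(final_ethanol_pct(200, 100), 1))   # 66.7
```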
The cost and accessibility barrier can be overcome through the use of denatured alcohols such as industrial denatured alcohol (IDA), industrial methylated spirits (IMS), and isopropanol. However, some denatured alcohols (e.g. IMS) seem to be unreliable as preservatives for DNA because of their capacity to gradually degrade dsDNA (
Non-flammable preservation solutions include:
Note that for collection of terrestrial invertebrates using passive trapping methods, the fluid serves as a collection and killing agent as well as a preservative. In some types of traps (e.g. Malaise traps), ethanol currently remains the preferred solution for this reason, and any alternative should be tested to check that it does not reduce trapping efficiency. If ethanol is used for collection, it can subsequently be filtered off for dry transportation of samples if necessary, but will still need to be disposed of correctly.
An alternative approach is to store the sample in a lysis buffer, which effectively maintains DNA integrity during storage. These solutions are usually non-hazardous and easy to transport. There are commercially available buffers as well as those that can be made up in the laboratory, such as Longmire’s Solution (
Non-liquid-based preservation strategies include immediate freezing or crushing of organisms to isolate mitochondria (
DNA extraction from bulk samples typically starts with homogenisation, which can be achieved either using bead beating or blending (Figure
Homogenisation methods. Sample homogenisation protocols vary depending on the equipment available; here we present a number of lab set-ups showing different equipment for the homogenisation of tissue samples (dry and wet): IKA Tube-Mill 100 (a) with blending blades (b); a dry Malaise trap sample before and after homogenisation with the IKA Tube-Mill 100 (c and d); Qiagen TissueLyser II with plate attachment (bead mill also available) (e); and a stainless-steel wet-grinding blender (f and g).
A practical challenge to the large-scale operationalisation of this process is that bulk samples are often of considerable volume - much larger than can be accommodated in most tissue homogenisers or bead mills, and especially those that can process multiple samples in parallel. Therefore, large samples must either be homogenised one-by-one in a large-volume blender (which requires decontamination between samples) or split into smaller subsamples for this stage, which introduces additional consumables cost and reduces the capacity for parallel processing of large numbers of samples.
Moreover, it is usually necessary to clean the sample prior to homogenisation, removing organic and inorganic detritus (and organisms with thick calcium carbonate shells) to reduce the volume of the sample and improve homogenisation efficiency (although see
Lastly, variation in body size presents a major challenge when working with bulk samples. This is particularly relevant to benthic macroinvertebrates, which vary in biomass by orders of magnitude. If DNA is extracted from bulk samples without size sorting, this leads to the DNA of very small organisms being overwhelmed by that of larger ones, resulting in detection bias weighted towards large-bodied taxa. Size sorting the organisms prior to extraction helps to fully recover the diversity of a sample, but it also requires a significant time investment as well as multiple DNA extractions per sample, which adds substantially to the cost, so there is a trade-off between taxonomic completeness and processing time/cost. Different sorting schemes have been proposed, ranging from very strict (separate into multiple size classes, extract DNA from each separately, and then pool the DNA extracts proportionally) to very coarse (remove large individuals, leaving only a leg or other body part in the bulk sample) (
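One way to implement the stricter sorting scheme described above is to extract DNA from each size class separately and combine equal DNA masses from each extract, so that small-bodied classes are not swamped. The target mass and the extract concentrations in this sketch are illustrative assumptions, not values from a validated protocol.

```python
# Sketch of the "strict" size-sorting scheme: DNA is extracted from each size
# class separately, then the extracts are combined. Here an equal DNA mass is
# pooled from each class; target mass and concentrations are illustrative.

def equal_mass_pool(extract_conc_ng_ul: dict, ng_per_class: float = 50.0) -> dict:
    """Volume (µl) of each size-class extract contributing the same DNA mass."""
    return {cls: round(ng_per_class / conc, 2)
            for cls, conc in extract_conc_ng_ul.items()}

volumes = equal_mass_pool({"large": 25.0, "medium": 10.0, "small": 2.5})
print(volumes)  # {'large': 2.0, 'medium': 5.0, 'small': 20.0}
```

In practice, the pooling ratio (equal mass versus some proportional weighting) is itself a methodological choice that affects the trade-off between taxonomic completeness and quantitative signal.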
As discussed above, various aspects of the sample-handling and homogenisation process limit the throughput of the DNA-based approach for bulk samples (specifically the time taken in drying, cleaning and size-sorting the samples, and capacity issues introduced by the need to divide up large samples). Furthermore, a potential barrier to the uptake of DNA-based methods for macroinvertebrate monitoring under the Water Framework Directive is the requirement to retain voucher specimens (a preserved specimen that serves as a verifiable and permanent record), which is incompatible with sample homogenisation.
An alternative to homogenisation is to retrieve DNA non-destructively by extracting it from the liquid solution in which the sample has been stored. Although frequently used for DNA extraction from individual museum specimens, this has only recently been widely applied to bulk samples in metabarcoding studies (e.g.,
Many tests of this approach (e.g.
Since this method is in the early stages of development, a number of fundamental questions remain regarding the relative effectiveness of different buffer solutions and how best to maximise release of DNA from the organisms into the solution without completely destroying the specimens (if they are to be retained as voucher specimens or for subsequent morphological identification). For instance, it may be possible to greatly increase extraction efficiency through agitation of the sample or the addition of proteinases, but this will destroy soft-bodied organisms. It is still unclear how long a sample should be left in the preservative solution and what volume of solution should be used for DNA extraction (
A wide variety of different commercial kits or liquid-phase extraction methods are used for DNA extraction from homogenised macroinvertebrate samples. Thus, there are no clear recommendations as to which kit or method is preferable. However, due to the health and safety concerns of working with toxic chemicals such as phenol, commercial kit-based extraction approaches are generally preferred in commercial or routine monitoring laboratories. For DNA extraction from lysis buffer solutions, kits that enable a large volume of lysate to be processed may be preferred (e.g. QIAamp DNA Blood Maxi Kit, which uses 3-10 ml of lysate;
Early stages of the DNA extraction process are influenced by choice of preservation strategy, since some solutions, including ethanol, can interact with the chemistry during initial extraction stages, causing reduced extraction efficiency. Thus, samples preserved in ethanol must be fully dried prior to lysis, and this can pose a cross-contamination risk as electrostatic charges can cause small dried fragments to jump considerable distances. Using a lysis buffer (without SDS or other surfactants) or wet grinding in ethanol (
The volume of material used for extraction is also an important consideration. If the sample has been homogenised, using too much of the sample can cause PCR efficiency to be reduced through inhibition. Assuming that homogenisation has been sufficiently complete, representative communities can be described from as little as 0.3 g of the total homogenate (
Key takeaways:
Research priorities:
Freshwater periphytic samples (a.k.a. benthic biofilm) are characterized by the presence of a wide diversity of organisms representing all domains of the tree of life (
Benthic diatoms are one of the major components of biofilm diversity in aquatic ecosystems and are commonly used for morphology-based ecological assessment (e.g.
DNA-based assessment of benthic diatom communities uses the same sampling methodology as morphology-based assessments applied for monitoring under the WFD, for which standard protocols have been set (NF EN 13946 - April 2014). These documents describe how to collect biofilm samples (number and type of substrate collected, habitat, biofilm surface collected), while a Technical Report from the European Committee for Standardisation (CEN/TR 17245, 2018) provides recommendations to ensure that biofilm samples are collected and stored in such a way as to be compatible with molecular analysis (Figure
The sampling strategy described in the standards was statistically optimized to obtain the best ecological assessment of the evaluated rivers or lakes for minimal sampling effort. In the scope of routine monitoring and WFD requirements, this method is easy to apply and the same sample can be used for both DNA- and morphology-based assessment.
Diatom community DNA is dominant in the samples, so the vulnerability to contamination is low, which means that precautions can be lighter than for aquatic eDNA sampling: gloves are not mandatory, and collection of substrates generally requires entering the water (walking against the water flow); see the technical report (CEN/TR 17245, 2018) for more details. However, if biofilm samples are used to characterise prokaryotic communities or for analyses of extra-organismal DNA (e.g. from fish), the considerations described in section 1.3 should be carefully applied to limit contamination.
Historically, biofilm samples were preserved using formaldehyde or Lugol solutions. These are compatible with morphological processing of samples, since they preserve the silica walls surrounding the diatom cells (frustules), which are used for identification with light microscopy. Formaldehyde solution is known to be incompatible with DNA preservation, and the use of Lugol solution for such application is still under debate (
The CEN technical report (CEN/TR 17245, 2018) recommends the use of pure and undenatured ethanol solution with a final concentration >70% to preserve biofilm samples collected in lakes and rivers. Several studies have applied this protocol successfully for benthic diatom ecological assessment in lakes and rivers using DNA metabarcoding (
Alternative preservation approaches that have been used for benthic diatom samples include RNAlater (
Growth of aquatic biofilms can be affected by a wide variety of environmental factors including the nature of the substratum (
The subsampled biofilm aliquot is pelleted using centrifugation (a separation process that relies on the action of centrifugal force to separate particles in a solid–liquid mixture) and the supernatant (i.e. the preservative solution) is discarded.
Like other environmental samples, biofilm samples are characterized by the presence of organic matter, humic acids and polyphenols that are known to inhibit molecular methods like PCR, so this needs to be accounted for during or after the DNA extraction process. In addition, diatom cells are protected by a frustule made of silica that is hard to break, and this can reduce the efficiency of DNA extraction if the cell lysis step is insufficient (
A wide variety of DNA extraction methods from benthic biofilms are found in the literature (e.g.
Key takeaways:
Research priorities:
Soil and aquatic sediment samples typically contain a high diversity of living organisms (organismal DNA) as well as DNA from larger organisms that has been shed into the sediment (extra-organismal DNA) and DNA from dead or dormant organisms. In aquatic systems, surface sediments also contain DNA from pelagic organisms or their cells that have settled from the water column. This makes soil and sediment samples a rich source of data across the entire spectrum of biodiversity (
Sediment metabarcoding that targets Metazoa will tend to be dominated by organismal DNA, which is present in much higher concentrations than environmental DNA (extra-organismal). This means that the datasets will predominantly comprise meiofaunal taxa rather than macrofauna. This is advantageous from the point of view of maximising the statistical power to show community change in response to impact or land use change, since meiofauna are often more diverse and more abundant than macrofauna, but it makes it difficult to directly compare metabarcoding results with those obtained from conventional surveys of benthic macrofauna or ground insects.
Molecular analysis of sediment samples is complicated by the fact that complex organic molecules and inorganic particles are able to bind, adsorb, and stabilize free DNA in sediments (
The volume of soil or sediment collected per sample will usually be decided according to the portion of biodiversity that is targeted and the spatial scale at which it operates. For microorganisms such as bacteria and single-celled eukaryotes, it is common to collect only very small-volume samples, which can be as small as 0.25 g and usually not more than 1 g. The advantage of such small-volume samples is that they can be easily and cheaply preserved and are compatible with high-throughput (i.e. automated) DNA extraction systems. If larger-bodied organisms belonging to the meiofauna or macrofauna are targeted, larger volumes (> 10 g) are recommended in order to achieve a representative sample (
Since DNA is not evenly distributed within sediments, it is usually necessary to collect subsamples from across the area that the sample aims to represent. Sediment subsamples are most commonly collected using either a spoon/spatula (ideal for targeting a shallow layer of surface sediment) or a small coring device such as can be fashioned from a syringe. Syringe corers are ideal for targeting a deeper sediment profile, and the plunger on the syringe can be used to create suction, enabling cores of loose or wet sediment to be collected. Subsamples can either be treated separately to maximise statistical power, or merged and re-sampled to give a single smaller-volume sample that is representative of a wider area of sediment. The latter approach is more cost effective.
Very large volumes of sediment have been used in some studies (1 l or more) to target extracellular or extra-organismal DNA, which is present in very low concentrations in both terrestrial soils and marine sediments (e.g.
Samples are usually collected from the surface of the soil or sediment. If a large primary corer (e.g., grab sampler or boxcorer) is used to recover sediment from deep water, subsamples should be taken away from the edges of the corer, targeting the minimally disturbed parts of the sediment that have not come into direct contact with the equipment. There is no clear consensus as to the ideal vertical depth of the sample, and this may vary between target groups and ecosystems. For instance:
More research is needed to establish the optimal depth of core samples for targeting different portions of sediment biodiversity, and this may vary depending on the environmental, physical and chemical characteristics of the sediment.
One way to deal with large volumes of soil or sediment samples is to separate the organisms (macrofauna and/or meiofauna) from the soil or sediment itself, which is usually achieved through a series of flotation, decanting and sieving steps (
Separating the organisms from the soil or sediment allows a larger volume of sediment to be processed, but this needs to be balanced against the consideration that it is a labour-intensive process, and this may limit the scope of monitoring programmes in terms of the number of samples that can be handled.
Thus, extracting DNA directly from the soil or sediment itself is preferable in many ways: the process is more readily standardised and scalable, and requires less handling of the sample, which reduces contamination risk. DNA can be directly extracted from soil or sediment in volumes of up to 10 g using commercial DNA extraction kits (e.g. DNeasy PowerMax Soil Kit). Further work is needed to establish the optimal combination of sample size and replication to account for spatial heterogeneity in sediment communities, although various studies have examined this in certain environments (
Samples must be preserved for transportation to the laboratory and storage prior to DNA extraction. Rapid and effective preservation of soil/sediment samples is particularly important if either eDNA, RNA, or microorganisms are targeted. Common preservation strategies include freezing at -80 °C and the use of preservative solutions such as ethanol or Qiagen’s LifeGuard Soil Preservation Solution.
Freezing is widely accepted to be an effective way of preserving samples (
Preservation using liquid preservative solutions may represent a more practical option, but choice of preservative solution is critical.
Initial steps in DNA extraction will depend on the volume of soil or sediment collected, the target group for analysis, and the preservation method used. If a preservation liquid such as ethanol or LifeGuard™ has been used for sample preservation then this must first be removed from the sample, usually by centrifugation and discarding the supernatant. Subsequent wash steps may be needed to ensure that all traces of preservative are removed, since they may interfere with the chemistry of the extraction kit. This is especially important with ethanol. If a salt-based buffer such as DESS has been used to preserve DNA, then the sample can sometimes be introduced more directly into an extraction process (
The maximum volume of soil or sediment that can be directly extracted in commercial kits is currently 10 g. Where a larger volume has been collected, thorough mixing prior to subsampling for extraction will help to ensure that the extracted DNA is representative of the sample as a whole. It may be worth carrying out multiple extractions per sample in this case, at least until it is clear to what extent a single extraction is representative of the whole sample. Soils and sediments typically contain PCR inhibitors, and heavily polluted sediments are often associated with particularly high levels of inhibition. Therefore, most extraction protocols need to include an inhibitor removal step. This is incorporated into commercial kit protocols designed for soils and sediments, but a specific clean-up step will almost certainly need to be incorporated where custom extraction protocols are used (e.g.
DNA from very large volume sediment samples are typically extracted using a phosphate buffer approach as outlined by (
Key takeaways:
Research priorities:
Before starting the main analysis, it is common and recommended to carry out some preliminary tests to characterise the DNA that has been extracted. This includes DNA quantification and testing for inhibition.
The most commonly-used platforms for DNA quantification are the Qubit fluorometer (Invitrogen, Carlsbad CA) and the Nanodrop spectrophotometer (Thermo Scientific, Waltham MA).
When targeting extra-organismal DNA extracted from environmental samples (water and sediment), it is important to bear in mind that a significant portion of the DNA extracted can belong to non-target organisms (
A common cause of false negative results is PCR inhibition (
This can be achieved through the use of an exogenous internal positive control (IPC) added at known concentration to the extracted DNA (
Inhibition can often be overcome by using additional purification steps (using commercial kits such as the Zymo OneStep PCR Inhibitor Removal Kits or Qiagen PowerClean kits), use of chemical enhancers such as bovine serum albumin (BSA) and dimethyl sulphoxide (DMSO) in the PCR reaction, or by dilution of the DNA. The inhibition test should then be repeated to check that the inhibition has been overcome.
While dilution of the DNA reduces the chance of one type of false negative result (inhibition) it can also increase the chance of false negatives occurring due to stochasticity by reducing the concentration of target DNA. This can be especially consequential when working with eDNA, where target DNA is already likely to be at very low concentrations. Dilution should therefore be compensated by increasing the number of PCR replicates performed or the volume of extracted DNA added to each reaction.
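The trade-off described above can be made concrete with elementary probability: if a single PCR replicate detects the target with probability p, the chance of at least one positive among n independent replicates is 1 - (1 - p)^n. Diluting the template lowers p, so n must rise to maintain overall sensitivity. The probabilities below are illustrative assumptions, not measured values.

```python
# Detection probability across independent PCR replicates. Diluting template
# DNA lowers the per-replicate detection probability p, so more replicates
# are needed to keep overall sensitivity. Values of p are illustrative.

def detection_prob(p_single: float, n_replicates: int) -> float:
    """Probability of >= 1 positive among n independent PCR replicates."""
    return 1 - (1 - p_single) ** n_replicates

# Undiluted: p = 0.8 per replicate; a 1:10 dilution might drop it to, say, 0.4.
print(round(detection_prob(0.8, 3), 3))   # 0.992
print(round(detection_prob(0.4, 3), 3))   # 0.784 - sensitivity lost
print(round(detection_prob(0.4, 9), 2))   # 0.99  - recovered with more replicates
```

The independence assumption is optimistic at very low template concentrations, where replicate outcomes can be correlated; the calculation nonetheless illustrates why dilution should be compensated by extra replication.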
Data generated from eDNA analyses could be used as evidence for species presence (or when rigorously understood, absence) in contexts including permits for new development or construction of infrastructure, the designation of protected areas, protected species licensing, assessing illegal species introductions, and decisions to remove connectivity barriers. Many of these applications sit within a regulated framework of requirements, with legal consequences for environmental mismanagement, and this means that data derived from DNA-based analyses may have to be defended in court. Moreover, management responses to detection of certain species can be expensive and disruptive, so confidence in data quality is paramount.
Thus, the use of DNA-based monitoring methods to determine species presence or absence at a given location for environmental management purposes requires rigorous quality assurance protocols, similar to those applied in other high-consequence sectors that use molecular methods, such as forensics (
A summary of the positive and negative controls that can be used at each stage of the process.
Stage in workflow | Positive control | Negative control | Referred to in this document | Notes |
---|---|---|---|---|
Site controls | Sample from site with known presence of target species | Sample from site with known absence of target species | | Not always needed, but important when using new or partially-validated assays. Site negative is more relevant for single-species tests than for metabarcoding |
Filter controls (aquatic eDNA only) | | Distilled or bottled water processed under the same filtering conditions as the field samples to check that contamination is not being introduced in the field | 2.1.12 | Highly recommended for eDNA samples, especially when using open filters |
Laboratory controls: DNA extraction | Add IPC (internal positive control) at known concentration to the lysis buffer (or other extraction solution) to control for successful DNA extraction. Amplify the IPC with qPCR or ddPCR to check the expected concentration is obtained (after controlling for inhibition). | DNA extraction protocol applied in the absence of the biological sample. Must be carried out alongside sample extractions using the same equipment and materials. Include IPC for comparison with samples. For each set of samples extracted together, add at least one negative extraction control | 3.3 | Negative controls mandatory; positive controls highly recommended |
Laboratory controls: Inhibition testing | Use IPC added to extracted DNA and amplify with qPCR/ddPCR to check efficiency. If IPC amplification fails or is delayed, purification steps should be undertaken and the test repeated. Negative results from inhibited samples should be reported as inconclusive | | 3.4.2, 3.4.3 | Highly recommended for all sample types; mandatory for environmental samples (water and sediment) |
Laboratory controls: Amplification with cPCR, qPCR or ddPCR | Use DNA of the target species as a positive control for qPCR or ddPCR analysis. This should be run on every plate of samples. Standard curves are needed for DNA quantification in qPCR but not in ddPCR | Multiple template negative controls should be included in each plate of samples, using nuclease-free water as a template in the PCR reaction. All other equipment and materials should be identical to those used for the samples | 3.4.2, 3.4.3 | Mandatory |
Laboratory controls: Amplification for metabarcoding | Amplify DNA from a mock community with known species composition. The mock community species should not be expected to occur in the samples on the same sequencing run. This should be included at least once in every PCR plate and included in all downstream processes | Multiple template negative controls should be included in each plate of samples, using nuclease-free water as template in the PCR reaction. All other equipment and materials should be identical to those used for the samples | 3.5.3 | Recommended |
Laboratory controls: Sequencing | Sequence the mock community used as the positive PCR control | Sequence all negative controls included above | 3.5.3 | Negative controls mandatory; positive controls highly recommended |
We define analytical positive and negative controls as follows:
A negative site control refers to a sample collected from a field site where the target taxon is known to be absent. It is not always required but is a core part of assay validation (
A negative filtration control is a sample in which DNA-free water is filtered alongside the eDNA samples to check that DNA is not transferred between samples. It is especially important when any piece of equipment comes into direct contact with consecutive samples, even where decontamination procedures are used to avoid cross-contamination.
Negative laboratory controls consist of DNA-free samples processed alongside the test samples at each stage of the process to check for (cross-)contamination. All negative controls should be processed to the end of the workflow, and new negatives should be added at each stage so that any contamination detected can be traced back to a specific point in the analysis.
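The tracing logic above can be sketched as follows: because a fresh negative control is added at each stage and all negatives are carried to the end of the workflow, the earliest-stage negative showing contamination localises where it entered. The stage names and example results are illustrative assumptions.

```python
# Sketch of tracing contamination back to a workflow stage using the staged
# negative controls described above. Stage names and results are illustrative.

WORKFLOW_STAGES = ["filtration", "extraction", "amplification", "sequencing"]

def trace_contamination(contaminated_negatives: set) -> str:
    """Return the earliest workflow stage whose negative control was positive."""
    for stage in WORKFLOW_STAGES:
        if stage in contaminated_negatives:
            return stage
    return "none: all negative controls clean"

# Extraction and amplification negatives contaminated, filtration clean:
# contamination most likely entered at the extraction stage.
print(trace_contamination({"extraction", "amplification"}))  # extraction
```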
Positive laboratory controls consist of a known concentration of pre-prepared DNA of one or more species that are expected to be amplified efficiently using the selected PCR protocol. In some cases they can also consist of purpose-designed synthetic DNA. If the positive control does not amplify as expected, it implies a high risk of false negative results (i.e., species not being detected when their DNA is present in the sample; a Type II error). Positive laboratory controls can represent a potential source of contamination for test samples, so they should be designed and handled with care. Where mock communities are used as positive DNA controls in metabarcoding workflows, they should contain species that are not expected in the test samples (e.g. from different geographic regions). This mitigates the risk that the positive controls themselves contaminate the samples, and additionally allows them to function as extra negative controls for detecting cross-contamination.
Key takeaways:
Priorities for future research:
Targeted species detection methods refer to assays used to screen DNA samples for the presence of one or a few pre-defined species. While in principle DNA extracted from bulk samples could be screened for indicator taxa, this is rarely done in practice, and such samples are usually analysed with metabarcoding. We therefore focus here on single-species detection from aquatic eDNA.
While various technology platforms can be used, the basic principle is that species presence is inferred based on successful amplification using a taxon-specific primer set. The same primer set can be adapted for use with different technologies, the most commonly-used being:
Note that other methods exist and are being developed, but have yet to be fully integrated into mainstream eDNA applications. Examples include loop-mediated isothermal amplification (LAMP), nucleic acid sequence based amplification (NASBA), self-sustained sequence replication (3SR), rolling circle amplification (RCA), and CRISPR-based assays (
The benefit of these single-taxon screening methods lies primarily in the simplicity of the laboratory work and data analysis, and potentially in their sensitivity compared to metabarcoding approaches for some taxonomic groups (
cPCR does not incorporate fluorescent dyes during the amplification process; the generated amplicon is visualized on agarose gels or with capillary electrophoresis instruments. With agarose gels, interpretation of the results is restricted to presence/absence inference, whereas capillary electrophoresis allows a fluorescence threshold to be set for positive detection, and the signal strength of the target band can be used as a semi-quantitative measure of target DNA.
While some eDNA studies have used this method, it most commonly serves as a cost-effective tool for preliminary tests during the assay validation process. Specificity to the target relies exclusively on primer design, so non-target amplification is a substantial risk if the primers have not been sufficiently optimised and validated (see below). Positive amplification of the target species can easily be confirmed by Sanger sequencing of the PCR product, which should be carried out for a subsample of all successful amplifications.
qPCR is currently the most widely used technique for the detection of single taxa from environmental samples (
Similar to cPCR, ddPCR combines thermocycling amplification (PCR) with subsequent evaluation of the generated amplicons in a second, separate step. ddPCR is a highly sensitive method capable of detecting single copies of target DNA. Furthermore, ddPCR is an endpoint reaction and allows absolute quantification of target DNA copy number without the need for standard curves (
ddPCR randomly partitions 20 μl of PCR master mix (including up to 10 µl DNA extract) into ~20,000 individual droplets. The droplet matrix is then subjected to thermocycling. Like qPCR, ddPCR is based on either i) probe hydrolysis or ii) intercalating dye (SYBR Green/EvaGreen) technology. After amplification, droplets that contain target DNA are detected by fluorescence, allowing absolute quantification through Poisson statistical analysis of the ratio of positive to negative droplets (
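The Poisson calculation behind ddPCR quantification can be sketched in a few lines. The droplet volume used below (~0.85 nl) is an assumed, instrument-specific constant, and the droplet counts are hypothetical:

```python
import math

# Absolute quantification from ddPCR droplet counts via Poisson statistics.
# DROPLET_VOLUME_UL is an assumed, instrument-specific constant (~0.85 nl).
DROPLET_VOLUME_UL = 0.00085

def ddpcr_copies_per_ul(positive, total, droplet_volume_ul=DROPLET_VOLUME_UL):
    """Target DNA concentration (copies/µl of reaction) from droplet counts."""
    negative = total - positive
    lam = -math.log(negative / total)     # mean target copies per droplet
    return lam / droplet_volume_ul

# Hypothetical run: 2,000 positive droplets out of 20,000 accepted droplets
conc = ddpcr_copies_per_ul(positive=2000, total=20000)   # ≈ 124 copies/µl
```

Because the estimate depends only on the ratio of positive to negative droplets, no standard curve is needed.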
ddPCR instruments are more expensive and less widely accessible than qPCR instruments, but the approach is reported to be less prone to the effects of PCR inhibition. As a result, ddPCR can often be run with higher relative extract volumes without any sign of inhibition, which may at least partly explain the reported higher sensitivity of ddPCR compared to qPCR (
LOD and LOQ should be determined after an assay has been optimized and must be reported in order for an assay’s results to be interpreted correctly (
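As an illustration, one common operational definition of the LOD is the lowest standard concentration at which at least 95% of technical replicates amplify. The detection data below are hypothetical:

```python
# One common operational definition of the limit of detection (LOD):
# the lowest standard concentration at which at least 95% of technical
# replicates amplify. All detection data below are hypothetical.

def limit_of_detection(detections, threshold=0.95):
    """detections: {concentration (copies/µl): [True/False per replicate]}."""
    detected = [conc for conc, reps in detections.items()
                if sum(reps) / len(reps) >= threshold]
    return min(detected) if detected else None

standards = {
    1000: [True] * 8,
    100:  [True] * 8,
    10:   [True] * 8,
    5:    [True] * 7 + [False],                  # 7/8 replicates detected
    1:    [True, False, False, True] + [False] * 4,
}
lod = limit_of_detection(standards)   # → 10 copies/µl
```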
For qPCR, estimating target DNA concentration requires normalization against standard curves. These should consist of at least 5 different concentrations of DNA that contains primer and probe binding sites identical to the target species. Because high-concentration positive control DNA poses a contamination risk (especially when working with low template samples such as eDNA), the standard DNA should ideally be created using purified amplicons (e.g.
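A minimal sketch of standard-curve normalisation, assuming a hypothetical 5-point dilution series: fit Ct against log10 concentration by least squares, derive the amplification efficiency from the slope, and estimate copy number for an unknown Ct.

```python
import math

# Fit a qPCR standard curve (Ct vs. log10 concentration) by least squares
# and use it to quantify an unknown sample. The dilution series is hypothetical.
def fit_standard_curve(standards):
    """standards: [(copies_per_reaction, Ct), ...] -> (slope, intercept)."""
    xs = [math.log10(c) for c, _ in standards]
    ys = [ct for _, ct in standards]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def estimate_copies(ct, slope, intercept):
    return 10 ** ((ct - intercept) / slope)

standards = [(1e5, 20.1), (1e4, 23.4), (1e3, 26.8), (1e2, 30.2), (1e1, 33.5)]
slope, intercept = fit_standard_curve(standards)
efficiency = 10 ** (-1 / slope) - 1     # ≈ 0.98, i.e. ~98% efficient
copies = estimate_copies(28.0, slope, intercept)
```

A slope near -3.32 corresponds to 100% amplification efficiency, which is one quick check on curve quality.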
Good design of species-specific primers and probes is critical for any targeted species approach because interpretation of the results often depends solely on whether or not amplification occurs. Non-target amplification can therefore lead to species presence being wrongly inferred, with potentially costly consequences for environmental management. The challenge with the use of target-based approaches for routine monitoring at large geographic scales lies in designing and validating species-specific primers that are reliable across diverse ecological systems. This can be achieved, but it involves significant investment of time and effort, along with resources to fully test and validate assay performance in the intended environment.
As a guiding principle, the design and implementation of PCR primers should follow the established guidelines for the Minimum Information for the Publication of Quantitative Real-Time PCR Experiments, which have been adapted for use in eDNA applications (
The concentration of target DNA in the DNA extract (as measured by qPCR and ddPCR) or fluorescence strength of the target band (in cPCR) has been linked to species abundances under controlled experimental settings, but this is extremely complex to extend to natural systems. Use of IPCs for internal normalisation of quantitative estimates, together with models based on allometric scaling of species’ body sizes (
Key takeaways:
Priorities for future research:
Metabarcoding allows the simultaneous taxonomic identification of organism assemblages from a biological sample using high throughput sequencing of a standardised gene fragment (
Metabarcoding involves three principal laboratory steps:
Here we focus on the laboratory steps for the metabarcoding of taxa routinely used as ecological indicators. Most biomonitoring efforts will involve large numbers of samples and recent studies have already started addressing the practical and technical challenges of scaling up metabarcoding workflows for freshwater monitoring (
The goal of any metabarcoding analysis is to record and identify all species within a particular taxonomic group from a sample. To accomplish high taxonomic resolution (i.e. species level assignments) and maximise detection probability for rare species, designing or selecting appropriate primers is crucial.
Primers can be designed to target narrow taxonomic groups through to very broad ones. At the narrowest end of the spectrum, primers can also be designed to target a single species, using fast-evolving DNA regions that allow identification of intraspecific (population level) genetic diversity (e.g.
Barcode markers commonly used for metabarcoding in various organism groups. For a comprehensive list see (
Primer name | Target group | Gene region | Amplicon length | Citation | Forward primer sequence | Reverse primer sequence | Strengths | Weaknesses |
---|---|---|---|---|---|---|---|---|
Batra | Amphibians | 12S | ca. 60 bp | ( | ACACCGCCCGTCACCCT | GTAYACTTACCATGTTACGACTT | High taxonomic coverage | Does not discriminate Pelophylax species; needs a human blocking primer |
515F–806R | Bacteria | 16S V4 | ca. 390 bp | ( | GTGYCAGCMGCCGCGGTAA | GGACTACNVGGGTWTCTAAT | Very widely used and standardised | |
[Diat_rbcL_708F_1+Diat_rbcL_708F_2+Diat_rbcL_708F_3] + [R3_1+R3_2] | Diatoms | rbcL | 312 bp | ( | AGGTGAAGTAAAAGGTTCWTACTTAAA; AGGTGAAACTAAAGGTTCWTACTTAAA; AGGTGAAGTTAAAGGTTCWTAYTTAAA | CCTTCTAATTTACCWACWACTG; CCTTCTAATTTACCWACAACAG | Diatom-specific rbcL primers used in biomonitoring: high specificity and high coverage for diatoms; fully adapted to the Diat.barcode reference library; enable evaluation of relative abundance of taxa (Vasselon et al. 2018) | High specificity and coverage require pooling 3 forward and 2 reverse primers in equimolar proportions |
DIV4for - DIV4rev | Diatoms | 18S V4 | 280-300 bp | ( | GCGGTAATTCCAGCTCCAATAG | CTCTGACAATGGAATACGAATA | Diatom-specific 18S primers used in biomonitoring | |
1389F - 1510R | Eukaryotes | 18S V9 | ca. 150 bp | ( | TTGTACACACCGCCC | CCTTCYGCAGGTTCACCTAC | Universal eukaryotic primers | |
SSU_FO4+SSU_R22 | Eukaryotes (marine meiofauna) | 18S | 380 bp | ( | GCTTGTCTCAAAGATTAAGCC | CCTGCTGCCTTCCTTRGA | Good taxonomic coverage for marine meiofauna (V1-V2 region) | |
SSU_FO4 - SSU_R22mod | Eukaryotes (marine meiofauna) | 18S | ca. 400 bp | ( | GCTTGWCTCAAAGATTAAGCC | CCTGCTGCCTTCCTTRGA | Eukaryotic 18S primers, commonly used for metabarcoding marine meiofauna | |
TAReuk454FWD1 - TAReukREV3 | Eukaryotes (marine plankton) | 18S V4 | ca. 400 bp | ( | CCAGCASCYGCGGTAATTCC | ACTTTCGTTCTTGATYRA | Universal eukaryotic primers commonly used for marine plankton (e.g. Tara Oceans), but not only | |
Teleo or Tele01 | Fish | 12S | ca. 80 bp | ( | ACACCGCCCGTCACTCT | CTTCCGGTACACTTACCATG | High taxonomic coverage, including Actinopterygii and elasmobranchs | Poor species resolution within the same family; needs a human blocking primer |
MiFish-U | Fish | 12S | ca. 170 bp | ( | GTCGGTAAAACTCGTGCCAGC | CATAGTGGGGTATCTAATCCCAGTTTG | Good specificity, coverage, and resolution | |
s14F1 - s15.3 | Foraminifers | 18S 37F | 120-140 bp | ( | AAGGGCACCACAAGAACGC | CCACCTATCACAYAATCATG | Foraminifer-specific 18S primers commonly used in metabarcoding | |
FwhF2 + FwhR2n | Macroinvertebrates | COI | 205 bp | ( | GGDACWGGWTGAACWGTWTAYCCHCC | GTRATWGCHCCDGCTARWACWGG | Short amplicon allows reliable amplification even of degraded material; very good taxonomic representation; short length allows inexpensive sequencing on HiSeq and other 2x150 bp platforms | Not suitable for eDNA: too many bacteria are amplified |
fwhF2 + EPTDr2n | Macroinvertebrates, freshwater (especially insects) | COI | 142 bp (191 bp) | ( | GGDACWGGWTGAACWGTWTAYCCHCC | CAAACAAATARDGGTATTCGDTY | Reduced non-target amplification with eDNA | Increased primer bias compared to bulk metabarcoding primers; misses some taxa (e.g. some crustaceans, a few trichopterans) due to the increased specificity; can be more challenging to establish in the lab |
fwhF2 + Fol-degen-rev | Macroinvertebrates, freshwater (especially insects) | COI | 313 bp (365 bp) | ( | GGDACWGGWTGAACWGTWTAYCCHCC | TANACYTCNGGRTGNCCRAARAAYCA | Good amplicon length for Illumina sequencing | Not suitable for eDNA: too many bacteria are amplified; forward primer affected by primer slippage |
BF3 + BR2 | Macroinvertebrates, freshwater (especially insects) | COI | 418 bp (458 bp) | ( | CCHGAYATRGCHTTYCCHCG | TCDGGRTGNCCRAARAAYCA | Good taxonomic resolution | Not suitable for eDNA: too many bacteria are amplified; amplicon length not ideal for all Illumina sequencers |
Unio | Molluscs (Unionida) | 16S | ca. 130 bp | ( | GCTGTTATCCCCGGGGTAR | AAGACGAAAAGACCCCGC | High taxonomic coverage | Some species cannot be discriminated |
Vene | Molluscs (Venerida) | 16S | ca. 130 bp | ( | CSCTGTTATCCCYRCGGTA | TTDTAAAAGACGAGAAGACCC | High taxonomic coverage | Some species cannot be discriminated |
12S V5 | Vertebrates | 12S | ca. 106 bp | ( | ACTGGGATTAGATACCCC | TAGAACAGGCTCCTCTAG | Very high taxonomic coverage | Lower taxonomic resolution, particularly among bird species; non-target amplification of human DNA |
To target broad taxonomic groups, degenerate primers are often used, which incorporate some level of flexibility in the priming sequence via so-called ‘mixed’ or ‘wobble’ bases. This allows for some sequence variation in the primer binding region of the target organisms so a more diverse group can be targeted. Degenerate primers will increase the number of amplified taxa, but can lead to increased amplification of non-target organisms when applied to environmental samples (e.g.
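The degeneracy of a primer can be quantified directly from its IUPAC ambiguity codes; the sketch below expands a degenerate primer into the exact sequences it encodes, using the fwhF2 primer listed above as an example:

```python
from itertools import product

# Expand a degenerate primer into the exact sequences it encodes, using the
# standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(primer):
    """All exact sequences encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in primer))]

def degeneracy(primer):
    """Number of distinct sequences the primer encodes."""
    n = 1
    for base in primer:
        n *= len(IUPAC[base])
    return n

degeneracy("GGDACWGGWTGAACWGTWTAYCCHCC")   # fwhF2 encodes 288 exact sequences
```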
The most challenging aspect of primer design often relates to intermediate taxonomic groups such as Metazoa or paraphyletic groups such as ‘macroinvertebrates’ when they are targeted in environmental samples. In this case, primers have to be sufficiently broad to capture a taxonomically diverse group of target organisms, while not being so permissive as to amplify non-target groups such as bacteria and algae, which may dominate in terms of the quantity of DNA isolated from environmental samples. High levels of non-target amplification in environmental samples are often mitigated by increasing sequencing depth: the increased read depth allows sequences from non-target groups to be discarded during bioinformatic filtering while leaving a sufficient amount of sequence data derived from the target group. While this can work, it is an inefficient solution that increases overall costs; careful primer design, aimed at increasing target specificity and minimising non-target amplification, is preferred where possible.
There is an inherent trade-off in that as the target group broadens, a wider cross-section of biodiversity is obtained, but this comes at the expense of the completeness of the data in each of the different groups (
The potential for primer bias (also referred to as amplification bias) is an important consideration in primer design and selection. Primers that target a narrow taxonomic group can often be designed to have an exact or near-exact match to all target taxa, which ensures approximately equal amplification efficiency across taxa. This means that sequence read counts will usually correlate well with the relative concentrations of eDNA captured for each species in the sample. Conversely, those targeting a very broad group will vary in amplification efficiency across taxa, which reduces the potential to make (semi-)quantitative inferences from sequence read count data (
Amplicon length is a key factor affecting primer performance for metabarcoding, and here too there is an intrinsic trade-off to be negotiated. Shorter amplicons are usually more sensitive for the amplification of degraded DNA (
Thus, the balance tips in favour of using longer amplicons for metabarcoding of bulk invertebrates and sediment samples, especially when DNA is primarily derived from organismal DNA and is not expected to be degraded. In fact, for sediment biomonitoring the use of a longer amplicon can also be helpful in preferentially targeting DNA derived from living organisms, as opposed to accumulated eDNA, which tends to persist in short fragments (
Before ordering metabarcoding primers, it is first important to decide which metabarcoding labelling strategy will be employed. This is important because the primers need to be ordered with the corresponding 5’ nucleotide additions required for building sequencing libraries. A thorough review of metabarcoding labelling strategies can be found in
The library preparation step combines three key processes:
There are three main strategies with which metabarcoding amplicon libraries can be constructed prior to sequencing: the ‘one-step PCR’ approach, the ‘two-step PCR’ approach and the ‘ligation-based approach’ (
In the one-step PCR approach, sequence adapters and (typically) nucleotide tags are incorporated directly to the synthesis of the forward and reverse primers so that amplification and library preparation is achieved in a single PCR step, see Figure
The trade-off with this approach is that the primer sequences become very long when up to 60 nucleotides (sequence adapters and indexes) are added to the primers. This makes them costly to buy, while the long overhangs can decrease efficiency, thereby reducing the detection probability of rare species and the consistency between replicates (
The one-step approach is sometimes referred to as the ‘fusion primer approach’, but we avoid this terminology because all metabarcoding approaches involve primers that are to some extent a fusion of multiple components.
In the two-step PCR approach, shorter and potentially more efficient primers are used for the initial amplification of the DNA extracts, see Figure
The second stage of the two-step PCR process consists of a short second-round PCR (typically 8-10 cycles;
The third of the three main strategies for amplicon labelling in metabarcoding studies is ‘adapter ligation’ or ‘ligation-based approach’, see Figure
In the ligation-based approach, DNA extracts are PCR amplified with primers carrying short 5’ nucleotide tags, typically just 6-10 nucleotides in length and unique to each sample. The additions to primers are thus the shortest among the three approaches, which in theory should cause the least reduction in PCR efficiency. Following PCR, the tagged PCR products are pooled, and sequencing adapters are added using a ligase enzyme that covalently links the amplified DNA fragments to the adaptors
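Whichever labelling strategy is used, reads must later be assigned back to samples bioinformatically. A minimal demultiplexing sketch, with hypothetical 6-nucleotide tags and sample names:

```python
# Minimal demultiplexing sketch: assign each read to a sample by its 5'
# nucleotide tag. Tag sequences and sample names are hypothetical.
TAGS = {"ACGTAC": "sample_01", "TGCATG": "sample_02", "GATCGA": "sample_03"}
TAG_LEN = 6

def demultiplex(reads, tags=TAGS, tag_len=TAG_LEN):
    binned = {name: [] for name in tags.values()}
    unassigned = []
    for read in reads:
        sample = tags.get(read[:tag_len])
        if sample is None:
            unassigned.append(read)                 # unknown tag
        else:
            binned[sample].append(read[tag_len:])   # strip tag for analysis
    return binned, unassigned

reads = ["ACGTACGGGG", "TGCATGAAAA", "CCCCCCTTTT"]
binned, unassigned = demultiplex(reads)
```

Production pipelines typically also tolerate a small number of mismatches in the tag, which this exact-match sketch omits.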
In all three approaches, heterogeneity spacers can be added to the primer sequence to improve sequencing performance by increasing the diversity at each base position, see Figure
Regardless of which library preparation is used, it is important to consider the number of PCR replicates to be performed per sample. Species represented in low DNA copy numbers can easily be missed in any given reaction, which introduces an element of stochasticity (
PCR replicates can either be labelled with the same nucleotide tags and/or indexes, meaning that the sequences derived from them will be pooled, or they can be individually tagged/indexed and treated as separate samples for downstream analysis. Sequencing each replicate independently allows for detection confidence to be bioinformatically assessed as a proportion of replicates in which a species occurs (
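The replicate-based confidence measure described above amounts to a simple proportion; a sketch with hypothetical species detections across three independently tagged replicates:

```python
# Detection confidence expressed as the proportion of independently tagged
# PCR replicates in which each species occurs. Species detections hypothetical.
def detection_confidence(replicates):
    """replicates: list of per-replicate sets of detected species."""
    n = len(replicates)
    species = set().union(*replicates)
    return {sp: sum(sp in rep for rep in replicates) / n for sp in species}

reps = [{"Salmo trutta", "Anguilla anguilla"},
        {"Salmo trutta"},
        {"Salmo trutta", "Anguilla anguilla"}]
confidence = detection_confidence(reps)
# Salmo trutta detected in 3/3 replicates; Anguilla anguilla in 2/3
```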
Another often overlooked consideration is the choice of polymerase, which can introduce significant amplification bias based on varying GC content preferences (
The most common reason for a sequencing run to fail or produce low-quality data is overclustering, which reduces the efficiency of base-calling and is usually linked to insufficiently purified amplicon libraries, libraries loaded at too high a concentration due to mis-quantification, or insufficient nucleotide diversity
Due to the preferential sequencing of short reads on high-throughput sequencing (HTS) platforms, it is vital to remove primer dimers and other short fragments of DNA from the amplicon libraries prior to sequencing, otherwise few or no target sequences will be obtained. Amplicon libraries are usually purified using magnetic beads (e.g. AMPure XP, or Solid Phase Reversible Immobilization (SPRI)) or gel extraction, although the latter is labour-intensive and presents a higher contamination risk, especially where primers have been used without individual tags in first-round PCR. Gel extraction may be useful in small projects where a very close non-target band is present, but in general the presence of such a non-target band is a sign that the protocol requires further optimisation before being adopted for routine use. For large scale projects, post-PCR magnetic bead clean-up can be conveniently carried out in 96 well plate format (
A high degree of precision is required for accurate normalisation of libraries to ensure an even distribution of sequencing reads across samples and to prevent overclustering. The most accurate quantification of libraries is achieved using qPCR, although Qubit, TapeStation and Bioanalyzer devices are often used. Nanodrop and other spectrophotometers are not recommended for this step (
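Where quantification results are reported as mass concentration (ng/µl), conversion to molarity uses the mean fragment length and the average mass of a double-stranded base pair (~660 g/mol); a minimal sketch:

```python
# Convert a library's mass concentration (ng/µl) and mean fragment length
# (bp) into molarity (nM), using ~660 g/mol per base pair of dsDNA.
def library_nM(ng_per_ul, mean_fragment_bp):
    return ng_per_ul * 1e6 / (660 * mean_fragment_bp)

library_nM(2.0, 600)   # a 2 ng/µl library of 600 bp fragments ≈ 5.05 nM
```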
Following quantification, libraries should be adjusted to equal concentration and pooled for sequencing. Negative controls should be indexed and included in the final pool, but since these are expected to contain no detectable DNA it will usually not be possible to add them at equal concentration. There is not yet a standard approach for pooling negatives, but one suggestion is to calculate the median volume in which the test sample libraries are added (to achieve the equimolar pool) and add the negative controls in this volume.
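The pooling logic just described, equimolar volumes for test libraries with negatives added at the median test-library volume, can be sketched as follows (library names and concentrations are hypothetical):

```python
import statistics

# Equimolar pooling sketch: take a base volume of the most dilute library and
# scale the others down proportionally; negative controls (no measurable DNA)
# are added at the median test-library volume, as suggested in the text.
def pooling_volumes(libraries, negatives, base_volume_ul=5.0):
    """libraries: {name: concentration in nM}; negatives: [names]."""
    min_conc = min(libraries.values())
    volumes = {name: base_volume_ul * min_conc / conc
               for name, conc in libraries.items()}
    median_vol = statistics.median(volumes.values())
    volumes.update({name: median_vol for name in negatives})
    return volumes

vols = pooling_volumes({"s1": 10.0, "s2": 5.0, "s3": 8.0}, ["neg_extraction"])
# s2 (most dilute) gets 5 µl; s1 gets 2.5 µl; the negative gets the median
```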
If all samples on the sequencing run have been amplified with the same primers and heterogeneity spacers have not been incorporated, there is a high risk that base-calling quality will be compromised by a lack of variation in nucleotide identity at each base position
Sequencing depth achieved for each sample will be a factor of (1) the choice of flow cell and sequencing platform and (2) the number of samples included on each run. Increasing sequencing depth will aid the recovery of rare taxa, especially in high diversity systems, but increases the per-sample cost. Most projects typically aim for a sequencing depth of between 50,000 and 200,000 reads per sample, but there are occasions when either shallower or deeper sequencing would be appropriate, depending on the sampling design, number of biological, PCR and sequencing replicates, and the importance of detecting rare species.
The Illumina MiSeq platform remains the most commonly used for metabarcoding and is well suited for most monitoring or research applications. For very large projects (i.e. hundreds or thousands of samples) or when much deeper sequencing is required, cost efficiencies may be gained through the use of higher-throughput platforms including the Illumina HiSeq, NextSeq, and NovaSeq. A comparison of costs per sample, sequence throughputs, and error rate among high-throughput sequencing platforms can be found in (
Although metabarcoding can be applied to DNA extracted from any type of sample, the sample type still influences decisions to be made at several points in the workflow (
While the above text focuses on the use of Illumina platforms for metabarcoding, Oxford Nanopore Technologies devices (particularly the MinION) have also been used in several recent metabarcoding studies to detect target species of toxic microalgae (
The PacBio Sequel platforms also offer long read sequencing capabilities and can be used for metabarcoding. Although the reported error-rate is higher than that in Illumina sequencing, this can be corrected through the use of consensus sequences (
We do not explicitly cover bioinformatics in this document, but we do emphasise the importance of using a well-designed bioinformatics pipeline (a chain of command-line tools and custom scripts) that has been optimised for the specific marker, target group and use case. Even from the same raw sequence data, choice of bioinformatics parameters can make the difference between results that are fit for purpose and those that are not.
For a given metabarcoding project, it will be important to ensure that all samples are processed with the exact same bioinformatics pipeline, and it is particularly important to consider the need to link together datasets generated from different sequencing runs. This is especially relevant for taxonomic groups and markers with incomplete reference databases, meaning taxa cannot be linked based on species names and may influence the choice between use of OTUs (operational taxonomic units) and ESVs (exact sequence variants;
Choice of taxonomic assignment method and taxon acceptance thresholds (i.e., the number or proportion of sequence reads required for an OTU/ESV to be retained in the final dataset) can make a material difference to results obtained. Optimal parameter choices will depend on the characteristics of the marker used, the completeness of the reference database, and the purpose for which the data is to be used. For instance, if the aim is to assess overall ecological patterns then more aggressive filtering may be chosen to reduce noise and there is a relatively low cost to inaccurate taxonomic identification. However, if the aim is to detect invasive or endangered species, even very weak detections should be retained and species need to be identified with a high degree of accuracy.
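A proportional read-count threshold of the kind described can be sketched as follows; the OTU identifiers and counts are hypothetical, and the 0.1% default is an illustrative choice, not a recommendation:

```python
# Taxon acceptance threshold sketch: drop OTUs whose read count falls below
# a minimum proportion of the sample's total reads. The 0.1% default is an
# illustrative choice, not a recommendation; counts are hypothetical.
def filter_otus(read_counts, min_proportion=0.001):
    """read_counts: {otu_id: reads in one sample}."""
    total = sum(read_counts.values())
    return {otu: n for otu, n in read_counts.items()
            if n / total >= min_proportion}

sample = {"OTU_1": 52000, "OTU_2": 1800, "OTU_3": 40, "OTU_4": 3}
filter_otus(sample)   # retains OTU_1 and OTU_2; the weak detections are lost
```

As the surrounding text notes, a permissive threshold (or none at all) would instead retain the weak detections, at the cost of more noise.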
There is considerable interest in the extent to which metabarcoding data can be used to determine intraspecific genetic diversity by differentiating between haplotypes. While the methods mentioned above go some way to minimise the impact of PCR and sequencing errors on metabarcoding data, precise choice of parameters can have a significant influence on effectiveness, and there is usually a tradeoff between minimising these errors and retaining true diversity, especially for low-abundance taxa. Therefore, extreme caution should be applied in interpreting sequence variants that match to the same species as evidence of intraspecific variation, particularly when using a metabarcoding marker designed for species-level identification, such as those listed in Table
Ecological indices, which are typically referred to as diversity indices, are quantitative measures that serve as statistical descriptors of biodiversity. In general, ecological indices either use presence/absence data to calculate richness, the number of unique biological units (e.g. species, functional group, etc.), or use proportional differences in abundances among biological groups to calculate diversity. There are many different abundance based ecological indices, due to the different levels of emphasis placed on rare species, but the central calculation framework is rooted in what are referred to as Hill numbers (
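The Hill-number framework can be sketched directly: a single formula parameterised by the order q, where q = 0 gives species richness, q → 1 the exponential of Shannon entropy, and q = 2 the inverse Simpson concentration. The abundances below are hypothetical:

```python
import math

# Hill numbers of order q: q = 0 gives species richness, q -> 1 the
# exponential of Shannon entropy, q = 2 the inverse Simpson concentration.
def hill_number(abundances, q):
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if q == 1:   # limit case, handled separately
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))

counts = [50, 30, 15, 5]             # hypothetical species abundances
richness = hill_number(counts, 0)    # 4.0
shannon_diversity = hill_number(counts, 1)
simpson_diversity = hill_number(counts, 2)
```

Higher orders of q place progressively more weight on abundant species, which is why the three values decrease for an uneven community like this one.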
Traditionally, ecological indices have been calculated from data that catalogue the occurrence of captured individuals, such that the counts, abundances or frequencies of each unique biological unit (species) are directly linked to the individuals observed. In contrast, metabarcoding-derived index scores are not directly linked to the abundance of each unique biological unit, although they sometimes take sequence read counts into account. Read counts do not always closely correlate with abundance (
Inferring community ecological indices from metabarcoding data is relatively straightforward, at least from an analysis standpoint. Taxon richness can be easily calculated from metabarcoding data (
Key takeaways:
Priorities for future research:
This practical guide set out to summarize the current state of the art for the field and laboratory workflows used in DNA-based methods of species detection and monitoring. While the process is complex and each step involves many decisions, we hope to promote greater understanding of the inherent considerations, trade-offs, and uncertainties so that good choices can be made. This section distills the main advice and steps for moving forward, drawn from our collective knowledge presented in the guide. We conclude with the main factors that influence methodological choices and with suggestions for how to report results and share raw data in a consistent way, and we end with an open invitation to the community to help us keep this guide as up to date as possible. The relevance of these methods for applied biomonitoring has already driven a huge increase in both the money and the time invested in solving the outstanding challenges, so we expect that many of these will be addressed within the coming months and years.
Regardless of sample type and analysis method, DNA-based monitoring of biodiversity is a complex process made up of many interdependent steps, each of which requires optimisation and incorporates choices and trade-offs. It is important to emphasize that given sufficient optimisation, highly reliable and replicable results can be obtained, and this is often robust to different choices being made at certain steps.
Choices made at each step will be influenced by multiple factors, including:
In applied DNA-based biomonitoring, it is vital to provide consistent documentation of methodological choices made and the results of quality control tests conducted throughout the workflow. This helps to provide confidence in the results obtained or to flag results that may be less reliable, and enables assessment of the comparability of results obtained using different workflows.
The exact data to report from any given workflow covered in this guide will vary depending on sample and analysis type. Minimum reporting standards will also vary depending on the goal of the work, which may range from exploratory surveys to publication in scientific journals and increasingly the provision of evidence meant to stand up in a court of law.
Stage | Cost saving option | Sample / analysis type | Considerations | Impact on budget |
---|---|---|---|---|
Field | Filter a larger volume of water per filter | Aquatic eDNA | May not be a viable option in systems with high suspended solids. Multiple samples still needed to account for uneven distribution of eDNA, especially in lentic habitats | Small in most cases, although may have a larger impact in certain environments (e.g. marine). Large volume samplers are likely to be more expensive so cost saving is only likely to be significant at large scales |
Field | Pool subsamples so that each sample is representative of a larger area | All | Lose spatial resolution & decrease probability of detecting rare species. Reduces options to analyse frequency / occupancy or conduct power analysis | Medium to high. A common way to maximise the number of different environments that can be sampled for a given budget |
Lab | Extract samples individually but then pool aliquots of DNA extracts for initial analysis. Store remaining DNA extracts individually. | All | Gives you the option to go back to analyse the individual samples for additional insights but start at a coarser level | Medium to high. Still incurs the costs of sample collection and DNA extraction for each sample, but initial analysis cost could be substantially reduced |
Lab | Reduce number of PCR replicates | All, but especially aquatic eDNA, for which more PCR replicates are recommended | Increases stochasticity and reduces detection probability | Small. This is usually a false economy since the reduced detection probability usually has to be compensated for by collecting and analysing more replicate samples, which is more expensive overall |
Lab | Use a higher-throughput sequencing platform such as Illumina HiSeq or NovaSeq | Metabarcoding | Requires a large number of indexes for sample multiplexing | Small or negative unless very large numbers of samples give economies of scale. Can slow down the analysis, since enough samples must be accumulated to fill a run |
Lab | Reduce sequencing depth | Metabarcoding | Reduces detection probability of rare species | Usually small. Sequencing typically only represents a small proportion of the overall cost per sample, and this can be a false economy similar to reducing the number of PCR replicates, requiring more biological sample replicates |
Lab | Reducing quality control testing | All | Lowers confidence in output and reduces the opportunity to identify specific steps that could have compromised the quality of results | Small to medium. Not recommended to eliminate the QC steps marked as mandatory in Table |
For the laboratory processes covered in this guide, our starting point should be to draw from standards that have already been established in other industries that use the same types of analyses. These include MIMARKS (minimum information about a marker gene sequence) and MIxS (minimum information about any “x” sequence) specifications (
Several efforts have been made to adapt these broader molecular standards for eDNA applications, including the establishment of minimum information criteria for eDNA analysis from water samples (
For field sampling steps, however, there is little in the way of standardised reporting requirements beyond best practice for reporting in scientific publications.
In Text Box 4 we provide an example of key information that should be recorded for all aquatic eDNA samples at the field collection stage and during analysis with a metabarcoding pipeline.
Note that in commercial settings, some methodological details may be commercially sensitive. These do not necessarily need to be included in reports but providers should ensure that all details are documented and securely stored internally in case they should be required for validation or verification purposes.
In the field:
In the lab:
Bioinformatics:
This practical guide to DNA-based biomonitoring was a collaboration of many people across the spectrum from basic research to applied contexts, and represents the state of advice and knowledge at the time of writing. We acknowledge that by the time you read this resource, some of the hurdles and challenges we highlight may have been overcome, and new evidence may have emerged that contradicts the guidance provided here. We intend to update this practical guide periodically, but recommend that readers keep in mind that rapid advances will continue to be made; it is always worth discussing current trends and reading the newest studies. Please consider the authors of this guide as a resource and do not hesitate to reach out and discuss with us!
Thus, we view the practical advice collected here as representing the best of our collective knowledge at the time of publication. The chosen electronic book format will allow us to update the guide to new versions with ease. We welcome further input from the broader scientific and applied communities of practice in this area to help us keep this document as up to date as possible, and to contribute to future editions so that it remains relevant and useful.
Amplicon: A section of DNA that has been amplified through a reaction such as PCR. It can also be termed PCR product.
Amplification: The process of creating copies of a particular region of DNA (the amplicon), usually through a PCR reaction using primers and enzymes such as polymerases. Non-target amplification refers to the unintended amplification of DNA from taxa that the primers were not designed to amplify (e.g. amplification of bacteria by primers designed to target metazoans).
Buffer: Liquid solutions used to maintain a stable pH, as they can neutralize small quantities of additional acid or base. For examples of buffers commonly used for preservation of DNA see Table
Barcoding: Taxonomic identification of a species based on the DNA sequencing of a short gene region that shows variation at the species level. This is known as a barcode region (also referred to as a marker region). Sequences obtained are compared against a reference database (e.g. BOLD www.boldsystems.org) to assign taxonomy.
Bioinformatics: Computational processing of sequence data. A core element of DNA metabarcoding pipelines, in which high-throughput sequencing data are quality filtered, summarised and compared against reference databases for taxonomic assignment, yielding a taxon-by-sample table that can be subjected to ecological analysis. A bioinformatics pipeline describes a script linking together a chain of software programmes that perform the various steps of data handling.
Bulk sample: A mixed community sample of organisms or their tissues such as would be collected in a net or trap or extracted from an environmental sample (e.g. by sieving soil or sediment samples).
CEN standard: CEN (Comité Européen de Normalisation) provides a platform for the development of European Standards and Technical Reports.
Clustering: A step in the bioinformatics process in which highly similar sequence reads are grouped together to form a cluster of highly similar reads that putatively originate from the same species. Sequence variants within a cluster include both real intraspecific genetic variation and sequences that contain errors introduced during PCR and sequencing. See also the definition for OTU.
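The basic logic of similarity clustering can be sketched in a few lines of Python. This is a toy greedy-centroid approach using a rough `difflib` similarity measure; production tools use optimised alignment-based identity, and the 0.97 threshold is shown purely for illustration:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    # Rough pairwise similarity (0-1); real pipelines compute
    # identity from proper sequence alignments instead.
    return SequenceMatcher(None, a, b).ratio()

def greedy_cluster(reads, threshold=0.97):
    # Each read joins the first cluster whose centroid it matches
    # at or above the threshold; otherwise it founds a new cluster.
    clusters = []  # list of (centroid, members) pairs
    for read in sorted(reads, key=len, reverse=True):
        for centroid, members in clusters:
            if similarity(read, centroid) >= threshold:
                members.append(read)
                break
        else:
            clusters.append((read, [read]))
    return clusters

reads = ["ACGT" * 10,           # founds cluster 1 (centroid)
         "ACGT" * 9 + "ACGA",   # one mismatch: joins cluster 1
         "TTGG" * 10]           # dissimilar: founds cluster 2
```

Here the second read differs from the first by a single base (97.5% similar), so both fall into one cluster, mimicking how intraspecific variants and PCR/sequencing errors are grouped with the sequence they originated from.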
ddPCR: Droplet Digital PCR. A platform that can be used for targeted detection of species (an alternative to qPCR). A highly sensitive, probe-based fluorescent approach in which each reaction is partitioned into ~20,000 individual droplets, enabling absolute quantification of target DNA copy number without the need for standard curves. See section 7.1.3.
eDNA: Environmental DNA. DNA isolated from an environmental sample such as water or sediment. May include both organismal DNA derived from whole organisms in the sample and extra-organismal DNA, which is captured separately from the organism it originated from. Extra-organismal DNA may be in the form of cells, organelles, or free-floating DNA, originating from sources such as shed skin, scales, blood, mucus, faeces, urine, saliva and gametes.
DNA extraction: Isolation of DNA from a sample, using chemical methods. The DNA extraction usually incorporates steps to remove impurities from the DNA.
Environmental sample: A sample of an environmental medium, such as seawater, freshwater, soil or air.
ESV/ASV: Exact Sequence Variant or Amplicon Sequence Variant (broadly synonymous). These are generated in the bioinformatics pipeline and represent individual high-quality sequences in metabarcoding datasets. They can be used for taxon delimitation as an alternative to clustering into OTUs, with sequences that contain errors filtered out using denoising algorithms.
Filter: Membrane filter for the capture of eDNA constructed out of a wide range of synthetic materials, with specific pore sizes. See section 2.3 for further details.
High-throughput sequencing (HTS): DNA sequencing technology that produces millions of DNA sequence reads in parallel. Enables thousands of different organisms from a mixture of species to be sequenced at once, to obtain community data from a single analysis (i.e. metabarcoding). Various different platforms exist, but the most commonly used is Illumina’s MiSeq. Also known as Next-Generation Sequencing (NGS) or parallel sequencing. In contrast, the classic Sanger sequencing method produces one sequence at a time and is not suitable for mixed-species samples.
Indexing: Also known as sample multiplexing. Allows multiple samples to be pooled on one high-throughput sequencing run, by adding a short sequence of nucleotide base pairs to each sample during library preparation. This sequence is different for each sample on the run and enables sequences to be assigned back to the sample they came from after sequencing (known as demultiplexing; see Figure
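The demultiplexing step that follows indexing can be sketched in Python. The index sequences and sample names below are invented for illustration, and real pipelines also handle sequencing errors within the index and index-hopping between samples:

```python
# Hypothetical 6-base index sequences; in practice these come from
# the sample sheet for the index kit used on the run.
index_to_sample = {"ACGTAC": "sample_A", "TGCATG": "sample_B"}

def demultiplex(reads, index_len=6):
    # Assign each read to a sample by its leading index sequence;
    # reads with an unrecognised index go to an 'undetermined' bin.
    bins = {name: [] for name in index_to_sample.values()}
    bins["undetermined"] = []
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        bins[index_to_sample.get(index, "undetermined")].append(insert)
    return bins

reads = ["ACGTACGGGGTTTT", "TGCATGCCCCAAAA", "NNNNNNGGGGCCCC"]
bins = demultiplex(reads)
```

After this step each sample's reads can be processed independently, exactly as if they had been sequenced on separate runs.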
Inhibition: Certain chemical compounds can reduce the efficiency of PCR amplification, or in some cases cause it to fail completely. This can lead to false negative results (i.e. non-detection when a species’ DNA is in fact present in the sample). Inhibitors may be present in the original sample (e.g. in the form of tannins or humic acids) or may be added during sample processing or DNA extraction (e.g. SDS, ethanol). Internal positive controls can be used to check for the presence of inhibition (see also section 6.2 on inhibition testing), and inhibitors can usually be removed through purification kits or dilution of the DNA.
ISO standard: The International Organization for Standardization is an international standard-setting body that promotes worldwide proprietary, industrial, and commercial standards.
Library: A pool of DNA fragments prepared for sequencing on a high-throughput sequencing platform; library preparation refers to the molecular biology protocol that produces it. In the case of Illumina metabarcoding, this includes PCR amplification of the target DNA region, labelling of samples with unique nucleotide tags so that they can be multiplexed (pooled together for sequencing and bioinformatically separated after sequencing), and the addition of sequencing adaptors so that the DNA can bind to the Illumina flow cell. Metabarcoding library preparation can follow various different approaches (see section 8.2.1 on amplicon library preparation).
Metabarcoding: Taxonomic identification of multiple species simultaneously from a complex (multi-species) sample, using high-throughput amplicon sequencing of a standardized gene fragment (e.g. COI). See section 8.
Metagenomics: The study of genomes recovered from a mixed community of organisms or from environmental samples. Metagenomics usually refers to the study of microbial communities but has also been applied to invertebrate faunal collections.
Mock community: A species community of known composition, usually assembled for use as a positive control. See also section 6.3 on analytical controls used for DNA and eDNA analyses.
Negative control: A negative control is used to check for potential contamination. A negative site control refers to a sample collected from a field site where the target taxon is known to be absent. A negative filtration control is a sample where DNA-free water is filtered alongside the eDNA samples to check that DNA is not transferred between samples. Negative laboratory controls consist of DNA-free samples processed alongside the test samples at each stage of the process to check for (cross-)contamination. In the context of DNA extraction, a negative control should not contain a DNA template and in the context of PCR, a negative control should not give amplicons. See also section 6.3 on analytical controls used for DNA and eDNA analyses.
OTU: Operational Taxonomic Units (OTUs) are proxies for species, obtained by applying clustering algorithms to metabarcoding sequence data. Reads are clustered using a sequence similarity threshold (most often 97%). OTUs are not easily comparable across studies as they depend on the dataset in which they were created.
PCR: Stands for polymerase chain reaction. PCR is a method that uses thermal cycling (cyclical variations in temperature) in the presence of a polymerase enzyme to rapidly create millions of copies of a predefined DNA fragment. Primers are designed to bind to the DNA of the target group at either end of the chosen DNA fragment and the polymerase creates a copy of the DNA sequence between them. PCR (also termed DNA amplification) is a prerequisite for most forms of DNA sequencing and can be used as a diagnostic tool in itself to detect the presence of particular species when species-specific primers are used (e.g. using qPCR or ddPCR). During thermal cycling, a series of repeated temperature changes are performed, which variously cause (1) DNA denaturation, in which double-stranded DNA separates into single strands, (2) primer annealing, where the primers bind onto single-stranded DNA, and (3) elongation, where the polymerase synthesises DNA starting from the forward and reverse primers. This series of temperatures is repeated a predetermined number of times (termed PCR cycles), with the amount of target DNA doubling with each cycle, leading to exponential amplification.
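The exponential arithmetic behind PCR cycling can be written out in a short Python sketch. This is an idealised model only; real reactions plateau as reagents are consumed and rarely sustain perfect doubling:

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    # Idealised PCR yield: copies multiply by (1 + efficiency)
    # each cycle; efficiency = 1.0 means perfect doubling.
    return initial_copies * (1 + efficiency) ** cycles

# Ten starting copies and 30 perfectly efficient cycles yield
# 10 * 2**30, on the order of ten billion amplicons.
ideal = copies_after_cycles(10, 30)
```

The `efficiency` parameter is a common way of expressing sub-optimal amplification (e.g. due to primer mismatches or inhibition), where each cycle multiplies the template by less than two.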
Positive control: A positive control is a sample that is expected to produce a known positive result, and is analysed alongside test samples to check that the analytical process is working as it should (e.g. a mock community can be used as a positive control during metabarcoding). Positive laboratory controls consist of a known concentration of pre-prepared DNA of one or more species that are expected to be amplified efficiently using the selected PCR protocol. See also section 6.3 on analytical controls used for DNA and eDNA analyses.
Primer or oligonucleotide: Short, single-stranded nucleic acid molecule (typically around 20 nucleotides long) consisting of a sequence of DNA bases designed to match the target DNA at a particular point in the genome. PCR usually requires a pair of primers (or primer set), one matching the target DNA at either end of the barcode region to be amplified. Primer mismatches occur when the primer sequence does not exactly match the target sequence, and this can reduce PCR efficiency or cause false negative results. Degenerate primers consist of a mixture of primer sequences that incorporate some variation at certain base positions so that the primers can bind to more variable target DNA (e.g. a broader taxonomic group) with minimal mismatches.
Probe: Usually refers to hydrolysis probes used in qPCR and ddPCR. Probes are DNA oligonucleotides designed to bind to the target DNA in a location between the PCR primers. The probe contains a fluorescent label that is quenched until the probe is hydrolysed during amplification, releasing the fluorophore. The fluorescence emitted is detected by the instrument and used as a measure of target DNA amplification. Probe-based qPCR and ddPCR require both primers and the probe to bind to the target DNA in order for amplification to be detected, and this increases the specificity of assays.
qPCR: Quantitative polymerase chain reaction is a PCR reaction that quantifies DNA by means of a fluorescent dye that is measured by a fluorometer in real time throughout the amplification process. Information about relative and absolute amounts of DNA present can be inferred with the use of appropriate standard curves. See section 7.1.2.
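The standard-curve quantification described above can be illustrated with a minimal least-squares sketch in Python. The function names and example values are hypothetical; qPCR instruments and their software perform this fit internally:

```python
import math

def fit_standard_curve(points):
    # Least-squares fit of Cq = slope * log10(quantity) + intercept
    # from (known_quantity, measured_cq) standard points.
    xs = [math.log10(q) for q, _ in points]
    ys = [cq for _, cq in points]
    n = len(points)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def quantify(cq, slope, intercept):
    # Invert the fitted curve to estimate the starting quantity
    # of target DNA from a sample's measured Cq value.
    return 10 ** ((cq - intercept) / slope)
```

A slope near -3.32 corresponds to perfect doubling per cycle, which is why the slope of the standard curve is routinely reported as a check on amplification efficiency.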
Reference database: A library of DNA sequences derived from specimens of known identity. Sequence data obtained from test samples (e.g. via metabarcoding) can be matched against a reference database to assign taxonomic names to the sequences. The Barcode of Life Database (BOLD) is specifically developed for DNA barcoding and is highly curated but contains a limited selection of barcode genes. The NCBI database (also known as GenBank) is far more extensive but is not curated and contains a high level of error that must be accounted for in taxonomic assignment pipelines. Custom reference databases can also be made for particular projects to ensure that important species can be confidently identified.
Replicates: Repeat or duplicate samples / analyses used to test repeatability and measure variation, and to improve detection probability by overcoming stochasticity. Sample replicates (sometimes called biological replicates) refer to samples collected at the same time and location. Technical replicates are repetitions of the same analysis on the same sample - this can include extraction replicates where the sample is subdivided and multiple separate DNA extractions carried out, and PCR replicates, where the same PCR reaction is applied to multiple subsamples of a single DNA extract.
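Assuming independent replicates that each have the same per-replicate detection probability, the gain from replication follows a simple formula, sketched here in Python (an illustrative model; real replicates are rarely fully independent):

```python
def detection_probability(p_single: float, n_replicates: int) -> float:
    # Probability of at least one detection across n independent
    # replicates, each with per-replicate detection probability p.
    return 1 - (1 - p_single) ** n_replicates

# With a 60% chance of detecting a rare species in any one PCR
# replicate, three replicates lift the overall probability to 93.6%.
overall = detection_probability(0.6, 3)
```

This is the reasoning behind the advice elsewhere in this guide that cutting replicates to save cost can be a false economy: detection probability falls faster than the per-sample cost.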
Sanger sequencing: A method of DNA sequencing developed by Frederick Sanger in 1977, also termed the Sanger ‘chain termination’ method. The method is based on the incorporation of radioactively or fluorescently labelled chain-terminating nucleotides. It produces a single DNA sequence for each reaction, unlike high-throughput sequencing, which can produce millions. In the field of DNA-based bioassessment, it is most commonly used for DNA barcoding to identify single specimens and for the creation of reference barcodes from specimens of known identity.
Sequence read: a sequence of nucleotide bases (A, G, T, C) representing a DNA fragment. An Illumina MiSeq run generates around 30 million reads, each originating from an individual DNA fragment that was bound onto the surface of a flow cell. Many copies of the same DNA fragment can originate from the same species (even from the same organism), meaning that metabarcoding datasets typically contain many identical sequence reads. The number of sequence reads obtained for a given species in a sample is known as the read count. Although read counts often correlate with the relative quantity of species’ DNA in a sample, quantitative interpretations must be made with caution due to technical and biological biases.
Sequencing: The process of determining the nucleotide sequence of a given DNA fragment, which enables species identification. See also definitions for Sanger sequencing and high-throughput sequencing.
Sensitivity: In diagnostics, sensitivity or true positive rate is a measure of the proportion of positives that are correctly identified. Essentially this refers to the ability of an assay to detect target DNA when it is present at very low concentrations.
Specificity: In diagnostics, specificity or true negative rate measures the proportion of negatives that are correctly identified. In primer design, specificity refers to the extent to which the primers (and probes where relevant) bind only to the target DNA without any non-target amplification. Specificity can be affected by the length and GC content of the primers, and by the annealing temperature used in PCR. However, the most important factor is careful primer/probe design to achieve exact complementarity to the target and multiple primer mismatches to related taxa that may co-occur.
Validation: A comprehensive set of experiments that evaluate the performance of an assay, including its sensitivity, specificity, accuracy, detection limit, range and limits of quantitation. In terms of eDNA, this also extends to field testing to check that expected results are returned under known conditions in the field.
WFD: EU Water Framework Directive. Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for community action in the field of water policy.
Figure 1: Practical Guide overview.
R.C. Blackman, Illustrator version 25.2.3.
Figure 2: Water eDNA sample design.
R.C. Blackman, Illustrator version 25.2.3.
Figure 3: Water eDNA filter types.
a: R.C. Blackman, Eawag
b: K. Panksep, Estonian University of Life Sciences
c: T. Macher, University of Duisburg-Essen
d: R.C. Blackman, Eawag
e: K. Bruce, NatureMetrics
f: K. Panksep, Estonian University of Life Sciences
Figure 4: Water eDNA collection and filtration methods.
a: M. Hellström, MIX Research
b: M. Seymour, The University of Hong Kong
c: A. Sheard
d: Smith-Root Inc
e: T. Macher, University of Duisburg-Essen
f: K. Panksep, Estonian University of Life Sciences
g: A. Valentini, SPYGEN
h: R. Schuetz, University of Duisburg-Essen
Figure 5: DNA homogenisation methods.
a: A. Lindner, Zoological Research Museum Alexander Koenig
b: V. Elbrecht, ETH
c: A. Lindner, Zoological Research Museum Alexander Koenig
d: A. Lindner, Zoological Research Museum Alexander Koenig
e: R. Donnelly, University of Hull
f: F. Leese, University of Duisburg-Essen
g: F. Leese, University of Duisburg-Essen
Figure 6: Biofilm sampling.
a: V. Vasselon, Scimabio Interface
b: A. Bouchez, INRAE
c: V. Vasselon, Scimabio Interface
d: S. Lacroix, SYNAQUA project
Figure 7: Metabarcoding amplicon construct.
R.C. Blackman, Illustrator version 25.2.3.