Molecular identification of plants: from sequence to species
Hugo de Boer, Marcella Orwick Rydmark, Brecht Verstraete§, Barbara Gravendeel|
‡ Natural History Museum, Oslo, Norway
§ Meise Botanic Garden, Meise, Belgium
| Naturalis Biodiversity Center, Leiden, Netherlands
Open Access


Names are the carriers of knowledge. Without names, much of science would be meaningless. Names give us insight into the diseases that affect our health; the objects that sustain our economies; the celestial bodies that travel in the Universe. Names solve ambiguity.

In botany, the name of a plant may provide the first clues as to its characteristics, also called traits. Is it edible, or poisonous? Beautiful, or ugly? While some traits are relative (edible by whom, ugly to whom?), others are absolute: thorny, succulent, epiphytic. Some are obvious, others elusive. From morphological descriptions and DNA sequences to historical accounts and traditional uses, they are all linked by the name.

Until recently, the reliable identification of plants was the task of a select few: the taxonomists. Today, this is less so. The molecular identification of plants through DNA barcodes has been shown to perform just as well as, and in fact often better than, taxonomists for many taxa, particularly when specimens lack reproductive structures. Other techniques, such as image recognition through machine learning and the spectrophotometric signature of leaves, can yield similar results. Does this mean the demise of taxonomists is on the horizon?

Not at all. I believe it is very much the opposite: in the current environmental crisis, the need to document and protect the world’s biodiversity has never been more acute. At the same time, some 20% of all plant species have not yet been scientifically described, and many of them may disappear even before we have identified and characterized them. The work of taxonomists remains therefore critical, but as molecular identification of species is underway and set to become routine across the private and public sectors, expert time can now be reallocated from bulk identifications to the training of students, build-up of physical and digital reference collections, and further development of identification methods. Technologies are here to help – not replace – taxonomy, by complementing the human strengths and compensating for some of our human weaknesses: an insufficient memory, a biased brain, and lack of time.

This book is for you who are curious about how plants can be identified using DNA: the most powerful source of information to link a plant to a name. This may sound trivial, but it is not. But don’t despair in advance: it is doable, mostly fun, and always rewarding. You just need to learn how.

Here, you will not only learn how various types of materials containing plant fragments can be identified to species in the lab and how to execute sophisticated computer analyses, but also gain a deeper understanding of the complexities and challenges faced by taxonomy in general, and plant identification in particular, including the lack of comprehensive reference databases. Enforcing strict species concepts onto nature’s inherent fluidity doesn’t always work, and despite all recent advances in this field it still happens that some plant samples cannot be confidently named. Yet, if this ever happens to you, this initially frustrating insight can also be scientifically revealing, and help you design further experiments.

The applications of molecular identification are far more numerous and trans-disciplinary than most people would imagine. Several chapters take a deep dive into applications in fields as seemingly disparate as palaeobotany and healthcare, but as I argued at the start of this text, they are all unified by a common denominator: the name, the information-carrier.

I hope you will find this book as inspiring, informative, and revelatory as I have, and that you will choose to carry out your own projects using the molecular identification of plants. And if you do so, just don’t forget to cite the chapters that inspired you!


An estimated 340,000–390,000 vascular plant species are known to science (Lughadha et al. 2016; Govaerts et al. 2021), and on average an additional 2,000 species are described each year (IPNI 2020). Many of these plant species are poorly known in terms of ecology, distribution, threats, and potential benefits. Fewer than 10% have been assessed for the IUCN Red List, with a strong bias towards trees and species that are considered to be threatened (Bachman et al. 2019). A study assessing a sample of a thousand species representing global plant diversity uncovered that more than one in five were threatened with extinction (Brummitt et al. 2015). Plant extinctions are shown to occur up to 500 times faster today than in pre-industrial times (Humphreys et al. 2019). We are currently in a situation where, for a large number of plant species, we are unaware that they are at risk of extinction because we know them so poorly. Although new species are continuously being described, others are going extinct at the same time. Unfortunately, many more species are going extinct without us knowing about it or even having discovered them.

Organismal diversity is the foundation of all biological research, but species discovery and delimitation require taxonomic skills. Even the most experienced taxonomists can rarely critically identify more than 0.01% of the estimated 10–15 million species, i.e., roughly 1,000–1,500 species (Hammond 1992; Hawksworth and Kalin-Arroyo 1995). The Convention on Biological Diversity (CBD) recognised this challenge at its 1992 Rio Earth Summit, and established the Global Taxonomy Initiative (GTI) a few years later at its 5th Conference of Parties (CBD COP5 1996). The GTI was created to reduce the taxonomic impediment and aims to advance taxonomy and address the lack of information and expertise. The taxonomic impediment consists of the knowledge gaps in our taxonomic system (including those associated with genetic systems), the shortage of trained taxonomists and curators, and the impact these deficiencies have on our ability to conserve, use, and share the benefits of our biological diversity. Achieving the Aichi Biodiversity Targets and the Sustainable Development Goals and contributing to the post-2020 Global Biodiversity Framework requires an acceleration of taxonomy beyond traditional morphology-based methods and further integration of DNA-based approaches.

The global scientific community lacks the expertise and continuity to identify all species diversity, and biodiversity is lost at a greater speed than we can discover and describe new taxa (Antonelli et al. 2020; Butchart et al. 2010; Dirzo and Raven 2003; Hooper et al. 2012). Species description is a rigorous and time-consuming process that can be made more effective through open data sharing and integrative taxonomy (Riedel et al. 2013). Morphological species identification has four significant limitations as outlined by Hebert et al. (2003): (1) phenotypic plasticity and genetic variability in the characters employed for species recognition can lead to incorrect identifications; (2) morphologically cryptic taxa, which are common in many groups, can be overlooked (Burns et al. 2008; Jarman and Elliott 2000; Knowlton 1993; Ragupathy et al. 2009); (3) morphological keys are often effective only for a particular life stage or gender, and many individuals cannot be identified; (4) modern interactive keys represent a major advance, but the use of keys often demands such a high level of expertise that misdiagnoses are common.

DNA-based species identification, i.e., molecular identification, makes it possible to identify species precisely from trace fragments such as pollen (Bell et al. 2019; Hawkins et al. 2015), detect substitution in herbal pharmaceuticals (Raclariu et al. 2017, 2018), authenticate sustainable tropical timber (Nithaniyal et al. 2014), monitor invasive alien species (Armstrong and Ball 2005), uncover illegal international trade in endangered species (de Boer et al. 2017; Ghorbani et al. 2017), make rapid molecular biodiversity assessments (Bohmann et al. 2014; Thomsen and Willerslev 2015), and study historical biodiversity through sedimentary DNA and ancient DNA (Anderson-Carpenter et al. 2011; Bálint et al. 2018).

These innovations in molecular identification enable us to detect and identify species in places and settings that were unimaginable only a few decades ago, or even in 2020 (Lynggaard et al. 2022). Molecular biodiversity assessments in fungi and insects especially have led to increasing numbers of “dark taxa”, i.e., taxa detected from DNA sequences alone but lacking a physical reference and identity for morphological description (Chimeno et al. 2022; Hausmann et al. 2020; Ryberg and Nilsson 2018). Dark taxa pose a challenge for taxonomy (Page 2016), but also reveal how molecular biodiversity assessments can overtake and accelerate beyond traditional taxonomy. This acceleration of species detection and discovery is crucial to overcome our global taxonomic impediment and help systematics make a bigger contribution to the CBD post-2020 Global Biodiversity Framework. The actual description and naming of newly discovered taxa depends on traditional taxonomy, and should be done by combining morphology and DNA. However, when tropical rainforests – key ecosystems harboring megadiversity and unknown species – are lost at rates of millions of hectares annually (Brondizio et al. 2019), it is more important to rapidly assess the most crucial biodiversity to conserve than to put a name to each taxon. The current revolution in molecular identification will empower us to play a key role in identifying that biodiversity.

References


  • Anderson-Carpenter LL, McLachlan JS, Jackson ST, Kuch M, Lumibao CY, Poinar HN (2011) Ancient DNA from lake sediments: bridging the gap between paleoecology and genetics. BMC Evol. Biol. 11, 30. doi: 10.1186/1471-2148-11-30
  • Antonelli A, Fry C, Smith RJ, Simmonds MSJ, Kersey PJ, Pritchard HW, Abbo MS, Acedo C, Adams J, Ainsworth AM, Allkin B, Annecke W, Bachman SP, Bacon K, Bárrios S, Barstow C, Battison A, Bell E, Bensusan K, Bidartondo MI et al. (2020) State of the World’s Plants and Fungi 2020. Royal Botanic Gardens, Kew. doi: 10.34885/172
  • Armstrong KF, Ball SL (2005) DNA barcodes for biosecurity: invasive species identification. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 1813–1823. doi: 10.1098/rstb.2005.1713
  • Bachman SP, Field R, Reader T, Raimondo D, Donaldson J, Schatz GE, Lughadha EN (2019) Progress, challenges and opportunities for Red Listing. Biol. Conserv. 234, 45–55. doi: 10.1016/j.biocon.2019.03.002
  • Bálint M, Pfenninger M, Grossart H-P, Taberlet P, Vellend M, Leibold MA, Englund G, Bowler D (2018) Environmental DNA time series in ecology. Trends Ecol. Evol. 33, 945–957. doi: 10.1016/j.tree.2018.09.003
  • Bell KL, Burgess KS, Botsch JC, Dobbs EK, Read TD, Brosi BJ (2019) Quantitative and qualitative assessment of pollen DNA metabarcoding using constructed species mixtures. Mol. Ecol. 28, 431–455. doi: 10.1111/mec.14840
  • Bohmann K, Evans A, Gilbert MTP, Carvalho GR, Creer S, Knapp M, Yu DW, de Bruyn M (2014) Environmental DNA for wildlife biology and biodiversity monitoring. Trends Ecol. Evol. 29, 358–367. doi: 10.1016/j.tree.2014.04.003
  • Brummitt NA, Bachman SP, Griffiths-Lee J, Lutz M, Moat JF, Farjon A, Donaldson JS, Hilton-Taylor C, Meagher TR, Albuquerque S, Aletrari E, Andrews AK, Atchison G, Baloch E, Barlozzini B, Brunazzi A, Carretero J, Celesti M, Chadburn H, Cianfoni E, Nic Lughadha EM (2015) Green plants in the red: A baseline global assessment for the IUCN sampled red list index for plants. PLoS ONE 10, e0135152. doi: 10.1371/journal.pone.0135152
  • Burns JM, Janzen DH, Hajibabaei M, Hallwachs W, Hebert PDN (2008) DNA barcodes and cryptic species of skipper butterflies in the genus Perichares in Area de Conservacion Guanacaste, Costa Rica. Proc. Natl. Acad. Sci. USA 105, 6350–6355. doi: 10.1073/pnas.0712181105
  • Butchart SHM, Walpole M, Collen B, van Strien A, Scharlemann JPW, Almond REA, Baillie JEM, Bomhard B, Brown C, Bruno J, Carpenter KE, Carr GM, Chanson J, Chenery AM, Csirke J, Davidson NC, Dentener F, Foster M, Galli A, Galloway JN, Watson R (2010) Global biodiversity: indicators of recent declines. Science 328, 1164–1168. doi: 10.1126/science.1187512
  • CBD COP5 (1996) Decision V/9. Global Taxonomy Initiative: implementation and further advance of the Suggestions for Action [WWW Document]. URL (accessed 12.21.20).
  • Chimeno C, Hausmann A, Schmidt S, Raupach MJ, Doczkal D, Baranov V, Hübner J, Höcherl A, Albrecht R, Jaschhof M, Haszprunar G, Hebert PDN (2022) Peering into the darkness: DNA barcoding reveals surprisingly high diversity of unknown species of Diptera (Insecta) in Germany. Insects 13, 82. doi: 10.3390/insects13010082
  • de Boer HJ, Ghorbani A, Manzanilla V, Raclariu A-C, Kreziou A, Ounjai S, Osathanunkul M, Gravendeel B (2017) DNA metabarcoding of orchid-derived products reveals widespread illegal orchid trade. Proc. R. Soc. B 284, 20171182. doi: 10.1098/rspb.2017.1182
  • Dirzo R, Raven PH (2003) Global state of biodiversity and loss. Annu. Rev. Environ. Resour. 28, 137–167. doi: 10.1146/
  • Ghorbani A, Gravendeel B, Selliah S, Zarré S, de Boer HJ (2017) DNA barcoding of tuberous Orchidoideae: a resource for identification of orchids used in Salep. Mol. Ecol. Resour. 17, 342–352. doi: 10.1111/1755-0998.12615
  • Govaerts R, Nic Lughadha E, Black N, Turner R, Paton A (2021) The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity. Sci. Data 8, 215. doi: 10.1038/s41597-021-00997-6
  • Hammond PM (1992) Species inventory, in: Groombridge, B. (Ed.), Global Biodiversity: Status of the Earth’s Living Resources. Springer, Houten, pp. 17–39.
  • Hausmann A, Krogmann L, Peters RS, Rduch V, Schmidt S (2020) GBOL III: DARK TAXA. barbull 10. doi: 10.21083/ibol.v10i1.6242
  • Hawkins J, de Vere N, Griffith A, Ford CR, Allainguillaume J, Hegarty MJ, Baillie L, Adams-Groom B (2015) Using DNA metabarcoding to identify the floral composition of honey: A new tool for investigating honey bee foraging preferences. PLoS ONE 10, e0134735. doi: 10.1371/journal.pone.0134735
  • Hawksworth DL, Kalin-Arroyo MT (1995) Magnitude and distribution of biodiversity, in: Heywood, V.H., Watson, R.T. (Eds.), Global Biodiversity Assessment. Cambridge University Press, Cambridge, pp. 107–191.
  • Hebert PDN, Cywinska A, Ball SL, de Waard JR (2003) Biological identifications through DNA barcodes. Proc. R. Soc. Lond. B 270, 313–322.
  • Hooper DU, Adair EC, Cardinale BJ, Byrnes JEK, Hungate BA, Matulich KL, Gonzalez A, Duffy JE, Gamfeldt L, O’Connor MI (2012) A global synthesis reveals biodiversity loss as a major driver of ecosystem change. Nature 486, 105–108. doi: 10.1038/nature11118
  • Humphreys AM, Govaerts R, Ficinski SZ, Nic Lughadha E, Vorontsova MS (2019) Global dataset shows geography and life form predict modern plant extinction and rediscovery. Nat. Ecol. Evol. 3, 1043–1047. doi: 10.1038/s41559-019-0906-2
  • IPBES (2019) Global assessment report of the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services. S. Díaz, J. Settele, E. Brondízio and H. T. Ngo. Bonn, Germany, IPBES Secretariat: 1753. doi: 10.5281/zenodo.3831673
  • IPNI (2020) International Plant Names Index. The Royal Botanic Gardens, Kew, Harvard University Herbaria & Libraries and Australian National Botanic Gardens. Published on the Internet; [WWW Document]. URL (accessed 10.20.20).
  • Jarman SN, Elliott NG (2000) DNA evidence for morphological and cryptic Cenozoic speciations in the Anaspididae, ’living fossils’ from the Triassic. J. Evol. Biol. 13, 624–633.
  • Knowlton N (1993) Sibling species in the sea. Annu. Rev. Ecol. Syst. 24, 189–216. doi: 10.1146/
  • Lughadha EN, Govaerts R, Belyaeva I, Black N, Lindon H, Allkin R, Magill RE, Nicolson N (2016) Counting counts: revised estimates of numbers of accepted species of flowering plants, seed plants, vascular plants and land plants with a review of other recent estimates. Phytotaxa 272, 82. doi: 10.11646/phytotaxa.272.1.5
  • Lynggaard C, Bertelsen MF, Jensen CV, Johnson MS, Frøslev TG, Olsen MT, Bohmann K (2022) Airborne environmental DNA for terrestrial vertebrate community monitoring. Curr. Biol. 32, 701–707. doi: 10.1016/j.cub.2021.12.014
  • Nithaniyal S, Newmaster SG, Ragupathy S, Krishnamoorthy D, Vassou SL, Parani M (2014) DNA barcode authentication of wood samples of threatened and commercial timber trees within the tropical dry evergreen forest of India. PLoS ONE 9, e107669. doi: 10.1371/journal.pone.0107669
  • Page RDM (2016) DNA barcoding and taxonomy: dark taxa and dark texts. Philos. Trans. R. Soc. Lond. B Biol. Sci. 371. doi: 10.1098/rstb.2015.0334
  • Raclariu AC, Heinrich M, Ichim MC, de Boer H (2018) Benefits and limitations of DNA barcoding and metabarcoding in herbal product authentication. Phytochem. Anal. 29, 123–128. doi: 10.1002/pca.2732
  • Raclariu AC, Paltinean R, Vlase L, Labarre A, Manzanilla V, Ichim MC, Crisan G, Brysting AK, de Boer H (2017) Comparative authentication of Hypericum perforatum herbal products using DNA metabarcoding, TLC and HPLC-MS. Sci. Rep. 7, 1291. doi: 10.1038/s41598-017-01389-w
  • Ragupathy S, Newmaster SG, Murugesan M, Balasubramaniam V (2009) DNA barcoding discriminates a new cryptic grass species revealed in an ethnobotany study by the hill tribes of the Western Ghats in southern India. Mol. Ecol. Resour. 9 Suppl s1, 164–171. doi: 10.1111/j.1755-0998.2009.02641.x
  • Riedel A, Sagata K, Suhardjono YR, Tänzler R, Balke M (2013) Integrative taxonomy on the fast track - towards more sustainability in biodiversity research. Front. Zool. 10, 15. doi: 10.1186/1742-9994-10-15
  • Ryberg M, Nilsson RH (2018) New light on names and naming of dark taxa. MycoKeys 30, 31–39. doi: 10.3897/mycokeys.30.24376
  • Thomsen PF, Willerslev E (2015) Environmental DNA – An emerging tool in conservation for monitoring past and present biodiversity. Biol. Conserv. 183, 4–18. doi: 10.1016/j.biocon.2014.11.019

Section 1: Design, sampling, and substrates

Chapter 1 DNA from plant tissue

Plant DNA

What is DNA?

Deoxyribonucleic acid (DNA) is the blueprint of life. DNA encodes genes which carry instructions for the production of proteins, the fundamental components of a cell’s machinery. DNA was confirmed as the genetic material in cells, and thereby the basis of heredity, in the 1940s (Avery et al. 1944). DNA is a polymer consisting of nucleotide monomers, each containing a phosphate group, a sugar group (deoxyribose), and one of the four bases: adenine (A), thymine (T), cytosine (C), or guanine (G). The order of nucleotides determines the primary structure of DNA. Its secondary structure is dictated by hydrogen bonding between the purine-pyrimidine base pairs A-T (two bonds) and C-G (three bonds); these link the complementary antiparallel single DNA strands. The way in which nucleotide bases form pairs was discovered by Chargaff (1950). The two strands of chemically bonded nucleotides form the tertiary structure of DNA, which is a double helix in all known biological systems. The tertiary structure of DNA was first elucidated by Franklin, Watson, and Crick in the mid 20th century (Franklin and Gosling 1953; Watson and Crick 1953).
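The pairing rules described above can be illustrated with a minimal Python sketch. It derives the complementary antiparallel strand of a short sequence and counts the hydrogen bonds between the two strands, using the two-bond (A-T) and three-bond (C-G) values given in the text. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
# Watson-Crick pairing as described in the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds per base pair, as stated in the text: A-T two, C-G three.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read in its own 5'->3' direction.

    The strands are antiparallel, so the sequence is reversed before
    each base is swapped for its pairing partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq: str) -> int:
    """Count the hydrogen bonds linking this strand to its complement."""
    return sum(H_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    strand = "ATGCGT"  # hypothetical example sequence
    print(reverse_complement(strand))
    print(hydrogen_bonds(strand))
```

Applying `reverse_complement` twice returns the original sequence, which is a quick sanity check on the pairing table.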

A fundamental tenet of molecular biology is that DNA is transcribed into ribonucleic acid (RNA), and subsequently translated into amino acids that form a protein sequence. We now have a much more detailed understanding of this framework, including the varied roles of RNA in gene expression and regulation, and the role of epigenetics—heritable changes in DNA that do not alter the base sequence (e.g., methylation). Since the discovery of DNA, there has been a steady increase in the use of DNA sequences as molecular markers in varied biological contexts, including medical and forensic applications, elucidation of genes encoding adaptive traits, understanding population genomic processes, as well as systematics of prokaryotic and eukaryotic organisms.

Distribution of plant DNA in the cell

Most DNA extraction protocols extract total cellular DNA. In certain experimental cases, it can also be preferable to target either DNA contained in the nucleus or DNA comprising organellar genomes (in plants: mitochondria and plastids). Organellar genomes are much smaller than any plant nuclear genome.

As with virtually all eukaryotes, plants have endosymbiotically derived mitochondria for cellular respiration and energy production. However, compared to other eukaryotic kingdoms (animals in particular), the mitochondrial genome of plants is quite large, ranging between 200 and 750 Kbp in size (Kubo and Newton 2008), and is characterised by a slow substitution rate and significant genome rearrangement events (Gualberto et al. 2014). Therefore, unlike in many other eukaryotes, the mitochondrial genome in plants is rarely used for molecular identification, including phylogenetics and systematics.

In contrast, plastid genomes (e.g., found in chloroplasts of leaves or amyloplasts of cereal grains) have a very stable genomic structure and a size of around 150 Kbp in most cases (Twyford and Ness 2017). They have a high enough rate of substitution to serve as a useful molecular tool across different phylogenetic levels, including population-level and phylogeographical studies (Petit and Vendramin 2007), and harbour the plant DNA barcode markers matK, rbcL, and trnH-psbA (CBOL Plant Working Group 2009). Plant cells contain multiple plastids per cell, and each of these has several copies of the plastid genome, meaning that plastomes are present at a high copy number in DNA extracts. This makes them particularly useful for sequencing from total DNA extracts, using for instance a genome skimming approach (Dodsworth 2015; Twyford and Ness 2017).

Nuclear genomes, particularly in angiosperms, are highly variable in size, with the angiosperm mean and modal 1C (the amount of DNA in an unreplicated gametic nucleus) both at around 5 pg (roughly 5 Gbp) (Pellicer et al. 2018). The largest genomes are found mostly amongst monocots, particularly the Liliaceae and Melanthiaceae, including the record-holder Paris japonica with around 150 Gbp of DNA (Pellicer et al. 2010). The smallest genomes have been found in carnivorous members of the genus Genlisea, e.g., 61 Mbp in G. tuberosa (Fleischmann et al. 2014). It has only recently become possible to perform genome-wide analyses on the largest plant genomes thanks to developments in molecular methods for high-throughput DNA sequencing, including those that reduce genomic complexity (Dodsworth et al. 2019). Methodologies for high quality (chromosome-level) assembly of large plant genomes have also advanced, one example being Hi-C technology (Putnam et al. 2016; Neale et al. 2022).
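Genome sizes are reported both as DNA mass (pg) and as sequence length (bp), so a small conversion helper can be useful when comparing figures such as those above. The sketch below assumes the commonly used approximation that 1 pg of DNA corresponds to about 978 Mbp; this factor is not stated in the text and is an assumption of the example, as are the function names.

```python
# Assumed conversion factor (not given in the text): 1 pg of DNA ~ 978 Mbp.
MBP_PER_PG = 978.0


def pg_to_mbp(picograms: float) -> float:
    """Convert a C-value in picograms to megabase pairs."""
    return picograms * MBP_PER_PG


def mbp_to_pg(megabases: float) -> float:
    """Convert a genome size in megabase pairs to picograms."""
    return megabases / MBP_PER_PG


if __name__ == "__main__":
    # Approximate genome sizes quoted in the text.
    paris_japonica_mbp = 150_000.0   # around 150 Gbp
    genlisea_tuberosa_mbp = 61.0     # 61 Mbp

    print(f"Paris japonica: ~{mbp_to_pg(paris_japonica_mbp):.0f} pg")
    print(f"Genlisea tuberosa: ~{mbp_to_pg(genlisea_tuberosa_mbp):.3f} pg")
    print(f"Size ratio: ~{paris_japonica_mbp / genlisea_tuberosa_mbp:.0f}-fold")
```

With the figures quoted in the text, the largest and smallest genomes differ by a factor of roughly 2,500.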

Experimental history and main principles of DNA extractions

The first isolation of DNA, by the Swiss physician Friedrich Miescher in 1869, happened accidentally while studying proteins from leukocyte nuclei (Dahm 2005). Miescher noted a substance that precipitated from solution when acid was added and that re-dissolved upon the addition of an alkaline solution. He called this precipitate “nuclein”. While modern protocols for DNA isolation are considerably more refined than the very first trials, the general goal remains the same: to separate intact DNA from other plant cellular molecules, while minimising DNA degradation.

Plants possess a tough cell wall made up of cellulose and other compounds such as lignin, in addition to a cell membrane. This necessitates a robust first step for plant DNA extraction that disintegrates the structure of the plant tissue and breaks down cell walls. In a low-throughput scenario (or for samples that are tougher to disrupt), this could involve flash freezing the tissue with liquid nitrogen followed by grinding with a pestle and mortar. For higher throughput of samples, tissue-disrupting machinery can be applied. The ground material should then be taken forward immediately to the chemical steps of the process, which involve breakdown of the cellular membrane to release the lysate containing the soluble DNA. This is then separated from cell debris and other insoluble material. Various methods are subsequently used to separate DNA molecules from the remaining material, which can contain soluble proteins, nucleic acids, and small-molecule metabolites (Doyle 1996). Cellulose and lignin derived from the cell wall, as well as polysaccharides, polyphenols, tannins, and other secondary metabolites (particularly prevalent in medicinal plants) are common endogenous impurities in DNA extractions. These compounds need to be separated and removed as much as possible, because they may inhibit downstream laboratory steps and lead to poorer sequencing results (Varma et al. 2007). As a result, extracting sufficient quantities of high-quality, high-purity DNA from plants can often be challenging.

Numerous protocols and procedures have been developed to extract DNA from plant material of varying origins (Murray and Thompson 1980; Doyle and Doyle 1987; Rogers and Bendich 1989; Lodhi et al. 1994). Quite often, a protocol must be optimised or blended with others to obtain high-quality DNA from specific plant material. Refinement of the optimal isolation procedure will depend upon many factors, such as the source tissue, age of the material, and concentration of metabolites present in the plant.

A major innovation in DNA extraction from plant material was the protocol developed by Doyle and Doyle (1987), which uses the cationic detergent cetyltrimethylammonium bromide (CTAB) to extract DNA from small amounts of plant tissue. This was a welcome alternative to the lengthy, expensive, and hazardous caesium chloride ethidium bromide density gradient centrifugation approach (Saghai-Maroof et al. 1984). The CTAB procedure quickly gained popularity due to its versatility and scalability, particularly in the volume of detergents, and the use of fresh instead of lyophilized tissue (Doyle and Doyle 1990; Doyle 1991). Today, there are numerous modifications of the original CTAB protocol for the isolation of pure, intact DNA from plants (Scott and Playford 1996; Sharma et al. 2000; Pirttilä et al. 2001; Drábková et al. 2002; Shepherd et al. 2002; Mogg and Bond 2003; Agbagwa et al. 2012). In the era of next generation sequencing, current innovations in DNA extraction protocols tend to focus on the need for high-throughput DNA extractions from many different taxa simultaneously (Mavrodiev et al. 2021).

Chapter 1: Box 1. Important first steps in the collection of plant material

Plant material for any research project must be collected ethically and legally, and the preparation of DNA extracts is no exception. Permission, prior informed consent and mutually agreeable terms of use must be obtained before using plant tissue for DNA extraction according to the Convention on Biological Diversity. This includes the fair and equitable sharing of benefits arising from the utilisation of genetic resources (as outlined in the Nagoya Protocol). National and international law and conventions apply to derivatives of biological materials, including DNA extracts and their transportation. The same principles apply to botanical collections such as seeds, silica dried specimens stored in a tissue bank, herbarium specimens, or plants in living collections. The terms under which they are stored in a collection may restrict the use of specimens for research and require additional permissions (for instance, from the regulatory authority in the country of origin) before they can be used. The storage and future use of DNA extracts, likewise, must comply with the terms of the permissions granted, which could include being stored indefinitely for future research, returned to the country or institute of origin, or discarded. See Chapter 2 DNA from museum collections for guidance about your responsibilities as a researcher.

Storing and preparing plant material for DNA extraction

Plant material

DNA can be extracted from healthy plant tissues including leaves, flowers, buds, seeds, roots, bark, and even spines. Young leaf tissue is the preferred starting material (Gemeinholzer et al. 2010), particularly for herbaceous plants, and fresh leaf tissue usually gives high yields of high-quality DNA (Guo et al. 2018). However, the type of material used for DNA extraction depends on availability. Access to plant material and availability due to plant life cycles and seasonal variation may require a pragmatic approach. Some plant tissues (e.g., roots, stems), clades (e.g., ferns; Thomson 2002), and morphological features (e.g., succulence; Neubig et al. 2014) present specific challenges during sample collection and storage, requiring tailored processing approaches.

Successful extraction of high-quality DNA from any plant material depends on the material being prepared correctly, dried rapidly (without excessive heat treatment), and stored in a dark, dry place to minimise degradation of its DNA. DNA degradation prior to extraction is caused by the release of endogenous nucleases during cellular lysis, which may be accelerated by environmental factors such as heat and humidity (Savolainen et al. 1995).

The extraction method is determined by the plant material available. For most kit and CTAB based protocols, a 1 cm² section of herbaceous leaf tissue will suffice for a single extraction. Careful laboratory notes of the material used, including provenance data, sample weight, and extraction date, are vital for checking the quality of sequencing results against the specifics of the extraction process in the lab and for pinpointing reasons for variation between samples. For some protocols, weighed tissue can be placed straight into a 1.5 ml tube labelled with a unique number or laboratory code and other information, ready for the DNA extraction process.
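As an illustration of the laboratory notes described above, the following Python sketch shows one possible way to record an extraction and to check that laboratory codes are unique. The class name, field names, and label format are hypothetical; they are not a prescribed standard, and any real laboratory will have its own conventions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ExtractionRecord:
    """One DNA extraction, holding the details the text recommends noting."""

    lab_code: str            # unique number or laboratory code written on the tube
    taxon: str               # working identification of the plant
    provenance: str          # e.g. voucher, collector, locality, or source collection
    tissue: str              # e.g. "leaf", "root", "bark"
    sample_weight_mg: float  # weight of the tissue placed in the tube
    extraction_date: date

    def tube_label(self) -> str:
        """Short text for a 1.5 ml tube: laboratory code plus ISO date."""
        return f"{self.lab_code} {self.extraction_date.isoformat()}"


def duplicate_codes(records: list[ExtractionRecord]) -> list[str]:
    """Return laboratory codes that occur more than once; codes should be unique."""
    counts = Counter(record.lab_code for record in records)
    return sorted(code for code, n in counts.items() if n > 1)
```

Keeping the record immutable (`frozen=True`) is a design choice of this sketch: once an extraction has been logged, its details should be corrected through a new entry, not silently overwritten.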

Silica drying

Plant material dried and stored in silica gel – including as specimens stored in tissue banks specifically for the purpose of DNA extraction – tends to be a good source of high-quality DNA. Silica gel (silicon dioxide xerogel) is a desiccant that removes moisture from the atmosphere, drying out the plant tissue. Indicator silica gel crystals change colour when the silica is saturated, signalling when the silica gel should be regenerated or replaced. These crystals can be used in a mixture with non-indicating silica gel.

The use of silica gel is a popular approach to dry fresh plant material for DNA extraction because it is low cost and convenient compared to liquid nitrogen or lyophilization, especially when preparing tissue in the field. To effectively preserve the DNA in plant tissue, the recommended minimum ratio between plant material and silica is 1:10 (Chase and Hills 1991). However, if the material collected is mucilaginous, thick, or hardy, the volume of tissue should be reduced and cut into pieces, bringing the desiccant into contact with the cut surface of the plant material to facilitate rapid desiccation. The environment in which plant material is collected also affects the amount of silica needed and the frequency at which it needs to be replaced; a humid environment will require frequent changes of the desiccant. Tissue samples can either be stored directly in individual, sealed plastic bags containing silica gel, or in a breathable material such as a folded tea bag or coffee filter in a sealed container containing silica gel. The latter method is recommended to prevent cross contamination between samples and avoid powdering of the sample due to friction with the silica gel beads, which makes it more difficult to extract the tissue from the container later. Each sample should be labelled twice: once on the outside of the sample bag, and once on a second label placed inside it.
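The 1:10 guideline above translates into a simple calculation when packing for fieldwork. The sketch below assumes the ratio is applied by weight (the text does not state the basis explicitly) and that the function name and default are purely illustrative; humid conditions or thick tissue would call for more desiccant or more frequent changes than this minimum.

```python
def minimum_silica_g(tissue_g: float, silica_per_tissue: float = 10.0) -> float:
    """Return the minimum mass of silica gel (g) for a given mass of fresh tissue (g).

    Assumes the 1:10 plant-to-silica ratio is applied by weight, which is an
    assumption of this sketch and not a statement from the text.
    """
    if tissue_g < 0 or silica_per_tissue <= 0:
        raise ValueError("tissue mass must be >= 0 and the ratio must be > 0")
    return tissue_g * silica_per_tissue


if __name__ == "__main__":
    # Hypothetical example: 2.5 g of fresh leaf tissue per sample, 40 samples.
    per_sample = minimum_silica_g(2.5)
    print(f"Per sample: {per_sample:.1f} g; for 40 samples: {per_sample * 40:.0f} g")
```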


One approach is to freeze plant tissue until needed for DNA extraction, preferably at –80 °C, and otherwise in a standard laboratory freezer at –20 °C, if the sample is properly sealed. Alternatively, material can be flash frozen in liquid nitrogen. The resulting rapidly frozen material can yield high-quality DNA extractions, but liquid nitrogen is impractical for some settings due to handling considerations and cost (Till et al. 2015). Additionally, cycles of freezing and thawing of plant tissue should be avoided as this can damage plant cells, organelles, and DNA (Nagy 2010). It is therefore recommended that frozen plant material is only thawed once, right before the DNA is extracted.


High-quality DNA can be extracted from lyophilized (or freeze-dried) tissue, such as leaves and roots (Guinn 1966). This method was developed in the 1960s and is still used when fresh material cannot be used immediately or is not available. When paired with the correct extraction technique, lyophilized plant material can yield DNA of high quality (Nunes et al. 2011). During lyophilization, plant tissue is maintained at low temperatures (< –50 °C) and pressures (< 0.1 mbar), resulting in sublimation of the water in plant cells. A condenser is typically present that captures the vaporised water as ice. After removal of all water from the plant material (typically achieved within a few hours or overnight), the lyophilizer is brought to atmospheric conditions, after which the dried plant tissues can be removed from the device. Proceeding with mechanical disruption of the tissue immediately after this is preferable, to avoid reabsorption of atmospheric moisture. However, the sample can alternatively be stored in silica gel before further use.

DNA extraction protocols

After the plant material has been prepared by drying and/or freezing using one of the above-mentioned techniques, a DNA extraction protocol can be implemented. Although there are a multitude of available protocols, the general methodology involves the following steps, discussed in more detail below:

  • Weighing of plant tissue
  • Mechanical disruption (grinding)
  • (Optional) pre-treatment
  • Extraction of nucleic acids from the cell
  • DNA isolation and precipitation
  • DNA purification

We place emphasis on the CTAB protocol due to its popularity, but also introduce other protocols that may be of interest to the reader.

General workflow for DNA extraction

Weighing plant tissue

The starting amount of plant tissue is important: too little will result in an unsatisfactory yield and too much may lead to poor grinding, saturation of the reaction and/or excessive debris which can also be detrimental to final yield. A useful starting ratio is a buffer quantity that is fivefold that of the weight of the leaf tissue (e.g., 0.2 g leaf tissue for 1 ml of buffer) (Kasajima 2018).
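The fivefold starting ratio can be written out as a short calculation. This is a minimal sketch assuming that 1 ml of buffer corresponds to roughly 1 g, as the worked example in the text implies; the function names are hypothetical.

```python
def buffer_volume_ml(tissue_g: float, fold: float = 5.0) -> float:
    """Buffer volume (ml) for a given tissue mass (g) at a fold ratio."""
    if tissue_g <= 0:
        raise ValueError("tissue mass must be positive")
    return tissue_g * fold


def tissue_mass_g(buffer_ml: float, fold: float = 5.0) -> float:
    """Tissue mass (g) matching a fixed buffer volume, e.g. a tube's capacity."""
    if buffer_ml <= 0:
        raise ValueError("buffer volume must be positive")
    return buffer_ml / fold


print(buffer_volume_ml(0.2))  # buffer needed for 0.2 g of leaf tissue
print(tissue_mass_g(1.0))     # tissue that suits 1 ml of buffer
```

The inverse function is useful when the tube size, rather than the amount of tissue, is the limiting factor.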

Mechanical disruption (grinding) of plant material

Plant tissue must be finely ground to a powder such that the cell walls are disrupted and the cell membranes are more accessible for the chemical reagents in subsequent steps to act successfully. It is advisable to scrape hairs or wax from the surface of the plant tissue before weighing and grinding. For herbarium specimens, special care should be taken that any glue that may be present is removed since this can interfere with the reagents used during the DNA extraction. Sterilised sand can also be used to increase the friction and enhance the disruption of the tissue; it will be separated later in the DNA extraction protocol. Fleshy tissue can be flash frozen in a mortar with a little liquid nitrogen before grinding. The dewar for transporting the liquid nitrogen should be clean and free of potential contaminants.

Manual grinding is inexpensive, yet time consuming and requires a sterilised mortar, pestle, and spatula for each sample. Use of a mechanical homogenizer, also called a tissue lyser, is more efficient. A steel ball bearing is added to each tube with a sample and shaken at high frequency within the instrument. This allows multiple samples to be disrupted simultaneously with minimal degradation of the nucleic acids. It also minimises loss of material and the chances of contamination, as each sample is processed in the tube that it remains in for subsequent extraction steps. Metallic, ceramic, or silica beads of different sizes can be added to the sample tubes to increase the disruption of particularly tough or woody material. Metallic and ceramic beads must be removed before proceeding with the protocol, but silica beads can be separated later in the protocol.

Optional pre-treatment

This step can be included as an optimisation strategy for increased yield, quality, or purity of the extracted DNA. For example, when high amounts of polysaccharides and/or polyphenols in the plant material are a concern (as is the case for succulent plants and plants in high stress environments, respectively), the modified STE-CTAB protocol can be used (Shepherd and McLay 2011). The ground plant tissue is washed up to three times with a Sucrose-Tris-EDTA (STE) buffer that dissolves most of the polysaccharides and polyphenols, after which the standard CTAB protocol can be followed. An alternative sorbitol-based pre-wash can also be beneficial in polyphenol removal and hence in obtaining DNA of higher purity (Inglis et al. 2018).

Extraction of nucleic acids from the cell

In this stage, the goal is to release nucleic acids from the cell while minimising the risk of nucleic acid degradation, and to begin separating unwanted cellular compounds from the DNA molecules.

The hallmark of the most widely adopted method for DNA extraction from plants, originally developed by Doyle and Doyle (1987) and Doyle (1991), is the cetrimonium bromide (CTAB) extraction buffer, which should contain:

  • 2% w/v CTAB: a cationic detergent which, during DNA extraction, binds to the lipids in cell membranes, enhancing cell lysis, thus releasing intact nucleic acids from the nucleus and organelles
  • 1.4 M NaCl: a salt which increases the ionic strength of the solution, which simultaneously induces plasmolysis, promotes separation of proteins from DNA, and aids in polysaccharide precipitation
  • 100 mM Tris-HCl: a buffer (at pH ~8.0) which maintains the pH of the solution and stabilises the DNA by impeding degradation
  • 20 mM EDTA (ethylenediaminetetraacetic acid): which protects the DNA by inhibiting the enzymatic activity of DNase and RNase (i.e., by chelating divalent cations, such as Mg²⁺ and Ca²⁺, which are cofactors for these enzymes)
  • 0.2% β-mercaptoethanol: which denatures polyphenols and tannins (abundant in plants), rendering it possible to separate them from the DNA in subsequent steps

CTAB buffer is added to each sample tube containing ground plant tissue and the mixture is incubated at 60–65 °C for 15–60 minutes. This can be done in an automatic shaking incubator. Alternatively, the sample tubes can be periodically shaken manually.
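The buffer composition listed above can be scaled to any final volume with a few lines of arithmetic. The sketch below is illustrative only: it assumes that Tris-HCl and EDTA are added from liquid stocks (1 M and 0.5 M here, both assumptions), that the 0.2% β-mercaptoethanol is volume per volume, and that NaCl has a molar mass of 58.44 g/mol. The function and key names are hypothetical.

```python
NACL_G_PER_MOL = 58.44  # molar mass of NaCl


def ctab_buffer_recipe(final_ml: float,
                       tris_stock_m: float = 1.0,
                       edta_stock_m: float = 0.5) -> dict:
    """Amounts needed for CTAB buffer at the concentrations listed in the text.

    Stock concentrations are assumptions; adjust them to the stocks at hand.
    """
    if final_ml <= 0:
        raise ValueError("final volume must be positive")
    litres = final_ml / 1000.0
    return {
        "CTAB_g": 0.02 * final_ml,                            # 2% w/v
        "NaCl_g": 1.4 * litres * NACL_G_PER_MOL,              # 1.4 M
        "Tris_HCl_stock_ml": 0.100 / tris_stock_m * final_ml, # 100 mM, C1V1 = C2V2
        "EDTA_stock_ml": 0.020 / edta_stock_m * final_ml,     # 20 mM, C1V1 = C2V2
        "beta_mercaptoethanol_ml": 0.002 * final_ml,          # 0.2% v/v (assumed)
    }


for component, amount in ctab_buffer_recipe(100.0).items():
    print(f"{component}: {amount:.3f}")
```

The remaining volume is made up with water. β-mercaptoethanol is typically added in a fume hood, so it is listed separately from the components that can be prepared in advance.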

Alternatively, methods involving an SDS buffer can be applied (Dellaporta et al. 1983). The buffer recipe also contains NaCl, Tris-HCl, EDTA, and β-mercaptoethanol, but differs in the application of the anionic detergent sodium dodecyl sulphate (SDS) for the disruption of cellular membranes, as well as the addition of sodium acetate (NaCH₃COO).

DNA isolation and precipitation

The goal of this stage is the separation of DNA from other molecules in the lysate, by making use of the differing polarity of these molecules. This is followed by DNA precipitation from the solution.

In the CTAB protocol, the methodology is phase separation using organic solvent(s), where hydrophilic molecules, including DNA, can be isolated. A 24:1 solution of chloroform-isoamyl alcohol (SEVAG buffer) is added to the incubated CTAB/leaf tissue mixture. This solution is hazardous and must be prepared and added to the sample tubes in a fume hood to avoid inhalation. It is also highly volatile, so it should be dispensed promptly to limit evaporation during the work. The mixture is then centrifuged at room temperature, which results in the DNA becoming concentrated in the clear upper phase (i.e., the aqueous phase). The supernatant is very carefully drawn off with a pipette without disturbing or touching the organic phase (containing the chloroform with lipids, proteins, and other cellular debris) and transferred to a new tube. The supernatant is purified by adding RNase A and chilled isopropanol, where the latter induces precipitation of DNA. Samples are then transferred to a freezer at –20 °C, either overnight or for several days if sample input is low and maximum precipitation is desirable (at the cost of potential co-precipitation of salts).

In the SDS protocol, proteins and polysaccharides precipitate with the SDS itself. Sodium acetate in turn is used to precipitate the DNA; in solution this compound dissociates and the sodium ions (Na⁺) neutralise the negative charges on the sugar phosphate backbone of DNA molecules, thus making the DNA less hydrophilic and amenable to precipitation (Heikrujam et al. 2020).

As a final step in both methodologies, the samples are centrifuged to encourage the formation of a DNA pellet, which is optionally washed one or more times with 70% ethanol and then re-suspended, preferably in 10 mM Tris-EDTA buffer (which serves to protect the DNA from damage, as explained in the CTAB buffer recipe above).

DNA purification

The DNA isolation stage is not perfect. Since the extraction process involves steps that segregate compounds by binding properties and molecular weight, co-extraction of molecularly similar polysaccharides is common. Furthermore, the eluent can contain certain contaminants, including traces of chemicals added during the extraction process and precipitated salts, as well as endogenous proteins, tannins, polysaccharides, and other molecules. The presence of such compounds can negatively impact the downstream experimental use of the DNA (i.e., act as PCR inhibitors), and further purification of DNA using various clean-up steps may be necessary.

One strategy is using a silica column and centrifugation-based method, by adding a chaotropic agent (commonly guanidine hydrochloride), which disrupts the hydrogen bonds between water molecules, creating a more hydrophobic environment. This increases the solubility of non-polar compounds (often contaminants) and additionally breaks up the hydration shell that forms around the negatively charged DNA phosphate backbone and further promotes efficient adsorption to the column surface under high salt and moderately acidic conditions (Esser et al. 2006). This is followed by washing steps with alcohol-based solvents and centrifugation to remove unbound contaminants before final elution of the DNA in a suitable buffer, such as 10 mM Tris-EDTA (pH 8.0).

An alternative involves the use of Solid Phase Reversible Immobilisation (SPRI) beads (Hawkins et al. 1994). These beads are paramagnetic, meaning that they respond to a magnet only while a magnetic field is applied and do not clump together in its absence. Their magnetite surface is coated with carboxyl molecules that can reversibly bind to DNA under specific chemical conditions. Polyethylene glycol (PEG), in this context termed the ‘crowding agent’, promotes the binding of DNA to SPRI beads. The ratio of this crowding agent to the DNA eluent is key: the higher the concentration, the greater the attractive force of DNA molecules to the beads, meaning that progressively smaller fragments with molecules of lower charge can bind to the beads. Therefore, choosing a ratio of SPRI beads – which are in solution with the crowding agent and salt (NaCl) – to DNA is the first step. A ratio of 1:1 is usually appropriate for DNA clean-up, though this ratio can be increased up to 2:1 for the retention of very short DNA fragments. Once the tube containing this mixture is placed into a magnetic plate, the DNA will remain immobilised on the SPRI beads, which are attracted to the sides of the tube, adjacent to the magnetic field. The supernatant containing any short nucleic acid remnants and contaminants can at this point be pipetted out from the tube. The beads are washed twice with an 80% ethanol solution before addition of an elution buffer (e.g., 10 mM Tris-HCl) to re-suspend the purified DNA.
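Choosing the bead-to-sample ratio described above comes down to a volume calculation. The sketch below is a minimal illustration; the function name is hypothetical and the ratios shown are the 1:1 and 2:1 values mentioned in the text.

```python
def spri_bead_volume_ul(sample_ul: float, ratio: float = 1.0) -> float:
    """Volume of bead suspension (µl) for a given bead:sample volume ratio."""
    if sample_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    return sample_ul * ratio


# Standard clean-up of a 50 µl extract at 1:1.
print(spri_bead_volume_ul(50.0, ratio=1.0))

# Retaining very short fragments, e.g. from degraded material, at 2:1.
print(spri_bead_volume_ul(50.0, ratio=2.0))
```

Because the crowding agent and salt are supplied in the bead suspension, raising the ratio raises their final concentration, which is what allows shorter fragments to bind.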

Protocol optimization

When a DNA extraction protocol does not yield satisfactory results, in terms of quality or quantity of extracted DNA, modifications can be applied. A valuable strategy for this is conducting a search of the scientific literature for protocols that have been used for similar experimental purposes or have targeted the same taxonomic groups.

If using the CTAB protocol, understanding the biochemical actions and interactions of its components is a useful starting point for identifying what might need adjustment to improve the outcome. CTAB acts according to the ionic strength of the solution; the concentration of NaCl must be at least 0.5 M so that CTAB does not bind to nucleic acids, but does bind to proteins and neutrally charged polysaccharides as desired. NaCl is most commonly used at a concentration of 1.4 M. When working with a plant group that has a high content of polysaccharides, experimenting with higher concentrations of NaCl may improve the purity of the final DNA. Sometimes, other reagents such as N-Lauroylsarcosine (sarkosyl) buffer can be added, to enhance lysis (rupturing of the cell membrane) and to reduce the activity of DNase or RNase enzymes. Proteinase K can also be added to enhance the denaturation of proteins. The volume of 24:1 chloroform-isoamyl alcohol solution can also be adjusted. Phenol can be added as an additional non-polar, organic solvent that is highly effective in denaturing proteins and can aid in increasing the final DNA yield, as opposed to solely applying chloroform (Heikrujam et al. 2020), though it is very hazardous and requires careful handling.

Tris-HCl and EDTA are present in nearly all protocols. β-mercaptoethanol is toxic and should thus be handled with care, and always in a fume hood with an extractor fan. One may consider simply not adding this reagent to the solution for plant tissues low in phenolic compounds. However, it is important to note that phenolic compounds co-precipitate with DNA and thus can be problematic in downstream steps of DNA laboratory work. β-mercaptoethanol can be replaced with less toxic alternatives such as PVP (polyvinylpyrrolidone). PVP attaches to phenolic compounds via hydrogen bonding and can be removed together with them after centrifugation (Porebski et al. 1997; Varma et al. 2007). PVP has been found to improve DNA extraction from tissues such as wood (Rachmayanti et al. 2006). A similar compound – PVPP (polyvinylpolypyrrolidone), whose main characteristic compared to PVP is that it increases the pH of the extraction buffer – has also been found to increase the yield of DNA extracts (Kasajima et al. 2013). Finally, an optimization step for more recalcitrant plant tissues is the application of a 4–6 hour long or overnight incubation at 45–55 °C to increase the yield of the extracted DNA.

Commercial extraction kits

Most commercial kit-based protocols use a combination of buffers that perform similar functions to the components of the CTAB protocol, with a final step of elution through silica columns, which tends to yield relatively clean DNA extracts. An added benefit of column-based kits is the use of filter columns at an earlier stage for the separation of crude plant material. Silica-based columns bind DNA so that it can be washed multiple times with alcohol-containing solutions to remove contaminants before DNA elution. This speeds up DNA extraction significantly, reducing the total time from multiple days – as is common in regular protocols – to 6 hours. Drawbacks of these approaches, however, include the reduced yields of purified DNA in comparison to CTAB + chloroform extractions, as well as the significantly higher (~3–4 fold greater) cost.

Commercial kits that use magnetic beads are also becoming increasingly popular. Magnetic bead extraction kits are highly versatile and provide high yields of DNA that are also highly pure, in the absence of the hazardous solvents chloroform and phenol. After plant tissue grinding and lysis with an appropriate buffer, DNA is bound to the surface of the magnetic particles. The magnetic particle-DNA system is then washed several times with alcohol-containing solutions before a final elution step with a low salt buffer or nuclease-free water. In contrast to the column-based extraction method, binding of DNA to the magnetic particles occurs in solution, thus enhancing the efficiency and kinetics of binding and simultaneously increasing the contact of the bead-DNA compounds with the wash buffer, which improves the purity of the DNA. Magnetic particle kits have also been applied in combination with steps from the CTAB extraction method to extract high quality DNA from sorghum leaves and seeds, cotton leaves and pine needles (Xin and Chen 2012).

Finally, a less common commercial method involves the use of Whatman FTA® PlantSaver cards and custom reagents. This method is very practical in terms of collection of samples in the field and their transportation. Furthermore, immediate mechanical disruption of the plant tissue can eliminate the need for obtaining permits. While this method has been predominantly applied to agricultural plant taxa, its performance in 15 phylogenetically diverse non-agricultural taxa has been demonstrated, where DNA from these samples was found to be less fragmented than that from replicate samples extracted alongside with the CTAB method (Siegel et al. 2017).

DNA quantification and quality assessment

Assessment of the properties of each genomic DNA (gDNA) sample post-extraction – its integrity, quantity, and purity – is imperative for making decisions regarding downstream molecular work. The methods described below have some overlapping uses in terms of assessing these different properties, but we highlight which is most appropriate for each DNA quality-related aspect.

DNA integrity - agarose gel electrophoresis

Agarose gel electrophoresis is an appropriate method for estimating DNA integrity, as well as for crudely estimating DNA concentration. This method requires a horizontal gel electrophoresis tank with an external power supply, agarose, a running buffer such as Tris-acetate-EDTA (TAE) or sodium borate (SB), a fluorescent intercalating DNA dye, a loading dye, and a DNA standard (‘ladder’). The intercalating dye is added to the buffer (or sometimes to the loading dye) and serves to visualise the DNA in the agarose gel at the end point of electrophoresis. Historically, ethidium bromide was the standard intercalating agent, but it has now mostly been superseded by safer dyes that are less carcinogenic and do not require complex disposal procedures. Nonetheless, it is recommended that any compound that intercalates DNA be handled with care. The DNA standard is referred to as a ladder, since it is a complex of appropriately sized DNA standards of known concentrations which provide different benchmarks of size and concentration for comparison.

Each DNA sample and the DNA standard (ladder) are combined with loading dye and then pipetted into a well of the agarose gel, to then be subjected to an electric field. Due to the negatively charged phosphate backbone, DNA molecules will migrate towards the positively charged anode. The DNA migration rate depends on the fragment size, where smaller DNA fragments migrate faster, leading to a size-associated separation of DNA molecules. Additionally, the percentage of agarose in the gel will determine the size range of DNA that will be resolved with the greatest clarity. A range of 0.5% to 3% encompasses most applications, where < 1% is best for examining the genomic DNA of plants and 3% would be suitable for examining fragments with small (e.g., ~20 bp) differences in length. Once the fragments have migrated sufficiently to ensure resolution of the DNA and ladder, the gel is transferred to a cabinet with a UV light and the DNA fragments are visualised due to the excitation of the intercalating dye when UV is applied. The approximate yield and concentration of genomic DNA in a gel are indicated by comparison of the sample’s intensity of fluorescence to that of a standard.
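Because gel percentage is expressed as weight per volume, the amount of agarose to weigh out follows directly from the gel volume. The sketch below is illustrative; the function name and the 50 ml gel volume are hypothetical, and the accepted range is simply the 0.5% to 3% span given in the text.

```python
def agarose_grams(percent_w_v: float, gel_ml: float) -> float:
    """Mass of agarose (g) for a gel of the given percentage (w/v) and volume."""
    if not 0.5 <= percent_w_v <= 3.0:
        raise ValueError("outside the 0.5-3% range that covers most applications")
    if gel_ml <= 0:
        raise ValueError("gel volume must be positive")
    return percent_w_v / 100.0 * gel_ml


# A 0.8% gel for checking genomic DNA integrity.
print(f"{agarose_grams(0.8, 50.0):.2f} g")

# A 3% gel for resolving small differences in fragment length.
print(f"{agarose_grams(3.0, 50.0):.2f} g")
```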

Where a more precise estimation of the size of the DNA fragments is required, automated capillary electrophoresis can be used. Such systems (e.g., Agilent Bioanalyzer, Agilent TapeStation) are more expensive to use, but, aside from precision, offer faster preparation and analysis time.

DNA quantity - fluorescence quantitation systems

Fluorescence measurements are considered the most accurate method for quantifying DNA concentration. These involve the addition of fluorescent dyes (in an accompanying buffer), which selectively intercalate into the DNA. Fluorescence measurements use excitation and emission values that vary depending on the dye used. The concentration of unknown samples is calculated by the fluorometer (e.g., Quantus™ or Qubit™) based on a comparison to a standard measurement from DNA of a known concentration (usually lambda bacteriophage DNA). Since the dyes are sensitive to light and degrade rapidly in its presence, sample tubes must be stored in the dark if readings are not taken immediately after their preparation in the buffer.
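The comparison to a standard can be illustrated with a simple two-point linear calibration. This is only a sketch of the principle: commercial fluorometers apply their own calibration routines, and the function name, the relative fluorescence unit (RFU) readings, and the standard concentration below are all hypothetical.

```python
def concentration_from_fluorescence(sample_rfu: float,
                                    blank_rfu: float,
                                    standard_rfu: float,
                                    standard_ng_per_ul: float,
                                    dilution_factor: float = 1.0) -> float:
    """Estimate DNA concentration (ng/µl) by linear interpolation.

    blank_rfu is the reading of a DNA-free standard, standard_rfu the reading
    of a standard of known concentration; a linear response is assumed.
    """
    if standard_rfu <= blank_rfu:
        raise ValueError("standard must fluoresce more strongly than the blank")
    slope = standard_ng_per_ul / (standard_rfu - blank_rfu)
    return (sample_rfu - blank_rfu) * slope * dilution_factor


# Made-up readings: a sample halfway between the blank and a 10 ng/µl standard.
estimate = concentration_from_fluorescence(
    sample_rfu=5050.0, blank_rfu=50.0,
    standard_rfu=10050.0, standard_ng_per_ul=10.0,
)
print(f"{estimate:.2f} ng/µl")
```

The `dilution_factor` accounts for the dilution of the sample in the dye buffer when the standard concentration is expressed for the undiluted stock.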

DNA purity - absorbance spectroscopy

A rough estimate of DNA yield and a more useful estimate of DNA purity can be measured via absorbance with a spectrophotometer that emits UV light through a UV-transparent cuvette containing the sample. Absorbance readings are conducted at 260 nm (A260), the wavelength of maximum absorption for DNA. The A260 measurement is then adjusted for turbidity (measured by absorbance at 320 nm), multiplied by the dilution factor, and calibrated using the following conversion factor: A260 of 1.0 = 50 µg/ml pure dsDNA. This useful relationship between light absorption and DNA concentration can be defined according to the Beer-Lambert law. Total yield is obtained by multiplying the DNA concentration by the final total purified sample volume. However, it is key to note that RNA also has maximum absorbance at 260 nm and aromatic amino acids have a maximum absorbance at 280 nm. Both molecules can contribute to the total measured absorbance at 260 nm and thus provide a misleading overestimate of DNA yield.
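The calculation described above can be written out step by step. The sketch below follows the text: the A260 reading is corrected for turbidity at 320 nm, multiplied by the dilution factor, and converted with 50 µg/ml per absorbance unit for pure dsDNA. The function names and example readings are hypothetical.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # A260 of 1.0 = 50 µg/ml pure dsDNA


def dsdna_concentration_ug_per_ml(a260: float,
                                  a320: float = 0.0,
                                  dilution_factor: float = 1.0) -> float:
    """dsDNA concentration from absorbance, corrected for turbidity."""
    return (a260 - a320) * dilution_factor * DSDNA_UG_PER_ML_PER_A260


def total_yield_ug(concentration_ug_per_ml: float, sample_volume_ml: float) -> float:
    """Total yield: concentration times the final purified sample volume."""
    return concentration_ug_per_ml * sample_volume_ml


# Made-up readings for a sample diluted tenfold before measurement.
conc = dsdna_concentration_ug_per_ml(a260=0.65, a320=0.05, dilution_factor=10.0)
print(f"{conc:.1f} µg/ml")
print(f"{total_yield_ug(conc, sample_volume_ml=0.1):.1f} µg")
```

As noted in the text, RNA and proteins also absorb near 260 nm, so the result is an upper estimate unless the sample is known to be free of them.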

DNA purity is evaluated by measuring absorbance in the 230–320 nm range. Since proteins are the contaminant of primary concern, absorbance at 260 nm divided by absorbance at 280 nm is the standard metric. DNA can be considered of high quality and suitable for most genomic applications when it has an A260/A280 ratio of 1.7–2.0. As a further step, the ratio of absorbance at 260 nm to that at 230 nm can help evaluate the level of salt carryover in the purified DNA, where an A260/A230 of > 1.5 is considered to be of good quality. Strong absorbance at around 230 nm, which would lower this ratio, suggests the presence of organic compounds or chaotropic salts.

Instruments such as the NanoDrop® 2000 spectrophotometer are highly accurate for evaluating the A260/A280 and A260/A230 ratios. Absorbance-based quantification of DNA concentration is not as accurate as fluorescence quantitation, but this method is the most suitable where information on DNA purity is sought and is also time efficient (the sample is loaded directly into the machine and requires no preparation of buffers).
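The two purity ratios and their thresholds can be checked programmatically when many samples are processed. The sketch below uses only the cut-offs stated in the text (A260/A280 of 1.7–2.0 and A260/A230 above 1.5); the function name, the dictionary keys, and the example readings are hypothetical, and no turbidity correction is applied.

```python
def purity_report(a230: float, a260: float, a280: float) -> dict:
    """Compute A260/A280 and A260/A230 and compare them to the stated cut-offs."""
    if a230 <= 0 or a280 <= 0:
        raise ValueError("absorbance at 230 nm and 280 nm must be positive")
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230
    return {
        "A260/A280": ratio_280,
        "A260/A230": ratio_230,
        "protein_ok": 1.7 <= ratio_280 <= 2.0,  # low values suggest protein carryover
        "salt_ok": ratio_230 > 1.5,             # low values suggest salts or organics
    }


# Made-up readings for a clean extract.
report = purity_report(a230=0.40, a260=1.00, a280=0.54)
print(f"A260/A280 = {report['A260/A280']:.2f}, protein_ok = {report['protein_ok']}")
print(f"A260/A230 = {report['A260/A230']:.2f}, salt_ok = {report['salt_ok']}")
```

A sample that fails either check is a candidate for one of the clean-up steps described earlier, such as a silica column or SPRI bead purification.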

Approaches to challenging DNA extractions

Particularly challenging types of plant tissue, as well as degraded plant material, can still yield high-quality DNA if suitably optimised protocols are followed.

For instance, seeds can be a good source of DNA if specialised protocols are used (Sudan et al. 2017). Similar to other plant tissues, seeds require different collection and storage techniques depending on their morphology. Dry seeds can usually be collected and stored for long periods without treatment before being ground and used in a DNA extraction protocol. Soft seeds, in comparison, may need to be flash frozen using liquid nitrogen and cryopreserved prior to DNA extraction. The watery components of fleshy or succulent plant tissues require modified approaches to speed up drying before extraction and to remove polysaccharide contaminants from the DNA extract (Larridon et al. 2015; Malakasi et al. 2019).

Figure 1. Chapter 1 Infographic: Visual representation of the content of this chapter.

Advances in the sensitivity of genomic sequencing and optimised DNA extraction methods make it possible to study herbarium and other dried botanical specimens (Bieker and Martin 2018; Brewer et al. 2019; Grace et al. 2021; Malakasi et al. 2019; Särkinen et al. 2012). However, using this material involves mining irreplaceable reservoirs of biological and cultural heritage (Austin et al. 2019; Freedman et al. 2018). Sampling should be restricted to the minimum size expected to yield sufficient DNA for the project and the decision on which part of the specimen to sample should be made in consultation with a collection manager or specialist (see Chapter 2 DNA from museum collections). Novel techniques have been developed for minimally destructive sampling of herbarium specimens (Shepherd 2017; Sugita et al. 2020), but these are not universally applicable. Archaeological and museum collections present similar challenges and sometimes even more sensitive decisions. Archaeological plant material can include plant micro- and macrofossils (reviewed in Kistler et al. 2020), such as grape pips (Wales et al. 2016), or even archaeological artefacts, e.g., a palm-leaf jar stopper (Pérez-Escobar et al. 2021) or paper-mulberry tapa (Peña-Ahumada et al. 2020).

At the lab bench, two key obstacles should be considered: contamination and degradation of the DNA. Whereas contamination is a crucial consideration throughout the process of DNA extraction, the physical fragmentation of the DNA requires the most consideration during experimental design and downstream HTS laboratory work, since herbarium and museum samples are treated with an overall more sensitive set of protocols than standard plant tissue samples (see Chapter 2 DNA from museum collections).

Any small amount of accidentally introduced nucleic acid contamination will hitchhike alongside the (most likely) degraded DNA present in the sample of interest throughout the purification procedures and has a high likelihood of being preferentially amplified. Crucially, a laboratory’s DNA extraction area should not overlap with any area where PCR amplicons are generated. Finally, a simple way to test for persistent contamination is to include extraction blanks.

Physical and chemical degradation is to be expected in herbarium and museum specimens; DNA in dead tissue breaks down over time. The rate of physical fragmentation is related to temperature and other environmental variables, as well as the composition of the plant tissue itself. In a study of herbarium specimens, fragment length was shown to regress significantly against sample age going back 300 years (Weiß et al. 2016), a proxy which can be exploited as a useful starting point for making DNA quality-based lab work decisions. This is more likely to hold true within a plant clade (e.g., plant family) and with a consistent method of sample preparation. However, the relationship of increasing fragmentation with sample age is not always linear. Fixation of the plant material for accessioning in the herbarium is often the single most damaging process (Staats et al. 2011).

The CTAB extraction protocol is usually preferable for extracting fragmented DNA, as it generally gives higher yields of DNA than kit-based methods. Where fragments are predicted to be very short, a high-volume chaotropic salt used as a binding buffer in the latter stage of extraction can improve the recovery of DNA molecules (Dabney et al. 2013). Alternatively, the ratio of SPRI beads to DNA during the clean-up step can be increased to retrieve more of the shorter DNA molecules. Cytosine deamination, a hallmark of chemical DNA degradation, can be addressed in downstream steps by using repair enzymes in DNA library preparation and appropriate bioinformatic treatment (Kistler et al. 2020).

Concluding remarks

A wide variety of DNA extraction protocols are available in the literature. The structural, biochemical, and genomic characteristics of plants present a particular set of challenges; isolating high purity, undamaged DNA from plant tissue is non-trivial and requires a careful and patient approach in the laboratory. Therefore, researchers must often optimise a chosen protocol for their specific experiment. Success in the primary step of a molecular workflow is crucial, unlocking the downstream steps of plant molecular identification and characterisation, and hence possibilities for addressing many exciting questions in molecular and evolutionary biology.


  1. For each of the DNA-containing compartments in a plant cell, which of its characteristics deserve most consideration during DNA extraction and analysis, and why?
  2. Describe the main compound classes from plant extracts that need to be removed from DNA extracts for downstream analysis. How can they be removed?
  3. Describe the main difference between DNA extraction using the CTAB protocol and using a column-based extraction kit. What are the advantages and disadvantages of both?


Absorbance – A measure of the quantity of light absorbed by a sample, also referred to as optical density, measured using an absorbance spectrophotometer.

Beer-Lambert law – For a material through which light is travelling, the absorbance of the light is directly proportional to both the path length of the light and the concentration of the sample.

Chaotropic agent – A chemical substance which in an aqueous solution destroys the hydrogen bonds between water molecules (e.g., guanidine hydrochloride).

Cryopreservation – A preservation treatment for biological material, which involves cooling to very low temperatures (–80 °C or below, or –196 °C using e.g., liquid nitrogen).

Desiccant – A substance with a high affinity for water, such that it attracts moisture from surrounding materials, resulting in a state of dryness in its vicinity (e.g., silica gel).

DNA integrity – The level of fragmentation of extracted DNA, where minimal fragmentation of the original chromosomes equates to high DNA integrity.

Intercalating dye – A dye, whose molecular components stack between two bases of DNA, which is invaluable for DNA visualisation, yet at the same time implies a hazard for human health and demands laboratory safety considerations.

Lysate – A commonly fluid mixture of cellular contents that is the result of the disruption of cell walls and membranes via cell lysis.

Molecular marker (in a genetic context) – A sequence of DNA, which can be a single base pair, a gene, or repetitive sequence, with a known location in the genome, which tends to exhibit variation amongst individuals or taxa, such that it has useful research applications.

Organellar genome – The genetic material present in a plastid or mitochondrion, typically in the form of a small and circular genome and often in multiple copies within each organelle. These are thought to be present in eukaryotic cells as a result of endosymbiosis.

Plastome – The total genetic information contained by the plastid (e.g., chloroplast) of a plant cell.


  • Agbagwa IO, Datta S, Patil PG, Singh P, Nadarajan N (2012) A protocol for high-quality genomic DNA extraction from legumes. Genet. Mol. Res. 11, 4632–4639. doi: 10.4238/2012.1
  • Austin RM, Sholts SB, Williams L, Kistler L, Hofman CA (2019) Opinion: To curate the molecular past, museums need a carefully considered set of best practices. Proc Natl Acad Sci USA 116, 1471–1474. doi: 10.1073/pnas.1822038116
  • Avery OT, Macleod CM, McCarty M (1944) Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Inductions of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J. Exp. Med. 79, 137–158. doi: 10.1084/jem.79.2.137
  • Bieker VC, Martin MD (2018) Implications and future prospects for evolutionary analyses of DNA in historical herbarium collections. Botany Letters 165, 1–10. doi: 10.1080/23818107.2018.1458651
  • Brewer GE, Clarkson JJ, Maurin O, Zuntini AR, Barber V, Bellot S, Biggs N, Cowan RS, Davies NMJ, Dodsworth S, Edwards SL, Eiserhardt WL, Epitawalage N, Frisby S, Grall A, Kersey PJ, Pokorny L, Leitch IJ, Forest F, Baker WJ (2019) Factors affecting targeted sequencing of 353 nuclear genes from herbarium specimens spanning the diversity of angiosperms. Front. Plant Sci. 10, 1102. doi: 10.3389/fpls.2019.01102
  • CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106, 12794–12797. doi: 10.1073/pnas.0905845106
  • Chargaff E (1950) Chemical specificity of nucleic acids and mechanism of their enzymatic degradation. Experientia 6, 201–209. doi: 10.1007/BF02173653
  • Chase MW, Hills HH (1991) Silica gel: An ideal material for field preservation of leaf samples for DNA studies. Taxon 40, 215. doi: 10.2307/1222975
  • Dabney J, Meyer M, Pääbo S (2013) Ancient DNA damage. Cold Spring Harb. Perspect. Biol. 5. doi: 10.1101/cshperspect.a012567
  • Dahm R (2005) Friedrich Miescher and the discovery of DNA. Dev. Biol. 278, 274–288. doi: 10.1016/j.ydbio.2004.11.028
  • Dellaporta SL, Wood J, Hicks JB (1983) A plant DNA minipreparation: Version II. Plant Mol Biol Rep 1, 19–21.
  • Dodsworth S, Pokorny L, Johnson MG, Kim JT, Maurin O, Wickett NJ, Forest F, Baker WJ (2019) Hyb-Seq for flowering plant systematics. Trends Plant Sci. 24, 887–891. doi: 10.1016/j.tplants.2019.07.011
  • Dodsworth S (2015) Genome skimming for next-generation biodiversity analysis. Trends Plant Sci. 20, 525–527. doi: 10.1016/j.tplants.2015.06.012
  • Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bulletin 19, 11–15.
  • Doyle JJ, Doyle JL (1990) A rapid total DNA preparation procedure for fresh plant tissue. Focus 12, 13–15.
  • Doyle J (1991) DNA protocols for plants, in: Hewitt, G.M., Johnston, A.W.B., Young, J.P.W. (Eds.), Molecular Techniques in Taxonomy. Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 283–293. doi: 10.1007/978-3-642-83962-7_18
  • Doyle K (1996) The source of discovery: protocols and applications guide.
  • Drábková L, Kirschner J, Vlček Č (2002) Comparison of seven DNA extraction and amplification protocols in historical herbarium specimens of Juncaceae. Plant Mol. Biol. Rep. 20, 161–175. doi: 10.1007/BF02799431
  • Esser K-H, Marx WH, Lisowsky T (2006) maxXbond: first regeneration system for DNA binding silica matrices. Nat. Methods 3. doi: 10.1038/nmeth845
  • Fleischmann A, Michael TP, Rivadavia F, Sousa A, Wang W, Temsch EM, Greilhuber J, Müller KF, Heubl G (2014) Evolution of genome size and chromosome number in the carnivorous plant genus Genlisea (Lentibulariaceae), with a new estimate of the minimum genome size in angiosperms. Ann. Bot. 114, 1651–1663. doi: 10.1093/aob/mcu189
  • Franklin RE, Gosling RG (1953) Evidence for 2-chain helix in crystalline structure of sodium deoxyribonucleate. Nature 172, 156–157. doi: 10.1038/172156a0
  • Freedman J, Dorp LB, Brace S (2018) Destructive sampling natural science collections: an overview for museum professionals and researchers. Journal of Natural Science Collections 5, 21–34.
  • Gemeinholzer B, Rey I, Weising K, Grundman M, Muellner AN, Zetzsche H, Droege G, Seberg O, Petersen G, Rawson D, Weigt L (2010) Organizing specimen and tissue preservation in the field for subsequent molecular analyses, in: Eymann, J., Degreef, J., Hauser, C., Monje, J.C., Samyn, Y., VandenSpiegel, D. (Eds.), Manual on Field Recording Techniques and Protocols for All Taxa Biodiversity Inventories. Edgewater: ABCTaxa.
  • Grace OM, Pérez-Escobar OA, Lucas EJ, Vorontsova MS, Lewis GP, Walker BE, Lohmann LG, Knapp S, Wilkie P, Sarkinen T, Darbyshire I, Lughadha EN, Monro A, Woudstra Y, Demissew S, Muasya AM, Díaz S, Baker WJ, Antonelli A (2021) Botanical monography in the anthropocene. Trends Plant Sci. 26, 433–441. doi: 10.1016/j.tplants.2020.12.018
  • Gualberto JM, Mileshina D, Wallet C, Niazi AK, Weber-Lotfi F, Dietrich A (2014) The plant mitochondrial genome: dynamics and maintenance. Biochimie 100, 107–120. doi: 10.1016/j.biochi.2013.09.016
  • Guinn G (1966) Extraction of nucleic acids from lyophilized plant material. Plant Physiol. 41, 689–695. doi: 10.1104/pp.41.4.689
  • Guo Y, Yang G, Chen Y, Li D, Guo Z (2018) A comparison of different methods for preserving plant molecular materials and the effect of degraded DNA on ddRAD sequencing. Plant Diversity 40, 106–116. doi: 10.1016/j.pld.2018.04.001
  • Hawkins TL, O’Connor-Morin T, Roy A, Santillan C (1994) DNA purification and isolation using a solid-phase. Nucleic Acids Res. 22, 4543–4544. doi: 10.1093/nar/22.21.4543
  • Heikrujam J, Kishor R, Behari Mazumder P (2020) The chemistry behind plant DNA isolation protocols, in: Boldura, O.-M., Baltă, C., Sayed Awwad, N. (Eds.), Biochemical Analysis Tools - Methods for Bio-Molecules Studies. IntechOpen. doi: 10.5772/intechopen.92206
  • Inglis PW, Pappas M de CR, Resende LV, Grattapaglia D (2018) Fast and inexpensive protocols for consistent extraction of high quality DNA and RNA from challenging plant and fungal samples for high-throughput SNP genotyping and sequencing applications. PLoS ONE 13, e0206085. doi: 10.1371/journal.pone.0206085
  • Kasajima I, Sasaki K, Tanaka Y, Terakawa T, Ohtsubo N (2013) Large-scale extraction of pure DNA from mature leaves of Cyclamen persicum Mill. and other recalcitrant plants with alkaline polyvinylpolypyrrolidone (PVPP). Sci. Hortic. 164, 65–72. doi: 10.1016/j.scienta.2013.09.011
  • Kasajima I (2018) Successful tips of DNA extraction and PCR of plants for beginners. Trends in Res 1. doi: 10.15761/TR.1000115
  • Kistler L, Bieker VC, Martin MD, Pedersen MW, Ramos Madrigal J, Wales N (2020) Ancient plant genomics in archaeology, herbaria, and the environment. Annu. Rev. Plant Biol. 71, 605–629. doi: 10.1146/annurev-arplant-081519-035837
  • Kubo T, Newton KJ (2008) Angiosperm mitochondrial genomes and mutations. Mitochondrion 8, 5–14. doi: 10.1016/j.mito.2007.10.006
  • Larridon I, Walter HE, Guerrero PC, Duarte M, Cisternas MA, Hernández CP, Bauters K, Asselman P, Goetghebeur P, Samain M-S (2015) An integrative approach to understanding the evolution and diversity of Copiapoa (Cactaceae), a threatened endemic Chilean genus from the Atacama Desert. Am. J. Bot. 102, 1506–1520. doi: 10.3732/ajb.1500168
  • Lodhi MA, Ye G-N, Weeden NF, Reisch BI (1994) A simple and efficient method for DNA extraction from grapevine cultivars and Vitis species. Plant Mol. Biol. Rep. 12, 6–13. doi: 10.1007/BF02668658
  • Malakasi P, Bellot S, Dee R, Grace OM (2019) Museomics clarifies the classification of aloidendron (asphodelaceae), the iconic African tree aloes. Front. Plant Sci. 10, 1227. doi: 10.3389/fpls.2019.01227
  • Mavrodiev EV, Dervinis C, Whitten WM, Gitzendanner MA, Kirst M, Kim S, Kinser TJ, Soltis PS, Soltis DE (2021) A new, simple, highly scalable, and efficient protocol for genomic DNA extraction from diverse plant taxa. Appl. Plant Sci. 9, e11413. doi: 10.1002/aps3.11413
  • Mogg RJ, Bond JM (2003) A cheap, reliable and rapid method of extracting high-quality DNA from plants. Mol. Ecol. Notes 3, 666–668. doi: 10.1046/j.1471-8286.2003.00548.x
  • Murray MG, Thompson WF (1980) Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8, 4321–4325. doi: 10.1093/nar/8.19.4321
  • Nagy ZT (2010) A hands-on overview of tissue preservation methods for molecular genetic analyses. Org. Divers. Evol. 10, 91–105. doi: 10.1007/s13127-010-0012-4
  • Neale DB, Zimin AV, Zaman S, Scott AD, Shrestha B, Workman RE, Puiu D, Allen BJ, Moore ZJ, Sekhwal MK, De La Torre AR, McGuire PE, Burns E, Timp W, Wegrzyn JL, Salzberg SL (2022) Assembled and annotated 26.5 Gbp coast redwood genome: a resource for estimating evolutionary adaptive potential and investigating hexaploid origin. G3 (Bethesda) 12. doi: 10.1093/g3journal/jkab380
  • Neubig KM, Whitten WM, Abbott JR, Elliott S, Soltis DE, Soltis PS (2014) Variables affecting DNA preservation in archival plant specimens, in: Applequist, W.L., Campbell, L.M. (Eds.), DNA Banking for the 21st Century: Proceedings of the US Workshop on DNA Banking. Presented at the Proceedings of the U.S. Workshop on DNA Banking, St. Louis, Missouri Botanical Garden, pp. 81–112.
  • Nunes CF, Ferreira JL, Fernandes MCN, Breves S de S, Generoso AL, Soares BDF, Dias MSC, Pasqual M, Borem A, Cançado GM de A (2011) An improved method for genomic DNA extraction from strawberry leaves. Cienc. Rural 41, 1383–1389. doi: 10.1590/S0103-84782011000800014
  • Pellicer J, Fay M, Leitch I (2010) The largest eukaryotic genome of them all? Bot. J. Linn. Soc. 164, 10–15. doi: 10.1111/j.1095-8339.2010.01072.x
  • Pellicer J, Hidalgo O, Dodsworth S, Leitch IJ (2018) Genome size diversity and its impact on the evolution of land plants. Genes (Basel) 9. doi: 10.3390/genes9020088
  • Peña-Ahumada B, Saldarriaga-Córdoba M, Kardailsky O, Moncada X, Moraga M, Matisoo-Smith E, Seelenfreund D, Seelenfreund A (2020) A tale of textiles: Genetic characterization of historical paper mulberry barkcloth from Oceania. PLoS ONE 15, e0233113. doi: 10.1371/journal.pone.0233113
  • Pérez-Escobar OA, Bellot S, Przelomska NAS, Flowers JM, Nesbitt M, Ryan P, Gutaker RM, Gros-Balthazard M, Wells T, Kuhnhäuser BG, Schley R, Bogarín D, Dodsworth S, Diaz R, Lehmann M, Petoe P, Eiserhardt WL, Preick M, Hofreiter M, Hajdas I, Baker WJ (2021) Molecular clocks and archeogenomics of a late period Egyptian date palm leaf reveal introgression from wild relatives and add timestamps on the domestication. Mol. Biol. Evol. 38, 4475–4492. doi: 10.1093/molbev/msab188
  • Petit RJ, Vendramin GG (2007) Plant phylogeography based on organelle genes: an introduction, in: Weiss, S., Ferrand, N. (Eds.), Phylogeography of Southern European Refugia. Springer Netherlands, Dordrecht, pp. 23–97. doi: 10.1007/1-4020-4904-8_2
  • Pirttilä AM, Hirsikorpi M, Kämäräinen T, Jaakola L, Hohtola A (2001) DNA isolation methods for medicinal and aromatic plants. Plant Mol. Biol. Rep. 19, 273–273. doi: 10.1007/BF02772901
  • Porebski S, Bailey LG, Baum BR (1997) Modification of a CTAB DNA extraction protocol for plants containing high polysaccharide and polyphenol components. Plant Mol. Biol. Rep. 15, 8–15. doi: 10.1007/BF02772108
  • Putnam NH, O’Connell BL, Stites JC, Rice BJ, Blanchette M, Calef R, Troll CJ, Fields A, Hartley PD, Sugnet CW, Haussler D, Rokhsar DS, Green RE (2016) Chromosome-scale shotgun assembly using an in vitro method for long-range linkage. Genome Res. 26, 342–350. doi: 10.1101/gr.193474.115
  • Rachmayanti Y, Leinemann L, Gailing O, Finkeldey R (2006) Extraction, amplification and characterization of wood DNA from dipterocarpaceae. Plant Mol. Biol. Rep. 24, 45–55. doi: 10.1007/BF02914045
  • Rogers SO, Bendich AJ (1989) Extraction of DNA from plant tissues, in: Gelvin, S.B., Schilperoort, R.A., Verma, D.P.S. (Eds.), Plant Molecular Biology Manual. Springer Netherlands, Dordrecht, pp. 73–83. doi: 10.1007/978-94-009-0951-9_6
  • Saghai-Maroof MA, Soliman KM, Jorgensen RA, Allard RW (1984) Ribosomal DNA spacer-length polymorphisms in barley: mendelian inheritance, chromosomal location, and population dynamics. Proc Natl Acad Sci USA 81, 8014–8018. doi: 10.1073/pnas.81.24.8014
  • Särkinen T, Staats M, Richardson JE, Cowan RS, Bakker FT (2012) How to open the treasure chest? Optimising DNA extraction from herbarium specimens. PLoS ONE 7, e43808. doi: 10.1371/journal.pone.0043808
  • Savolainen V, Cuénoud P, Spichiger R, Martinez MDP, Crèvecoeur M, Manen J-F (1995) The use of herbarium specimens in DNA phylogenetics: Evaluation and improvement. Plant Syst. Evol. 197, 87–98. doi: 10.1007/BF00984634
  • Scott KD, Playford J (1996) DNA extraction technique for PCR in rain forest plant species. BioTechniques 20, 974, 977, 979. doi: 10.2144/96206bm07
  • Sharma KK, Lavanya M, Anjaiah V (2000) A method for isolation and purification of peanut genomic DNA suitable for analytical applications. Plant Mol. Biol. Rep. 18, 393–393. doi: 10.1007/BF02825068
  • Shepherd LD, McLay TGB (2011) Two micro-scale protocols for the isolation of DNA from polysaccharide-rich plant tissue. J. Plant Res. 124, 311–314. doi: 10.1007/s10265-010-0379-5
  • Shepherd LD (2017) A non-destructive DNA sampling technique for herbarium specimens. PLoS ONE 12, e0183555. doi: 10.1371/journal.pone.0183555
  • Shepherd M, Cross M, Stokoe RL, Scott LJ, Jones ME (2002) High-throughput DNA extraction from forest trees. Plant Mol. Biol. Rep. 20, 425–425. doi: 10.1007/BF02772134
  • Siegel CS, Stevenson FO, Zimmer EA (2017) Evaluation and comparison of FTA card and CTAB DNA extraction methods for non-agricultural taxa. Appl. Plant Sci. 5. doi: 10.3732/apps.1600109
  • Staats M, Cuenca A, Richardson JE, Vrielink-van Ginkel R, Petersen G, Seberg O, Bakker FT (2011) DNA damage in plant herbarium tissue. PLoS ONE 6, e28448. doi: 10.1371/journal.pone.0028448
  • Sudan J, Raina M, Singh R, Mustafiz A, Kumari S (2017) A modified protocol for high-quality DNA extraction from seeds rich in secondary compounds. Journal of Crop Improvement 31, 1–11. doi: 10.1080/15427528.2017.1345028
  • Sugita N, Ebihara A, Hosoya T, Jinbo U, Kaneko S, Kurosawa T, Nakae M, Yukawa T (2020) Non-destructive DNA extraction from herbarium specimens: a method particularly suitable for plants with small and fragile leaves. J. Plant Res. 133, 133–141. doi: 10.1007/s10265-019-01152-4
  • Thomson J (2002) An improved non-cryogenic transport and storage preservative facilitating DNA extraction from “difficult” plants collected at remote sites. Telopea 9, 755–760. doi: 10.7751/telopea20024013
  • Till BJ, Jankowicz-Cieslak J, Huynh OA, Beshir MM, Laport RG, Hofinger BJ (2015) Sample collection and storage, in: Low-Cost Methods for Molecular Characterization of Mutant Plants. Springer International Publishing, Cham, pp. 9–11. doi: 10.1007/978-3-319-16259-1_3
  • Twyford AD, Ness RW (2017) Strategies for complete plastid genome sequencing. Mol. Ecol. Resour. 17, 858–868. doi: 10.1111/1755-0998.12626
  • Varma A, Padh H, Shrivastava N (2007) Plant genomic DNA isolation: an art or a science. Biotechnol. J. 2, 386–392. doi: 10.1002/biot.200600195
  • Wales N, Ramos Madrigal J, Cappellini E, Carmona Baez A, Samaniego Castruita JA, Romero-Navarro JA, Carøe C, Ávila-Arcos MC, Peñaloza F, Moreno-Mayar JV, Gasparyan B, Zardaryan D, Bagoyan T, Smith A, Pinhasi R, Bosi G, Fiorentino G, Grasso AM, Celant A, Bar-Oz G, Gilbert MTP (2016) The limits and potential of paleogenomic techniques for reconstructing grapevine domestication. J. Archaeol. Sci. 72, 57–70. doi: 10.1016/j.jas.2016.05.014
  • Watson JD, Crick FH (1953) Molecular structure of nucleic acids; a structure for deoxyribose nucleic acid. Nature 171, 737–738. doi: 10.1038/171737a0
  • Weiß CL, Schuenemann VJ, Devos J, Shirsekar G, Reiter E, Gould BA, Stinchcombe JR, Krause J, Burbano HA (2016) Temporal patterns of damage and decay kinetics of DNA retrieved from plant herbarium specimens. R. Soc. Open Sci. 3, 160239. doi: 10.1098/rsos.160239
  • Xin Z, Chen J (2012) A high throughput DNA extraction method with high yield and quality. Plant Methods 8, 26. doi: 10.1186/1746-4811-8-26


  1. The nuclear genome of plants is hugely variable in size. To maximise retrieval of intact DNA for species with larger genomes, a higher DNA yield should be aimed for. This could affect decisions regarding input material and the number of total DNA extractions carried out per sample. The plastid genome is present in high copy numbers in plant cells, as well as being a useful unit for addressing a variety of biological questions. Therefore, it is ideal for genome skimming experiments and a valuable target in degraded material, where the (single copy) nuclear genome might be highly fragmented. The mitochondrial genome of plants is characterised by high plasticity in its genomic structure and therefore is not recommended for plant identification.
  2. Problematic biomolecules in plant extracts include polyphenols, tannins, and polysaccharides. These interfere with DNA extraction buffers (such as CTAB) as well as with other buffers and enzymes used in downstream DNA analysis. They are removed from the solution either by SEVAG cleaning (in the CTAB protocol) or by column cleaning or magnetic particles (in commercial kits). Polysaccharides can also be removed from the crude plant tissue prior to extraction using STE buffer. Phenolic compounds can often be removed using β-mercaptoethanol and/or PVP. Further impurities such as secondary metabolic compounds that may interfere with enzymes in downstream protocols can often be removed using a SPRI bead clean-up protocol.
  3. The CTAB protocol uses specific buffers (such as SEVAG) and DNA precipitation (involving isopropanol) to separate non-DNA and DNA biomolecules, whereas extraction kits rely on DNA-binding columns or magnetic particles. Although the kits are much more expensive on a per-sample basis, they generally yield clean DNA with a short turnaround time (up to 6 hours). CTAB extractions are very cheap and highly scalable as they do not rely on specifically manufactured columns or magnetic particles. However, the protocol takes at least two full days to progress from plant tissue to DNA extract, and co-precipitation of non-DNA biomolecules is often observed, which reduces the purity of the final DNA extract. Sometimes, substantial yield losses are observed using extraction kits and this can be a key consideration when dealing with precious samples.

Chapter 2 DNA from museum collections


Museum collections of plant origin include herbaria (pressed plants), xylaria (woods), and economic botany (useful plant) specimens. They are not only places of history and display, but also of research, and contain rich repositories of molecules, including DNA. Such DNA, retrieved from historical or ancient tissue, carries unique degradation characteristics and regardless of its age is known as ancient DNA (aDNA). Research into aDNA has developed rapidly in the last decade as a result of an improved understanding of its biochemical properties, the development of specific laboratory protocols for its isolation, and better bioinformatic tools. Why are museum collections useful sources of aDNA? We identify three main reasons: 1) specimens can play a key role in taxonomic and macroevolutionary inference when it is difficult to sample living material, for example, by giving us snapshots of extinct taxa (Van de Paer et al. 2016); 2) specimens allow the accurate identification of material that was the object of debate or scientific mystery, as exemplified by misidentified type specimens of the watermelon’s progenitor (Chomicki and Renner 2015); 3) specimens can provide us with ‘time machines’ to study microevolutionary processes and diversity changes over decades- to millennia-long timeframes (Gutaker and Burbano 2017; Pont et al. 2019). In all three cases, specimens are often associated with evidence of their occurrence in space and time. For further examples see Chapter 20 Museomics, and the Glossary.

However, extracting DNA does mean the destruction of a part of the specimen. Museum curators therefore face challenges in balancing the conservation of specimens for future research with the rising demand for aDNA analysis. Increasingly, curators are also considering legal and ethical issues in sampling (Austin et al. 2019; Pálsdóttir et al. 2019). Close collaboration between the aDNA researcher and the curatorial staff of museums is therefore essential for appropriate management of these issues (Freedman et al. 2018).

With few exceptions, plant material found in museums originally grew on lands tended or owned by people for many millennia (Ellis et al. 2021). Some specimens, such as artefacts or seeds of domesticated crops have an even more direct connection to human activities. Plant specimens, along with other living things, are therefore not simply assemblages of chemical compounds such as DNA, but also embody spiritual beliefs, diverse forms of ownership, traditional knowledge, and past histories of colonialism and other forms of harm (Anderson et al. 2011; Das and Lowe 2018; Pungetti et al. 2012). The implications of this are still being worked out in dialogues between museums and affected communities, often within a decolonising framework (McAlvay et al. 2021). There are, however, immediate steps that researchers and curators can take to ensure that the use of specimens is both legal and ethical.

A first consideration is whether the plant species or artefacts (such as baskets or wooden objects) are of special significance (e.g., sacred) to the source community. Examples of sacred material include Banisteriopsis caapi, used to make ayahuasca in South America (Rivier and Lindgren 1972), or Duboisia hopwoodii (pituri), used as tobacco in Australia (Ratsch et al. 2010). An online literature search or consultation with relevant experts will give a rapid pointer, which can be followed up with source communities in the study region. Collaboration with communities and scientists in source countries is essential for acknowledging the rights to plant material (even if not legally enshrined), and can be furthered by publication of results in local languages and media. These communities also hold significant expertise on plants that will improve the quality and relevance of research (Gewin 2021).

There are international conventions that usually apply when accessing, researching, and moving plant material between institutions and countries. Researchers must also be aware of country-specific laws that may require further permits and inspections, e.g., for plants that produce controlled substances, require phytosanitary checks, or are considered invasive species. Legal elements of the Convention on Biological Diversity (CBD), Nagoya Protocol, and Convention on International Trade in Endangered Species (CITES) are covered in Chapter 27 Legislation and policy as well as in other published works (e.g., McManis and Pelletier 2014; Iob and Botigué 2021). While the CBD applies to specimens received by museums from 1992, in ethical terms (and under some implementations of the Nagoya Protocol) its principles, such as benefit-sharing, also apply to pre-1992 specimens (cf. Sherman and Henry 2020).

Sampling museum collections

Locating collections and specimens

Botanical gardens hold living specimens and distribute seeds of these via seed lists (Index Seminum). Their global collections can be searched via PlantSearch, hosted by Botanic Gardens Conservation International. Gene banks hold seeds, and sometimes also tissue and living plants. While they originally focused on crop plants and their wild relatives, many have now broadened in scope to include wild plants, such as Royal Botanic Gardens Kew’s Millennium Seed Bank. Many gene bank collections can be searched via Genesys. Herbaria hold dried plant specimens and can be located via Index Herbariorum. Although many herbaria are incompletely recorded in databases, substantial data can already be found in the Global Biodiversity Information Facility (GBIF) (Bieker and Martin 2018). Plants are present in abundance in almost all forms of human activity, and it is therefore not surprising that plant material can also be found outside the confines of herbaria, including in economic botany or ethnobotany collections (Salick et al. 2014), agricultural museums, and anthropology collections. Increasing awareness of the importance of biological collections, their uses, conservation efforts and crosslinks among them, is leading to important initiatives that integrate all digitised natural science collections from natural history museums, universities, and botanic gardens (Bakker et al. 2020).

There are a number of pitfalls when searching online catalogues. It may be necessary to search for accepted names and common synonyms: the same species may appear under different botanical names in a single collection, and accuracy of specimen identification varies. In general, herbarium specimens are the most reliable, as they bear diagnostic criteria such as flowers on which taxonomists rely. Garden material and seeds are often misidentified, become confused in labelling, or are hybridised during repeated cultivations. Their identifications should be confirmed, for example by growing on the seeds or by using morphological criteria (Nesbitt et al. 2003). Additionally, data in derived databases may be missing, unspecific, or incorrectly transcribed or presented, for example in the case of georeferencing (Maldonado et al. 2015).
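The advice to search under both accepted names and synonyms can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the synonym table, the taxon names, and the catalogue layout (a list of records with `id` and `name` fields) are hypothetical, and a real search would draw its synonyms from a curated taxonomic source.

```python
from typing import Dict, List, Set

# Hypothetical synonym table (accepted name -> known synonyms).
# The names are placeholders, not real taxa.
SYNONYMS: Dict[str, Set[str]] = {
    "Exemplum acceptum": {"Exemplum vetus"},
}


def normalise(name: str) -> str:
    """Lower-case a name and collapse whitespace so trivial differences do not block a match."""
    return " ".join(name.lower().split())


def names_to_search(accepted: str, synonyms: Dict[str, Set[str]]) -> Set[str]:
    """Return the accepted name together with all of its listed synonyms, normalised."""
    listed = synonyms.get(accepted, set())
    return {normalise(accepted)} | {normalise(s) for s in listed}


def find_records(
    accepted: str,
    catalogue: List[Dict[str, str]],
    synonyms: Dict[str, Set[str]],
) -> List[Dict[str, str]]:
    """Return catalogue records filed under the accepted name or any synonym."""
    wanted = names_to_search(accepted, synonyms)
    return [rec for rec in catalogue if normalise(rec["name"]) in wanted]
```

A search of this kind only retrieves candidate records. It does not confirm that the specimens were correctly identified, so the checks described above still apply.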

Researcher-curator collaboration

Research projects will benefit enormously from a close collaboration between researcher and curator. Museums should be approached early during a project, with the researcher providing sufficient detail about its background, aims, methodology, and timetable. Museums are often under-staffed and persistence may be required in making contact. Curators’ expertise will be crucial in identifying the most appropriate specimens for analysis, not only in their institutions, but in others with which they are familiar. The curator will also play a key role in assessing the provenance of specimens, using museum archives, and the implications for any of the ethical and legal issues addressed above. Curators often have good links to source communities and can advise on appropriate procedures.

After preliminary discussions, the researcher will usually need to fill in a ‘destructive sampling’ form. This acts as a permanent record of the justification for sampling, and allows the museum to make a detailed check on the aims and methodology of the project (see for example, British Museum form and policies). Requests that have unclear research aims or which employ inappropriate methodologies are unlikely to be approved. Researchers will likely need to sign a Material Transfer Agreement (MTA) or Material Supply Agreement (MSA) with the museum which sets out their legal responsibilities.

Sampling may be carried out by the researcher or the curator. If feasible, it is worthwhile for the researcher to carry out the sampling, as it allows for the investigation of the context of the specimen and for flexibility in choosing the samples. It may also speed up the process of obtaining samples, especially if a large number is required. It also allows samples to be safely hand-carried to the researcher’s laboratory. Where materials must be sent, it is safest to use a courier service, with specimens marked “Scientific specimens of no commercial value”.

It should be agreed with the museum whether, after sampling, surplus material should be returned or securely retained. Museums can require that they are informed about results and that they check manuscripts before publication. This is in any case good practice to ensure accurate reporting of sample details. Museum policies on co-authorship vary, and this topic should be discussed early. Significant contribution by the curator on the choice of appropriate samples, provenance research, or in technically complex sampling, merits co-authorship. Unless agreed otherwise, DNA sequencing data should be submitted to NCBI GenBank or other public repositories, taking care to give the correct specimen identifier. At a minimum, the museum’s unique catalogue number (if one exists), and the name of the museum should be cited. This allows the DNA sequence data to be linked directly with the specimen or object. Other museum and laboratory information may be included with the DNA sequence data or in publications (e.g., the collector name, collection number, dates, locations, and laboratory extraction numbers). Additionally, most museum collections will require that vouchers are annotated in a way that links them to DNA sequencing data (see below). Some museums have also started to permanently store DNA isolates, and we encourage researchers to share their stocks on request. Integrated data management and accessibility of the raw data and results will ultimately bolster curatorial practices, develop a more ethical science, and safeguard collections for future generations (Schindel and Cook 2018). Useful guidance on documentation issues is available from the Global Genome Biodiversity Network (GGBN).

Choice of specimens and sampling

Sampling decisions will be determined both by the research design and the nature of the specimens, in addition to the legal and ethical factors mentioned above. Changes to agreed sampling lists are often necessary once specimens have been examined, for example when they are lost, in poor condition, inadequately annotated or georeferenced, present in small quantities, or of rare taxa. Bulk raw material is usually easy to sample, while objects are usually not subjected to destructive sampling unless the results will inform the history and significance of the object. For herbarium specimens, preserving the morphological features, especially those that are diagnostic, for future research, is critical. Sampling should be targeted towards tissue types or organs at a given developmental state that are most numerous. For example, if there are many flowers and few leaves, it may be preferable to sample a petal. Or if there are few cauline and many rosette leaves, it may be preferable to sample a rosette leaf.

Different parts of a specimen may yield varying amounts, quality, and types of DNA. Wood, husks, and other tissues that were undergoing senescence at the time of preservation may yield less DNA. Young, immature leaves will have higher cell densities, and therefore are expected to yield more DNA. Seeds are often excellent sources of nuclear DNA, although the genotype of the seed will differ from the parent plant and might be of inconsistent ploidy. It may be necessary to extract DNA from individual seeds or to remove maternal tissue such as the testa. Some herbarium sheets will contain multiple individuals and, in most cases, it is better to sample individuals rather than mixed material. If individuals are pooled for DNA extraction, it may complicate downstream analyses that depend on individual genotypes.

The method of specimen preservation is another consideration for DNA isolation. Desiccation has been shown to preserve plant DNA remarkably well, while charring or ethanol preservation destroys plant DNA almost completely (Forrest et al. 2019; Nistelberger et al. 2016). Although not commonly used for aDNA analysis, ancient waterlogged (saturated with water) specimens have a potential for high endogenous contents as they are usually preserved in cold temperatures (Wagner et al. 2018; Wales et al. 2014).

Before sampling begins, the specimen’s identifying data, such as its herbarium ID, should be recorded with great care, and double-checked on both the sample label and typed list of specimens. Additionally, the museum may require that vouchers are annotated with the sampling date, tissue type, sample identifier, and information about the researchers. The voucher, including any labels, should be photographed, ideally before and after sampling. Digital links between herbarium vouchers, imaging, and DNA sequences are very useful; they can be included in herbarium and nucleotide databases.
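One way to keep these details consistent is to hold them in a single structured record. The Python sketch below is a minimal illustration, not a prescribed format: the field names, the annotation wording, and the example identifiers are assumptions, and each museum will have its own requirements.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict


@dataclass(frozen=True)
class SamplingRecord:
    """Hypothetical record linking a DNA sample to its museum voucher."""
    museum: str
    catalogue_number: str  # the museum's unique identifier, e.g. a herbarium ID
    sample_id: str         # identifier written on the sample tube or envelope
    tissue_type: str
    sampling_date: date
    sampled_by: str

    def voucher_annotation(self) -> str:
        """Text for the annotation slip (wording is an assumption)."""
        return (
            f"DNA sample {self.sample_id}: {self.tissue_type}, "
            f"sampled {self.sampling_date.isoformat()} by {self.sampled_by}"
        )


def matches_typed_list(record: SamplingRecord, typed_list: Dict[str, str]) -> bool:
    """Double-check the sample label against the typed list (sample ID -> catalogue number)."""
    return typed_list.get(record.sample_id) == record.catalogue_number
```

For example, a record built with `SamplingRecord("Example Herbarium", "EX0001234", "S-001", "leaf", date(2024, 1, 15), "A. Researcher")` can be checked against the typed specimen list before the tube is sealed, so that a mismatch is caught while the voucher is still at hand.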

For desiccated leaves, the most commonly sampled tissue, the process is usually straightforward. Using forceps and a scalpel or scissors, one can make a precise cut and remove 1 cm² or less of tissue. Generally, between 2 and 10 mg of dry leaf tissue is sufficient for the isolation of complex mixtures of genomic DNA fragments. It is preferable to target leaves of lesser value, for example those that are damaged, folded, or hidden, while avoiding tissue that may be contaminated by mould, lichen, or fungi. The sampling of detached “pocket” material should be conducted with caution, and only if the researcher and curator are confident that the detached material truly belongs to the voucher. For other tissue types, such as wood, researchers may need to develop tailored sampling methods on contemporary material first. After sampling, material should immediately be sealed in a labelled tube or envelope and packaged for transport.

Surface contamination

Potential contamination of the sample, specimen, or wider collection with exogenous DNA is an important consideration. For most museum collections, there will inevitably already be surface DNA contamination of specimens. Ask the curator about adhesives (e.g., wheat starch) and preservatives that were used with the specimen of interest. Curatorial staff and other users of the collections may not routinely wear gloves or, if they do, may not change them between specimens. In most cases, there is unlikely to be any benefit from the person undertaking sampling wearing protective equipment (e.g., face masks, hair nets) that is beyond that normally used by users of the collection. Contamination control is only as good as the weakest link.

Extra precautions may be taken for equipment that is used directly in the sampling process, for example, disposable scalpels that are changed between samples, or wiping of scalpel blades with bleach and ethanol. This will reduce the risk of cross-contamination between specimens. Further precautions may be beneficial if internal tissue is being sampled (e.g., inside a seed). In these cases, surface decontamination (see section below on pre-processing) followed by sampling with DNA-free equipment and while wearing personal protective equipment may be appropriate. In some cases where specialised equipment such as a microdrill is required, it may be beneficial for sampling to be undertaken within an ancient DNA laboratory, where contamination controls can be better implemented. However, the amount of plant material brought into the laboratory should be limited, as it is an additional source of contamination.

Contamination of specimens and collections by ‘modern’ DNA and especially amplified DNA is perhaps the greatest risk, potentially compromising future research. Researchers are likely to have been using molecular laboratories, and steps should be taken to prevent the inadvertent transfer of modern DNA to museum collections. These precautions can include not visiting a collection directly from a modern laboratory, cleaning items that must move between modern laboratories and collections (e.g., clothes, phones, cameras), and using sampling equipment (scalpels, tubes, pens) that has not been taken from a modern laboratory.

Laboratory work with historical samples

Understanding aDNA traits

Before starting any experiments with historical and ancient plant samples, it is important to recognise the challenges arising from the degraded nature of aDNA. Unlike DNA isolated from fresh samples, DNA from preserved specimens is fragmented, damaged, and contaminated post mortem (Gutaker and Burbano 2017). This holds even for recently collected herbarium specimens (Weiß et al. 2016), which also carry exogenous DNA (Bieker et al. 2020). Fragmentation describes the accumulation of breaks in the DNA backbone, leading to shorter DNA molecules. Breaks occur more often next to guanine or adenine bases, and this can be visualised in sequencing data with dedicated software (Jónsson et al. 2013). The median expected fragment length for aDNA from herbarium specimens is between 30 and 90 base pairs (bp) in unheated recent Arabidopsis extractions (Bakker 2019; Weiß et al. 2016). It is important to recognise that fragments shorter than 35 bp might generate spurious alignments due to microbial mismapping (Prüfer et al. 2010). The short length of aDNA fragments calls for special molecular methods that allow the retention of short molecules, as well as conservative bioinformatic settings during data processing.
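A conservative length setting of this kind can be sketched in a few lines. The example below is an illustration only: it assumes read lengths have already been extracted from the sequencing data, the function name `summarise_lengths` is hypothetical, and the 35 bp cut-off is simply the value mentioned above, not a universal recommendation.

```python
from statistics import median


def summarise_lengths(read_lengths, min_length_bp=35):
    """Summarise fragment lengths before and after a minimum-length filter.

    Reads shorter than min_length_bp are dropped because very short reads
    are the ones most prone to spurious alignment.
    """
    kept = [n for n in read_lengths if n >= min_length_bp]
    return {
        "n_reads": len(read_lengths),
        "n_kept": len(kept),
        "median_all_bp": median(read_lengths) if read_lengths else None,
        "median_kept_bp": median(kept) if kept else None,
    }


# Hypothetical read lengths (bp), for illustration only.
lengths = [28, 33, 41, 52, 60, 75, 90]
print(summarise_lengths(lengths))
```

Reporting the median both before and after filtering shows how strongly the cut-off shifts the length distribution of a given library.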

aDNA is also affected by “damage”, post mortem substitutions that convert cytosine to uracil residues through deamination (uracils are read by uracil-insensitive DNA polymerases as thymine, hence the commonly used term “C-to-T substitutions”) (Hofreiter et al. 2001). This process occurs preferentially at the ends of DNA molecules (Briggs et al. 2007), particularly in single-stranded DNA overhangs (Overballe-Petersen et al. 2012). Consequently, in the population of sequenced molecules, an elevated number of C-to-T substitutions are observed at the 5’ end, and complementary G-to-A substitutions at the 3’ end. Typically, between 1% and 6% (in older samples) of cytosine residues in herbarium-isolated DNA are converted to thymine (Durvasula et al. 2017; Gutaker et al. 2019; Weiß et al. 2016), while in archaeological material this number might be as high as 30%. These post mortem substitutions should be removed before downstream analyses.
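To make the 5’ damage signal concrete, the sketch below counts, for each of the first few read positions, how often a reference cytosine is read as thymine. It is a toy illustration, not a replacement for dedicated software: the input format (gap-free read/reference pairs written 5'->3' in read orientation) and the function name `c_to_t_profile` are assumptions made for this example.

```python
from collections import Counter


def c_to_t_profile(aligned_pairs, n_positions=5):
    """Per-position C-to-T frequency counted from the 5' end of each read.

    aligned_pairs: iterable of (read, reference) strings of equal length.
    Returns a list with one frequency per position, or None where no
    reference cytosine was observed at that position.
    """
    ref_c = Counter()   # reference cytosines seen at each position
    c_to_t = Counter()  # of those, how many the read reports as thymine
    for read, ref in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must have equal length")
        for pos, (r, g) in enumerate(zip(read.upper(), ref.upper())):
            if pos >= n_positions:
                break
            if g == "C":
                ref_c[pos] += 1
                if r == "T":
                    c_to_t[pos] += 1
    return [c_to_t[p] / ref_c[p] if ref_c[p] else None for p in range(n_positions)]


# Hypothetical alignments, for illustration only.
pairs = [("TATG", "CATG"), ("CATG", "CATG"), ("ACGT", "ACGT"), ("ATGT", "ACGT")]
print(c_to_t_profile(pairs, n_positions=4))
```

In authentic aDNA the frequency is expected to be highest at the first position and to fall off towards the interior of the read; the mirror-image G-to-A profile at the 3’ end can be computed in the same way on reversed pairs.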

Finally, it is important to recognize that aDNA from plants is in fact a mixture of bona fide endogenous DNA, exogenous DNA introduced pre mortem (e.g., from endophytic microbes), and exogenous DNA introduced post mortem (e.g., from microbes involved in decomposition, human-associated collection and museum practices; see above) (Pääbo et al. 2004). Contamination is commonly quantified through the endogenous content, calculated by dividing the number of sequence reads that map to the target reference genome by the total number of sequenced reads from the museum sample. In fresh material, the ratio is often around 0.98; in degraded material it can vary from 0 to 0.95 (Gutaker et al. 2017). Several examples of aDNA successfully obtained from plants are illustrated in Table 1.
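The ratio described above is a simple division, sketched here for clarity. The function name and the read counts are hypothetical; in practice the counts would come from the mapping statistics of the chosen pipeline.

```python
def endogenous_fraction(mapped_reads, total_reads):
    """Fraction of sequenced reads that map to the target reference genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mapped_reads <= total_reads:
        raise ValueError("mapped_reads must lie between 0 and total_reads")
    return mapped_reads / total_reads


# Hypothetical read counts (mapped, total), for illustration only.
screening = {"fresh_leaf": (9_800, 10_000), "herbarium_leaf": (6_200, 10_000)}
for name, (mapped, total) in screening.items():
    print(f"{name}: endogenous fraction = {endogenous_fraction(mapped, total):.2f}")
```

Note that the ratio measures the endogenous share directly; the contaminant share is the remainder, which also includes reads that fail to map for other reasons.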

Table 1.

Examples of selected successfully isolated and sequenced DNA from plant material. *BP: before present.

Species | Tissue | Age BP* | Endogenous DNA | Fragment length (bp) | Damage at 5’ end | Source
Thale cress (Arabidopsis thaliana) | Leaf | 184 | 83% | ~62 | 0.026 | Durvasula et al. 2017
Potato (Solanum tuberosum) | Leaf | 361 | 87% | ~45 | 0.047 | Gutaker et al. 2019
Maize (Zea mays) | Cobs | 1863 | 80% | ~52 | 0.052 | Swarts et al. 2017
Wheat (Triticum durum) | Chaff | 3150 | 40% | ~53 | 0.095 | Scott et al. 2019
Barley (Hordeum vulgare) | Seeds | 4988 | 86% | ~49 | 0.138 | Mascher et al. 2016

Given the characteristics of aDNA (Dabney et al. 2013) and the fact that it is very prone to contamination at any stage, guidelines have been proposed to facilitate the authentication process, and minimise potential contamination before, during, and after DNA extraction (Pääbo et al. 2004). We strongly recommend following gold-standard precautions when working with aDNA (Fulton and Shapiro 2019; Latorre et al. 2020).

The isolation and pre-amplification manipulation of aDNA should be carried out in a dedicated laboratory that is physically separated from labs where post-amplification steps are carried out. Ideally the aDNA laboratory should be supplied with HEPA-filtered air under positive pressure. Users should not move from a ‘modern’ laboratory (where amplified DNA is handled) to the aDNA laboratory on the same day. Reagents and materials in an aDNA lab should be DNA-free, disposable where possible, and never taken out of the clean lab. Surfaces should be cleaned before and after every experiment with 3–10% bleach, 70% ethanol, and overnight UV-C irradiation. To minimise contamination and ensure a DNA-free laboratory environment, users should wear full body suits, foot protectors, slippers, facemasks, sleeves, and double gloves (Fulton and Shapiro 2019). Together, these precautions limit cross-contamination from amplified and unamplified DNA.

Material preparation is an essential step before DNA can be isolated. Optional pre-processing of dirty samples can be done by gently cleaning the surface with a very low concentration (~3%) of bleach, and rinsing twice with ddH2O (Cappellini et al. 2010). When handling waterlogged, fragile, or permeable material, avoid using bleach and carry out ddH2O treatment only. To help identify contamination that might be introduced in the laboratory, samples should always be processed alongside negative controls, including for DNA isolation and library preparations. To reduce the likelihood of cross-contamination, small batches of up to 12 samples at a time are preferable (Latorre et al. 2020).
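The batching advice can be sketched as a small planning helper. This is an assumption-laden illustration: it treats the limit of 12 as counting real samples only, adds one extraction blank and one library blank per batch, and uses hypothetical control names; actual laboratory practice may differ.

```python
def plan_batches(sample_ids, max_samples=12):
    """Split samples into batches and append negative controls to each batch.

    Each batch holds at most max_samples real samples plus two blanks:
    one for DNA isolation and one for library preparation.
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    batches = []
    for start in range(0, len(sample_ids), max_samples):
        batch = list(sample_ids[start:start + max_samples])
        batch_no = len(batches) + 1
        batch.append(f"NEG_EXTRACTION_{batch_no:02d}")  # hypothetical control label
        batch.append(f"NEG_LIBRARY_{batch_no:02d}")     # hypothetical control label
        batches.append(batch)
    return batches


# Hypothetical sample identifiers, for illustration only.
samples = [f"S{i:03d}" for i in range(1, 31)]
for batch in plan_batches(samples):
    print(len(batch), batch[-2:])
```

Keeping the blanks tied to a batch number makes it possible to trace any contamination detected later back to the batch in which it arose.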

DNA extraction methods for different tissues should be considered. While plant materials tend to contain inhibitory substances like polyphenols, proteins, and polysaccharides, ancient plant materials can additionally be rich in humic acids and salts. This set of macromolecules might prevent successful DNA amplification (Wales et al. 2014) by affecting polymerase activity (Schrader et al. 2012). To reduce this inhibitory effect, smaller amounts of sample can be extracted in parallel, and the resulting DNA pooled to achieve a sufficient yield (Wagner et al. 2018).

Here we will cover the basics of recovering the highest quality of DNA from ancient plant tissues. Using a two-day extraction protocol will greatly increase the recovery of endogenous DNA. The first day consists of grinding the plant material. Tissue can be disrupted by grinding dry, grinding flash-frozen, or grinding material soaked in lysis buffer. In all cases, grinding to finer particles increases the recovery of aDNA. Ground tissue is incubated in a fresh lysis buffer. Three commonly used buffers are CTAB (Kistler 2012), DTT (Wales and Kistler 2019), and PTB mixtures (Latorre et al. 2020). The second day is dedicated to isolating DNA from the lysate. Initial removal of non-DNA particles can be achieved by centrifugation with a shredding column (Latorre et al. 2020) or with a phenol/chloroform mixture (Kistler 2012; Wales and Kistler 2019; Wagner et al. 2018). In all methods, DNA is then captured on DNA-binding silica columns (for example QIAGEN MinElute columns) and purified (Dabney et al. 2013). Elution from silica columns produces the final, isolated aDNA.

In contrast to primed amplification approaches, even low amounts of isolated DNA can be used for genomic library preparation (Staats et al. 2013), and hence we recommend that a genomic library is constructed using a well-established method (Carøe et al. 2017; Kircher et al. 2012; Meyer and Kircher 2010; Meyer et al. 2012). Quantification of genomic DNA before sequencing using RT-qPCR allows the number of amplification cycles for each sample to be adjusted, in turn allowing the complexity of sequenced DNA fragments to be maximised. Bioinformatic pre-processing is an essential part of aDNA analyses, and is summarised in three available pipelines (Latorre et al. 2020; Peltzer et al. 2016; Schubert et al. 2014). Authentication is another crucial step in bioinformatic analyses that can currently be best achieved with mapDamage software (Jónsson et al. 2013).

Choosing and authenticating aDNA samples

To help decide which sampled material is most promising for further DNA analyses, it is necessary to obtain good estimates for fragmentation, damage, and contamination. This can be achieved by sequencing genomic libraries in low-throughput mode (about 10,000 DNA reads per sample), commonly referred to as “screening”, followed by bioinformatic analyses that produce relevant summary statistics. Promising samples will contain aDNA with a median fragment length over 50 bp and endogenous content over 0.2. For samples of particular interest, the mapping accuracy for short aDNA reads can be improved with specialised procedures (de Filippo et al. 2018), and endogenous content can be increased by targeted enrichment on hybridization arrays (Hodges et al. 2009) or ‘in solution’ (Maricic et al. 2010). Finally, one should pay attention to the frequency of C-to-T substitutions at the ends of the sequenced reads. Samples with 2–6% C-to-Ts can be corrected bioinformatically (by trimming ends or filtering transitions), while a higher percentage of C-to-Ts can be remedied through more effective enzymatic removal of uracil (Briggs et al. 2010).
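The screening criteria above can be expressed as a simple triage rule. The sketch below is one possible reading of those thresholds, not an established tool: the class and function names are hypothetical, the example values are invented, and the advice returned for samples with less than 2% C-to-T is an assumption added for completeness.

```python
from dataclasses import dataclass


@dataclass
class ScreeningStats:
    sample_id: str
    median_fragment_bp: float
    endogenous_fraction: float
    c_to_t_5prime: float  # fraction of C read as T at the 5' end


def triage(stats, min_length_bp=50, min_endogenous=0.2):
    """Return (promising, damage_advice) for one screened sample."""
    promising = (
        stats.median_fragment_bp > min_length_bp
        and stats.endogenous_fraction > min_endogenous
    )
    if stats.c_to_t_5prime > 0.06:
        advice = "consider enzymatic removal of uracil"
    elif stats.c_to_t_5prime >= 0.02:
        advice = "correct bioinformatically (trim ends or filter transitions)"
    else:
        # Assumption: low damage is not a problem in itself, but should be
        # checked against the age and storage history of the sample.
        advice = "low damage; check congruence with sample age"
    return promising, advice


# Hypothetical screening results, for illustration only.
for s in (ScreeningStats("H1", 58, 0.45, 0.04), ScreeningStats("A7", 41, 0.12, 0.11)):
    print(s.sample_id, triage(s))
```

The thresholds are kept as parameters because they are rules of thumb; a project targeting rare or very old material may reasonably accept lower values.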

Characterising DNA fragmentation and damage is very useful for authentication and establishing historical provenance of degraded plant samples. DNA degradation advances with time (Weiß et al. 2016), although its rate is highly modulated by intrinsic and environmental factors. Old samples should be considered authentic only if they exhibit fragmentation and damage patterns congruent with their age, tissue type, and storage conditions. In contrast to library-based approaches, primer-based sequencing (such as Sanger sequencing) does not allow quantification of these characteristics and should not be used with aDNA (Gutaker and Burbano 2017).

Figure 1.

Chapter 2 Infographic: Overview of sampling and obtaining DNA from museum collections, a team effort of communities, curators, and researchers. (1) Collection of botanical material should have detailed consideration of its ethical and legal aspects and the consultation of source communities in advance, in accordance with CITES, CBD and Nagoya legal and ethical frameworks. (2) Curated botanical samples can be found in different types of museums that include botanic gardens, ethnobotany and anthropological collections. The next step is to find relevant specimens with preferably rich metadata, e.g. species identification, collection place and date. (3) Once the specimens have been identified, they should undergo molecular analyses in clean facilities, where they will be pre-processed according to their traits, avoiding contamination with other samples, “modern” specimens, and amplicons. Then, it is crucial to identify which samples passed and which failed quality controls for endogenous DNA. Finally, the data produced should be linked to their respective vouchers and made available in public repositories like NCBI and BOLD.

Responsible lab use for aDNA

Library-based methods assist with the responsible use of collections, as they preserve the total (non-selective) DNA and ‘immortalise’ it for future use. Immortalisation only has value if the DNA that has been amplified is truly historical/ancient and devoid of contemporary contamination. Hence, all the aforementioned precautions are necessary when working with aDNA. We recommend that extracts or library builds are precisely annotated with the methods used and are properly archived.


  1. Name three legal considerations and their related ethical main issues that should be taken into account for aDNA research using museum material.
  2. Why is it important to process herbarium samples in a dedicated clean lab?
  3. Name three benefits of getting curators involved in the early stages of research using collections.


aDNA – Ancient DNA, DNA that exhibits biochemical characteristics typical for DNA from old degraded material, i.e., damage and fragmentation, regardless of age.

Artefact – An object made by humans that is of historical or cultural importance, examples include: clothing, ornaments, utensils.

Authentication – Bioinformatic analyses that quantify damage and fragmentation of sequenced DNA to help rule out that DNA is derived from contemporary contamination.

Collection – Repository of curated biological material arranged in a systematic fashion.

Contamination – Introduction of alien tissue or DNA to a specimen or DNA isolate, examples include: microbial colonisation, human epithelium, plant-based foods, etc.

Curator – Custodian of a collection with expert knowledge about specimens, their organisation, and preservation.

Destructive sampling – Permanent removal of a fragment of a specimen of any size that will be irretrievable after biochemical characterization.

DNA damage – Typically conversion of cytosine to uracil in DNA through deamination, which accumulates with time. During sequencing, uracil is replaced with thymine, hence the common synonym, C-to-T substitutions.

Endogenous DNA – Authentic DNA from targeted individuals of a species, in contrast to exogenous DNA from associated microbes and contemporary plant and human DNA contamination.

Fragmentation – Breaks in the DNA backbone, most frequently caused by depurination, leading to shorter DNA fragments with time.

Immortalization – Molecular manipulation of DNA, for example the attachment of DNA adapters, that allows infinite re-amplification of the original DNA from a biological specimen.

Type specimen – Preserved individual plant that shows the defining features of a taxon and that is used for the first taxonomic description of a species. This permanent feature-specimen link is recognized in a publication.

Voucher – Preserved botanical specimen kept in a permanent collection and cited by a research project. Vouchers will have been expertly identified and are usually annotated with collection time, place, and collector details.


  • Anderson EN, Pearsall D, Hunn E, Turner N (2011) Ethnobiology. John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9781118015872
  • Austin RM, Sholts SB, Williams L, Kistler L, Hofman CA (2019) Opinion: To curate the molecular past, museums need a carefully considered set of best practices. Proc Natl Acad Sci USA 116, 1471–1474. doi: 10.1073/pnas.1822038116
  • Bakker FT, Antonelli A, Clarke JA, Cook JA, Edwards SV, Ericson PGP, Faurby S, Ferrand N, Gelang M, Gillespie RG, Irestedt M, Lundin K, Larsson E, Matos-Maraví P, Müller J, von Proschwitz T, Roderick GK, Schliep A, Wahlberg N, Wiedenhoeft J, Källersjö M (2020) The Global Museum: natural history collections and the future of evolutionary science and public education. PeerJ 8, e8225. doi: 10.7717/peerj.8225
  • Bakker FT (2019) Herbarium genomics: plant archival DNA explored, in: Lindqvist, C., Rajora, O.P. (Eds.) Paleogenomics: Genome-Scale Analysis of Ancient DNA, Population Genomics. Springer International Publishing, Cham, pp. 205–224. doi: 10.1007/13836_2018_40
  • Bieker VC, Martin MD (2018) Implications and future prospects for evolutionary analyses of DNA in historical herbarium collections. Botany Letters 165, 1–10. doi: 10.1080/23818107.2018.1458651
  • Bieker VC, Sánchez Barreiro F, Rasmussen JA, Brunier M, Wales N, Martin MD (2020) Metagenomic analysis of historical herbarium specimens reveals a postmortem microbial community. Mol. Ecol. Resour. 20, 1206–1219. doi: 10.1111/1755-0998.13174
  • Briggs AW, Stenzel U, Johnson PLF, Green RE, Kelso J, Prüfer K, Meyer M, Krause J, Ronan MT, Lachmann M, Pääbo S (2007) Patterns of damage in genomic DNA sequences from a Neandertal. Proc Natl Acad Sci USA 104, 14616–14621. doi: 10.1073/pnas.0704665104
  • Briggs AW, Stenzel U, Meyer M, Krause J, Kircher M, Pääbo S (2010) Removal of deaminated cytosines and detection of in vivo methylation in ancient DNA. Nucleic Acids Res. 38, e87. doi: 10.1093/nar/gkp1163
  • Cappellini E, Gilbert MTP, Geuna F, Fiorentino G, Hall A, Thomas-Oates J, Ashton PD, Ashford DA, Arthur P, Campos PF, Kool J, Willerslev E, Collins MJ (2010) A multidisciplinary study of archaeological grape seeds. Naturwissenschaften 97, 205–217. doi: 10.1007/s00114-009-0629-3
  • Carøe C, Gopalakrishnan S, Vinner L, Mak SST, Sinding MHS, Samaniego JA, Wales N, Sicheritz-Pontén T, Gilbert MTP (2017) Single-tube library preparation for degraded DNA. Methods Ecol. Evol. 9, 410–419. doi: 10.1111/2041-210X.12871
  • Chomicki G, Renner SS (2015) Watermelon origin solved with molecular phylogenetics including Linnaean material: another example of museomics. New Phytol. 205, 526–532. doi: 10.1111/nph.13163
  • Dabney J, Meyer M, Pääbo S (2013) Ancient DNA damage. Cold Spring Harb. Perspect. Biol. 5, a012567. doi: 10.1101/cshperspect.a012567
  • Das S, Lowe M (2018) Nature read in black and white: decolonial approaches to interpreting natural history collections. Journal of Natural Science Collections 6, 1–14.
  • de Filippo C, Meyer M, Prüfer K (2018) Quantifying and reducing spurious alignments for the analysis of ultra-short ancient DNA sequences. BMC Biol. 16, 121. doi: 10.1186/s12915-018-0581-9
  • Durvasula A, Fulgione A, Gutaker RM, Alacakaptan SI, Flood PJ, Neto C, Tsuchimatsu T, Burbano HA, Picó FX, Alonso-Blanco C, Hancock AM (2017) African genomes illuminate the early history and transition to selfing in Arabidopsis thaliana. Proc Natl Acad Sci USA 114, 5213–5218. doi: 10.1073/pnas.1616736114
  • Ellis EC, Gauthier N, Klein Goldewijk K, Bliege Bird R, Boivin N, Díaz S, Fuller DQ, Gill JL, Kaplan JO, Kingston N, Locke H, McMichael CNH, Ranco D, Rick TC, Shaw MR, Stephens L, Svenning J-C, Watson JEM (2021) People have shaped most of terrestrial nature for at least 12,000 years. Proc Natl Acad Sci USA 118, e2023483118. doi: 10.1073/pnas.2023483118
  • Forrest LL, Hart ML, Hughes M, Wilson HP, Chung K-F, Tseng Y-H, Kidner CA (2019) The limits of Hyb-Seq for herbarium specimens: impact of preservation techniques. Front. Ecol. Evol. 7, 439. doi: 10.3389/fevo.2019.00439
  • Freedman J, van Dorp L, Brace S (2018) Destructive sampling natural science collections: An overview for museum professionals and researchers. Journal of Natural Science Collections.
  • Fulton TL, Shapiro B (2019) Setting up an ancient DNA laboratory. Methods Mol. Biol. 1963, 1–13. doi: 10.1007/978-1-4939-9176-1_1
  • Gewin V (2021) How to include Indigenous researchers and their knowledge. Nature 589, 315–317. doi: 10.1038/d41586-021-00022-1
  • Gutaker RM, Burbano HA (2017) Reinforcing plant evolutionary genomics using ancient DNA. Curr. Opin. Plant Biol. 36, 38–45. doi: 10.1016/j.pbi.2017.01.002
  • Gutaker RM, Reiter E, Furtwängler A, Schuenemann VJ, Burbano HA (2017) Extraction of ultrashort DNA molecules from herbarium specimens. BioTechniques 62, 76–79. doi: 10.2144/000114517
  • Gutaker RM, Weiß CL, Ellis D, Anglin NL, Knapp S, Luis Fernández-Alonso J, Prat S, Burbano HA (2019) The origins and adaptation of European potatoes reconstructed from historical genomes. Nat. Ecol. Evol. 3, 1093–1101. doi: 10.1038/s41559-019-0921-3
  • Hodges E, Rooks M, Xuan Z, Bhattacharjee A, Benjamin Gordon D, Brizuela L, Richard McCombie W, Hannon GJ (2009) Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing. Nat. Protoc. 4, 960–974. doi: 10.1038/nprot.2009.68
  • Hofreiter M, Serre D, Poinar HN, Kuch M, Pääbo S (2001) Ancient DNA. Nat. Rev. Genet. 2, 353–359. doi: 10.1038/35072071
  • Iob A, Botigué L (2021) Crop archaeogenomics: a powerful resource in need of a well-defined regulation framework. Plants, People, Planet 4, 44–50. doi: 10.1002/ppp3.10233
  • Jónsson H, Ginolhac A, Schubert M, Johnson PLF, Orlando L (2013) mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684. doi: 10.1093/bioinformatics/btt193
  • Kircher M, Sawyer S, Meyer M (2012) Double indexing overcomes inaccuracies in multiplex sequencing on the Illumina platform. Nucleic Acids Res. 40, e3. doi: 10.1093/nar/gkr771
  • Kistler L (2012) Ancient DNA extraction from plants. Methods Mol. Biol. 840, 71–79. doi: 10.1007/978-1-61779-516-9_10
  • Latorre SM, Lang PLM, Burbano HA, Gutaker RM (2020) Isolation, library preparation, and bioinformatic analysis of historical and ancient plant DNA. Curr. Protoc. Plant Biol. 5, e20121. doi: 10.1002/cppb.20121
  • Maldonado C, Molina CI, Zizka A, Persson C, Taylor CM, Albán J, Chilquillo E, Rønsted N, Antonelli A (2015) Estimating species diversity and distribution in the era of Big Data: to what extent can we trust public databases? Glob. Ecol. Biogeogr. 24, 973–984. doi: 10.1111/geb.12326
  • Maricic T, Whitten M, Pääbo S (2010) Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS ONE 5, e14004. doi: 10.1371/journal.pone.0014004
  • Mascher M, Schuenemann VJ, Davidovich U, Marom N, Himmelbach A, Hübner S, Korol A, David M, Reiter E, Riehl S, Schreiber M, Vohr SH, Green RE, Dawson IK, Russell J, Kilian B, Muehlbauer GJ, Waugh R, Fahima T, Krause J, Stein N (2016) Genomic analysis of 6,000-year-old cultivated grain illuminates the domestication history of barley. Nat. Genet. 48, 1089–1093. doi: 10.1038/ng.3611
  • McAlvay AC, Armstrong CG, Baker J, Elk LB, Bosco S, Hanazaki N, Joseph L, Martínez-Cruz TE, Nesbitt M, Palmer MA, Priprá de Almeida WC, Anderson J, Asfaw Z, Borokini IT, Cano-Contreras EJ, Hoyte S, Hudson M, Ladio AH, Odonne G, Peter S, Rashford J, Wall J, Wolverton S, Vandebroek I (2021) Ethnobiology phase VI: decolonizing institutions, projects, and scholarship. J. Ethnobiol. 41, 170–191. doi: 10.2993/0278-0771-41.2.170
  • McManis CR, Pelletier JS (2014) Legal aspects of biocultural collections, in: Salick, J., Konchar, K., Nesbitt, M. (Eds.) Curating biocultural collections: a handbook. Royal Botanic Gardens, Kew, Kew, pp. 229–243.
  • Meyer M, Kircher M, Gansauge M-T, Li H, Racimo F, Mallick S, Schraiber JG, Jay F, Prüfer K, de Filippo C, Sudmant PH, Alkan C, Fu Q, Do R, Rohland N, Tandon A, Siebauer M, Green RE, Bryc K, Briggs AW, Pääbo S (2012) A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226. doi: 10.1126/science.1224344
  • Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, pdb.prot5448. doi: 10.1101/pdb.prot5448
  • Nesbitt M, Colledge S, Murray MA (2003) Organisation and management of seed reference collections. Environmental Archaeology 8, 77–84. doi: 10.1179/env.2003.8.1.77
  • Nistelberger HM, Smith O, Wales N, Star B, Boessenkool S (2016) The efficacy of high-throughput sequencing and target enrichment on charred archaeobotanical remains. Sci. Rep. 6, 37347. doi: 10.1038/srep37347
  • Overballe-Petersen S, Orlando L, Willerslev E (2012) Next-generation sequencing offers new insights into DNA degradation. Trends Biotechnol. 30, 364–368. doi: 10.1016/j.tibtech.2012.03.007
  • Pääbo S, Poinar H, Serre D, Jaenicke-Despres V, Hebler J, Rohland N, Kuch M, Krause J, Vigilant L, Hofreiter M (2004) Genetic analyses from ancient DNA. Annu. Rev. Genet. 38, 645–679. doi: 10.1146/annurev.genet.37.110801.143214
  • Pálsdóttir AH, Bläuer A, Rannamäe E, Boessenkool S, Hallsson JH (2019) Not a limitless resource: ethics and guidelines for destructive sampling of archaeofaunal remains. R. Soc. Open Sci. 6, 191059. doi: 10.1098/rsos.191059
  • Peltzer A, Jäger G, Herbig A, Seitz A, Kniep C, Krause J, Nieselt K (2016) EAGER: efficient ancient genome reconstruction. Genome Biol. 17, 60. doi: 10.1186/s13059-016-0918-z
  • Pont C, Wagner S, Kremer A, Orlando L, Plomion C, Salse J (2019) Paleogenomics: reconstruction of plant evolutionary trajectories from modern and ancient DNA. Genome Biol. 20, 29. doi: 10.1186/s13059-019-1627-1
  • Prüfer K, Stenzel U, Hofreiter M, Pääbo S, Kelso J, Green RE (2010) Computational challenges in the analysis of ancient DNA. Genome Biol. 11, R47. doi: 10.1186/gb-2010-11-5-r47
  • Pungetti G, Oviedo G, Hooke D (2012) Sacred species and sites: advances in biocultural conservation. Cambridge University Press, Cambridge. doi: 10.1017/CBO9781139030717
  • Ratsch A, Steadman KJ, Bogossian F (2010) The pituri story: a review of the historical literature surrounding traditional Australian Aboriginal use of nicotine in Central Australia. J. Ethnobiol. Ethnomed. 6, 26. doi: 10.1186/1746-4269-6-26
  • Rivier L, Lindgren J-E (1972) “Ayahuasca,” the South American hallucinogenic drink: an ethnobotanical and chemical investigation. Econ. Bot. 26, 101–129. doi: 10.1007/BF02860772
  • Salick J, Konchar K, Nesbitt M (2014) Curating Biocultural Collections: A Handbook. Royal Botanic Gardens, Kew, Kew.
  • Schindel DE, Cook JA (2018) The next generation of natural history collections. PLoS Biol. 16, e2006125. doi: 10.1371/journal.pbio.2006125
  • Schrader C, Schielke A, Ellerbroek L, Johne R (2012) PCR inhibitors - occurrence, properties and removal. J. Appl. Microbiol. 113, 1014–1026. doi: 10.1111/j.1365-2672.2012.05384.x
  • Schubert M, Ermini L, Der Sarkissian C, Jónsson H, Ginolhac A, Schaefer R, Martin MD, Fernández R, Kircher M, McCue M, Willerslev E, Orlando L (2014) Characterization of ancient and modern genomes by SNP detection and phylogenomic and metagenomic analysis using PALEOMIX. Nat. Protoc. 9, 1056–1082. doi: 10.1038/nprot.2014.063
  • Scott MF, Botigué LR, Brace S, Stevens CJ, Mullin VE, Stevenson A, Thomas MG, Fuller DQ, Mott R (2019) A 3,000-year-old Egyptian emmer wheat genome reveals dispersal and domestication history. Nat. Plants 5, 1120–1128. doi: 10.1038/s41477-019-0534-5
  • Sherman B, Henry RJ (2020) The Nagoya Protocol and historical collections of plants. Nat. Plants 6, 430–432. doi: 10.1038/s41477-020-0657-8
  • Staats M, Erkens RHJ, van de Vossenberg B, Wieringa JJ, Kraaijeveld K, Stielow B, Geml J, Richardson JE, Bakker FT (2013) Genomic treasure troves: complete genome sequencing of herbarium and insect museum specimens. PLoS ONE 8, e69189. doi: 10.1371/journal.pone.0069189
  • Swarts K, Gutaker RM, Benz B, Blake M, Bukowski R, Holland J, Kruse-Peeples M, Lepak N, Prim L, Romay MC, Ross-Ibarra J, Sanchez-Gonzalez J de J, Schmidt C, Schuenemann VJ, Krause J, Matson RG, Weigel D, Buckler ES, Burbano HA (2017) Genomic estimation of complex traits reveals ancient maize adaptation to temperate North America. Science 357, 512–515. doi: 10.1126/science.aam9425
  • Van de Paer C, Hong-Wa C, Jeziorski C, Besnard G (2016) Mitogenomics of Hesperelaea, an extinct genus of Oleaceae. Gene 594, 197–202. doi: 10.1016/j.gene.2016.09.007
  • Wagner S, Lagane F, Seguin-Orlando A, Schubert M, Leroy T, Guichoux E, Chancerel E, Bech-Hebelstrup I, Bernard V, Billard C, Billaud Y, Bolliger M, Croutsch C, Čufar K, Eynaud F, Heussner KU, Köninger J, Langenegger F, Leroy F, Lima C, Orlando L (2018) High-Throughput DNA sequencing of ancient wood. Mol. Ecol. 27, 1138–1154. doi: 10.1111/mec.14514
  • Wales N, Andersen K, Cappellini E, Avila-Arcos MC, Gilbert MTP (2014) Optimization of DNA recovery and amplification from non-carbonized archaeobotanical remains. PLoS ONE 9, e86827. doi: 10.1371/journal.pone.0086827
  • Wales N, Kistler L (2019) Extraction of ancient DNA from plant remains. Methods Mol. Biol. 1963, 45–55. doi: 10.1007/978-1-4939-9176-1_6
  • Weiß CL, Schuenemann VJ, Devos J, Shirsekar G, Reiter E, Gould BA, Stinchcombe JR, Krause J, Burbano HA (2016) Temporal patterns of damage and decay kinetics of DNA retrieved from plant herbarium specimens. R. Soc. Open Sci. 3, 160239. doi: 10.1098/rsos.160239


  1. Legal: CITES (restriction in international trade of endangered species), Nagoya Protocol (ownership and other significance to indigenous peoples), and Drug Act (controlled substances).
  2. The decay of DNA from historical plant material makes it very susceptible to contamination with exogenous modern DNA.
  3. Curators can contribute (1) high-quality metadata such as collection dates and provenance, (2) knowledge of collections in-house and elsewhere, (3) knowledge of source communities and ethical and legal issues, (4) advice on choice of specimens most suitable for sampling.

Chapter 3 DNA from water


The first studies conducted on DNA obtained from water samples were published in the 1990s. Cloning techniques were commonly used to investigate novel genes and functions of environmental communities at that time. Stein et al. (1996) cloned DNA fragments obtained from water samples into E. coli vectors to investigate marine archaea metabolism. Relying on the development of high-throughput sequencing (HTS) technologies, it is now possible to capture and sequence almost all DNA fragments present in a water sample. A pioneering example is the study in which Venter and colleagues sequenced sea water samples, revealing diverse microbial compositions and functions (Venter et al. 2004). Further rapid development of sequencing technologies in the last decade, as well as the coinciding decrease in sequencing costs, has allowed for the incorporation of ‘environmental DNA’ (eDNA) methods (i.e., the analysis of DNA fragments isolated from environmental sample types such as water, air, and soil) into several applications, including aquatic environmental surveys.

Conventionally, biomonitoring of freshwater and marine environments is based on direct observation of indicator taxa to compute biotic metrics/indices. This can be time and labour intensive (Pawlowski et al. 2018). Other methods such as depletion-based electrofishing, hydroacoustics, camera traps, and gillnets are also common (Deiner et al. 2017). In recent years, eDNA methods have been added to this toolbox of available methods for biomonitoring. Species-level information on key bioindicator species has for example been obtained by using the DNA obtained from water samples (Hajibabaei et al. 2011). Other applications include population quantification (Fukaya et al. 2020), invasive species detection (Anglès d’Auriac et al. 2019), water quality monitoring (Noyer et al. 2015), and revealing food web interactions (D’Alessandro and Mariani 2021).

The main advantage of water is the ease of sample collection compared to other aquatic sample types such as sediments or biofilms, as these substrates usually require more sophisticated tools and longer sampling times (Deiner et al. 2017). A potential disadvantage is that DNA in the water column decays to undetectable levels within two weeks at most (Dejean et al. 2011; Thomsen et al. 2012), whereas in sediments and ice cores it can persist much longer (Turner et al. 2015). Thus, DNA collected from water samples typically reflects contemporary communities, whereas DNA collected from sediments and ice cores reflects a longer temporal scale and can be used as a source of ancient DNA (Willerslev et al. 2007) (Chapter 8 DNA from ancient sediments).

Detecting DNA in water samples obtained from aquatic environments can be challenging because it is usually present at low concentrations with an uneven spatial distribution (Ficetola et al. 2008; Goldberg et al. 2016). In this chapter, we first explain the factors affecting the detection of DNA with a specific focus on plant species for environmental applications. The literature referenced here mostly focuses on vascular plants but the general approach might be suitable for a broader group of organisms as well (Alsos et al. 2018; Apothéloz-Perret-Gentil et al. 2021; Nowak et al. 2021). We then outline the general workflow and experimental setup for collecting DNA from water and strategies to optimise its detection.

Detection of DNA from aquatic environments

Natural processes influencing the composition and quantity of detectable DNA in a water sample can be categorised into 1) shedding of biological material from source organisms, 2) degradation, 3) transport across the water column, and 4) retention and resuspension (Harrison et al. 2019). Several biotic and abiotic environmental factors influence the rates of these processes. This creates a complex and environment-specific relationship between DNA that is detected in the water and how well this can be related to the presence and relative abundance of an aquatic organism. As almost all DNA fragments in a water sample can be detected with current sequencing technologies, establishing an optimal sampling strategy is crucial for minimising the probability of contamination and obtaining an accurate representation of biodiversity.


Senescence in aquatic plants releases free cells into the water column that will eventually break down into organic compounds, including DNA. However, degradation in many cells begins via apoptosis before shedding. Apoptosis involves the shrinkage of the cell and its nucleus in a programmed way, in contrast to necrosis, which is uncontrolled cell death due to loss of osmotic control, typically by swelling and bursting (Hotchkiss et al. 2009; Toné et al. 2007). In general, plant and animal cells have similar mechanisms of apoptosis, with tightly packed nuclear DNA in early stages, which is later hydrolyzed into smaller fragments of about 50 kb and multiples of approximately 180 bp (Reape et al. 2008; Vanyushin et al. 2004). Mitochondrial DNA, on the other hand, shows lower decay rates compared to nuclear DNA, which is attributed to the presence of the mitochondrial membrane or other localised factors (Foran 2006). Possibly owing to such similar mechanisms of cell death, Fujiwara et al. (2016) showed that temporal changes in the amount of DNA in water samples are similar for an aquatic plant, Egeria densa, and carp. However, this relationship is not always significant for plants, as opposed to fish, which is attributed to the differences in cell and tissue structures, cell functions, and metabolic systems (Matsuhashi et al. 2016).

DNA degradation

DNA is a highly stable molecule at neutral pH and moderate temperatures. However, there are several abiotic factors that directly and indirectly influence its stability in aquatic environments (Schroeder and Wolfenden 2007). High temperatures increase degradation rates either by denaturing DNA molecules directly or by increasing metabolic and enzymatic activities that lead to DNA degradation (Eichmiller et al. 2016; Okabe and Shimazu 2007). Ultraviolet light can either directly damage DNA or react with organic matter to form reactive molecules that indirectly damage DNA (Leech et al. 2009; Strickler et al. 2015). Pilliod et al. (2014) detected DNA in water samples after 18 days when kept in the dark after collection, but when exposed to light nothing was detected after eight days. Hypersaline and low oxygen environments can also affect the conformation and stability of DNA (Barnes et al. 2014; Hofreiter et al. 2001). Biotic factors depending on the source organism such as the type of shed tissue, age, size, or life history, or external biotic factors such as microbial activity, trophic state, or the concentration of extracellular nucleases might also influence DNA shedding and persistence in aquatic environments (Beng and Corlett 2020; de Souza et al. 2016; Eichmiller et al. 2016; Harrison et al. 2019). The effect of abiotic factors can be expected to be similar for all free extracellular DNA, so the detection probabilities of aquatic plants might be influenced more by shedding and transport rates prior to the release from the cell.
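The degradation rates discussed above are often summarised with a first-order (exponential) decay model, C(t) = C0 · exp(−k · t). This model is a simplifying assumption introduced here for illustration and is not taken from the studies cited in this section. The minimal Python sketch below shows how a decay constant k would translate into a half-life and into the time until a concentration drops to a detection limit. The function names and all numeric values are hypothetical.

```python
import math


def half_life(k: float) -> float:
    """Half-life under first-order decay, in the same time unit as 1/k."""
    if k <= 0:
        raise ValueError("decay constant k must be positive")
    return math.log(2) / k


def time_to_detection_limit(c0: float, k: float, limit: float) -> float:
    """Time until C(t) = c0 * exp(-k * t) has fallen to the detection limit.

    Returns 0.0 if the starting concentration is already at or below the limit.
    """
    if c0 <= 0 or k <= 0 or limit <= 0:
        raise ValueError("c0, k and limit must all be positive")
    if c0 <= limit:
        return 0.0
    return math.log(c0 / limit) / k


# Hypothetical values only: 1000 copies per litre at release, a detection
# limit of 10 copies per litre, and two made-up decay constants (per day)
# standing in for slower and faster degradation conditions.
for label, k in [("slow decay", 0.25), ("fast decay", 0.6)]:
    days = time_to_detection_limit(1000.0, k, 10.0)
    print(f"{label}: half-life {half_life(k):.1f} d, at limit after {days:.1f} d")
```

In practice k would have to be estimated for each system, since temperature, light, and microbial activity all shift it, and real decay curves do not always follow a single exponential.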

Transportation, retention, and resuspension

Hydrological characteristics of the water body are also critical to consider when inferring species presence and distribution. DNA can bind to particles of varying size in aquatic environments (less than 0.2 µm to greater than 180 µm) and this particle association is one of many parameters that affect DNA transport and diffusion (Shogren et al. 2016). DNA is known to persist longer in sediment compared to water columns and this adsorbed portion can be resuspended into the water after aquatic DNA is degraded (Shogren et al. 2018). Microbial decomposition of plant material in freshwater sediments has also been shown to release extra plant DNA into the water column (Poté et al. 2009). The type of the sediment (e.g., clay vs. organic) and binding affinity of DNA are some of the factors that influence these processes (Beng and Corlett 2020; Harrison et al. 2019). In general, DNA transport in aquatic ecosystems follows dynamics similar to those of particles categorised as fine particulate organic matter (i.e., between 0.5 µm and 1 mm) (Pont et al. 2018; Wilcox et al. 2016). Filtration methods are therefore usually designed to capture this size range (Harrison et al. 2019; Pont et al. 2018; Wilcox et al. 2016).

Considering the higher dilution and the effects of currents and waves in marine waters, DNA is generally less concentrated and more quickly dispersed compared to freshwater ecosystems (Foote et al. 2012; Thomsen et al. 2012). However, marine waters are also characterised by higher salinity and more stable temperatures, which are known to have stabilising effects on DNA molecules (Okabe and Shimazu 2007; Tsuji et al. 2017). In addition to temperature, pH is also known to be more stable in seas and oceans compared to terrestrial aquatic ecosystems (Collins et al. 2018). Under favourable conditions, DNA obtained from marine water samples can distinguish communities less than 60 m apart for up to 6 hours and can persist above the detection limit for several days (Foote et al. 2012; Kelly et al. 2018; Thomsen et al. 2012).

In rivers and streaming waters, the probability of DNA detection is strongly correlated with downstream transportation rates. Retention, rather than degradation, appears to be the more important factor limiting the transport of DNA in streaming waters (Shogren et al. 2018; Wilcox et al. 2016). Considering the wide range of particle sizes that DNA has been found to be associated with, modelling its transport in flowing waters is not an easy task. The transport rate can be influenced by additional factors such as stream bed characteristics or the presence of biofilms (Shogren et al. 2018). More recently, hydrological models have been used to predict the transport and decay rates of DNA in aquatic ecosystems (Carraro et al. 2021, 2020; Mächler et al. 2021). In lakes and ponds, DNA can be distributed patchily and fall below the detection limit within just metres owing to the lack of horizontal mixing in the water column (Goldberg et al. 2016). This results in accumulation of DNA in comparatively small and stagnant waters (Harper et al. 2019). Vertical mixing, on the other hand, can be limited by thermal stratification in lakes. This results in each layer having different effects on DNA degradation. Collecting samples during periods when thermal stratification is released and mixing occurs (e.g., during spring and fall overturns in dimictic lakes) may lead to changes in biodiversity estimates, and this should be considered when designing sampling strategies (Bista et al. 2017; Harrison et al. 2019).

Targeted approaches and community analyses using plant DNA from water

Conventional sampling techniques often require a lot of time and effort for detecting indicator, rare, or invasive species. Keeping the target organism alive or intact might also be an important consideration in such cases. Detection of species via nucleic acids collected from environmental samples (eDNA/eRNA) is a relatively new approach that emerged in the last five years (Anglès d’Auriac et al. 2019). These methods offer a non-destructive and efficient complementary approach for the detection of aquatic organisms. They rely on reference sequences and the amount of available data varies among taxonomic groups and countries (Chapter 10 DNA barcoding and Chapter 11 Amplicon metabarcoding). For example, aquatic vascular plants used in biomonitoring are well represented in public databases (BOLD, GenBank), while this is hard to achieve for diatoms due to large proportions of undescribed species and the problems with cultivation of monoclonal cultures (Weigand et al. 2019). Nevertheless, eDNA studies on plants are critical for our understanding of the dynamics of plant communities in aquatic environments. An important application of eDNA-based methods in recent years has been for the detection of invasive aquatic plants (Anglès d’Auriac et al. 2019; Fujiwara et al. 2016; Gantz et al. 2018; Kuehne et al. 2020; Miyazono et al. 2020; Muha et al. 2019; Scriver et al. 2015). In most of these studies, species-specific markers were used to obtain presence/absence data, and the downstream laboratory and data analysis steps are well known and efficient. These methods can also be useful to investigate seasonal or spatial distributions of species (Muha et al. 2019). As suggested for other types of environmental samples, increasing the number and spatial coverage of samples and the time of sampling may improve detection rates for species that are rare, have low biomass, or inhabit a relatively distant site (Alsos et al. 2018).
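The effect of taking more samples on detection rates can be illustrated with a simple probability sketch. It assumes that every sample detects the target independently and with the same probability p, which is a simplification: patchily distributed DNA violates it. Under that assumption, the chance of at least one detection in n samples is 1 − (1 − p)^n. The function names and example values below are hypothetical.

```python
import math


def cumulative_detection(p: float, n: int) -> float:
    """Probability of at least one detection in n independent samples."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def samples_needed(p: float, target: float = 0.95) -> int:
    """Smallest number of samples whose cumulative detection reaches target."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be greater than 0 and at most 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    if p == 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# Hypothetical per-sample detection probabilities for a common and a rare species.
for label, p in [("common", 0.5), ("rare", 0.1)]:
    n = samples_needed(p)
    print(f"{label}: {n} samples give {cumulative_detection(p, n):.3f}")
```

The per-sample probability itself is usually unknown and would need to be estimated, for example from pilot sampling, before such a calculation can guide a survey design.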

Although DNA from plant communities has been detected in environmental samples as part of larger surveys (e.g., within coral reefs), biodiversity studies targeting a large number of plant species are still rare, possibly owing to issues with universal amplification and the discriminatory power of single or multiple gene surveys in plants (DiBattista et al. 2019; Fraser et al. 2017). Yet, there are recent efforts to design primers and assays targeting larger groups of plants as well (Coghlan et al. 2020; Shackleton et al. 2019).

An important application of DNA-based methods is the quantification of species abundance and biomass, since several environmental applications rely on this information. Depending on the specific aim of the study, this information can be obtained at varying degrees of efficiency and reliability. Approaches employing species-specific methods are more suitable for abundance or biomass estimations (e.g., qPCR, ddPCR). However, they require a priori knowledge of the target group and are limited to already described species. On the other hand, high-throughput approaches can identify species that are rare or have low biomass (e.g., metabarcoding, metagenomics), but they suffer from biases introduced by downstream steps such as PCR amplification, sequencing (Chapter 9 Sequencing platforms and data types), availability of reference sequences, and even the bioinformatics analyses (Chapter 18 Sequence to species) (Alsos et al. 2018; Zhou et al. 2013).
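Quantification by qPCR commonly relies on a standard curve that relates the quantification cycle (Cq) to the log10 of known starting copy numbers, Cq = slope · log10(copies) + intercept. Amplification efficiency is then estimated as 10^(−1/slope) − 1. The Python sketch below fits such a curve by ordinary least squares and inverts it for an unknown sample. It is a generic illustration and not the procedure of any study cited here; the function names and the dilution-series values are made up.

```python
import math


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    if len(copies) != len(cq) or len(copies) < 2:
        raise ValueError("need at least two paired standards")
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standards must span more than one concentration")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 corresponds to perfect doubling per cycle."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_cq(cq_value, slope, intercept):
    """Invert the standard curve to estimate starting copies for one sample."""
    return 10 ** ((cq_value - intercept) / slope)


# Made-up ten-fold dilution series (copies per reaction) and Cq values.
standard_copies = [1e5, 1e4, 1e3, 1e2, 1e1]
standard_cq = [18.1, 21.5, 24.8, 28.2, 31.4]

slope, intercept = fit_standard_curve(standard_copies, standard_cq)
print(f"slope {slope:.2f}, efficiency {amplification_efficiency(slope):.1%}")
print(f"sample at Cq 26.0: about {copies_from_cq(26.0, slope, intercept):.0f} copies")
```

Note that such an estimate gives DNA copies in the reaction, not biomass; as discussed in this chapter, the link between DNA concentration and plant biomass is not yet consistent.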

Although molecular methods for species detection have been used as a tool for biodiversity management for more than a decade, only 2% of the available studies have focused on plants (Tsuji et al. 2019). One of the main reasons is the limited information on the dynamics of DNA released from plants to aquatic environments. However, several recent and exciting experimental studies have been published on the relationship between plant biomass and DNA concentrations (Fujiwara et al. 2016; Kuehne et al. 2020; Matsuhashi et al. 2016), temporal plant DNA degradation (Fujiwara et al. 2016; Gantz et al. 2018; Matsuhashi et al. 2016), and the seasonal variation of DNA concentrations (Anglès d’Auriac et al. 2019; Kuehne et al. 2020; Matsuhashi et al. 2019). Although these types of studies are relatively new and need further optimization, two important findings are already apparent: (i) there is so far no observed consistent positive relationship between biomass and DNA concentrations, and (ii) the detectability of DNA significantly increases in autumn in temperate regions when leaf senescence and degradation start. While there is no consensus yet that this is true for all plant species, this finding does imply that the optimal sampling season for a plant can vary depending on morphology, reproductive biology, and the life cycle of the target taxon.

Experimental design

Recent studies that detect plant species in aquatic ecosystems via eDNA mainly address methodological adjustments (Fujiwara et al. 2016; Gantz et al. 2018; Kuehne et al. 2020; Matsuhashi et al. 2019, 2016; Schabacker et al. 2020; Strickler et al. 2015). The technique involves three main steps: 1) collecting water samples, 2) DNA isolation and sequencing, and 3) taxonomic annotation of assembled sequences. Although substantial improvements are being made for the last step due to the developments in bioinformatics (Chapter 18 Sequence to species), sample collection, DNA isolation, and even the choice of sequencing platform (Chapter 9 Sequencing platforms and data types) can still introduce biases (Singer et al. 2019; Tsuji et al. 2019). Methodological research on the first two steps is usually conducted using mesocosm and aquarium experiments focusing on spatial and temporal dynamics of DNA (Kuehne et al. 2020). In the next part of this chapter, we describe the experimental workflow and discuss the issues related to the processes affecting DNA detection from water samples, as outlined in the previous section.

Sampling strategies: water collection, filtering, and transportation

There are three main steps in a field study for the collection of aqueous eDNA: water collection, transportation, and filtering. In designing sampling strategies for species identification from water samples, there are many factors to consider. These include, but are not limited to, the field conditions, the distance between sampling point and laboratory, the amount of water that is required, and the morphology and life cycle of the target organism (Tsuji et al. 2019). There are multiple methods that can be applied for each of these steps. In this section, we will discuss and compare these methods by focusing on their advantages and limitations.

After the sampling location has been selected, the next step is to decide on the transportation strategy. Water samples can either be transported directly to the laboratory or filtered in the field. For direct transportation, samples are usually collected in sterilised glass or plastic bottles or in disposable plastic tubes, and the DNA is then captured in the laboratory by filtration or ethanol precipitation. This approach reduces the effort and time spent in the field and allows researchers to perform additional analyses on the water samples or to store subsamples for further processing (Tsuji et al. 2019; Williams et al. 2016). Storage and preservation of these samples can be challenging, however, and the amount of DNA obtained can be lower than with filtering in the field (Minamoto et al. 2016). Usually, 15 ml to 1 l of water is collected when samples are transported, whereas thousands of litres can be processed by field filtration when filters with large pore sizes are used (Schabacker et al. 2020; Sepulveda et al. 2019). Another advantage of field filtering is that a large number of filters can easily be transported at once because of their small size. However, filtering can increase the sampling time and effort required in the field. For example, if muddy water must be processed with small pore size filters, filtration of a single litre can take hours (Hunter et al. 2019). Field filtering also requires a considerable amount of laboratory equipment to be carried to the site (Bruce et al. 2021), and keeping this equipment sterile and preventing contamination can be challenging in field conditions. Collecting a filtered distilled-water sample as a field control can reveal such issues and is therefore highly recommended.

The choice of transportation method depends primarily on the distance between the sampling locations and the laboratory (Thomas et al. 2019). For short distances, collecting water samples can be more practical, especially when resampling is possible, as the samples can be processed in sterile laboratory conditions. On the other hand, if the sampling site is far from the laboratory, field filtering can be more efficient because of the high volumes of water that can be processed in a single survey and the protection of DNA in the filters during transportation (Harper et al. 2019; Hinlo et al. 2017; Minamoto et al. 2016).

Precipitation using ethanol or isopropanol can be used for capturing DNA after water collection, but filtration is the more widely used method (Tsuji et al. 2019). The aim of this technique, whether applied in the laboratory or in the field, is to filter the water through a membrane with a relatively small pore size that retains free extracellular and/or cellular DNA. There are different options for the filtering step based on the material, pore size, and filter type (Tsuji et al. 2019). Polyethersulfone (PES), cellulose nitrate, and glass fibre are the most commonly used types of filters in DNA research. Glass fibre filters are commonly suggested due to their higher capability to absorb DNA (Muha et al. 2019; Spens et al. 2017; Tsuji et al. 2019).

Pore sizes of filters used in eDNA studies range from 0.22 µm to 60 µm (Schabacker et al. 2020; Tsuji et al. 2019). Earlier studies were usually conducted using comparatively small pore size filters (0.22–0.7 µm). However, after it was shown that the actual particle sizes for eDNA collected from water samples vary between 0.2 µm and 180 µm (Minamoto et al. 2016; Schabacker et al. 2020; Turner et al. 2014), filters with larger pore sizes became more common. Although this makes it possible to process larger volumes of water, it also increases the probability of introducing PCR inhibitors. In this situation, extra steps for removing PCR inhibitors can be added during DNA isolation (Hunter et al. 2019).

The type of filter is one of the most important decisions to be made when designing the sampling strategy. Filters can be classified as open or encapsulated/cartridge filters (Spens et al. 2017; Tsuji et al. 2019). Open filters are membranes that are usually fixed on an immobilised manifold system that is connected to a vacuum pump for filtering the water. Open filters require more laboratory materials and are more easily contaminated, and they are therefore not practical for use in the field. Encapsulated filters can be used with vacuum pumps or simply by mechanical force (syringes). This reduces the effort and time required for field sampling. As contamination can also be prevented immediately after water filtering, encapsulated filters offer many advantages for filtering on-site (Spens et al. 2017; Thomas et al. 2019). A key drawback of encapsulated filters is cost: they generally cannot be used more than once, so the total cost of filters needs to be considered, in particular for large-scale projects.

Contamination of samples and the degradation of DNA are two critical processes that should be avoided as much as possible from water collection in the field to DNA isolation in the lab (Goldberg et al. 2016; Hinlo et al. 2017; Tsuji et al. 2019). Specific eDNA sampling guidelines have been published by environmental agencies with detailed instructions on how to avoid contamination and design the optimal collection protocol (Carim et al. 2016; Laramie et al. 2015).

DNA extraction

Choosing the correct DNA extraction protocol can be crucial in minimising the effect of PCR inhibitors in water samples. The chemical and physical characteristics of samples can vary considerably, and therefore the quantity and purity of isolated DNA also vary (Goldberg et al. 2016). Plant DNA isolation protocols start after the capture of DNA from water samples via either precipitation or filtration. With precipitation, samples are usually mixed with ethanol and centrifuged, and the precipitated DNA is collected as a pellet after removal of the supernatant. A critical point here is that ethanol should be completely removed from the samples, as it can affect the efficiency of downstream steps (Kuehne et al. 2020). With filtration, the filters are usually incubated in a lysis solution to release the DNA and then centrifuged to remove other molecules, after which the DNA is isolated from the supernatant. Commercial DNA isolation kits designed for environmental and plant samples (e.g., DNeasy PowerWater, PowerPlant or Blood & Tissue, Qiagen) are usually preferred by researchers, with some small modifications to the recommended protocols depending on the sample type (Coghlan et al. 2020; Kuehne et al. 2020; Matsuhashi et al. 2016; Miyazono et al. 2020; Schabacker et al. 2020).

Figure 1.

Chapter 3 Infographic: Summary of steps from field collection of water samples to DNA extraction in the laboratory. (1) Open or closed (encapsulated/cartridge) filters can be used for filtering water samples on-site. Large filters (e.g., plankton net with 60 μm pore size) are preferred for filtering larger volumes of water, while small pore size filters can usually process a few litres. Closed filters offer the advantage of preventing contamination, therefore they are more commonly used for on-site filtration. (2) Degradation is another important issue that should be prevented until DNA extraction. Water or filter samples can either be preserved in a chemical buffer or transported in cold and dark conditions to the laboratory for further processing. (3) Plant DNA in water samples can be captured by filtration or precipitation. When using filtration, samples are usually incubated in a lysis solution to extract DNA, while in precipitation samples are mixed with ethanol and DNA is collected in the pellet. Commercial DNA isolation kits specifically designed for environmental sample types are commonly used with some small modifications.

Conclusion and prospects

DNA isolated from water samples can be used for several downstream applications based on the specific aim of the study or survey. Currently, qPCR is the most commonly used method for detecting specific target taxa in water samples, while metabarcoding is used for community analyses (Chapter 11 Metabarcoding). Studies comparing the efficiency of these DNA methods with more conventional methods show varying results. For some species or taxa, DNA-based detection methods appear to outperform more conventional methods (Deiner et al. 2016; Tingley et al. 2018), though in other cases there is not necessarily an improvement in detection compared to more conventional surveys (Rose et al. 2019; Wood et al. 2019). Although DNA-based methods are constantly being improved, there are still challenges related to both false positives (when DNA is detected for a species or taxon that is known to be absent) and false negatives (when DNA is not detected for a species or taxon that is known to be present) (Beng and Corlett 2020). Therefore, at least for now, DNA-based methods for aquatic studies focusing on plants are still best coupled with conventional surveys. With the further development of sequencing methods and the increasing availability of reference sequence data in public databases, however, there are additional opportunities such as the application of metagenomics (Chapter 12 Metagenomics), target capture (Chapter 14 Target capture), or analysis of whole plastomes (Chapter 16 Whole genome sequencing). These methods might provide the additional benefit of integrating functional information besides species detection.
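When eDNA results are coupled with a conventional survey, the two can be cross-tabulated per site to flag candidate false positives and false negatives. The Python sketch below does this under a strong assumption: the conventional survey is treated as the reference, although it is itself imperfect, so an eDNA-only detection may equally be a true detection that the conventional survey missed. Site names, function names, and data are hypothetical.

```python
def compare_surveys(edna, conventional):
    """Cross-tabulate eDNA detections against a conventional survey.

    Both arguments map a site name to True (detected) or False (not detected).
    Only sites present in both surveys are compared.
    """
    counts = {"both": 0, "edna_only": 0, "conventional_only": 0, "neither": 0}
    for site in edna.keys() & conventional.keys():
        if edna[site] and conventional[site]:
            counts["both"] += 1
        elif edna[site]:
            counts["edna_only"] += 1          # candidate false positive
        elif conventional[site]:
            counts["conventional_only"] += 1  # candidate false negative
        else:
            counts["neither"] += 1
    return counts


def candidate_error_rates(counts):
    """Return (false-negative rate, false-positive rate), or None if undefined."""
    present = counts["both"] + counts["conventional_only"]
    absent = counts["edna_only"] + counts["neither"]
    fn_rate = counts["conventional_only"] / present if present else None
    fp_rate = counts["edna_only"] / absent if absent else None
    return fn_rate, fp_rate


# Hypothetical detections of one target plant at five ponds.
edna_hits = {"pond_1": True, "pond_2": True, "pond_3": False,
             "pond_4": False, "pond_5": True}
survey_hits = {"pond_1": True, "pond_2": False, "pond_3": True,
               "pond_4": False, "pond_5": True}

table = compare_surveys(edna_hits, survey_hits)
print(table)
print(candidate_error_rates(table))
```

Sites flagged in either direction are good candidates for resampling or for checking field and laboratory controls before any conclusion is drawn.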


  1. Are water samples best collected at a single point representative of the habitat diversity as a whole? Motivate your answer.
  2. Describe three biotic and three abiotic factors that can affect DNA detection rates in aquatic environments. Explain in a few sentences how these factors can result in the detection of false positives and false negatives in streaming waters.
  3. List five factors that should be taken into account while designing a sampling strategy for detection of DNA from water samples.


Apoptosis – Controlled cell death which involves cell shrinkage, nuclear fragmentation, chromatin condensation, and chromosomal DNA fragmentation.

Biofilm – A consortium of microorganisms where cells stick to each other and often also to a surface.

Dimictic lake – A body of freshwater whose difference in temperature between surface and bottom layers becomes negligible twice per year.

Extracellular nucleases – Enzymes that can work outside of the cell and are capable of cleaving the phosphodiester bonds between nucleotides of nucleic acids.

Mesocosm – Any outdoor experimental system that simulates the natural environment under controlled conditions.

Necrosis – Uncontrolled cell death due to the loss of osmotic control typically by swelling and bursting.

PCR inhibitors – Any factor which prevents the amplification of nucleic acids through the polymerase chain reaction.

Primer – A short single stranded nucleic acid sequence used by all living organisms in the initiation of DNA synthesis.

qPCR (Quantitative PCR) – An extension of the PCR technique which allows estimation of the initial quantity of nucleic acids in a biological sample.

Senescence – The gradual deterioration of functional characteristics with ageing (can be used both for organismal or cellular ageing).

Thermal stratification – The phenomenon in which lakes develop two discrete layers of water of different temperatures; warm on top (epilimnion) and cold below (hypolimnion).

Vector (i.e., cloning vectors) – A small piece of DNA that can be stably maintained in an organism that a foreign DNA fragment can be inserted into for cloning purposes.


  • Alsos IG, Lammers Y, Yoccoz NG, Jørgensen T, Sjögren P, Gielly L, Edwards ME (2018) Plant DNA metabarcoding of lake sediments: How does it represent the contemporary vegetation. PLoS ONE 13, e0195403. doi: 10.1371/journal.pone.0195403
  • Anglès d’Auriac MB, Strand DA, Mjelde M, Demars BOL, Thaulow J (2019) Detection of an invasive aquatic plant in natural water bodies using environmental DNA. PLoS ONE 14, e0219700. doi: 10.1371/journal.pone.0219700
  • Apothéloz-Perret-Gentil L, Bouchez A, Cordier T, Cordonier A, Guéguen J, Rimet F, Vasselon V, Pawlowski J (2021) Monitoring the ecological status of rivers with diatom eDNA metabarcoding: A comparison of taxonomic markers and analytical approaches for the inference of a molecular diatom index. Mol. Ecol. 30, 2959–2968. doi: 10.1111/mec.15646
  • Barnes MA, Turner CR, Jerde CL, Renshaw MA, Chadderton WL, Lodge DM (2014) Environmental conditions influence eDNA persistence in aquatic systems. Environ. Sci. Technol. 48, 1819–1827. doi: 10.1021/es404734p
  • Beng KC, Corlett RT (2020) Applications of environmental DNA (eDNA) in ecology and conservation: opportunities, challenges and prospects. Biodivers. Conserv. 29, 2089–2121. doi: 10.1007/s10531-020-01980-0
  • Bista I, Carvalho GR, Walsh K, Seymour M, Hajibabaei M, Lallias D, Christmas M, Creer S (2017) Annual time-series analysis of aqueous eDNA reveals ecologically relevant dynamics of lake ecosystem biodiversity. Nat. Commun. 8, 14087. doi: 10.1038/ncomms14087
  • Bruce K, Blackman R, Bourlat SJ, Hellstrom AM, Bakker J, Bista I, Bohmann K, Bouchez A, Brys R, Clark K, Elbrecht V, Fazi S, Fonseca V, Hänfling B, Leese F, Mächler E, Mahon AR, Meissner K, Panksep K, Pawlowski J, Deiner K (2021) A practical guide to DNA-based methods for biodiversity assessment. Pensoft Publishers. doi: 10.3897/ab.e68634
  • Carim KJ, McKelvey KS, Young MK, Wilcox TM, Schwartz MK (2016) A protocol for collecting environmental DNA samples from streams. U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station, Ft. Collins, CO. doi: 10.2737/RMRS-GTR-355
  • Carraro L, Mächler E, Wüthrich R, Altermatt F (2020) Environmental DNA allows upscaling spatial patterns of biodiversity in freshwater ecosystems. Nat. Commun. 11, 3585. doi: 10.1038/s41467-020-17337-8
  • Carraro L, Stauffer JB, Altermatt F (2021) How to design optimal eDNA sampling strategies for biomonitoring in river networks. Environmental DNA 3, 157–172. doi: 10.1002/edn3.137
  • Coghlan SA, Shafer ABA, Freeland JR (2020) Development of an environmental DNA metabarcoding assay for aquatic vascular plant communities. Environmental DNA. doi: 10.1002/edn3.120
  • Collins RA, Wangensteen OS, O’Gorman EJ, Mariani S, Sims DW, Genner MJ (2018) Persistence of environmental DNA in marine systems. Commun. Biol. 1, 185. doi: 10.1038/s42003-018-0192-6
  • D’Alessandro S, Mariani S (2021) Sifting environmental DNA metabarcoding data sets for rapid reconstruction of marine food webs. Fish Fish. doi: 10.1111/faf.12553
  • Deiner K, Bik HM, Mächler E, Seymour M, Lacoursière-Roussel A, Altermatt F, Creer S, Bista I, Lodge DM, de Vere N, Pfrender ME, Bernatchez L (2017) Environmental DNA metabarcoding: Transforming how we survey animal and plant communities. Mol. Ecol. 26, 5872–5895. doi: 10.1111/mec.14350
  • Deiner K, Fronhofer EA, Mächler E, Walser J-C, Altermatt F (2016) Environmental DNA reveals that rivers are conveyer belts of biodiversity information. Nat. Commun. 7, 12544. doi: 10.1038/ncomms12544
  • Dejean T, Valentini A, Duparc A, Pellier-Cuit S, Pompanon F, Taberlet P, Miaud C (2011) Persistence of environmental DNA in freshwater ecosystems. PLoS ONE 6, e23398. doi: 10.1371/journal.pone.0023398
  • de Souza LS, Godwin JC, Renshaw MA, Larson E (2016) Environmental DNA (eDNA) detection probability is influenced by seasonal activity of organisms. PLoS ONE 11, e0165273. doi: 10.1371/journal.pone.0165273
  • DiBattista JD, Reimer JD, Stat M, Masucci GD, Biondi P, De Brauwer M, Bunce M (2019) Digging for DNA at depth: rapid universal metabarcoding surveys (RUMS) as a tool to detect coral reef biodiversity across a depth gradient. PeerJ 7, e6379. doi: 10.7717/peerj.6379
  • Eichmiller JJ, Best SE, Sorensen PW (2016) Effects of temperature and trophic state on degradation of environmental DNA in lake water. Environ. Sci. Technol. 50, 1859–1867. doi: 10.1021/acs.est.5b05672
  • Ficetola GF, Miaud C, Pompanon F, Taberlet P (2008) Species detection using environmental DNA from water samples. Biol. Lett. 4, 423–425. doi: 10.1098/rsbl.2008.0118
  • Foote AD, Thomsen PF, Sveegaard S, Wahlberg M, Kielgast J, Kyhn LA, Salling AB, Galatius A, Orlando L, Gilbert MTP (2012) Investigating the potential use of environmental DNA (eDNA) for genetic monitoring of marine mammals. PLoS ONE 7, e41781. doi: 10.1371/journal.pone.0041781
  • Foran DR (2006) Relative degradation of nuclear and mitochondrial DNA: an experimental approach. J. Forensic Sci. 51, 766–770. doi: 10.1111/j.1556-4029.2006.00176.x
  • Fraser CI, Connell L, Lee CK, Cary SC (2017) Evidence of plant and animal communities at exposed and subglacial (cave) geothermal sites in Antarctica. Polar Biol. 41, 1–5. doi: 10.1007/s00300-017-2198-9
  • Fujiwara A, Matsuhashi S, Doi H, Yamamoto S, Minamoto T (2016) Use of environmental DNA to survey the distribution of an invasive submerged plant in ponds. Freshwater Science 35, 748–754. doi: 10.1086/685882
  • Fukaya K, Murakami H, Yoon S, Minami K, Osada Y, Yamamoto S, Masuda R, Kasai A, Miyashita K, Minamoto T, Kondoh M (2020) Estimating fish population abundance by integrating quantitative data on environmental DNA and hydrodynamic modelling. Mol. Ecol. doi: 10.1111/mec.15530
  • Gantz CA, Renshaw MA, Erickson D, Lodge DM, Egan SP (2018) Environmental DNA detection of aquatic invasive plants in lab mesocosm and natural field conditions. Biol. Invasions 20, 1–18. doi: 10.1007/s10530-018-1718-z
  • Goldberg CS, Turner CR, Deiner K, Klymus KE, Thomsen PF, Murphy MA, Spear SF, McKee A, Oyler-McCance SJ, Cornman RS, Laramie MB, Mahon AR, Lance RF, Pilliod DS, Strickler KM, Waits LP, Fremier AK, Takahara T, Herder JE, Taberlet P (2016) Critical considerations for the application of environmental DNA methods to detect aquatic species. Methods Ecol. Evol. doi: 10.1111/2041-210X.12595
  • Hajibabaei M, Shokralla S, Zhou X, Singer GAC, Baird DJ (2011) Environmental barcoding: a next-generation sequencing approach for biomonitoring applications using river benthos. PLoS ONE 6, e17497. doi: 10.1371/journal.pone.0017497
  • Harper LR, Buxton AS, Rees HC, Bruce K, Brys R, Halfmaerten D, Read DS, Watson HV, Sayer CD, Jones EP, Priestley V, Mächler E, Múrria C, Garcés-Pastor S, Medupin C, Burgess K, Benson G, Boonham N, Griffiths RA, Lawson Handley L, Hänfling B (2019) Prospects and challenges of environmental DNA (eDNA) monitoring in freshwater ponds. Hydrobiologia 826, 25–41. doi: 10.1007/s10750-018-3750-5
  • Harrison JB, Sunday JM, Rogers SM (2019) Predicting the fate of eDNA in the environment and implications for studying biodiversity. Proc. Biol. Sci. 286, 20191409. doi: 10.1098/rspb.2019.1409
  • Hinlo R, Gleeson D, Lintermans M, Furlan E (2017) Methods to maximise recovery of environmental DNA from water samples. PLoS ONE 12, e0179251. doi: 10.1371/journal.pone.0179251
  • Hofreiter M, Serre D, Poinar HN, Kuch M, Pääbo S (2001) Ancient DNA. Nat. Rev. Genet. 2, 353–359. doi: 10.1038/35072071
  • Hotchkiss RS, Strasser A, McDunn JE, Swanson PE (2009) Cell death. N. Engl. J. Med. 361, 1570–1583. doi: 10.1056/NEJMra0901217
  • Hunter ME, Ferrante JA, Meigs-Friend G, Ulmer A (2019) Improving eDNA yield and inhibitor reduction through increased water volumes and multi-filter isolation techniques. Sci. Rep. 9, 5259. doi: 10.1038/s41598-019-40977-w
  • Kelly RP, Gallego R, Jacobs-Palmer E (2018) The effect of tides on nearshore environmental DNA. PeerJ 6, e4521. doi: 10.7717/peerj.4521
  • Kuehne LM, Ostberg CO, Chase DM, Duda JJ, Olden JD (2020) Use of environmental DNA to detect the invasive aquatic plants Myriophyllum spicatum and Egeria densa in lakes. Freshwater Science 39, 521–533. doi: 10.1086/710106
  • Laramie MB, Pilliod DS, Goldberg CS, Strickler KM (2015) Environmental DNA sampling protocol - filtering water to capture DNA from aquatic organisms. Techniques and Methods.
  • Leech DM, Snyder MT, Wetzel RG (2009) Natural organic matter and sunlight accelerate the degradation of 17β-estradiol in water. Sci. Total Environ. 407, 2087–2092. doi: 10.1016/j.scitotenv.2008.11.018
  • Mächler E, Salyani A, Walser J-C, Larsen A, Schaefli B, Altermatt F, Ceperley N (2021) Environmental DNA simultaneously informs hydrological and biodiversity characterization of an Alpine catchment. Hydrol. Earth Syst. Sci. 25, 735–753. doi: 10.5194/hess-25-735-2021
  • Matsuhashi S, Doi H, Fujiwara A, Watanabe S, Minamoto T (2016) Evaluation of the environmental DNA method for estimating distribution and biomass of submerged aquatic plants. PLoS ONE 11, e0156217. doi: 10.1371/journal.pone.0156217
  • Matsuhashi S, Minamoto T, Doi H (2019) Seasonal change in environmental DNA concentration of a submerged aquatic plant species. Freshwater Science 38, 654–660. doi: 10.1086/704996
  • Minamoto T, Naka T, Moji K, Maruyama A (2016) Techniques for the practical collection of environmental DNA: filter selection, preservation, and extraction. Limnology 17, 23–32. doi: 10.1007/s10201-015-0457-4
  • Miyazono S, Kodama T, Akamatsu Y, Nakao R, Saito M (2020) Application of environmental DNA methods for the detection and abundance estimation of invasive aquatic plant Egeria densa in lotic habitats. Limnology. doi: 10.1007/s10201-020-00636-w
  • Muha TP, Skukan R, Borrell YJ, Rico JM, Garcia de Leaniz C, Garcia-Vazquez E, Consuegra S (2019) Contrasting seasonal and spatial distribution of native and invasive Codium seaweed revealed by targeting species-specific eDNA. Ecol. Evol. 9, 8567–8579. doi: 10.1002/ece3.5379
  • Nowak P, Wiebe C, Karez R, Schubert H (2021) Applications of environmental DNA methods for charophyte biodiversity. ACA 4. doi: 10.3897/aca.4.e64944
  • Noyer C, Abot A, Trouilh L, Leberre VA, Dreanno C (2015) Phytochip: development of a DNA-microarray for rapid and accurate identification of Pseudo-nitzschia spp and other harmful algal species. J. Microbiol. Methods 112, 55–66. doi: 10.1016/j.mimet.2015.03.002
  • Okabe S, Shimazu Y (2007) Persistence of host-specific Bacteroides-Prevotella 16S rRNA genetic markers in environmental waters: effects of temperature and salinity. Appl. Microbiol. Biotechnol. 76, 935–944. doi: 10.1007/s00253-007-1048-z
  • Pawlowski J, Kelly-Quinn M, Altermatt F, Apothéloz-Perret-Gentil L, Beja P, Boggero A, Borja A, Bouchez A, Cordier T, Domaizon I, Feio MJ, Filipe AF, Fornaroli R, Graf W, Herder J, van der Hoorn B, Iwan Jones J, Sagova-Mareckova M, Moritz C, Barquín J, Kahlert M (2018) The future of biotic indices in the ecogenomic era: Integrating (e)DNA metabarcoding in biological assessment of aquatic ecosystems. Sci. Total Environ. 637–638, 1295–1310. doi: 10.1016/j.scitotenv.2018.05.002
  • Pilliod DS, Goldberg CS, Arkle RS, Waits LP (2014) Factors influencing detection of eDNA from a stream-dwelling amphibian. Mol. Ecol. Resour. 14, 109–116. doi: 10.1111/1755-0998.12159
  • Pont D, Rocle M, Valentini A, Civade R, Jean P, Maire A, Roset N, Schabuss M, Zornig H, Dejean T (2018) Environmental DNA reveals quantitative patterns of fish biodiversity in large rivers despite its downstream transportation. Sci. Rep. 8, 10361. doi: 10.1038/s41598-018-28424-8
  • Poté J, Ackermann R, Wildi W (2009) Plant leaf mass loss and DNA release in freshwater sediments. Ecotoxicol. Environ. Saf. 72, 1378–1383. doi: 10.1016/j.ecoenv.2009.04.010
  • Reape TJ, Molony EM, McCabe PF (2008) Programmed cell death in plants: distinguishing between different modes. J. Exp. Bot. 59, 435–444. doi: 10.1093/jxb/erm258
  • Rose JP, Wademan C, Weir S, Wood JS, Todd BD (2019) Traditional trapping methods outperform eDNA sampling for introduced semi-aquatic snakes. PLoS ONE 14, e0219244. doi: 10.1371/journal.pone.0219244
  • Schabacker JC, Amish SJ, Ellis BK, Gardner B, Miller DL, Rutledge EA, Sepulveda AJ, Luikart G (2020) Increased eDNA detection sensitivity using a novel high-volume water sampling method. Environmental DNA 2, 244–251. doi: 10.1002/edn3.63
  • Schroeder GK, Wolfenden R (2007) Rates of spontaneous disintegration of DNA and the rate enhancements produced by DNA glycosylases and deaminases. Biochemistry 46, 13638–13647. doi: 10.1021/bi701480f
  • Scriver M, Marinich A, Wilson C, Freeland J (2015) Development of species-specific environmental DNA (eDNA) markers for invasive aquatic plants. Aquatic Botany 122, 27–31. doi: 10.1016/j.aquabot.2015.01.003
  • Sepulveda AJ, Schabacker J, Smith S, Al-Chokhachy R, Luikart G, Amish SJ (2019) Improved detection of rare, endangered and invasive trout in using a new large-volume sampling method for eDNA capture. Environmental DNA 1, 227–237. doi: 10.1002/edn3.23
  • Shackleton ME, Rees GN, Watson G, Campbell C, Nielsen D (2019) Environmental DNA reveals landscape mosaic of wetland plant communities. Glob. Ecol. Conserv. 19, e00689. doi: 10.1016/j.gecco.2019.e00689
  • Shogren AJ, Tank JL, Andruszkiewicz EA, Olds B, Jerde C, Bolster D (2016) Modelling the transport of environmental DNA through a porous substrate using continuous flow-through column experiments. J. R. Soc. Interface 13. doi: 10.1098/rsif.2016.0290
  • Shogren AJ, Tank JL, Egan SP, August O, Rosi EJ, Hanrahan BR, Renshaw MA, Gantz CA, Bolster D (2018) Water flow and biofilm cover influence environmental DNA detection in recirculating streams. Environ. Sci. Technol. 52, 8530–8537. doi: 10.1021/acs.est.8b01822
  • Singer GAC, Fahner NA, Barnes JG, McCarthy A, Hajibabaei M (2019) Comprehensive biodiversity analysis via ultra-deep patterned flow cell technology: a case study of eDNA metabarcoding seawater. Sci. Rep. 9, 5991. doi: 10.1038/s41598-019-42455-9
  • Spens J, Evans AR, Halfmaerten D, Knudsen SW, Sengupta ME, Mak SST, Sigsgaard EE, Hellström M (2017) Comparison of capture and storage methods for aqueous macrobial eDNA using an optimized extraction protocol: advantage of enclosed filter. Methods Ecol. Evol. 8, 635–645. doi: 10.1111/2041-210X.12683
  • Stein JL, Marsh TL, Wu KY, Shizuya H, DeLong EF (1996) Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon. J. Bacteriol. 178, 591–599. doi: 10.1128/jb.178.3.591-599.1996
  • Strickler KM, Fremier AK, Goldberg CS (2015) Quantifying effects of UV-B, temperature, and pH on eDNA degradation in aquatic microcosms. Biological Conservation 183, 85–92. doi: 10.1016/j.biocon.2014.11.038
  • Thomas AC, Nguyen PL, Howard J, Goldberg CS (2019) A self-preserving, partially biodegradable eDNA filter. Methods Ecol. Evol. 10, 1136–1141. doi: 10.1111/2041-210X.13212
  • Thomsen PF, Kielgast J, Iversen LL, Wiuf C, Rasmussen M, Gilbert MTP, Orlando L, Willerslev E (2012) Monitoring endangered freshwater biodiversity using environmental DNA. Mol. Ecol. 21, 2565–2573. doi: 10.1111/j.1365-294X.2011.05418.x
  • Tingley R, Greenlees M, Oertel S, van Rooyen AR, Weeks AR (2018) Environmental DNA sampling as a surveillance tool for cane toad Rhinella marina introductions on offshore islands. Biol. Invasions 1–6. doi: 10.1007/s10530-018-1810-4
  • Toné S, Sugimoto K, Tanda K, Suda T, Uehira K, Kanouchi H, Samejima K, Minatogawa Y, Earnshaw WC (2007) Three distinct stages of apoptotic nuclear condensation revealed by time-lapse imaging, biochemical and electron microscopy analysis of cell-free apoptosis. Exp. Cell Res. 313, 3635–3644. doi: 10.1016/j.yexcr.2007.06.018
  • Tsuji S, Takahara T, Doi H, Shibata N, Yamanaka H (2019) The detection of aquatic macroorganisms using environmental DNA analysis—A review of methods for collection, extraction, and detection. Environmental DNA 1, 99–108. doi: 10.1002/edn3.21
  • Tsuji S, Ushio M, Sakurai S, Minamoto T, Yamanaka H (2017) Water temperature-dependent degradation of environmental DNA and its relation to bacterial abundance. PLoS ONE 12, e0176608. doi: 10.1371/journal.pone.0176608
  • Turner CR, Barnes MA, Xu CCY, Jones SE, Jerde CL, Lodge DM (2014) Particle size distribution and optimal capture of aqueous macrobial eDNA. Methods Ecol. Evol. 5, 676–684. doi: 10.1111/2041-210X.12206
  • Turner CR, Uy KL, Everhart RC (2015) Fish environmental DNA is more concentrated in aquatic sediments than surface water. Biological Conservation 183, 93–102. doi: 10.1016/j.biocon.2014.11.017
  • Vanyushin BF, Bakeeva LE, Zamyatnina VA, Aleksandrushkina NI (2004) Apoptosis in plants: specific features of plant apoptotic cells and effect of various factors and agents. Int. Rev. Cytol. 233, 135–179. doi: 10.1016/S0074-7696(04)33004-4
  • Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Smith HO (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66–74. doi: 10.1126/science.1093857
  • Weigand H, Beermann AJ, Čiampor F, Costa FO, Csabai Z, Duarte S, Geiger MF, Grabowski M, Rimet F, Rulik B, Strand M, Szucsich N, Weigand AM, Willassen E, Wyler SA, Bouchez A, Borja A, Čiamporová-Zaťovičová Z, Ferreira S, Dijkstra K-DB, Ekrem T (2019) DNA barcode reference libraries for the monitoring of aquatic biota in Europe: Gap-analysis and recommendations for future work. Sci. Total Environ. 678, 499–524. doi: 10.1016/j.scitotenv.2019.04.247
  • Wilcox TM, McKelvey KS, Young MK, Sepulveda AJ, Shepard BB, Jane SF, Whiteley AR, Lowe WH, Schwartz MK (2016) Understanding environmental DNA detection probabilities: A case study using a stream-dwelling char Salvelinus fontinalis. Biological Conservation 194, 209–216. doi: 10.1016/j.biocon.2015.12.023
  • Willerslev E, Cappellini E, Boomsma W, Nielsen R, Hebsgaard MB, Brand TB, Hofreiter M, Bunce M, Poinar HN, Dahl-Jensen D, Johnsen S, Steffensen JP, Bennike O, Schwenninger J-L, Nathan R, Armitage S, de Hoog C-J, Alfimov V, Christl M, Beer J, Collins MJ (2007) Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science 317, 111–114. doi: 10.1126/science.1141758
  • Williams KE, Huyvaert KP, Piaggio AJ (2016) No filters, no fridges: a method for preservation of water samples for eDNA analysis. BMC Res. Notes 9, 298. doi: 10.1186/s13104-016-2104-5
  • Wood SA, Pochon X, Ming W, von Ammon U, Woods C, Carter M, Smith M, Inglis G, Zaiko A (2019) Considerations for incorporating real-time PCR assays into routine marine biosecurity surveillance programmes: a case study targeting the Mediterranean fanworm (Sabella spallanzanii) and club tunicate (Styela clava). Genome 62, 137–146. doi: 10.1139/gen-2018-0021
  • Zhou X, Li Y, Liu S, Yang Q, Su X, Zhou L, Tang M, Fu R, Li J, Huang Q (2013) Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification. Gigascience 2, 4. doi: 10.1186/2047-217X-2-4


  1. No. The probability of species detection depends on the presence and concentration of DNA collected in a water sample. Therefore, sampling multiple sites with replicates is highly encouraged to obtain a broader overview of the local species diversity.
  2. The life history, age, and size of an organism are some of the biotic factors that can affect DNA detection rates. After release from the cell, abiotic factors such as UV, temperature, and pH can influence these rates. In flowing waters, including currents, transport of DNA can produce false positives in downstream regions. Similarly, if the DNA of the target organism degrades too quickly, it cannot be detected, resulting in false negatives.
  3. The scientific question, environmental conditions (physical and chemical), distance between sampling point and laboratory, and morphology and life cycle of the target organism should be considered when designing sampling strategies.

Chapter 4 DNA from soil


The natural presence of any plant entails the existence of a substrate where it can anchor itself and absorb nutrients for its development and survival (Wardle et al. 2004). This is most commonly the ground and specifically, soil. Nevertheless, the link between soil and plants goes beyond soil supporting plants as plants are one of the main soil-forming forces of pedogenesis through the accumulation of organic matter as well as modification of the soil biochemistry surrounding the roots (Corti et al. 2005). This process over time leads to the formation of soil layers, termed horizons, that can commonly be visibly identified (Schulz et al. 2013; Shlemon 1985; Vogt et al. 1995). Near the ground surface, the first soil horizon is an organic layer composed of growing roots and decomposing vegetative and reproductive plant material from local or regional origins, i.e., fallen debris, pollen particles, seeds (Vogt et al. 1995). Hence, this soil horizon is particularly rich in plant DNA from the environment (soil eDNA in short; Taberlet et al. 2018) and can be used as a proxy for plant identification and other biodiversity assessments (Fahner et al. 2016; Taberlet et al. 2018; Yoccoz et al. 2012).

Since the first isolation of DNA from soil bacteria, soil eDNA has gained attention for the assessment of terrestrial environments for several reasons: soil is virtually everywhere, it is easy to collect and transport, it harbors signals from above- and belowground biota, including both active and dormant cells, and its collection is non-invasive (Torsvik et al. 1990; Yoccoz 2012; for more on soil eDNA applications see Chapter 24 Environment and biodiversity assessments). Soil eDNA assessments targeting modern plant diversity commonly employ samples that are collected near the surface (organic horizon). However, some studies refer to sediments instead, which can lead to confusion between the types of eDNA samples that coexist in underground environments (Kristensen and Rabenhorst 2015). Although both soil and sediments are products of mineral weathering (Wood 1987), in soils the deposition of these products happens in situ and remains on the surface, while in sediments these products are transported and redeposited elsewhere in layers over time, e.g., on the ground or at the bottom of a lake or stream (Burdige 2020). Moreover, sediments in general have very different organic content, particle size, and mineralogy, and lower organismal activity than soil, although the transition from soil to sediment can be gradual and depends on the eco-physiological characteristics of the regional environment (e.g., tropical vs. boreal forest; Shackley 1975; Smol et al. 2001). Yet, during flooding events sediments can be transported very rapidly from one place to another and settle in new layers mixed with soil (Baldwin and Mitchell 2000). In these contexts, soil and sedimentary eDNA samples may carry a mix of different spatio-temporal signals when it comes to the reconstruction of terrestrial or aquatic environments (Deiner et al. 2017; Thomsen and Willerslev 2015).
Ancient sedimentary DNA (sedaDNA) is commonly sampled from bottom sediment layers in either aquatic or terrestrial environments (Parducci et al. 2018), and its temporal signal is usually correlated with sampling depth (Willerslev and Cooper 2005). For more on sedaDNA and its applications see Chapter 8 aDNA from sediments. Sedimentary DNA (sedDNA) usually refers to modern sediments that were either recently deposited or reflect contemporary environments. Plant biodiversity assessments of modern environments often employ surface lake sediments (Andersen et al. 2012; Pedersen et al. 2015; Willerslev et al. 2014), as these capture current biodiversity from the entire watershed catchment area (Alsos et al. 2018). This chapter focuses on modern DNA isolated from soil (soil eDNA).

Further, studies may also refer to bulk soil DNA when using soil samples to identify unknown communities, especially in forensic contexts (Boggs et al. 2019; Gothwal et al. 2007; Meiklejohn et al. 2018). Bulk DNA is commonly used in contexts where known taxa are mixed, molecularly identified (usually by metabarcoding), and then studied. There is no consensus on the precise use of these different terms, and the terminology often reflects disciplinary backgrounds and study approaches (Kristensen and Rabenhorst 2015). It is worth noting that the terms mentioned so far are neither mutually exclusive nor restricted to a particular environment. For example, soil may also be used in aquatic contexts when pedogenic processes lead to horizon differentiation, e.g., estuarine substrata (Wardle et al. 2004). Thus, close attention to the context in which a term is employed is recommended to ensure correct interpretation of data and studies.

Soil DNA: degradation, persistence, and decay

Molecular (plant) identification using soil or sediment eDNA relies on isolating DNA traces from roots, debris, seeds, and pollen (Levy-Booth et al. 2007), which signal diverse spatial and temporal origins, i.e., local or regional, ancient or contemporary. When these plant parts settle into the ground, DNA can be present either in intact cells (intracellular DNA or iDNA) or free in the environment following cell lysis or rupture (extracellular DNA or exDNA; Nagler et al. 2018). The largest fraction of eDNA in underground environments is exDNA that originates from bacterial and fungal soil communities (Levy-Booth et al. 2007; Nagler et al. 2018; Pietramellara et al. 2009; Poté et al. 2009).

The state of DNA in the soil depends on intrinsic and extrinsic properties related to the origins of the DNA as well as on factors influencing its decay (Barnes et al. 2014; Lacoursière-Roussel and Deiner 2021; Sirois and Buckley 2019). Soil eDNA is therefore a combination of iDNA and exDNA that can degrade rapidly or persist over time. For more on leaf DNA decay together with organic horizon formation, see the infographic. Intrinsic properties of the DNA molecule that can affect its persistence in the ground include GC content, purity, and weight (Nielsen et al. 2000; Pietramellara et al. 2009; Sirois and Buckley 2019; Taberlet et al. 2018; Vuillemin et al. 2017). Intrinsic properties also include those of the source organism that affect the magnitude of DNA deposition, such as biomass and feeding, social, nesting, burrowing, and hibernation behavior. Extrinsic properties relate to abiotic and biotic processes operating in the ground, e.g., soil mineralogy, organic components, pH, electrostatic properties, moisture, the presence or absence of UV radiation, bioturbation, enzymatic activity by microbial communities, and decomposition (Cozzolino et al. 2007; Gardner and Gunsch 2017; Gulden et al. 2005; Levy-Booth et al. 2007; Prosser and Hedgpeth 2018; Saeki et al. 2011). Examples of biotic processes operating in natural environments can be found in the infographic.

iDNA persists because the cell wall and membranes protect it against abiotic processes. Cells are more likely to remain intact in the ground if enzymatic activity is reduced as a result of rapid soil desiccation, low temperatures, or extreme pH values (Pietramellara et al. 2009; Taberlet et al. 2018). exDNA is more likely to persist when it binds to surface-reactive particles and hydrophobic soil components such as clay, sand, silt, and humic acids (Levy-Booth et al. 2007; Pietramellara et al. 2009). DNA may also persist indirectly via bacterial integration of DNA fragments (Levy-Booth et al. 2007). Bacterial enzymatic activity plays a central role in DNA degradation in soil (Blum et al. 1997). DNase is secreted copiously to access the phosphorus and nitrogen in the DNA and acts more rapidly at higher temperatures (Levy-Booth et al. 2007). Since both temperature and the activity of underground biota are higher in tropical climates, degradation rates are generally higher in tropical than in boreal soils. Soil type may also affect degradation rates: using a controlled microcosm, Sirois and Buckley (2019) reported that synthetic DNA degraded more slowly in forest soils than in agricultural soils, where tillage and other disruptive processes can affect persistence. Predicting the origins and persistence of eDNA remains a thorny issue, mainly because of the complex nature of the properties involved (Barnes and Turner 2015; Deiner et al. 2017).
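
The temperature effect described above is often summarised with a simple decay model. The following is a minimal sketch in Python, assuming first-order (exponential) decay and a Q10-style temperature scaling; both are modelling assumptions, not results from the studies cited here, and the function names and all numeric values (rate constant, reference temperature, Q10) are hypothetical and purely illustrative.

```python
import math


def decay_rate(k_ref: float, temp_c: float,
               temp_ref_c: float = 10.0, q10: float = 2.0) -> float:
    """Scale a reference decay rate (per day) to another temperature.

    Assumes a Q10 rule: the rate is multiplied by `q10` for every
    10 degrees C above the reference temperature (illustrative only).
    """
    return k_ref * q10 ** ((temp_c - temp_ref_c) / 10.0)


def remaining_fraction(k: float, days: float) -> float:
    """Fraction of detectable DNA left after `days` under first-order decay."""
    return math.exp(-k * days)


def half_life(k: float) -> float:
    """Time (days) until half of the detectable DNA is gone."""
    return math.log(2) / k


if __name__ == "__main__":
    k_cool = decay_rate(0.1, temp_c=5.0)    # hypothetical cool soil
    k_warm = decay_rate(0.1, temp_c=25.0)   # hypothetical warm soil
    for label, k in (("cool", k_cool), ("warm", k_warm)):
        print(f"{label}: half-life {half_life(k):.1f} d, "
              f"left after 30 d: {remaining_fraction(k, 30):.3f}")
```

Under these assumptions a warmer soil always yields a shorter half-life. Real soils add binding to particles, moisture, and pH effects that a single rate constant cannot capture.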

Soil memory

Plant eDNA bound to soil particles can originate from multiple taxa and multiple vegetative parts, each with particular mechanisms of binding, persistence, and degradation in soil substrates. Plant DNA persistence within soil allows us to harvest its botanical memory for identifying vegetation through time. Indeed, comparisons of plant identifications through both visual vegetation surveys and soil eDNA assessments have shed light on the temporal signals stored in topsoils. In boreal areas, plant identification through soil eDNA mostly registered contemporary vegetation (Ariza et al. 2022; Edwards et al. 2018; Yoccoz et al. 2012); however, taxa recorded in surveys up to 30 years earlier were also detected, suggesting that the memory stored in soil eDNA extends beyond the contemporary vegetation (Ariza et al. 2022). The extent of this memory effect across soil types and environments is poorly understood, while its implications are relevant for society (e.g., biodiversity assessments and monitoring, forensics, biosafety). For more on applications of soil eDNA see Chapter 24 Environment and biodiversity assessments.

Designing a soil eDNA study

Knowledge of the flora and the study area is key to drawing sound conclusions in any study. Below you will find considerations that can help you answer common questions when designing field and wet lab experiments.

How to sample and how much?

Soil sampling can be done by scooping out the soil, by driving a tube (e.g., a 50 ml Falcon tube) into the ground, or with a soil core sampler. We recommend using sampling protocols specifically validated in an environment similar to your study site, e.g., woodlands, grasslands, meadows, and boreal, temperate, and tropical forests (Bienert et al. 2012; Dopheide et al. 2019; Fahner et al. 2016; Taberlet et al. 2012; Yoccoz et al. 2012). It is also recommended to sample in flat areas, as slopes can cause erosion and colluvium that can interfere with soil stratification. Soil and sedimentary particles are deposited in sequence, so we can expect the bottom soil horizons to harbor older eDNA signals than those at the top. However, mixing across vertical layers can be expected as a result of bioturbation, and it is therefore very important to assess the stratigraphy of the soil or sediment being investigated. If bioturbation is absent, sampling specific soil horizons can be used to capture vegetation with particular time signals (Dickie et al. 2018). Similarly, the amount of soil collected, as well as the number of samples and replicates, can affect the spatial and time signal captured (Calderón-Sanou et al. 2020; Dopheide et al. 2019; Taberlet et al. 2012; Zinger et al. 2019a). We recommend sampling at least 10 g of soil, but power analysis and rarefaction curves can help to determine and optimize this parameter (Dickie et al. 2018; Dopheide et al. 2019). To reduce the effect of local heterogeneity in the sampling strategy, several dozen subsamples (between 20 and 50 g) can be mixed (Dickie et al. 2018; Taberlet et al. 2012). This strategy is, however, not suitable for studies dealing with patterns at small spatial scales (< 1 m²; Edwards et al. 2018).
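
The use of rarefaction to judge whether enough samples were taken can be illustrated with a taxon accumulation curve. The following is a minimal sketch in Python, assuming each soil sample has already been reduced to a set of detected taxa; the function name `accumulation_curve` and the toy detections are hypothetical and serve only as illustration.

```python
import random


def accumulation_curve(samples, n_perm=200, seed=1):
    """Mean number of taxa detected after 1, 2, ..., n samples.

    `samples` is a list of sets of taxon names. The sample order is
    shuffled `n_perm` times and the cumulative richness is averaged.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    totals = [0.0] * len(samples)
    for _ in range(n_perm):
        order = list(samples)
        rng.shuffle(order)
        seen = set()
        for i, taxa in enumerate(order):
            seen |= taxa               # add taxa found in this sample
            totals[i] += len(seen)
    return [t / n_perm for t in totals]


# Hypothetical detections from five soil samples in one plot.
plots = [
    {"Betula", "Vaccinium"},
    {"Betula", "Picea"},
    {"Vaccinium", "Empetrum"},
    {"Betula", "Vaccinium", "Salix"},
    {"Picea", "Betula"},
]

if __name__ == "__main__":
    for n, richness in enumerate(accumulation_curve(plots), start=1):
        print(f"{n} sample(s): {richness:.2f} taxa on average")
```

If the curve flattens before the last sample, additional samples are adding few new taxa; a curve that is still rising suggests that the sampling effort is too low for the plot.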

How to process the soil samples?

Obtaining clean DNA samples and avoiding cross-contamination are challenging when sampling soil eDNA. Collection instruments should therefore be decontaminated between samples (e.g., flaming, chlorine cleaning), gloves and masks should be worn and changed regularly to avoid introducing foreign DNA, and samples should be stored in separate plastic bags. To stop (or greatly reduce) enzymatic activity, samples should be stored cold or frozen, preferably at -20 °C, if immediate sample processing is not possible (Taberlet et al. 2012). Post-collection treatment of soil samples can also include air drying or freeze-drying to stop enzymatic activity and preserve DNA integrity in the sample (Nocker et al. 2012; Ritter et al. 2018). Soil samples are usually a mix of both above- and belowground fragments of fauna and flora, e.g., debris, manure, roots, seeds, pollen, insects, and worms. DNA from organisms that are present in large total biomass may complicate detection of DNA signals from rare organisms. Thus, particularly for plant identification studies, it is worth considering whether root and leaf fragments should be sieved out of the soil samples. This also helps to amplify the signal from low-abundance taxa and to normalize amplification across all organisms present in a sample.

Extraction of iDNA or exDNA?

DNA extraction is a key bottleneck when capturing molecular data, and protocols need to be tailored to both the study area and the question(s). At a minimum, you need to decide which fraction of the total soil eDNA (iDNA or exDNA) you want to isolate to answer your research question. In general, isolating exDNA is preferred when targeting non-microorganisms and avoiding diversity patterns across short temporal scales (Taberlet et al. 2012; Zinger et al. 2009). While both extraction protocols are generally similar, iDNA extraction requires a cell lysis step. Breaking the cell wall or pollen exine can be achieved with soil grinding, sonication, thermal shocks, or chemical treatments (Frostegård et al. 1999; Zhou et al. 2007). For DNA extraction protocols specifically for pollen DNA, see Chapter 5 DNA from pollen. Commercial kits for DNA extraction are readily available for joint or separate extraction of iDNA and exDNA from soil, and these are commonly used in soil eDNA studies (Alsos et al. 2018; Edwards et al. 2018; Fahner et al. 2016; Foucher et al. 2020; Yoccoz et al. 2012; Zinger et al. 2019b). Taberlet et al. (2012) proposed an extraction protocol targeting exDNA that is suitable for tropical and nontropical areas, and can be performed with material that is commonly found in molecular laboratories. Depending on the soil properties in your study area, you can adapt commercial kits to increase the quality and quantity of DNA. For example, adding chloroform can increase the separation of the organic phase and aqueous phase, which in turn optimizes DNA quality (Fatima et al. 2014). However, chloroform is highly abrasive and can induce cell lysis. Alternatively, slightly alkaline solutions of phosphate buffers can remove soil particles to which exDNA might be bound while simultaneously preventing lysis of the cells (Nagler et al. 2018).

Which DNA marker(s) to use?

If (meta)barcoding is used for identification, there are three desired features for a barcode in any study: sufficient polymorphism for identification at the desired taxonomic resolution, conserved primer binding sites for universal amplification, and available reference sequences for the target organisms. In many cases, not all features can be met, and you may need to decide which of them are most important for your research question. For more general information about choosing suitable markers and available reference databases, see Chapter 10 DNA barcoding and Chapter 11 Amplicon metabarcoding. Soil eDNA studies targeting plants have used markers found in chloroplast DNA (trnL P6 loop, matK, rbcL) and in ribosomal DNA (ITS2; Epp et al. 2018; Fahner et al. 2016; Yoccoz et al. 2012). However, metagenomic and target enrichment approaches are also gaining popularity, as these avoid PCR amplification bias and reduce the noise from non-target organisms (Johnson et al. 2019; Murchie et al. 2021). Fahner et al. (2016) compared the performance of long and short plant barcodes and recommended ITS2 and rbcL for identifying plants through soil eDNA metabarcoding, because these outperformed other markers in terms of recovery, reference completeness, and identification resolution. Since the nuclear ITS2 region is shared across plants and fungi, and the latter are abundant in soil, substantial co-amplification of fungi can be expected. To avoid this, plant-specific primers targeting this region can be used (Cheng et al. 2016). Furthermore, to avoid assessments that are biased towards particular plant groups (such as flowering plants or mosses) when using ITS2, a combination of the ITS2F/ITSp4 and ITSp3/ITSu4 primer pairs is recommended to recover most of the land plant community (Cheng et al. 2016; Timpano et al. 2020). In addition, the trnL P6 loop is the most commonly used marker in plant eDNA studies for a number of reasons: it has sufficient variability across both angiosperms and gymnosperms, a number of reference databases and taxon-specific primers are available, and its small size works well for degraded eDNA (Alsos et al. 2020; Epp et al. 2018; Foucher et al. 2020).
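
The first of the three desired features, sufficient polymorphism at the desired taxonomic resolution, can be explored in silico before committing to a marker. The following is a minimal Python sketch, not a published method: it assumes a reference library held as a dictionary of species names and barcode sequences, and it reports the fraction of species whose barcode is shared with no other species. The function name `species_resolution`, the species names, and the sequences are all hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def species_resolution(reference):
    """Fraction of species whose barcode sequences occur in no other species.

    reference: dict mapping species name -> list of barcode sequences.
    """
    # Record which species carry each distinct sequence.
    owners = defaultdict(set)
    for species, seqs in reference.items():
        for seq in seqs:
            owners[seq.upper()].add(species)

    # A species counts as resolved only if all of its sequences are unique to it.
    resolved = [
        species
        for species, seqs in reference.items()
        if all(owners[seq.upper()] == {species} for seq in seqs)
    ]
    return len(resolved) / len(reference) if reference else 0.0


# Hypothetical toy library: two congeners share an identical barcode.
toy_reference = {
    "Genus_a species_1": ["ATCCGTTTT"],
    "Genus_a species_2": ["ATCCGTTTT"],  # identical to species_1, genus-level only
    "Genus_b species_1": ["ATCCGAATT"],
    "Genus_c species_1": ["GTCCGAATA"],
}

print(species_resolution(toy_reference))  # 0.5
```

In this toy library only half of the species can be told apart, which mirrors the situation described above for short markers such as the trnL P6 loop: the marker may amplify well from degraded DNA while resolving many taxa only to genus level.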

Figure 1.

Chapter 4 Infographic: From leaf DNA to soil environmental DNA. One of the ways in which plant DNA is deposited in soil surfaces is through the accumulation of fallen leaves from trees.


  1. The laboratory technician hands you an extraction protocol that has been used previously to extract DNA from soil and sediments. How do you know if this protocol will extract both iDNA and exDNA? Motivate your answer.
  2. You are designing your soil eDNA study for a plant taxon that is distributed heterogeneously across plots. Describe the soil sampling strategy that will take into account the target taxon distribution.
  3. You want to reconstruct vegetation types based on soil eDNA targeting the trnL P6 loop. This marker will not allow you to identify all taxa to species level. Will this affect your ability to determine the vegetation types? Motivate why or why not?


Bioturbation – Biological processes involved in the dissemination of genetic material through terrestrial media.

DNA degradation – Refers to the physical changes of the DNA molecule.

DNA decay – Refers to the reduction in detectable quantity of eDNA.

DNA persistence – Refers to the amount of DNA that remains detectable across time.

DNA polymorphism – Presence of two or more variants of a particular DNA sequence.

Horizon – A layer parallel to the soil surface whose physical, chemical and biological characteristics differ from the layers above and beneath.

Power analysis – Estimation of statistical power, i.e., the probability of detecting an effect given that the effect is really there. Power can also be seen as the probability of rejecting the null hypothesis when it is in fact false.

Pedogenesis – The process of soil formation as regulated by the effects of place, environment, and history.

Rarefaction curves (in ecology) – A technique to assess species richness given the number of samples collected.
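
As an illustration of the last glossary entry, the sketch below estimates a sample-based accumulation (rarefaction-type) curve by averaging the number of taxa detected over random orderings of the samples. It is a minimal Python example under stated assumptions: each sample is represented as a set of detected taxa, and the function name `accumulation_curve` and the toy detections are hypothetical rather than taken from any study.

```python
import random


def accumulation_curve(samples, permutations=100, seed=0):
    """Mean number of taxa detected after pooling 1..n samples.

    samples: list of sets, one set of detected taxa per soil sample.
    The mean is taken over random orderings of the samples.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    for _ in range(permutations):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        for i, sample in enumerate(order):
            seen |= set(sample)       # pool taxa detected so far
            totals[i] += len(seen)    # richness after i + 1 samples
    return [total / permutations for total in totals]


# Hypothetical detections in four soil samples.
toy_samples = [
    {"taxon_a", "taxon_b"},
    {"taxon_a", "taxon_c"},
    {"taxon_b", "taxon_d"},
    {"taxon_a", "taxon_e"},
]

curve = accumulation_curve(toy_samples)
print(curve)
```

A curve that flattens as samples are added suggests that further sampling would recover few additional taxa, whereas a curve that is still rising indicates undersampling.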


  • Alsos IG, Lammers Y, Yoccoz NG, Jørgensen T, Sjögren P, Gielly L, Edwards ME (2018) Plant DNA metabarcoding of lake sediments: how does it represent the contemporary vegetation. PLoS ONE 13, e0195403. doi: 10.1371/journal.pone.0195403
  • Alsos IG, Lavergne S, Merkel MKF, Boleda M, Lammers Y, Alberti A, Pouchon C, Denoeud F, Pitelkova I, Pușcaș M, Roquet C, Hurdu B-I, Thuiller W, Zimmermann NE, Hollingsworth PM, Coissac E (2020) The treasure vault can be opened: large-scale genome skimming works well using herbarium and silica gel dried material. Plants 9, 432. doi: 10.3390/plants9040432
  • Andersen K, Bird KL, Rasmussen M, Haile J, Breuning-Madsen H, Kjaer KH, Orlando L, Gilbert MTP, Willerslev E (2012) Meta-barcoding of “dirt” DNA from soil reflects vertebrate biodiversity. Mol. Ecol. 21, 1966–1979. doi: 10.1111/j.1365-294X.2011.05261.x
  • Ariza M, Fouks B, Mauvisseau Q, Halvorsen R, Alsos IG, de Boer H (2022) Plant biodiversity assessment through soil eDNA reflects temporal and local diversity. Methods Ecol. Evol. doi: 10.1111/2041-210X.13865
  • Baldwin DS, Mitchell AM (2000) The effects of drying and re-flooding on the sediment and soil nutrient dynamics of lowland river-floodplain systems: a synthesis. Regul. Rivers: Res. Mgmt. 16, 457–467. doi: 10.1002/1099-1646(200009/10)16:5<457::AID-RRR597>3.0.CO;2-B
  • Barnes MA, Turner CR, Jerde CL, Renshaw MA, Chadderton WL, Lodge DM (2014) Environmental conditions influence eDNA persistence in aquatic systems. Environ. Sci. Technol. 48, 1819–1827. doi: 10.1021/es404734p
  • Barnes MA, Turner CR (2015) The ecology of environmental DNA and implications for conservation genetics. Conserv. Genet. 17, 1–17. doi: 10.1007/s10592-015-0775-4
  • Bienert F, De Danieli S, Miquel C, Coissac E, Poillot C, Brun J-J, Taberlet P (2012) Tracking earthworm communities from soil DNA. Mol. Ecol. 21, 2017–2030. doi: 10.1111/j.1365-294X.2011.05407.x
  • Blum SAE, Lorenz MG, Wackernagel W (1997) Mechanism of retarded DNA degradation and prokaryotic origin of dnases in nonsterile soils. Syst. Appl. Microbiol. 20, 513–521. doi: 10.1016/S0723-2020(97)80021-5
  • Boggs LM, Scheible MKR, Machado G, Meiklejohn KA (2019) Single fragment or bulk soil DNA metabarcoding: which is better for characterizing biological taxa found in surface soils for sample separation? Genes (Basel) 10, 431. doi: 10.3390/genes10060431
  • Burdige DJ (2007) Geochemistry of marine sediments. Princeton University Press, Princeton, 624 pp.
  • Calderón-Sanou I, Münkemüller T, Boyer F, Zinger L, Thuiller W (2020) From environmental DNA sequences to ecological conclusions: how strong is the influence of methodological choices? J. Biogeogr. 47, 193–206. doi: 10.1111/jbi.13681
  • Cheng T, Xu C, Lei L, Li C, Zhang Y, Zhou S (2016) Barcoding the kingdom Plantae: new PCR primers for ITS regions of plants with improved universality and specificity. Mol. Ecol. Resour. 16, 138–149. doi: 10.1111/1755-0998.12438
  • Corti G, Agnelli A, Cuniglio R, Sanjurjo MF, Cocco S (2005) Characteristics of rhizosphere soil from natural and agricultural environments, in: Huang, P.M., Gobran, G.R. (Eds.), Biogeochemistry of Trace Elements in the Rhizosphere. Elsevier, pp. 57–128. doi: 10.1016/B978-044451997-9/50005-2
  • Cozzolino S, Cafasso D, Pellegrino G, Musacchio A, Widmer A (2007) Genetic variation in time and space: the use of herbarium specimens to reconstruct patterns of genetic variation in the endangered orchid Anacamptis palustris. Conserv. Genet. 8, 629–639. doi: 10.1007/s10592-006-9209-7
  • Deiner K, Bik HM, Mächler E, Seymour M, Lacoursière-Roussel A, Altermatt F, Creer S, Bista I, Lodge DM, de Vere N, Pfrender ME, Bernatchez L (2017) Environmental DNA metabarcoding: transforming how we survey animal and plant communities. Mol. Ecol. 26, 5872–5895. doi: 10.1111/mec.14350
  • Dickie IA, Boyer S, Buckley HL, Duncan RP, Gardner PP, Hogg ID, Holdaway RJ, Lear G, Makiola A, Morales SE, Powell JR, Weaver L (2018) Towards robust and repeatable sampling methods in eDNA-based studies. Mol. Ecol. Resour. 18, 940–952. doi: 10.1111/1755-0998.12907
  • Dopheide A, Xie D, Buckley TR, Drummond AJ, Newcomb RD (2019) Impacts of DNA extraction and PCR on DNA metabarcoding estimates of soil biodiversity. Methods Ecol. Evol. 10, 120–133. doi: 10.1111/2041-210X.13086
  • Edwards ME, Alsos IG, Yoccoz N, Coissac E, Goslar T, Gielly L, Haile J, Langdon CT, Tribsch A, Binney HA, von Stedingk H, Taberlet P (2018) Metabarcoding of modern soil DNA gives a highly local vegetation signal in Svalbard tundra. The Holocene 28, 2006–2016. doi: 10.1177/0959683618798095
  • Epp LS, Kruse S, Kath NJ, Stoof-Leichsenring KR, Tiedemann R, Pestryakova LA, Herzschuh U (2018) Temporal and spatial patterns of mitochondrial haplotype and species distributions in Siberian larches inferred from ancient environmental DNA and modeling. Sci. Rep. 8, 17436. doi: 10.1038/s41598-018-35550-w
  • Fahner NA, Shokralla S, Baird DJ, Hajibabaei M (2016) Large-scale monitoring of plants through environmental DNA metabarcoding of soil: recovery, resolution, and annotation of four DNA markers. PLoS ONE 11, e0157505. doi: 10.1371/journal.pone.0157505
  • Fatima F, Pathak N, Rastogi Verma S (2014) An improved method for soil DNA extraction to study the microbial assortment within rhizospheric region. Mol. Biol. Int. 2014, 518960. doi: 10.1155/2014/518960
  • Foucher A, Evrard O, Ficetola GF, Gielly L, Poulain J, Giguet-Covex C, Laceby JP, Salvador-Blanes S, Cerdan O, Poulenard J (2020) Persistence of environmental DNA in cultivated soils: implication of this memory effect for reconstructing the dynamics of land use and cover changes. Sci. Rep. 10, 10502. doi: 10.1038/s41598-020-67452-1
  • Frostegård A, Courtois S, Ramisse V, Clerc S, Bernillon D, Le Gall F, Jeannin P, Nesme X, Simonet P (1999) Quantification of bias related to the extraction of DNA directly from soils. Appl. Environ. Microbiol. 65, 5409–5420. doi: 10.1128/AEM.65.12.5409-5420.1999
  • Gardner CM, Gunsch CK (2017) Adsorption capacity of multiple DNA sources to clay minerals and environmental soil matrices less than previously estimated. Chemosphere 175, 45–51. doi: 10.1016/j.chemosphere.2017.02.030
  • Gothwal RK, Nigam VK, Mohan MK, Sasmal D, Ghosh P (2007) Extraction of bulk DNA from Thar Desert soils for optimization of PCR-DGGE based microbial community analysis. Electron. J. Biotechnol. 10, 400–408. doi: 10.2225/vol10-issue3-fulltext-6
  • Gulden RH, Lerat S, Hart MM, Powell JR, Trevors JT, Pauls KP, Klironomos JN, Swanton CJ (2005) Quantitation of transgenic plant DNA in leachate water: real-time polymerase chain reaction analysis. J. Agric. Food Chem. 53, 5858–5865. doi: 10.1021/jf0504667
  • Johnson MG, Pokorny L, Dodsworth S, Botigué LR, Cowan RS, Devault A, Eiserhardt WL, Epitawalage N, Forest F, Kim JT, Leebens-Mack JH, Leitch IJ, Maurin O, Soltis DE, Soltis PS, Wong GK-S, Baker WJ, Wickett NJ (2019) A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering. Syst. Biol. 68, 594–606. doi: 10.1093/sysbio/syy086
  • Kristensen E, Rabenhorst MC (2015) Do marine rooted plants grow in sediment or soil? A critical appraisal on definitions, methodology and communication. Earth-Science Reviews 145, 1–8. doi: 10.1016/j.earscirev.2015.02.005
  • Lacoursière-Roussel A, Deiner K (2021) Environmental DNA is not the tool by itself. J. Fish Biol. 98, 383–386. doi: 10.1111/jfb.14177
  • Levy-Booth DJ, Campbell RG, Gulden RH, Hart MM, Powell JR, Klironomos JN, Pauls KP, Swanton CJ, Trevors JT, Dunfield KE (2007) Cycling of extracellular DNA in the soil environment. Soil Biology and Biochemistry 39, 2977–2991. doi: 10.1016/j.soilbio.2007.06.020
  • Meiklejohn KA, Jackson ML, Stern LA, Robertson JM (2018) A protocol for obtaining DNA barcodes from plant and insect fragments isolated from forensic-type soils. Int. J. Legal Med. 132, 1515–1526. doi: 10.1007/s00414-018-1772-1
  • Murchie TJ, Kuch M, Duggan AT, Ledger ML, Roche K, Klunk J, Karpinski E, Hackenberger D, Sadoway T, MacPhee R, Froese D, Poinar H (2021) Optimizing extraction and targeted capture of ancient environmental DNA for reconstructing past environments using the PalaeoChip Arctic-1.0 bait-set. Quaternary Research 99, 305–328. doi: 10.1017/qua.2020.59
  • Nagler M, Insam H, Pietramellara G, Ascher-Jenull J (2018) Extracellular DNA in natural environments: features, relevance and applications. Appl. Microbiol. Biotechnol. 102, 6343–6356. doi: 10.1007/s00253-018-9120-4
  • Nielsen KM, Smalla K, van Elsas JD (2000) Natural transformation of Acinetobacter sp. strain BD413 with cell lysates of Acinetobacter sp., Pseudomonas fluorescens, and Burkholderia cepacia in soil microcosms. Appl. Environ. Microbiol. 66, 206–212. doi: 10.1128/aem.66.1.206-212.2000
  • Nocker A, Fernández PS, Montijn R, Schuren F (2012) Effect of air drying on bacterial viability: A multiparameter viability assessment. J. Microbiol. Methods 90, 86–95. doi: 10.1016/j.mimet.2012.04.015
  • Parducci L, Nota K, Wood J (2018) Reconstructing past vegetation communities using ancient DNA from lake sediments, in: Lindqvist, C., Rajora, O.P. (Eds.), Paleogenomics: Genome-Scale Analysis of Ancient DNA. Springer International Publishing, Cham, pp. 163–187. doi: 10.1007/13836_2018_38
  • Pedersen MW, Overballe-Petersen S, Ermini L, Sarkissian CD, Haile J, Hellstrom M, Spens J, Thomsen PF, Bohmann K, Cappellini E, Schnell IB, Wales NA, Carøe C, Campos PF, Schmidt AMZ, Gilbert MTP, Hansen AJ, Orlando L, Willerslev E (2015) Ancient and modern environmental DNA. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370, 20130383. doi: 10.1098/rstb.2013.0383
  • Pietramellara G, Ascher J, Borgogni F, Ceccherini MT, Guerri G, Nannipieri P (2009) Extracellular DNA in soil and sediment: fate and ecological relevance. Biol. Fertil. Soils 45, 219–235. doi: 10.1007/s00374-008-0345-8
  • Poté J, Ackermann R, Wildi W (2009) Plant leaf mass loss and DNA release in freshwater sediments. Ecotoxicol. Environ. Saf. 72, 1378–1383. doi: 10.1016/j.ecoenv.2009.04.010
  • Prosser CM, Hedgpeth BM (2018) Effects of bioturbation on environmental DNA migration through soil media. PLoS ONE 13, e0196430. doi: 10.1371/journal.pone.0196430
  • Ritter CD, Zizka A, Roger F, Tuomisto H, Barnes C, Nilsson RH, Antonelli A (2018) High-throughput metabarcoding reveals the effect of physicochemical soil properties on soil and litter biodiversity and community turnover across Amazonia. PeerJ 6, e5661. doi: 10.7717/peerj.5661
  • Saeki K, Ihyo Y, Sakai M, Kunito T (2011) Strong adsorption of DNA molecules on humic acids. Environ. Chem. Lett. 9, 505–509. doi: 10.1007/s10311-011-0310-x
  • Schulz S, Brankatschk R, Dümig A, Kögel-Knabner I, Schloter M, Zeyer J (2013) The role of microorganisms at different stages of ecosystem development for soil formation. Biogeosciences 10, 3983–3996. doi: 10.5194/bg-10-3983-2013
  • Shackley M (1975) Archaeological sediments: a survey of analytical methods. Butterworth, London and Boston.
  • Shlemon RJ (1985) Application of soil-stratigraphic techniques to engineering geology. Environmental & Engineering Geoscience xxii, 129–142. doi: 10.2113/gseegeosci.xxii.2.129
  • Sirois SH, Buckley DH (2019) Factors governing extracellular DNA degradation dynamics in soil. Environ. Microbiol. Rep. 11, 173–184. doi: 10.1111/1758-2229.12725
  • Smol JP, Birks HJB, Last WM, Bradley RS, Alverson K (Eds) (2001) Tracking environmental change using lake sediments: terrestrial, algal, and siliceous indicators, Developments in paleoenvironmental research. Springer Netherlands, Dordrecht. doi: 10.1007/0-306-47668-1
  • Taberlet P, Bonin A, Zinger L, Coissac E (Eds) (2018) Environmental DNA: for biodiversity research and monitoring. Oxford University Press. doi: 10.1093/oso/9780198767220.001.0001
  • Taberlet P, Prud’Homme SM, Campione E, Roy J, Miquel C, Shehzad W, Gielly L, Rioux D, Choler P, Clément J-C, Melodelima C, Pompanon F, Coissac E (2012) Soil sampling and isolation of extracellular DNA from large amount of starting material suitable for metabarcoding studies. Mol. Ecol. 21, 1816–1820. doi: 10.1111/j.1365-294X.2011.05317.x
  • Thomsen PF, Willerslev E (2015) Environmental DNA – An emerging tool in conservation for monitoring past and present biodiversity. Biological Conservation 183, 4–18. doi: 10.1016/j.biocon.2014.11.019
  • Timpano EK, Scheible MKR, Meiklejohn KA (2020) Optimization of the second internal transcribed spacer (ITS2) for characterizing land plants from soil. PLoS ONE 15, e0231436. doi: 10.1371/journal.pone.0231436
  • Torsvik V, Goksøyr J, Daae FL (1990) High diversity in DNA of soil bacteria. Appl. Environ. Microbiol. 56, 782–787. doi: 10.1128/aem.56.3.782-787.1990
  • Vogt KA, Vogt DJ, Palmiotto PA, Boon P, O’Hara J, Asbjornsen H (1995) Review of root dynamics in forest ecosystems grouped by climate, climatic forest type and species. Plant Soil 187, 159–219. doi: 10.1007/BF00017088
  • Vuillemin A, Horn F, Alawi M, Henny C, Wagner D, Crowe SA, Kallmeyer J (2017) Preservation and significance of extracellular DNA in ferruginous sediments from Lake Towuti, Indonesia. Front. Microbiol. 8, 1440. doi: 10.3389/fmicb.2017.01440
  • Wardle DA, Bardgett RD, Klironomos JN, Setälä H, van der Putten WH, Wall DH (2004) Ecological linkages between aboveground and belowground biota. Science 304, 1629–1633. doi: 10.1126/science.1094875
  • Willerslev E, Cooper A (2005) Ancient DNA. Proc. Biol. Sci. 272, 3–16. doi: 10.1098/rspb.2004.2813
  • Willerslev E, Davison J, Moora M, Zobel M, Coissac E, Edwards ME, Lorenzen ED, Vestergård M, Gussarova G, Haile J, Craine J, Gielly L, Boessenkool S, Epp LS, Pearman PB, Cheddadi R, Murray D, Bråthen KA, Yoccoz N, Binney H, Taberlet P (2014) Fifty thousand years of Arctic vegetation and megafaunal diet. Nature 506, 47–51. doi: 10.1038/nature12921
  • Wood JM (1987) Biological processes involved in the cycling of elements between soil or sediments and the aqueous environment. Hydrobiologia 149, 31–42. doi: 10.1007/BF00048644
  • Yoccoz NG, Bråthen KA, Gielly L, Haile J, Edwards ME, Goslar T, Von Stedingk H, Brysting AK, Coissac E, Pompanon F, Sønstebø JH, Miquel C, Valentini A, De Bello F, Chave J, Thuiller W, Wincker P, Cruaud C, Gavory F, Rasmussen M, Taberlet P (2012) DNA from soil mirrors plant taxonomic and growth form diversity. Mol. Ecol. 21, 3647–3655. doi: 10.1111/j.1365-294X.2012.05545.x
  • Yoccoz NG (2012) The future of environmental DNA in ecology. Mol. Ecol. 21, 2031–2038. doi: 10.1111/j.1365-294X.2012.05505.x
  • Zhou L-J, Pei K-Q, Zhou B, Ma K-P (2007) A molecular approach to species identification of Chenopodiaceae pollen grains in surface soil. Am. J. Bot. 94, 477–481. doi: 10.3732/ajb.94.3.477
  • Zinger L, Bonin A, Alsos IG, Bálint M, Bik H, Boyer F, Chariton AA, Creer S, Coissac E, Deagle BE, De Barba M, Dickie IA, Dumbrell AJ, Ficetola GF, Fierer N, Fumagalli L, Gilbert MTP, Jarman S, Jumpponen A, Kauserud H, Taberlet P (2019a) DNA metabarcoding-Need for robust experimental designs to draw sound ecological conclusions. Mol. Ecol. 28, 1857–1862. doi: 10.1111/mec.15060
  • Zinger L, Shahnavaz B, Baptist F, Geremia RA, Choler P (2009) Microbial diversity in alpine tundra soils correlates with snow cover dynamics. ISME J. 3, 850–859. doi: 10.1038/ismej.2009.20
  • Zinger L, Taberlet P, Schimann H, Bonin A, Boyer F, De Barba M, Gaucher P, Gielly L, Giguet-Covex C, Iribar A, Réjou-Méchain M, Rayé G, Rioux D, Schilling V, Tymen B, Viers J, Zouiten C, Thuiller W, Coissac E, Chave J (2019b) Body size determines soil community assembly in a tropical forest. Mol. Ecol. 28, 528–543. doi: 10.1111/mec.14919


  1. By checking whether the protocol includes a step that lyses the cells, which is needed to extract iDNA. This step can be grinding, sonication, thermal shocks, or chemical treatments such as chloroform.
  2. To take heterogeneity into account, the strategy is to take many subsamples and pool them.
  3. Soil eDNA using the trnL P6 loop will not give you accurate species lists in most floras, but rather lists of genera with occasional lower-level or higher-level identifications. Most vegetation types are characterized by only a few key species, so the limited taxonomic resolution of your identifications is unlikely to affect the overall vegetation type calling. However, in some floras or vegetation types this approach will be insufficient, e.g., for those characterized by specific taxa in locally speciose genera.

Chapter 5 DNA from pollen


Why use DNA from pollen instead of morphology?

To identify pollen, spores, and other plant-related microremains, the field of palynology has traditionally relied on microscope-based analyses. This is a time-consuming process that requires highly trained specialists. Additionally, pollen grains from many plant families are morphologically indistinguishable using light microscopy (Beug 2004), so pollen often cannot be identified beyond the genus or family level. More advanced microscopy techniques, such as scanning electron microscopy (SEM) and super-resolution microscopy (see e.g. Sivaguru et al. 2018), can visualise the finer and potentially species-specific details on the pollen surface (i.e., the exine). However, these techniques often require extensive sample preparation, highly trained palynologists, and costly microscopes. Moreover, some pollen grain features are so fine (less than 500 nm) that not even these sophisticated imaging techniques can visualise them. A combination of high-resolution imaging and automatic image detection using sufficiently trained neural networks is another emerging method to increase taxonomic resolution with pollen morphology (Polling et al. 2021; Romero et al. 2020). This technique, however, requires an extensively trained network and a large and varied reference database of pollen images.

These challenges highlight the necessity for innovative methods within the field of palynology, to increase both the speed and accuracy of pollen identifications. DNA-based methods for the molecular identification of pollen grains have the potential to be of complementary value. However, the extraction of DNA from pollen is non-trivial. This chapter therefore focuses on how DNA can be extracted from pollen, the common problems encountered, and the qualitative and quantitative molecular possibilities for analyses.

Applications of DNA-based methods for pollen identification

Using pollen grain DNA for identification has shown promising results in a number of applications, including the study of provenance and authentication of honey (Hawkins et al. 2015; Prosser and Hebert 2017; Utzeri et al. 2018), plant-pollinator networks (Pornon et al. 2017; Richardson et al. 2019), hay fever predictions (Campbell et al. 2020; Kraaijeveld et al. 2015; Leontidou et al. 2018), forensic science (Bell et al. 2016a, and references therein), and environmental reconstructions from pollen in soil (Parducci et al. 2017) (see Section 3 for full information on applications). Ancient DNA can be extracted from pollen grains as old as 150 kyr (Suyama et al. 1996), and has also been used for reconstructing ancient plant-pollinator networks (Gous et al. 2019) (see Chapter 21 Palaeobotany).

Collecting pollen for DNA analysis

Collecting pollen for DNA analysis is mostly similar to collecting pollen for microscopic analysis, though more care should be taken to avoid contamination from other potential sources of DNA. This is because pollen generally contains low quantities of DNA, so even small amounts of foreign DNA can affect the results. Pollen grains can be collected either directly from the environment (air, water, soil, etc.) or from pollinators (pollen baskets, honey). Pollen collected from the environment will most often (though not always) be derived from anemophilous (wind-pollinated) plants, while pollinators collect the majority of their pollen from entomophilous (insect-pollinated) plants. Pollinators may, however, also have anemophilous pollen accidentally sticking to their bodies. For studies looking at pollen from pollinators, either all pollen grains on the animal’s body are collected by washing off the pollen or, when present, only the corbicular pollen baskets are collected (Bell et al. 2017; Richardson et al. 2015). Pollinators can be collected in the field using aerial netting or sampled from natural history collections (Gous et al. 2019). Insect-collected pollen baskets contain many hundreds of thousands of pollen grains, and even a small subset of a basket is sufficient for molecular analysis. Honey also contains huge numbers of pollen grains, but it can be more challenging to work with for DNA analyses, because it contains many compounds, such as polyphenols and flavonoids, that can chemically inhibit methods used for DNA sequencing (Prosser and Hebert 2017). In contrast, airborne pollen grains lack these inhibitors, but they are present in only relatively low concentrations in the ambient air. Therefore, to collect sufficient amounts of pollen for molecular analyses, most sampling methods rely on air filtration. These include both volumetric (e.g., Hirst type; Hirst 1952) and gravimetric methods (for an overview see Banchi et al. 2020; Levetin 2004).

Pollen DNA extraction

Pollen lysis

Pollen grains can be referred to as “natural plastic”: they have a very hard outer cell wall called the exine, which is made of sporopollenin (Brooks and Shaw 1968). The pollen exine is very resistant to non-oxidative physical, biological, and chemical degradation. This is evidenced by the ubiquitous presence of pollen in the fossil record, and some fossil pollen exines have been found preserved for over 243 million years (Hochuli and Feist-Burkhardt 2013). Extracting DNA from pollen grains is thus not trivial, since the exine must be broken to release the DNA inside. Entomophilous pollen grains also contain DNA-rich pollenkitt outside the exine, but this DNA is usually heavily degraded, and it is the DNA inside the pollen grains that remains intact (Pornon et al. 2017; Pacini and Hesse 2005). A lysis step using mechanical bead beating and a lysis buffer is often used before DNA extraction from pollen grains, and has been shown to improve DNA quantity (Swenson and Gemeinholzer 2021). However, if the lysis time is too long, or the bead beating too vigorous, DNA yield may actually decrease. Swenson and Gemeinholzer (2021) found that the best results were obtained at 33 to 67% exine rupture rather than 100% exine rupture, and with 2 hours of lysis incubation rather than 24 hours. Various bead-beating strategies have been adopted (Table 1), including using a single relatively large bead (5 mm) or different mixtures of large and small beads. Many different bead materials have also been used, including stainless steel, tungsten carbide, glass, and zirconium, but the choice of material does not seem to influence the extraction. It is always recommended to test the lysis efficiency, which can be done by checking the fraction of broken (i.e., lysed) pollen grains under the microscope after bead beating (e.g. Kraaijeveld et al. 2015).
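
The lysis efficiency check described above amounts to a simple proportion. The following minimal Python sketch computes the fraction of ruptured grains from microscope counts and compares it with the 33 to 67% exine rupture window reported by Swenson and Gemeinholzer (2021). The function names and the example counts are hypothetical, and whether this window applies to a given study system is an assumption that should be tested empirically.

```python
def exine_rupture_fraction(broken, intact):
    """Fraction of counted pollen grains whose exine is ruptured."""
    total = broken + intact
    if total == 0:
        raise ValueError("no pollen grains counted")
    return broken / total


def within_target(fraction, low=0.33, high=0.67):
    """True if the rupture fraction lies inside the chosen target window."""
    return low <= fraction <= high


# Hypothetical microscope counts after one bead-beating run.
fraction = exine_rupture_fraction(broken=48, intact=52)
print(f"{fraction:.0%} ruptured, within target: {within_target(fraction)}")
```

If the fraction falls below the window, bead-beating time or intensity could be increased; if it is well above the window, a gentler treatment may give a better yield.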

Table 1.

Overview of selected studies since 2017 that have used molecular techniques to identify pollen, including the aim, strategy for pollen lysis, extraction method, number of PCR cycles, sequencing method, and marker choice.

Study | Aim | Pollen lysis step | Extraction method | PCR cycles | Sequencing method | Markers
Leontidou et al. 2018 | Airborne pollen identification | Bead beating (one 5 mm stainless steel bead), two 1-min cycles at 30 Hz | DNeasy Plant Mini Kit (Qiagen) and Nucleomag kit (Macherey–Nagel) | 30 | Sanger sequencing | trnL
Lang et al. 2019 | Pollen quantification | Bead beating (mix of 0.5 and 1 mm silica beads), 2 min | Wizard (Promega) | N/A | Genome skimming | N/A
Bell et al. 2019 | Pollen quantification | Bead beating (mini-bead beater), 3 min | FastDNA SPIN Kit for Soil (MP Biomedicals) | 30 | Metabarcoding | nrITS2, rbcL
Peel et al. 2019 | Pollen quantification | Bead beating (five 1 mm stainless steel beads), 2 min at 22.5 Hz | Adapted CTAB | N/A | Genome skimming | N/A
Gous et al. 2019 | Plant pollinator interactions over time | Bead beating (one 3 mm stainless steel bead + lysis buffer), 2 min at 25 Hz | QIAamp DNA Micro Kit and DNeasy Plant Mini Kit (Qiagen), Nucleospin DNA Trace Kit (Macherey-Nagel) | 30 | Metabarcoding | nrITS1, nrITS2, rbcL
Brennan et al. 2019 | Airborne pollen identification | Bead beating (3 mm tungsten beads), 4 min at 30 Hz | DNeasy Plant Mini Kit (Qiagen) | 35 | Metabarcoding | nrITS2, rbcL
Richardson et al. 2019 | Bee pollen diet | Bead beating (3.355 mg 0.7 mm zirconia beads), 5 min | DNeasy Plant Mini kit (Qiagen) | Three steps (55 cycles in total) | Metabarcoding | nrITS2, rbcL, trnL, trnH
Suchan et al. 2019 | Insect migration analysis | Bead beating (five zirconium beads), 1 min at 30 Hz | No extraction, using Phire Plant Direct Polymerase | Two steps (32 cycles in total) | Metabarcoding | nrITS2
Baksay et al. 2020 | Pollen quantification | CF lysis buffer (Nucleospin Food Kit) | DNeasy Plant Mini Kit (Qiagen) | 25, 30, 35 | Metabarcoding | nrITS1, trnL
Campbell et al. 2020 | Airborne pollen identification | Bead beating (0.2 g 425–600 μm glass beads + lysis buffer), two 1-min cycles (3450 oscillations/min) | Adapted CTAB | 40 | Metabarcoding | rbcL
Bänsch et al. 2020; Leidenfrost et al. 2020 | Bee pollen diet | Bead beating (150 g mix of 1.4 mm ceramic and 3 mm tungsten beads + lysis buffer), two 45 second cycles at 6.5 m/s | DNeasy Plant Mini Kit (Qiagen) | 37 | Metabarcoding | nrITS2

It should be noted that other methods for DNA extraction from pollen exist in which the pollen grains are not destroyed, and in some specific cases, excluding the bead-beating step has even given better results (Ghitarrini et al. 2018; Gous et al. 2019).

DNA extraction

Several commercially available DNA extraction protocols have been used to extract DNA from pollen grains after the lysis step. Table 1 gives an overview of protocols used in recent literature (for a full overview see Bell et al. 2016b). DNA is most commonly extracted from pollen using the DNeasy Plant Mini Kit (Qiagen) due to its ease of use and high success rate. However, recent papers comparing different methods suggest that the best extraction protocol should be determined empirically. In one recent paper, several extraction protocols were compared for airborne pollen collected using air samplers (Leontidou et al. 2018). The highest DNA yield was obtained by combining a lysis step with steel beads and the Nucleomag Kit. For bee-collected pollen grains, however, the DNeasy Plant Mini Kit gave the best results amongst several different protocols (Gous et al. 2019). Thus, it is always recommended to test several DNA extraction methods to find the one that gives the best DNA yield within the chosen study system.

The quality of DNA extracted from pollen samples is critical for any molecular identification method, particularly when working with very small amounts of DNA. Avoiding contamination is therefore essential: work in a clean lab, keep windows closed, use sterilised tools in a laminar flow cabinet, and keep the DNA extraction lab separate from the post-PCR environment.

Molecular methods for pollen identification

Molecular methods can contribute to the analysis of pollen both by identifying which species are present (qualitative analysis) and by measuring the abundance of the different pollen species (quantitative analysis). While DNA metabarcoding methods are currently used most often (Table 1), DNA barcoding techniques have also been applied to target specific species in a mixture, and metagenomics now allows for pollen quantification. For a review of these different sequencing methods, see Chapter 10 DNA barcoding, Chapter 11 Amplicon metabarcoding, and Chapter 12 Metagenomics.

Qualitative pollen analysis

DNA barcoding

Species-level resolution in pollen grain identifications is critical for studies that try to answer specific research questions, such as: which particular species of flower does a common carder bee prefer? Which grass species is responsible for most of the pollen in the ambient air in early May? Species-specific markers and qPCR techniques can be used to identify specific species within a mixture of different pollen types (see Chapter 10 DNA barcoding). One study used custom-made primers for the nuclear ribosomal Internal Transcribed Spacer (nrITS) to differentiate between mugwort (Artemisia vulgaris) and ragweed (Ambrosia artemisiifolia), two notoriously allergenic species from the Asteraceae family (Müller-Germann et al. 2017). These newly constructed primers were then applied to aerobiological samples to show that ragweed pollen can travel long distances, since it was detected outside of the local pollination period. Barcoding was also used to show that allergenic Juniperus ashei pollen grains could be found in Canada, even though the closest plants that they could have originated from were located in Texas and Oklahoma, USA (Mohanty et al. 2017). These two studies illustrate the potential to identify pollen grains at the species level using DNA-based methods, though this level of resolution is not always necessary. In the grass family (Poaceae), all species from certain subfamilies are known to have much higher allergenic prevalence than those of other subfamilies, and therefore subfamily resolution is sufficient for hay fever predictions (Frenguelli et al. 2010). Ghitarrini et al. (2018), for example, used both species-specific and subfamily-specific primers with real-time PCR to target the most allergenic types of grasses. Pooideae (a subfamily of grasses with many allergenic species) and individual species within this subfamily were detected in aerobiological samples on a presence/absence basis.

DNA metabarcoding

DNA barcoding can be used to target specific species, yet it is rare that a pollen sample contains only a single pollen species. DNA metabarcoding is therefore the most often used method for the molecular identification of the different species of pollen grains from mixed samples (see Chapter 11 Amplicon metabarcoding). Both nuclear and chloroplast DNA can be amplified from pollen DNA (Bell et al. 2016b), and amongst the many different markers that have been tested, rbcL, trnL, matK, and trnH-psbA from the chloroplast, as well as nuclear ribosomal ITS2 (nrITS2), have so far shown the most promise for the molecular identification of pollen grains. Since no universal barcode exists that would allow detection of all plant lineages, a combination of a nuclear and a chloroplast marker has been advised (Hollingsworth 2011). nrITS2 (~450 bp) is particularly relevant for the identification of pollen grains when relatively fresh (and non-degraded) DNA is available. In one example, pollen was collected from the bodies of the migratory butterfly species Vanessa cardui and identified based on nrITS2, providing geographical information on where the butterflies were migrating from (Suchan et al. 2019). Because several Saharan endemic plants were identified to the species level, this provided excellent evidence that the butterflies originated from the Sahara region.

Figure 1.

Chapter 5 Infographic: Overview of pollen sources, DNA extraction, and downstream analytical methods for the molecular identification of plants from pollen DNA.

While research into targeting different barcoding regions and primers is ongoing (e.g., trnT-F, Alan et al. 2019; ITS1, Baksay et al. 2020), another development is the use of more specific reference databases. The commonly used NCBI GenBank returns many untrustworthy hits since it is not curated (see e.g. Meiklejohn et al. 2019). Brennan et al. (2019) designed a metabarcoding study with two common markers (rbcL and nrITS2), but using a strictly curated reference library containing sequences only from those grass species that occurred locally. They further customised this database to include all other invasive as well as cultivated species in the UK. Using their customised database, the authors detected temporally restricted signals of individual grass genera throughout the grass pollen season, with minimal background from unexpected species, a background that often results from mismatches when using a more generic reference database. Furthermore, they found that while some genera of grass may flower early in summer in one location, flowering may occur months later in other locations. This information can help hay fever patients to work out which specific grass genus they are allergic to, and additionally illustrates the relationship between flowering phenology and airborne pollen incidence.
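
The idea of restricting a reference library to locally occurring species can be illustrated with a minimal Python sketch. It does not reproduce the pipeline of Brennan et al. (2019); the record layout, the function name restrict_reference, the species checklist, and the placeholder sequences are all assumptions made for illustration.

```python
def restrict_reference(records, checklist):
    """Return only the reference records whose species is on the checklist."""
    allowed = {name.strip().lower() for name in checklist}
    return [rec for rec in records if rec["species"].strip().lower() in allowed]


# Hypothetical reference records; the sequences are placeholders, not real barcodes.
reference = [
    {"id": "ref1", "species": "Lolium perenne", "marker": "rbcL", "seq": "ACGTACGT"},
    {"id": "ref2", "species": "Poa annua", "marker": "ITS2", "seq": "TTGACCGA"},
    {"id": "ref3", "species": "Zea mays", "marker": "rbcL", "seq": "GGCATTAC"},
]

# Hypothetical checklist of locally occurring species (case and spacing may vary).
local_checklist = ["Lolium perenne", "poa annua "]

curated = restrict_reference(reference, local_checklist)
print([rec["id"] for rec in curated])
```

In practice the checklist would come from a regional flora, and species names would need to be reconciled for synonyms before matching, which this sketch does not attempt.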

It is important to use positive controls with known concentrations of different pollen species in any DNA metabarcoding study. This is because the amount of DNA that can be extracted from different pollen types has been shown to vary. For example, it can be easier to extract DNA from pollen with a thinner exine and from plant species that are richer in chloroplast DNA than from those having a more ‘sturdy’ exine (Leontidou et al. 2018). Furthermore, testing the chosen primers in silico on the target plant species and making sure that reference sequences are available can help to improve the efficiency of the study.
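
As a rough illustration of what an in-silico primer test involves, the Python sketch below checks whether a primer pair would bind a reference sequence and returns the predicted amplicon. It is a simplification under stated assumptions: only exact matches are allowed (apart from IUPAC degenerate bases in the primers), only one strand is searched, and the primers and template are invented placeholders rather than published pollen barcoding primers.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, including degenerate bases."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Translate a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def in_silico_amplicon(template, forward, reverse):
    """Return the predicted amplicon (primers included), or None if no match."""
    pattern = (
        primer_regex(forward)
        + "[ACGT]*?"
        + primer_regex(reverse_complement(reverse))
    )
    match = re.search(pattern, template.upper())
    return match.group(0) if match else None


# Hypothetical template and primers, for illustration only.
template = "TTTACGTACGGGGCCCCTTGACAAAA"
print(in_silico_amplicon(template, "ACGYAC", "TGTCAA"))
```

Running such a check over all target species in the reference library gives a first impression of which taxa the primers may miss. Dedicated tools that tolerate mismatches would be needed for a realistic assessment.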

Quantitative pollen analysis

Beyond identifying which pollen species are present in a particular sample, pollen grain quantification is equally important. For example, for hay fever forecasts, it is not just important to know if there are certain allergenic pollen in the air, but also how many pollen grains there are at a given point in time. The gold standard for palynology has been to count a certain number of pollen grains under the microscope (e.g., 200 to 500) to obtain a semi-quantitative measure of the pollen types in a sample. While DNA-based methods for pollen quantification are less developed than DNA-based methods for identification, DNA-based pollen quantification using metagenomics (reviewed in Chapter 12 Metagenomics) seems feasible, whereas the use of DNA metabarcoding reads for this purpose is still strongly debated.
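
One way to see why a count of 200 to 500 grains yields only a semi-quantitative measure is to look at the sampling uncertainty of a counted proportion. The Python sketch below computes a Wilson score interval, under the assumption that counted grains behave like a simple random (binomial) sample of the pollen in the slide. This assumption, the function name wilson_interval, and the example counts are illustrative and not taken from the studies cited here.

```python
import math


def wilson_interval(taxon_count, total_count, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    p = taxon_count / total_count
    denom = 1 + z**2 / total_count
    centre = (p + z**2 / (2 * total_count)) / denom
    half = z * math.sqrt(
        p * (1 - p) / total_count + z**2 / (4 * total_count**2)
    ) / denom
    return centre - half, centre + half


# Hypothetical counts: a pollen type making up 10% of the grains counted.
for taxon_count, total_count in ((20, 200), (50, 500)):
    low, high = wilson_interval(taxon_count, total_count)
    print(f"{total_count} grains counted: interval {low:.3f} to {high:.3f}")
```

The interval narrows as more grains are counted, which is the trade-off between counting effort and precision that palynologists face.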

DNA metabarcoding reads

In a recent study on the use of DNA to quantify pollen grains, Bell and colleagues found a very weak correlation between pollen counts recorded by palynologists and the proportion of metabarcoding reads (Bell et al. 2019). They constructed different mixtures of known pollen species, and then amplified the marker regions rbcL and nrITS2. The authors showed that the strength of this correlation depended not only on the species studied, but also on which other species were present in the mock mixture. They identified four metabarcoding-related factors that influenced this quantitative bias: copy number, preservation, DNA isolation technique, and amplification bias. Indeed, many other studies that explore quantification using metabarcoding reads identify these factors as major problems, and in other fields of science DNA metabarcoding reads are therefore mostly used only as relative read abundances (Deagle et al. 2019; Lamb et al. 2019; Pawluczyk et al. 2015).
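
The comparison underlying such studies can be sketched in a few lines of Python: convert both the grain counts and the read counts of a mock mixture to proportions, then compute their correlation. The sketch uses a plain Pearson correlation, which is an assumption here and not necessarily the statistic used by Bell et al. (2019); the taxon labels and counts are invented.

```python
import math


def proportions(counts):
    """Convert raw counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts sum to zero")
    return {taxon: n / total for taxon, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def read_count_agreement(grain_counts, read_counts):
    """Correlate grain proportions with relative read abundances."""
    grains = proportions(grain_counts)
    reads = proportions(read_counts)
    taxa = sorted(grains)
    return pearson(
        [grains[t] for t in taxa],
        [reads.get(t, 0.0) for t in taxa],  # taxa without reads count as zero
    )


# Hypothetical mock mixture: grains counted versus reads obtained per taxon.
grain_counts = {"taxon_a": 100, "taxon_b": 300, "taxon_c": 400, "taxon_d": 200}
read_counts = {"taxon_a": 5200, "taxon_b": 900, "taxon_c": 2500, "taxon_d": 1400}
print(round(read_count_agreement(grain_counts, read_counts), 2))
```

A coefficient close to 1 would indicate that read proportions track grain proportions; the four bias factors listed above are reasons why this is often not the case.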

Other researchers, however, have found more promising results in using DNA metabarcoding to quantify pollen grains. Baksay et al. (2020), for example, studied the influence of several factors on quantifying species abundance using mock pollen mixtures of two commonly found bee-collected pollen species. First, the marker regions nrITS1 and trnL were chosen and the amplification results were compared to the number of pollen grains counted using flow cytometry. They found the best results using trnL and 30 PCR cycles, or using nrITS1 with a high-fidelity PCR polymerase to circumvent the high GC content of the nrITS region. It is important to note that while trnL overall gave the best results for quantification, species-level resolution was only possible with the nrITS1 marker region. Similarly promising results were obtained by Richardson et al. (2019), who used a multi-locus approach to quantify bee-collected pollen. The amplification results for trnL and rbcL matched well with the microscopy results, while nrITS2 showed a weak correlation. The authors therefore recommended using the median or mean abundance from several loci to improve the quantification accuracy. Bänsch et al. (2020), in contrast, found a high correlation between read count and microscopy count using the nrITS2 region on pollen collected by honey bees and bumblebees. The authors suggested that the correlation depends on the specific pollen species studied.
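
The recommendation to summarise abundance across several loci can be sketched as follows. This is a minimal Python illustration, not the procedure of Richardson et al. (2019): the function name combine_loci, the rescaling of the medians so that they sum to 1, and the example proportions are assumptions.

```python
from statistics import median


def combine_loci(per_locus):
    """Median proportion per taxon across loci, rescaled to sum to 1."""
    taxa = sorted({taxon for props in per_locus.values() for taxon in props})
    medians = {
        # A taxon that is missing at a locus is treated as a proportion of zero.
        taxon: median(props.get(taxon, 0.0) for props in per_locus.values())
        for taxon in taxa
    }
    total = sum(medians.values())
    if total == 0:
        raise ValueError("all median proportions are zero")
    return {taxon: value / total for taxon, value in medians.items()}


# Hypothetical relative read abundances per locus for a two-taxon sample.
per_locus = {
    "trnL": {"Trifolium": 0.6, "Brassica": 0.4},
    "rbcL": {"Trifolium": 0.5, "Brassica": 0.5},
    "ITS2": {"Trifolium": 0.9, "Brassica": 0.1},
}
print(combine_loci(per_locus))
```

Using the median limits the influence of a single locus that strongly over- or under-amplifies a taxon, as the hypothetical ITS2 values do here.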

Metagenomic approaches

Since using DNA metabarcoding approaches for pollen abundance may not give quantitative results with complex, multi-species samples, other molecular methods such as genome skimming and shotgun sequencing are being used to circumvent some of the drawbacks. The major advantage of these two methods is that they do not include a PCR-step and therefore do not introduce amplification bias (see Chapter 12 Metagenomics). Genome skimming has already been used to show that quantification is feasible, even for pollen from species that are very rare in mock mixtures (Lang et al. 2019). Because full genomes are only available for less than 1% of all plant species, Peel et al. (2019) developed a method where only partial genome skims are used (0.5X coverage). They found a high correlation between their partial genome skimming results and the expected relative abundance for each pollen type in the mixture. Moreover, the authors indicate that while genome skimming a single pollen sample is still relatively expensive (€70), the advancements made in sequencer technology will help to reduce this price significantly in the near future.


  1. What are the main advantages of molecular pollen identification over traditional (microscopic) methods? Justify your answer.
  2. Pollen is dispersed by various vectors. Name the two main types of pollination strategies in land plants and explain the importance of the difference between the two in terms of DNA yield.
  3. Which four factors make the quantification of pollen grains using metabarcoding problematic?


Anemophilous – Wind-pollinated.

Bead beating – The application of beads to break open the outer cell wall of pollen grains.

Hirst-type pollen trap – Volumetric air sampler that is one of the standard devices for monitoring airborne pollen and spores.

cpDNA – Chloroplast DNA.

Entomophilous – Insect-pollinated.

Exine – Outer wall of pollen grains. Composed mainly of sporopollenin that is extremely resistant to degradation. The exine of pollen grains has to be broken to release the DNA from the organic material within the grains.

Palynology – The science that studies both living and fossil spores, pollen grains, and other microscopic structures (e.g., chironomids, dinocysts, acritarchs, chitinozoans, scolecodonts).

Pollen grains – The male gametophyte of seed plants; source and carrier for the male gametes (spermatozoids or sperm cells).

Pollenkitt – The outermost hydrophobic lipid layer mostly present on entomophilous pollen grains.

Sporopollenin – A chemically inert biological polymer that is a component of the outer wall (see Exine) of a pollen grain.

Super-resolution microscopy – Technique in optical microscopy that allows visualisation of images with resolutions up to 140 nm, much higher than those imposed by the diffraction limit. This technique also allows visualisation of internal structures.


  • Alan Ş, Sarışahin T, Şahin AA, Kaplan A, Erdoğan İ, Pınar NM (2019) A new method to quantify atmospheric Poaceae pollen DNA based on the trnT-F cpDNA region. Turk. J. Bioch. 44, 248–253. doi: 10.1515/tjb-2018-0020
  • Baksay S, Pornon A, Burrus M, Mariette J, Andalo C, Escaravage N (2020) Experimental quantification of pollen with DNA metabarcoding using ITS1 and trnL. Sci. Rep. 10, 4202. doi: 10.1038/s41598-020-61198-6
  • Banchi E, Pallavicini A, Muggia L (2020) Relevance of plant and fungal DNA metabarcoding in aerobiology. Aerobiologia (Bologna) 36, 9–23. doi: 10.1007/s10453-019-09574-2
  • Bänsch S, Tscharntke T, Wünschiers R, Netter L, Brenig B, Gabriel D, Westphal C (2020) Using ITS2 metabarcoding and microscopy to analyse shifts in pollen diets of honey bees and bumble bees along a mass-flowering crop gradient. Mol. Ecol. 29, 5003–5018. doi: 10.1111/mec.15675
  • Bell KL, Burgess KS, Botsch JC, Dobbs EK, Read TD, Brosi BJ (2019) Quantitative and qualitative assessment of pollen DNA metabarcoding using constructed species mixtures. Mol. Ecol. 28, 431–455. doi: 10.1111/mec.14840
  • Bell KL, Burgess KS, Okamoto KC, Aranda R, Brosi BJ (2016a) Review and future prospects for DNA barcoding methods in forensic palynology. Forensic Sci. Int. Genet. 21, 110–116. doi: 10.1016/j.fsigen.2015.12.010
  • Bell KL, de Vere N, Keller A, Richardson RT, Gous A, Burgess KS, Brosi BJ (2016b) Pollen DNA barcoding: current applications and future prospects. Genome 59, 629–640. doi: 10.1139/gen-2015-0200
  • Bell KL, Fowler J, Burgess KS, Dobbs EK, Gruenewald D, Lawley B, Morozumi C, Brosi BJ (2017) Applying pollen DNA metabarcoding to the study of plant-pollinator interactions. Appl. Plant Sci. 5, 1600124. doi: 10.3732/apps.1600124
  • Beug H-J (2004) Leitfaden der Pollenbestimmung für Mitteleuropa und angrenzende Gebiete. Friedrich Pfeil, München.
  • Brennan GL, Potter C, de Vere N, Griffith GW, Skjøth CA, Osborne NJ, Wheeler BW, McInnes RN, Clewlow Y, Barber A, Hanlon HM, Hegarty M, Jones L, Kurganskiy A, Rowney FM, Armitage C, Adams-Groom B, Ford CR, Petch GM, PollerGEN Consortium, Creer S (2019) Temperate airborne grass pollen defined by spatio-temporal shifts in community composition. Nat. Ecol. Evol. 3, 750–754. doi: 10.1038/s41559-019-0849-7
  • Brooks J, Shaw G (1968) Chemical structure of the exine of pollen walls and a new function for carotenoids in nature. Nature 219, 532–533. doi: 10.1038/219532a0
  • Campbell BC, Al Kouba J, Timbrell V, Noor MJ, Massel K, Gilding EK, Angel N, Kemish B, Hugenholtz P, Godwin ID, Davies JM (2020) Tracking seasonal changes in diversity of pollen allergen exposure: Targeted metabarcoding of a subtropical aerobiome. Sci. Total Environ. 747, 141189. doi: 10.1016/j.scitotenv.2020.141189
  • Deagle BE, Thomas AC, McInnes JC, Clarke LJ, Vesterinen EJ, Clare EL, Kartzinel TR, Eveson JP (2019) Counting with DNA in metabarcoding studies: how should we convert sequence reads to dietary data? Mol. Ecol. 28, 391–406. doi: 10.1111/mec.14734
  • Frenguelli G, Passalacqua G, Bonini S, Fiocchi A, Incorvaia C, Marcucci F, Tedeschini E, Canonica GW, Frati F (2010) Bridging allergologic and botanical knowledge in seasonal allergy: a role for phenology. Ann. Allergy Asthma Immunol. 105, 223–227. doi: 10.1016/j.anai.2010.06.016
  • Ghitarrini S, Pierboni E, Rondini C, Tedeschini E, Tovo GR, Frenguelli G, Albertini E (2018) New biomolecular tools for aerobiological monitoring: Identification of major allergenic Poaceae species through fast real-time PCR. Ecol. Evol. 8, 3996–4010. doi: 10.1002/ece3.3891
  • Gous A, Swanevelder DZH, Eardley CD, Willows-Munro S (2019) Plant-pollinator interactions over time: Pollen metabarcoding from bees in a historic collection. Evol. Appl. 12, 187–197. doi: 10.1111/eva.12707
  • Hawkins J, de Vere N, Griffith A, Ford CR, Allainguillaume J, Hegarty MJ, Baillie L, Adams-Groom B (2015) Using DNA metabarcoding to identify the floral composition of honey: A new tool for investigating honey bee foraging preferences. PLoS ONE 10, e0134735. doi: 10.1371/journal.pone.0134735
  • Hirst JM (1952) An automatic volumetric spore trap. Ann. Appl. Biol. 39, 257–265. doi: 10.1111/j.1744-7348.1952.tb00904.x
  • Hochuli PA, Feist-Burkhardt S (2013) Angiosperm-like pollen and Afropollis from the Middle Triassic (Anisian) of the Germanic Basin (Northern Switzerland). Front. Plant Sci. 4, 344. doi: 10.3389/fpls.2013.00344
  • Hollingsworth PM (2011) Refining the DNA barcode for land plants. Proc. Natl. Acad. Sci. USA 108, 19451–19452. doi: 10.1073/pnas.1116812108
  • Kraaijeveld K, de Weger LA, Ventayol García M, Buermans H, Frank J, Hiemstra PS, den Dunnen JT (2015) Efficient and sensitive identification and quantification of airborne pollen using next-generation DNA sequencing. Mol. Ecol. Resour. 15, 8–16. doi: 10.1111/1755-0998.12288
  • Lamb PD, Hunter E, Pinnegar JK, Creer S, Davies RG, Taylor MI (2019) How quantitative is metabarcoding: A meta-analytical approach. Mol. Ecol. 28, 420–430. doi: 10.1111/mec.14920
  • Lang D, Tang M, Hu J, Zhou X (2019) Genome-skimming provides accurate quantification for pollen mixtures. Mol. Ecol. Resour. 19, 1433–1446. doi: 10.1111/1755-0998.13061
  • Leidenfrost RM, Bänsch S, Prudnikow L, Brenig B, Westphal C, Wünschiers R (2020) Analyzing the dietary diary of bumble bee. Front. Plant Sci. 11, 287. doi: 10.3389/fpls.2020.00287
  • Leontidou K, Vernesi C, De Groeve J, Cristofolini F, Vokou D, Cristofori A (2018) DNA metabarcoding of airborne pollen: new protocols for improved taxonomic identification of environmental samples. Aerobiologia (Bologna) 34, 63–76. doi: 10.1007/s10453-017-9497-z
  • Levetin E (2004) Methods for aeroallergen sampling. Curr. Allergy Asthma Rep. 4, 376–383. doi: 10.1007/s11882-004-0088-z
  • Meiklejohn KA, Damaso N, Robertson JM (2019) Assessment of BOLD and GenBank - Their accuracy and reliability for the identification of biological materials. PLoS ONE 14, e0217084. doi: 10.1371/journal.pone.0217084
  • Mohanty RP, Buchheim MA, Levetin E (2017) Molecular approaches for the analysis of airborne pollen: A case study of Juniperus pollen. Ann. Allergy Asthma Immunol. 118, 204-211.e2. doi: 10.1016/j.anai.2016.11.015
  • Müller-Germann I, Pickersgill DA, Paulsen H, Alberternst B, Pöschl U, Fröhlich-Nowoisky J, Després VR (2017) Allergenic Asteraceae in air particulate matter: quantitative DNA analysis of mugwort and ragweed. Aerobiologia (Bologna) 33, 493–506. doi: 10.1007/s10453-017-9485-3
  • Pacini E, Hesse M (2005) Pollenkitt – its composition, forms and functions. Flora - Morphology, Distribution, Functional Ecology of Plants 200, 399–415. doi: 10.1016/j.flora.2005.02.006
  • Parducci L, Bennett KD, Ficetola GF, Alsos IG, Suyama Y, Wood JR, Pedersen MW (2017) Ancient plant DNA in lake sediments. New Phytol. 214, 924–942. doi: 10.1111/nph.14470
  • Pawluczyk M, Weiss J, Links MG, Egaña Aranguren M, Wilkinson MD, Egea-Cortines M (2015) Quantitative evaluation of bias in PCR amplification and next-generation sequencing derived from metabarcoding samples. Anal. Bioanal. Chem. 407, 1841–1848. doi: 10.1007/s00216-014-8435-y
  • Peel N, Dicks LV, Clark MD, Heavens D, Percival-Alwyn L, Cooper C, Davies RG, Leggett RM, Yu DW (2019) Semi-quantitative characterisation of mixed pollen samples using MinION sequencing and Reverse Metagenomics (RevMet). Methods Ecol. Evol. 10, 1690–1701. doi: 10.1111/2041-210X.13265
  • Polling M, Li C, Cao L, Verbeek F, de Weger LA, Belmonte J, De Linares C, Willemse J, de Boer H, Gravendeel B (2021) Neural networks for increased accuracy of allergenic pollen monitoring. Sci. Rep. 11, 11357. doi: 10.1038/s41598-021-90433-x
  • Pornon A, Andalo C, Burrus M, Escaravage N (2017) DNA metabarcoding data unveils invisible pollination networks. Sci. Rep. 7, 16828. doi: 10.1038/s41598-017-16785-5
  • Prosser SWJ, Hebert PDN (2017) Rapid identification of the botanical and entomological sources of honey using DNA metabarcoding. Food Chem. 214, 183–191. doi: 10.1016/j.foodchem.2016.07.077
  • Richardson RT, Curtis HR, Matcham EG, Lin C-H, Suresh S, Sponsler DB, Hearon LE, Johnson RM (2019) Quantitative multi-locus metabarcoding and waggle dance interpretation reveal honey bee spring foraging patterns in Midwest agroecosystems. Mol. Ecol. 28, 686–697. doi: 10.1111/mec.14975
  • Richardson RT, Lin C-H, Sponsler DB, Quijia JO, Goodell K, Johnson RM (2015) Application of ITS2 metabarcoding to determine the provenance of pollen collected by honey bees in an agroecosystem. Appl. Plant Sci. 3, 1400066. doi: 10.3732/apps.1400066
  • Romero IC, Kong S, Fowlkes CC, Jaramillo C, Urban MA, Oboh-Ikuenobe F, D’Apolito C, Punyasena SW (2020) Improving the taxonomy of fossil pollen using convolutional neural networks and superresolution microscopy. Proc. Natl. Acad. Sci. USA 117, 28496–28505. doi: 10.1073/pnas.2007324117
  • Sivaguru M, Urban MA, Fried G, Wesseln CJ, Mander L, Punyasena SW (2018) Comparative performance of airyscan and structured illumination superresolution microscopy in the study of the surface texture and 3D shape of pollen. Microsc. Res. Tech. 81, 101–114. doi: 10.1002/jemt.22732
  • Suchan T, Talavera G, Sáez L, Ronikier M, Vila R (2019) Pollen metabarcoding as a tool for tracking long-distance insect migrations. Mol. Ecol. Resour. 19, 149–162. doi: 10.1111/1755-0998.12948
  • Suyama Y, Kawamuro K, Kinoshita I, Yoshimura K, Tsumura Y, Takahara H (1996) DNA sequence from a fossil pollen of Abies spp. from Pleistocene peat. Genes Genet. Syst. 71, 145–149. doi: 10.1266/ggs.71.145
  • Swenson SJ, Gemeinholzer B (2021) Testing the effect of pollen exine rupture on metabarcoding with Illumina sequencing. PLoS ONE 16, e0245611. doi: 10.1371/journal.pone.0245611
  • Utzeri VJ, Schiavo G, Ribani A, Tinarelli S, Bertolini F, Bovo S, Fontanesi L (2018) Entomological signatures in honey: an environmental DNA metabarcoding approach can disclose information on plant-sucking insects in agricultural and forest landscapes. Sci. Rep. 8, 9996. doi: 10.1038/s41598-018-27933-w


  1. A higher taxonomic resolution can be achieved using molecular methods such as metabarcoding. Furthermore, microscopic pollen analysis requires highly trained experts who have to spend considerable time analysing a single sample; molecular techniques are therefore faster, especially with a large number of samples.
  2. Entomophilous (insect collected) and anemophilous (wind dispersed) pollen. The presence of pollenkitt on entomophilous pollen grains influences the amount of DNA that can be obtained per pollen grain.
  3. Copy number, DNA preservation, DNA isolation technique, and amplification bias.

Chapter 6 DNA from food and medicine

Why use DNA for the identification of food and medicine?

DNA-based methods for the molecular identification of plant products can help us to address food and medicine authenticity issues at each stage in the supply chain (Di Bernardo et al. 2007). Documentation and requirements for DNA-based detection methods for food authentication are defined in collaborative activities by the European Committee for Standardization (CEN) and the International Organization for Standardization (ISO). Rapid and accurate identification of plant products is crucial for the herbal drug industry (Mishra et al. 2016), where DNA-based authentication is recognised as a sensitive approach to identify edible and medicinal plant species and cultivars, and to detect their substitutes and adulterants in crude or processed products, independent of the life stage, tissue type, and physiological condition of their constituents (Howard et al. 2020; Lo and Shaw 2018a, 2019; Mishra et al. 2016; Pawar et al. 2017; Raclariu et al. 2018; Techen et al. 2014). Molecular methods were integrated in the Pharmacopoeia of China in 2020 (Pharmacopoeia Committee of P. R. China 2020) and validated by the British Pharmacopoeia Commission in 2018 (British Pharmacopoeia Commission 2018). In addition to species authentication, genetic identification of medicinal plants can assist in the field of pharmacophylogenomics and bioprospecting to discover new plant pharmaceutical resources (Hao et al. 2015) (Chapter 22 Healthcare).

DNA-based methods to identify plants in food and medicine

The majority of standardised DNA-based authentication methods for the inspection and regulation of food and plant-medicines use well-established PCR-based techniques for DNA amplification as these are sensitive, specific, and simple (Hirst et al. 2019). PCR-based authentication methods are standardised in the CEN international legislation mainly for the detection of genetically modified foods (GMOs) including soybean, hazelnut, almond, rapeseed, etc. (Grohmann and Seiler 2019).

DNA barcoding methods are also established for the identification of unique medicinal and edible plant species (Ichim 2019). High resolution melting (HRM) in combination with DNA barcoding (Bar-HRM) can be used to identify barcode differences in complex botanical matrices and to assess the quality of crude materials in the herbal supply chain (Mezzasalma et al. 2017) (see Chapter 13 Barcoding - High resolution melting).

High-throughput sequencing (HTS) methods such as amplicon metabarcoding are also powerful tools for the authentication of herbal end products, post-marketing control, pharmacovigilance, and the assessment of species composition in botanical medicines, such as in traditional Chinese medicines (TCMs) (Arulandhu et al. 2017; De Boer et al. 2015; Juul et al. 2015; Lo and Shaw 2018b; Omelchenko et al. 2019; Raclariu et al. 2018; Seethapathy et al. 2019). An example of the discriminatory power of DNA metabarcoding is revealed in a study in which 15 highly processed TCM ingredients could be identified as species and genera listed on CITES appendices I and II (Coghlan et al. 2012).

In addition to PCR-based techniques, the detection of single nucleotide polymorphisms (SNPs) is frequently used for the molecular identification and authentication of various food commodities using small DNA fragments (Di Bernardo et al. 2007; Lo and Shaw 2019). Several assay types are commercially available for SNP chips and similar technologies (Hirst et al. 2019). Metagenomics is also promising for the qualitative and quantitative analysis of processed food and medicine matrices (Raime et al. 2020).

DNA-based methods for molecular plant identification depend on well-curated nucleotide sequence repositories. In addition to GenBank (Benson et al. 2018) and the Barcode of Life database (BOLD) (Ratnasingham and Hebert 2007), the Medicinal Materials DNA Barcode Database (MMDBD) has been proposed as a sequence reference platform to identify medicinal plant, animal, and fungi species (Wong et al. 2018).

DNA isolation from food and medicines

Successful DNA extraction is the foundation for any further downstream analysis (Corrado 2016; Elsanhoty et al. 2011; Pinto et al. 2007; Turkec et al. 2015). Since food and medicine products can differ in molecular characteristics and structural form, the choice of a DNA isolation strategy must be sample specific. DNA extraction from most food products is based on DNA isolation techniques originally designed in the 1980s (Dellaporta et al. 1983; Lockley and Bardsley 2000), though these protocols are now typically adapted to include polymerases resistant to the inhibitors commonly found in a wide range of food and medicinal products (Omelchenko et al. 2019). Frequently used DNA extraction procedures are phenol-chloroform, detergent, and protease-based extraction methods and solid-phase extraction methods (see Chapter 1 DNA from plant tissue).

Factors affecting the efficacy of DNA extraction

Four main factors affect the efficacy of DNA isolation from food and medicine samples: the sample source and processing, collection and storage, homogenisation, and the presence of contaminants. Firstly, it is generally easier to extract high-quality DNA from fresh samples (Peterson et al. 1997), since processing techniques often involve factors (e.g., high temperatures and changes in pH) that reduce the quality of DNA (Gryson 2010; Gryson et al. 2004). Secondly, samples need to be stored at low temperatures to reduce nuclease activity. Chemical inhibitors can also be used during collection and storage to lower the risk of hydrolysis and block nuclease activity. Thirdly, sample homogenisation is necessary to ensure that the purified DNA samples are representative of the complete original sample, as well as to reduce DNA interactions with high molecular weight compounds such as polysaccharides (Wood 2002). Mechanical grinding with a mortar and pestle, disruption via agitation in the presence of ceramic and metal beads, and mechanical shearing with the help of grinding mills can be used. Alternatively, hydrolysing enzymes or grinding in the presence of liquid nitrogen can disrupt problematic plant material, such as samples with a high content of hardened cell walls. Fourthly, optimisation of DNA extraction protocols is often necessary to reduce contaminating constituents, like plant secondary metabolites, proteins, etc. (see Table 1) (Wilkes 2019). In particular, spices and teas are rich in secondary metabolites and often contain bark, roots, hard seeds, etc. (Omelchenko et al. 2019). The cetyltrimethylammonium bromide (CTAB) isolation method (Murray and Thompson 1980) is mostly used for unknown multi-herbal samples or samples with high quantities of polysaccharides (Arulandhu et al. 2017). This method usually includes a serine protease such as proteinase K in the extraction buffer to remove proteins. The enzymatic activity of proteinase K is accelerated by incubating the sample with the extraction buffer at 56 °C. Additionally, the initial lysis can be prolonged for optimal results.

Table 1.

Removal of frequent contaminants that can reduce the yield of extracted DNA from edible and medicinal plants.

What compounds define the chemical composition of your samples?
  • Proteins and RNA
  • Polysaccharides (starch, sugars)
  • Polyphenolics (plant secondary metabolites like tannins, flavonoids, terpenoids, etc.)

Understand the specific properties of your samples for DNA extraction
  • Proteins and RNA: Can co-purify with DNA, depending on the age of the samples and how they were conserved.
  • Polysaccharides: Can co-precipitate with DNA. Results in a sticky, viscous consistency of the DNA pellet after centrifugation. Inhibition of enzymes used for molecular techniques (restriction endonucleases, polymerases, and ligases) (Pandey et al. 1996). Adherence to wells in agarose gel, resulting in long smears of bands detected in the gel (Sharma et al. 2002).
  • Polyphenolics: When bound to DNA, very hard to remove in extraction. Results in contaminated pellets not usable for many downstream analyses (John 1992; Peterson et al. 1997).

Consider applying mitigation strategies to overcome difficulties in extracting DNA from your samples
  • Proteins and RNA: RNA is removable with DNase-free RNase A or ethanol precipitation using lithium chloride. Proteins can be removed by i) inclusion of detergents (cetyltrimethylammonium bromide (CTAB), SDS) in the extraction buffer; ii) protein denaturants, e.g., β-mercaptoethanol (BME), dithiothreitol (DTT); iii) enzymatic proteases, e.g., proteinase K.
  • Polysaccharides: Removal via highly concentrated sodium chloride (NaCl) in extraction buffers, leading to increased solubility in ethanol. Combination of NaCl and the cationic detergent CTAB. CTAB with differential precipitation (Murray and Thompson 1980).
  • Polyphenolics: Binder compounds polyvinyl pyrrolidone (PVP) or polyvinylpolypyrrolidone (PVPP) can be used in extraction buffers to absorb polyphenols before polymerisation with DNA. Use of antioxidant compounds (BME, DTT, ascorbic acid, iso-ascorbate) in the buffer to prevent polymerisation (Pich and Schubert 1993; Puchooa 2004).

Although CTAB-based methods usually succeed in extracting DNA from plants and processed food and medicine products, the yield is often quite low and the protocols are time consuming (Costa et al. 2015; Grazina et al. 2020). Many commercial DNA extraction kits are based on solid-phase DNA purification (Boom et al. 1990) and have been widely adopted for DNA isolation from specific matrices of various organic materials. Optimisation and specification of such extraction protocols can be achieved in-house by modifying wash buffer composition, extraction reagents, etc.

Commercial vs. in-house DNA isolation techniques

Several studies have compared commercial and in-house DNA isolation techniques for food and medicine (Costa et al. 2015; Di Bernardo et al. 2007; Omelchenko et al. 2019; Pafundo et al. 2011; Pinto et al. 2007; Smith et al. 2005). These studies indicate that the best method for DNA isolation is highly sample dependent. For example, silica membrane spin column based kits and sorbent-based kits produced higher DNA yields for teas and spices, whereas the CTAB method based on liquid-phase segregation was superior for more processed herbal remedies (Omelchenko et al. 2019). Extremely processed foods or medicinal extracts could hardly be analysed as a result of total DNA degradation (Grazina et al. 2020; Llongueras et al. 2013; Parveen et al. 2016). In one comparative study, eight different DNA extraction kits were tested for 13 medicinal plant products. NucleoSpin plant methods overall yielded the best purity and amplification results for DNA extraction from degraded samples, while DNeasy kits resulted in the highest yields of extracted DNA from botanicals (Llongueras et al. 2013). This suggests that the performance of commercial kits is likewise highly sample dependent, and that there is no universal protocol to extract DNA from herbal products (Grazina et al. 2020; Llongueras et al. 2013). Nevertheless, the European Union Reference Laboratory for GM Food and Feed (EU-RL GMFF) recommends the use of certain extraction methods (Table 2), which can be a helpful guideline for choosing the right extraction protocol and adapting it to a specific sample source.

Table 2.

Overview of different DNA extraction methods recommended for use with food by the European Union Reference Laboratory for GM Food and Feed (EU-RL GMFF).

Plant source | Method of choice | Reference
Maize; maize seeds and grains | CTAB precipitate (in-house) (Rogers and Bendich 1985). For isolation of genomic DNA from a wide variety of maize tissues and derived matrices for high-quality genomic DNA from processed plant tissue (e.g., leaf, grain, or seed). Lysis step (thermal lysis in the presence of Tris HCl, EDTA, CTAB, and β-mercaptoethanol). Tissues processed prior to extraction procedure. Possible methods of processing include a mortar and pestle with liquid nitrogen (leaf) or commercial blender (grain or seed). | CRLVL16/05XP corrected version 2, 01/03/2018
Soybean; soybean seeds | CTAB precipitate (in-house) (Dellaporta et al. 1983). “Dellaporta-derived” method starts with a lysis step (thermal lysis in the presence of Tris HCl, EDTA, NaCl, and β-mercaptoethanol). Isopropanol precipitation and removal of contaminants such as lipophilic molecules and proteins by extraction with phenol:chloroform:isoamyl alcohol. | CRLVL13/05XP, 14/05/2007
Potato; freeze-dried potato tubers | “CTAB/Microspin” method. Lysis step (thermal lysis in the presence of CTAB, EDTA, and proteinase K). Removal of RNA by digestion with RNase A and removal of contaminants such as lipophilic molecules and proteins by extraction with chloroform. Remaining inhibitors are removed by a gel filtration step using the commercially available product S-300 HR Microspin Columns (Amersham Pharmacia). | CRLVL09/05XP Corrected Version 1, 20/01/2009
Rapeseed | CTAB precipitate (in-house) (Dellaporta et al. 1983). Lysis step (thermal lysis in the presence of Tris HCl, EDTA, SDS, and β-mercaptoethanol). Removal of contaminants such as lipophilic molecules and proteins by extraction with phenol and chloroform. DNA precipitate is generated by using isopropanol. The pellet is dissolved in TE buffer. | CRLVL14/04XP Corrected Version 1, 15/01/2007
Rapeseed | Inhibitors are removed by an anion exchange chromatography step using the DNA Clean & Concentrator 25 kit (Zymo Research). | CRLVL14/04XP Corrected Version 1, 15/01/2007
Multi-herbal products | CTAB precipitate (in-house) (Murray and Thompson 1980). Technique is ideal for the rapid isolation of small amounts of DNA from many different species and is also useful for large scale isolations. Lysis step (thermal lysis in the presence of Tris HCl, EDTA, CTAB, and β-mercaptoethanol). Removal of contaminants such as lipophilic molecules and proteins by extraction with phenol and chloroform. Samples processed prior to extraction procedure (mortar and pestle, liquid nitrogen, or commercial blender). | Arulandhu et al. 2017

Analysing the quantity and purity of extracted DNA

After DNA extraction, both the concentration and the purity of the DNA should be measured before continuing with downstream analysis. Isolated DNA can be tested for quality using absorbance methods, agarose gel electrophoresis, and fluorescent DNA-intercalating dyes (Wilkes 2019). DNA concentration and purity can be estimated from optical density measurements: when the absorbance ratios at 260/280 nm and 260/230 nm are between 1.5 and 2.0, the isolated DNA can be used for amplification (Lo and Shaw 2018a; Matsuoka et al. 2001; Wilkes 2019). Additionally, for sequencing, it is recommended to include a positive control to avoid false negatives caused by the presence of PCR inhibitors (Hoorfar et al. 2004; Lo and Shaw 2018a).
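The absorbance check can be sketched in a few lines of Python. The function below estimates the concentration of double-stranded DNA from the reading at 260 nm and tests whether both purity ratios fall within the 1.5 to 2.0 window given in the text. The conversion factor of 50 ng/µL per absorbance unit is the standard value for double-stranded DNA at a 1 cm path length; the function name, the example readings, and the dilution factor are illustrative assumptions and not part of any cited protocol.

```python
from dataclasses import dataclass

# Standard conversion: an A260 of 1.0 corresponds to about 50 ng/uL of
# double-stranded DNA at a 1 cm path length.
DSDNA_NG_PER_UL_PER_A260 = 50.0


@dataclass
class DnaQuality:
    concentration_ng_per_ul: float
    ratio_260_280: float
    ratio_260_230: float
    suitable_for_pcr: bool


def assess_dna(a230, a260, a280, dilution_factor=1.0, low=1.5, high=2.0):
    """Estimate dsDNA concentration and check both purity ratios.

    The acceptance window (low, high) follows the 1.5-2.0 range quoted in
    the text; everything else here is a hypothetical helper.
    """
    if min(a230, a260, a280) <= 0:
        raise ValueError("absorbance readings must be positive")
    concentration = a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor
    ratio_280 = a260 / a280  # low values suggest protein carry-over
    ratio_230 = a260 / a230  # low values suggest salts, phenol, or polysaccharides
    suitable = low <= ratio_280 <= high and low <= ratio_230 <= high
    return DnaQuality(concentration, ratio_280, ratio_230, suitable)


# Hypothetical readings from a 1:10 diluted extract.
result = assess_dna(a230=0.25, a260=0.45, a280=0.25, dilution_factor=10)
print(result)
```

Absorbance alone cannot distinguish intact from degraded DNA, so such a check complements rather than replaces gel electrophoresis or fluorescent dye quantification.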

The reality of DNA-based identification

It is in the interest of both biodiversity conservation and public safety that DNA-based techniques are further developed to screen food and medicine sourced from the global market (Han et al. 2016; Ichim and de Boer 2020; Seethapathy et al. 2019). Standards for taxon-specific PCR techniques are described for some food plants, such as soybean, hazelnut, almond, and rapeseed, but these were established mostly for allergen and GMO testing (Grohmann and Seiler 2019). HTS techniques are the only control tests that can potentially identify all species in complex medicine and food products without prior knowledge of the expected adulterants. They have become increasingly popular and more affordable for commercial laboratories. Today, laboratories offer analyses for meat, plants (including spices and herbs), fish, and crustaceans (Hirst et al. 2019). However, market tests come with a higher detection limit and cannot guarantee a 100% confirmation of the presence or absence of an adulterant species (Hirst et al. 2019). Thus, the gap between expectations and reality for DNA testing of food and medicine lies in the availability of commercial tests, the limits of detection and quantification, the specificity, and the comparison with accurate reference materials and databases. In all these applications, however, the extraction of sufficient quantities of (reasonably) high-quality DNA is necessary. We can expect that, as interest in using DNA-based methods for the authentication of food and medicine grows, methods for extracting DNA from these often-challenging sources will also develop further.


Questions

  1. The quality of DNA from food and medicinal sources is a critical factor for DNA-based analyses. Which factors can influence the quality of nucleic acids extracted from foods and plant-based medicines?
  2. What is the first step when choosing a DNA isolation technique for your samples?
  3. What methods can be used for measuring DNA quality after isolation?


Glossary

Bioprospecting - The exploration of biodiversity for new resources of social and commercial value.

Pharmacophylogenomics - Plant pharmacophylogenomics is a field established by combining the fields of ethnopharmacology, plant systematics, phytochemistry, pharmacology, and bioinformatics. It is the application of phylogenomics to the study of pharmaceuticals.

Pharmacopoeia - From the obsolete typography pharmacopœia, literally, “drug-making”. In its modern technical sense, it is a book containing directions for the identification of compound medicines, and is published by the authority of a government or a medical or pharmaceutical society.

Pharmaphylogenetics - Field of research focusing on the correlation between phylogeny, chemical constituents, and pharmaceutical effects of medicinal plants.


References

  • Arulandhu AJ, Staats M, Hagelaar R, Voorhuijzen MM, Prins TW, Scholtens I, Costessi A, Duijsings D, Rechenmann F, Gaspar FB, Barreto Crespo MT, Holst-Jensen A, Birck M, Burns M, Haynes E, Hochegger R, Klingl A, Lundberg L, Natale C, Niekamp H, Kok E (2017) Development and validation of a multi-locus DNA metabarcoding method to identify endangered species in complex samples. Gigascience 6, 1–18. doi: 10.1093/gigascience/gix080
  • Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Ostell J, Pruitt KD, Sayers EW (2018) GenBank. Nucleic Acids Res. 46, D41–D47. doi: 10.1093/nar/gkx1094
  • Boom R, Sol CJ, Salimans MM, Jansen CL, Wertheim-van Dillen PM, van der Noordaa J (1990) Rapid and simple method for purification of nucleic acids. J. Clin. Microbiol. 28, 495–503. doi: 10.1128/jcm.28.3.495-503.1990
  • British Pharmacopoeia Commission (2018) Supplementary chapter SC VII D, in: DNA Barcoding as a Tool for Botanical Identification of Herbal Drugs. The Stationery Office, London, United Kingdom.
  • Coghlan ML, Haile J, Houston J, Murray DC, White NE, Moolhuijzen P, Bellgard MI, Bunce M (2012) Deep sequencing of plant and animal DNA contained within traditional Chinese medicines reveals legality issues and health safety concerns. PLoS Genet. 8, e1002657. doi: 10.1371/journal.pgen.1002657
  • Corrado G (2016) Advances in DNA typing in the agro-food supply chain. Trends Food Sci. Technol. 52, 80–89. doi: 10.1016/j.tifs.2016.04.003
  • Costa J, Amaral JS, Fernandes TJR, Batista A, Oliveira MBPP, Mafra I (2015) DNA extraction from plant food supplements: influence of different pharmaceutical excipients. Mol. Cell. Probes 29, 473–478. doi: 10.1016/j.mcp.2015.06.002
  • Dellaporta SL, Wood J, Hicks JB (1983) A plant DNA minipreparation: Version II. Plant Mol. Biol. Rep. 1, 19–21. doi: 10.1007/BF02712670
  • De Boer HJ, Cross HB, De Wilde WJJO, Duyfjes-de Wilde BEE, Gravendeel B (2015) Molecular phylogenetic analyses of Cucurbitaceae tribe Benincaseae urge for merging of Pilogyne with Zehneria. Phytotaxa 236, 173. doi: 10.11646/phytotaxa.236.2.6
  • Di Bernardo G, Del Gaudio S, Galderisi U, Cascino A, Cipollaro M (2007) Comparative evaluation of different DNA extraction procedures from food samples. Biotechnol. Prog. 23, 297–301. doi: 10.1021/bp060182m
  • Elsanhoty RM, Ramadan MF, Jany KD (2011) DNA extraction methods for detecting genetically modified foods: a comparative study. Food Chem. 126, 1883–1889. doi: 10.1016/j.foodchem.2010.12.013
  • Grazina L, Amaral JS, Mafra I (2020) Botanical origin authentication of dietary supplements by DNA-based approaches. Comp. Rev. Food Sci. Food Safety 19, 1080–1109. doi: 10.1111/1541-4337.12551
  • Grohmann L, Seiler C (2019) Chapter 21. Standardization of DNA-based Methods for Food Authenticity Testing, in: Burns, M., Foster, L., Walker, M. (Eds.), DNA Techniques to Verify Food Authenticity: Applications in Food Fraud, Food Chemistry, Function and Analysis. Royal Society of Chemistry, Cambridge, pp. 227–234. doi: 10.1039/9781788016025-00227
  • Gryson N, Messens K, Dewettinck K (2004) Evaluation and optimisation of five different extraction methods for soy DNA in chocolate and biscuits. Extraction of DNA as a first step in GMO analysis. J. Sci. Food Agric. 84, 1357–1363. doi: 10.1002/jsfa.1767
  • Gryson N (2010) Effect of food processing on plant DNA degradation and PCR-based GMO analysis: a review. Anal. Bioanal. Chem. 396, 2003–2022. doi: 10.1007/s00216-009-3343-2
  • Han J, Pang X, Liao B, Yao H, Song J, Chen S (2016) An authenticity survey of herbal medicines from markets in China using DNA barcoding. Sci. Rep. 6, 18723. doi: 10.1038/srep18723
  • Hao D, Xiao P, Liu L, Peng Y, He C (2015) [Essentials of pharmacophylogeny: knowledge pedigree, epistemology and paradigm shift]. Zhongguo Zhong Yao Za Zhi 40, 3335–3342.
  • Hirst B, Fernandez-Calvino L, Weiss T (2019) Chapter 24. Commercial DNA testing, in: Burns, M., Foster, L., Walker, M. (Eds.), DNA Techniques to Verify Food Authenticity: Applications in Food Fraud, Food Chemistry, Function and Analysis. Royal Society of Chemistry, Cambridge, pp. 264–282. doi: 10.1039/9781788016025-00264
  • Hoorfar J, Cook N, Malorny B, Wagner M, De Medici D, Abdulmawjood A, Fach P (2004) Letter to the editor. Lett. Appl. Microbiol. 38, 79–80. doi: 10.1046/j.1472-765X.2003.01456.x
  • Howard C, Lockie-Williams C, Slater A (2020) Applied barcoding: the practicalities of DNA testing for herbals. Plants 9. doi: 10.3390/plants9091150
  • Ichim MC, de Boer HJ (2020) A review of authenticity and authentication of commercial ginseng herbal medicines and food supplements. Front. Pharmacol. 11, 612071. doi: 10.3389/fphar.2020.612071
  • Ichim MC (2019) The DNA-based authentication of commercial herbal products reveals their globally widespread adulteration. Front. Pharmacol. 10, 1227. doi: 10.3389/fphar.2019.01227
  • John ME (1992) An efficient method for isolation of RNA and DNA from plants containing polyphenolics. Nucleic Acids Res. 20, 2381. doi: 10.1093/nar/20.9.2381
  • Juul S, Izquierdo F, Hurst A, Dai X, Wright A, Kulesha E, Pettett R, Turner DJ (2015) What’s in my pot? Real-time species identification on the MinION. BioRxiv. doi: 10.1101/030742
  • Llongueras JP, Nair S, Salas-Leiva D, Schwarzbach AE (2013) Comparing DNA extraction methods for analysis of botanical materials found in anti-diabetic supplements. Mol. Biotechnol. 53, 249–256. doi: 10.1007/s12033-012-9520-0
  • Lockley AK, Bardsley RG (2000) DNA-based methods for food authentication. Trends Food Sci. Technol. 11, 67–77. doi: 10.1016/S0924-2244(00)00049-2
  • Lo Y-T, Shaw P-C (2018a) DNA-based techniques for authentication of processed food and food supplements. Food Chem. 240, 767–774. doi: 10.1016/j.foodchem.2017.08.022
  • Lo Y-T, Shaw P-C (2018b) DNA barcoding in concentrated Chinese medicine granules using adaptor ligation-mediated polymerase chain reaction. J. Pharm. Biomed. Anal. 149, 512–516. doi: 10.1016/j.jpba.2017.11.048
  • Lo Y-T, Shaw P-C (2019) Application of next-generation sequencing for the identification of herbal products. Biotechnol. Adv. 37, 107450. doi: 10.1016/j.biotechadv.2019.107450
  • Matsuoka T, Kuribara H, Akiyama H, Miura H, Goda Y, Kusakabe Y, Isshiki K, Toyoda M, Hino A (2001) A multiplex PCR method of detecting recombinant DNAs from five lines of genetically modified maize. Shokuhin Eiseigaku Zasshi 42, 24–32. doi: 10.3358/shokueishi.42.24
  • Mezzasalma V, Ganopoulos I, Galimberti A, Cornara L, Ferri E, Labra M (2017) Poisonous or non-poisonous plants? DNA-based tools and applications for accurate identification. Int. J. Legal Med. 131, 1–19. doi: 10.1007/s00414-016-1460-y
  • Mishra P, Kumar A, Nagireddy A, Mani DN, Shukla AK, Tiwari R, Sundaresan V (2016) DNA barcoding: an efficient tool to overcome authentication challenges in the herbal market. Plant Biotechnol. J. 14, 8–21. doi: 10.1111/pbi.12419
  • Murray MG, Thompson WF (1980) Rapid isolation of high molecular weight plant DNA. Nucleic Acids Res. 8, 4321–4325. doi: 10.1093/nar/8.19.4321
  • Omelchenko DO, Speranskaya AS, Ayginin AA, Khafizov K, Krinitsina AA, Fedotova AV, Pozdyshev DV, Shtratnikova VY, Kupriyanova EV, Shipulin GA, Logacheva MD (2019) Improved Protocols of ITS1-Based Metabarcoding and Their Application in the Analysis of Plant-Containing Products. Genes (Basel) 10. doi: 10.3390/genes10020122
  • Pafundo S, Gullì M, Marmiroli N (2011) Comparison of DNA extraction methods and development of duplex PCR and real-time PCR to detect tomato, carrot, and celery in food. J. Agric. Food Chem. 59, 10414–10424. doi: 10.1021/jf202382s
  • Pandey RN, Adams RP, Flournoy LE (1996) Inhibition of random amplified polymorphic DNAs (RAPDs) by plant polysaccharides. Plant Mol. Biol. Rep. 14, 17–22. doi: 10.1007/BF02671898
  • Parveen I, Gafner S, Techen N, Murch SJ, Khan IA (2016) DNA barcoding for the identification of botanicals in herbal medicine and dietary supplements: strengths and limitations. Planta Med. 82, 1225–1235. doi: 10.1055/s-0042-111208
  • Pawar RS, Handy SM, Cheng R, Shyong N, Grundel E (2017) Assessment of the authenticity of herbal dietary supplements: comparison of chemical and DNA barcoding methods. Planta Med. 83, 921–936. doi: 10.1055/s-0043-107881
  • Peterson DG, Boehm KS, Stack SM (1997) Isolation of milligram quantities of nuclear DNA from tomato (Lycopersicon esculentum), a plant containing high levels of polyphenolic compounds. Plant Mol. Biol. Rep. 15, 148–153. doi: 10.1007/BF02812265
  • Pharmacopoeia Committee of P. R. China (2020) Pharmacopoeia of People’s Republic of China. China Medical Science and Technology Press, Beijing.
  • Pich U, Schubert I (1993) Midiprep method for isolation of DNA from plants with a high content of polyphenolics. Nucleic Acids Res. 21, 3328. doi: 10.1093/nar/21.14.3328
  • Pinto AD, Forte V, Guastadisegni MC, Martino C, Schena FP, Tantillo G (2007) A comparison of DNA extraction methods for food analysis. Food Control 18, 76–80. doi: 10.1016/j.foodcont.2005.08.011
  • Puchooa D (2004) A simple, rapid and efficient method for the extraction of genomic DNA from lychee (Litchi chinensis Sonn.). Afr. J. Biotechnol. 3, 253–255. doi: 10.5897/AJB2004.000-2046
  • Raclariu AC, Heinrich M, Ichim MC, de Boer H (2018) Benefits and limitations of DNA barcoding and metabarcoding in herbal product authentication. Phytochem. Anal. 29, 123–128. doi: 10.1002/pca.2732
  • Raime K, Krjutškov K, Remm M (2020) Method for the Identification of Plant DNA in Food Using Alignment-Free Analysis of Sequencing Reads: A Case Study on Lupin. Front. Plant Sci. 11, 646. doi: 10.3389/fpls.2020.00646
  • Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Mol. Ecol. Notes 7, 355–364. doi: 10.1111/j.1471-8286.2007.01678.x
  • Rogers SO, Bendich AJ (1985) Extraction of DNA from milligram amounts of fresh, herbarium and mummified plant tissues. Plant Mol. Biol. 5, 69–76. doi: 10.1007/BF00020088
  • Seethapathy GS, Raclariu-Manolica A-C, Anmarkrud JA, Wangensteen H, de Boer HJ (2019) DNA metabarcoding authentication of ayurvedic herbal products on the European market raises concerns of quality and fidelity. Front. Plant Sci. 10, 68. doi: 10.3389/fpls.2019.00068
  • Sharma AD, Gill PK, Singh P (2002) DNA isolation from dry and fresh samples of polysaccharide-rich plants. Plant Mol. Biol. Rep. 20, 415–415. doi: 10.1007/BF02772129
  • Smith DS, Maxwell PW, De Boer SH (2005) Comparison of several methods for the extraction of DNA from potatoes and potato-derived products. J. Agric. Food Chem. 53, 9848–9859. doi: 10.1021/jf051201v
  • Techen N, Parveen I, Pan Z, Khan IA (2014) DNA barcoding of medicinal plant material for identification. Curr. Opin. Biotechnol. 25, 103–110. doi: 10.1016/j.copbio.2013.09.010
  • Turkec A, Kazan H, Karacanli B, Lucas SJ (2015) DNA extraction techniques compared for accurate detection of genetically modified organisms (GMOs) in maize food and feed products. J. Food Sci. Technol. 52, 5164–5171. doi: 10.1007/s13197-014-1547-8
  • Wilkes T (2019) Chapter 3. DNA Extraction from Food Matrices, in: Burns, M., Foster, L., Walker, M. (Eds.), DNA Techniques to Verify Food Authenticity: Applications in Food Fraud, Food Chemistry, Function and Analysis. Royal Society of Chemistry, Cambridge, pp. 29–49. doi: 10.1039/9781788016025-00029
  • Wong T-H, But GW-C, Wu H-Y, Tsang SS-K, Lau DT-W, Shaw P-C (2018) Medicinal Materials DNA Barcode Database (MMDBD) version 1.5-one-stop solution for storage, BLAST, alignment and primer design. Database (Oxford) 2018. doi: 10.1093/database/bay112
  • Wood EJ (2002) Principles and techniques of practical biochemistry (5th Ed.): Wilson, K., Walker, J. (eds.). Biochem. Mol. Biol. Educ. 30, 214–215. doi: 10.1002/bmb.2002.494030030062


Answers

  1. Sample source and processing, collection and storage, homogenisation, and the presence of contaminants.
  2. One should first consider whether the sample is a complex mixture or a pure product, and its degree of processing (form and degree of homogeneity).
  3. Absorbance methods, agarose gel electrophoresis, and fluorescent DNA-intercalating dyes.

Chapter 7 DNA from faeces


What are faecal samples?

Do you know that faeces are windows to the natural world? Faeces, although not the most glamorous thing in the world, are worth their weight in gold when it comes to providing information about the host(s) they are derived from. Faeces, also commonly known as scat, poop, droppings, excreta, or stools, are the solid remains of ingested food that were not digested in the intestine. They are composed of water, protein, polysaccharides, fats, solids (e.g., fibres from plants), and bacteria (Rose et al. 2015). From mites to elephants, faeces provide researchers with useful information about the animal and its environment (Tovey et al. 1981; Webber et al. 2018). Although fresh samples are usually used, information can also be retrieved from coprolites (fossilised faecal remains), even when they are 237 million years old (Qvarnström et al. 2019; van Geel et al. 2011; Welker et al. 2014).

Types of information that can be retrieved from faeces

Different types of information can be obtained from faeces. Chemical analyses provide information on hormonal changes that can occur from stress (Barja et al. 2008; Turner and Mathews 2010). Home-range and behaviour can be studied using the location of the faeces (Penteriani and Delgado 2008; Stewart et al. 2001). The appearance and size of faeces can even provide sex and species identification. For example, male capercaillies have larger dropping diameters than females (Thiel et al. 2007), and wombats are the only mammal with cube-shaped droppings (Yang et al. 2019). Molecular methods can be used for sex and species identification, identifying intestinal parasites, microbiome studies, and host genetics (A’Hara et al. 2009; Medeiros et al. 2012; Oliveira et al. 2020; Palomares et al. 2012; Soares et al. 2020). Additionally, studying faeces can give insights into animals’ diet, providing information on the composition of plants ingested by herbivores and omnivores (Robeson et al. 2018; Valentini et al. 2009).

Non-molecular methods for analysing diet in faecal samples

Non-molecular methods have traditionally been used for the analysis of contents from faecal samples. An example is microhistology, where small amounts of faecal samples are mounted on a microscope slide, and digested remains of plant cuticle fragments are identified based on morphology (Baumgartner and Martin 1939). However, this method is extremely time-consuming and requires trained experts who can identify partial fragments of plants. Another disadvantage is that the abundance of easily digested plants is often underestimated using this technique (Shrestha and Wegge 2006). Near-infrared reflectance spectroscopy (NIRS) is another method used to determine the composition of plants in faecal samples (Norris et al. 1976), but this technique requires validation with reference samples of the diet and constant monitoring of equipment calibration (Dixon and Coates 2009). Stable isotope analysis and plant cuticular wax alkane measurements of faecal samples have also been carried out (Carnahan 2011; Mayes and Dove 2000). However, species-level resolution is not possible in stable isotope analysis (Mayes and Dove 2000), while alkane measurements require specialised equipment for extraction and detection that is often not available as a standard laboratory service (Garnick et al. 2018). Additionally, cuticular wax alkane measurements are not suitable for assessing complex compositions (Garnick et al. 2018). These challenges, coupled with advancements in high-throughput sequencing (HTS) techniques, have resulted in a shift towards molecular methods for analysing faecal samples.

Applications of extracted DNA from faeces

In plant molecular applications, a common use of faecal samples is in herbivore/omnivore diet studies. The goal of most plant-focused diet studies is to characterise the diet profile of the host, which can be used to answer research questions concerning, for example, resource competition and partitioning (Kartzinel et al. 2015; Lopes et al. 2015; Soininen et al. 2015), herbivore impact on local vegetation (Hibert et al. 2011), the monitoring of livestock diets (Lee et al. 2018; Pegard et al. 2009), temporal variability in diet compositions (Aziz et al. 2017), and dietary foraging plasticity (Kowalczyk et al. 2019; Quéméré et al. 2013). DNA extracted from faecal samples can also be used for other types of plant identification applications including palaeobotany (Chame 2003; Poinar et al. 1998) (see Chapter 21 Palaeobotany), faecal contamination of food in food safety (Jay-Russell 2013) (see Chapter 24 Food safety), environment and biodiversity assessments (Best 2008; Eycott et al. 2007; Green et al. 2018; Kartzinel et al. 2015) (see Chapter 25 Environment and biodiversity assessments), and forensics genetics (Norris and Bock 2000) (see Chapter 26 Forensics genetics, botany and palynology). Another potential application is the study of plant diseases, such as those caused by parasitic fungi, to assess the health of a particular ecosystem. Parasitic fungi that are found in plants can be ingested by herbivores/omnivores when the plants are eaten, and the derived fungal DNA is subsequently found in their faeces.

Advantages and limitations

The main advantage of using faecal samples for molecular plant identification as compared to other types of samples such as whole animals/insects (Staudacher et al. 2011) or gut contents (Junnila et al. 2010) is that it is non-invasive, and removes the need to capture or locate the animals for obtaining samples (Taberlet et al. 1999). In addition to being easily collected, faeces are constantly produced and therefore not considered rare (except for coprolites). Moreover, they are relatively easy to detect as they are normally the most persistent remnant from scarce or elusive animals (Hibert et al. 2013; Iwanowicz et al. 2016). Trained dogs can be used for the detection of faeces if required (Arandjelovic et al. 2015).

One limitation when using faecal samples for molecular plant identification is that it can be difficult to obtain fresh faecal samples collected immediately after defecation, especially when working with wild animals. Age of samples can have an impact on the amount and quality of DNA that can be extracted due to DNA degradation caused by exposure to environmental conditions (Taberlet et al. 1999). DNA degradation is particularly problematic when working with large DNA markers (>300 bp), as degradation results in short DNA fragments from which such large markers cannot be amplified (Frantzen et al. 1998; Taberlet et al. 1999). The availability of fresh faecal samples can also have an impact on the choice of downstream molecular techniques used for analysis (Chua et al. 2021). Another consideration is that if closely-related species have overlapping habitats, additional molecular work is needed to distinguish and identify the host of the droppings (A’Hara et al. 2009), which increases the budget and time required to process the samples. Finally, information obtained from faecal samples provides only a snapshot of the diet and can be influenced by individual preferences (Lopes et al. 2015), sex (Mata et al. 2016), or seasonal differences (Clare et al. 2014); therefore, more samples per individual or species are needed to obtain a full overview of the diet (Trites and Joy 2005) (Table 1).

Table 1.

Advantages and limitations of using DNA from faeces to reconstruct plant communities.

Advantages | Limitations
Non-invasive | Fresh samples may be challenging to obtain from wild animals
Easy to detect and collect | Presence of PCR inhibitors
Not considered rare | DNA degradation
Does not require capturing or locating animal of interest | Hard to distinguish morphologically with closely related species
 | Additional molecular work needed
 | Increased cost and time

Experimental design

Sampling strategies

Before designing any sampling strategies for the collection of faecal samples, there are at least six factors that researchers must take into consideration:

  1. The research question(s) and the required data to achieve the research objectives
  2. The ecology of the species to be studied
  3. The feasibility of sampling in the study area (is accessing the terrain a safety risk?)
  4. The duration and spatial extent of the project (long term or short term? Does it span across different seasons?)
  5. Budget constraints
  6. Ethical considerations

Based on the research question(s) and objectives (i.e., quantitative, presence/absence, composition), researchers must decide how many samples and replicates are needed from each individual and/or population to sufficiently meet their research objectives. The choice of downstream molecular methods used for reconstructing herbivore/omnivore diet will also have an impact on how many samples are required. In quantitative studies where the objective is to quantify the ingested biomass, the number of different individuals sampled is not as important as in composition studies, where more individuals are required to obtain a better overview of the dietary range of the studied species. This is due to the effect of individual food preference, which can lead to biases in retrieving the whole range of a dietary profile for a given species if only a few individuals are studied (Watanabe 1984). In studies determining the presence/absence of a specific dietary component, it may be prudent to sample in larger numbers from both different individuals and populations to prevent false negatives caused by small sample size. While not always possible due to the ecology of the studied species, collecting replicates is also a good practice to evaluate any variation in the study and this aspect should be incorporated into sampling strategies (Mata et al. 2019). Other sampling variables such as seasonal effects (Ait Baamrane et al. 2012), age of faecal samples (McInnes et al. 2017), and differences in diet between sexes (Du Toit 2006), should also be taken into consideration, as these factors can affect the reconstructed diet (Chua et al. 2021).
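The link between sample size and false negatives in presence/absence studies can be made concrete with a simple binomial model. The sketch below assumes, as a simplification that is not made in the text, that each faecal sample independently contains detectable DNA of the dietary item with the same probability p. Under that assumption, the chance of missing the item in all n samples is (1 − p)^n. The function names and example values are illustrative.

```python
import math


def false_negative_probability(p_detect: float, n_samples: int) -> float:
    """Probability that none of n independent samples contains the item."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return (1.0 - p_detect) ** n_samples


def samples_needed(p_detect: float, max_false_negative: float = 0.05) -> int:
    """Smallest n for which (1 - p)^n <= max_false_negative."""
    if not 0.0 < p_detect <= 1.0:
        raise ValueError("p_detect must be in (0, 1]")
    if not 0.0 < max_false_negative < 1.0:
        raise ValueError("max_false_negative must be in (0, 1)")
    if p_detect == 1.0:
        return 1
    # Closed-form estimate, then adjust for floating-point rounding.
    n = max(1, math.ceil(math.log(max_false_negative) / math.log(1.0 - p_detect)))
    while false_negative_probability(p_detect, n) > max_false_negative:
        n += 1
    while n > 1 and false_negative_probability(p_detect, n - 1) <= max_false_negative:
        n -= 1
    return n


# Hypothetical example: the item is detectable in 10% of samples and a
# 5% risk of missing it entirely is considered acceptable.
print(samples_needed(0.10, 0.05))
```

With these assumed values, 29 samples are needed. Real samples from the same individual or the same site are rarely independent, so a figure obtained this way is best read as a lower bound.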

Generally, the more ecological information gathered and incorporated into sampling strategies, the higher the chance of successful faecal collection. For wild species, prior ecological information regarding the species of interest is essential for designing sound sampling strategies, to optimise and streamline sample collection. Researchers can use the following questions as a guide in planning their sample collection strategy:

  • Is the target species localised to a certain area?
  • What is the extent of its daily range (does it differ between seasons)?
  • Is it a generalist or a specialist?
  • What is its foraging behaviour (does it differ between seasons)?
  • Is the habitat easily accessible for sample collection?
  • What is the density of the population in the study sites?
  • Does its habitat overlap with closely-related species and will this lead to possible collection of faeces from non-target species?

Without this information, it is challenging to narrow down specific study sites for field collection. Additionally, such information can reduce the necessary manpower, resources, and time spent in the field while increasing the probability of finding sufficient numbers of faecal samples. Knowledge of habitat range and population density can prevent excessive numbers of samples being collected from a single individual when the research question requires samples from multiple individuals. Differences in home-range and diet between seasons can also impact sample collection strategy (Rodríguez and Obeso 2000). For example, a higher population density may be present during breeding seasons as compared to non-breeding seasons, which can impact the sampling strategies. For wild species whose faecal samples may be hard to find, sampling can be aided by the use of pointing dogs (Arandjelovic et al. 2015). For captive species, this information is not as important, as the study area is significantly reduced and faecal samples can be easily collected. Other considerations for both wild and captive animal faecal collection strategies include issues such as disturbances to the studied animals and risk to personal safety from aggressive animals.

Sampling strategies are also heavily dependent on budget constraints, which may reduce the time spent on sample collection, the number of samples processed, and also the molecular techniques used in analysing the faecal samples. Therefore, it is prudent to ensure that the budget fits the research objectives or that the research objectives are tailored to fit the budget. While there are many different approaches to sampling, two commonly used approaches are systematic sampling and opportunistic sampling. In systematic sampling, the study area is divided into grids or transects, and samples are taken at each grid point or at fixed intervals (Osborne 1942). While simple to carry out, it is not always feasible for faecal collection as animals do not defecate along grid points or transect lines. In contrast, in opportunistic sampling, researchers simply collect faecal samples in a study area when they come across them, without being confined to grids or transects. While time-consuming, this method can result in the collection of more samples (De Barba et al. 2010). Depending on the ecology of the animal, sometimes more than one dropping is deposited at a single location. Researchers can choose to collect all droppings or only a few, depending on the research question.
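To illustrate the systematic approach, the sketch below generates sampling points at fixed intervals over a rectangular study area. The rectangular shape, the use of metres as units, and the function name are assumptions made for the example; real study areas are rarely rectangular, and grid points would normally be generated in GIS software.

```python
def grid_points(width_m: int, height_m: int, spacing_m: int):
    """Return (x, y) coordinates of grid points covering a rectangle.

    Points start at the origin (0, 0) and are spaced spacing_m apart
    along both axes. Integer inputs avoid floating-point surprises in
    floor division.
    """
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    if width_m < 0 or height_m < 0:
        raise ValueError("width_m and height_m must be non-negative")
    n_x = width_m // spacing_m + 1  # number of points along x
    n_y = height_m // spacing_m + 1  # number of points along y
    return [(i * spacing_m, j * spacing_m) for j in range(n_y) for i in range(n_x)]


# Hypothetical 1000 m x 500 m study area sampled every 250 m.
points = grid_points(1000, 500, 250)
print(len(points), points[:3])
```

Opportunistic sampling has no such fixed design; recording the GPS coordinates of each dropping found is the nearest equivalent.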

Finally, minimising distress to the studied animals is one of the main ethical concerns in animal studies, and legal restrictions apply, such as those implemented in EU Directive 2010/63/EU on the protection of animals used for scientific purposes (Zemanova 2019). As faecal samples are collected non-invasively and often without the presence of the studied animal, researchers are less bound by these restrictions, as collection does not harm the welfare of the animal. However, permits may be required for the collection of faecal samples from protected species, for entering protected habitats, and/or for transportation across borders. The possibility of receiving these permits should be checked at the beginning of any project, and the permits should be organised well in advance of the planned collection period. When all these factors have been considered, sampling strategies can then be developed to cover the essential questions such as where, when, and how many samples to collect.

Figure 1.

Chapter 7 Infographic: Visual representation of the content of this chapter.

Collection, transportation, and storage

Once the sampling strategy has been determined, sampling in the field can start. The first step is to locate the faecal samples; once they have been located, collection can begin. When collecting faecal samples, a few materials will be needed no matter what animal and habitat the faecal samples are derived from: sterile tubes filled with e.g. RNAlater™, silica beads, or 90% ethanol; gloves; and a device to collect the samples. Sterile tubes are necessary for sample storage. Tubes can have either removable screw-lids or hinged lids. Removable screw-lids have the advantage that the lids will not come off during transport. However, there is an increased risk of environmental contamination with these lids since they are separate from the tube and must be placed somewhere before collection. Tubes with hinged lids are easier to work with in that sense, though they can open during transport if not sealed (e.g., with Parafilm™). Proper use of gloves and a collection device is also important to limit the risk of a collector becoming sick from directly handling faeces, as well as to reduce the risk of sample contamination. The size and type of the sampling device can differ depending on the size of the faecal dropping and can range from a toothpick to a large spoon.

DNA-based diet analyses are very sensitive to contamination, and the trace amounts of digested plant material that can be extracted from faecal samples are easily contaminated. Contamination can occur between samples, by plant DNA from the surrounding environment, or even from the collector’s (plant-based) lunch (Lusk 2014). Therefore, it is important to practice good sample collection hygiene. Although it is important to wear gloves while collecting samples, the gloves themselves can be a source of contamination. Care must be taken not to contaminate the gloves by touching other plants or plant-based items in the environment, or by handling another faecal sample. If they are contaminated, they must be changed or sterilised with bleach solution (at least 5%). If the latter is performed, it is important to carry the bleach solution back to the lab site for proper disposal. Similar to the gloves, the devices for sample collection can also be a source of sample contamination. Thus, single-use disposable devices, or devices that are sterilised with a flame or a bleach solution, should be used and properly disposed of (Champlot et al. 2010; Kemp and Smith 2005). Finally, samples should be collected from surfaces with minimal contamination, such as rocks or ice (McInnes et al. 2017), and collection from wet soil should be avoided (Ando et al. 2018). To identify any potential sample contamination, it is necessary to include negative collection controls, which do not include any faecal sample. For these negative controls to be useful in downstream analyses, it is important to treat them identically to the ‘real’ samples. Thus, the negative controls should include the same storage buffers, and be collected under the same conditions with the same collection devices and storage tubes that were used for the ‘real’ samples (Deiner et al. 2017; Zinger et al. 2019).
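
To illustrate how negative collection controls can be used downstream, the following Python sketch flags every taxon that received reads in a control and removes it from the ‘real’ samples. This is a deliberately strict rule chosen for simplicity and is an assumption of the sketch, not a procedure prescribed by the cited studies; the sample names, taxa, and read counts are hypothetical.

```python
from typing import Dict, List, Set

# Sample name -> {taxon: read count}; a simplified stand-in for a read table.
ReadTable = Dict[str, Dict[str, int]]


def taxa_in_controls(reads: ReadTable, controls: List[str]) -> Set[str]:
    """Return every taxon with at least one read in any negative control."""
    flagged = set()
    for name in controls:
        for taxon, count in reads[name].items():
            if count > 0:
                flagged.add(taxon)
    return flagged


def drop_flagged(reads: ReadTable, controls: List[str]) -> ReadTable:
    """Remove control samples and all taxa that were detected in them."""
    flagged = taxa_in_controls(reads, controls)
    return {
        sample: {t: c for t, c in counts.items() if t not in flagged}
        for sample, counts in reads.items()
        if sample not in controls
    }


# Hypothetical example: Solanum appears in the field blank, so it is treated
# as a possible contaminant and removed from both faecal samples.
reads = {
    "scat_01": {"Vaccinium": 1200, "Solanum": 15},
    "scat_02": {"Betula": 800, "Solanum": 3},
    "field_blank": {"Solanum": 40},
}
cleaned = drop_flagged(reads, controls=["field_blank"])
print(cleaned)
```

A less conservative alternative would be to subtract the control read counts or to apply a threshold; which rule is appropriate depends on the study and should be decided before the data are analysed.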

To avoid DNA degradation, faecal samples should be preserved as soon as possible upon collection and stored under the same conditions (Nsubuga et al. 2004). This can be achieved in a variety of ways, including freezing the samples with or without storage in e.g. RNAlater, 90% ethanol, or silica with or without prior ethanol addition (Alberdi et al. 2019; Nsubuga et al. 2004; Roeder et al. 2004).

DNA extraction

To avoid contamination, extractions should be carried out in a room free of PCR-amplified DNA. Due to the risk of zoonotic disease transmission, extraction should ideally be carried out in a flow-hood to avoid inhaling dust from dry faeces (Lear et al. 2018). Before extracting DNA from faecal samples, pre-processing steps are required. This entails removal of the outer faecal layers, which have been in contact with the environment and have thus been exposed to environmental contaminants (Van Geel et al. 2014). The outer layers are also enriched for host epithelial DNA (Creamer et al. 1961), which reduces the proportion of plant DNA in the starting material, so it is prudent to remove them. Depending on the research question, pooling of faecal samples collected from the same or different individuals may be necessary. If this is required, samples should be well-mixed. Reduction of faecal sample volume through sub-sampling of faecal droppings may also be necessary for DNA extraction.

Faecal samples from plant-eating animals usually contain high levels of PCR inhibitors such as humic acid, which can lead to amplification failure during downstream analysis (Ramón-Laca et al. 2015). Minimising the carryover of PCR inhibitors is thus one of the key considerations in the extraction process, particularly when using metabarcoding (see Chapter 11 Amplicon metabarcoding). Several commercial kits have been developed to deal with the removal of inhibitors (Johnson et al. 2005), and commonly used kits for extracting plant DNA from faecal samples include the QIAGEN DNeasy Blood and Tissue Kit and the QIAGEN DNeasy PowerFecal kit. However, some kits such as the QIAGEN stool kit can contain plant contaminants such as potato, so it is recommended to avoid using such kits when identifying plant DNA extracted from faecal samples (Valentini et al. 2009). Similar to the sample collection controls, it is also important to include extraction controls for each extraction day, so that any possible contaminants can be identified (Zinger et al. 2019). Additionally, the use of extraction replicates (two or more DNA extractions from the same sample) allows for a better overview of the plant communities present within one sample as compared to not having any replicates (Hernandez-Rodriguez et al. 2018).
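
As a sketch of how extraction replicates might be combined, the Python example below counts in how many replicates each taxon was detected and, optionally, keeps only taxa supported by a minimum number of replicates. The threshold of two replicates, the function names, and the taxa are assumptions made for illustration; the source does not prescribe a particular rule for merging replicates.

```python
from collections import Counter
from typing import Dict, List, Set


def replicate_support(replicates: List[Set[str]]) -> Dict[str, int]:
    """Count the number of extraction replicates in which each taxon occurs."""
    counts: Counter = Counter()
    for taxa in replicates:
        counts.update(taxa)  # a set contributes each taxon exactly once
    return dict(counts)


def consensus(replicates: List[Set[str]], min_replicates: int = 2) -> Set[str]:
    """Return the taxa detected in at least `min_replicates` replicates."""
    support = replicate_support(replicates)
    return {taxon for taxon, n in support.items() if n >= min_replicates}


# Hypothetical detections from three extractions of the same faecal sample.
replicates = [
    {"Vaccinium", "Betula"},
    {"Vaccinium", "Salix"},
    {"Vaccinium", "Betula"},
]
print(replicate_support(replicates))
print(consensus(replicates))                    # taxa seen in >= 2 replicates
print(consensus(replicates, min_replicates=1))  # union of all replicates
```

Setting `min_replicates` to 1 gives the broadest overview of the plant community, whereas a higher value trades sensitivity for confidence in each detection.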

Molecular methods for faecal analysis

Depending on the research question(s), several different HTS methods can be used for analysing DNA extracted from faecal samples, including metabarcoding (Valentini et al. 2009) (see Chapter 11 Amplicon metabarcoding), metagenomics (Srivathsan et al. 2015, 2016) (see Chapter 12 Metagenomics), and target capture (Perry 2014) (see Chapter 14 Target capture). The advantages and limitations of each method can be found in the aforementioned chapters. A less commonly used, non-HTS molecular approach is based on PCR amplification with selected primers coupled to electrophoresis, called PCR capillary electrophoresis (PCR-CE) (Czernik et al. 2013; Pegard et al. 2009). One advantage of this approach compared to HTS methods is that it is faster and cheaper, and when complementary genes are targeted, high species resolution can be achieved. However, this approach is sensitive to contamination from extraction kits (Valentini et al. 2009), such as potato DNA, which has peak sizes similar to those of some plant species and thus makes accurate species identification challenging. It is also only useful when fresh faecal samples are available (Czernik et al. 2013), which is not always possible with fieldwork.


  1. Name one sampling limitation of working with faecal samples as compared to other types of samples (e.g., gut contents) and give suggestions on how to overcome this limitation.
  2. How does prior information of studied species ecology aid in sampling design?
  3. Contamination can occur during sample collection, sample preprocessing, and DNA extraction. Describe the main type of contamination during each phase and how it can be prevented.


Coprolites – Fossilised faeces.

Near-infrared spectroscopy (NIRS) – A non-destructive and fast technique utilising the near-infrared region of the electromagnetic spectrum.

RNAlater – Non-toxic aqueous reagent for storage purposes, preserving RNA and DNA.

Stable isotopes – Non-radioactive forms (isotopes) of elements.

Zoonotic disease – Infectious disease caused by pathogens jumping from non-human hosts to humans.


  • A’Hara SW, Hancock M, Piertney SB, Cottrell JE (2009) The development of a molecular assay to distinguish droppings of black grouse Tetrao tetrix from those of capercaillie Tetrao urogallus and red grouse Lagopus lagopus scoticus. Wildlife Biol. 15, 328–337. doi: 10.2981/08-046
  • Ait Baamrane MA, Shehzad W, Ouhammou A, Abbad A, Naimi M, Coissac E, Taberlet P, Znari M (2012) Assessment of the food habits of the Moroccan dorcas gazelle in M’Sabih Talaa, west central Morocco, using the trnL approach. PLoS ONE 7, e35643. doi: 10.1371/journal.pone.0035643
  • Alberdi A, Aizpurua O, Bohmann K, Gopalakrishnan S, Lynggaard C, Nielsen M, Gilbert MTP (2019) Promises and pitfalls of using high-throughput sequencing for diet analysis. Mol. Ecol. Resour. 19, 327–348. doi: 10.1111/1755-0998.12960
  • Ando H, Fujii C, Kawanabe M, Ao Y, Inoue T, Takenaka A (2018) Evaluation of plant contamination in metabarcoding diet analysis of a herbivore. Sci. Rep. 8, 15563. doi: 10.1038/s41598-018-32845-w
  • Arandjelovic M, Bergl RA, Ikfuingei R, Jameson C, Parker M, Vigilant L (2015) Detection dog efficacy for collecting faecal samples from the critically endangered Cross River gorilla (Gorilla gorilla diehli) for genetic censusing. R. Soc. Open Sci. 2, 140423. doi: 10.1098/rsos.140423
  • Aziz SA, Clements GR, Peng LY, Campos-Arceiz A, McConkey KR, Forget P-M, Gan HM (2017) Elucidating the diet of the island flying fox (Pteropus hypomelanus) in Peninsular Malaysia through Illumina Next-Generation Sequencing. PeerJ 5, e3176. doi: 10.7717/peerj.3176
  • Barja I, Silván G, Illera JC (2008) Relationships between sex and stress hormone levels in feces and marking behavior in a wild population of Iberian wolves (Canis lupus signatus). J. Chem. Ecol. 34, 697–701. doi: 10.1007/s10886-008-9460-0
  • Baumgartner LL, Martin AC (1939) Plant Histology as an Aid in Squirrel Food-Habit Studies. The Journal of Wildlife Management 3, 266. doi: 10.2307/3796113
  • Best RJ (2008) Exotic grasses and feces deposition by an exotic herbivore combine to reduce the relative abundance of native forbs. Oecologia 158, 319–327. doi: 10.1007/s00442-008-1137-4
  • Carnahan AM (2011) Determining the diets of moose (Alces alces) in Alaska using plant wax components (Undergraduate thesis). University of Alaska Anchorage.
  • Chame M (2003) Terrestrial mammal feces: a morphometric summary and description. Mem. Inst. Oswaldo Cruz 98 Suppl 1, 71–94. doi: 10.1590/s0074-02762003000900014
  • Champlot S, Berthelot C, Pruvost M, Bennett EA, Grange T, Geigl E-M (2010) An efficient multistrategy DNA decontamination procedure of PCR reagents for hypersensitive PCR applications. PLoS ONE 5. doi: 10.1371/journal.pone.0013042
  • Chua PYS, Crampton-Platt A, Lammers Y, Alsos IG, Boessenkool S, Bohmann K (2021) Metagenomics: a viable tool for reconstructing herbivore diet. Mol. Ecol. Resour. 21, 2249–2263. doi: 10.1111/1755-0998.13425
  • Chua PYS, Lammers YYS, Menoni E, Ekrem T, Bohmann K, Boessenkool S, Alsos IG (2021) Molecular dietary analyses of western capercaillies (Tetrao urogallus) reveal a diverse diet. Environ. DNA 3, 1156–1171. doi: 10.1002/edn3.237
  • Clare EL, Symondson WOC, Broders H, Fabianek F, Fraser EE, MacKenzie A, Boughen A, Hamilton R, Willis CKR, Martinez-Nuñez F, Menzies AK, Norquay KJO, Brigham M, Poissant J, Rintoul J, Barclay RMR, Reimer JP (2014) The diet of Myotis lucifugus across Canada: assessing foraging quality and diet variability. Mol. Ecol. 23, 3618–3632. doi: 10.1111/mec.12542
  • Creamer B, Shorter RG, Bamforth J (1961) The turnover and shedding of epithelial cells: Part I The turnover in the gastro-intestinal tract. Gut 2, 110–116. doi: 10.1136/gut.2.2.110
  • Czernik M, Taberlet P, Swisłocka M, Czajkowska M, Duda N, Ratkiewicz M (2013) Fast and efficient DNA-based method for winter diet analysis from stools of three cervids: moose, red deer, and roe deer. Acta Theriol. 58, 379–386. doi: 10.1007/s13364-013-0146-9
  • De Barba M, Waits LP, Genovesi P, Randi E, Chirichella R, Cetto E (2010) Comparing opportunistic and systematic sampling methods for non-invasive genetic monitoring of a small translocated brown bear population. Journal of Applied Ecology 47, 172–181. doi: 10.1111/j.1365-2664.2009.01752.x
  • Deiner K, Bik HM, Mächler E, Seymour M, Lacoursière-Roussel A, Altermatt F, Creer S, Bista I, Lodge DM, de Vere N, Pfrender ME, Bernatchez L (2017) Environmental DNA metabarcoding: Transforming how we survey animal and plant communities. Mol. Ecol. 26, 5872–5895. doi: 10.1111/mec.14350
  • Dixon R, Coates D (2009) Review: near infrared spectroscopy of faeces to evaluate the nutrition and physiology of herbivores. J. Near Infrared Spectrosc. 17, 1–31. doi: 10.1255/jnirs.822
  • Du Toit JT (2006) Sex differences in the foraging ecology of large mammalian herbivores, in: Ruckstuhl, K., Neuhaus, P. (Eds.), Sexual Segregation in Vertebrates: Ecology of the Two Sexes. Cambridge University Press, Cambridge, pp. 35–52. doi: 10.1017/CBO9780511525629.004
  • Eycott AE, Watkinson AR, Hemami MR, Dolman PM (2007) The dispersal of vascular plants in a forest mosaic by a guild of mammalian herbivores. Oecologia 154, 107–118. doi: 10.1007/s00442-007-0812-1
  • Frantzen MA, Silk JB, Ferguson JW, Wayne RK, Kohn MH (1998) Empirical evaluation of preservation methods for faecal DNA. Mol. Ecol. 7, 1423–1428. doi: 10.1046/j.1365-294x.1998.00449.x
  • Garnick S, Barboza PS, Walker JW (2018) Assessment of animal-based methods used for estimating and monitoring rangeland herbivore diet composition. Rangeland Ecology & Management 71, 449–457. doi: 10.1016/j.rama.2018.03.003
  • Green AJ, Lovas-Kiss Á, Stroud RA, Tierney N, Fox AD (2018) Plant dispersal by Canada geese in Arctic Greenland. Polar Res. 37, 1508268. doi: 10.1080/17518369.2018.1508268
  • Hernandez-Rodriguez J, Arandjelovic M, Lester J, de Filippo C, Weihmann A, Meyer M, Angedakin S, Casals F, Navarro A, Vigilant L, Kühl HS, Langergraber K, Boesch C, Hughes D, Marques-Bonet T (2018) The impact of endogenous content, replicates and pooling on genome capture from faecal samples. Mol. Ecol. Resour. 18, 319–333. doi: 10.1111/1755-0998.12728
  • Hibert F, Sabatier D, Andrivot J, Scotti-Saintagne C, Gonzalez S, Prévost M-F, Grenand P, Chave J, Caron H, Richard-Hansen C (2011) Botany, genetics and ethnobotany: a crossed investigation on the elusive tapir’s diet in French Guiana. PLoS ONE 6, e25850. doi: 10.1371/journal.pone.0025850
  • Hibert F, Taberlet P, Chave J, Scotti-Saintagne C, Sabatier D, Richard-Hansen C (2013) Unveiling the diet of elusive rainforest herbivores in next generation sequencing era? The tapir as a case study. PLoS ONE 8, e60799. doi: 10.1371/journal.pone.0060799
  • Iwanowicz DD, Vandergast AG, Cornman RS, Adams CR, Kohn JR, Fisher RN, Brehme CS (2016) Metabarcoding of fecal samples to determine herbivore diets: a case study of the endangered pacific pocket mouse. PLoS ONE 11, e0165366. doi: 10.1371/journal.pone.0165366
  • Jay-Russell M (2013) What is the risk from wild animals in food-borne pathogen contamination of plants? CAB Reviews 8. doi: 10.1079/PAVSNNR20138040
  • Johnson DJ, Martin LR, Roberts KA (2005) STR-typing of human DNA from human fecal matter using the QIAGEN QIAamp stool mini kit. J. Forensic Sci. 50, 802–808.
  • Junnila A, Müller GC, Schlein Y (2010) Species identification of plant tissues from the gut of An. sergentii by DNA analysis. Acta Trop. 115, 227–233. doi: 10.1016/j.actatropica.2010.04.002
  • Kartzinel TR, Chen PA, Coverdale TC, Erickson DL, Kress WJ, Kuzmina ML, Rubenstein DI, Wang W, Pringle RM (2015) DNA metabarcoding illuminates dietary niche partitioning by African large herbivores. Proc Natl Acad Sci USA 112, 8019–8024. doi: 10.1073/pnas.1503283112
  • Kemp BM, Smith DG (2005) Use of bleach to eliminate contaminating DNA from the surface of bones and teeth. Forensic Sci. Int. 154, 53–61. doi: 10.1016/j.forsciint.2004.11.017
  • Kowalczyk R, Wójcik JM, Taberlet P, Kamiński T, Miquel C, Valentini A, Craine JM, Coissac E (2019) Foraging plasticity allows a large herbivore to persist in a sheltering forest habitat: DNA metabarcoding diet analysis of the European bison. Forest Ecology and Management 449, 117474. doi: 10.1016/j.foreco.2019.117474
  • Lear G, Dickie I, Banks J, Boyer S, Buckley H, Buckley T, Cruickshank R, Dopheide A, Handley K, Hermans S, Kamke J, Lee C, MacDiarmid R, Morales S, Orlovich D, Smissen R, Wood J, Holdaway R (2018) Methods for the extraction, storage, amplification and sequencing of DNA from environmental samples. N. Z. J. Ecol. doi: 10.20417/nzjecol.42.9
  • Lee T, Alemseged Y, Mitchell A (2018) Dropping Hints: estimating the diets of livestock in rangelands using DNA metabarcoding of faeces. MBMG 2, e22467. doi: 10.3897/mbmg.2.22467
  • Lopes CM, De Barba M, Boyer F, Mercier C, da Silva Filho PJS, Heidtmann LM, Galiano D, Kubiak BB, Langone P, Garcias FM, Gielly L, Coissac E, de Freitas TRO, Taberlet P (2015) DNA metabarcoding diet analysis for species with parapatric vs sympatric distribution: a case study on subterranean rodents. Heredity 114, 525–536. doi: 10.1038/hdy.2014.109
  • Lusk RW (2014) Diverse and widespread contamination evident in the unmapped depths of high throughput sequencing data. PLoS ONE 9, e110808. doi: 10.1371/journal.pone.0110808
  • Mata VA, Amorim F, Corley MFV, McCracken GF, Rebelo H, Beja P (2016) Female dietary bias towards large migratory moths in the European free-tailed bat (Tadarida teniotis). Biol. Lett. 12, 20150988. doi: 10.1098/rsbl.2015.0988
  • Mata VA, Rebelo H, Amorim F, McCracken GF, Jarman S, Beja P (2019) How much is enough? Effects of technical and biological replication on metabarcoding dietary analysis. Mol. Ecol. 28, 165–175. doi: 10.1111/mec.14779
  • Mayes RW, Dove H (2000) Measurement of dietary nutrient intake in free-ranging mammalian herbivores. Nutr. Res. Rev. 13, 107–138. doi: 10.1079/095442200108729025
  • McInnes JC, Alderman R, Deagle BE, Lea M-A, Raymond B, Jarman SN (2017) Optimised scat collection protocols for dietary DNA metabarcoding in vertebrates. Methods Ecol. Evol. 8, 192–202. doi: 10.1111/2041-210X.12677
  • Medeiros RJ, King RA, Symondson WOC, Cadiou B, Zonfrillo B, Bolton M, Morton R, Howell S, Clinton A, Felgueiras M, Thomas RJ (2012) Molecular evidence for gender differences in the migratory behaviour of a small seabird. PLoS ONE 7, e46330. doi: 10.1371/journal.pone.0046330
  • Norris DO, Bock JH (2000) Use of Fecal Material to Associate a Suspect with a Crime Scene: Report of Two Cases. J. Forensic Sci. 45, 14657J. doi: 10.1520/JFS14657J
  • Norris KH, Barnes RF, Moore JE, Shenk JS (1976) Predicting forage quality by infrared reflectance spectroscopy. J. Anim. Sci. 43, 889–897. doi: 10.2527/jas1976.434889x
  • Nsubuga AM, Robbins MM, Roeder AD, Morin PA, Boesch C, Vigilant L (2004) Factors affecting the amount of genomic DNA extracted from ape faeces and the identification of an improved sample storage method. Mol. Ecol. 13, 2089–2094. doi: 10.1111/j.1365-294X.2004.02207.x
  • Oliveira BCM, Murray M, Tseng F, Widmer G (2020) The fecal microbiota of wild and captive raptors. Anim Microbiome 2, 15. doi: 10.1186/s42523-020-00035-7
  • Osborne JG (1942) Sampling Errors of Systematic and Random Surveys of Cover-Type Areas. J. Am. Stat. Assoc. 37, 256–264. doi: 10.1080/01621459.1942.10500634
  • Palomares F, Roques S, Chávez C, Silveira L, Keller C, Sollmann R, do Prado DM, Torres PC, Adrados B, Godoy JA, de Almeida Jácomo AT, Tôrres NM, Furtado MM, López-Bao JV (2012) High proportion of male faeces in jaguar populations. PLoS ONE 7, e52923. doi: 10.1371/journal.pone.0052923
  • Pegard A, Miquel C, Valentini A, Coissac E, Bouvier F, François D, Taberlet P, Engel E, Pompanon F (2009) Universal DNA-based methods for assessing the diet of grazing livestock and wildlife from feces. J. Agric. Food Chem. 57, 5700–5706. doi: 10.1021/jf803680c
  • Penteriani V, Delgado M del M (2008) Owls may use faeces and prey feathers to signal current reproduction. PLoS ONE 3, e3014. doi: 10.1371/journal.pone.0003014
  • Perry GH (2014) The Promise and Practicality of Population Genomics Research with Endangered Species. Int. J. Primatol. 35, 55–70. doi: 10.1007/s10764-013-9702-z
  • Poinar HN, Hofreiter M, Spaulding WG, Martin PS, Stankiewicz BA, Bland H, Evershed RP, Possnert G, Pääbo S (1998) Molecular coproscopy: dung and diet of the extinct ground sloth Nothrotheriops shastensis. Science 281, 402–406. doi: 10.1126/science.281.5375.402
  • Quéméré E, Hibert F, Miquel C, Lhuillier E, Rasolondraibe E, Champeau J, Rabarivola C, Nusbaumer L, Chatelain C, Gautier L, Ranirison P, Crouau-Roy B, Taberlet P, Chikhi L (2013) A DNA metabarcoding study of a primate dietary diversity and plasticity across its entire fragmented range. PLoS ONE 8, e58971. doi: 10.1371/journal.pone.0058971
  • Qvarnström M, Wernström JV, Piechowski R, Tałanda M, Ahlberg PE, Niedźwiedzki G (2019) Beetle-bearing coprolites possibly reveal the diet of a Late Triassic dinosauriform. R. Soc. Open Sci. 6, 181042. doi: 10.1098/rsos.181042
  • Ramón-Laca A, Soriano L, Gleeson D, Godoy JA (2015) A simple and effective method for obtaining mammal DNA from faeces. Wildlife Biol. 21, 195–203. doi: 10.2981/wlb.00096
  • Robeson MS, Khanipov K, Golovko G, Wisely SM, White MD, Bodenchuck M, Smyser TJ, Fofanov Y, Fierer N, Piaggio AJ (2018) Assessing the utility of metabarcoding for diet analyses of the omnivorous wild pig (Sus scrofa). Ecol. Evol. 8, 185–196. doi: 10.1002/ece3.3638
  • Rodríguez AE, Obeso JR (2000) Diet of the Cantabrian Capercaillie: geographic variation and energetic content. Ardeola 47, 77–83.
  • Roeder AD, Archer FI, Poinar HN, Morin PA (2004) A novel method for collection and preservation of faeces for genetic studies. Mol. Ecol. Notes 4, 761–764. doi: 10.1111/j.1471-8286.2004.00737.x
  • Rose C, Parker A, Jefferson B, Cartmell E (2015) The characterization of feces and urine: A review of the literature to inform advanced treatment technology. Crit. Rev. Environ. Sci. Technol. 45, 1827–1879. doi: 10.1080/10643389.2014.1000761
  • Shrestha R, Wegge P (2006) Determining the composition of herbivore diets in the trans-Himalayan rangelands: a comparison of field methods. Rangeland Ecology & Management 59, 512–518. doi: 10.2111/06-022R2.1
  • Soares FA, Benitez A do N, Santos BMD, Loiola SHN, Rosa SL, Nagata WB, Inácio SV, Suzuki CTN, Bresciani KDS, Falcão AX, Gomes JF (2020) A historical review of the techniques of recovery of parasites for their detection in human stools. Rev. Soc. Bras. Med. Trop. 53, e20190535. doi: 10.1590/0037-8682-0535-2019
  • Soininen EM, Gauthier G, Bilodeau F, Berteaux D, Gielly L, Taberlet P, Gussarova G, Bellemain E, Hassel K, Stenøien HK, Epp L, Schrøder-Nielsen A, Brochmann C, Yoccoz NG (2015) Highly overlapping winter diet in two sympatric lemming species revealed by DNA metabarcoding. PLoS ONE 10, e0115335. doi: 10.1371/journal.pone.0115335
  • Srivathsan A, Ang A, Vogler AP, Meier R (2016) Fecal metagenomics for the simultaneous assessment of diet, parasites, and population genetics of an understudied primate. Front. Zool. 13, 17. doi: 10.1186/s12983-016-0150-4
  • Srivathsan A, Sha JCM, Vogler AP, Meier R (2015) Comparing the effectiveness of metagenomics and metabarcoding for diet analysis of a leaf-feeding monkey (Pygathrix nemaeus). Mol. Ecol. Resour. 15, 250–261. doi: 10.1111/1755-0998.12302
  • Staudacher K, Wallinger C, Schallhart N, Traugott M (2011) Detecting ingested plant DNA in soil-living insect larvae. Soil Biol. Biochem. 43, 346–350. doi: 10.1016/j.soilbio.2010.10.022
  • Stewart PD, Macdonald DW, Newman C, Cheeseman CL (2001) Boundary faeces and matched advertisement in the European badger (Meles meles): a potential role in range exclusion. J. Zool. 255, 191–198. doi: 10.1017/S0952836901001261
  • Taberlet P, Waits LP, Luikart G (1999) Noninvasive genetic sampling: look before you leap. Trends Ecol. Evol. 14, 323–327. doi: 10.1016/s0169-5347(99)01637-7
  • Thiel D, Jenni-Eiermann S, Braunisch V, Palme R, Jenni L (2007) Ski tourism affects habitat use and evokes a physiological stress response in capercaillie Tetrao urogallus: a new methodological approach. Journal of Applied Ecology 45, 845–853. doi: 10.1111/j.1365-2664.2008.01465.x
  • Tovey ER, Chapman MD, Platts-Mills TA (1981) Mite faeces are a major source of house dust allergens. Nature 289, 592–593. doi: 10.1038/289592a0
  • Trites AW, Joy R (2005) Dietary analysis from fecal samples: how many scats are enough? J. Mammal. 86, 704–712. doi: 10.1644/1545-1542(2005)086[0704:DAFFSH]2.0.CO;2
  • Turner DH, Mathews DH (2010) NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res. 38, D280-2. doi: 10.1093/nar/gkp892
  • Valentini A, Miquel C, Nawaz MA, Bellemain E, Coissac E, Pompanon F, Gielly L, Cruaud C, Nascetti G, Wincker P, Swenson JE, Taberlet P (2009) New perspectives in diet analysis based on DNA barcoding and parallel pyrosequencing: the trnL approach. Mol. Ecol. Resour. 9, 51–60. doi: 10.1111/j.1755-0998.2008.02352.x
  • van Geel B, Guthrie RD, Altmann JG, Broekens P, Bull ID, Gill FL, Jansen B, Nieman AM, Gravendeel B (2011) Mycological evidence of coprophagy from the feces of an Alaskan Late Glacial mammoth. Quat. Sci. Rev. 30, 2289–2303. doi: 10.1016/j.quascirev.2010.03.008
  • Van Geel BAS, Protopopov A, Bull IAN, Duijm E, Gill F, Lammers Y, Nieman A, Rudaya N, Trofimova S, Tikhonov AN, Vos R, Zhilich S, Gravendeel B (2014) Multiproxy diet analysis of the last meal of an early Holocene Yakutian bison. J. Quaternary Sci. 29, 261–268. doi: 10.1002/jqs.2698
  • Watanabe JM (1984) Food preference, food quality and diets of three herbivorous gastropods (Trochidae: Tegula) in a temperate kelp forest habitat. Oecologia 62, 47–52. doi: 10.1007/BF00377371
  • Webber JT, Henley MD, Pretorius Y, Somers MJ, Ganswindt A (2018) Changes in African elephant (Loxodonta africana) faecal steroid concentrations post-defaecation. Bothalia 48. doi: 10.4102/abc.v48i2.2312
  • Welker F, Duijm E, van der Gaag KJ, van Geel B, de Knijff P, van Leeuwen J, Mol D, van der Plicht J, Raes N, Reumer J, Gravendeel B (2014) Analysis of coprolites from the extinct mountain goat Myotragus balearicus. Quaternary Research 81, 106–116. doi: 10.1016/j.yqres.2013.10.006
  • Yang P, Lee A, Chan M, Martin A, Edwards A, Carver S, Hu D (2019) How, and why, do wombats make cube-shaped poo?
  • Zemanova MA (2019) Poor implementation of non-invasive sampling in wildlife genetics studies. ReEco 4, 119–132. doi: 10.3897/rethinkingecology.4.32751
  • Zinger L, Bonin A, Alsos IG, Bálint M, Bik H, Boyer F, Chariton AA, Creer S, Coissac E, Deagle BE, De Barba M, Dickie IA, Dumbrell AJ, Ficetola GF, Fierer N, Fumagalli L, Gilbert MTP, Jarman S, Jumpponen A, Kauserud H, Taberlet P (2019) DNA metabarcoding-Need for robust experimental designs to draw sound ecological conclusions. Mol. Ecol. 28, 1857–1862. doi: 10.1111/mec.15060


  1. Possible challenges and solutions: It is difficult to obtain fresh faecal samples → one can use pointing dogs; Problem of using relatively long DNA barcoding fragments → use primers that can amplify shorter regions; Overlapping habitats of closely related species → use additional molecular markers to identify species (though this increases the cost and time necessary); Faecal samples only provide a snapshot of the entire diet → take multiple samples from the same individual and/or sample a larger number of individuals over a longer period and larger geographical area.
  2. It helps to narrow down the study areas for field collection. This reduces the manpower, resources, and time needed, increasing the chance of finding samples. When applying for permits, you can point out how to keep the disturbance of animals in the field to a minimum with this knowledge, which will increase the chances of obtaining permission.
  3. During sample collection → wear gloves, ensure that samples are not collected from wet soil, practice good collection hygiene; During sample preprocessing → remove outer layers that were in close contact with the environment, work in a flow-hood and in a lab free of PCR-amplified DNA; During DNA extraction → include extraction controls, avoid using extraction kits with plant-based or other types of contaminants.

Chapter 8 aDNA from sediments


Sedimentary ancient DNA studies aim to reconstruct the biology and ecology of past environments using the DNA present in the sediment record. Compared to modern soil and sedimentary DNA (see Chapter 4 DNA from soil), these analyses can be more challenging due to the prolonged exposure of the DNA to degradation processes. This has major implications for the scope of the study and the appropriate study design, which will be discussed in this chapter.

What is sedimentary ancient DNA?

In order to use sedimentary ancient DNA (sedaDNA; Haile et al. 2009) for paleoecological studies, it is important to understand some aspects of its physical nature and the role of the local environment in transforming modern DNA into sedaDNA. We will start by breaking down the term into its components.

Ancient DNA is the hereditary genetic content of cells from organisms that died a long time ago. There is no consensus on how old DNA should be in order to be called ancient, as the age is generally less important than the exposure to degradation processes that make it more degraded than modern DNA. SedaDNA degradation processes are primarily related to environmental and sedimentary properties, such as temperature, pH, water content, oxygen levels, and minerals present in the sediment (Giguet-Covex et al. 2019; Torti et al. 2015), whereas time plays a secondary role by providing the opportunity for these processes to take place. Permafrost in general provides excellent conditions for preserving DNA, due to its neutral pH, anaerobic conditions, and near-constant subzero temperatures (permafrost is, by definition, ground that remains frozen for two years or longer). Optimal conditions in ice cores from Greenland have allowed the detection of plant DNA as old as 450 to 800 thousand years (Willerslev et al. 2007). To date, the oldest amplifiable DNA from sediments is from ca. 400-thousand-year-old permafrost (Willerslev et al. 2003, 2004).

How does DNA end up in the sediment? Sediment is a result of erosion, weathering and biological processes and consists of organic and inorganic particles (e.g., sand and silt) that are transported by wind, water, or people (Masselink et al. 2014). These transportation processes also explain the main distinction between sediments and soils: soils develop precisely because of the absence of horizontal transport, allowing biological, physical, and chemical weathering of the local substrate, thereby forming soil horizons rich in organic matter (see Chapter 4 DNA from soil). Deposition of sediment happens when the sediments stop being transported and stay in place. The incorporation of organismal remains into the sediment is similarly a result of transportation by wind, water, or people, or a result of organisms living at that location (Alsos et al. 2018; Parducci et al. 2018). The processes involved in the transfer, deposition, and preservation of organismal remains are called taphonomic processes. Bacterial and fungal DNA make up a very large part of sedimentary DNA, since bacteria and fungi are natural inhabitants of sediments and outweigh macroorganisms in terms of total biomass. Animal DNA that is found in sediment typically comes from skin flakes, faeces, urine, saliva, hair, feathers, and other animal tissues, while plant DNA typically originates from plant debris, leaves, seeds, fruits, and other plant tissues. Living cells can actively secrete DNA into sediment (e.g., plant root tips; Wen et al. 2017), while dead tissues can degrade, releasing the intracellular DNA (iDNA), along with the rest of the cell contents, when cell lysis occurs. Both active secretion of DNA and cell lysis result in iDNA becoming extracellular DNA (exDNA).

Once exposed to the sedimentary environment, exDNA can undergo different post-depositional taphonomic processes that determine the quality of the DNA on longer timescales. ExDNA can be internalised by microbial cells (Overballe-Petersen and Willerslev 2014), degraded by extracellular microbial nucleases that break it up into smaller fragments, damaged by abiotic processes such as hydrolysis and oxidation, or preserved by adsorption onto particles such as humic acids, sand and clay minerals (Torti et al. 2015; Willerslev and Cooper 2005). An overview of DNA degradation processes is provided in Figure 1. Chemical alkylation can lead to cross-links within (intra) and between (inter) DNA molecules, making it impossible to amplify the DNA by PCR (Fulton and Shapiro 2019). Low pH, high temperatures, and high oxygen and water content can also lead to strand breaks, deamination of nitrogen bases, and base modifications (Dabney et al. 2013; Willerslev and Cooper 2005). These processes can result in a decrease in the amount of detectable DNA, shorter DNA fragments, and changes in chemical properties as damage accumulates over time. DNA is better preserved in sediments with a high mineral content and at low temperatures. Minerals can inactivate nucleases as well as bind to and protect DNA, while low temperatures thermally stabilise DNA against chemical degradation (Torti et al. 2015). Desiccated and anoxic sediments will putatively also strongly reduce the effects of hydrolysis and oxidation, respectively. The preserved exDNA together with the iDNA preserved in dead cells make up the total DNA that can be recovered using sedaDNA methods.

Figure 1.

Schematic overview of DNA degradation processes (hydrolysis, oxidation, alkylation and Maillard reaction) that can cause DNA damage in the form of cleavage, base modifications or cross-links. The major mechanism leading to miscoding lesions in aDNA is the hydrolysis of cytosine to uracil, which leads to G to A and C to T substitutions by DNA polymerases, whereas blocking lesions can obstruct the movement of DNA polymerases during PCR (Dabney et al. 2013).
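
As a minimal sketch of how the miscoding lesions described above show up in sequence data, the following Python snippet tallies mismatches between a read and the reference it is aligned to. The function name `count_substitutions` and the toy sequences are hypothetical, and the sketch assumes an ungapped alignment of equal length; dedicated damage-profiling tools additionally record where along the read each mismatch occurs.

```python
from collections import Counter


def count_substitutions(reference: str, read: str) -> Counter:
    """Tally 'reference>read' mismatches in an ungapped, equal-length alignment."""
    if len(reference) != len(read):
        raise ValueError("sequences must be aligned and of equal length")
    counts = Counter()
    for ref_base, read_base in zip(reference.upper(), read.upper()):
        if ref_base != read_base:
            counts[f"{ref_base}>{read_base}"] += 1
    return counts


# Toy example (hypothetical sequences): two cytosines are read as thymine,
# the pattern expected after deamination of cytosine to uracil.
reference = "ACCGTTCAGG"
read = "ATCGTTTAGG"
substitutions = count_substitutions(reference, read)
print(substitutions["C>T"], substitutions["G>A"])
```

An excess of C>T and G>A mismatches relative to other substitution types is one indication that the sequences derive from degraded rather than modern DNA.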

Advantages and limitations of sedaDNA as palaeoecological proxy

By analysing the ancient DNA present in the sediment (Haile et al. 2009; Slon et al. 2017) it is possible to identify the source species of archaeological artefacts and deposits, and even detect organisms in the absence of any visible remains. For plants, the detection of taxa that do not leave traces in the fossil record (e.g., Alsos et al. 2016; Bremond et al. 2017; Brown et al. 2021; Pedersen et al. 2013) opens up new ways of studying past vegetation complementary to more traditional palaeoecological proxies such as pollen and macrofossils.

Macrofossils and plant sedaDNA originate close to the sample location and give a similar local signal (Alsos et al. 2018; Jørgensen et al. 2012; Niemeyer et al. 2017), while the pollen record generally includes taxa that originated further away from the sample location (Parducci et al. 2018): pollen, especially of wind-pollinated species, is distributed regionally through the air and may therefore originate from a wide area (Birks and Bjune 2010). Pollen does not contribute much to the total pool of sedaDNA (Clarke et al. 2020; Sjögren et al. 2017). This can be partially explained by the low DNA content of pollen grains and the robustness of the pollen grain wall, which hinders the retrieval of the DNA. At the source, DNA can be considered more consistent than pollen, as all plant tissues contain DNA, but not all plants produce pollen, and insect-pollinated plants produce less pollen than wind-pollinated plants.

In general, palaeovegetation data are the result of the attributes of the original vegetation, combined with depositional factors and preservation, as well as the experimental procedures to produce the data. For sedaDNA analyses, this includes every step of the data generation itself: sampling, transport, storage, processing of the DNA in the laboratory, and finally, the bioinformatic pipelines used. In terms of the data generation, pollen analyses and macrofossil analyses rely on taxonomic identification by microscopy, which is labour-intensive and requires a high level of taxonomic knowledge. Although some training is needed to work in an ancient DNA laboratory, in principle, taxonomic identification by DNA can be carried out without prior taxonomic knowledge. However, familiarity with plant taxonomy, phylogenetic placement, and biology of different groups is invaluable in the interpretation of the automated identifications. For example, it is important to check if the automated DNA identifications make sense for the sample location, because contamination, DNA degradation, and the quality of the reference library can cause false DNA identifications (see Chapter 18 Sequence to species for details).

A combination of sedaDNA, macrofossils, and pollen proxies gives the most complete overview of plant diversity and community composition through time. The choice for these proxies is dependent on the aims of the study. Table 1 summarises the main differences.

Table 1.

Comparison of pollen, plant macrofossils, and sedaDNA as proxies for palaeoecological reconstructions on the levels of: source and sediment, data generation, and data interpretation. Sources: Ahmed et al. 2018; Birks and Bjune 2010; Parducci et al. 2018, 2017.

Category | Pollen | Plant macrofossils | SedaDNA
Source and sediment
- Scale | Regional | Local | Local
- Taxonomic groups | Pollen-producers | All plants | All organisms
- Potential sources of bias | High pollen-producing plants; vegetation cover close to sampling area; differential preservation | Differential preservation of tissue-types and species | Differential DNA degradation and decay
Data generation
- Labour-intensive | Yes | Yes | No
- Need for taxonomic knowledge | Yes | Yes | No
- Taxonomic resolution | Limited to identifiable pollen types, generally to genus level | Generally to species-level | Depends on the marker, possible to species-level
- Potential sources of bias | Identifiability of the remains | Identifiability of the remains; random occurrence | DNA contamination; choice of lab techniques; completeness of reference library
Data interpretation
- Qualitative | Yes | Yes | Yes
- Quantitative | Partial | Limited | Debated

SedaDNA research applications

The first study using sedaDNA of macroorganisms was published in 2003, demonstrating the possibility to detect plant and animal DNA in both permafrost sediments and temperate cave sediments (Willerslev et al. 2003). Since then, the number of sedaDNA studies and applications has increased as enhanced understanding of ancient DNA and methodological developments allowed better reconstructions, as also illustrated by a recent comprehensive synthesis of current analytical procedures (Capo et al. 2021). SedaDNA methods are relevant for a range of research fields across biology, conservation, and archaeology and have been applied for roughly two main purposes: understanding natural environmental processes and reconstructing past human-environmental interactions.

Environmental reconstructions can range from polar to temperate and tropical regions, although they are limited to sampling sites that allow preservation of sedaDNA, such as permafrost, lake sediments, and dry cave sediments. Permafrost sediment can be used to assess vegetational development in polar regions under climate change (e.g., Willerslev et al. 2014; Zimmermann et al. 2017). SedaDNA from archaeological sites can reveal past human activities such as plant and animal cultivation, migration and settlement history (e.g., Hebsgaard et al. 2009; Smith et al. 2015), and Neanderthal and Denisovan DNA have been recovered from cave sediments (Slon et al. 2017; Vernot et al. 2021). Lake sediments can be reliable archives of the palaeoenvironment, integrating environmental information across the lake catchment area and displaying a very clear temporal stratification. Many sedaDNA studies use lake sediments to focus on past vegetation dynamics, which can be used to establish natural baselines for conservation (e.g., Boessenkool et al. 2014; Wilmshurst et al. 2014), reconstruct the effects of past climate change on the environment (e.g., Alsos et al. 2016, 2020; Clarke et al. 2020; Jørgensen et al. 2012), show long-lasting effects of biological invasions (e.g., Ficetola et al. 2018), or track past human impacts (e.g., Giguet-Covex et al. 2014; Pansu et al. 2015). This list illustrates the wide range of potential applications. For further discussion, please see Section 3 of this book, especially Chapter 21 Palaeobotany and Chapter 24 Environment and biodiversity assessments, which are relevant for sedaDNA.

Experimental design

SedaDNA research strategy

Because ancient DNA occurs at low concentrations in sediment samples, retrieving it requires strict protocols to avoid contamination by modern DNA or further degradation (Cooper and Poinar 2000; Capo et al. 2021). However, when these protocols are followed, sedaDNA can be a powerful tool, providing novel insights into palaeoecological reconstructions that are not possible through traditional methods.

The previous section described some sedaDNA studies focusing on palaeoecological and archaeological questions. In both cases, choices of location and methods are very much steered by the research focus and what is already known about the area, such as past changes in climate, geology, ecology, or human impacts. Although details in the study design can differ, all sedaDNA studies follow the same steps: site selection, collection of samples and metadata, DNA extraction, further processing of the DNA in the lab, sequencing, and finally, bioinformatic sequence quality filtering and data analyses (Figure 2).

Figure 2.

Simplified overview of the sedaDNA research process, including some of the major challenges and potential solutions indicated at each step.

Choices for the different options at each step depend on the aims of the study. For example, when performing a reconstruction of overall plant community dynamics with universal plant metabarcoding primers, the most common taxa and major trends in community change will be reliably retrieved in the first PCR performed (Alsos et al. 2016), with no specific sampling strategy. However, the detection of rare plant species will require a number of repeats (Alsos et al. 2016), and possibly sampling at several locations (Capo et al. 2021). The following questions can help to develop a sedaDNA research strategy and these topics will be discussed throughout this chapter:

  1. What is my study aim?
  2. What spatial and temporal scale do I need to cover?
  3. What contextual information and metadata do I need?
  4. What taxa should I target and at what taxonomic resolution?
  5. What laboratory and analytical methods should I use?
  6. How will I minimise / control for contamination, biases, and false positives?
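
The earlier point that rare taxa require a number of PCR repeats can be illustrated with a short Python sketch. It assumes that replicates are independent and that a taxon has the same detection probability in every replicate; neither assumption comes from the chapter, and the function names and probabilities are hypothetical.

```python
def detection_probability(p_single: float, n_replicates: int) -> float:
    """Probability of detecting a taxon at least once in n independent replicates."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie between 0 and 1")
    if n_replicates < 0:
        raise ValueError("n_replicates must be non-negative")
    return 1.0 - (1.0 - p_single) ** n_replicates


def replicates_needed(p_single: float, target: float = 0.95) -> int:
    """Smallest number of replicates that reaches the target detection probability."""
    if not 0.0 < p_single <= 1.0:
        raise ValueError("p_single must be greater than 0 and at most 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie strictly between 0 and 1")
    n = 1
    while detection_probability(p_single, n) < target:
        n += 1
    return n


# Hypothetical per-replicate detection probabilities for a common and a rare taxon.
print(replicates_needed(0.9))  # common taxon
print(replicates_needed(0.2))  # rare taxon
```

Under these assumptions a taxon detected in 90% of single PCRs needs two replicates to reach a 95% chance of detection, whereas one detected in 20% of single PCRs needs fourteen.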

Site selection

The aims of the study define the temporal and spatial scale needed to achieve them, thereby steering the selection of relevant sampling sites. Lake sediments provide a record of the plants that occurred in the lake catchment, being the area of land from which water and surface runoff drains into the lake (Giguet-Covex et al. 2019). A lake sediment record can only go as far back as the formation of the lake itself. Other terrestrial sediments may primarily contain the DNA that is deposited by plants growing at that particular location, or by humans, animals, or abiotic factors such as wind and water. For example, DNA in cave sediments will come primarily from organisms that have lived or died in the cave, or from remains that are transported into the cave (Hofreiter et al. 2003). The likelihood of finding sedaDNA should also be considered. However, more often than not the sampling location is opportunity driven, especially when it comes to archaeological sites, and sedaDNA retrieval can prove difficult.

General conditions under which sedaDNA preserves well are: cold and stable temperatures, neutral pH, dry or anoxic sediments with a high mineral content. Sediments from rockshelters, dry caves, and lake sediments are generally preferred as they are protected and provide stable conditions: rockshelter and dry cave sediments are sheltered from rain and have stable temperatures and there is some evidence that calcite has a high adsorption capacity for DNA (Capo et al. 2021; Freeman et al. 2020). Lake sediments on the other hand are often anoxic and generally undisturbed, especially when they are below the wave disturbance depth and subsurface slopes are gentle.

Dating of sediments

Dating is important in any study that involves ancient samples. Only with accurate dating can the timing of events be compared and their rates of change estimated. Commonly applied sediment dating methods are radioisotopic dating (in particular 210Pb, 14C, and luminescence dating) and dating based on chemostratigraphy or marker minerals (in particular tephrochronology), and the choice for a method depends on the type and age of the sediments (see Table 2 for an overview). Many sources describe these methods in detail (e.g., Bradley 1999) and we provide a brief introduction here.

Table 2.

Summary of sediment dating methods, their applicability and limitations. Sources: Barsanti et al. 2020; Bradley 1999; Fattahi and Stokes 2003.

Dating method | Suitable sample types | Age limit | Sources of error and uncertainty
210Pb dating | Materials from aquatic environments such as lacustrine and marine deposits | ~100 to 150 years | Complex sedimentation processes that break the dating model assumptions, such as compaction, local mixing, erosion etc.
14C (radiocarbon) dating | Organic remains (charcoal, wood, animal tissue), carbonates (corals, sediments, stalagmites and stalactites), water, air and organic matter from various sediments, soil, paleosol and peat deposits | Up to 50,000 years | Atmospheric 14C content fluctuation due to changes in cosmogenic production rate and exchange between the atmosphere and ocean
Luminescence dating: thermoluminescence (TL) and optically stimulated luminescence (OSL) | TL: materials containing crystalline minerals, such as sediments, lava, clay, and ceramics; OSL: materials containing quartz or potassium feldspar sand-sized grains, or fine-grained mineral deposits | TL: a few years to over 1,000,000 years; OSL: a few decades to ~150,000 years for quartz | Variations in environmental radiation dose; saturation of electron traps in sample minerals
Tephrochronology | Terrestrial and lake sediments, marine deposits and ice cores that contain tephra | Up to 35,000 years, extendable under good conditions | Can only obtain indirect dates within the 14C age range
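
As a rough illustration of how the age limits in Table 2 narrow the choice of method, the sketch below lists the methods whose upper limit covers a given expected age. The dictionary and function names are hypothetical. Only the upper limits quoted in the table are encoded, and the entry for TL uses 1,000,000 years as a nominal value for "over 1,000,000 years". Lower limits and the suitability of the sample material are deliberately ignored and have to be checked separately.

```python
# Upper age limits in years, taken from Table 2 (TL value is nominal).
UPPER_AGE_LIMIT_YEARS = {
    "210Pb": 150,
    "14C": 50_000,
    "TL": 1_000_000,
    "OSL (quartz)": 150_000,
    "Tephrochronology": 35_000,
}


def candidate_methods(oldest_expected_age_years: float) -> list:
    """Return the methods whose upper age limit covers the oldest expected age."""
    if oldest_expected_age_years < 0:
        raise ValueError("age must be non-negative")
    return [
        method
        for method, limit in UPPER_AGE_LIMIT_YEARS.items()
        if oldest_expected_age_years <= limit
    ]


# A core whose base is expected to be about 50,000 years old.
print(candidate_methods(50_000))
```

For a core reaching back 50,000 years this leaves 14C, TL, and OSL, while tephrochronology only qualifies if its range can be extended under good conditions.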

Radioisotopic dating is based on the principle of radioactive decay. When a nucleus breaks down, it emits energy and forms a daughter product. The time this takes is expressed as the half-life, i.e., the time that it takes for 50% of a parent element to transmute into the daughter product. The relative quantity of a radioactive parent element in a sample can be used to infer its age. Relatively young aquatic sediments, with ages up to 150 years are commonly dated with 210Pb (half-life: 22.27 years; Barsanti et al. 2020). 210Pb occurs naturally in the atmosphere and settles in sediments through dry fallout or precipitation. The supply of this 210Pb is not constant but the decline of this excess 210Pb along a sediment sequence is a proxy for the sedimentation rate. Additionally, if the age at a point of the sequence is known, a chronology can be determined. Radiocarbon (14C, half-life: 5730 years) is a radioactive isotope of carbon that naturally occurs in the atmosphere. Plants fix atmospheric carbon during photosynthesis, so the level of 14C in plants and animals upon death approximately equals the level of 14C in the atmosphere at that time. After death, it decreases as 14C decays to 14N at a rate of 50% per 5730 years, allowing the date of death to be estimated. Limited by its half-life, radiocarbon dating is only possible for samples younger than 50,000 years. As the concentration of atmospheric 14C is not constant over time, radiocarbon dates are calibrated against a global calibration curve obtained from tree rings and varved lake sediments (Reimer et al. 2020). This produces calendrical dates, which are expressed as calibrated years before present (cal years BP) with present being 1950 (before large-scale testing of nuclear weapons). 
The most reliable age-depth models for both marine and lake sediments use accelerator mass-spectrometry (AMS) dating of macroscopic plant or animal fragments (as little as 0.1 mg) as this can avoid the problems of both mixed material and also the so-called hard-water error associated with carbonate waters.
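
The relationship between half-life and age described above can be written as a short Python sketch. The half-lives are those quoted in the text; the function names are hypothetical. The result is an uncalibrated decay age based on the remaining fraction of the parent isotope, so it does not replace calibration against a calibration curve or a 210Pb sedimentation model.

```python
import math

HALF_LIFE_14C_YEARS = 5730.0    # value quoted in the text
HALF_LIFE_210PB_YEARS = 22.27   # value quoted in the text


def fraction_remaining(age_years: float, half_life_years: float) -> float:
    """Fraction of the parent isotope left after a given time."""
    return 0.5 ** (age_years / half_life_years)


def age_from_fraction(fraction: float, half_life_years: float) -> float:
    """Time needed for the parent isotope to decay to the given fraction."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be greater than 0 and at most 1")
    return -half_life_years * math.log2(fraction)


# One quarter of the original 14C left corresponds to two half-lives.
print(age_from_fraction(0.25, HALF_LIFE_14C_YEARS))

# Fraction of 14C left at the practical limit of radiocarbon dating.
print(fraction_remaining(50_000, HALF_LIFE_14C_YEARS))
```

After 50,000 years only roughly 0.2% of the original 14C remains, which illustrates why the half-life limits the age range of the method.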

Luminescence dating is based on the phenomenon that mineral crystals absorb electrons from the ionising radiation of surrounding sediments over time, and when stimulated in a laboratory by heat or light, they release the accumulated radiation as luminescence. The intensity of measured luminescence indicates the length of time between this in-lab stimulation and the last natural event of similar stimulation. Heat-stimulated or thermoluminescence (TL) dating is used to date baked pottery from archaeological sites or sediments once in contact with molten lava; optically stimulated luminescence (OSL) dating is used to date sediments once exposed to sunlight. The time range for luminescence dating can be from a few decades to over 1 Ma, depending on the ability of a mineral to absorb radiation over time. For studies concerning relatively young samples, OSL dating of quartz grains is generally used, covering from a few decades to ~150 ka.

Tephrochronology uses the chemical signature of tephra (volcanic ash) to pinpoint the age of that specific layer in a sediment sequence by reference to known or unknown dated volcanic eruptions. Terrestrial sediments (Froese et al. 2006), marine deposits (Larsen et al. 2002), and ice cores (Davies et al. 2008) from areas once under the influence of dated volcanic eruption events can be dated with this method. With accurate geochemical fingerprinting, tephrochronology can be used to corroborate or even extend the dating limits of other techniques.

Prepare to work cleanly

DNA is everywhere - including in the air - and contamination can come from many different sources. When collecting and working with sedaDNA samples, it is important to keep in mind that the DNA you are interested in will probably be present in very low concentrations. Contamination with modern DNA can easily overpower the sedaDNA signal in which you are interested. Therefore it is important to absolutely minimise the amount of modern DNA coming into your samples and limit further degradation of the sedaDNA.

The precautions you can take include: work cleanly, use equipment that is free of DNA and nucleases, and try to keep the samples in a stable and cold environment. In practice this is not so easy, which is why dedicated ancient DNA facilities are set up to avoid any form of contamination. These facilities should be physically isolated - ideally in a separate building - from any location where PCRs are performed (Fulton and Shapiro 2019) and strict cleaning regimes and clean lab practices should be upheld. How to set up and work in an ancient DNA lab is described in detail by e.g., Cooper and Poinar (2000) and Fulton and Shapiro (2019). Here we summarise general clean lab practices. We note that working cleanly and consistently will require practice and adequate training.

You should assume that everything that you bring into the lab is contaminated with DNA. Therefore, before entering the lab, you should have showered and changed into clean clothes and everything you bring into the lab should be decontaminated. Inside the lab, you should wear a hairnet, face mask, full body suit with hood, shoe covers, and gloves at all times. Wearing two layers of gloves will allow you to change the outer gloves while still covering your hands, and you should change your outer gloves regularly while working. All tools and equipment should be decontaminated before use, and regular cleaning of the aDNA workspace is needed. Decontamination can be achieved by using a DNA decontamination product (e.g., 3-10% bleach or DNA-ExitusPlus™) for surfaces, ideally supplemented with UV irradiation of the workspace. To prevent cross-contamination, tools should be cleaned between working with each sample or sample-extract. Tools should be left in a DNA decontamination product for at least 10 minutes, rinsed with UV irradiated milliQ water, and ideally also UV irradiated using a UV crosslinker with irradiation at the shortest distance possible to the UV source (Champlot et al. 2010).

Collection, transport, and storage of ancient sediment samples

Choices for sampling and personal protective equipment will depend on the setting, as the sampling of sediments at an archaeological site can be very different from the sub-sampling of a lake sediment core in a lab facility. It is important to try to limit the amount of potential contamination, but practical considerations and the target DNA can also be leading. For example, a study aiming to recover human aDNA will require stricter use of personal protective equipment than a study focussing on plant aDNA. Sampling of sediments can be done directly in the field or by subsampling of sediment cores in a clean, sheltered environment. When collecting sediment cores for sedaDNA, closed-chamber piston-type corers are preferred (Parducci et al. 2017) as they enclose the sediment in a plastic tube that can be opened in the laboratory. As frozen sediments should be kept at freezing temperatures, subsampling of these types of cores requires a climate chamber (Epp et al. 2019).

A general sedaDNA sampling kit contains personal protective equipment, sampling equipment, and cleaning products, including: full bodysuits, face masks, hairnets, nitrile gloves, sterile scalpels, sample tubes, clean ziplock bags, DNA decontamination products, distilled water, 70% ethanol, trays or beakers for cleaning the tools, paper towels, trash bags and pens for labelling. To limit potential contamination, much of the preparation for the sampling kit takes place in the ancient DNA lab facility: making sure the sampling tools and collection tubes are prepared and DNA-free. Aluminium foil can be helpful for covering your workspace and provides a clean surface for all of the sampling materials at a sampling site. Sterile syringes with the tip cut off can be useful mini-corers, speeding up the sample-taking (Epp et al. 2019). If you are taking sub-samples in a lab facility, make sure it is isolated from any PCR machine as the high number of DNA copies produced with PCR can become airborne and may enter your samples through the building air supply (Fulton and Shapiro 2019; Willerslev and Cooper 2005). Tracing of contamination during sampling can be done by placing several open sample tubes with DNA-free water in your work area (Parducci et al. 2017), or using tracer DNA during coring or on the outside of the sediment core (Epp et al. 2019; Pedersen et al. 2016).

The sampling itself follows aDNA lab procedures where possible, even if it takes place elsewhere: clean the workspace, use personal protective equipment, do not hover over the sediment you are sampling, and change outer gloves and tools between each individual sample. In order to avoid contamination, sampling should start at the oldest part of the sediment, working your way up to the youngest parts, and subsamples from sediment cores should be taken from inside the undisturbed centre (Parducci et al. 2017). Sampling procedures for both non-frozen and frozen sediment cores are described in detail by Epp et al. (2019). Collected samples should be kept in a stable and low-temperature environment (i.e., frozen at -20 °C for longer-term storage), as degradation slows down with lower temperatures and temperature fluctuations can cause additional damage to the DNA. An ice-box with ice packs can be used for temporary storage and transport of the samples. Further processing of the sedaDNA samples should be done in a laboratory dedicated to working with ancient DNA.

Sedimentary ancient DNA extraction

The choice for a specific DNA extraction protocol depends on a range of factors, including the aim of your study, sample characteristics, available laboratory facilities and equipment, and costs of the reagents or extraction kits. The latter can be a consideration of investing either time or finances as it can be cheaper to make the buffers needed for extraction yourself, but this also increases the preparation time and could introduce additional contamination to your samples. There are several protocols that can be used for sedaDNA extraction (see Capo et al. 2021; for a detailed review) and general steps are: sample homogenization, lysis, binding, washing, and elution of the DNA. Here we discuss some of the most commonly used extraction protocols and we summarise their main advantages and limitations in Table 3.

Table 3.

Overview of the advantages and limitations of several commonly used extraction protocols and some example publications using these protocols.

Extraction protocol | Sample size | Advantages | Limitations | Used by
DNeasy PowerMax kit (Qiagen) | ≤ 10 g | Large initial sample volume; few inhibitors in the resulting extract | Expensive; DNA can be lost with inhibitor removal solution | Epp et al. 2018; Zimmermann et al. 2017
DNeasy PowerSoil kit (Qiagen) | ≤ 250 mg | Few amplification and sequencing inhibitors in the resulting extract; easy processing of large sets of samples | DNA can be lost with inhibitor removal solution; smaller initial sample volume compared to the PowerMax kit | Lejzerowicz et al. 2013; Monchamp et al. 2016; Dommain et al. 2020
Rohland protocol (Rohland et al. 2018) | ≤ 50 mg | Developed to recover small DNA fragments; easy processing of large sets of samples | Small starting amount of sediment; potential coextraction of inhibitors; homemade buffers can increase contamination risk | Zavala et al. 2021; Vernot et al. 2021
Phosphate buffer + NucleoSpin® Soil kit (Taberlet et al. 2012) | ≤ 15 g | Large initial sample volume | Extracts only extracellular DNA; processes a 2 ml subsample of the phosphate buffer and sample mixture | Giguet-Covex et al. 2014; Pansu et al. 2015
Murchie protocol (Murchie et al. 2020) | ≤ 250 mg | High DNA yields; uses a high volume binding buffer to improve the recovery of small DNA fragments | Optimised for permafrost samples and may not perform as well in lake sediment | Murchie et al. 2020

All extraction protocols include similar steps for the isolation of sedimentary DNA (Figure 3), but due to the differences in chemical composition of the buffers, input volume, use of equipment, and targeted DNA (total DNA, iDNA, or exDNA), results of these protocols can vary. You can decide to extract only exDNA using the “Taberlet protocol”, where samples are first incubated in a saturated phosphate buffer and later on purified with an extraction kit, skipping the lysis step (Taberlet et al. 2012). An advantage is that a large sample volume can be processed, minimising the possible effects of heterogeneous distribution of DNA in the sediment. However, DNA yield and purity can be lower in comparison to the DNeasy PowerMax Kit (Qiagen), formerly known as the PowerMax Soil DNA Isolation Kit (MO BIO Laboratories, Inc.; Zinger et al. 2016) and probably also to other protocols targeting total DNA (e.g., the Rohland protocol; Rohland et al. 2018).

Figure 3.

Common DNA extraction steps: (1) samples are first homogenized using a sterile scalpel and later on go through a step, in which either (2a) extracellular DNA is washed off the sedimentary matrix (Taberlet et al. 2012) and/or (2b) intracellular DNA is freed through lysis, which can include beating with garnet beads. The free DNA suspended in a high salt buffer can now bind to either (3a) a silica column or (3b) silica magnetic beads, (4) samples are washed with an ethanol based buffer to remove impurities, and finally (5) DNA is eluted in an elution buffer. Figure based on Rohland et al. (2018).

SedaDNA studies employing protocols developed for the extraction of modern environmental DNA from soils and sediments generally add additional steps to increase the yield of DNA from low concentration ancient sediment samples. A lysis step can be added to extract iDNA from intact cells present in the samples through chemical lysis, and/or mechanical shearing of cell membranes using beads. Adding certain chemicals to the lysis buffer can also increase yield: N-phenacylthiazolium bromide (PTB) breaks down cross-links between DNA and proteins (Vasan et al. 1996; Poinar et al. 1998), and adding proteinase K and dithiothreitol (DTT) during the lysis step of the PowerMax and PowerSoil kits allows better recovery of DNA (Epp et al. 2019). It has also been suggested to concentrate the DNA before further processing (Taberlet et al. 2018), as sedaDNA concentrations are likely to be low (Zimmermann et al. 2020). The Rohland protocol is specifically designed to target degraded DNA from ancient samples (Rohland et al. 2018) and should yield a higher concentration of short fragments compared to the other extraction protocols, especially when silica magnetic beads are used for DNA binding.

Figure 4.

Chapter 8 Infographic: Visual representation of the content of this chapter. Top left image based on Pederson et al. (2015).

Be aware that certain substances may inhibit further amplification or sequencing steps. These include humic substances (important components of humus), which are commonly present in sediments. Because the amount of humic substances is site-specific, it might be necessary to repurify the samples or use inhibitor removal columns. During DNA extraction, contamination may be introduced from the laboratory facilities, tools, reagents and other consumables. It is essential to track this contamination by including a negative control. It is suggested to add one such extraction control for each batch of 11 samples, and include it in all subsequent steps (e.g., metabarcoding, library preparation, sequencing; Rohland et al. 2018). For the extraction of modern DNA it is common to add a positive control with a known DNA content, but due to the contamination risk this is not recommended for sedaDNA (Willerslev and Cooper 2005).
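
The suggestion of one extraction control per batch of 11 samples can be turned into a simple planning helper. The following is a minimal sketch with a hypothetical function name and hypothetical sample labels; it only builds the batch lists and says nothing about the order in which tubes are handled.

```python
def plan_extraction_batches(sample_ids, batch_size=11):
    """Split samples into batches and append one negative extraction control to each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    for start in range(0, len(sample_ids), batch_size):
        batch_number = start // batch_size + 1
        batch = list(sample_ids[start:start + batch_size])
        # The control goes through extraction and every later step with its batch.
        batch.append(f"extraction_control_{batch_number}")
        batches.append(batch)
    return batches


# 25 hypothetical samples give three batches, each with its own control.
samples = [f"sample_{i:02d}" for i in range(1, 26)]
for batch in plan_extraction_batches(samples):
    print(len(batch), batch[-1])
```

With 11 samples and one control, each full batch holds 12 extractions, and the last batch is simply smaller.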

Molecular methods for sedaDNA

After extracting the DNA, the sedaDNA needs to be further processed before sequencing and several approaches are continuously being improved and new ones developed.

Most sedaDNA studies apply a DNA metabarcoding approach, using PCR amplification primers to target short DNA sequences (< 300 bp, preferentially around or below 100 bp) from taxonomic marker genes to identify specific taxonomic groups (see Chapter 11 Amplicon metabarcoding). It is relatively low cost and some of the metabarcoding primers give high taxonomic resolution. However, this method can introduce amplification bias (Bellemain et al. 2010) and is susceptible to errors introduced in the PCR. More recently, shotgun sequencing became another option for these types of samples (Pedersen et al. 2016). This approach converts the DNA extracts directly to a library for sequencing, allowing the analyses of the entire diversity of taxonomic groups in the samples including microorganisms (Ahmed et al. 2018), plants (Parducci et al. 2019; Pedersen et al. 2016), animals (Graham et al. 2016; Pedersen et al. 2016), and humans (Slon et al. 2017; Vernot et al. 2021). Shotgun sequencing requires a high sequencing depth and can be costly as most sequences will be from non-target organisms. Target capture has recently been applied to sedaDNA samples to enrich the concentration of taxa of interest in a shotgun approach by using DNA (Schulte et al. 2020) or RNA (Murchie et al. 2020; Seeber et al. 2019) baits. These methods are described in detail in Chapter 11 Amplicon metabarcoding, Chapter 12 Metagenomics, and Chapter 14 Target capture, and are followed by library preparation and sequencing (see Chapter 9 Sequencing platforms and data types).
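
To make the amplicon length constraint of metabarcoding concrete, the sketch below performs a very simple in-silico PCR on a single template and reports the amplicon a primer pair would produce. The primers and template are invented toy sequences, the function names are hypothetical, and only exact matches are considered; real primers often contain degenerate bases and tolerate mismatches, which this sketch does not model.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward_primer: str, reverse_primer: str):
    """Return the amplicon including both primer sites, or None if a primer is absent."""
    template = template.upper()
    forward = forward_primer.upper()
    reverse_site = reverse_complement(reverse_primer)
    start = template.find(forward)
    if start == -1:
        return None
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Toy template and primers (hypothetical sequences).
template = "TTGACCGGATAAGCTTACGTACGATCCTTAGCAA"
amplicon = in_silico_amplicon(template, "GGATAAGC", "CTAAGGAT")
print(amplicon, len(amplicon))
```

Running such a check against reference sequences of the target group shows whether the expected amplicon stays within the short length range suited to degraded DNA.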

Sequencing data can be processed using bioinformatic tools, where strict quality filtering of the sequence data is followed by taxonomic assignment. Further filtering allows removal of sequences with low identity scores, contaminants (i.e., sequences present in the controls), and false-positives (see Chapter 18 Sequence to species for details). False identifications can be caused by the quality of the reference library, but also by technical errors, contamination, or errors in the DNA sequences, especially as sedaDNA is generally highly degraded and of low concentration. It is therefore important to check if the identifications make sense for the sampling location and age before further analyses of the sedaDNA data.
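The filtering logic described above can be illustrated with a minimal Python sketch. It is not the pipeline of any particular tool: the `Assignment` record, the 0.98 identity threshold, and the rule of discarding any sequence seen in a control are all assumptions chosen for illustration, and the taxa in the usage example are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Assignment:
    sequence_id: str
    taxon: str
    identity: float                  # best-match identity to the reference library (0 to 1)
    reads_by_sample: Dict[str, int]  # read counts per sample, including controls


def filter_assignments(
    assignments: Iterable[Assignment],
    control_samples: Set[str],
    min_identity: float = 0.98,      # illustrative threshold, not a recommendation
) -> List[Assignment]:
    """Keep assignments that pass the identity threshold and are absent from all controls."""
    kept = []
    for assignment in assignments:
        if assignment.identity < min_identity:
            continue  # low identity score
        if any(assignment.reads_by_sample.get(c, 0) > 0 for c in control_samples):
            continue  # present in a negative control: treated as a contaminant
        kept.append(assignment)
    return kept


if __name__ == "__main__":
    data = [
        Assignment("seq1", "Betula", 1.00, {"core_10cm": 120, "NEG_1": 0}),
        Assignment("seq2", "Salix", 0.93, {"core_10cm": 40, "NEG_1": 0}),
        Assignment("seq3", "Solanum", 1.00, {"core_10cm": 15, "NEG_1": 7}),
    ]
    for assignment in filter_assignments(data, control_samples={"NEG_1"}):
        print(assignment.sequence_id, assignment.taxon)
```

Automated filters of this kind do not replace the final plausibility check: the remaining identifications still need to be compared with what is known for the sampling location and age.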


  1. Name and explain two main advantages of using sedaDNA as a proxy for past plant presence compared to pollen. Motivate your answer.
  2. Imagine you have a long lake sediment core that is thought to be between 50 000 and 10 000 years old. What dating methods could be used to date this core and why?
  3. What are the main sources of bias when working with sedaDNA (name at least 3) and how can you limit the resulting false positives?


Alkylation – Addition or substitution of an alkyl group (CnH2n+1) to an organic molecule.

Accelerator Mass-Spectrometry (AMS) dating – A dating method that determines the age of an organic material (i.e., macroscopic remains of plants or animals) by measuring their radiocarbon concentration.

Cell lysis – The process whereby the membrane(s) of a cell breaks down, thereby releasing the cell contents.

exDNA – Extracellular DNA; all DNA located outside cell membranes.

Geochemical fingerprinting – A method using chemical signals to infer the origin, the formation and/or the environment of a geological sample.

Half-life – The time necessary for half of a radioactive atom’s nucleus to decay by emission of matter and energy to form a new daughter product. The half-life is specific to a radioactive element, and can be used for dating purposes.

iDNA – Intracellular DNA; all DNA present within cell membranes.

Lake catchment – Area of land from which water and surface runoff drains into a lake.

Luminescence dating – A group of methods to determine how long ago mineral grains were last exposed to sunlight or sufficient heating by measuring the luminescence emitted by the mineral grain upon stimulation.

Metabarcoding – Method for the simultaneous identification of many taxa within the same complex DNA extract. This is achieved by high throughput sequencing (HTS) of amplicons from taxonomic marker genes (barcodes).

Next Generation Sequencing (NGS) – Massively parallel sequencing technology allowing high throughput of DNA.

Nucleases – Diverse group of enzymes able to hydrolyze the phosphodiester bonds of DNA and RNA thereby cleaving them into smaller fragments.

Optically stimulated luminescence (OSL) dating – Dating method that determines the age of a sample by measuring the luminescence it emits in response to visible or infrared light.

Palaeoecology – The study of the relationship between past organisms and their ancient environments.

Permafrost – Soil, sediment, or rock that is continuously exposed to temperatures of < 0 °C for at least two consecutive years.

Radioactive isotope – An atom with excess nuclear energy and prone to undergo radioactive decay.

Reference library – A database of known DNA sequences with their taxonomic identifications, used in bioinformatics as a reference to identify the DNA sequences obtained in a sedaDNA study.

sedaDNA – Sedimentary ancient DNA; this is the aged and degraded DNA from dead organisms now incorporated in the sediment record, either as iDNA in dead tissues, or as exDNA free in the sediment matrix or adsorbed to sediment particles.

Shotgun sequencing – A method for the random sequencing of all of the DNA within a DNA extract.

Taphonomic processes – The processes involved in the transfer, deposition and preservation of organismal remains, including DNA.

Target capture – A technique that allows the capture of the DNA of interest by hybridization to target-specific probes (baits).

Tephrochronology – A geochronological technique that uses layers of tephra (volcanic ash from a single volcanic eruption) to create a chronological framework for the sedimentary record.

Thermoluminescence (TL) dating – Dating method that determines the age of a sample by measuring the luminescence it emits in response to heat.

Total DNA – The intracellular and extracellular DNA combined.

Tree-ring dating – Also called dendrochronology; a method of dating tree rings to the exact year they were formed.


  • Ahmed E, Parducci L, Unneberg P, Ågren R, Schenk F, Rattray JE, Han L, Muschitiello F, Pedersen MW, Smittenberg RH, Yamoah KA, Slotte T, Wohlfarth B (2018) Archaeal community changes in Lateglacial lake sediments: Evidence from ancient DNA. Quat. Sci. Rev. 181, 19–29. doi: 10.1016/j.quascirev.2017.11.037
  • Alsos IG, Lammers Y, Yoccoz NG, Jørgensen T, Sjögren P, Gielly L, Edwards ME (2018) Plant DNA metabarcoding of lake sediments: How does it represent the contemporary vegetation. PLoS ONE 13, e0195403. doi: 10.1371/journal.pone.0195403
  • Alsos IG, Sjögren P, Brown AG, Gielly L, Merkel MKF, Paus A, Lammers Y, Edwards ME, Alm T, Leng M, Goslar T, Langdon CT, Bakke J, van der Bilt WGM (2020) Last Glacial Maximum environmental conditions at Andøya, northern Norway; evidence for a northern ice-edge ecological “hotspot.” Quat. Sci. Rev. 239, 106364. doi: 10.1016/j.quascirev.2020.106364
  • Alsos IG, Sjögren P, Edwards ME, Landvik JY, Gielly L, Forwick M, Coissac E, Brown AG, Jakobsen LV, Foreid MK, Pedersen MW (2016) Sedimentary ancient DNA from Lake Skartjorna, Svalbard: Assessing the resilience of arctic flora to Holocene climate change. The Holocene 26, 627–642. doi: 10.1177/0959683615612563
  • Barsanti M, Garcia-Tenorio R, Schirone A, Rozmaric M, Ruiz-Fernández AC, Sanchez-Cabeza JA, Delbono I, Conte F, De Oliveira Godoy JM, Heijnis H, Eriksson M, Hatje V, Laissaoui A, Nguyen HQ, Okuku E, Al-Rousan SA, Uddin S, Yii MW, Osvath I (2020) Challenges and limitations of the 210Pb sediment dating method: Results from an IAEA modelling interlaboratory comparison exercise. Quat. Geochronol. 59, 101093. doi: 10.1016/j.quageo.2020.101093
  • Bellemain E, Carlsen T, Brochmann C, Coissac E, Taberlet P, Kauserud H (2010) ITS as an environmental DNA barcode for fungi: an in silico approach reveals potential PCR biases. BMC Microbiol. 10, 189. doi: 10.1186/1471-2180-10-189
  • Birks HH, Bjune AE (2010) Can we detect a west Norwegian tree line from modern samples of plant remains and pollen? Results from the DOORMAT project. Veg. Hist. Archaeobot. 19, 325–340. doi: 10.1007/s00334-010-0256-0
  • Boessenkool S, McGlynn G, Epp LS, Taylor D, Pimentel M, Gizaw A, Nemomissa S, Brochmann C, Popp M (2014) Use of ancient sedimentary DNA as a novel conservation tool for high-altitude tropical biodiversity. Conserv. Biol. 28, 446–455. doi: 10.1111/cobi.12195
  • Bradley RS (1999) Paleoclimatology: reconstructing climates of the Quaternary. Elsevier.
  • Bremond L, Favier C, Ficetola GF, Tossou MG, Akouégninou A, Gielly L, Giguet-Covex C, Oslisly R, Salzmann U (2017) Five thousand years of tropical lake sediment DNA records from Benin. Quat. Sci. Rev. 170, 203–211. doi: 10.1016/j.quascirev.2017.06.025
  • Brown AG, Van Hardenbroek M, Fonville T, Davies K, Mackay H, Murray E, Head K, Barratt P, McCormick F, Ficetola GF, Gielly L, Henderson ACG, Crone A, Cavers G, Langdon PG, Whitehouse NJ, Pirrie D, Alsos IG (2021) Ancient DNA, lipid biomarkers and palaeoecological evidence reveals construction and life on early medieval lake settlements. Sci. Rep. 11, 11807. doi: 10.1038/s41598-021-91057-x
  • Capo E, Giguet-Covex C, Rouillard A, Nota K, Heintzman PD, Vuillemin A, Ariztegui D, Arnaud F, Belle S, Bertilsson S, Bigler C, Bindler R, Brown AG, Clarke CL, Crump SE, Debroas D, Englund G, Ficetola GF, Garner RE, Gauthier J, Parducci L (2021) Lake sedimentary DNA research on past terrestrial and aquatic biodiversity: overview and recommendations. Quaternary 4, 6. doi: 10.3390/quat4010006
  • Champlot S, Berthelot C, Pruvost M, Bennett EA, Grange T, Geigl E-M (2010) An efficient multistrategy DNA decontamination procedure of PCR reagents for hypersensitive PCR applications. PLoS ONE 5. doi: 10.1371/journal.pone.0013042
  • Clarke CL, Alsos IG, Edwards ME, Paus A, Gielly L, Haflidason H, Mangerud J, Regnéll C, Hughes PDM, Svendsen JI, Bjune AE (2020) A 24,000-year ancient DNA and pollen record from the Polar Urals reveals temporal dynamics of arctic and boreal plant communities. Quat. Sci. Rev. 247, 106564. doi: 10.1016/j.quascirev.2020.106564
  • Cooper A, Poinar HN (2000) Ancient DNA: do it right or not at all. Science 289, 1139. doi: 10.1126/science.289.5482.1139b
  • Dabney J, Meyer M, Pääbo S (2013) Ancient DNA damage. Cold Spring Harb. Perspect. Biol. 5. doi: 10.1101/cshperspect.a012567
  • Davies SM, Wastegård S, Rasmussen TL, Svensson A, Johnsen SJ, Steffensen JP, Andersen KK (2008) Identification of the Fugloyarbanki tephra in the NGRIP ice core: a key tie-point for marine and ice-core sequences during the last glacial period. J. Quaternary Sci. 23, 409–414. doi: 10.1002/jqs.1182
  • Dommain R, Andama M, McDonough MM, Prado NA, Goldhammer T, Potts R, Maldonado JE, Nkurunungi JB, Campana MG (2020) The Challenges of Reconstructing Tropical Biodiversity With Sedimentary Ancient DNA: A 2200-Year-Long Metagenomic Record From Bwindi Impenetrable Forest, Uganda. Front. Ecol. Evol. 8. doi: 10.3389/fevo.2020.00218
  • Epp LS, Kruse S, Kath NJ, Stoof-Leichsenring KR, Tiedemann R, Pestryakova LA, Herzschuh U (2018) Temporal and spatial patterns of mitochondrial haplotype and species distributions in Siberian larches inferred from ancient environmental DNA and modeling. Sci. Rep. 8, 17436. doi: 10.1038/s41598-018-35550-w
  • Epp LS, Zimmermann HH, Stoof-Leichsenring KR (2019) Sampling and Extraction of Ancient DNA from Sediments. Methods Mol. Biol. 1963, 31–44. doi: 10.1007/978-1-4939-9176-1_5
  • Fattahi M, Stokes S (2003) Dating volcanic and related sediments by luminescence methods: a review. Earth-Science Reviews 62, 229–264. doi: 10.1016/S0012-8252(02)00159-9
  • Ficetola GF, Poulenard J, Sabatier P, Messager E, Gielly L, Leloup A, Etienne D, Bakke J, Malet E, Fanget B, Støren E, Reyss J-L, Taberlet P, Arnaud F (2018) DNA from lake sediments reveals long-term ecosystem changes after a biological invasion. Sci. Adv. 4, eaar4292. doi: 10.1126/sciadv.aar4292
  • Freeman Dieudonné, Collins Sand (2020) Survival of environmental DNA in natural environments: Surface charge and topography of minerals as driver for DNA storage. BioRxiv. doi: 10.1101/2020.01.28.922997
  • Froese DG, Zazula GD, Reyes AV (2006) Seasonality of the late Pleistocene Dawson tephra and exceptional preservation of a buried riparian surface in central Yukon Territory, Canada. Quat. Sci. Rev. 25, 1542–1551. doi: 10.1016/j.quascirev.2006.01.028
  • Fulton TL, Shapiro B (2019) Setting up an ancient DNA laboratory. Methods Mol. Biol. 1963, 1–13. doi: 10.1007/978-1-4939-9176-1_1
  • Giguet-Covex C, Ficetola GF, Walsh K, Poulenard J, Bajard M, Fouinat L, Sabatier P, Gielly L, Messager E, Develle AL, David F, Taberlet P, Brisset E, Guiter F, Sinet R, Arnaud F (2019) New insights on lake sediment DNA from the catchment: importance of taphonomic and analytical issues on the record quality. Sci. Rep. 9, 14676. doi: 10.1038/s41598-019-50339-1
  • Giguet-Covex C, Pansu J, Arnaud F, Rey P-J, Griggo C, Gielly L, Domaizon I, Coissac E, David F, Choler P, Poulenard J, Taberlet P (2014) Long livestock farming history and human landscape shaping revealed by lake sediment DNA. Nat. Commun. 5, 3211. doi: 10.1038/ncomms4211
  • Graham RW, Belmecheri S, Choy K, Culleton BJ, Davies LJ, Froese D, Heintzman PD, Hritz C, Kapp JD, Newsom LA, Rawcliffe R, Saulnier-Talbot É, Shapiro B, Wang Y, Williams JW, Wooller MJ (2016) Timing and causes of mid-Holocene mammoth extinction on St. Paul Island, Alaska. Proc Natl Acad Sci USA 113, 9310–9314. doi: 10.1073/pnas.1604903113
  • Haile J, Froese DG, Macphee RDE, Roberts RG, Arnold LJ, Reyes AV, Rasmussen M, Nielsen R, Brook BW, Robinson S, Demuro M, Gilbert MTP, Munch K, Austin JJ, Cooper A, Barnes I, Möller P, Willerslev E (2009) Ancient DNA reveals late survival of mammoth and horse in interior Alaska. Proc Natl Acad Sci USA 106, 22352–22357. doi: 10.1073/pnas.0912510106
  • Hebsgaard MB, Gilbert MTP, Arneborg J, Heyn P, Allentoft ME, Bunce M, Munch K, Schweger C, Willerslev E (2009) ‘The Farm Beneath the Sand’ – an archaeological case study on ancient ‘dirt’ DNA. Antiquity 83, 430–444. doi: 10.1017/S0003598X00098537
  • Hofreiter M, Mead JI, Martin P, Poinar HN (2003) Molecular caving. Curr. Biol. 13, R693-5.
  • Jørgensen T, Haile J, Möller P, Andreev A, Boessenkool S, Rasmussen M, Kienast F, Coissac E, Taberlet P, Brochmann C, Bigelow NH, Andersen K, Orlando L, Gilbert MTP, Willerslev E (2012) A comparative study of ancient sedimentary DNA, pollen and macrofossils from permafrost sediments of northern Siberia reveals long-term vegetational stability. Mol. Ecol. 21, 1989–2003. doi: 10.1111/j.1365-294x.2011.05287.x
  • Kromer B (2009) Radiocarbon and dendrochronology. Dendrochronologia 27, 15–19. doi: 10.1016/j.dendro.2009.03.001
  • Larsen G, Eiríksson J, Knudsen KL, Heinemeier J (2002) Correlation of late Holocene terrestrial and marine tephra markers, north Iceland: implications for reservoir age changes. Polar Res. 21, 283–290. doi: 10.1111/j.1751-8369.2002.tb00082.x
  • Lejzerowicz F, Esling P, Majewski W, Szczuciński W, Decelle J, Obadia C, Arbizu PM, Pawlowski J (2013) Ancient DNA complements microfossil record in deep-sea subsurface sediments. Biol. Lett. 9, 20130283. doi: 10.1098/rsbl.2013.0283
  • Masselink G, Hughes M, Knight J (2014) Introduction to Coastal Processes and Geomorphology.
  • Monchamp M-E, Walser J-C, Pomati F, Spaak P (2016) Sedimentary DNA Reveals Cyanobacterial Community Diversity over 200 Years in Two Perialpine Lakes. Appl. Environ. Microbiol. 82, 6472–6482. doi: 10.1128/AEM.02174-16
  • Murchie TJ, Kuch M, Duggan AT, Ledger ML, Roche K, Klunk J, Karpinski E, Hackenberger D, Sadoway T, MacPhee R, Froese D, Poinar H (2020) Optimizing extraction and targeted capture of ancient environmental DNA for reconstructing past environments using the PalaeoChip Arctic-1.0 bait-set. Quaternary Research 1–24. doi: 10.1017/qua.2020.59
  • Niemeyer B, Epp LS, Stoof-Leichsenring KR, Pestryakova LA, Herzschuh U (2017) A comparison of sedimentary DNA and pollen from lake sediments in recording vegetation composition at the Siberian treeline. Mol. Ecol. Resour. 17, e46–e62. doi: 10.1111/1755-0998.12689
  • Overballe-Petersen S, Willerslev E (2014) Horizontal transfer of short and degraded DNA has evolutionary implications for microbes and eukaryotic sexual reproduction. Bioessays 36, 1005–1010. doi: 10.1002/bies.201400035
  • Pansu J, Giguet-Covex C, Ficetola GF, Gielly L, Boyer F, Zinger L, Arnaud F, Poulenard J, Taberlet P, Choler P (2015) Reconstructing long-term human impacts on plant communities: an ecological approach based on lake sediment DNA. Mol. Ecol. 24, 1485–1498. doi: 10.1111/mec.13136
  • Parducci L, Alsos IG, Unneberg P, Pedersen MW, Han L, Lammers Y, Salonen JS, Väliranta MM, Slotte T, Wohlfarth B (2019) Shotgun environmental DNA, pollen, and macrofossil analysis of lateglacial lake sediments from southern sweden. Front. Ecol. Evol. 7. doi: 10.3389/fevo.2019.00189
  • Parducci L, Bennett KD, Ficetola GF, Alsos IG, Suyama Y, Wood JR, Pedersen MW (2017) Ancient plant DNA in lake sediments. New Phytol. 214, 924–942. doi: 10.1111/nph.14470
  • Parducci L, Nota K, Wood J (2018) Reconstructing Past Vegetation Communities Using Ancient DNA from Lake Sediments, in: Lindqvist, C., Rajora, O.P. (Eds.), Paleogenomics: Genome-Scale Analysis of Ancient DNA, Population Genomics. Springer International Publishing, Cham, pp. 163–187. doi: 10.1007/13836_2018_38
  • Pedersen MW, Ginolhac A, Orlando L, Olsen J, Andersen K, Holm J, Funder S, Willerslev E, Kjær KH (2013) A comparative study of ancient environmental DNA to pollen and macrofossils from lake sediments reveals taxonomic overlap and additional plant taxa. Quat. Sci. Rev. 75, 161–168. doi: 10.1016/j.quascirev.2013.06.006
  • Pedersen MW, Overballe-Petersen S, Ermini L, Sarkissian CD, Haile J, Hellstrom M, Spens J, Thomsen PF, Bohmann K, Cappellini E, Schnell IB, Wales NA, Carøe C, Campos PF, Schmidt AM, Gilbert MT, Hansen AJ, Orlando L, Willerslev E (2015) Ancient and modern environmental DNA. Philos. Trans. R. Soc. Lond., B, Biol. Sci. 370, 20130383. doi: 10.1098/rstb.2013.0383
  • Pedersen MW, Ruter A, Schweger C, Friebe H, Staff RA, Kjeldsen KK, Mendoza MLZ, Beaudoin AB, Zutter C, Larsen NK, Potter BA, Nielsen R, Rainville RA, Orlando L, Meltzer DJ, Kjær KH, Willerslev E (2016) Postglacial viability and colonization in North America’s ice-free corridor. Nature 537, 45–49. doi: 10.1038/nature19085
  • Poinar HN, Hofreiter M, Spaulding WG, Martin PS, Stankiewicz BA, Bland H, Evershed RP, Possnert G, Pääbo S (1998) Molecular coproscopy: dung and diet of the extinct ground sloth Nothrotheriops shastensis. Science 281, 402–406. doi: 10.1126/science.281.5375.402
  • Reimer PJ, Austin WEN, Bard E, Bayliss A, Blackwell PG, Bronk Ramsey C, Butzin M, Cheng H, Edwards RL, Friedrich M, Grootes PM, Guilderson TP, Hajdas I, Heaton TJ, Hogg AG, Hughen KA, Kromer B, Manning SW, Muscheler R, Palmer JG, Talamo S (2020) The IntCal20 Northern Hemisphere radiocarbon age calibration curve (0–55 cal kBP). Radiocarbon 1–33. doi: 10.1017/RDC.2020.41
  • Rohland N, Glocke I, Aximu-Petri A, Meyer M (2018) Extraction of highly degraded DNA from ancient bones, teeth and sediments for high-throughput sequencing. Nat. Protoc. 13, 2447–2461. doi: 10.1038/s41596-018-0050-5
  • Schulte L, Bernhardt N, Stoof-Leichsenring K, Zimmermann HH, Pestryakova LA, Epp LS, Herzschuh U (2020) Hybridization capture of larch (Larix Mill) chloroplast genomes from sedimentary ancient DNA reveals past changes of Siberian forest. Mol. Ecol. Resour. doi: 10.1111/1755-0998.13311
  • Seeber PA, McEwen GK, Löber U, Förster DW, East ML, Melzheimer J, Greenwood AD (2019) Terrestrial mammal surveillance using hybridization capture of environmental DNA from African waterholes. Mol. Ecol. Resour. 19, 1486–1496. doi: 10.1111/1755-0998.13069
  • Sjögren P, Edwards ME, Gielly L, Langdon CT, Croudace IW, Merkel MKF, Fonville T, Alsos IG (2017) Lake sedimentary DNA accurately records 20th Century introductions of exotic conifers in Scotland. New Phytol. 213, 929–941. doi: 10.1111/nph.14199
  • Slon V, Hopfe C, Weiß CL, Mafessoni F, de la Rasilla M, Lalueza-Fox C, Rosas A, Soressi M, Knul MV, Miller R, Stewart JR, Derevianko AP, Jacobs Z, Li B, Roberts RG, Shunkov MV, de Lumley H, Perrenoud C, Gušić I, Kućan Ž, Meyer M (2017) Neandertal and Denisovan DNA from Pleistocene sediments. Science 356, 605–608. doi: 10.1126/science.aam9695
  • Smith O, Momber G, Bates R, Garwood P, Fitch S, Pallen M, Gaffney V, Allaby RG (2015) Archaeology. Sedimentary DNA from a submerged site reveals wheat in the British Isles 8000 years ago. Science 347, 998–1001. doi: 10.1126/science.1261278
  • Taberlet P, Bonin A, Zinger L, Coissac E (Eds) (2018) Environmental DNA: For Biodiversity Research and Monitoring.
  • Taberlet P, Prud’Homme SM, Campione E, Roy J, Miquel C, Shehzad W, Gielly L, Rioux D, Choler P, Clément J-C, Melodelima C, Pompanon F, Coissac E (2012) Soil sampling and isolation of extracellular DNA from large amount of starting material suitable for metabarcoding studies. Mol. Ecol. 21, 1816–1820. doi: 10.1111/j.1365-294X.2011.05317.x
  • Torti A, Lever MA, Jørgensen BB (2015) Origin, dynamics, and implications of extracellular DNA pools in marine sediments. Mar. Genomics 24 Pt 3, 185–196. doi: 10.1016/j.margen.2015.08.007
  • Vasan S, Zhang X, Zhang X, Kapurniotu A, Bernhagen J, Teichberg S, Basgen J, Wagle D, Shih D, Terlecky I, Bucala R, Cerami A, Egan J, Ulrich P (1996) An agent cleaving glucose-derived protein crosslinks in vitro and in vivo. Nature 382, 275–278. doi: 10.1038/382275a0
  • Vernot B, Zavala EI, Gómez-Olivencia A, Jacobs Z, Slon V, Mafessoni F, Romagné F, Pearson A, Petr M, Sala N, Pablos A, Aranburu A, de Castro JMB, Carbonell E, Li B, Krajcarz MT, Krivoshapkin AI, Kolobova KA, Kozlikin MB, Shunkov MV, Meyer M (2021) Unearthing Neanderthal population history using nuclear and mitochondrial DNA from cave sediments. Science 372. doi: 10.1126/science.abf1667
  • Wen F, Curlango-Rivera G, Huskey DA, Xiong Z, Hawes MC (2017) Visualization of extracellular DNA released during border cell separation from the root cap. Am. J. Bot. 104, 970–978. doi: 10.3732/ajb.1700142
  • Willerslev E, Cappellini E, Boomsma W, Nielsen R, Hebsgaard MB, Brand TB, Hofreiter M, Bunce M, Poinar HN, Dahl-Jensen D, Johnsen S, Steffensen JP, Bennike O, Schwenninger J-L, Nathan R, Armitage S, de Hoog C-J, Alfimov V, Christl M, Beer J, Collins MJ (2007) Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science 317, 111–114. doi: 10.1126/science.1141758
  • Willerslev E, Cooper A (2005) Ancient DNA. Proc. Biol. Sci. 272, 3–16. doi: 10.1098/rspb.2004.2813
  • Willerslev E, Davison J, Moora M, Zobel M, Coissac E, Edwards ME, Lorenzen ED, Vestergård M, Gussarova G, Haile J, Craine J, Gielly L, Boessenkool S, Epp LS, Pearman PB, Cheddadi R, Murray D, Bråthen KA, Yoccoz N, Binney H, Taberlet P (2014) Fifty thousand years of Arctic vegetation and megafaunal diet. Nature 506, 47–51. doi: 10.1038/nature12921
  • Willerslev E, Hansen AJ, Binladen J, Brand TB, Gilbert MTP, Shapiro B, Bunce M, Wiuf C, Gilichinsky DA, Cooper A (2003) Diverse plant and animal genetic records from Holocene and Pleistocene sediments. Science 300, 791–795. doi: 10.1126/science.1084114
  • Willerslev E, Hansen AJ, Rønn R, Brand TB, Barnes I, Wiuf C, Gilichinsky D, Mitchell D, Cooper A (2004) Long-term persistence of bacterial DNA. Curr. Biol. 14, R9–R10. doi: 10.1016/j.cub.2003.12.012
  • Wilmshurst JM, Moar NT, Wood JR, Bellingham PJ, Findlater AM, Robinson JJ, Stone C (2014) Use of pollen and ancient DNA as conservation baselines for offshore islands in New Zealand. Conserv. Biol. 28, 202–212. doi: 10.1111/cobi.12150
  • Zavala EI, Jacobs Z, Vernot B, Shunkov MV, Kozlikin MB, Derevianko AP, Essel E, de Fillipo C, Nagel S, Richter J, Romagné F, Schmidt A, Li B, O’Gorman K, Slon V, Kelso J, Pääbo S, Roberts RG, Meyer M (2021) Pleistocene sediment DNA reveals hominin and faunal turnovers at Denisova Cave. Nature. doi: 10.1038/s41586-021-03675-0
  • Zimmermann HH, Raschke E, Epp LS, Stoof-Leichsenring KR, Schirrmeister L, Schwamborn G, Herzschuh U (2017) The History of Tree and Shrub Taxa on Bol’shoy Lyakhovsky Island (New Siberian Archipelago) since the Last Interglacial Uncovered by Sedimentary Ancient DNA and Pollen Data. Genes (Basel) 8. doi: 10.3390/genes8100273
  • Zimmermann HH, Stoof-Leichsenring KR, Kruse S, Müller J, Stein R, Tiedemann R, Herzschuh U (2020) Changes in the composition of marine and sea-ice diatoms derived from sedimentary ancient DNA of the eastern Fram Strait over the past 30 000 years. Ocean Sci. 16, 1017–1032. doi: 10.5194/os-16-1017-2020
  • Zinger L, Chave J, Coissac E, Iribar A, Louisanna E, Manzi S, Schilling V, Schimann H, Sommeria-Klein G, Taberlet P (2016) Extracellular DNA extraction is a fast, cheap and reliable alternative for multi-taxa surveys based on soil DNA. Soil Biology and Biochemistry 96, 16–19. doi: 10.1016/j.soilbio.2016.01.008


  1. Possible advantages of sedaDNA compared to pollen as a proxy for past plant presence are: the possibility of detecting past plant presence even in the absence of visible remains; less labour-intensive as taxonomic identification is automated; in principle, no prior taxonomic knowledge is needed for the data generation with sedaDNA (although it is highly called for in the interpretation of the data); and it is possible to obtain a higher taxonomic resolution depending on the choice of marker.
  2. For mineral-rich sediments, luminescence dating can be used as this method can be applied to sediments from a few decades old to over a million years old, and is based on the phenomenon that mineral crystals absorb electrons from ionising radiation of surrounding sediments over time. For sediment rich in organic materials, AMS radiocarbon dating of identified macroscopic remains (with calibration) is a good option. Radiocarbon dating is based on the concentration of C14 in organismic remains. The half-life of C14 (5730 years) makes it an appropriate method for samples under 50,000 years old. To increase confidence in the dating results, multiple dating techniques could be used for creating an age model for the core.
  3. Biases when working with sedaDNA can come from: taphonomic processes including differential DNA degradation and preservation, choice of metabarcoding primers, completeness of reference library, and contamination during sampling, DNA extraction and other lab processes. False positives can be limited by inclusion of multiple replicates and controls and prevention of contamination at every step of the experimental design, preparation of an appropriate reference database, and checking if the identifications fit with what is known for the age and location of the sample by a taxonomic expert.
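The radiocarbon reasoning in answer 2 can be made concrete with a short calculation. The sketch below uses only the exponential decay law and the half-life of 5730 years quoted above; the function names are hypothetical, and the ages it returns are uncalibrated, so they are no substitute for a calibration curve.

```python
import math

HALF_LIFE_C14_YEARS = 5730  # half-life quoted in answer 2


def fraction_remaining(age_years: float, half_life: float = HALF_LIFE_C14_YEARS) -> float:
    """Fraction of the original radiocarbon left after age_years."""
    return 0.5 ** (age_years / half_life)


def uncalibrated_age(fraction: float, half_life: float = HALF_LIFE_C14_YEARS) -> float:
    """Invert the decay law: age implied by a measured fraction of radiocarbon."""
    return -half_life * math.log2(fraction)


if __name__ == "__main__":
    for age in (5730, 10_000, 50_000):
        print(f"{age:>6} years: {fraction_remaining(age):.4%} of the original radiocarbon remains")
```

After 50 000 years less than 0.3% of the original radiocarbon is left, which illustrates why the method becomes unreliable for older material and why the oldest part of the core in question 2 may need a second dating method.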

Section 2: Methods

Chapter 9 Sequencing platforms


The revolution in genome-wide screening has vastly reduced the price for sequencing, with enormous implications in the biomedical field, industry, biodiversity monitoring, as well as in plant identification. The first plant genome (Arabidopsis thaliana L.) was sequenced using Sanger sequencing. This took 10 years to complete with an associated cost of approximately $100,000,000 (Arabidopsis Genome Initiative 2000). With current high-throughput sequencing (HTS) methods, this same genome now takes 1 week to sequence and assemble, and costs $1000 (Michael et al. 2018). Plant genomes and DNA sequences are however under-represented in the literature in comparison to other organisms such as microorganisms and animals, and most reported plant genomes belong to angiosperms with relatively small genomes. This is due to a number of confounding factors that make the sequencing of plant genomes particularly difficult including the extraction of sufficient quantities of high-quality DNA that is not irreparably damaged (Inglis et al. 2018) (see Chapter 1 DNA from plant tissue), the size and complexity of the genome (i.e., gene islands, high GC content, transposable elements), heterozygosity, and polyploidy (Chen et al. 2018) (see Chapter 16 Whole genome sequencing). Advances in high molecular weight DNA extraction, high throughput sequencing technologies, and bioinformatics approaches to deal with heterozygosity and assembly in recent years have alleviated these challenges tremendously, with enormous strides in the generation of high quality genomic datasets to test research hypotheses. In this chapter, we review the different sequencing technologies most commonly utilised today, beginning with Sanger sequencing, the current method of choice for small-scale projects. We then discuss HTS approaches, including next-generation sequencing and third-generation PCR-free sequencing methods (van Dijk et al. 2018). 
The underlying principles of these technologies are discussed and linked to their advantages and disadvantages in considering which method is best suited for different sequencing projects.

Sequencing platforms

Sanger sequencing

Sanger sequencing was introduced in 1977 by Sanger and colleagues, and for over 40 years, it was the most commonly used form of sequencing (Heather and Chain 2016). Sanger sequencing is a polymerase chain reaction (PCR)-based technique that uses chain-terminating, fluorescently labelled dideoxyribonucleotides (ddNTPs) to determine the sequence of PCR-amplified target DNA. The target DNA is mixed with both standard deoxyribonucleotide triphosphates (dNTPs) and a much lower concentration (around 1%) of four differently labelled fluorescent ddNTPs (ddATP, ddTTP, ddCTP, and ddGTP) that correspond to the four different dNTPs. The ddNTPs lack the chemical 3’-OH group that is required for phosphodiester bond formation. Thus, in the PCR reaction, when a fluorescently labelled ddNTP is added, the polymerase can no longer add another dNTP, and extension ceases. This results in chain termination, with oligonucleotide copies of the target DNA terminated at random lengths (up to 1000 bp) by the fluorescently labelled ddNTPs (Hagemann 2015).
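The chain-termination principle can be illustrated with a toy simulation in Python. This is a deliberately simplified sketch, not a model of the real chemistry: it assumes that every extension step has the same probability of incorporating a ddNTP (set to the roughly 1% quoted above), and the function names and the example sequence are hypothetical.

```python
import random
from typing import List, Tuple


def simulate_chain_termination(
    new_strand: str,               # sequence of the strand being synthesised, 5' to 3'
    ddntp_fraction: float = 0.01,  # assumed per-base chance of incorporating a ddNTP
    n_strands: int = 10_000,
    seed: int = 1,
) -> List[Tuple[int, str]]:
    """Return (fragment length, labelled terminal base) for every terminated copy."""
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_strands):
        for length, base in enumerate(new_strand, start=1):
            if rng.random() < ddntp_fraction:
                fragments.append((length, base))  # extension stops at this base
                break
    return fragments


def read_by_size(fragments: List[Tuple[int, str]]) -> str:
    """Order fragments from shortest to longest and read their terminal labels."""
    terminal_base = {length: base for length, base in fragments}
    return "".join(terminal_base[length] for length in sorted(terminal_base))


if __name__ == "__main__":
    fragments = simulate_chain_termination("ATGCGTACGTTAGC")
    print(len(fragments), "terminated fragments")
    print(read_by_size(fragments))
```

Because termination is rare at each step, most copies extend past any given position, yet with many copies every fragment length is represented, which is what allows the sequence to be read from the size-ordered terminal labels.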

In the second step of Sanger sequencing, the oligonucleotides are separated by size using capillary gel electrophoresis. A laser excites the terminal fluorescent nucleotide in each oligonucleotide, resulting in fluorescence emission that is detected and read by a computer. By reading the gel bands from smallest to largest, the 5’ to 3’ sequence of the target DNA can be determined at single base pair resolution. The data output for Sanger sequencing is a chromatogram, which is automatically read by a computer to generate the DNA sequence. Primer sequences should be trimmed off the reads as these are not part of the target DNA, and the quality of the chromatogram should be assessed to determine the reliability of the generated DNA sequence. We refer the reader to the online tutorials available from both industrial and academic sources for guidance on assessing chromatogram quality (University of Michigan, Biomedical Research Core Facilities, n.d.). Base calling accuracy can also be measured using Phred quality scores (Ewing and Green 1998; Ewing et al. 1998). A Phred quality score indicates the quality of a nucleotide (base) assignment that is generated during DNA sequencing. A Phred score of 20 indicates 99% accuracy in the assignment, which is generally considered acceptable.
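The relationship between a Phred score and base-calling accuracy follows from the standard definition Q = -10 log10(P), where P is the probability that the base call is wrong. The short Python sketch below converts between the two; the function names are hypothetical.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability p."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        accuracy = 1 - phred_to_error_probability(q)
        print(f"Q{q}: accuracy {accuracy:.2%}")
```

Each increase of 10 in the Phred score corresponds to a tenfold lower error probability, so Q20 is one expected error in 100 base calls and Q30 is one in 1000.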

Sanger sequencing is not used today for large-scale genomic projects due to its low throughput. The need for specific primers for a region of interest limits its application across divergent plant taxa. Additionally, the amplification of multicopy genes, such as the commonly used DNA barcode ITS (see Chapter 10 DNA barcoding), as well as markers in taxa of allopolyploid hybrid origin, results in difficult-to-interpret chromatograms. This is because nucleotide polymorphisms between different copies result in double peaks in the resulting chromatogram (Hughes et al. 2013). Nevertheless, it is still a widely used technique for smaller scale projects on DNA barcoding and phylogenetics, especially incremental studies where new sequences are added to existing phylogenetic frameworks. Additionally, due to its high accuracy and relatively long reads, Sanger sequencing is also sometimes used in conjunction with HTS techniques that may have shorter reads and/or higher error rates to aid in the proper assembly of contigs and check the accuracy of the final sequence (Slatko et al. 2018).

Illumina sequencing

Illumina was the second HTS technique that became commercially available in the early 2000s (Heather and Chain 2016; McGinn and Gut 2013). It was preceded by Roche 454 pyrosequencing, a sequencing-by-synthesis system that has since been discontinued (Edwards et al. 2006; Thomas et al. 2012). Like Sanger sequencing, Illumina sequencing uses fluorescently labelled nucleotides, though in Illumina sequencing they do not permanently block further synthesis of a growing nucleotide strand. Additionally, Illumina sequencing is done in an enormously parallel fashion. This results in dramatic time and cost reductions compared to Sanger sequencing for large-scale genomic projects. Today, Illumina, along with PacBio and Nanopore technologies, is one of the most widely used technologies for large-scale genomic projects (van Dijk et al. 2018).

In Illumina sequencing, as in other high-throughput sequencing approaches, the target DNA is initially broken into shorter fragments that match the optimal fragment sequencing length of the platform, if not already present as shorter segments. These fragments are then PCR-amplified with adaptors that can be individually chemically tethered to the flow cell surface. Using bridge amplification (Clark et al. 2018), these fragments are amplified to form millions of dense clusters of DNA strands in the flow cell, as the platform cannot read single DNA strands but needs thousands of identical strands for accurate base calling. After this initial amplification step, fluorescently-labelled dNTPs are added to the flow cells. Depending on the sequencing-by-synthesis chemistry of the platform, four, two, or one dye(s) are used. The newer NovaSeq, NextSeq 550, and MiniSeq platforms use the faster two-dye, two-channel technology.

Dyed dNTPs are added in a controlled fashion through the use of reversible blocking group chemistry, so that the emission of each added fluorescent dNTP is read before the addition of the next fluorescently-labelled dNTP. This process is done on millions of fragments simultaneously, making it a far more efficient method than Sanger sequencing for large-scale genomic projects (Slatko et al. 2018). In addition to large-scale genomic projects, Illumina is also an important technology for gene-targeted applications, including barcoding (see Chapter 10 DNA barcoding), metabarcoding (see Chapter 11 Amplicon metabarcoding), and target capture (see Chapter 14 Target capture). This is because these methods include library preparations where using HTS methods such as Illumina sequencing offers major time advantages in comparison to Sanger sequencing (Head et al. 2014).

Two limitations to consider with Illumina sequencing, however, are that the produced reads are relatively short (50 to 300 bp) and that, similarly to Sanger sequencing, most applications require a PCR amplification step. However, PCR-free library kits and protocols provide increasingly good results, and have the important advantage of reducing typical PCR-induced biases. Assembling whole genomes using short-read Illumina methods, especially if they are highly repetitive, can be challenging (Kyriakidou et al. 2018). In addition, the requirement for a PCR amplification step introduces the possibility of bias in mixed samples (i.e., DNA from different sources may be amplified to different degrees) (Aird et al. 2011). Nevertheless, its high throughput and accurate reads make Illumina the standard choice for sequencing amplicon libraries and genome resequencing. It is also the approach of choice for degraded herbarium material, where long-read platforms would bring no benefits. In addition, Illumina’s MiSeq and MiniSeq instruments offer desktop solutions that produce long reads in comparison to other Illumina platforms (up to 300 bp) with integrated software for data analysis (Twyford 2016). Thus, for projects more focused on targeted gene sequencing (for example amplicon barcoding, metabarcoding, and target enrichment) this platform offers an in-house integrated solution with rapid turnaround times (Ravi et al. 2018). Scaling the sequencing needs of your project to the sequencing platform is essential to avoid obtaining either too little or too much data. Many metabarcoding projects require relatively few reads to obtain all available OTUs with high numbers of reads per cluster, so costs could be optimised by combining libraries with projects that have higher output demands. Multiplexing different samples can be an efficient, and in most cases essential, way to optimise the output from a sequencing run.
Samples for metabarcoding require relatively few reads, whereas target capture, genome skimming or deep sequencing require more sequencing depth. Care needs to be taken when multiplexing samples of different fragment lengths, for example metabarcoding amplicons of 150 and 300 bp, as the shorter fragments will be preferentially sequenced and thus dominate the results. The same applies to normalisation of individual samples in a mixed library, where samples with expected low concentrations or highly degraded DNA templates will yield fewer reads when mixed with high-concentration DNA. As a rule, combining old and new DNA in the same library is avoided, as is combining long and short amplicons. Separating these samples will necessitate extra sequencing runs, but yields the best results. Finally, it is worth mentioning that some of the super high-throughput platforms like the NovaSeq S4 yield more data than necessary for many applications, or at least are hard to fill with sufficient multiplexed samples to make it worth the compounded risk of combining a large number of samples.
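
Matching a project to a platform is largely a matter of arithmetic. The following is a minimal sketch of how one might estimate the number of samples that can share a run. The per-sample read target and the `usable_fraction` allowance for reads lost to quality filtering or unassigned indices are hypothetical values chosen for illustration, and should be replaced with figures from your own pilot data.

```python
def samples_per_run(reads_per_run, reads_per_sample, usable_fraction=0.8):
    """Estimate how many samples can be multiplexed on one sequencing run.

    reads_per_run: nominal number of reads the run delivers
    reads_per_sample: reads wanted for each sample (project specific)
    usable_fraction: assumed share of reads that survive filtering and
        demultiplexing; 0.8 is an illustrative guess, not a platform figure
    """
    if reads_per_sample <= 0:
        raise ValueError("reads_per_sample must be positive")
    usable_reads = reads_per_run * usable_fraction
    return int(usable_reads // reads_per_sample)


if __name__ == "__main__":
    # Hypothetical metabarcoding project wanting 100,000 reads per sample
    # on a run that nominally delivers 25 million reads.
    print(samples_per_run(25_000_000, 100_000))
    # The same run for a hypothetical project wanting 5 million reads per sample.
    print(samples_per_run(25_000_000, 5_000_000))
```

A result far above the number of available index combinations, or far above the number of samples in the project, suggests that a smaller platform or a shared run would be more economical.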

Table 1.

Current examples of Illumina sequencing platforms, specifications, and suitability for different applications in plant identification.

| Illumina sequencing platform | MiSeq | HiSeq 2500* | HiSeq 3000* | HiSeq 4000* | NextSeq 1000 and 2000 | NovaSeq 6000 |
| --- | --- | --- | --- | --- | --- | --- |
| Maximum read length (paired-end, bp) | 2 x 300 | 2 x 250 | 2 x 150 | 2 x 150 | 2 x 150 | 2 x 250 |
| Maximum reads per run (single reads) | 25 million | 600 million | 2.5 billion | 5 billion | 1.1 billion | 20 billion |
| Flow cell output | 15 Gb | 300 Gb | 750 Gb | 1.5 Tb | 330 Gb | 6 Tb |
| Method suitability | | | | | | |
| Metabarcoding | +++ | +++ | + | + | + | ++ |
| Target capture | + | + | + | +++ | + | +++ |
| Shotgun sequencing | + | ++ | +++ | +++ | ++ | +++ |
| Genome skimming | + | ++ | +++ | +++ | ++ | +++ |
| Organellar sequencing (plastids) | + | ++ | +++ | +++ | ++ | ++ |
| Transcriptomics: gene targeted | +++ | +++ | + | + | + | ++ |
| Transcriptomics: total RNA/mRNA seq | + | + | ++ | ++ | ++ | +++ |

Pacific Biosciences

Pacific Biosciences (PacBio) sequencing is based on single molecule real time (SMRT) technologies for reading DNA and RNA sequences. No PCR amplification is required, which can be advantageous for certain applications, for example when PCR inhibitors may be present, when the sequence is GC-rich, or when PCR bias should be avoided. Additionally, PacBio reads are considerably longer than in either Sanger or Illumina sequencing (up to 25 kb) (Pacific Biosciences, n.d.). This reduces computational challenges related to assembling contigs into full sequences. PacBio is considered a third generation sequencing technology, as it reads the nucleotide sequence both in real time and at the single molecule level (Amarasinghe et al. 2020).

Similarly to Illumina and Sanger sequencing, PacBio also uses fluorescently-labelled dNTPs for determining a target DNA sequence. PacBio, however, employs a technology called zero mode waveguides (ZMW) to read nucleotide sequences at the single molecule level. ZMWs are nanosized wells that can be etched into different materials, with zeptoliter (10^-21 L) volumes. ZMW technology differentiates a fluorescent molecule that is floating in solution from a fluorescently-labelled nucleotide that is located at the bottom of the well. A single DNA polymerase is tethered to the bottom of each well, and when a fluorescently-labelled dNTP is incorporated into the growing DNA strand, the fluorescent label is cleaved off. There is a unique fluorescent marker for each of the 4 nucleotides, and each cleavage event is read and directly linked to a specific nucleotide (van Dijk et al. 2018). Additionally, the rate of addition can be used to infer whether the target DNA is chemically modified (e.g., methylated), since a modified DNA strand moves more slowly through the DNA polymerase, resulting in a reduced incorporation rate for a fluorescent nucleotide. This information is extremely powerful for predicting epigenetic modifications that are critical for a variety of biological functions. In addition, chemical modifications that are often present in aDNA can also be detected, making PacBio a particularly useful technique for assessing aDNA damage (Flusberg et al. 2010) (see Chapter 8 aDNA from sediments).

While previously PacBio suffered from a high error rate in comparison to Illumina sequencing, this has been dramatically reduced by the introduction of circular consensus sequencing (CCS), which produces long high-fidelity (HiFi) reads (Eid et al. 2009). In circular consensus sequencing, the ends of a DNA strand are ligated together to circularise it. This DNA template strand is called a SMRTbell. This circularisation allows a DNA polymerase to read the same sequence multiple times (so long as the strand is not too long), dramatically reducing the error rate, and can provide read lengths up to 25 kb. In one recent study, large gene fragments (circa 40 kb) were read with up to 99.91% accuracy when CCS was combined with a carefully optimised protocol for the handling of DNA to reduce any fragmentation/nicks (Wenger et al. 2019). In addition to CCS, continuous long read (CLR) techniques are especially useful for gene assemblies (Vollger et al. 2020). CLR lengths are approximately equivalent to the polymerase read length. The sequence is generated from a single continuous template from start to finish, thus emphasising the longest read possible (up to 175 kb for CLR vs. 25 kb for CCS), though the overall CLR accuracy is lower than with CCS reads (90% vs. 99% read accuracy).
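
The reason repeated passes over the same molecule reduce the error rate can be illustrated with a simple majority vote. The sketch below is a deliberate simplification and not the algorithm used by PacBio software: it assumes the passes have already been aligned to the same length and contain only substitution errors, whereas real consensus callers must also handle insertions and deletions and weigh base qualities. The function name and the example passes are hypothetical.

```python
from collections import Counter


def consensus_from_passes(passes):
    """Majority-vote consensus over pre-aligned, equal-length passes.

    passes: list of strings, each one read of the same circularised molecule.
    Ties are resolved in favour of the base seen first in that column.
    """
    if not passes:
        raise ValueError("at least one pass is required")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to the same length")

    consensus = []
    for column in zip(*passes):
        # most_common(1) returns the most frequent base in this column
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


if __name__ == "__main__":
    # Five hypothetical passes; three of them carry one random error each.
    passes = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC", "AGGTAC"]
    print(consensus_from_passes(passes))
```

Since random errors rarely fall on the same position in different passes, each erroneous call is outvoted by the correct calls from the other passes.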

Oxford Nanopore

Oxford Nanopore (or simply Nanopore) sequencing is also a third generation technology that is single-molecule based and measured in real time. Nanopore differs from the other sequencing technologies discussed here in that no DNA polymerase is required, and no expensive chemically modified dNTPs are necessary for reading the target sequence. The system consists of an electrolytic solution and a nanosized, biologically-derived pore in an insulating solid (a material that does not conduct electricity). The biological nanopores used in this technology are derived from proteins that form pores in biological membranes and that naturally allow the passage of ions and biomolecules across the membrane. When an electric field is applied, ions in the electrolytic solution pass through the pore, resulting in a stable current that can be detected. When larger molecules, such as a DNA strand, pass through the pore, detectable disruptions in the current occur. With a DNA strand, sequences of 6–7 nucleotides move through the pore and the movement of these bases yields a changing detectable disruption. This disruption has a unique signature with a specific current change for a specific length of time that can be linked to each of the four individual nucleotides. From the current disruption pattern it is possible to deduce the sequence. As well, since it is the change in current through the pore that is detected, no other chemical markers are necessary (Jain et al. 2016; Kono and Arakawa 2019). This is an important advantage over the other technologies discussed, where the fluorescently-labelled nucleotides are expensive.

Nanopore technologies, with a read length up to 4 Mb, are rapidly becoming important due to their scalability and portability. The MinION sequencing platform (theoretical output up to 50 Gb/flow cell) is a portable and cost-effective option (87 g, available from $1000) that can be used in the field. Already, a number of excellent examples of biodiversity studies (and plant-based studies in particular) are available in the literature (Bethune et al. 2019; Maestri et al. 2019; Srivathsan et al. 2021). As well, the GridION (theoretical output up to 50 Gb/flow cell) and PromethION (theoretical output up to 290 Gb/flow cell) are both desktop-sized sequencing platforms for mid- and high-throughput data generation and analysis for in-house sequencing projects. A historical drawback with Nanopore technologies is the high error rate, with a reported raw read error rate between 10 and 22% (Kono and Arakawa 2019; Krehenwinkel et al. 2019). One method to overcome this is rolling circle amplification (RCA), in which a linear single-stranded DNA molecule is first circularised and then copied multiple times as a single sequence (Johne et al. 2009). Thus, the same DNA sequence may be read multiple times using Nanopore technologies, with resulting read accuracies as high as 99.3% (Baloğlu et al. 2020). RCA combined with neural network and machine learning approaches such as Guppy, Bonito, SACall, SquiggleNet, and DeepNano-blitz can raise base calling accuracies even further (Wick et al. 2019; Bao et al. 2021; Boža et al. 2020; Huang et al. 2022; Vereecke et al. 2020).
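
A rough feeling for how much repeated reading helps can be obtained from a binomial calculation. The sketch below estimates the probability that more than half of the copies of a base are called correctly. It rests on strong assumptions that we state explicitly: errors are treated as independent between copies and equally likely at every position, which is not true of real nanopore data, where errors are often systematic. The function name is hypothetical and the output should be read as an illustration of the trend, not as a prediction for any particular workflow.

```python
from math import comb


def majority_vote_accuracy(per_read_accuracy, n_copies):
    """Probability that more than half of n_copies calls of one base are correct.

    Assumes each copy is called correctly with the same probability and
    that errors in different copies are independent (a simplification).
    """
    p = per_read_accuracy
    threshold = n_copies // 2 + 1  # strict majority
    return sum(
        comb(n_copies, k) * p ** k * (1 - p) ** (n_copies - k)
        for k in range(threshold, n_copies + 1)
    )


if __name__ == "__main__":
    # Raw accuracies of 78% and 90% correspond to the 10 to 22% error
    # range quoted in the text; the copy numbers are arbitrary examples.
    for accuracy in (0.78, 0.90):
        for copies in (1, 3, 9, 15):
            estimate = majority_vote_accuracy(accuracy, copies)
            print(f"raw {accuracy:.0%}, {copies:2d} copies: {estimate:.4f}")
```

Under these assumptions accuracy rises quickly with the number of copies, which is consistent with the improvement reported for RCA-based workflows, although systematic errors limit the gain in practice.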

Chapter 9: Box 1. Library preparation - tips and considerations

Library preparations are essential for all experiments involving HTS. General points to consider are discussed here and we also refer to Chapter 12 Metagenomics and Chapter 15 Transcriptomics for more details.

  1. DNA fragmentation. Short-read Illumina sequencing requires target DNA in the correct size range (50–600 bp, depending on the specific platform). High molecular weight DNA can be sheared either with ultrasonication (e.g., Covaris platforms) or (more economically) with library preparation kits that incorporate a fragmentase enzyme. However, fragmentase activity is highly dependent on genome organisation, and may require optimisation for each analysed species. Different fragment lengths may be considered if the desired study requires long-read sequencing.
  2. Input DNA quality assessment. After DNA extraction using a tissue-specific protocol (see section 1 of this book), the quality of the target DNA needs to be assessed. Fragment length distribution is an important consideration to produce libraries with even DNA fragment size distributions. Fresh or silica-dried plant material may yield high molecular weight DNA which can be checked visually using agarose gel electrophoresis, but DNA isolated from herbarium material is often highly fragmented and this requires careful inspection to decide on the optimal fragmentation protocol (see below) (Chapter 1 DNA from plant tissue). Problematic samples can be assessed with a high-precision automated electrophoresis tool (e.g., Agilent TapeStation, Bioanalyzer or Fragment Analyzer).
  3. Library preparation. The library preparation protocol depends on the sequencing platform being used as well as the sort of experiment being performed (e.g., metabarcoding, target capture, or metagenomics). For Illumina platforms, dual-indexed libraries can be generated with Illumina TruSeq (Illumina), third-party kits such as NEBNext Ultra II, or non-kit based protocols (Meyer and Kircher 2010; Troll et al. 2019), including protocols for degraded DNA (Troll et al. 2019). It is often possible to use half-volume reactions with these kits to reduce the per-sample cost without significant yield loss. Library preparation often involves the trimming of fragment termini (necessary for target capture), ligation of adaptor sequences, optimisation of the library fragment size, and the addition of unique index sequences through PCR amplification using multiplex primers. Indexing can be done using single index sequences on one side of the fragment (up to 12 samples) or using dual index sequences, where different index sequences are added to each side of the fragment. The indexing PCR is also important for bringing the library concentration up to an acceptable level again, as the number of DNA fragments in the library is diminished significantly during size selection.
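
The purpose of dual indexing becomes clearer when looking at how reads are assigned back to samples after sequencing. The following is a minimal sketch of that assignment step. The index sequences, sample names, and the `demultiplex` function are hypothetical, and only exact index matches are accepted here, whereas production demultiplexing software typically tolerates a limited number of mismatches.

```python
def demultiplex(reads, sample_sheet):
    """Assign reads to samples by their pair of index sequences.

    reads: iterable of (index1, index2, sequence) tuples
    sample_sheet: dict mapping (index1, index2) to a sample name
    Returns (assigned, unassigned), where assigned maps each sample name
    to its list of sequences.
    """
    assigned = {name: [] for name in sample_sheet.values()}
    unassigned = []
    for index1, index2, sequence in reads:
        sample = sample_sheet.get((index1, index2))
        if sample is None:
            # Unknown combination, e.g. an index read error or index hopping
            unassigned.append(sequence)
        else:
            assigned[sample].append(sequence)
    return assigned, unassigned


if __name__ == "__main__":
    # Two hypothetical indices on each side give 2 x 2 = 4 combinations.
    sample_sheet = {
        ("ATCACG", "TTAGGC"): "herbarium_01",
        ("ATCACG", "ACTTGA"): "herbarium_02",
        ("CGATGT", "TTAGGC"): "silica_01",
    }
    reads = [
        ("ATCACG", "TTAGGC", "ACGTACGT"),
        ("CGATGT", "TTAGGC", "GGCATTAC"),
        ("CGATGT", "ACTTGA", "TTTTACGA"),  # combination not in the sheet
    ]
    assigned, unassigned = demultiplex(reads, sample_sheet)
    print(assigned)
    print(unassigned)
```

With dual indexing, the number of distinguishable samples grows with the product of the two index sets, and a read carrying an unexpected combination can be set aside instead of being assigned to the wrong sample.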

Ion Torrent

Unlike in other forms of sequencing, Ion Torrent technologies are not based upon optical outputs, but rather on changes in pH. When a DNA polymerase adds a nucleotide to a growing DNA strand, a proton is released upon each addition. It is this release of protons into solution, and the resulting change in the pH of the solution, that is detected in Ion Torrent technologies (Rothberg et al. 2011; Slatko et al. 2018).

Similarly to Illumina sequencing, the target DNA is initially fragmented (200–600 bp) and PCR-amplified with adaptors that can be tethered to micro-machined wells on a semiconductor chip. The wells are then flooded with one of the 4 nucleotides at a time. If a nucleotide is added across from the complementary base in the single-stranded DNA by the DNA polymerase, it results in the release of a proton and a subsequent change in solution pH. This shift in solution pH is detected by an ion-sensitive field-effect transistor (ISFET), which can detect changes in proton concentration. This is done in a massively parallel fashion, with thousands of microwells being read simultaneously. The pH change that results from the addition of multiple nucleotides in a repetitive sequence is also detectable using this technology, as the addition of two nucleotides will result in double the voltage change of the addition of a single nucleotide. The data output with Ion Torrent technologies is approximately 10 Mb in a single run with conventional machines, and up to 10 Gb with the newest models. The platform, however, struggles with base calling of homopolymers, and for these sequences it can be a challenge to obtain accurate reads.
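
The flow-based logic described above can be sketched in a few lines. The example below simulates an idealised, noise-free signal: each flow delivers one nucleotide, and the signal is taken to be the number of bases incorporated in that flow. The flow order `TACG`, the function names, and the example sequence are assumptions made for illustration. Real signals are noisy, which is why long homopolymers, where a signal of 7 must be told apart from a signal of 8, are difficult to call.

```python
def simulate_flowgram(new_strand, flow_order="TACG", n_flows=8):
    """Return (nucleotide, bases incorporated) for each nucleotide flow.

    new_strand: sequence of the strand being synthesised, 5' to 3'
    flow_order: cyclic order in which nucleotides are flowed (assumed)
    n_flows: number of flows to simulate
    """
    position = 0
    flowgram = []
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        incorporated = 0
        # A homopolymer run is incorporated within a single flow,
        # giving a proportionally larger signal.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            incorporated += 1
            position += 1
        flowgram.append((nucleotide, incorporated))
    return flowgram


def decode_flowgram(flowgram):
    """Rebuild the sequence from an idealised flowgram."""
    return "".join(nucleotide * count for nucleotide, count in flowgram)


if __name__ == "__main__":
    flowgram = simulate_flowgram("TTAGC")
    print(flowgram)
    print(decode_flowgram(flowgram))
```

In this idealised setting the flow that incorporates the two consecutive T bases produces a signal of 2, mirroring the doubled voltage change described in the text.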

The Ion Torrent machine and sequencing chips are relatively inexpensive compared to Illumina and PacBio, and this made the platform popular in smaller labs without access to a high-throughput sequencing core facility, though its use is no longer as common.

Which sequencing platform?

The sequencing platform that is ultimately chosen by a scientist depends on a number of factors. This can include (but is not limited to) the scientific question being considered, the quality of target DNA (see Chapter 1 DNA from plant tissue), costs, as well as in-house expertise and/or availability of existing platforms. In all cases, however, the quality and sequencing depth of target DNA should be considered. For DNA that is primarily expected to exist in shorter sequences (i.e., samples that are expected to be degraded from herbarium or ancient sources), then technologies requiring long reads are often not necessary, and Illumina sequencing or Ion Torrent technologies may be sufficient. If however one wishes to avoid any PCR bias or acquire long reads, then using PacBio or Nanopore is advisable. Finally, it may even be useful to use two different types of sequencing to overcome each technology’s respective limitations. For example, in whole genome sequencing, hybrid methods combining Illumina with PacBio are commonly used to ensure long reads and high accuracy.
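
The considerations in this paragraph can be written down as a small decision helper. This is only a sketch of the reasoning given in the text, not a validated selection tool: the function name, its parameters, the order in which the conditions are checked, and the default of Illumina for all remaining cases are our own simplifying assumptions, and factors such as cost and in-house expertise are left out.

```python
def suggest_platforms(degraded_dna=False, need_long_reads=False,
                      avoid_pcr_bias=False, on_site=False):
    """Suggest sequencing platforms from a few yes/no project properties.

    The priority of the checks is an assumption made for illustration.
    """
    if on_site:
        # Portable sequencing in the field
        return ["Nanopore (MinION)"]
    if degraded_dna:
        # Short fragments: long-read platforms bring little benefit
        return ["Illumina", "Ion Torrent"]
    if need_long_reads or avoid_pcr_bias:
        return ["PacBio", "Nanopore"]
    # Assumed default for amplicon libraries and resequencing
    return ["Illumina"]


if __name__ == "__main__":
    print(suggest_platforms(degraded_dna=True))
    print(suggest_platforms(need_long_reads=True))
    print(suggest_platforms(on_site=True))
```

For projects such as de novo genome assembly, where both long reads and high accuracy matter, the text recommends combining platforms, a case this simple helper does not attempt to capture.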

Table 2.

Sequencing platform choices for different experimental questions and sample types.

| Experiment or sample considerations | Recommended method(s) | Comments |
| --- | --- | --- |
| Whole genome or organellar sequencing project (genome skimming, genome resequencing, de novo genome assembly) | Illumina, PacBio, or a combination of both | Illumina is the method of choice for resequencing for high throughput short read projects due to its high read accuracy |
| Barcoding | Sanger sequencing or PacBio CCS | Larger projects are moving to PacBio CCS to reduce costs. Multiplexing very large numbers of samples is necessary to optimise costs |
| Metabarcoding/Target capture | Illumina, MGI, DNBSEQ, or Ion Torrent | PacBio and/or Nanopore may also be considered if the sequence is expected to be highly repetitive |
| Heavily degraded samples (i.e., herbarium or ancient DNA samples) | Illumina (or Ion Torrent) | PacBio may also be relevant for the study of post-genetic modifications often found in ancient DNA samples, or if dealing with hard-to-phase sequences |
| On-site sequencing | Nanopore (MinION) | Hi-C/3C-Seq/Capture-C (Illumina) |

Figure 1.

Chapter 9 Infographic: Visual representation of the content of this chapter.


In the last decades, developments in sequencing platforms have primarily focused on increasing the throughput and accuracy of sequencing output, increasing the length of reads, and reducing costs. We can expect the field to continue developing further in this direction, with a focus in particular on the miniaturisation of these platforms for more on-site work, as well as better automation and integration of analytical software and data analysis pipelines. In particular, miniaturisation and automation of data analysis can be expected to have major impacts in regulatory fields related to both food safety and trade, where the ability for non-specialists to rapidly test on-site for the presence/absence of species will be extremely useful (see Chapter 22 Healthcare and Chapter 23 Food safety). Further development of HTS technologies to be used at the single-cell level and in functional studies can also be expected.


  1. What method(s) are most commonly used for whole genome projects of plants and why? Which sequencing method(s) are currently most commonly used for library preparations and why?
  2. In a scenario where you may want to include amplicons and primers of different length when creating a library for sequencing, how would you adapt your library setup? How would it affect your sequencing costs?
  3. Why do Nanopore and PacBio-based technologies provide longer reads than Illumina or Ion Torrent-based technologies? Why are these longer reads especially useful for projects in plant identification?


Allopolyploid hybrids – A polyploid species with multiple sets of chromosomes that originate from different species. If the hybrid is derived from two diploid species, the resulting tetraploid is fertile. These allopolyploid hybrids may be at least partially reproductively isolated from the parent species from which they are derived, and allopolyploid speciation is the best known route to hybrid speciation in plants.

Bridge amplification – A method used in Illumina sequencing to create DNA clusters with 1000s of double-stranded copies of the target DNA in flow cells. After amplification and generation of these clusters is complete, the reverse strand is washed away and sequencing by synthesis takes place.

Capillary gel electrophoresis (CGE) – An analytical method for the separation of charged molecules. DNA is separated according to size with this technique, with only nanogram quantities necessary for the input. Single-base pair resolution can be achieved on fragments up to several hundred base pairs in length.

Circular consensus sequencing (CCS) – Developed by PacBio and also known as HiFi reads, involves the circularisation of a target DNA strand by ligating the ends of the strand (called a SMRTbell). This SMRTbell can be read multiple times by a DNA polymerase, dramatically reducing the error rate in the generated sequence.

Electrolytic solution – An electrically conductive solution. This conductivity is often due to the presence of ions in solution (for example dissociated Na+ and Cl- ions), though non-ionic solutions can also be conductive.

Epigenetic modifications – Alterations in gene expression and cellular function without changes to the original DNA sequence. Three mechanisms for epigenetic modifications so far identified include DNA methylation, histone modification, and non-coding RNA (ncRNA)-associated gene silencing.

Insulating solid – A solid material that an electric current cannot pass through.

Ion-sensitive field-effect transistor (ISFET) – A field effect transistor that can measure ion concentrations in solution. Changes in the H+ concentration result in a pH change in solution that results in changes in the current that is detected. This technology is used in Ion Torrent sequencing platforms to identify when a nucleotide is added to a growing DNA strand and is the basis for identifying the target DNA sequence.

Phred quality scores – Scores to measure the confidence of the nucleobase identifications generated from DNA sequencing methods. They are widely accepted for assessing the quality of reads.

Rolling circle amplification (RCA) – A method in which a linear single-stranded DNA molecule is first circularised and then copied multiple times as a single sequence (Johne et al. 2009). With the nanopore platform, RCA allows the same DNA sequence to be read multiple times to give a higher read accuracy.

Single molecule real time sequencing (SMRT) – A term coined by PacBio to describe their sequencing technologies. In contrast to second generation sequencing methods, SMRT technologies possess single-molecule sensitivity and provide the sequence readout in real time, dramatically increasing the sensitivity and turnaround times for DNA sequencing.

Zero mode waveguide (ZMW) – Nanosized wells that can be etched into different materials, with zeptoliter (10^-21 L) volumes. ZMW technology differentiates a fluorescent molecule that is floating in solution from a fluorescently-labelled nucleotide that is located at the bottom of the well. This technology is used by PacBio for the single-molecule detection of fluorescently-labelled nucleotides that are added to immobilised DNA at the bottom of these wells so that nucleotide incorporation can be detected in real time.


  • Aird D, Ross MG, Chen W-S, Danielsson M, Fennell T, Russ C, Jaffe DB, Nusbaum C, Gnirke A (2011) Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol. 12, R18. doi: 10.1186/gb-2011-12-2-r18
  • Amarasinghe SL, Su S, Dong X, Zappia L, Ritchie ME, Gouil Q (2020) Opportunities and challenges in long-read sequencing data analysis. Genome Biol. 21, 30. doi: 10.1186/s13059-020-1935-5
  • Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815. doi: 10.1038/35048692
  • Baloğlu B, Chen Z, Elbrecht V, Braukmann T, MacDonald S, Steinke D (2020) A workflow for accurate metabarcoding using nanopore MinION sequencing. BioRxiv. doi: 10.1101/2020.05.21.108852
  • Bethune K, Mariac C, Couderc M, Scarcelli N, Santoni S, Ardisson M, Martin J-F, Montúfar R, Klein V, Sabot F, Vigouroux Y, Couvreur TLP (2019) Long-fragment targeted capture for long-read sequencing of plastomes. Appl. Plant Sci. 7, e1243. doi: 10.1002/aps3.1243
  • Chen F, Dong W, Zhang J, Guo X, Chen J, Wang Z, Lin Z, Tang H, Zhang L (2018) The sequenced angiosperm genomes and genome databases. Front. Plant Sci. 9, 418. doi: 10.3389/fpls.2018.00418
  • Clark DP, Pazdernik NJ, McGehee MR (2018) Molecular Biology, 3rd ed. Academic Cell.
  • Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M, Peterson DM, Saar MO, Alexander S, Alexander EC, Rohwer F (2006) Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7, 57. doi: 10.1186/1471-2164-7-57
  • Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, Dewinter A, Dixon J, Foquet M, Turner S (2009) Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138. doi: 10.1126/science.1162986
  • Ewing B, Green P (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 8, 186–194.
  • Ewing B, Hillier L, Wendl MC, Green P (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185. doi: 10.1101/gr.8.3.175
  • Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares EC, Clark TA, Korlach J, Turner SW (2010) Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat. Methods 7, 461–465. doi: 10.1038/nmeth.1459
  • Hagemann IS (2015) Overview of technical aspects and chemistries of next-generation sequencing, in: Clinical Genomics. Elsevier, pp. 3–19. doi: 10.1016/B978-0-12-404748-8.00001-0
  • Head SR, Komori HK, LaMere SA, Whisenant T, Van Nieuwerburgh F, Salomon DR, Ordoukhanian P (2014) Library construction for next-generation sequencing: overviews and challenges. BioTechniques 56, 61–4, 66, 68, passim. doi: 10.2144/000114133
  • Heather JM, Chain B (2016) The sequence of sequencers: The history of sequencing DNA. Genomics 107, 1–8. doi: 10.1016/j.ygeno.2015.11.003
  • Hughes KW, Petersen RH, Lodge DJ, Bergemann SE, Baumgartner K, Tulloss RE, Lickey E, Cifuentes J (2013) Evolutionary consequences of putative intra-and interspecific hybridization in agaric fungi. Mycologia 105, 1577–1594. doi: 10.3852/13-041
  • Inglis PW, Pappas M de CR, Resende LV, Grattapaglia D (2018) Fast and inexpensive protocols for consistent extraction of high quality DNA and RNA from challenging plant and fungal samples for high-throughput SNP genotyping and sequencing applications. PLoS ONE 13, e0206085. doi: 10.1371/journal.pone.0206085
  • Jain M, Olsen HE, Paten B, Akeson M (2016) The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community. Genome Biol. 17, 239. doi: 10.1186/s13059-016-1103-0
  • Johne R, Müller H, Rector A, van Ranst M, Stevens H (2009) Rolling-circle amplification of viral DNA genomes using phi29 polymerase. Trends Microbiol. 17, 205–211. doi: 10.1016/j.tim.2009.02.004
  • Kono N, Arakawa K (2019) Nanopore sequencing: Review of potential applications in functional genomics. Dev. Growth Differ. 61, 316–326. doi: 10.1111/dgd.12608
  • Krehenwinkel H, Pomerantz A, Henderson JB, Kennedy SR, Lim JY, Swamy V, Shoobridge JD, Graham N, Patel NH, Gillespie RG, Prost S (2019) Nanopore sequencing of long ribosomal DNA amplicons enables portable and simple biodiversity assessments with high phylogenetic resolution across broad taxonomic scale. Gigascience 8. doi: 10.1093/gigascience/giz006
  • Kyriakidou M, Tai HH, Anglin NL, Ellis D, Strömvik MV (2018) Current strategies of polyploid plant genome sequence assembly. Front. Plant Sci. 9, 1660. doi: 10.3389/fpls.2018.01660
  • Maestri S, Cosentino E, Paterno M, Freitag H, Garces JM, Marcolungo L, Alfano M, Njunjić I, Schilthuizen M, Slik F, Menegon M, Rossato M, Delledonne M (2019) A rapid and accurate MinION-based workflow for tracking species biodiversity in the field. Genes (Basel) 10. doi: 10.3390/genes10060468
  • McGinn S, Gut IG (2013) DNA sequencing - spanning the generations. N. Biotechnol. 30, 366–372. doi: 10.1016/j.nbt.2012.11.012
  • Meyer M, Kircher M (2010) Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb. Protoc. 2010, pdb.prot5448. doi: 10.1101/pdb.prot5448
  • Michael TP, Jupe F, Bemm F, Motley ST, Sandoval JP, Lanz C, Loudet O, Weigel D, Ecker JR (2018) High contiguity Arabidopsis thaliana genome assembly with a single nanopore flow cell. Nat. Commun. 9, 541. doi: 10.1038/s41467-018-03016-2
  • Pacific Biosciences (n.d.) SMRT SCIENCE SMRT SEQUENCING [WWW Document]. URL (accessed 3.22.21).
  • Ravi RK, Walton K, Khosroheidari M (2018) Miseq: A next generation sequencing platform for genomic analysis. Methods Mol. Biol. 1706, 223–232. doi: 10.1007/978-1-4939-7471-9_12
  • Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, Hoon J, Simons JF, Marran D, Myers JW, Davidson JF, Branting A, Nobile JR, Puc BP, Light D, Clark TA, Bustillo J (2011) An integrated semiconductor device enabling non-optical genome sequencing. Nature 475, 348–352. doi: 10.1038/nature10242
  • Slatko BE, Gardner AF, Ausubel FM (2018) Overview of Next-Generation Sequencing Technologies. Curr. Protoc. Mol. Biol. 122, e59. doi: 10.1002/cpmb.59
  • Srivathsan A, Lee L, Katoh K, Hartop E, Kutty SN, Wong J, Yeo D, Meier R (2021) MinION barcodes: biodiversity discovery and identification by everyone, for everyone. BioRxiv. doi: 10.1101/2021.03.09.434692
  • Thomas T, Gilbert J, Meyer F (2012) Metagenomics - a guide from sampling to data analysis. Microb. Inform. Exp. 2, 3. doi: 10.1186/2042-5783-2-3
  • Troll CJ, Kapp J, Rao V, Harkins KM, Cole C, Naughton C, Morgan JM, Shapiro B, Green RE (2019) A ligation-based single-stranded library preparation method to analyze cell-free DNA and synthetic oligos. BMC Genomics 20, 1023. doi: 10.1186/s12864-019-6355-0
  • Twyford AD (2016) Will Benchtop Sequencers Resolve the Sequencing Trade-off in Plant Genetics? Front. Plant Sci. 7, 433. doi: 10.3389/fpls.2016.00433
  • University of Michigan, Biomedical Research Core Facilities (n.d.) Interpretation of Sequencing Chromatograms [WWW Document]. Interpretation of Sequencing Chromatograms. URL (accessed 3.15.21).
  • van Dijk EL, Jaszczyszyn Y, Naquin D, Thermes C (2018) The third revolution in sequencing technology. Trends Genet. 34, 666–681. doi: 10.1016/j.tig.2018.05.008
  • Vereecke N, Bokma J, Haesebrouck F, Nauwynck H, Boyen F, Pardon B, Theuns S (2020) High quality genome assemblies of Mycoplasma bovis using a taxon-specific Bonito basecaller for MinION and Flongle long-read nanopore sequencing. BMC Bioinformatics 21, 517. doi: 10.1186/s12859-020-03856-0
  • Vollger MR, Logsdon GA, Audano PA, Sulovari A, Porubsky D, Peluso P, Wenger AM, Concepcion GT, Kronenberg ZN, Munson KM, Baker C, Sanders AD, Spierings DCJ, Lansdorp PM, Surti U, Hunkapiller MW, Eichler EE (2020) Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads. Ann. Hum. Genet. 84, 125–140. doi: 10.1111/ahg.12364
  • Wenger AM, Peluso P, Rowell WJ, Chang P-C, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, Töpfer A, Alonge M, Mahmoud M, Qian Y, Chin C-S, Phillippy AM, Schatz MC, Myers G, DePristo MA, Ruan J, Hunkapiller MW (2019) Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162. doi: 10.1038/s41587-019-0217-9


  1. For whole genome projects, PacBio and Nanopore are the most commonly used technologies. Both produce long reads, which reduces the bioinformatic challenge of assembling thousands of short contigs into a whole genome. For library preparations, Illumina platforms are still the most commonly used due to their relatively competitive costs, high accuracy, and the support available from a range of analysis tools and pipelines.
  2. It is important to create equimolar pools so that the number of DNA molecules is normalised across a library and no single sample dominates the others. However, even after normalisation of concentrations, amplicons of very different length may still not be amplified with the same efficiency. It is also important to use primers of roughly the same length, so that their annealing temperatures are approximately the same, in order to avoid PCR bias within the same library. Thus, in a scenario with amplicons and/or primers of very different length, it is often best to put those amplicons in separate libraries. However, when amplicons and primers are of reasonably similar size, pooling the library samples can be an effective method to reduce sequencing costs.
  3. Nanopore and PacBio based technologies provide longer reads than Illumina or Ion Torrent based technologies since both have platforms available that do not require a sample to be fragmented. These long reads can be especially useful in plant identification projects that work with particularly repetitive sequences or very large genomes. Additionally, PacBio technologies can be used when longer amplicons are required, for instance for the phasing of haplotypes or when tracing polyploid ancestry.
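
The normalisation described in answer 2 can be sketched as a short calculation. The example below is only an illustration: it assumes double-stranded amplicons and the common approximation of 660 g/mol per base pair, and the sample names, concentrations, amplicon lengths and the target of 10 fmol per sample are all hypothetical values that do not come from the chapter.

```python
# Minimal sketch of equimolar pooling; all sample values are hypothetical.
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of double-stranded DNA


def molarity_nM(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration (ng/µL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)


def pooling_volumes(samples, fmol_per_sample):
    """Return the volume (µL) of each sample that contains fmol_per_sample.

    samples maps a sample name to (concentration in ng/µL, amplicon length in bp).
    1 nM equals 1 fmol/µL, so volume = fmol / nM.
    """
    return {
        name: fmol_per_sample / molarity_nM(conc, length)
        for name, (conc, length) in samples.items()
    }


samples = {
    "rbcL_sample1": (20.0, 600),
    "rbcL_sample2": (5.0, 600),
    "ITS2_sample1": (20.0, 300),
}
for name, vol in pooling_volumes(samples, fmol_per_sample=10.0).items():
    print(f"{name}: {vol:.2f} µL")
```

Note that at the same mass concentration the shorter amplicon contains more molecules per microlitre, so a smaller volume of it is needed. This is why pooling by mass alone would over-represent short amplicons.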

Chapter 10 DNA barcoding

DNA barcoding

The method of identifying living organisms to species level using DNA sequences has been coined DNA barcoding (Hebert et al. 2003). It makes use of short (< 1000 bp), agreed-upon regions of the genome (a ‘barcode’) that evolve quickly enough to differ among closely related species (Kress et al. 2005). A generated barcode sequence from a sample allows for identification by matching the sequence against a reference library of sequences. Reference sequence libraries comprise sequences generated from vouchered and expert-identified materials in natural history collections, which are available through public sequence repositories or tailored databases. In other words, DNA barcodes function as molecular identifiers for individual species, in the same way as machine-readable black-and-white barcodes are used in the retail industry to identify products (Veldman et al. 2014).

DNA-based typing for species identification focused first on microbial organisms (Olive and Bean 1999). DNA barcoding as a concept distinct from DNA-based typing or phylogenetic analysis of taxon accessions was popularised by Hebert et al. (2003), who proposed to use the mitochondrial gene CO1 as the standard barcode for all animals. Despite initial scepticism (Rubinoff et al. 2006; Will and Rubinoff 2004), DNA barcoding was readily embraced by the scientific community. Assessments have since shown that CO1 can be used to distinguish over 90% of species in many animal groups: among these spiders (Barrett and Hebert 2005), birds (Hebert et al. 2004b), amphibians (Smith et al. 2008), and butterflies (Burns et al. 2008).

In recent years, the barcoding movement has grown substantially, and worldwide efforts coordinated by the Consortium for the Barcode of Life (CBOL) are now being focused on barcoding all organisms (Hobern and Hebert 2019; Hobern 2020). The amount of sequencing data derived from DNA barcoding is increasing exponentially, and DNA barcoding is now considered a mainstream taxonomic tool. Although DNA barcoding does not replace the need for traditional taxonomy, it does highlight the need for robust species descriptions to enable accurate identification of species from “orphan” barcodes (sequences from unnamed species). Integrative taxonomy, which is achieved by combining evidence from morphology, ecology, phylogenetics and DNA barcoding, is critically important to speed up species discovery in the light of biodiversity loss (Padial et al. 2010; Schlick-Steiner et al. 2010).

DNA barcoding and species delimitation

Species delimitation is a central tenet of taxonomy (see Chapter 17 Species delimitation). Traditionally, species were identified, described and classified based mainly on their morphological characters. This is more difficult when it comes to cryptic, hybridising or highly convergent species (Struck et al. 2018). Combining characters, such as molecular data and behaviour, can provide further confidence when attempting to distinguish between species (Schlick-Steiner et al. 2010). However, species delimitation remains fundamentally difficult due to the fact that it is unclear how a species should be defined (de Queiroz 2007). The assumption that species are fixed entities underpins every international agreement on biodiversity conservation, all national environmental legislation and the efforts of many individuals and organisations to safeguard plants and animals (Garnett and Christidis 2017). However, one of the major unresolved questions in science is ‘What is a species?’, even though this is one of the most important concepts in biology (Kennedy and Norman 2005). Species concepts differ, and a single definition that fits all organisms has not been found (de Queiroz 2007).

Most species concepts agree on species being evolving metapopulations (de Queiroz 2007), and this implies that genetic variation exists both within and between species. Advanced approaches using many accessions as well as many loci, such as species delimitation based on multispecies coalescent theory, can enhance species identification resolution. However, more data also adds new challenges, and inferred structure due to population-level processes and that due to species boundaries are hard to distinguish (Sukumaran and Knowles 2017). Initial studies on DNA barcoding suggested a significant barcoding ‘gap’ between intra- and interspecific variation (Barrett and Hebert 2005; Hebert et al. 2004a, 2003), but these studies have been criticised for undersampling both intraspecific and interspecific divergence (Meyer and Paulay 2005). A DNA barcoding reference database for identification that would include all species references should also contain multiple accessions of populations to ensure that intraspecific and interspecific variation can be distinguished. In the absence of this ideal situation, many studies use more or less arbitrary cut-off percentages for sequence divergence (Blaxter et al. 2005; Ghorbani et al. 2017a; Veldman et al. 2017). Species assignments in DNA barcoding are hypotheses similar to species assignments based on morphology.
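
The barcoding ‘gap’ can be made concrete with a small calculation. The sketch below assumes sequences that are already aligned and uses the uncorrected p-distance (proportion of differing sites), which is one simple choice among many distance measures. The species labels and sequences are toy data invented for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap ('-') or 'N' in either sequence are ignored.
    """
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def barcoding_gap(records):
    """Return (max intraspecific, min interspecific) p-distance.

    records is a list of (species, aligned_sequence) tuples and must contain
    at least one within-species pair and one between-species pair.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra), min(inter)


# Toy alignment with two accessions per hypothetical species.
records = [
    ("Species A", "ACGTACGTAC"),
    ("Species A", "ACGTACGTAT"),
    ("Species B", "ACGAACCTTT"),
    ("Species B", "ACGAACCTTC"),
]
max_intra, min_inter = barcoding_gap(records)
print(f"max intraspecific: {max_intra:.2f}, min interspecific: {min_inter:.2f}")
print("gap present" if min_inter > max_intra else "no gap")
```

With only two accessions per species, as here, the intraspecific estimate is exactly the kind of undersampled value that Meyer and Paulay (2005) warn about. Adding accessions can only widen the intraspecific range and narrow or close the gap.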

To identify an unknown DNA barcode using a reference library, one can use several approaches to look at the interrelatedness of the samples (see Chapter 18 Sequence to species). Many databases, including GenBank and BOLD (Ratnasingham and Hebert 2007), make use of the similarity-based BLAST (Altschul et al. 1990). Similarity based on genetic distance can be used as above and for phylogenetic tree reconstruction (Hebert et al. 2003). Disadvantages of using distance-based information are that 1) distances do not yield diagnostic characters for species distinction (DeSalle et al. 2005); 2) the best similarity hit is not always the nearest phylogenetic neighbour (Koski and Golding 2001); and 3) there is no objective set of criteria to delineate taxa when using distances (Goldstein et al. 2000). Other common approaches are based on characters instead of distances, and these rely on phylogenetic methods using maximum likelihood (Felsenstein 1973), parsimony (Nixon 1999), Bayesian statistics (Huelsenbeck and Ronquist 2001), or multispecies coalescent methods (Yang and Rannala 2017). These phylogenetic methods are implemented in RAxML (Stamatakis 2006), PAUP* (Swofford 2002), MrBayes (Huelsenbeck and Ronquist 2001) and BPP (Yang 2015), respectively. Character-based tree building overcomes many of the shortcomings of distance-based results, but tree-based methods have limitations if single gene trees are used to infer phylogenetic relationships. Another limitation of tree building for species identification is that evolution at the species level is not hierarchical. Applying hierarchical methods and terms, such as trees, classification and monophyly for delimitation of species, is not reflective of the evolutionary history of individuals and populations within a species (DeSalle et al. 2005). Veldman et al. (2014) summarised several suggestions to minimise the effect of these drawbacks, such as a diagnostic system including other lines of evidence (DeSalle et al. 2005), a probabilistic modelling approach (Knowles and Carstens 2007), the use of dominant and codominant multi-locus markers (Hausdorf and Hennig 2010) and new heuristic methods without fixed species assignments (O’Meara 2010).
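
A similarity-based assignment and its third drawback can be illustrated with a few lines of code. This is not BLAST: the sketch simply scores percent identity between pre-aligned sequences of equal length, and the `min_identity` cut-off, the function names and the toy sequences are assumptions made for illustration only.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identify(query, references, min_identity=0.97):
    """Assign a query to the species of its best-matching reference(s).

    references is a list of (species, aligned_sequence) tuples.
    min_identity is an arbitrary, user-chosen cut-off.
    Returns (status, species_set, best_identity), where status is
    'no match', 'ambiguous' or 'identified'.
    """
    scored = [(identity(query, seq), species) for species, seq in references]
    best = max(score for score, _ in scored)
    if best < min_identity:
        return "no match", set(), best
    top_species = {species for score, species in scored if score == best}
    status = "identified" if len(top_species) == 1 else "ambiguous"
    return status, top_species, best


# Toy reference library with hypothetical species and sequences.
references = [
    ("Species A", "ACGTACGTACGTACGTACGT"),
    ("Species B", "ACGTACGTACGTACGTACGA"),
    ("Species C", "ACGTTCGTACGAACGTACGA"),
]
print(identify("ACGTACGTACGTACGTACGT", references, min_identity=0.9))
print(identify("ACGTACGTACGTACGTACGC", references, min_identity=0.9))
print(identify("TTTTTTTTTTTTTTTTTTTT", references, min_identity=0.9))
```

The second query is equally similar to two species, so the outcome depends entirely on how ties and the cut-off are handled. This reflects the lack of objective criteria noted for distance-based methods.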

DNA barcoding for plants

The mitochondrial genome in plants evolves far too slowly to allow it to distinguish between species (Cho et al. 2004). Phylogenetic studies of plants focused early on plastid markers as well as the ribosomal DNA (Palmer et al. 1988; Ritland and Clegg 1987). In the search for alternatives to the popular animal marker CO1, various genes and non-coding regions in the plastid genome were proposed (CBOL Plant Working Group 2009; Fazekas et al. 2008, 2009; Hollingsworth 2011; Kress and Erickson 2007; Kress et al. 2005). In its most basic definition, a barcode must differ between species so that species can be identified. However, a barcode should not differ much within species, and should not be too different between species within the same genus or family, because this would make it more difficult to assign an unknown to a group with confidence.

The plastid marker rbcL, for example, works well for inferring relationships between angiosperm families (Soltis et al. 1999) but varies too little for species discrimination in many plant genera (China Plant BOL Group et al. 2011). In addition to being sufficiently rapidly evolving, a barcode must also be flanked by conserved regions of the genome that can function as universal amplification primer binding sites. A single primer pair that would amplify any of over 350,000 species of plants would be ideal (Kress et al. 2005). The plastid coding region matK, for example, varies between species, but it can be difficult to amplify universally (de Boer et al. 2014; Kool et al. 2012; Piredda et al. 2011; Sass et al. 2007). Insufficient primer universality makes this marker difficult to use in large-scale studies across families, although using target-group-specific primers for amplification is often successful (Mahadani and Ghosh 2013; Palhares et al. 2015; Purushothaman et al. 2014; Wallace et al. 2012). Other considerations can also affect barcode marker choice.

The nuclear ribosomal marker ITS, and specifically nrITS2, is used commonly in barcoding and metabarcoding studies (China Plant BOL Group et al. 2011; Ivanova et al. 2016; Raclariu et al. 2017, 2018). nrITS2 has long been advocated as a secondary marker to plastid barcodes (China Plant BOL Group et al. 2011). However, nrDNA has limitations for phylogenetic inference that also apply to barcoding (Álvarez and Wendel 2003), including alignment difficulties and limited utility in phylogenetic inference between closely related and/or recently diverged taxa (Manzanilla et al. 2018). It is also a challenge to determine the orthology and the paralogy of nrDNA sequences in the case of hybridization events or incomplete lineage sorting (Bailey 2003; Fehrer et al. 2007; Soltis and Kuzoff 1995). nrITS is also present in multiple copies, and in hybrids these copies can belong to different parental lineages. Moreover, PCR amplification success of these copies is unrelated to whether they are functionally transcribed, even though transcription status influences the substitution rate of these sequences (Kool et al. 2012). Bailey et al. (2003) emphasised that, especially for allopolyploids, nrDNA might not be the optimal choice to assess species trees, which applies equally well to species assignment in DNA barcoding studies.

The strict requirements for both universality and high variability for potential universal barcodes have led some to label DNA barcoding a “search for the Holy Grail” (Rubinoff et al. 2006). Since there is still no single plant barcoding locus combining variability and universality, the current consensus is that a combination of two or more markers should be used for standard barcoding applications (CBOL Plant Working Group 2009; China Plant BOL Group et al. 2011; Hollingsworth 2011). Thus, while the animal community is entirely focused on using standardized and defined markers for species discrimination, the plant community has a looser vision for DNA-based identification of plants, with solutions tailored to the study objective. Plant DNA-based identification incorporates plastid, nuclear ribosomal and nuclear sequence data ranging from barcodes through plastomes, genome skimming and target capture to whole genomes (Bohmann et al. 2020; Coissac et al. 2016; Hollingsworth et al. 2016; Manzanilla et al. 2018).

Hands-on plant DNA barcoding

The core plant DNA barcoding markers are rbcL and matK (CBOL Plant Working Group 2009). nrITS (or nrITS2 only) is the third most commonly used barcode (China Plant BOL Group et al. 2011; Hollingsworth 2011; Kress et al. 2005). The trnL-F spacer, psbA-trnH, and rpoC1 (Ghorbani et al. 2017b; Kool et al. 2012; Kress et al. 2005) are also reported in literature, and the trnL P6 loop is the standard barcode for plant metabarcoding studies (Taberlet et al. 2012, 2007).

Table 1.

The most commonly used primers for plant DNA barcoding.*

| Barcode | Primer | Sequence (5’-3’) | Dir. | Reference |
| --- | --- | --- | --- | --- |
| rbcLa | rbcLa_f | ATGTCACCACAAACAGAGACTAAAGC | F | Levin et al. (2003) |
| | rbcLa_rev | GTAAAATCAAGTCCACCRCG | R | Kress et al. (2009) |
| matK | matk-3F | CGTACAGTACTTTTGTGTTTACGAG | F | CBOL Plant Working Group (2009) |
| nrITS | ITS5a | CCTTATCATTTAGAGGAAGGAG | F | Wurdack in Stanford et al. (2000) |
| trnL P6 | trnL-g | GGGCAATCCTGAGCCAA | F | Taberlet et al. (2007) |
| | trnL-h | CCATTGAGTCTCTGCACCTATC | R | Taberlet et al. (2007) |
| psbA-trnH | psbA | GTTATGCATGAACGTAATGCTC | F | Sang et al. (1995) |

When choosing appropriate markers for a plant DNA barcoding study it is important to consider the following questions:

What is the necessary taxonomic level of identification? For composition studies of a flora or vegetation, genus-level identifications are often sufficient. Species-level identification can however be important for other questions. Identifying all angiosperms in Greenland is more straightforward than in a Neotropical rainforest. Also, although family-level identifications in Greenland provide useful insights into the local flora, this information most often does not have meaningful applications in rainforests. After deciding on the appropriate level of identification, the researcher then needs to determine whether multiple markers are necessary to ensure that all species can be distinguished.

What kind of a reference library will you use to identify the target barcodes? Query identification in a database that contains all plants is more challenging than with a tailored reference library. For example, identifying a sequence of Oxalis (Oxalidaceae) is easy in a database of Scandinavian plant sequences because there is only a single native Oxalis species. Any queried Oxalis sequence would match the Scandinavian Oxalis acetosella because it would be the only reference Oxalis sequence in a local database. In contrast, a database with South American Oxalis species has hundreds of taxa, and identification requires a marker with sufficient variation to discriminate between these species. Thus, for Scandinavia, one could use a marker with limited variation but universal primers, whereas for South America a specific marker or markers should be sought that can distinguish all Oxalis species present in a global database. It is therefore critical to pick your marker(s) based on the expected diversity in your reference library.
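
One way to check whether a marker fits the expected diversity of a reference library is to count how many species in that library have sequences not shared with any other species. The sketch below assumes exact sequence identity as the criterion for "shared". The function name and all sequences are hypothetical toy data, not real Oxalis barcodes.

```python
from collections import defaultdict


def species_resolution(library):
    """Fraction of species whose sequences are not shared with any other species.

    library is a list of (species, sequence) tuples for one marker.
    """
    species_by_seq = defaultdict(set)
    for species, seq in library:
        species_by_seq[seq.upper()].add(species)
    all_species = {species for species, _ in library}
    unresolved = set()
    for sharing in species_by_seq.values():
        if len(sharing) > 1:
            unresolved |= sharing
    return (len(all_species) - len(unresolved)) / len(all_species)


# A local library with a single species is trivially fully resolved.
local_library = [("Oxalis acetosella", "ACGTACGTAC")]

# A broader library in which one hypothetical congener shares the same sequence.
broad_library = local_library + [
    ("Oxalis sp. A", "ACGTACGTAC"),
    ("Oxalis sp. B", "ACGTTCGTAC"),
]

print(f"local: {species_resolution(local_library):.2f}")
print(f"broad: {species_resolution(broad_library):.2f}")
```

The same marker resolves every species in the local library but only one of three in the broader one, which is the situation in which a more variable marker, or an additional one, should be considered.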

What is your source of reference sequences? If you want to identify species, which is common in studies aiming to authenticate herbal drugs and supplements, you need to include all putative species in your reference library. For example, if your goal is to identify a European wild collected Hypericum, your reference library should ideally include all European Hypericum species that could be confused or substituted for Hypericum perforatum. A reference library can be compiled from de novo sequenced amplicons from voucher accessions or from reference sequences mined from public repositories.

Figure 1. Chapter 10 Infographic: DNA barcoding of plants encompasses two streams of data from organism to DNA, one for the query sequence that should be identified and one for the reference sequence that is part of the reference library for identification. DNA source, marker choice, primer choice, sequencing approach and identification strategy all influence the ability and resolution of identification.

After choosing one or several markers, it is important to consider the following:

Are universal primers available? If yes, this facilitates your project. However, are these primers really universal? Check this by seeing whether the study publishing the primers gets cited by relevant studies and look for larger studies and reviews that might provide more information about (1) amplification success with these primers; (2) ability to amplify from degraded or poor DNA extracts, a common challenge when working with older herbarium vouchers or processed herbal products; and (3) the need to tweak amplification protocols to make these primers work. If no universal primers are available, try to find studies using this marker and see which primers were used, to find suitable primers that you can then test. If possible, use studies targeting the same order, family or genus. If there are no previously published primers for your marker, then it is necessary to design your own. If your primers target a widely used marker, assessing primer performance by matching the novel primers against multiple sequence alignments of published data (in-silico testing) is generally reliable. If only genomic data is available, however, the accuracy of in-silico testing will be highly dependent on the relatedness of the reference genomes.
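
The in-silico testing mentioned above can be sketched as a search for primer binding sites in reference sequences. The example below is a simplified illustration: it allows IUPAC degenerate bases in the primers, counts mismatches in an ungapped comparison, and ignores factors that matter in a real PCR, such as the position of mismatches relative to the 3’ end and annealing temperature. The function names, the `max_mismatches` setting and the template sequence are assumptions; only the trnL g and h primer sequences are taken from Table 1.

```python
# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a sequence, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def best_primer_site(primer, template):
    """Return (position, mismatches) of the best ungapped primer site, or None."""
    primer, template = primer.upper(), template.upper()
    best = None
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        mismatches = sum(base not in IUPAC[p] for p, base in zip(primer, window))
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best


def in_silico_amplicon(forward, reverse, template, max_mismatches=2):
    """Return the predicted amplicon, or None if either primer fits poorly."""
    fwd = best_primer_site(forward, template)
    rev = best_primer_site(reverse_complement(reverse), template)
    if fwd is None or rev is None:
        return None
    if fwd[1] > max_mismatches or rev[1] > max_mismatches:
        return None
    end = rev[0] + len(reverse)
    if end <= fwd[0]:
        return None
    return template.upper()[fwd[0]:end]


TRNL_G = "GGGCAATCCTGAGCCAA"        # forward primer from Table 1
TRNL_H = "CCATTGAGTCTCTGCACCTATC"   # reverse primer from Table 1

# Hypothetical template: flanks and insert are made up for illustration.
template = "TTTT" + TRNL_G + "ACGTACGTAC" + reverse_complement(TRNL_H) + "AAAA"
amplicon = in_silico_amplicon(TRNL_G, TRNL_H, template)
print(len(amplicon), amplicon)
```

Running the same function over every sequence in an alignment of the target group and tallying the mismatch counts gives a first impression of primer universality before any laboratory work.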

Do the primers amplify the right part of the marker? Primers can target fragments of longer loci, e.g., parts of rbcL, matK, or nrITS. It is thus important that the segment the primers amplify is useful for your study. It should generate sequences that are identifiable in your reference library and variable enough for your intended level of identification. For example, targeting the trnL intron with the universal g-h primers will yield short amplicons, and these have less variation than the entire trnL-F region. Make sure you reassess your marker choice after selecting suitable primers.

How many primers per marker will you use? Long markers can be hard to amplify from degraded templates and can be split up into shorter fragments amplified with multiple primer pairs. Degraded DNA is a common challenge when working with older herbarium vouchers or processed herbal products. Different combinations of forward and reverse primers can also increase the chance of successful amplification, as having multiple different primers increases the chance that one of these has a good fit to the organism being tested. However, the primer pair with the best fit and targeting the shorter marker will amplify more effectively than other pairs or longer fragments, and this can lead to amplification bias.

Once a suitable combination of markers has been found and suitable primers or primer panels have been selected, it is important to test the primers on a sufficient number of your samples. Template DNA quality, DNA concentration, and the effects of inhibiting secondary metabolites can all influence the efficacy of the PCR and might require optimization to obtain the best possible results for the largest number of samples. This is beyond the scope of this book, but sufficient online resources are available to help you with optimization. In addition, there are many online discussion forums to troubleshoot PCR optimization.

The subsequent chapters in section 2 describe different sequencing platforms and approaches to obtain DNA sequences for downstream analysis, and section 3 provides an overview of applications of molecular identification of plants. Depending on whether one chooses standard DNA barcoding using Sanger sequencing, DNA metabarcoding using Ion Torrent, Illumina, or other platforms, or a variety of whole or reduced library representation genome sequencing approaches, one will need to choose different wet lab steps to create the relevant sequencing libraries. Check out the relevant chapter for your application to find out more.


  1. An author writes that she used DNA barcoding to identify Bellis perennis using rbcL. The generated query sequence matched 100% with the reference of Bellis perennis in GenBank. Can you think of two situations that would falsify this finding?
  2. You are planning to use DNA barcoding to distinguish herbal medicines based on Paeonia. In the literature you find that five Paeonia species are commonly used in herbal medicines and that these can be distinguished using nrITS2 sequences. In your study of 37 herbals, you find that 15 contain species A, 7 B, 5 C, 3 D, and 2 E, whereas five samples fail to amplify nrITS2. 2A) How can you be sure that these 37 products contain only these five species? 2B) What does your experiment tell you about the five samples that failed to amplify?
  3. You want to investigate if DNA barcoding can outperform morphology-based biodiversity assessments in terms of species identification. For what material do you expect DNA barcoding to be more useful than morphology-based identification?


matK – Plastid gene coding for maturase K. matK is one of the core plant DNA barcodes.

nrITS – Internal transcribed spacer (ITS) is a spacer situated between the small-subunit rDNA and large-subunit rDNA genes. In plants, it is flanked by the 18S and 26S rDNA genes. nrITS is split into two spacers, nrITS1 and nrITS2, with the 5.8S rDNA gene in between. nrITS is highly variable, and primers are designed in the conserved 18S, 5.8S, and 26S rDNA genes.

psbA-trnH – Plastid intergenic spacer region between the coding genes psbA and trnH. psbA-trnH has been advocated as a plant DNA barcoding marker.

Primer – Short DNA sequence used to amplify a marker.

rbcL – Plastid gene coding for ribulose-1,5-bisphosphate carboxylase-oxygenase. Most barcoding studies target the rbcLa region, but will refer to rbcL. rbcL is one of the core plant DNA barcodes. Plastids in plants are often incorrectly referred to as chloroplasts.


  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J. Mol. Biol. 215, 403–410. doi: 10.1016/S0022-2836(05)80360-2
  • Álvarez I, Wendel JF (2003) Ribosomal ITS sequences and plant phylogenetic inference. Mol. Phylogenet. Evol. 29, 417–434. doi: 10.1016/S1055-7903(03)00208-2
  • Bailey C (2003) Characterization of angiosperm nrDNA polymorphism, paralogy, and pseudogenes. Mol. Phylogenet. Evol. 29, 435–455. doi: 10.1016/j.ympev.2003.08.021
  • Barrett RDH, Hebert PDN (2005) Identifying spiders through DNA barcodes. Can. J. Zool. 83, 481–491. doi: 10.1139/z05-024
  • Blaxter M, Mann J, Chapman T, Thomas F, Whitton C, Floyd R, Abebe E (2005) Defining operational taxonomic units using DNA barcode data. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 1935–1943. doi: 10.1098/rstb.2005.1725
  • Bohmann K, Mirarab S, Bafna V, Gilbert MTP (2020) Beyond DNA barcoding: The unrealized potential of genome skim data in sample identification. Mol. Ecol. 29, 2521–2534. doi: 10.1111/mec.15507
  • Burns JM, Janzen DH, Hajibabaei M, Hallwachs W, Hebert PDN (2008) DNA barcodes and cryptic species of skipper butterflies in the genus Perichares in Area de Conservacion Guanacaste, Costa Rica. Proc Natl Acad Sci USA 105, 6350–6355. doi: 10.1073/pnas.0712181105
  • CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106, 12794–12797. doi: 10.1073/pnas.0905845106
  • Chen S, Yao H, Han J, Liu C, Song J, Shi L, Zhu Y, Ma X, Gao T, Pang X, Luo K, Li Y, Li X, Jia X, Lin Y, Leon C (2010) Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS ONE 5, e8613. doi: 10.1371/journal.pone.0008613
  • China Plant BOL Group, Li D-Z, Gao L-M, Li H-T, Wang H, Ge X-J, Liu J-Q, Chen Z-D, Zhou S-L, Chen S-L, Yang J-B, Fu C-X, Zeng C-X, Yan H-F, Zhu Y-J, Sun Y-S, Chen S-Y, Zhao L, Wang K, Yang T, Duan G-W (2011) Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proc Natl Acad Sci USA 108, 19641–19646. doi: 10.1073/pnas.1104551108
  • Cho Y, Mower JP, Qiu Y, Palmer JD (2004) Mitochondrial substitution rates are extraordinarily elevated and variable in a genus of flowering plants. Proc Natl Acad Sci USA 101, 17741–17746.
  • Coissac E, Hollingsworth PM, Lavergne S, Taberlet P (2016) From barcodes to genomes: extending the concept of DNA barcoding. Mol. Ecol. 25, 1423–1428. doi: 10.1111/mec.13549
  • DeSalle R, Egan MG, Siddall M (2005) The unholy trinity: taxonomy, species delimitation and DNA barcoding. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 1905–1916. doi: 10.1098/rstb.2005.1722
  • de Boer HJ, Ouarghidi A, Martin G, Abbad A, Kool A (2014) DNA barcoding reveals limited accuracy of identifications based on folk taxonomy. PLoS ONE 9, e84291. doi: 10.1371/journal.pone.0084291
  • de Queiroz K (2007) Species concepts and species delimitation. Syst. Biol. 56, 879–886. doi: 10.1080/10635150701701083
  • Fazekas AJ, Burgess KS, Kesanakurti PR, Graham SW, Newmaster SG, Husband BC, Percy DM, Hajibabaei M, Barrett SCH (2008) Multiple multilocus DNA barcodes from the plastid genome discriminate plant species equally well. PLoS ONE 3, e2802. doi: 10.1371/journal.pone.0002802
  • Fazekas AJ, Kesanakurti PR, Burgess KS, Percy DM, Graham SW, Barrett SCH, Newmaster SG, Hajibabaei M, Husband BC (2009) Are plant species inherently harder to discriminate than animal species using DNA barcoding markers? Mol. Ecol. Resour. 9 Suppl s1, 130–139. doi: 10.1111/j.1755-0998.2009.02652.x
  • Fehrer J, Gemeinholzer B, Chrtek J, Bräutigam S (2007) Incongruent plastid and nuclear DNA phylogenies reveal ancient intergeneric hybridization in Pilosella hawkweeds (Hieracium, Cichorieae, Asteraceae). Mol. Phylogenet. Evol. 42, 347–361. doi: 10.1016/j.ympev.2006.07.004
  • Felsenstein J (1973) Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst. Zool. 22, 240. doi: 10.2307/2412304
  • Garnett ST, Christidis L (2017) Taxonomy anarchy hampers conservation. Nature 546, 25–27. doi: 10.1038/546025a
  • Ghorbani A, Gravendeel B, Selliah S, Zarré S, de Boer HJ (2017a) DNA barcoding of tuberous Orchidoideae: a resource for identification of orchids used in Salep. Mol. Ecol. Resour. 17, 342–352.
  • Ghorbani A, Saeedi Y, de Boer HJ (2017b) Unidentifiable by morphology: DNA barcoding of plant material in local markets in Iran. PLoS ONE 12, e0175722. doi: 10.1371/journal.pone.0175722
  • Goldstein PZ, Desalle R, Amato G, Vogler AP (2000) Conservation genetics at the species boundary. Conserv. Biol. 14, 120–131. doi: 10.1046/j.1523-1739.2000.98122.x
  • Hausdorf B, Hennig C (2010) Species delimitation using dominant and codominant multilocus markers. Syst. Biol. 59, 491–503. doi: 10.1093/sysbio/syq039
  • Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proc. Biol. Sci. 270, 313–321. doi: 10.1098/rspb.2002.2218
  • Hebert PDN, Penton EH, Burns JM, Janzen DH, Hallwachs W (2004a) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proc Natl Acad Sci USA 101, 14812–14817. doi: 10.1073/pnas.0406166101
  • Hebert PDN, Stoeckle MY, Zemlak TS, Francis CM (2004b) Identification of birds through DNA Barcodes. PLoS Biol. 2, e312. doi: 10.1371/journal.pbio.0020312
  • Hobern D, Hebert P (2019) BIOSCAN - revealing eukaryote diversity, dynamics, and interactions. BISS 3. doi: 10.3897/biss.3.37333
  • Hobern D (2020) BIOSCAN: DNA barcoding to accelerate taxonomy and biogeography for conservation and sustainability. Genome 1–4. doi: 10.1139/gen-2020-0009
  • Hollingsworth PM, Li D-Z, van der Bank M, Twyford AD (2016) Telling plant species apart with DNA: from barcodes to genomes. Philos. Trans. R. Soc. Lond. B Biol. Sci. 371. doi: 10.1098/rstb.2015.0338
  • Hollingsworth PM (2011) Refining the DNA barcode for land plants. Proc Natl Acad Sci USA 108, 19451–19452. doi: 10.1073/pnas.1116812108
  • Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755. doi: 10.1093/bioinformatics/17.8.754
  • Ivanova NV, Kuzmina ML, Braukmann TWA, Borisenko AV, Zakharov EV (2016) Authentication of herbal supplements using next-generation sequencing. PLoS ONE 11, e0156426. doi: 10.1371/journal.pone.0156426
  • Kennedy D, Norman C (2005) What don’t we know? Science 309, 75. doi: 10.1126/science.309.5731.75
  • Knowles LL, Carstens BC (2007) Delimiting species without monophyletic gene trees. Syst. Biol. 56, 887–895. doi: 10.1080/10635150701701091
  • Kool A, de Boer HJ, Krüger A, Rydberg A, Abbad A, Björk L, Martin G (2012) Molecular identification of commercialized medicinal plants in southern Morocco. PLoS ONE 7, e39459. doi: 10.1371/journal.pone.0039459
  • Koski LB, Golding GB (2001) The closest BLAST hit is often not the nearest neighbor. J. Mol. Evol. 52, 540–542. doi: 10.1007/s002390010184
  • Kress WJ, Erickson DL, Jones FA, Swenson NG, Perez R, Sanjur O, Bermingham E (2009) Plant DNA barcodes and a community phylogeny of a tropical forest dynamics plot in Panama. Proc Natl Acad Sci USA 106, 18621–18626. doi: 10.1073/pnas.0909820106
  • Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE 2, e508. doi: 10.1371/journal.pone.0000508
  • Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH (2005) Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci USA 102, 8369–8374. doi: 10.1073/pnas.0503123102
  • Levin RA, Wagner WL, Hoch PC, Nepokroeff M, Pires JC, Zimmer EA, Sytsma KJ (2003) Family-level relationships of Onagraceae based on chloroplast rbcL and ndhF data. Am. J. Bot. 90, 107–115. doi: 10.3732/ajb.90.1.107
  • Mahadani P, Ghosh SK (2013) DNA Barcoding: A tool for species identification from herbal juices. DNA Barcodes 1, 35–38. doi: 10.2478/dna-2013-0002
  • Manzanilla V, Kool A, Nguyen Nhat L, Nong Van H, Le Thi Thu H, de Boer HJ (2018) Phylogenomics and barcoding of Panax: toward the identification of ginseng species. BMC Evol. Biol. 18, 44. doi: 10.1186/s12862-018-1160-y
  • Meyer CP, Paulay G (2005) DNA barcoding: error rates based on comprehensive sampling. PLoS Biol. 3, e422. doi: 10.1371/journal.pbio.0030422
  • Nixon KC (1999) The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15, 407–414. doi: 10.1111/j.1096-0031.1999.tb00277.x
  • O’Meara BC (2010) New heuristic methods for joint species delimitation and species tree inference. Syst. Biol. 59, 59–73. doi: 10.1093/sysbio/syp077
  • Olive DM, Bean P (1999) Principles and applications of methods for DNA-based typing of microbial organisms. J. Clin. Microbiol. 37, 1661–1669. doi: 10.1128/JCM.37.6.1661-1669.1999
  • Padial JM, Miralles A, De la Riva I, Vences M (2010) The integrative future of taxonomy. Front. Zool. 7, 16. doi: 10.1186/1742-9994-7-16
  • Palhares RM, Gonçalves Drummond M, Dos Santos Alves Figueiredo Brasil B, Pereira Cosenza G, das Graças Lins Brandão M, Oliveira G (2015) Medicinal plants recommended by the world health organization: DNA barcode identification associated with chemical analyses guarantees their quality. PLoS ONE 10, e0127866. doi: 10.1371/journal.pone.0127866
  • Palmer JD, Jansen RK, Michaels HJ, Chase MW, Manhart JR (1988) Chloroplast DNA variation and plant phylogeny. Ann Mo Bot Gard 75, 1180. doi: 10.2307/2399279
  • Piredda R, Simeone MC, Attimonelli M, Bellarosa R, Schirone B (2011) Prospects of barcoding the Italian wild dendroflora: oaks reveal severe limitations to tracking species identity. Mol. Ecol. Resour. 11, 72–83. doi: 10.1111/j.1755-0998.2010.02900.x
  • Purushothaman N, Newmaster SG, Ragupathy S, Stalin N, Suresh D, Arunraj DR, Gnanasekaran G, Vassou SL, Narasimhan D, Parani M (2014) A tiered barcode authentication tool to differentiate medicinal Cassia species in India. Genet Mol Res 13, 2959–2968.
  • Raclariu AC, Heinrich M, Ichim MC, de Boer H (2018) Benefits and limitations of DNA barcoding and metabarcoding in herbal product authentication. Phytochem. Anal. 29, 123–128. doi: 10.1002/pca.2732
  • Raclariu AC, Paltinean R, Vlase L, Labarre A, Manzanilla V, Ichim MC, Crisan G, Brysting AK, de Boer H (2017) Comparative authentication of Hypericum perforatum herbal products using DNA metabarcoding, TLC and HPLC-MS. Sci. Rep. 7, 1291. doi: 10.1038/s41598-017-01389-w
  • Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data System. Mol. Ecol. Notes 7, 355–364. doi: 10.1111/j.1471-8286.2007.01678.x
  • Ritland K, Clegg MT (1987) Evolutionary analysis of plant DNA sequences. Am. Nat. 130, S74–S100. doi: 10.1086/284693
  • Rubinoff D, Cameron S, Will K (2006) Are plant DNA barcodes a search for the Holy Grail? Trends Ecol. Evol. 21, 1–2. doi: 10.1016/j.tree.2005.10.019
  • Sang T, Crawford DJ, Stuessy TF (1995) Documentation of reticulate evolution in peonies (Paeonia) using internal transcribed spacer sequences of nuclear ribosomal DNA: implications for biogeography and concerted evolution. Proc Natl Acad Sci USA 92, 6813–6817. doi: 10.1073/pnas.92.15.6813
  • Sass C, Little DP, Stevenson DW, Specht CD (2007) DNA barcoding in the cycadales: testing the potential of proposed barcoding markers for species identification of cycads. PLoS ONE 2, e1154. doi: 10.1371/journal.pone.0001154
  • Schlick-Steiner BC, Steiner FM, Seifert B, Stauffer C, Christian E, Crozier RH (2010) Integrative taxonomy: a multisource approach to exploring biodiversity. Annu. Rev. Entomol. 55, 421–438. doi: 10.1146/annurev-ento-112408-085432
  • Smith MA, Poyarkov NA, Hebert PDN (2008) DNA BARCODING: CO1 DNA barcoding amphibians: take the chance, meet the challenge. Mol. Ecol. Resour. 8, 235–246. doi: 10.1111/j.1471-8286.2007.01964.x
  • Soltis DE, Kuzoff RK (1995) Discordance between nuclear and chloroplast phylogenies in the heuchera group (saxifragaceae). Evolution 49, 727–742. doi: 10.1111/j.1558-5646.1995.tb02309.x
  • Soltis PS, Soltis DE, Chase MW (1999) Angiosperm phylogeny inferred from multiple genes as a tool for comparative biology. Nature 402, 402–404. doi: 10.1038/46528
  • Stamatakis A (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690. doi: 10.1093/bioinformatics/btl446
  • Stanford AM, Harden R, Parks CR (2000) Phylogeny and biogeography of Juglans (Juglandaceae) based on matK and ITS sequence data. Am. J. Bot. 87, 872–882.
  • Struck TH, Feder JL, Bendiksby M, Birkeland S, Cerca J, Gusarov VI, Kistenich S, Larsson K-H, Liow LH, Nowak MD, Stedje B, Bachmann L, Dimitrov D (2018) Finding evolutionary processes hidden in cryptic species. Trends Ecol. Evol. 33, 153–163. doi: 10.1016/j.tree.2017.11.007
  • Sukumaran J, Knowles LL (2017) Multispecies coalescent delimits structure, not species. Proc Natl Acad Sci USA 114, 1607–1612. doi: 10.1073/pnas.1607921114
  • Swofford DL (2002) PAUP*: phylogenetic analysis using parsimony (* and other methods). Version. 4. Sinauer Associates, Sunderland, Massachusetts.
  • Taberlet P, Coissac E, Pompanon F, Brochmann C, Willerslev E (2012) Towards next-generation biodiversity assessment using DNA metabarcoding. Mol. Ecol. 21, 2045–2050. doi: 10.1111/j.1365-294X.2012.05470.x
  • Taberlet P, Coissac E, Pompanon F, Gielly L, Miquel C, Valentini A, Vermat T, Corthier G, Brochmann C, Willerslev E (2007) Power and limitations of the chloroplast trnL (UAA) intron for plant DNA barcoding. Nucleic Acids Res. 35, e14. doi: 10.1093/nar/gkl938
  • Tate JA, Fuertes Aguilar J, Wagstaff SJ, La Duke JC, Bodo Slotta TA, Simpson BB (2005) Phylogenetic relationships within the tribe Malveae (Malvaceae, subfamily Malvoideae) as inferred from ITS sequence data. Am. J. Bot. 92, 584–602. doi: 10.3732/ajb.92.4.584
  • Veldman S, Gravendeel B, Otieno JN, Lammers Y, Duijm E, Nieman A, Bytebier B, Ngugi G, Martos F, van Andel TR, de Boer HJ (2017) High-throughput sequencing of African chikanda cake highlights conservation challenges in orchids. Biodivers. Conserv. 26, 2029–2046. doi: 10.1007/s10531-017-1343-7
  • Veldman S, Otieno J, Gravendeel B, van Andel T, de Boer H (2014) Conservation of Endangered Wild Harvested Medicinal Plants: Use of DNA Barcoding. Novel Plant Bioresources: Applications in Food, Medicine and Cosmetics 81–88.
  • Wallace LJ, Boilard SMAL, Eagle SHC, Spall JL, Shokralla S, Hajibabaei M (2012) DNA barcodes for everyday life: Routine authentication of Natural Health Products. Food Res. Int 49, 446–452. doi: 10.1016/j.foodres.2012.07.048
  • White TJ, Bruns T, Lee S, Taylor J (1990) Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics, in: Innis MA, Gelfand DH, Sninsky JJ, White TJ (Eds.) PCR Protocols: A Guide to Methods and Applications. Academic Press, San Diego, CA, pp. 315–322.
  • Will KW, Rubinoff D (2004) Myth of the molecule: DNA barcodes for species cannot replace morphology for identification and classification. Cladistics 20, 47–55. doi: 10.1111/j.1096-0031.2003.00008.x
  • Yang Z, Rannala B (2017) Bayesian species identification under the multispecies coalescent provides significant improvements to DNA barcoding analyses. Mol. Ecol. 26, 3028–3036. doi: 10.1111/mec.14093
  • Yang Z (2015) The BPP program for species tree estimation and species delimitation. Curr. Zool. 61, 854–865. doi: 10.1093/czoolo/61.5.854


  1. Some things that might have been overlooked: (1) Does NCBI GenBank list more than one species of Bellis? If not, then it might be any other Bellis species not present in this database; (2) How much variation does rbcL have in Bellis? Does the query match 100% with more than one species of Bellis? If yes, then a more variable marker should be used.
  2. Answer for 2A) The study can ascertain that only these five species are present if it includes a sequence reference database of all other Paeonia species (or those possibly present). If the sequence reference database contains only the five common species, then no such conclusion can be made. 2B) The failed samples could: not include Paeonia; contain degraded Paeonia DNA that is not amplifiable; or contain inhibitors that make the DNA nonamplifiable.
  3. Answers could include: vegetative material such as roots, leaves, and seedlings, DNA extracts from bulk samples, soil DNA, faecal DNA, pollen DNA, or air-captured eDNA.

Chapter 11 Amplicon metabarcoding


What is metabarcoding?

DNA metabarcoding is an approach in which taxonomically informative regions of the DNA are amplified from mixed-template samples containing DNA from different taxa for identification (Pompanon et al. 2012; Riaz et al. 2011). These taxonomically informative regions, also referred to as DNA barcodes or markers, ideally have low intraspecific variability and high interspecific variability to be able to discriminate between species, and conserved regions for universal amplification of the targeted community (Coissac et al. 2016). To target these DNA barcode regions, some prior knowledge is required for the design of primers that are complementary to the conserved regions flanking the barcodes. Additionally, depending on the metabarcoding approach used, primers can contain unique nucleotide tags to discern between samples during downstream bioinformatic processing (Binladen et al. 2007; Valentini et al. 2009b). After PCR amplification, amplicons are built into libraries, where library indexes are added to allow multiple amplicon libraries to be sequenced in one flow cell (Elbrecht and Leese 2015; Elbrecht et al. 2017). Adapters specific to the sequencing platform are added to the PCR products (amplicons), which are then sequenced on a high-throughput sequencing (HTS) platform. The resulting sequences can be taxonomically identified by matching them to a reference database (De Barba et al. 2014; Kress and Erickson 2008; Taberlet et al. 2018, 2012). This method is useful for identifying different taxa from bulk samples of organismal DNA (Yu et al. 2012), and specifically for detecting plants from environmental DNA (eDNA) samples including water, soil, sediment, air, and organic remains such as faeces (Deiner et al. 2017; Taberlet et al. 2012).

Plant metabarcoding

Metabarcoding is based on the DNA barcoding concept (see Chapter 10 DNA barcoding). However, for metabarcoding, samples containing DNA from a mix of different taxa are typically used. One of the first studies to use metabarcoding on a parallel sequencing system to identify plants (referred to at the time as DNA barcoding) was by Valentini and colleagues (Valentini et al. 2009a), who analysed the diet of a variety of animals using their faeces. Earlier attempts at diet analyses were also made using chloroplast (Poinar et al. 2001) and nuclear regions (Bradley et al. 2007), though these are not strictly speaking metabarcoding studies since they did not use high-throughput sequencing. Identification of plants through barcoding has had a turbulent history due to the lack of consensus on which plant barcodes should be used as standards (Pennisi 2007). In the landmark paper by Hebert and colleagues (Hebert et al. 2003), it was shown that animal species can be confidently identified through a short and highly variable piece of mitochondrial DNA called cytochrome oxidase subunit 1 (CO1). This has led many research groups to search for a similar barcode for the identification of plants (Chase et al. 2007; Kress et al. 2005). For plant species identification, the metabarcoding community has relied heavily on short fragments of the plastid barcodes rbcL, trnH-psbA, matK, the P6 loop of the trnL intron, and the nuclear ribosomal internal transcribed spacers nrITS1 and nrITS2 (China Plant BOL Group et al. 2011; Hollingsworth et al. 2016). There is, however, still no consensus on which plant DNA barcode(s) perform best. Studies that test various DNA barcodes for specific groups of plants find large differences between them (e.g., Braukmann et al. 2017), while others find that none of the available DNA barcodes provides species discrimination in certain plant groups (Zarrei et al. 2015). The search for the universal plant barcode is thus still ongoing.

Sample types and application

Plant metabarcoding is widely used to study the taxonomic composition of mixed template samples such as water (Zimmermann et al. 2015) (see Chapter 3 DNA from water), soil and sediments (Yoccoz et al. 2012; Ariza et al. 2022) (see Chapter 4 DNA from soil), bryophyte spores (Stech et al. 2011), airborne pollen from ambient air (Sickel et al. 2015; Kraaijeveld et al. 2015; Polling et al. 2022) (see Chapter 5 DNA from pollen), honey, food and medicine (Hawkins et al. 2015; Raclariu et al. 2018) (see Chapter 6 DNA from food and medicine), faeces and coprolites (Valentini et al. 2009a; Polling et al. 2021) (see Chapter 7 DNA from faeces), ancient sediments (Alsos et al. 2016) (see Chapter 8 DNA from ancient sediments), ice and snow (Thomsen and Willerslev 2015; Varotto et al. 2021), plant macrofossils (Murray et al. 2012), whole insects (Kajtoch 2014), gut contents (McClenaghan et al. 2015), and epilithic samples (Apothéloz-Perret-Gentil et al. 2017). DNA extraction methods are highly dependent on the type of material used and are covered separately in Section 1 of this book.

Plant metabarcoding has been used in various types of applications including species delimitation (see Chapter 17 Species delimitation), archaeo- and palaeo-botany (Parducci et al. 2017) (see Chapter 21 Palaeobotany), healthcare (Reese et al. 2019) (see Chapter 23 Healthcare), food safety (Raclariu et al. 2017) (see Chapter 24 Food safety), environmental and biodiversity assessments (Fahner et al. 2016) (see Chapter 24 Environment and biodiversity assessments), wildlife trade (de Boer et al. 2017) (see Chapter 25 Wildlife trade), hay fever forecasts (Kraaijeveld et al. 2015) (see Chapter 5 DNA from pollen), water quality assessments (Smucker et al. 2020; Zimmermann et al. 2015) (see Chapter 3 DNA from water), and documenting environmental change (Jørgensen et al. 2012). These are some examples of plant-specific applications where metabarcoding has proven its value, though further detailed information can be found in the chapters referred to here.

Advantages and limitations of metabarcoding

DNA metabarcoding is a cost-effective method compared to metagenomics (Chua et al. 2021a) (see Chapter 12 Metagenomics) or target capture (see Chapter 14 Target capture), as only DNA from targeted taxa is amplified and sequenced (Taberlet et al. 2012). The tagging system makes it possible to process large numbers of samples simultaneously, further decreasing the sequencing costs and increasing the total sample throughput. DNA present in low quantities (e.g., from rare species) can be targeted and PCR-amplified using specific primers. It is also a useful method for samples with low-quality DNA (i.e., degraded DNA) since it targets small barcodes that are relatively stable through time (Goldberg et al. 2016; Deiner et al. 2017). For example, plant DNA can be sequenced from ice core samples as old as 500 000 years (Willerslev et al. 2007).

However, DNA metabarcoding also has its limitations, and the PCR amplification step has previously proven to be particularly problematic (Taberlet et al. 2012). This step can cause stochasticity (Murray et al. 2015) and create false positives (Ficetola et al. 2015), which stresses the need for both PCR and extraction replicates. However, depending on the specific research question, it may also be advisable to limit the number of PCR replicates and instead focus on sequencing depth (Smith and Peay 2014), although this would decrease species richness estimates (Dopheide et al. 2018).

Another drawback of DNA metabarcoding is primer binding bias due to mismatches between the primer and the template DNA. This can result in discrepancies between the proportion of the original taxa in the DNA extract and the amplified DNA sequences (Bista et al. 2018; Elbrecht and Leese 2015). Although quantitative results can be obtained from some primers using certain laboratory and bioinformatic controls (Ji et al. 2020; Piñol et al. 2019), this is still taxa-dependent and therefore not commonly used. Depending on the metabarcoding strategy, tag jumps during library building should also be taken into consideration as they can cause false sequence-to-sample assignments (Carøe and Bohmann 2020; Schnell et al. 2015).

Finally, the taxonomic assignment of sequences to species is heavily dependent on the DNA reference database used for sequence matching. When the reference database to which the resulting sequences are compared is incomplete and/or contains inaccurately identified species, this results in erroneously identified species and/or false negatives (Banchi et al. 2020; Meiklejohn et al. 2019). This also affects the species resolution of the results. For example, a reference database based on the trnL barcode region may give a resolution of 33% species identification on a large circum-arctic scale, but within a localised area, this resolution may increase to 77–93% (Sønstebø et al. 2010; Alsos et al. 2018; Chua et al. 2021b). Thus, both the plant marker of choice and the reference database used are important and often limiting factors in metabarcoding studies for species identification. Lastly, a sequence can match several different species with the same highest identity score, but this can be handled by using a Last Common Ancestor approach (e.g., using MEGAN (Huson et al. 2006) or OBITools (Boyer et al. 2016)).
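The Last Common Ancestor idea mentioned above can be illustrated with a minimal sketch. This is not the algorithm implemented in MEGAN or OBITools; it only shows the principle. The lineage table, the three-rank taxonomy, and the function names `best_hits` and `last_common_ancestor` are assumptions introduced for illustration.

```python
# Minimal sketch of a Last Common Ancestor (LCA) assignment.
# The lineages below are a hypothetical lookup table; real tools
# work on a full taxonomy with many more ranks.
RANKS = ("family", "genus", "species")
LINEAGES = {
    "Vaccinium myrtillus": ("Ericaceae", "Vaccinium", "Vaccinium myrtillus"),
    "Vaccinium uliginosum": ("Ericaceae", "Vaccinium", "Vaccinium uliginosum"),
    "Empetrum nigrum": ("Ericaceae", "Empetrum", "Empetrum nigrum"),
}


def best_hits(hits):
    """Return the taxa that share the highest identity score.

    hits: list of (taxon name, percent identity) tuples for one query.
    """
    top = max(identity for _, identity in hits)
    return [taxon for taxon, identity in hits if identity == top]


def last_common_ancestor(taxa):
    """Return (rank, name) of the lowest rank shared by all taxa."""
    lineages = [LINEAGES[taxon] for taxon in taxa]
    assigned = ("unassigned", "root")
    # Walk down the ranks and stop at the first rank where the taxa differ.
    for rank, names in zip(RANKS, zip(*lineages)):
        if len(set(names)) != 1:
            break
        assigned = (rank, names[0])
    return assigned


# Two Vaccinium species tie for the best score, so the query is
# assigned at genus level rather than to either species.
example_hits = [
    ("Vaccinium myrtillus", 100.0),
    ("Vaccinium uliginosum", 100.0),
    ("Empetrum nigrum", 96.0),
]
example_assignment = last_common_ancestor(best_hits(example_hits))
```

In this toy example the assignment falls back to the genus Vaccinium; had the Empetrum hit tied as well, it would fall back to the family.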

Setting up a metabarcoding study

At the start of any (plant) metabarcoding study lies a clearly defined research question. A study design should furthermore encompass a clear sampling strategy and the identification of suitable DNA extraction techniques for the sample type used before carrying out downstream analysis (Zinger et al. 2019). As the chapters in Section 1 already detail DNA extraction methods based on specific starting materials, this section will cover the subsequent steps: selecting the plant barcodes that best answer the research question, choosing a nucleotide tagging strategy, sequencing, and finally analysing the sequence output using bioinformatics pipelines.

Barcode choice

Barcode choice is one of the most important aspects of metabarcoding studies as it will determine which taxa are identified and to what resolution. Considerable efforts have gone into constructing libraries for these plant barcodes and in assessing their limitations (CBOL Plant Working Group 2009; Cowan et al. 2006; Fazekas et al. 2012; Hollingsworth et al. 2011; Kress 2017). The resolution of a metabarcoding study often depends heavily on how far the pool of potentially identifiable species can be narrowed down, e.g., using the trnL P6 loop one can make species-level identifications in the Greenland flora, but only family-level identifications in a tropical rainforest. The objective of the study determines the level of taxonomic resolution needed, and thus the approach (marker, replicates, etc.), e.g., whether only relative abundances at the family level are desired or whether specific species in a vegetation plot need to be identified from soil. Different research groups use different ‘preferred’ barcodes that they consider best suited for their specific target plants. Despite this lack of consensus, the efficacy of metabarcoding for identifying the majority of plant species from plant mixtures still makes it a very useful tool. When choosing barcodes for metabarcoding studies, three factors must be considered: 1) sequence availability and presence in a reference library, 2) discriminatory power / taxonomic resolution, and 3) degree of DNA degradation in the sample (Hollingsworth et al. 2011). These three factors are briefly explained below.

  1. The first step is to check whether or not reference libraries exist for the sequences of the targeted organism(s). This is because barcodes are only useful if the sequences for the targeted organism(s) are available in sequence repositories or reference libraries (Weigand et al. 2019). For some barcodes and specific geographic regions, optimised plant reference libraries exist that minimise inaccurate identification of sequences. One such example is the arctic boreal vascular plant and bryophyte database that is based on the P6 loop of trnL (Sønstebø et al. 2010). A curated global plant database is also available for nrITS2 (Banchi et al. 2020). Premade reference databases are not complete and it is therefore recommended to compare several databases to obtain the best resolution. Another option is to construct a tailored reference database, for example using the BOLD data portal or in GenBank using the e-utilities tool kit. The use of the publicly available GenBank database is generally discouraged as it contains many erroneous sequences (e.g., Steinegger and Salzberg 2020). If the target organisms are not present in any public sources, then one would opt for constructing de novo reference libraries. The idea is to sequence barcodes from specimens collected at the study site, which are then assigned taxonomic identifications (see Chapter 10 DNA barcoding). The construction of regional reference libraries usually employs a combination of both strategies described above. As a last resort, the obtained sequences can be searched with BLAST against a public database. This strategy can yield multiple taxonomic assignments for a single sequence, so a similarity threshold has to be chosen, which is to some extent arbitrary.
  2. Discriminatory power refers to how effectively the barcodes can discriminate between closely related species and is linked to the variability of the locus. Typically, barcodes can only identify plants up to a certain taxonomic level (resolution) depending on the barcode used and the group of plants targeted. Moreover, because reference libraries are incomplete for all DNA barcodes, some species may only be detected using one DNA barcode while others may only be detected by another. Therefore, using a single primer set will most often not result in the recovery of all species present in a sample. We recommend adopting a multilocus approach to gain highly resolved taxonomic coverage for complex samples (see e.g. Arulandhu et al. 2017).
  3. DNA is relatively unstable in the environment and can degrade quickly depending on certain factors such as age, transport, and abiotic factors (Deiner et al. 2017). In highly degraded and/or old materials, the use of very short, highly distinctive barcodes is recommended (e.g., P6 loop of trnL intron). Although this can provide a good indication of the plant community from mixed samples, some taxa cannot be identified beyond the family level (e.g., Asteraceae and Poaceae). Therefore, when possible, it is recommended to use the longer and in some cases more distinctive nuclear ribosomal barcodes ITS1 (De Barba et al. 2014; Omelchenko et al. 2019) and/or ITS2 (Yao et al. 2010). However, the nuclear ITS region is also present in fungi and in order to avoid amplification of fungal DNA, plant-specific primers should be used (Cheng et al. 2016; Chen et al. 2010; Moorhouse-Gann et al. 2018; Omelchenko et al. 2019; Timpano et al. 2020).
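The first two factors above can be checked before any laboratory work by comparing a list of target species against the reference libraries available for each candidate barcode. The sketch below is a minimal illustration under stated assumptions: the species labels, the contents of the two reference libraries, and the function names `coverage` and `multilocus_coverage` are all hypothetical, and presence in a library says nothing about whether the barcode actually discriminates the species.

```python
# Minimal sketch: how much of a target species list is represented in the
# reference library of each candidate barcode, and in a multilocus set.
# All species labels and library contents are hypothetical.

def coverage(target_species, reference_species):
    """Fraction of target species with at least one reference sequence."""
    targets = set(target_species)
    if not targets:
        return 0.0
    return len(targets & set(reference_species)) / len(targets)


def multilocus_coverage(target_species, references):
    """Coverage when a species counts as covered by any of the barcodes.

    references: dict mapping barcode name -> iterable of species names.
    """
    covered = set()
    for species in references.values():
        covered.update(species)
    return coverage(target_species, covered)


targets = ["species A", "species B", "species C", "species D"]
references = {
    "trnL P6 loop": ["species A", "species B"],
    "nrITS2": ["species B", "species C"],
}

per_barcode = {name: coverage(targets, ref) for name, ref in references.items()}
combined = multilocus_coverage(targets, references)
```

Here each barcode alone covers half of the target list, while the two together cover three of the four species, which mirrors the argument for a multilocus approach. Species missing from every library (species D in this example) would be candidates for de novo reference sequencing.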

Metabarcoding nucleotide tagging strategies

In the metabarcoding laboratory workflow, unique nucleotide tags are added to amplicons, and these tags are used to assign sequences to the sample they originate from (Binladen et al. 2007). This allows for the pooling of many labelled PCR replicates for sequencing, and dramatically increases the throughput. Labelling amplicons with unique nucleotide tags can be done at two stages during a metabarcoding workflow: prior to library building as 5’ nucleotide tags added to the amplicons, and/or after library completion as library indexes. The strategies to achieve this labelling can be condensed into three main approaches: the ‘one-step PCR’ approach, the ‘two-step PCR’ approach, and the ‘tagged PCR approach’.

In the ‘one-step PCR’ approach, the metabarcoding barcode is amplified and built into libraries during one PCR. This is achieved through the use of metabarcoding primers that carry both adapters and library indexes (Elbrecht and Leese 2015; Elbrecht et al. 2017), though unique nucleotide tags instead of library indexes can also be added in the one-step PCR approach (Elbrecht and Steinke 2018). In this approach, each PCR replicate is a library.

In the ‘two-step PCR’ approach, sample extracts are PCR-amplified with metabarcoding primers that only carry 5’ tails. These are added to act as templates for the following second PCR and do not include any labelling. The second PCR is carried out on each PCR product with primers that carry adapters and indexes (Galan et al. 2018; Miya et al. 2015; Swift et al. 2018), although unique nucleotide tags can also be added in the first PCR (Kitson et al. 2019). In the two-step PCR approach, each PCR replicate is also a library.

In the ‘tagged PCR’ approach, DNA extracts are PCR amplified with metabarcoding primers that carry 5’ unique nucleotide tags. Next, the individually 5’ tagged PCR products are pooled and library preparation is carried out on the pools (first demonstrated by Binladen et al. 2007 on the 454 FLX platform). Library preparation can be with (Drinkwater et al. 2019; Hibert et al. 2013) or without (Carøe and Bohmann 2020; Sigsgaard et al. 2017) an indexing PCR step. Care should be taken when using this approach, as several studies have shown it to be prone to so-called tag-jumping, where amplicon sequences carry false combinations of nucleotide tags after amplification (Schnell et al. 2015). This can be avoided using specific library preparation protocols (Carøe and Bohmann 2020; Sigsgaard et al. 2017). Finally, indexes can also be ligated to the amplicons with the primers, a technique used for example in Nanopore sequencing.
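How tags are used to assign sequences to samples, and how unused tag combinations can reveal tag jumps, can be sketched as follows. This is a simplified illustration, not the procedure of any particular pipeline: it assumes exact tag matches, reads that are already oriented, a fixed tag length, and a reverse tag that appears as written at the 3’ end of the read. The tag sequences, sample names, and the function `demultiplex` are hypothetical.

```python
from collections import Counter

# Minimal sketch of demultiplexing by 5' tag pairs.
# Assumptions: exact matches only, reads already oriented, fixed tag length,
# reverse tag written as it appears at the 3' end of the read.

def demultiplex(reads, tag_len, used_pairs):
    """Assign reads to samples and flag unused tag combinations.

    used_pairs: dict mapping (forward tag, reverse tag) -> sample name.
    Returns (reads per sample, possible tag jumps per tag pair, unknown count).
    """
    forward_tags = {fwd for fwd, _ in used_pairs}
    reverse_tags = {rev for _, rev in used_pairs}
    per_sample = Counter()
    tag_jumps = Counter()
    unknown = 0
    for read in reads:
        pair = (read[:tag_len], read[-tag_len:])
        if pair in used_pairs:
            per_sample[used_pairs[pair]] += 1
        elif pair[0] in forward_tags and pair[1] in reverse_tags:
            # Both tags exist in the experiment, but never in this
            # combination: a candidate tag jump.
            tag_jumps[pair] += 1
        else:
            unknown += 1
    return per_sample, tag_jumps, unknown


used_pairs = {("ACGT", "TTAG"): "sample 1", ("GGCA", "CATC"): "sample 2"}
insert = "GGGCAATCCTGAGCCAA"  # placeholder amplicon sequence
reads = [
    "ACGT" + insert + "TTAG",
    "ACGT" + insert + "TTAG",
    "GGCA" + insert + "CATC",
    "ACGT" + insert + "CATC",  # tags from two different samples
    "TTTT" + insert + "CATC",  # forward tag not used in the experiment
]
per_sample, tag_jumps, unknown = demultiplex(reads, 4, used_pairs)
```

Counting reads that carry unused tag combinations gives a rough indication of how much tag jumping occurred in a library, which is one reason to leave some tag combinations deliberately unused.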

With the cost of sequencing decreasing exponentially, more effort can be put into applying technical PCR replicates to circumvent sequencing errors and other PCR-related issues. When using PCR replicates, they should be placed in separate locations on the same 96-well plate or, ideally, in separate plates.

Taxa identification lies at the core of any ecological research question. Thus, it is crucial to perform a reliable and reproducible identification workflow to ensure correct identification. In general, care should be taken to avoid cross-contamination between samples by working in clean laboratories with filter-tipped pipettes and separate pre- and post-PCR labs. Normalisation of the amplicons prior to library construction is crucial to avoid overamplification of the most represented taxa in the sample. Since some often-used plant-specific marker regions are very short (e.g., trnL P6 loop, 8 to 152 bp), they are prone to picking up the slightest contaminants from the environment. It is therefore recommended to work in a clean environment, e.g., an ancient DNA laboratory with protective clothing.
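Normalisation of amplicons is commonly done by pooling them in equimolar amounts. A minimal sketch of the arithmetic is given below. It assumes double-stranded DNA with an average mass of about 660 g/mol per base pair, and the concentrations, amplicon lengths, target amount, and function names are hypothetical examples rather than recommended values.

```python
# Minimal sketch of equimolar pooling of amplicons.
# Assumption: double-stranded DNA at roughly 660 g/mol per base pair.

def nanomolar(ng_per_ul, length_bp):
    """Convert a mass concentration (ng/uL) to a molar concentration (nM)."""
    return ng_per_ul * 1e6 / (660.0 * length_bp)


def equimolar_volumes(amplicons, target_fmol):
    """Volume (uL) of each amplicon needed to contribute target_fmol.

    amplicons: dict mapping name -> (concentration in ng/uL, length in bp).
    1 nM equals 1 fmol/uL, so volume = target amount / molar concentration.
    """
    return {
        name: target_fmol / nanomolar(conc, length)
        for name, (conc, length) in amplicons.items()
    }


amplicons = {
    "PCR replicate 1": (6.6, 100),   # 100 nM
    "PCR replicate 2": (13.2, 100),  # 200 nM
}
volumes = equimolar_volumes(amplicons, target_fmol=200)
```

In this example the more concentrated replicate contributes half the volume of the other, so that both add the same number of molecules to the pool.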

Sequencing platforms

The preferred platforms for sequencing are currently IonTorrent and Illumina. Both platforms require an additional post-ligation PCR step or PCR-free ligation of platform-specific adapters to the amplicons before sequencing. However, due to the different technologies behind the two platforms, both the error rates and error types can differ. For Illumina (optical sequencing), a substitution error rate of 0.1% has been identified, while IonTorrent (based on detection of hydrogen ions) can show up to 1% indel errors (Quail et al. 2012; Shin et al. 2017). The IonTorrent platform has a higher error rate when the material contains many homopolymers because no good correlation exists between the number of identical bases incorporated and the observed voltage change (Bragg et al. 2013). Illumina is the most often used platform in metabarcoding studies due to its lower error rates and the generation of relatively long reads by paired-end sequencing (Forin-Wiart et al. 2018). Since IonTorrent and Illumina are limited in the maximum length of amplicons that can be sequenced (up to 600 bp), more recent sequencing platforms like Nanopore and PacBio are increasingly being used. These long-read technologies have the advantage of being able to retrieve, for example, the whole nuclear ITS or plastid matK regions. For more information on sequencing platforms, please refer to Chapter 9 Sequencing platforms and data types.
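To give a feeling for what these error rates mean for short amplicons, the sketch below estimates the fraction of reads expected to be free of errors. It is a deliberate simplification: it treats the quoted rates as uniform, independent per-base probabilities, which real platforms do not follow (errors depend on position and sequence context), and the 100 bp amplicon length and the function name are assumptions chosen for illustration.

```python
# Rough estimate of the fraction of error-free reads, assuming a uniform and
# independent per-base error rate (a simplification of real error profiles).

def error_free_fraction(per_base_error_rate, read_length):
    """Probability that a read of the given length contains no errors."""
    return (1.0 - per_base_error_rate) ** read_length


AMPLICON_LENGTH = 100  # hypothetical amplicon length in bp

fraction_at_0_1_percent = error_free_fraction(0.001, AMPLICON_LENGTH)
fraction_at_1_percent = error_free_fraction(0.01, AMPLICON_LENGTH)
```

Under these assumptions, roughly 90% of 100 bp reads would be error-free at a 0.1% error rate, against roughly 37% at 1%, which illustrates why error filtering and denoising are needed downstream.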

Bioinformatics tools

Several different bioinformatic tools can be used to analyse the sequence output. Some commonly used packages are OBITools (Boyer et al. 2016), BEGUM (Yang et al. 2020), MOTHUR (Schloss et al. 2009), QIIME (Caporaso et al. 2010), and DADA2 (Callahan et al. 2016). The bioinformatics workflow includes these common steps: quality check of raw reads, removal of adapter sequences, demultiplexing, filtering of erroneous sequences, sequence dereplication, removal of singletons and PCR/sequencing errors, clustering/denoising, and taxonomic annotation using reference databases (most commonly using BLASTn). Depending on the pipeline used, sequences are either clustered into OTUs based on a sequence similarity level (often 97%), as in QIIME, MOTHUR, and VSEARCH, or denoised into strictly unique sequences called ASVs, as in DADA2. The choice to cluster sequences into OTUs or denoise into ASVs is dependent on the research question. Clustering sequences into OTUs reduces the impact of sequencing errors, but increases false negatives as multiple similar species are clustered into a single OTU. In datasets where closely related species are expected to be present, such as species with homopolymers (e.g., Vaccinium spp.), denoising sequences into ASVs would be preferred since these homopolymers can be sorted out into separate sequence variants. However, using this technique may also artificially inflate diversity as species may have more than one sequence variant, especially if the reference database used is incomplete. Alternatively, sequences can also be assigned directly to taxa, as in OBITools, one of the most frequently used open-source programs for plant metabarcoding studies. OBITools was specifically designed for the analysis of metabarcoding data generated from HTS. It relies on filtering and sorting algorithms, which allow users to build pipelines tailored to their needs. A distinct feature of OBITools is its ability to account for taxonomic annotations, which allows the sorting of sequences based on taxonomy instead of OTUs/ASVs.
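Three of the steps listed above (dereplication, singleton removal, and clustering at a similarity threshold) can be sketched in a few lines. This is a toy illustration and not how QIIME, MOTHUR, VSEARCH, or DADA2 work internally: identity is computed position by position without alignment, sequences of different length are never merged, and the function names and example reads are hypothetical. Setting the threshold to 1.0 keeps every unique sequence separate, which only loosely resembles ASVs, since real denoisers rely on an error model rather than a threshold.

```python
from collections import Counter

# Toy sketch of dereplication, singleton removal, and greedy clustering.
# Simplification: identity is position-wise on equal-length sequences,
# with no alignment step.

def dereplicate(reads, min_count=2):
    """Collapse identical reads; drop sequences seen fewer than min_count times."""
    counts = Counter(reads)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_count]


def identity(seq_a, seq_b):
    """Fraction of identical positions; 0.0 if the lengths differ."""
    if len(seq_a) != len(seq_b):
        return 0.0
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def greedy_cluster(uniques, threshold=0.97):
    """Assign each unique sequence to the first centroid it matches.

    uniques: list of (sequence, count), most abundant first.
    Returns a list of (centroid sequence, total count).
    """
    clusters = []
    for seq, n in uniques:
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster[1] += n
                break
        else:
            clusters.append([seq, n])
    return [(centroid, total) for centroid, total in clusters]


variant_a = "ACGT" * 10            # 40 bp
variant_b = "T" + variant_a[1:]    # one mismatch, 97.5% identical to variant_a
variant_c = "TGCA" * 10            # unrelated sequence
reads = [variant_a] * 5 + [variant_b] * 3 + [variant_c] * 2 + ["GGGG" * 10]

uniques = dereplicate(reads)
otus = greedy_cluster(uniques, threshold=0.97)
exact_variants = greedy_cluster(uniques, threshold=1.0)
```

At 97% the two closely related variants collapse into one OTU, whereas at 100% they remain separate, which is the trade-off between merging similar species and inflating diversity described above.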

Future of metabarcoding

Currently, metabarcoding is the dominant technique used in the identification of plants from mixed samples. Developments and improvements in addressing methodological challenges such as PCR bias may one day allow for unbiased quantitative inferences from metabarcoding datasets. This would be a huge step forward for the metabarcoding community since it is still controversial to use read counts as an indication for biomass (Deagle et al. 2019). With the continued advances in HTS technologies coupled with the inherent limitations of metabarcoding, there is also a possibility that alternative HTS techniques can be used in the future. For example, the development of more regional DNA reference databases based on whole organelle genomes instead of single barcode regions (Coissac et al. 2016) (see Chapter 10 DNA barcoding) would encourage the use of HTS techniques that rely on whole genomes or multiple non-standard barcode regions for taxonomic identification. Particularly, if sequencing becomes cheaper and if the limitations of metagenomics (see Chapter 12 Metagenomics) or target capture (see Chapter 14 Target capture) are addressed, we may see an increase in other types of methods used to identify plants in mixed templates. However, metabarcoding has the advantage of being a cheaper option, where large numbers of samples can be processed for meaningful statistical analysis. Bioinformatics pipelines are also well-established and better reference databases are available for mini barcodes as compared to whole organelles. This makes metabarcoding the preferred technique for many applications. In addition, ongoing efforts to build curated reference databases, design better primers, and detect potential plant-specific barcode regions might increase species resolution and circumvent many of the drawbacks associated with metabarcoding (Chua et al. 2021c).

Figure 1.

Chapter 11 Infographic: Visual representation of the content of this chapter.

Metabarcoding could potentially be used to determine plant composition in a landscape from bulk arthropod samples. Bulk arthropod samples have been used for biodiversity monitoring of vertebrates (Lynggaard et al. 2019), but they have not yet been used for any plant-related studies. Another potential application of metabarcoding is in forensic genetics (see Chapter 26 Forensic genetics, botany and palynology), where plants are used as evidence in criminal investigations (Bryant 2013). For example, morphological identification of pollen grains has been used to solve murders and determine marijuana distribution locations (Alotaibi et al. 2020; Bryant 2013). However, metabarcoding is underutilised in these applications, where morphological identification is still the main technique. One possible reason for this lack of utilisation is that pollen DNA extraction destroys the samples, which therefore cannot be stored as evidence (Bell et al. 2016). Metabarcoding could also potentially be used in meta-phylogeographic studies to simultaneously study the phylogeographic features and intraspecies patterns of many species (Turon et al. 2019).


  1. How can overamplification of the most represented taxa in a single sequencing run of multiple complex mixtures be avoided?
  2. Which DNA barcode region is most suitable for dealing with plant DNA from samples where DNA is expected to be degraded?
  3. The nuclear ribosomal ITS region is shared between plants and fungi. How can undesirable fungal DNA amplification be avoided?


Adapters – Specific nucleotide sequences unique to different types of sequencing platforms that are added to amplicon libraries to allow for the attachment of library fragments to the flow cell for sequencing.

Amplicons – Products of PCR amplification.

ASVs – Amplicon sequence variants, also known as exact sequence variants or zero-radius OTUs. Although sometimes considered synonymous with OTUs, they correspond to all the unique reads in a dataset and do not require the clustering used to create OTUs.

Barcode – Targeted gene region, see Locus.

Demultiplexing – Bioinformatics step of assigning sequences to samples based on assigned nucleotide tags and/or library indexes.

Epilithic – Plant growing on surfaces of rocks, e.g., seaweeds.

Homopolymers – Stretches of a single nucleotide repeated in tandem, usually more than 7 nucleotides long.

Indel errors – Insertions or deletions in sequences resulting from mutations.

ITS – The internal transcribed spacer is a nuclear ribosomal region found between the small subunit ribosomal RNA (rRNA) and large-subunit rRNA genes.

Library indexes – Nucleotide index added to amplicon libraries to allow for the parallel sequencing of multiple libraries, which can be used bioinformatically to assign reads to the correct amplicon libraries.

Locus – Section and position in a chromosome where a particular DNA sequence is located. It can also be referred to as a barcode.

Macrofossils – Preserved plant remains large enough to be seen without a microscope.

matK – Maturase K is a gene found in the chloroplast genome.

Meta-phylogeography – Study of phylogeographic features and intraspecies variation.

Multiplexing – Parallel amplification of barcodes in one PCR reaction.

OTU – Operational taxonomic unit. The term is used to categorise clusters of similar sequences.

Overhangs – Stretch of unpaired nucleotides at the end of DNA fragments.

PCR – Polymerase chain reaction.

PCR stochasticity – Uneven amplification of molecules during PCR that can be a result of some sequences being present in lower copy numbers than others.

Phylogeography – Study of the origin of genetic variation within closely related species across a landscape.

Primers – A short single-stranded nucleic acid sequence that serves as a starting point for the DNA replication in the PCR.

Primer set – Pair of primers (see Primers) complementary to the 5’ end and 3’ end of the flanking regions of a locus.

Primer bias – Differences in DNA amplification due to a primer inefficiently binding to the target template. This can result from sequence divergence in the primer binding sites.

qPCR – Polymerase chain reaction used for quantifying DNA.

rbcL – The ribulose-1,5-bisphosphate carboxylase large subunit gene is found in the chloroplast genome.

Singletons – A sequence only present in one copy.

Nucleotide tags – Short nucleotide sequences added at the 5’ end of the primer in metabarcoding studies.

Tag jumps – Generation of amplicons with different tags than originally used, resulting in false positives in the data. For more detail, see Schnell et al. (2015).

Taxa – Plural of taxon. A taxon is a group of organisms that form a taxonomic group.

Taxonomic assignment – Matching the obtained sequences to taxa names.

trnH-psbA – An intergenic spacer region found in the chloroplast genome.

trnL – The trnL gene is part of the trnL-F region of the chloroplast genome.


  • Alotaibi SS, Sayed SM, Alosaimi M, Alharthi R, Banjar A, Abdulqader N, Alhamed R (2020) Pollen molecular biology: applications in the forensic palynology and future prospects: A review. Saudi J. Biol. Sci. 27, 1185–1190. doi: 10.1016/j.sjbs.2020.02.019
  • Alsos IG, Ehrich D, Seidenkrantz M-S, Bennike O, Kirchhefer AJ, Geirsdottir A (2016) The role of sea ice for vascular plant dispersal in the Arctic. Biol. Lett. 12. doi: 10.1098/rsbl.2016.0264
  • Alsos IG, Lammers Y, Yoccoz NG, Jørgensen T, Sjögren P, Gielly L, Edwards ME (2018) Plant DNA metabarcoding of lake sediments: How does it represent the contemporary vegetation. PLoS ONE 13, e0195403. doi: 10.1371/journal.pone.0195403
  • Apothéloz-Perret-Gentil L, Cordonier A, Straub F, Iseli J, Esling P, Pawlowski J (2017) Taxonomy-free molecular diatom index for high-throughput eDNA biomonitoring. Mol. Ecol. Resour. 17, 1231–1242. doi: 10.1111/1755-0998.12668
  • Ariza M, Fouks B, Mauvisseau Q, Halvorsen R, Alsos IG, de Boer HJ (2022) Plant biodiversity assessment through soil eDNA reflects temporal and local diversity. Methods Ecol. Evol., 00, 1–16. doi: 10.1111/2041-210X.13865
  • Arulandhu AJ, Staats M, Hagelaar R, Voorhuijzen MM, Prins TW, Scholtens I, Costessi A, Duijsings D, Rechenmann F, Gaspar FB, Barreto Crespo MT, Holst-Jensen A, Birck M, Burns M, Haynes E, Hochegger R, Klingl A, Lundberg L, Natale C, Niekamp H, Kok E (2017) Development and validation of a multi-locus DNA metabarcoding method to identify endangered species in complex samples. Gigascience 6, 1–18. doi: 10.1093/gigascience/gix080
  • Banchi E, Ametrano CG, Greco S, Stanković D, Muggia L, Pallavicini A (2020) PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding. Database (Oxford) 2020. doi: 10.1093/database/baz155
  • Bell KL, Burgess KS, Okamoto KC, Aranda R, Brosi BJ (2016) Review and future prospects for DNA barcoding methods in forensic palynology. Forensic Sci. Int. Genet. 21, 110–116. doi: 10.1016/j.fsigen.2015.12.010
  • Binladen J, Gilbert MTP, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E (2007) The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS ONE 2, e197. doi: 10.1371/journal.pone.0000197
  • Bista I, Carvalho GR, Tang M, Walsh K, Zhou X, Hajibabaei M, Shokralla S, Seymour M, Bradley D, Liu S, Christmas M, Creer S (2018) Performance of amplicon and shotgun sequencing for accurate biomass estimation in invertebrate community samples. Mol. Ecol. Resour. doi: 10.1111/1755-0998.12888
  • Boyer F, Mercier C, Bonin A, Le Bras Y, Taberlet P, Coissac E (2016) obitools: a unix-inspired software package for DNA metabarcoding. Mol. Ecol. Resour. 16, 176–182. doi: 10.1111/1755-0998.12428
  • Bradley BJ, Stiller M, Doran-Sheehy DM, Harris T, Chapman CA, Vigilant L, Poinar H (2007) Plant DNA sequences from feces: potential means for assessing diets of wild primates. Am. J. Primatol. 69, 699–705. doi: 10.1002/ajp.20384
  • Bragg LM, Stone G, Butler MK, Hugenholtz P, Tyson GW (2013) Shining a light on dark sequencing: characterising errors in Ion Torrent PGM data. PLoS Comput. Biol. 9, e1003031. doi: 10.1371/journal.pcbi.1003031
  • Braukmann TWA, Kuzmina ML, Sills J, Zakharov EV, Hebert PDN (2017) Testing the efficacy of DNA barcodes for identifying the vascular plants of Canada. PLoS ONE 12, e0169515. doi: 10.1371/journal.pone.0169515
  • Bryant VM (2013) Use of quaternary proxies in forensic science | analytical techniques in forensic palynology, in: Encyclopedia of Quaternary Science. Elsevier, pp. 556–566. doi: 10.1016/B978-0-444-53643-3.00363-0
  • Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP (2016) DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583. doi: 10.1038/nmeth.3869
  • Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Peña AG, Goodrich JK, Gordon JI, Huttley GA, Kelley ST, Knights D, Koenig JE, Ley RE, Lozupone CA, McDonald D, Muegge BD, Pirrung M, Reeder J, Knight R (2010) QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336. doi: 10.1038/nmeth.f.303
  • Carøe C, Bohmann K (2020) Tagsteady: a metabarcoding library preparation protocol to avoid false assignment of sequences to samples. BioRxiv. doi: 10.1101/2020.01.22.915009
  • CBOL Plant Working Group (2009) A DNA barcode for land plants. Proc Natl Acad Sci USA 106, 12794–12797. doi: 10.1073/pnas.0905845106
  • Chase MW, Cowan RS, Hollingsworth PM, van den Berg C, Madriñán S, Petersen G, Seberg O, Jørgensen T, Cameron KM, Carine M, Pedersen N, Hedderson TAJ, Conrad F, Salazar GA, Richardson JE, Hollingsworth ML, Barraclough TG, Kelly L, Wilkinson M (2007) A proposal for a standardised protocol to barcode all land plants. Taxon 56, 295–299. doi: 10.1002/tax.562004
  • Cheng T, Xu C, Lei L, Li C, Zhang Y, Zhou S (2016) Barcoding the kingdom Plantae: new PCR primers for ITS regions of plants with improved universality and specificity. Mol. Ecol. Resour. 16, 138–149. doi: 10.1111/1755-0998.12438
  • Chen S, Yao H, Han J, Liu C, Song J, Shi L, Zhu Y, Ma X, Gao T, Pang X, Luo K, Li Y, Li X, Jia X, Lin Y, Leon C (2010) Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS ONE 5, e8613. doi: 10.1371/journal.pone.0008613
  • China Plant BOL Group, Li D-Z, Gao L-M, Li H-T, Wang H, Ge X-J, Liu J-Q, Chen Z-D, Zhou S-L, Chen S-L, Yang J-B, Fu C-X, Zeng C-X, Yan H-F, Zhu Y-J, Sun Y-S, Chen S-Y, Zhao L, Wang K, Yang T, Duan G-W (2011) Comparative analysis of a large dataset indicates that internal transcribed spacer (ITS) should be incorporated into the core barcode for seed plants. Proc Natl Acad Sci USA 108, 19641–19646. doi: 10.1073/pnas.1104551108
  • Chua PYS, Crampton-Platt A, Lammers Y, Alsos IG, Boessenkool S, Bohmann K (2021a) Metagenomics: a viable tool for reconstructing herbivore diet. Mol. Ecol. Resour. 21, 2249–2263. doi: 10.1111/1755-0998.13425
  • Chua PYS, Lammers Y, Menoni E, Ekrem T, Bohmann K, Boessenkool S, Alsos IG (2021b) Molecular dietary analyses of western capercaillies (Tetrao urogallus) reveal a diverse diet. Environmental DNA 3, 1156–1171. doi: 10.1002/edn3.237
  • Chua PYS, Leerhøi F, Langkjær EMR, Noer CL, Richter SR, Marlene E, Margaryan A, Gilbert MTP, Coissac E, Alsos IG, Boessenkool S, Bohmann K (2021c) Towards the extended barcode concept: Generating DNA reference data through genome skimming of Danish plants. BioRxiv. doi: 10.1101/2021.08.11.456029
  • Coissac E, Hollingsworth PM, Lavergne S, Taberlet P (2016) From barcodes to genomes: extending the concept of DNA barcoding. Mol. Ecol. 25, 1423–1428. doi: 10.1111/mec.13549
  • Cowan RS, Chase MW, Kress WJ, Savolainen V (2006) 300,000 species to identify: problems, progress, and prospects in DNA barcoding of land plants. Taxon 55, 611–616. doi: 10.2307/25065638
  • Deagle BE, Thomas AC, McInnes JC, Clarke LJ, Vesterinen EJ, Clare EL, Kartzinel TR, Eveson JP (2019) Counting with DNA in metabarcoding studies: how should we convert sequence reads to dietary data? Mol. Ecol. 28, 391–406. doi: 10.1111/mec.14734
  • De Barba M, Miquel C, Boyer F, Mercier C, Rioux D, Coissac E, Taberlet P (2014) DNA metabarcoding multiplexing and validation of data accuracy for diet assessment: application to omnivorous diet. Mol. Ecol. Resour. 14, 306–323. doi: 10.1111/1755-0998.12188
  • de Boer HJ, Ghorbani A, Manzanilla V, Raclariu A-C, Kreziou A, Ounjai S, Osathanunkul M, Gravendeel B (2017) DNA metabarcoding of orchid-derived products reveals widespread illegal orchid trade. Proc. Biol. Sci. 284. doi: 10.1098/rspb.2017.1182
  • Deiner K, Bik HM, Mächler E, Seymour M, Lacoursière-Roussel A, Altermatt F, Creer S, Bista I, Lodge DM, de Vere N, Pfrender ME, Bernatchez L (2017) Environmental DNA metabarcoding: transforming how we survey animal and plant communities. Mol. Ecol. 26, 5872–5895. doi: 10.1111/mec.14350
  • Drinkwater R, Schnell IB, Bohmann K, Bernard H, Veron G, Clare E, Gilbert MTP, Rossiter SJ (2019) Using metabarcoding to compare the suitability of two blood-feeding leech species for sampling mammalian diversity in North Borneo. Mol. Ecol. Resour. 19, 105–117. doi: 10.1111/1755-0998.12943
  • Elbrecht V, Leese F (2015) Can DNA-based ecosystem assessments quantify species abundance? Testing primer bias and biomass-sequence relationships with an innovative metabarcoding protocol. PLoS ONE 10, e0130324. doi: 10.1371/journal.pone.0130324
  • Elbrecht V, Steinke D (2018) Scaling up DNA metabarcoding for freshwater macrozoobenthos monitoring. Freshw. Biol. doi: 10.1111/fwb.13220
  • Elbrecht V, Vamos EE, Meissner K, Aroviita J, Leese F (2017) Assessing strengths and weaknesses of DNA metabarcoding-based macroinvertebrate identification for routine stream monitoring. Methods Ecol. Evol. 8, 1265–1275. doi: 10.1111/2041-210X.12789
  • Fahner NA, Shokralla S, Baird DJ, Hajibabaei M (2016) Large-scale monitoring of plants through environmental DNA metabarcoding of soil: recovery, resolution, and annotation of four DNA markers. PLoS ONE 11, e0157505. doi: 10.1371/journal.pone.0157505
  • Fazekas AJ, Kuzmina ML, Newmaster SG, Hollingsworth PM (2012) DNA barcoding methods for land plants. Methods Mol. Biol. 858, 223–252. doi: 10.1007/978-1-61779-591-6_11
  • Ficetola GF, Pansu J, Bonin A, Coissac E, Giguet-Covex C, De Barba M, Gielly L, Lopes CM, Boyer F, Pompanon F, Rayé G, Taberlet P (2015) Replication levels, false presences and the estimation of the presence/absence from eDNA metabarcoding data. Mol. Ecol. Resour. 15, 543–556. doi: 10.1111/1755-0998.12338
  • Forin-Wiart M-A, Poulle M-L, Piry S, Cosson J-F, Larose C, Galan M (2018) Evaluating metabarcoding to analyse diet composition of species foraging in anthropogenic landscapes using Ion Torrent and Illumina sequencing. Sci. Rep. 8, 17091. doi: 10.1038/s41598-018-34430-7
  • Galan M, Pons J-B, Tournayre O, Pierre É, Leuchtmann M, Pontier D, Charbonnel N (2018) Metabarcoding for the parallel identification of several hundred predators and their prey: Application to bat species diet analysis. Mol. Ecol. Resour. 18, 474–489. doi: 10.1111/1755-0998.12749
  • Goldberg CS, Turner CR, Deiner K, Klymus KE, Thomsen PF, Murphy MA, Spear SF, McKee A, Oyler-McCance SJ, Cornman RS, Laramie MB, Mahon AR, Lance RF, Pilliod DS, Strickler KM, Waits LP, Fremier AK, Takahara T, Herder JE, Taberlet P (2016) Critical considerations for the application of environmental DNA methods to detect aquatic species. Methods Ecol. Evol. doi: 10.1111/2041-210X.12595
  • Hawkins J, de Vere N, Griffith A, Ford CR, Allainguillaume J, Hegarty MJ, Baillie L, Adams-Groom B (2015) Using DNA metabarcoding to identify the floral composition of honey: A new tool for investigating honey bee foraging preferences. PLoS ONE 10, e0134735. doi: 10.1371/journal.pone.0134735
  • Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proc. Biol. Sci. 270, 313–321. doi: 10.1098/rspb.2002.2218
  • Hibert F, Taberlet P, Chave J, Scotti-Saintagne C, Sabatier D, Richard-Hansen C (2013) Unveiling the diet of elusive rainforest herbivores in next generation sequencing era? The tapir as a case study. PLoS ONE 8, e60799. doi: 10.1371/journal.pone.0060799
  • Hollingsworth PM, Graham SW, Little DP (2011) Choosing and using a plant DNA barcode. PLoS ONE 6, e19254. doi: 10.1371/journal.pone.0019254
  • Hollingsworth PM, Li D-Z, van der Bank M, Twyford AD (2016) Telling plant species apart with DNA: from barcodes to genomes. Philos. Trans. R. Soc. Lond. B Biol. Sci. 371. doi: 10.1098/rstb.2015.0338
  • Ji Y, Huotari T, Roslin T, Schmidt NM, Wang J, Yu DW, Ovaskainen O (2020) SPIKEPIPE: a metagenomic pipeline for the accurate quantification of eukaryotic species occurrences and intraspecific abundance change using DNA barcodes or mitogenomes. Mol. Ecol. Resour. 20, 256–267. doi: 10.1111/1755-0998.13057
  • Jørgensen T, Kjaer KH, Haile J, Rasmussen M, Boessenkool S, Andersen K, Coissac E, Taberlet P, Brochmann C, Orlando L, Gilbert MTP, Willerslev E (2012) Islands in the ice: detecting past vegetation on Greenlandic nunataks using historical records and sedimentary ancient DNA meta-barcoding. Mol. Ecol. 21, 1980–1988. doi: 10.1111/j.1365-294X.2011.05278.x
  • Kajtoch Ł (2014) A DNA metabarcoding study of a polyphagous beetle dietary diversity: the utility of barcodes and sequencing techniques. Folia Biol (Krakow) 62, 223–234. doi: 10.3409/fb62_3.223
  • Kitson JJN, Hahn C, Sands RJ, Straw NA, Evans DM, Lunt DH (2019) Detecting host-parasitoid interactions in an invasive Lepidopteran using nested tagging DNA metabarcoding. Mol. Ecol. 28, 471–483. doi: 10.1111/mec.14518
  • Kraaijeveld K, de Weger LA, Ventayol García M, Buermans H, Frank J, Hiemstra PS, den Dunnen JT (2015) Efficient and sensitive identification and quantification of airborne pollen using next-generation DNA sequencing. Mol. Ecol. Resour. 15, 8–16. doi: 10.1111/1755-0998.12288
  • Kress WJ, Erickson DL (2008) DNA barcodes: genes, genomics, and bioinformatics. Proc Natl Acad Sci USA 105, 2761–2762. doi: 10.1073/pnas.0800476105
  • Kress WJ, Wurdack KJ, Zimmer EA, Weigt LA, Janzen DH (2005) Use of DNA barcodes to identify flowering plants. Proc Natl Acad Sci USA 102, 8369–8374. doi: 10.1073/pnas.0503123102
  • Kress WJ (2017) Plant DNA barcodes: Applications today and in the future. J. Syst. Evol. 55, 291–307. doi: 10.1111/jse.12254
  • Lynggaard C, Nielsen M, Santos-Bay L, Gastauer M, Oliveira G, Bohmann K (2019) Vertebrate diversity revealed by metabarcoding of bulk arthropod samples from tropical forests. Environmental DNA 1, 329–341. doi: 10.1002/edn3.34
  • McClenaghan B, Gibson JF, Shokralla S, Hajibabaei M (2015) Discrimination of grasshopper (Orthoptera: Acrididae) diet and niche overlap using next-generation sequencing of gut contents. Ecol. Evol. 5, 3046–3055. doi: 10.1002/ece3.1585
  • Meiklejohn KA, Damaso N, Robertson JM (2019) Assessment of BOLD and GenBank - Their accuracy and reliability for the identification of biological materials. PLoS ONE 14, e0217084. doi: 10.1371/journal.pone.0217084
  • Miya M, Sato Y, Fukunaga T, Sado T, Poulsen JY, Sato K, Minamoto T, Yamamoto S, Yamanaka H, Araki H, Kondoh M, Iwasaki W (2015) MiFish, a set of universal PCR primers for metabarcoding environmental DNA from fishes: detection of more than 230 subtropical marine species. R. Soc. Open Sci. 2, 150088. doi: 10.1098/rsos.150088
  • Moorhouse-Gann RJ, Dunn JC, de Vere N, Goder M, Cole N, Hipperson H, Symondson WOC (2018) New universal ITS2 primers for high-resolution herbivory analyses using DNA metabarcoding in both tropical and temperate zones. Sci. Rep. 8, 8542. doi: 10.1038/s41598-018-26648-2
  • Murray DC, Coghlan ML, Bunce M (2015) From benchtop to desktop: important considerations when designing amplicon sequencing workflows. PLoS ONE 10, e0124671. doi: 10.1371/journal.pone.0124671
  • Murray DC, Pearson SG, Fullagar R, Chase BM, Houston J, Atchison J, White NE, Bellgard MI, Clarke E, Macphail M, Gilbert MTP, Haile J, Bunce M (2012) High-throughput sequencing of ancient plant and mammal DNA preserved in herbivore middens. Quat. Sci. Rev. 58, 135–145. doi: 10.1016/j.quascirev.2012.10.021
  • Omelchenko DO, Speranskaya AS, Ayginin AA, Khafizov K, Krinitsina AA, Fedotova AV, Pozdyshev DV, Shtratnikova VY, Kupriyanova EV, Shipulin GA, Logacheva MD (2019) Improved protocols of ITS1-based metabarcoding and their application in the analysis of plant-containing products. Genes (Basel) 10. doi: 10.3390/genes10020122
  • Parducci L, Bennett KD, Ficetola GF, Alsos IG, Suyama Y, Wood JR, Pedersen MW (2017) Ancient plant DNA in lake sediments. New Phytol. 214, 924–942. doi: 10.1111/nph.14470
  • Pennisi E (2007) Taxonomy. Wanted: a barcode for plants. Science 318, 190–191. doi: 10.1126/science.318.5848.190
  • Piñol J, Senar MA, Symondson WOC (2019) The choice of universal primers and the characteristics of the species mixture determine when DNA metabarcoding can be quantitative. Mol. Ecol. 28, 407–419. doi: 10.1111/mec.14776
  • Poinar HN, Kuch M, Sobolik KD, Barnes I, Stankiewicz AB, Kuder T, Spaulding WG, Bryant VM, Cooper A, Pääbo S (2001) A molecular analysis of dietary diversity for three archaic Native Americans. Proc Natl Acad Sci USA 98, 4317–4322. doi: 10.1073/pnas.061014798
  • Polling M, ter Schure ATM, van Geel B, van Bokhoven T, Boessenkool S, MacKay G, Langeveld BW, Ariza M, van der Plicht H, Protopopov AV, Tikhonov A, de Boer H, Gravendeel B (2021) Multiproxy analysis of permafrost preserved faeces provides an unprecedented insight into the diets and habitats of extinct and extant megafauna. Quat. Sci. Rev. 267, 107084. doi: 10.1016/j.quascirev.2021.107084
  • Polling M, Sin M, de Weger LA, Speksnijder AGCL, Koenders MJF, de Boer H, Gravendeel B (2022) DNA metabarcoding using nrITS2 provides highly qualitative and quantitative results for airborne pollen monitoring. Sci. Total Environ. 806(Part 1), 150468. doi: 10.1016/j.scitotenv.2021.150468
  • Pompanon F, Deagle BE, Symondson WOC, Brown DS, Jarman SN, Taberlet P (2012) Who is eating what: diet assessment using next generation sequencing. Mol. Ecol. 21, 1931–1950. doi: 10.1111/j.1365-294X.2011.05403.x
  • Quail MA, Smith M, Coupland P, Otto TD, Harris SR, Connor TR, Bertoni A, Swerdlow HP, Gu Y (2012) A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics 13, 341. doi: 10.1186/1471-2164-13-341
  • Raclariu AC, Heinrich M, Ichim MC, de Boer H (2018) Benefits and limitations of DNA barcoding and metabarcoding in herbal product authentication. Phytochem. Anal. 29, 123–128. doi: 10.1002/pca.2732
  • Raclariu AC, Paltinean R, Vlase L, Labarre A, Manzanilla V, Ichim MC, Crisan G, Brysting AK, de Boer H (2017) Comparative authentication of Hypericum perforatum herbal products using DNA metabarcoding, TLC and HPLC-MS. Sci. Rep. 7, 1291. doi: 10.1038/s41598-017-01389-w
  • Reese AT, Kartzinel TR, Petrone BL, Turnbaugh PJ, Pringle RM, David LA (2019) Using DNA metabarcoding to evaluate the plant component of human diets: a proof of concept. mSystems 4. doi: 10.1128/mSystems.00458-19
  • Riaz T, Shehzad W, Viari A, Pompanon F, Taberlet P, Coissac E (2011) ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic Acids Res. 39, e145. doi: 10.1093/nar/gkr732
  • Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, Sahl JW, Stres B, Thallinger GG, Van Horn DJ, Weber CF (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol. 75, 7537–7541. doi: 10.1128/AEM.01541-09
  • Schnell IB, Bohmann K, Gilbert MTP (2015) Tag jumps illuminated--reducing sequence-to-sample misidentifications in metabarcoding studies. Mol. Ecol. Resour. 15, 1289–1303. doi: 10.1111/1755-0998.12402
  • Shin S, Kim Y, Chul Oh S, Yu N, Lee S-T, Rak Choi J, Lee K-A (2017) Validation and optimization of the Ion Torrent S5 XL sequencer and Oncomine workflow for BRCA1 and BRCA2 genetic testing. Oncotarget 8, 34858–34866. doi: 10.18632/oncotarget.16799
  • Sigsgaard EE, Nielsen IB, Carl H, Krag MA, Knudsen SW, Xing Y, Holm-Hansen TH, Møller PR, Thomsen PF (2017) Seawater environmental DNA reflects seasonality of a coastal fish community. Mar. Biol. 164, 128. doi: 10.1007/s00227-017-3147-4
  • Smith DP, Peay KG (2014) Sequence depth, not PCR replication, improves ecological inference from next generation DNA sequencing. PLoS ONE 9, e90234. doi: 10.1371/journal.pone.0090234
  • Smucker NJ, Pilgrim EM, Nietch CT, Darling JA, Johnson BR (2020) DNA metabarcoding effectively quantifies diatom responses to nutrients in streams. Ecol. Appl. 30, e02205. doi: 10.1002/eap.2205
  • Sønstebø JH, Gielly L, Brysting AK, Elven R, Edwards M, Haile J, Willerslev E, Coissac E, Rioux D, Sannier J, Taberlet P, Brochmann C (2010) Using next-generation sequencing for molecular reconstruction of past Arctic vegetation and climate. Mol. Ecol. Resour. 10, 1009–1018. doi: 10.1111/j.1755-0998.2010.02855.x
  • Steinegger M, Salzberg SL (2020) Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21, 115. doi: 10.1186/s13059-020-02023-1
  • Swift JF, Lance RF, Guan X, Britzke ER, Lindsay DL, Edwards CE (2018) Multifaceted DNA metabarcoding: Validation of a noninvasive, next-generation approach to studying bat populations. Evol. Appl. 11, 1120–1138. doi: 10.1111/eva.12644
  • Taberlet P, Bonin A, Zinger L, Coissac E (2018) Environmental DNA, Oxford Scholarship Online. Oxford University Press. doi: 10.1093/oso/9780198767220.001.0001
  • Taberlet P, Coissac E, Pompanon F, Brochmann C, Willerslev E (2012) Towards next-generation biodiversity assessment using DNA metabarcoding. Mol. Ecol. 21, 2045–2050. doi: 10.1111/j.1365-294X.2012.05470.x
  • Timpano EK, Scheible MKR, Meiklejohn KA (2020) Optimization of the second internal transcribed spacer (ITS2) for characterizing land plants from soil. PLoS ONE 15, e0231436. doi: 10.1371/journal.pone.0231436
  • Turon X, Antich A, Palacín C, Præbel K, Wangensteen OS (2019) From metabarcoding to metaphylogeography: separating the wheat from the chaff. BioRxiv. doi: 10.1101/629535
  • Valentini A, Miquel C, Nawaz MA, Bellemain E, Coissac E, Pompanon F, Gielly L, Cruaud C, Nascetti G, Wincker P, Swenson JE, Taberlet P (2009a) New perspectives in diet analysis based on DNA barcoding and parallel pyrosequencing: the trnL approach. Mol. Ecol. Resour. 9, 51–60. doi: 10.1111/j.1755-0998.2008.02352.x
  • Valentini A, Pompanon F, Taberlet P (2009b) DNA barcoding for ecologists. Trends Ecol. Evol. 24, 110–117. doi: 10.1016/j.tree.2008.09.011
  • Weigand H, Beermann AJ, Čiampor F, Costa FO, Csabai Z, Duarte S, Geiger MF, Grabowski M, Rimet F, Rulik B, Strand M, Szucsich N, Weigand AM, Willassen E, Wyler SA, Bouchez A, Borja A, Čiamporová-Zaťovičová Z, Ferreira S, Dijkstra K-DB, Ekrem T (2019) DNA barcode reference libraries for the monitoring of aquatic biota in Europe: Gap-analysis and recommendations for future work. Sci. Total Environ. 678, 499–524. doi: 10.1016/j.scitotenv.2019.04.247
  • Willerslev E, Cappellini E, Boomsma W, Nielsen R, Hebsgaard MB, Brand TB, Hofreiter M, Bunce M, Poinar HN, Dahl-Jensen D, Johnsen S, Steffensen JP, Bennike O, Schwenninger J-L, Nathan R, Armitage S, de Hoog C-J, Alfimov V, Christl M, Beer J, Collins MJ (2007) Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science 317, 111–114. doi: 10.1126/science.1141758
  • Yang C, Bohmann K, Wang X, Cai W, Wales N, Ding Z, Gopalakrishnan S, Yu DW (2020) Biodiversity Soup II: A bulk-sample metabarcoding pipeline emphasizing error reduction. BioRxiv. doi: 10.1101/2020.07.07.187666
  • Yao H, Song J, Liu C, Luo K, Han J, Li Y, Pang X, Xu H, Zhu Y, Xiao P, Chen S (2010) Use of ITS2 region as the universal DNA barcode for plants and animals. PLoS ONE 5. doi: 10.1371/journal.pone.0013102
  • Yoccoz NG, Bråthen KA, Gielly L, Haile J, Edwards ME, Goslar T, Von Stedingk H, Brysting AK, Coissac E, Pompanon F, Sønstebø JH, Miquel C, Valentini A, De Bello F, Chave J, Thuiller W, Wincker P, Cruaud C, Gavory F, Rasmussen M, Taberlet P (2012) DNA from soil mirrors plant taxonomic and growth form diversity. Mol. Ecol. 21, 3647–3655. doi: 10.1111/j.1365-294X.2012.05545.x
  • Yu DW, Ji Y, Emerson BC, Wang X, Ye C, Yang C, Ding Z (2012) Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods in Ecology and Evolution 3, 613–623. doi: 10.1111/j.2041-210X.2012.00198.x
  • Zarrei M, Talent N, Kuzmina M, Lee J, Lund J, Shipley PR, Stefanović S, Dickinson TA (2015) DNA barcodes from four loci provide poor resolution of taxonomic groups in the genus Crataegus. AoB Plants 7. doi: 10.1093/aobpla/plv045
  • Zimmermann J, Glöckner G, Jahn R, Enke N, Gemeinholzer B (2015) Metabarcoding vs. morphological identification to assess diatom diversity in environmental studies. Mol. Ecol. Resour. 15, 526–542. doi: 10.1111/1755-0998.12336
  • Zinger L, Bonin A, Alsos IG, Bálint M, Bik H, Boyer F, Chariton AA, Creer S, Coissac E, Deagle BE, De Barba M, Dickie IA, Dumbrell AJ, Ficetola GF, Fierer N, Fumagalli L, Gilbert MTP, Jarman S, Jumpponen A, Kauserud H, Taberlet P (2019) DNA metabarcoding-Need for robust experimental designs to draw sound ecological conclusions. Mol. Ecol. 28, 1857–1862. doi: 10.1111/mec.15060


  1. By using equimolar pooling of individual samples.
  2. The highly stable P6 loop can best be targeted in this case, using trnL primers.
  3. By using plant-specific ITS primers that minimise the amplification of fungal DNA.

Chapter 12 Metagenomics


Metagenomics is the study of genetic material recovered directly from environmental samples such as air, water, soil, or sediments (Bashir et al. 2014). It is also referred to as environmental genomics, ecogenomics, or community genomics (Guazzaroni et al. 2009). The DNA found in environmental samples is usually a mixture of genetic material from multiple organisms. Typically, genomic DNA extracted from environmental samples is shotgun sequenced to identify organisms and/or make metabolic and other protein predictions (Porter and Hajibabaei 2018). Prior to sequencing, DNA molecules from the samples are randomly fragmented into size-controlled fragments. These fragments are then converted into libraries, which consist of the DNA fragments attached to adapters specific to the sequencing platform used. Each library is then sequenced using the shotgun sequencing approach, often without targeted PCR amplification (Noonan et al. 2005). This provides a distinct advantage over PCR-based methods by enabling a less biased investigation of a community and the detection of all genes in a sample.
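The random fragmentation and size control described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions: cut sites are drawn uniformly at random, and the function names, molecule length, and size window are hypothetical values chosen for illustration, not parameters of any real library preparation protocol.

```python
import random


def shear(molecule_length, mean_fragment, rng):
    """Cut one molecule at uniformly random positions and return fragment lengths.

    The number of cuts is chosen so that the average fragment length is
    close to mean_fragment (an illustrative simplification).
    """
    n_cuts = max(molecule_length // mean_fragment - 1, 0)
    cuts = sorted(rng.sample(range(1, molecule_length), n_cuts))
    edges = [0] + cuts + [molecule_length]
    return [end - start for start, end in zip(edges, edges[1:])]


def size_select(fragments, min_len, max_len):
    """Keep only fragments inside the chosen size window."""
    return [f for f in fragments if min_len <= f <= max_len]


rng = random.Random(42)                  # fixed seed for reproducibility
fragments = shear(100_000, 400, rng)     # hypothetical 100 kb molecule
library = size_select(fragments, 200, 600)

print(f"{len(fragments)} fragments after shearing")
print(f"{len(library)} fragments retained after size selection")
```

Because the cuts are random, fragment lengths vary widely around the mean, which is why a size-selection step is needed before adapters are attached.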

History of metagenomics

The term ‘metagenome’ was coined in 1998 by Handelsman et al. (1998). Their approach involved cloning environmental DNA extracted from soil into E. coli vectors, and screening the phenotypes for functional analysis of the soil microbiome. Earlier studies had already employed cloning techniques on environmental samples, although the term ‘metagenome’ was not yet in use. For example, Stein et al. used DNA extracted directly from seawater to investigate novel metabolisms in the marine Archaea clade Crenarchaeota (Stein et al. 1996). Similarly, Healy et al. cloned gene libraries obtained from thermophilic anaerobic microbes to discover new enzymes for biotechnological applications (Healy et al. 1995). These earlier metagenomic studies employed a functional approach by cloning genes into vectors and screening for biochemical functions, which is now more commonly referred to as functional metagenomics.

With the development of high-throughput sequencing (HTS) technologies, the need for cloning to increase the amount of starting material was eliminated. An early study recovered the first near-complete genomes of five dominant members of a natural acidophilic biofilm using an insert plasmid library and shotgun sequencing (Tyson et al. 2004). The first application of HTS to capture all representative sequences from an environmental sample was led by Venter et al. (Venter et al. 2004) in the same year. They applied shotgun sequencing to water samples collected from the Sargasso Sea, demonstrating the potential of the method to reveal the composition and function of a diverse group of microbial organisms.

The immense amount of data collected by these methods introduced challenges in data analysis, resulting in several innovations in comparative metagenomics such as clustering orthologs (Tyson et al. 2004; Yooseph et al. 2007), use of GC content to distinguish genomes (Tyson et al. 2004), and single-copy genes to check for completeness and genome size predictions (Ciccarelli et al. 2006; Raes et al. 2007; Turnbaugh et al. 2007). With these innovations and an increasing number of available reference genomes, the application of metagenomics has expanded outside its traditional use in microbial research.

Suitable sample types

Similar to metabarcoding, substrates that can be used for metagenomics in plant identification include environmental samples, fragmented template materials (e.g., dental calculus and faeces) (Weyrich et al. 2017), mixed food templates (e.g., herbal medicines, protein powder), and complex samples like honey (Bovo et al. 2018). Soil samples in particular are promising for metagenomics, as the method can also provide insight into the root microbiomes of plants (Molina-Montenegro et al. 2019; Simões et al. 2015). However, metagenomic sequencing of environmental samples can be challenging as the starting material is a mixture of DNA from viral, bacterial, archaeal, and eukaryotic species. Typically, the abundance of those species also differs within a sample, complicating downstream data analysis. Although it depends on the study aim, a sample with unequal abundances can be more problematic, as the reads, which come from different organisms and often do not overlap, cannot be easily assembled into longer contigs. This results in a lower probability that the correct taxonomic or functional annotations are assigned (Ayling et al. 2020). To reduce the complexity of the sample and ensure that enough target DNA is obtained, fractionation, size selection, selective lysis, or enrichment can be performed (Teeling and Glöckner 2012; Thomas et al. 2012). The amount of DNA obtained from certain types of samples can be very small depending on the degradation or the amount of starting material. As library preparation protocols require a certain amount of DNA, a prior whole-genome amplification or concentration step might be desirable. Amplification can however introduce biases for metagenomic community analysis, so one should consider whether it is necessary.
Another problem associated with environmental samples is the presence of inhibitors, such as the humic acids present in soil, but this issue has been addressed extensively, and protocols have been developed to remove such inhibitors (Delmont et al. 2011).

Uses of metagenomics

Several promising applications exist for plant-related metagenomics as compared to conventional targeted genomic approaches. Dietary studies are one such application. While dietary studies have been revolutionised by conventional metabarcoding (see Chapter 11 Amplicon metabarcoding; Kartzinel et al. 2015; Moorhouse-Gann et al. 2018; Soininen et al. 2009), they can benefit from the additional sequences and genes that metagenomics provides. In addition to plant identification in diet studies, the sequenced data can also be used to simultaneously identify and genotype the host, categorise the gut microbiome, and detect parasites (Srivathsan et al. 2016). Besides dietary studies, metagenomics has also been applied for plant authentication in herbal medicines (see Chapter 22 Healthcare; Xin et al. 2018), for detection of contaminants in the food supply chain (see Chapter 23 Food safety; Haiminen et al. 2019), in palaeobotany to characterise historical environments (see Chapter 21 Palaeobotany; Pedersen et al. 2016; Stahlschmidt et al. 2019), and to describe the shifts in ecosystems with environmental change (Parducci et al. 2019). Furthermore, the ecological information collected from various samples can be applied in conservation management (see Chapter 24 Environment and biodiversity assessments).

Similar to metabarcoding (see Chapter 11 Amplicon metabarcoding), metagenomics can potentially be used to reconstruct plant compositions from bulk arthropod samples, and to solve crimes in forensic genetics (see Chapter 26 Forensic genetics, botany, and palynology), especially by uncovering taxa that are not normally amplified in metabarcoding studies. It can also potentially be applied to plant resources for the retrieval of plant population genetic information from mixed templates (which has already been shown in mammals; Srivathsan et al. 2015, 2016). Additionally, metagenomics can be applied in water quality studies, through both quantitative and qualitative assessment of diatoms present in water bodies (Chessman et al. 2007).

Advantages and limitations

Metagenomics is an untargeted method that captures all genetic material in a sample, which is advantageous over targeted methods as no prior knowledge of the taxa and their genes is required (Pedersen et al. 2015; Quince et al. 2017). These data can be used to identify a wide range of different taxonomic groups, including bacteria, archaea, and eukaryotes (Bovo et al. 2018; Stat et al. 2017). Furthermore, metagenomics avoids biases that can be introduced during marker amplification and thus can provide a more reliable abundance estimate compared to metabarcoding (Ziesemer et al. 2015). Metagenomics can also be used to extract information from degraded material since long templates are not necessary (Parducci et al. 2019; Pedersen et al. 2016). Finally, there is the opportunity to use the metagenomic data for alternative types of analyses, such as genomic reconstruction or gene discovery (Molina-Montenegro et al. 2019; Quince et al. 2017). A highly significant advantage of metagenomics is that it can also be used in functional ecology, where gene expression can be studied (Mackelprang et al. 2011).

Metagenomics does, however, come with some disadvantages that need to be considered. The main downside is the taxonomic inefficiency of the method. Sequenced material can originate from any part of the genome, but full nuclear genome references for most species are currently lacking. Thus, only a small proportion of species can currently be identified (Chua et al. 2021; Parducci et al. 2019; Srivathsan et al. 2016; Stat et al. 2017). This problem is exacerbated for multicellular organisms, which have a lower abundance compared to microbes in an environmental sample (Azam and Malfatti 2007) and can therefore have a smaller proportion of reads (Stat et al. 2017). However, with the sequencing of whole genomes currently underway for more plant species, issues related to unavailable reference data are becoming less problematic (Alsos et al. 2020; Chua et al. 2021; Li et al. 2019; Nevill et al. 2020). Furthermore, the low number of assigned metagenomic reads can be addressed by increasing the sequencing depth, though at an increased cost.

The process of metagenomics

Step by step laboratory workflow

DNA fragmentation

DNA fragmentation is an essential step in the metagenomic workflow, and the size of the DNA fragments required depends on the sequencing platform used. Broadly speaking, there are two methods for DNA fragmentation to obtain size-controlled DNA fragments: enzyme-based and mechanical. Each method has its associated advantages and disadvantages (Li et al. 2017). Enzyme-based methods generally use transposons, restriction enzymes, or nicking enzymes to fragment the DNA (Anderson 1981; Hoheisel et al. 1989; Seed et al. 1982; Wong et al. 1997). Although these enzyme-based methods are precise and efficient, they do not fragment the DNA randomly, and enzymatic digestion is less efficient for DNA with high GC content (Kasoji et al. 2015; Thorstenson et al. 1998). Mechanical methods typically use sonication (Deininger 1983; Kasoji et al. 2015; Tseng et al. 2012), nebulisation (Lentz et al. 2005; Sambrook and Russell 2006), or hydrodynamic shearing (Joneja and Huang 2009; Shui et al. 2011). These methods provide more random fragmentation, with better control over fragment size and size distribution, than enzyme-based methods (Hengen 1997). However, while sonication is efficient and easy to use, it can cause breaks within AT-rich regions, resulting in damaged DNA fragments that cannot be sequenced (Hengen 1997). Nebulisation is a fast method for DNA fragmentation, but it produces fragments with a wider size range and requires expensive equipment. Hydrodynamic shearing produces short DNA fragments with less damage, and with a narrow size distribution. However, this method requires complex machinery and trained users (Hengen 1997; Shui et al. 2011). The choice of method for DNA fragmentation thus depends on the final fragment size required, the choice of sequencing platform, the amount of input DNA, funding, and scalability.
The most important consideration is that the method must sufficiently randomly fragment the DNA so that the sequencing libraries will fully represent the starting DNA template.

Library preparation

Library preparation is another important step in the metagenomics workflow as it can affect the results of the sequencing output. The addition of adapters to the ends of DNA fragments lets them bind to the sequencing flow cell and allows for the identification of the reads (DeWitt 2019). There are two types of library preparation: ligation-based and tagmentation. In ligation-based library preparation, DNA fragmentation and adapter ligation occur in two separate steps. Library preparation usually takes double-stranded fragmented DNA as input and entails end-repair, 5’ end phosphorylation and A-tailing of the 3’ end, adapter ligation, and optional PCR enrichment of the adapter-ligated DNA fragments (Carøe et al. 2017; Head et al. 2014). For this method, an optimal adapter to fragment ratio (~10:1) has to be calculated, as an excess of adapters may lead to the formation of adapter-dimers that can be over-amplified in the PCR step. If enough DNA can be extracted (~250 ng), amplification-free library building can be carried out (Genohub 2018). The more starting material there is, the less amplification is required. In tagmentation library preparation, such as with the Nextera DNA Sample Prep Kit (Illumina), DNA fragmentation and adapter ligation occur together in one reaction (Hennig et al. 2018). Libraries are prepared using a transposase enzyme which simultaneously fragments and adds adapters to the DNA (Adey et al. 2010). The first step is the tagmentation reaction, where the transposase enzyme cleaves and tags the input double-stranded unfragmented DNA with a universal overhang. This tagmentation step determines the success of the library preparation, and successful tagmentation is highly sensitive to the amount of input DNA (< 1 ng overtagmentation, > 1 ng undertagmentation) (Illumina 2015). This method is also sensitive to temperature and reaction time.
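The adapter to fragment ratio mentioned above is a molar ratio, so the mass of fragmented DNA first has to be converted into a number of molecules. The following Python sketch illustrates that arithmetic only. It assumes an average mass of about 660 g/mol per base pair of double-stranded DNA (a common approximation, not a value from this chapter), and the function names and the 400 bp mean fragment length in the usage lines are hypothetical.

```python
# Assumed average molar mass of one base pair of double-stranded DNA (g/mol).
# This is a widely used approximation, not a value given in the chapter.
AVG_BP_MASS = 660.0


def dsdna_pmol(mass_ng: float, mean_length_bp: float) -> float:
    """Convert a mass of double-stranded DNA (ng) into picomoles of fragments."""
    if mass_ng < 0 or mean_length_bp <= 0:
        raise ValueError("mass must be >= 0 and fragment length must be > 0")
    # ng -> g is 1e-9, mol -> pmol is 1e12, so the net factor is 1e3.
    return mass_ng * 1000.0 / (mean_length_bp * AVG_BP_MASS)


def adapter_pmol_needed(mass_ng: float, mean_length_bp: float,
                        ratio: float = 10.0) -> float:
    """Picomoles of adapter for a given adapter:fragment molar ratio."""
    return ratio * dsdna_pmol(mass_ng, mean_length_bp)


if __name__ == "__main__":
    # Hypothetical input: 250 ng of DNA sheared to a mean length of 400 bp.
    fragments = dsdna_pmol(250, 400)
    adapters = adapter_pmol_needed(250, 400, ratio=10.0)
    print(f"fragments: {fragments:.2f} pmol, adapters: {adapters:.2f} pmol")
```

Because the number of fragments scales inversely with fragment length, the same mass of more finely sheared DNA requires proportionally more adapter to keep the ratio constant.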

Sequencing approaches and platforms

DNA sequencing has gradually shifted from Sanger to HTS technologies in the last decades. These new sequencing technologies can provide much higher yields of reads at a much lower cost (see Chapter 9 Sequencing platforms and data types). Initially, 454/Roche pyrosequencing (discontinued) was the most widely used platform (Edwards et al. 2006; Thomas et al. 2012). However, the generation of artificial replicate reads and systematic homopolymer errors limited its use for metagenomic applications. Illumina sequencing offers short read lengths up to 300 bp (paired-end), generates high output (up to 1.5 billion bp per run) and high accuracy (error rates < 1%). Given the platform’s wide availability, it became the dominant choice for shotgun metagenomics. Out of the available Illumina platforms, only the MiSeq provides 300 bp read lengths, though the total output is somewhat lower, making it more suitable for single marker surveys. Illumina HiSeq 2500/4000, NextSeq, and NovaSeq all produce higher outputs and are well suited for metagenomic applications (Quince et al. 2017).

Figure 1. Chapter 12 Infographic: Visual representation of the content of this chapter.

Short reads are bioinformatically challenging for metagenomic assembly because genes and chromosomal regions can be difficult to span, especially if they are long or composed of repetitive elements. Certain protocols have been developed to overcome such challenges (e.g., assembly after binning and taxonomic assignment), but long-read sequencing technologies offer excellent alternatives for metagenomics. PacBio and Oxford Nanopore technologies offer longer read lengths but can be accompanied by higher error rates and higher costs. In contrast to the other platforms which introduce inherent systematic errors (e.g., homopolymer regions, index hopping), errors in these platforms are mostly random, which might be overcome with technological improvements (Teeling and Glöckner 2012). Additionally, they provide read lengths long enough to span multiple genes making them a promising alternative for metagenomics.

The exact number of reads required to effectively characterise a sample using metagenomics will be highly variable, and as such, no one number for the total number of reads required can be given universally. In principle, the total number of species in the sample, the genome sizes, and the relative abundance of each species should be known to make such an estimation. As a rule of thumb, it is suggested to maximise the output to capture as many reads as possible from the rare members of the community (Quince et al. 2017).
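To make the dependence on genome size and relative abundance concrete, the sketch below uses the standard approximation that mean coverage equals the number of bases sequenced from a genome divided by its size. It assumes that reads are drawn in proportion to each taxon's share of the DNA in the sample and that every read can be assigned, neither of which holds exactly in practice. The function names and example values are hypothetical.

```python
import math


def expected_coverage(total_reads: int, read_length_bp: int,
                      dna_fraction: float, genome_size_bp: float) -> float:
    """Mean coverage of one genome in a mixed sample.

    dna_fraction is the assumed share of the sample's DNA that comes
    from the taxon of interest (0-1).
    """
    return total_reads * read_length_bp * dna_fraction / genome_size_bp


def reads_for_coverage(target_coverage: float, read_length_bp: int,
                       dna_fraction: float, genome_size_bp: float) -> int:
    """Total reads needed for the taxon to reach the target mean coverage."""
    if dna_fraction <= 0:
        raise ValueError("dna_fraction must be > 0")
    return math.ceil(target_coverage * genome_size_bp
                     / (read_length_bp * dna_fraction))


if __name__ == "__main__":
    # Hypothetical plant with a 500 Mb genome making up 1% of the sample DNA,
    # sequenced with 10 million reads of 150 bp.
    cov = expected_coverage(10_000_000, 150, 0.01, 500_000_000)
    print(f"expected mean coverage: {cov:.3f}x")
```

Under these assumptions a rare taxon with a large genome receives only a small fraction of one-fold coverage even from millions of reads, which is consistent with the advice to maximise output when rare community members matter.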

Bioinformatics strategies

There are currently two main strategies to identify the contents of a metagenomic sample: identification of individual reads by alignment to a reference, or by assembling the reads into longer contigs prior to identification.


The most straightforward method for identification is by aligning the reads to a known reference dataset. BLAST and related tools such as MegaBLAST (Zhang et al. 2000) are commonly used for this alignment and identification of reads. Though accurate, these tools are computationally inefficient and do not scale well with increasing sizes of current metagenomic datasets and reference databases (Ye et al. 2019).

Two alternative approaches aim to speed up the identification of metagenomic datasets. These either use more compressed reference databases in combination with more efficient aligners or rely on exact alignments of k-mers between the reads and the reference (Ye et al. 2019). The Burrows-Wheeler transform (BWT) in combination with FM indexes is a good way to compress references and speed up alignments. These techniques are common for genomic mapping programs such as BWA (Li and Durbin 2009) and bowtie (Langmead and Salzberg 2012), but have also been applied in metagenomic tools such as Centrifuge (Kim et al. 2016) and MetaPhlAn (Truong et al. 2015). The alternative k-mer method uses smaller subsequences of k-length that are extracted from the metagenomic reads. The read k-mers can be directly compared to a set of k-mers from the reference database, which simplifies and speeds up the identifications. Metagenomic tools such as Kraken (Wood and Salzberg 2014) or CLARK (Ounit et al. 2015) use k-mer matching for their identifications. Both strategies are substantially faster than their traditional alignment counterparts (Ye et al. 2019), but the various tools differ from each other in terms of memory requirements, speed, and additional features. BWT-based aligners generally require less memory but are marginally slower than the more memory demanding k-mer aligners. The results, regardless of the method, are a set of identifications to the Last Common Ancestor (LCA), to account for conserved or homologous sequences, after which the results can be explored in tools such as MEGAN (Huson et al. 2007).
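The k-mer strategy described above can be illustrated with a toy example. The sketch below stores, for every reference k-mer, the last common ancestor of all reference taxa that contain it, and then assigns a read to the last common ancestor of its matching k-mers. This is a deliberately simplified scheme and not the scoring used by Kraken or CLARK. The taxonomy, the reference sequences, and all function names are invented for illustration.

```python
from functools import reduce

# Hypothetical toy taxonomy: child -> parent ("root" has no parent).
PARENT = {
    "Poaceae": "root",
    "Fabaceae": "root",
    "Zea mays": "Poaceae",
    "Oryza sativa": "Poaceae",
    "Pisum sativum": "Fabaceae",
}

# Invented reference sequences, far shorter than any real reference.
REFERENCES = {
    "Zea mays": "ACGTACGGTTCA",
    "Oryza sativa": "ACGTACCCTGAA",
    "Pisum sativum": "TTGGCCAATCGA",
}


def lineage(taxon):
    """Path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while taxon in PARENT:
        taxon = PARENT[taxon]
        path.append(taxon)
    return path


def lca(a, b):
    """Last common ancestor of two taxa in the toy taxonomy."""
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    return "root"


def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each reference k-mer to the LCA of the taxa that contain it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index


def classify(read, index, k):
    """Assign a read to the LCA of its matching k-mers, or None if no match."""
    hits = [index[km] for km in kmers(read, k) if km in index]
    return reduce(lca, hits) if hits else None


INDEX = build_index(REFERENCES, k=5)

if __name__ == "__main__":
    for read in ("GGTTCA", "ACGTAC", "CGTACG", "AAAAAA"):
        print(read, "->", classify(read, INDEX, k=5))
```

In this toy data a read made up only of k-mers unique to one species is assigned to that species, whereas a read that includes k-mers shared by the two grasses can only be placed at family level, which is the behaviour the LCA step is meant to capture.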


Assembly methods attempt to generate longer contigs before downstream analysis. These longer contigs can be used for gene identifications (Quince et al. 2017) or can result in better identifications compared to shorter reads (Vestergaard et al. 2017). De Bruijn graphs are commonly used to generate de novo contigs from genomic data (Zerbino and Birney 2008). First, a de novo assembler constructs a graph of all overlapping k-mers, which are obtained from the read data. The assembler then attempts to find paths through the graph that correspond to contigs. Metagenomic datasets can be problematic for de Bruijn graph-based assemblers, where a large pool of (possibly) closely related species with uneven coverage between taxa can result in fragmented or incorrect contigs (Quince et al. 2017; Chua et al. 2022). Dedicated metagenomic assemblers such as metaSPAdes (Nurk et al. 2017), IDBA-UD (Peng et al. 2012), and MEGAHIT (Li et al. 2015) attempt to overcome these problems. These tools construct multiple graphs at different k-mer lengths to resolve the aforementioned issues. Graphs based on smaller k-mer sizes can be beneficial for the assembly of low-abundance taxa (Quince et al. 2017), while those constructed out of larger k-mers can bridge gaps and yield longer contigs for more abundant species (Peng et al. 2012). These methods have proven to be useful for microbial datasets (Pasolli et al. 2019; Qin et al. 2010), but their application to eukaryotic genomes can be problematic given the low abundance of eukaryotes in environmental samples and their more complex genomes (Azam and Malfatti 2007).
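A minimal sketch of the de Bruijn graph idea follows. It builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in the reads, then extends a contig for as long as the current node has exactly one successor. Real assemblers also handle sequencing errors, reverse complements, coverage, and incoming branches, none of which is modelled here. The reads and function names are invented for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in a k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk_unambiguous(graph, start):
    """Extend a contig from start while each node has exactly one successor."""
    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in visited:  # stop instead of looping around a cycle
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Three overlapping toy reads from a single invented sequence.
READS = ["ATGGCG", "GGCGTG", "CGTGCA"]
# One additional read from a hypothetical close relative with a single difference.
VARIANT_READ = "GCGTTCA"

if __name__ == "__main__":
    single = build_de_bruijn(READS, k=4)
    print(walk_unambiguous(single, "ATG"))

    mixed = build_de_bruijn(READS + [VARIANT_READ], k=4)
    print(walk_unambiguous(mixed, "ATG"))
```

With reads from one sequence the walk recovers the full toy sequence. Adding the read from the close relative creates a branch in the graph, and the walk stops there, which mirrors how mixtures of closely related taxa lead to fragmented contigs.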

Bioinformatic summary

Each bioinformatic strategy has its pros and cons, and the decision about which strategy to use depends on the starting material available as well as the research questions to be answered. The alignment method works well when there is ample reference material available for the taxa of interest, when working with older and more fragmented material, or when the target taxa are sparse in a sample. The assembly method, on the other hand, performs best when there is abundant material available, which is often not the case for environmental datasets.

Future of metagenomics

As sequencing costs continue to decrease significantly, bioinformatics pipelines are optimised, and more comprehensive DNA reference libraries become available (Alsos et al. 2020; Chua et al. 2021; Li et al. 2019; Nevill et al. 2020), we can expect that the number of metagenomics studies will increase. This is because metagenomics can provide better taxonomic resolution than PCR-based methods, as longer and larger reference data can be utilised. The ability to retrieve almost all the DNA content present in samples without targeted enrichment or any prior knowledge of the dataset can potentially make metagenomics a powerful tool for biomonitoring, where large amounts of ecological data are often required from minute samples. Metagenomics can also simultaneously characterise the entire microbiome and infer functional information (Chua and Rasmussen 2022), taking it beyond metabarcoding by allowing more biological questions to be explored in more detail.


  1. What are the two main steps in the metagenomics laboratory workflow and why are they necessary?
  2. What are the challenges of using short-read sequencing for metagenomics applications and how do you overcome these challenges?
  3. What are problems caused by using environmental samples with unequal abundances in metagenomics applications?


Basic Local Alignment Search Tool (BLAST) – An alignment tool commonly used in conjunction with the NCBI nucleotide reference database for sequence identifications. Different BLAST versions exist for nucleotide or protein alignments.

Binning – Clustering sequences based on their nucleotide composition or similarity to a reference database.

Burrows-Wheeler transform – Data transformation algorithm to make transformed data more compressible.

Community genetics – Study of genetic interactions between species and their environment in complex communities.

Contig – A longer DNA sequence assembled from overlapping reads.

Coverage – The mean number of times a nucleotide is sequenced in a genome.

De Bruijn graphs – A popular method for the de novo assembly of contigs. The graph is built up out of k-mers that overlap, which can be solved to construct contigs.

De novo assembly – The assembly of contigs or genomes from sequenced data without the aid of a reference.

DNA fragmentation – Separating or breaking DNA molecules into smaller fragments.

DNA libraries – DNA libraries are a collection of DNA fragments with specific sequencing-platform adapters ligated to both ends.

Ecogenomics – Study of the influence of environmental factors on the genome.

Environmental genomics – Prediction of organism responses at the genetic level.

FM-index – A compressed data structure for full-text pattern searching based on the Burrows-Wheeler transform.

Functional metagenomics – Study of gene functions from DNA extracted from mixed communities.

Hydrodynamic shearing – Fragmentation of DNA molecules by forcing them through a small tube or small gauge needle at high velocity.

K-mer – A short subsequence of length k that is generated from longer sequencing reads. The shorter k-mers allow for faster alignments and assemblies.

Last Common Ancestor (LCA) – A point on the tree of life from which a set of taxa are descended.

MegaBLAST – A faster, though less accurate, version of the BLAST tool.

Metagenome – All genetic material found in an environmental sample. It contains the genomes of many different organisms.

Nebulisation – Process of breaking DNA molecules into small fragments by passing DNA solution into a nebuliser unit, resulting in a fine mist that is collected.

Orthologs – Genes in different species that evolved from a common ancestral gene.

Paired-end sequencing – Sequencing of a DNA fragment from both ends. Both sequences can either be merged into a single larger fragment, if overlap is present, or kept separate.

Read – A DNA sequence generated by a sequencer.

Shotgun sequencing – A technique that randomly fragments DNA and then reassembles the fragments by searching for overlapping regions.

Sonication – Application of sound energy to break up DNA strands into smaller fragments.


  • Adey A, Morrison HG, Asan, Xun X, Kitzman JO, Turner EH, Stackhouse B, MacKenzie AP, Caruccio NC, Zhang X, Shendure J (2010) Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 11, R119. doi: 10.1186/gb-2010-11-12-r119
  • Alsos IG, Lavergne S, Merkel MKF, Boleda M, Lammers Y, Alberti A, Pouchon C, Denoeud F, Pitelkova I, Pușcaș M, Roquet C, Hurdu B-I, Thuiller W, Zimmermann NE, Hollingsworth PM, Coissac E (2020) The treasure vault can be opened: large-scale genome skimming works well using herbarium and silica gel dried material. Plants 9, 432. doi: 10.3390/plants9040432
  • Anderson S (1981) Shotgun DNA sequencing using cloned DNase I-generated fragments. Nucleic Acids Res. 9, 3015–3027. doi: 10.1093/nar/9.13.3015
  • Ayling M, Clark MD, Leggett RM (2020) New approaches for metagenome assembly with short reads. Brief. Bioinformatics 21, 584–594. doi: 10.1093/bib/bbz020
  • Azam F, Malfatti F (2007) Microbial structuring of marine ecosystems. Nat. Rev. Microbiol. 5, 782–791. doi: 10.1038/nrmicro1747
  • Bashir Y, Pradeep Singh S, Kumar Konwar B (2014) Metagenomics: an application based perspective. Chinese Journal of Biology 2014, 1–7. doi: 10.1155/2014/146030
  • Bovo S, Ribani A, Utzeri VJ, Schiavo G, Bertolini F, Fontanesi L (2018) Shotgun metagenomics of honey DNA: Evaluation of a methodological approach to describe a multi-kingdom honey bee derived environmental DNA signature. PLoS ONE 13, e0205575. doi: 10.1371/journal.pone.0205575
  • Carøe C, Gopalakrishnan S, Vinner L, Mak SST, Sinding MHS, Samaniego JA, Wales N, Sicheritz-Pontén T, Gilbert MTP (2017) Single-tube library preparation for degraded DNA. Methods Ecol. Evol. doi: 10.1111/2041-210X.12871
  • Chessman BC, Bate N, Gell PA, Newall P (2007) A diatom species index for bioassessment of Australian rivers. Mar. Freshwater Res. 58, 542. doi: 10.1071/MF06220
  • Chua PYS, Leerhøi F, Langkjær EMR, Noer CL, Richter SR, Marlene E, Margaryan A, Gilbert MTP, Coissac E, Alsos IG, Boessenkool S, Bohmann K (2021) Towards the extended barcode concept: Generating DNA reference data through genome skimming of Danish plants. BioRxiv. doi: 10.1101/2021.08.11.456029
  • Chua PYS, Rasmussen JA (2022) Taking metagenomics under the wings. Nat. Rev. Microbiol. doi: 10.1038/s41579-022-00746-5
  • Chua PYS, Carøe C, Crampton-Platt A, Reyes-Avila CS, Jones G, Streicker D, Bohmann K (2022) A two-step metagenomics approach for the identification and mitochondrial DNA contig assembly of vertebrate prey from the blood meals of common vampire bats (Desmodus rotundus). Metabarcoding Metagenom. 6, e78756. doi: 10.3897/mbmg.6.78756
  • Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P (2006) Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283–1287. doi: 10.1126/science.1123061
  • Deininger PL (1983) Random subcloning of sonicated DNA: application to shotgun DNA sequence analysis. Anal. Biochem. 129, 216–223. doi: 10.1016/0003-2697(83)90072-6
  • Delmont TO, Robe P, Clark I, Simonet P, Vogel TM (2011) Metagenomic comparison of direct and indirect soil DNA extraction approaches. J. Microbiol. Methods 86, 397–400. doi: 10.1016/j.mimet.2011.06.013
  • DeWitt J (2019) A simple library prep workflow for many sequencing applications. IDT.
  • Edwards RA, Rodriguez-Brito B, Wegley L, Haynes M, Breitbart M, Peterson DM, Saar MO, Alexander S, Alexander EC, Rohwer F (2006) Using pyrosequencing to shed light on deep mine microbial ecology. BMC Genomics 7, 57. doi: 10.1186/1471-2164-7-57
  • Genohub (2018) Metagenomics sequencing guide. Genohub.
  • Guazzaroni M-E, Beloqui A, Golyshin PN, Ferrer M (2009) Metagenomics as a new technological tool to gain scientific knowledge. World J. Microbiol. Biotechnol. 25, 945–954. doi: 10.1007/s11274-009-9971-z
  • Haiminen N, Edlund S, Chambliss D, Kunitomi M, Weimer BC, Ganesan B, Baker R, Markwell P, Davis M, Huang BC, Kong N, Prill RJ, Marlowe CH, Quintanar A, Pierre S, Dubois G, Kaufman JH, Parida L, Beck KL (2019) Food authentication from shotgun sequencing reads with an application on high protein powders. npj Sci. Food 3, 24. doi: 10.1038/s41538-019-0056-6
  • Handelsman J, Rondon MR, Brady SF, Clardy J, Goodman RM (1998) Molecular biological access to the chemistry of unknown soil microbes: a new frontier for natural products. Chem. Biol. 5, R245-9. doi: 10.1016/s1074-5521(98)90108-9
  • Head SR, Komori HK, LaMere SA, Whisenant T, Van Nieuwerburgh F, Salomon DR, Ordoukhanian P (2014) Library construction for next-generation sequencing: overviews and challenges. BioTechniques 56, 61–4, 66, 68, passim. doi: 10.2144/000114133
  • Healy FG, Ray RM, Aldrich HC, Wilkie AC, Ingram LO, Shanmugam KT (1995) Direct isolation of functional genes encoding cellulases from the microbial consortia in a thermophilic, anaerobic digester maintained on lignocellulose. Appl. Microbiol. Biotechnol. 43, 667–674. doi: 10.1007/BF00164771
  • Hengen PN (1997) Shearing DNA for genomic library construction. Trends Biochem. Sci. 22, 273–274. doi: 10.1016/s0968-0004(97)01080-3
  • Hennig BP, Velten L, Racke I, Tu CS, Thoms M, Rybin V, Besir H, Remans K, Steinmetz LM (2018) Large-scale low-cost NGS library preparation using a robust Tn5 purification and tagmentation protocol. G3 (Bethesda) 8, 79–89. doi: 10.1534/g3.117.300257
  • Hoheisel JD, Nizetic D, Lehrach H (1989) Control of partial digestion combining the enzymes dam methylase and MboI. Nucleic Acids Res. 17, 9571–9582. doi: 10.1093/nar/17.23.9571
  • Huson DH, Auch AF, Qi J, Schuster SC (2007) MEGAN analysis of metagenomic data. Genome Res. 17, 377–386. doi: 10.1101/gr.5969107
  • Illumina (2015) Nextera XT library prep: tips and troubleshooting. Illumina.
  • Joneja A, Huang X (2009) A device for automated hydrodynamic shearing of genomic DNA. BioTechniques 46, 553–556. doi: 10.2144/000113123
  • Kartzinel TR, Chen PA, Coverdale TC, Erickson DL, Kress WJ, Kuzmina ML, Rubenstein DI, Wang W, Pringle RM (2015) DNA metabarcoding illuminates dietary niche partitioning by African large herbivores. Proc Natl Acad Sci USA 112, 8019–8024. doi: 10.1073/pnas.1503283112
  • Kasoji SK, Pattenden SG, Malc EP, Jayakody CN, Tsuruta JK, Mieczkowski PA, Janzen WP, Dayton PA (2015) Cavitation enhancing nanodroplets mediate efficient DNA fragmentation in a bench top ultrasonic water bath. PLoS ONE 10, e0133014. doi: 10.1371/journal.pone.0133014
  • Kim D, Song L, Breitwieser FP, Salzberg SL (2016) Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729. doi: 10.1101/gr.210641.116
  • Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359. doi: 10.1038/nmeth.1923
  • Lentz YK, Worden LR, Anchordoquy TJ, Lengsfeld CS (2005) Effect of jet nebulization on DNA: identifying the dominant degradation mechanism and mitigation methods. J. Aerosol Sci. 36, 973–990. doi: 10.1016/j.jaerosci.2004.11.017
  • Li D, Liu C-M, Luo R, Sadakane K, Lam T-W (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676. doi: 10.1093/bioinformatics/btv033
  • Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760. doi: 10.1093/bioinformatics/btp324
  • Li H-T, Yi T-S, Gao L-M, Ma P-F, Zhang T, Yang J-B, Gitzendanner MA, Fritsch PW, Cai J, Luo Y, Wang H, van der Bank M, Zhang S-D, Wang Q-F, Wang J, Zhang Z-R, Fu C-N, Yang J, Hollingsworth PM, Chase MW, Li D-Z (2019) Origin of angiosperms and the puzzle of the Jurassic gap. Nat. Plants 5, 461–470. doi: 10.1038/s41477-019-0421-0
  • Li L, Jin M, Sun C, Wang X, Xie S, Zhou G, van den Berg A, Eijkel JCT, Shui L (2017) High efficiency hydrodynamic DNA fragmentation in a bubbling system. Sci. Rep. 7, 40745. doi: 10.1038/srep40745
  • Mackelprang R, Waldrop MP, DeAngelis KM, David MM, Chavarria KL, Blazewicz SJ, Rubin EM, Jansson JK (2011) Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw. Nature 480, 368–371. doi: 10.1038/nature10576
  • Molina-Montenegro MA, Ballesteros GI, Castro-Nallar E, Meneses C, Gallardo-Cerda J, Torres-Díaz C (2019) A first insight into the structure and function of rhizosphere microbiota in Antarctic plants using shotgun metagenomic. Polar Biol. 42, 1825–1835. doi: 10.1007/s00300-019-02556-7
  • Moorhouse-Gann RJ, Dunn JC, de Vere N, Goder M, Cole N, Hipperson H, Symondson WOC (2018) New universal ITS2 primers for high-resolution herbivory analyses using DNA metabarcoding in both tropical and temperate zones. Sci. Rep. 8, 8542. doi: 10.1038/s41598-018-26648-2
  • Nevill PG, Zhong X, Tonti-Filippini J, Byrne M, Hislop M, Thiele K, van Leeuwen S, Boykin LM, Small I (2020) Large scale genome skimming from herbarium material for accurate plant identification and phylogenomics. Plant Methods 16, 1. doi: 10.1186/s13007-019-0534-5
  • Noonan JP, Hofreiter M, Smith D, Priest JR, Rohland N, Rabeder G, Krause J, Detter JC, Pääbo S, Rubin EM (2005) Genomic sequencing of Pleistocene cave bears. Science 309, 597–599. doi: 10.1126/science.1113485
  • Nurk S, Meleshko D, Korobeynikov A, Pevzner PA (2017) metaSPAdes: a new versatile metagenomic assembler. Genome Res. 27, 824–834. doi: 10.1101/gr.213959.116
  • Ounit R, Wanamaker S, Close TJ, Lonardi S (2015) CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers. BMC Genomics 16, 236. doi: 10.1186/s12864-015-1419-2
  • Parducci L, Alsos IG, Unneberg P, Pedersen MW, Han L, Lammers Y, Salonen JS, Väliranta MM, Slotte T, Wohlfarth B (2019) Shotgun environmental DNA, pollen, and macrofossil analysis of lateglacial lake sediments from southern Sweden. Front. Ecol. Evol. 7. doi: 10.3389/fevo.2019.00189
  • Pasolli E, Asnicar F, Manara S, Zolfo M, Karcher N, Armanini F, Beghini F, Manghi P, Tett A, Ghensi P, Collado MC, Rice BL, DuLong C, Morgan XC, Golden CD, Quince C, Huttenhower C, Segata N (2019) Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662.e20. doi: 10.1016/j.cell.2019.01.001
  • Pedersen MW, Overballe-Petersen S, Ermini L, Sarkissian CD, Haile J, Hellstrom M, Spens J, Thomsen PF, Bohmann K, Cappellini E, Schnell IB, Wales NA, Carøe C, Campos PF, Schmidt AMZ, Gilbert MTP, Hansen AJ, Orlando L, Willerslev E (2015) Ancient and modern environmental DNA. Philos. Trans. R. Soc. Lond. B Biol. Sci. 370, 20130383. doi: 10.1098/rstb.2013.0383
  • Pedersen MW, Ruter A, Schweger C, Friebe H, Staff RA, Kjeldsen KK, Mendoza MLZ, Beaudoin AB, Zutter C, Larsen NK, Potter BA, Nielsen R, Rainville RA, Orlando L, Meltzer DJ, Kjær KH, Willerslev E (2016) Postglacial viability and colonization in North America’s ice-free corridor. Nature 537, 45–49. doi: 10.1038/nature19085
  • Peng Y, Leung HCM, Yiu SM, Chin FYL (2012) IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428. doi: 10.1093/bioinformatics/bts174
  • Porter TM, Hajibabaei M (2018) Scaling up: A guide to high-throughput genomic approaches for biodiversity analysis. Mol. Ecol. 27, 313–338. doi: 10.1111/mec.14478
  • Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Wang J (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65. doi: 10.1038/nature08821
  • Quince C, Walker AW, Simpson JT, Loman NJ, Segata N (2017) Shotgun metagenomics, from sampling to analysis. Nat. Biotechnol. 35, 833–844. doi: 10.1038/nbt.3935
  • Raes J, Korbel JO, Lercher MJ, von Mering C, Bork P (2007) Prediction of effective genome size in metagenomic samples. Genome Biol. 8, R10. doi: 10.1186/gb-2007-8-1-r10
  • Sambrook J, Russell DW (2006) Fragmentation of DNA by nebulization. CSH Protoc. 2006. doi: 10.1101/pdb.prot4539
  • Seed B, Parker RC, Davidson N (1982) Representation of DNA sequences in recombinant DNA libraries prepared by restriction enzyme partial digestion. Gene 19, 201–209.
  • Shui L, Bomer JG, Jin M, Carlen ET, van den Berg A (2011) Microfluidic DNA fragmentation for on-chip genomic analysis. Nanotechnology 22, 494013. doi: 10.1088/0957-4484/22/49/494013
  • Simões MF, Antunes A, Ottoni CA, Amini MS, Alam I, Alzubaidy H, Mokhtar N-A, Archer JAC, Bajic VB (2015) Soil and rhizosphere associated fungi in gray mangroves (Avicennia marina) from the Red Sea – A metagenomic approach. Genomics Proteomics Bioinformatics 13, 310–320. doi: 10.1016/j.gpb.2015.07.002
  • Soininen EM, Valentini A, Coissac E, Miquel C, Gielly L, Brochmann C, Brysting AK, Sønstebø JH, Ims RA, Yoccoz NG, Taberlet P (2009) Analysing diet of small herbivores: the efficiency of DNA barcoding coupled with high-throughput pyrosequencing for deciphering the composition of complex plant mixtures. Front. Zool. 6, 16. doi: 10.1186/1742-9994-6-16
  • Srivathsan A, Ang A, Vogler AP, Meier R (2016) Fecal metagenomics for the simultaneous assessment of diet, parasites, and population genetics of an understudied primate. Front. Zool. 13, 17. doi: 10.1186/s12983-016-0150-4
  • Srivathsan A, Sha JCM, Vogler AP, Meier R (2015) Comparing the effectiveness of metagenomics and metabarcoding for diet analysis of a leaf-feeding monkey (Pygathrix nemaeus). Mol. Ecol. Resour. 15, 250–261. doi: 10.1111/1755-0998.12302
  • Stahlschmidt MC, Collin TC, Fernandes DM, Bar-Oz G, Belfer-Cohen A, Gao Z, Jakeli N, Matskevich Z, Meshveliani T, Pritchard JK, McDermott F, Pinhasi R (2019) Ancient mammalian and plant DNA from Late Quaternary stalagmite layers at Solkota Cave, Georgia. Sci. Rep. 9, 6628. doi: 10.1038/s41598-019-43147-0
  • Stat M, Huggett MJ, Bernasconi R, DiBattista JD, Berry TE, Newman SJ, Harvey ES, Bunce M (2017) Ecosystem biomonitoring with eDNA: metabarcoding across the tree of life in a tropical marine environment. Sci. Rep. 7, 12240. doi: 10.1038/s41598-017-12501-5
  • Stein JL, Marsh TL, Wu KY, Shizuya H, DeLong EF (1996) Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon. J. Bacteriol. 178, 591–599. doi: 10.1128/jb.178.3.591-599.1996
  • Teeling H, Glöckner FO (2012) Current opportunities and challenges in microbial metagenome analysis – A bioinformatic perspective. Brief. Bioinformatics 13, 728–742. doi: 10.1093/bib/bbs039
  • Thomas T, Gilbert J, Meyer F (2012) Metagenomics – a guide from sampling to data analysis. Microb. Inform. Exp. 2, 3. doi: 10.1186/2042-5783-2-3
  • Thorstenson YR, Hunicke-Smith SP, Oefner PJ, Davis RW (1998) An automated hydrodynamic process for controlled, unbiased DNA shearing. Genome Res. 8, 848–855. doi: 10.1101/gr.8.8.848
  • Truong DT, Franzosa EA, Tickle TL, Scholz M, Weingart G, Pasolli E, Tett A, Huttenhower C, Segata N (2015) MetaPhlAn2 for enhanced metagenomic taxonomic profiling. Nat. Methods 12, 902–903. doi: 10.1038/nmeth.3589
  • Tseng Q, Lomonosov AM, Furlong EEM, Merten CA (2012) Fragmentation of DNA in a sub-microliter microfluidic sonication device. Lab Chip 12, 4677–4682. doi: 10.1039/c2lc40595d
  • Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI (2007) The human microbiome project. Nature 449, 804–810. doi: 10.1038/nature06244
  • Tyson GW, Chapman J, Hugenholtz P, Allen EE, Ram RJ, Richardson PM, Solovyev VV, Rubin EM, Rokhsar DS, Banfield JF (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43. doi: 10.1038/nature02340
  • Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Smith HO (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304, 66–74. doi: 10.1126/science.1093857
  • Vestergaard G, Schulz S, Schöler A, Schloter M (2017) Making big data smart – How to use metagenomics to understand soil quality. Biol. Fertil. Soils 53, 479–484. doi: 10.1007/s00374-017-1191-3
  • Weyrich LS, Du