SHERLOCK – an Automatized Analysis of Molecular Sequence Variation in Species Communities Using Statistical Tests on Patristic Tree Distances

Seidel NI; Geiger MF; Kück P

Biodiversity Online J

Seidel NI, Geiger MF and Kück P*

Leibniz Institute for the Analysis of Biodiversity Change, Germany

*Corresponding author: Patrick Kück, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 160, 53113 Bonn, Germany

Submission: May 10, 2022; Published: May 26, 2022

DOI: 10.31031/BOJ.2022.02.000541

ISSN 2637-7082
Volume 2 Issue 4

Abstract

Phylogenetic trees are commonly used to infer the evolutionary relationships of organisms from molecular sequences (e.g. genes, proteins, genomes). In a reconstructed tree, the additive branch lengths from one sequence to another can be assumed to reflect the amount of evolutionary change between these two sequences. The sum of branch lengths that link two nodes in a tree can be used to calculate the overall so-called phylogenetic diversity of a tree, i.e. the total evolutionary change inferred for a set of taxa. With SHERLOCK, we provide a simple and efficient tool to statistically analyse phylogenetic diversity in sequence data in comparison with a null-model distribution based on sequences drawn at random from the original data set. SHERLOCK incorporates external alignment and tree reconstruction software, which allows, for the first time, a fully automatized analysis and visualization of patristic distances on the basis of raw sequence data.

Keywords: Phylogenetics; Phylogenetic diversity; Molecular sequence analysis; Automatized pipeline

Main Method

A phylogenetic tree represents a hypothesis on the evolutionary history and diversity of a set of taxa, where each branch length estimates the number of character changes that occurred along that branch. A patristic or Phylogenetic Distance (PD) is defined as the sum of the lengths of the branches that link two nodes in a tree. The overall PD of a tree summarizes the total evolutionary change inferred for the set of taxa. Comparing the observed overall PD to a null-model distribution, or between trees obtained from different sets of taxa, can serve a wide range of research fields: the prioritization of conservation areas [1] or target taxa (‘EDGE approach’ [2]), the study of species communities (β diversity [3]), or trait variation [4]. According to theory, local communities should often consist of rather distantly related species, which reduces competition between close relatives, whereas if relatives share similar environmental tolerances, local communities should contain more closely related species [5]. The overall PD of an (ideally multigene) tree represents a proxy for the scale of phenotypic differences expected between any two species of a tree across a large number of traits [6]. Data sets of phylogenetically distantly related species have a high overall PD (normalized for the number of taxa) in comparison with data sets of closely related species.
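The definition of a patristic distance can be illustrated with a minimal Python sketch. The toy tree, its branch lengths, and the function names below are hypothetical and chosen only for illustration; they are not taken from SHERLOCK, and the sketch assumes a rooted tree with non-negative branch lengths.

```python
# Hypothetical toy tree: each node maps to (parent, branch length to parent).
TREE = {
    "A": ("n1", 0.1),
    "B": ("n1", 0.2),
    "n1": ("root", 0.3),
    "C": ("root", 0.5),
    "root": (None, 0.0),
}


def path_to_root(tree, node):
    """Map every ancestor of `node` (and node itself) to its summed branch length."""
    dist = {node: 0.0}
    total = 0.0
    while tree[node][0] is not None:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic_distance(tree, a, b):
    """Sum of the branch lengths on the path linking nodes a and b."""
    up_a = path_to_root(tree, a)
    up_b = path_to_root(tree, b)
    # With non-negative branch lengths, the most recent common ancestor
    # is the shared ancestor that minimizes the summed path length.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def total_branch_length(tree):
    """Overall tree length: the sum of all branch lengths."""
    return sum(length for _, length in tree.values())


print(patristic_distance(TREE, "A", "B"))
print(total_branch_length(TREE))
```

In this toy tree, A and B are linked through their shared parent n1 only, whereas the path from either of them to C passes through the root and is therefore longer.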

We applied the PD metric in SHERLOCK to characterize species communities (e.g. regional subsets) within a large data set of species from a large geographic region. Mapping the observed and normalized overall PD of a particular community onto a PD null-model distribution based on random subsets makes it possible to test whether the taxonomic structure of an individual data set differs significantly from a null-model expectation. The extent of clustering or equipartition of a community is thereby reflected by the total, the mean, and the median branch length of the inferred community tree in proportion to a corresponding null-model distribution of random PDs. Whereas tools for calculating patristic distances from trees in general are available [7,8], SHERLOCK allows, for the first time, a fully automatized analysis and visualization of the distribution of branch lengths, incorporating external software for alignment processing [9] and two maximum likelihood (ML) tree reconstruction methods [10,11]. Statistical tests and result plots are generated with R-ggplot2 [12] and gridExtra [13].

Figure 1: Process steps in SHERLOCK
1) Main process step (MPP), focusing (left to right) on raw data preparation (exclusion of potential gaps), alignment generation, ML tree reconstruction and resolution of polytomies, and patristic tree distance calculation. Different alignment and ML methods are available. Both original and randomized data are looped through the MPP.
2) Generation of randomized data replicates from the main pool of original data (P). Sampling of random data follows user specifications for the total number of sampled species (SPNR), the number of species-related sequences (SEQNR), and the total number of replicates (REPNR). In the example, P consists of eight sequences (np = 8; seq1 to seq8) belonging to three different species (mp = 3; circled blue (seq3, seq5, seq8), grey (seq4, seq6, seq7), and violet (seq1, seq2)). First, the software checks whether P can satisfy the specifications of SPNR and SEQNR at all, and aborts the analysis if SPNR > mp or if SEQNR > np. A subpool of sequences (S) is then randomly generated from P, whereby the number of drawn species in S (ms) equals the specified number of allowed species (ms = SPNR). Next, the software checks whether the set of randomly drawn species provides enough sequences in S (ns ≥ SEQNR). If it does not (ns < SEQNR), S is rejected and randomly re-sampled until a random set of species satisfies the SEQNR condition. If ns is still < SEQNR after 1000 random re-sampling attempts of ms, SHERLOCK determines a fixed set of ms species with ns ≥ SEQNR. Afterwards, the final replicate is randomly generated from S following the SEQNR condition (in our example until SEQNR = 4). This procedure is repeated until the number of random data sets equals the number of defined replicates (REPNR). All random replicates are subsequently forwarded to the MPP chain processes, and the resulting PDs are referenced to the PD of the original data.
3) Graphical outputs are:
i. Histograms for all original data partitions, plotting the original PD against its corresponding random null distribution.
ii. Separate boxplots of the random data PDs according to SEQNR and SPNR.
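The replicate sampling described in step 2 can be sketched in Python as follows. This is an illustrative reading of the caption, not SHERLOCK's Perl code: the function name `draw_replicate`, the `POOL` layout, and in particular the fallback used after 1000 failed attempts (taking the species with the most sequences) are assumptions, since the text does not state how the fixed set is determined or whether every drawn species must be represented in the final replicate.

```python
import random

# Example pool from the figure caption: three species, eight sequences.
POOL = {
    "blue": ["seq3", "seq5", "seq8"],
    "grey": ["seq4", "seq6", "seq7"],
    "violet": ["seq1", "seq2"],
}


def draw_replicate(pool, spnr, seqnr, rng, max_attempts=1000):
    """Draw one random replicate of `seqnr` sequences from `spnr` species."""
    species = sorted(pool)
    n_p = sum(len(seqs) for seqs in pool.values())
    # Abort if the pool can never satisfy the user specifications.
    if spnr > len(species) or seqnr > n_p:
        raise ValueError("pool cannot satisfy SPNR/SEQNR")

    for _ in range(max_attempts):
        chosen = rng.sample(species, spnr)
        subpool = [seq for sp in chosen for seq in pool[sp]]
        if len(subpool) >= seqnr:
            break  # subpool S is accepted
    else:
        # ASSUMPTION: fall back to the species with the most sequences.
        chosen = sorted(species, key=lambda sp: len(pool[sp]), reverse=True)[:spnr]
        subpool = [seq for sp in chosen for seq in pool[sp]]
        if len(subpool) < seqnr:
            raise ValueError("no set of SPNR species provides SEQNR sequences")

    # Final replicate: SEQNR sequences drawn without replacement from S.
    return sorted(rng.sample(subpool, seqnr))


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
replicates = [draw_replicate(POOL, spnr=2, seqnr=4, rng=rng) for _ in range(3)]
print(replicates)
```

Each entry in `replicates` corresponds to one random data set; the list comprehension plays the role of the REPNR loop.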

SHERLOCK reads sequence data of different species communities in FASTA format. The process settings for each analysis (the number of random replicates, the sequence composition of the null-model distribution (i.e., the number of entities/taxa and specimens/sequences), and the requested alignment and tree reconstruction methods) have to be defined in a text file. The null-model distribution sampling is specified by the number of entities/taxa and the number of DNA sequences to be drawn from a main sequence pool that contains all sequences of the underlying community subpools (identical sequences of the same taxon are sampled only once). As main output, SHERLOCK prints a histogram, a density plot, and a violin plot of the PD measures for each community analysis. More detailed lists of the observed and randomly expected PD values of the analyzed communities, including the 0.025 and 0.975 quantile limits, are written to separate text files. An additional off-range file reports whether the observed PD of a community is significantly different from the PD distribution of randomly drawn sequences. Before the analysis, SHERLOCK identifies, excludes, and lists all redundant sequence names in the input data. A workflow of SHERLOCK’s main process steps is shown in Figure 1. SHERLOCK is written in Perl, is open source, and runs as a command line application on Linux systems. We provide a comprehensive manual describing all process steps, software implementations, script commands, and input/result files of an exemplary PD analysis. SHERLOCK, the manual, and all additional files can be downloaded free of charge from GitHub: https://github.com/NathanSeidel/Sherlock.
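The off-range check against the 0.025 and 0.975 quantile limits can be sketched in Python as below. The function names and the returned dictionary are hypothetical, and the linear-interpolation quantile rule is an assumption (it corresponds to the default of R's `quantile` function); the text does not state which quantile definition SHERLOCK uses.

```python
def quantile(values, p):
    """Quantile of a non-empty list by linear interpolation between order statistics."""
    xs = sorted(values)
    h = (len(xs) - 1) * p          # fractional index of the requested quantile
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def off_range(observed_pd, null_pds, lower=0.025, upper=0.975):
    """Report whether the observed PD lies outside the null-model quantile limits."""
    lower_limit = quantile(null_pds, lower)
    upper_limit = quantile(null_pds, upper)
    return {
        "lower": lower_limit,
        "upper": upper_limit,
        "off_range": observed_pd < lower_limit or observed_pd > upper_limit,
    }


# Hypothetical null distribution of PDs from 101 random replicates.
null_pds = [float(i) for i in range(101)]
print(off_range(99.0, null_pds))
print(off_range(50.0, null_pds))
```

An observed PD below the lower limit or above the upper limit would be flagged, which corresponds to a two-sided test at the 5% level under this reading.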

References

© 2022 Kück P. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and distribution, and allows others to build upon the work non-commercially.
