Monday, May 17, 2021

On the virtues of identifying the correct publication units and the ills of Salami slicing

Publishing a paper is much more than just doing good research and writing it up. It takes a lot of thought to identify the correct publication unit, craft a captivating story, and deliver it with the right tone. One memory that has stuck in my mind is a lecture on publication ethics that discussed the idea of salami slicing. The phrase "salami-slicing" refers to splitting a manuscript into numerous small pieces to increase the publication count. At the time, it seemed to me like the most evil thing a scientist could do, and it reeked of greed and cunning.

Recent events have made me reconsider this quick judgment. Some alternative reasons why "salami-slicing" could happen are listed here (not as a justification):

  1. A story can become too long and convoluted when it is not limited to the proper amount of content.
  2. Reviewers might be inclined to make comments like "This manuscript is about multiple things, and although the subjects are certainly appropriate for XYZ journal" etc. "At least 5 disparate projects are included in the paper...". Such comments can motivate, or even force, splitting the manuscript into multiple parts.
  3. The cost of doing research continues to increase in most biology-related domains. Pouring all of these resources into one monolithic paper might not be liked by funding bodies or other relevant authorities. A focus on paper count rather than on the quality or thoroughness of the research is a worrying prospect.

Having explained some background that doesn't justify "salami-slicing", let me provide details of what Patil et al. did. First, the manuscript titled "CoalQC - Quality control while inferring demographic histories from genomic data: Application to forest tree genomes", dealing with various technical aspects of PSMC, was posted on the bioRxiv repository in March 2020. Next, Patil et al. managed to publish the first part of the study in the journal Gene, titled "The genome sequence of Mesua ferrea and comparative demographic histories of forest trees", in October 2020. However, the technical parts dealing with repeats, genome assembly, and parameter settings remained unscrutinized by the powerful gaze of peer reviewers. After struggling through numerous journals that were willing to publish the manuscript without demanding article processing charges (APCs), the second part is now published in the journal Heredity, titled "Repetitive genomic regions and the inference of demographic history".

The date of acceptance (17th April 2021) for this second part is of great significance. It was the 130th birth anniversary (on 14th April) of an Indian anthropologist who wrote the book "WWTS?". Obviously, he is better known for his many other achievements. This book, published in the year 1946, is almost 300 pages long (including the appendices) and was sold at a cost of Rs. 12/8. Many things have changed in the years since. We now have WGS data to tell us about human population history. However, the spirit of the original book and its relevance continue to haunt India. If nothing else, the book delves into the past and challenges many ideas held dear and venerated by a few. This possibility of being able to challenge and question dogma is what distinguishes scientific thought from non-scientific thought. In some ways, the second part of the CoalQC manuscript challenges the existing demographic inference methodology. The fact that such critical evaluations of widely used methods are accepted and add to the discussion is of great value. This article by Patil et al. is now available online as "Repetitive genomic regions and the inference of demographic history". Some additional material from the pre-print that we never published forms the basis for a blog post (Leaping from frogs to plants - in quest of repeats) at the Nature Ecology and Evolution community.

Sunday, May 9, 2021

Coelacanth helps in the fight against HIV

The coelacanth has been called a "living fossil" in popular media, as it is thought to have barely changed compared to the fossil record. Despite the morphological similarity of the extant coelacanth to fossil specimens, considerable molecular evolution is likely to have occurred, as seen in other species that have been called living fossils. However, a comparison of nucleotide sequences has shown that the rate of evolution in this lineage is significantly lower than in tetrapod lineages. The reasons for the morphological stasis have been the focus of speculation and need greater investigation. Some groups of species (including the Coelacanth) that have been characterized as living fossils are species-poor and not easily amenable to molecular evolutionary analysis.

More than 95% of all extant fish species belong to the infraclass Teleostei (teleost fish), and many fish species that are commercially important or serve as model organisms (such as the zebrafish) belong to this group. These teleost fish belong to the class Actinopterygii (ray-finned fishes) and are known to have had a third round of whole-genome duplication (3R-WGD). Phylogenetic studies have consistently found that the Coelacanths and lungfish belong to the clade Sarcopterygii (lobe-finned fish) and share a more recent common ancestor with Tetrapods than with Actinopterygii. Importantly, the Coelacanths share the two rounds of whole-genome duplication (2R-WGD) found in tetrapods and lack the third round (3R-WGD) found in teleost fish. This close evolutionary relationship with tetrapods, together with its phylogenetic position, has made the Coelacanth a useful model to study the transition of vertebrates from water to land.

Despite the challenges associated with studying the molecular evolution of Coelacanths, the sequencing of its genome in 2013 helped uncover many interesting aspects. The availability of the Coelacanth genome played an important role in timing the vertebrate whole-genome duplication events and provided clearer evidence that they occurred. Identification of gene loss events in tetrapods compared to the Coelacanth highlighted several adaptive events that occurred during the transition from water to land. One of the interesting finds reported was the lack of IgM (Immunoglobulin-M) in the Coelacanth genome. Due to its strategic phylogenetic position, Coelacanth genes have been studied to understand the origin and diversification of gene families. Prominent examples include studies of the origin of the restriction factor tetherin and, more recently, of the HERCs, both of which made use of Coelacanth gene sequences.

Ramdas et al. use an elaborate study design to investigate the SERINC family of restriction factors. During the course of their investigation, they find that one of the human paralogs, SERINC2, is not able to fight HIV, while the other four SERINCs do a good job of fighting it. Upon further investigation, they find that SERINC2 from the Coelacanth is able to deal with HIV and that this activity was lost in other lineages. One of the most interesting aspects of this study is the use of foamy viruses, similar to the endogenous one recovered from the Coelacanth genome, to evaluate the ability of SERINC2. The mechanism of action is also deciphered using sophisticated assays. You can read the final published version, titled "Coelacanth SERINC2 inhibits HIV-1 infectivity and is counteracted by envelope glycoprotein from foamy virus", on the website of JVI.