Friday, June 7, 2024

Genome-wide pattern discovery and the challenge of distinguishing biology from artefacts

The number of available genomes is increasing at a very fast pace, possibly exponentially. One of the early analyses that bioinformaticians focussed on was the discovery of genome-wide patterns, such as gene density and its correlates, and genetic diversity and its determinants. Identifying such patterns and verifying their reliability is one of the first steps. Once such patterns are known, deciphering the processes underlying them is an important next step.

Moving from patterns to processes would also involve developing a strong theory that can make testable predictions. Once such a theory is available, it is possible to simulate scenarios and evaluate the theory against them. A well-grounded theory is required to advance the scientific understanding of a field.

The study by Teekas et al. examined a specific type of genetic region known as low complexity regions (LCRs) in the DNA of tetrapods, a group that includes amphibians, reptiles, birds, and mammals. LCRs are stretches of DNA made up of repetitive or compositionally simple runs of nucleotides (the building blocks of DNA). These regions are interesting because they can change rapidly and might help organisms adapt to their environments.
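To make the idea of "low complexity" concrete, here is a minimal sketch that flags windows of a sequence whose Shannon entropy is low. This is not the method used by Teekas et al.; the function names, the window width, and the entropy threshold are all illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the symbol composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_complexity_windows(seq: str, width: int = 12, max_entropy: float = 1.0):
    """Return start indices of windows with entropy at or below max_entropy.

    width and max_entropy are arbitrary illustrative defaults,
    not values taken from the study.
    """
    return [
        i
        for i in range(len(seq) - width + 1)
        if shannon_entropy(seq[i:i + width]) <= max_entropy
    ]


# A toy sequence: a run of ten A's flanked by mixed sequence.
toy = "ACGT" + "A" * 10 + "CGTA"
hits = low_complexity_windows(toy, width=8, max_entropy=0.0)
```

A homopolymer window has entropy 0, while a window with all four bases in equal proportion has entropy 2 bits, so lowering the threshold makes the filter stricter.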

The Role of LCRs in Evolution and Adaptation

LCRs can be a source of new traits and functions in organisms. The length and composition of LCRs are influenced by two main factors: mutation and natural selection. Mutation can change the DNA sequence randomly, sometimes leading to longer or shorter LCRs. For instance, mutations such as replication slippage (where the DNA copying process makes mistakes) can cause variations in LCR length. High levels of guanine (G) and cytosine (C) nucleotides, known as high %GC content, also contribute to these changes.

On the other hand, natural selection can favour certain variations that are beneficial for survival and reproduction. The interplay between these mutations and selection pressures determines the specific characteristics of LCRs in different organisms.
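The %GC content mentioned above is simply the share of G and C among the bases of a sequence. A minimal sketch, assuming ambiguous symbols such as N should be ignored (that handling is my assumption, not something stated in the study):

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    unambiguous = sum(seq.count(base) for base in "ACGT")
    if unambiguous == 0:
        # No usable bases, so report 0 rather than dividing by zero.
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / unambiguous
```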

Key Findings

  1. Location and Function of Positively Selected Sites (PSS):
    • Positively selected sites (PSS) are parts of genes that have undergone positive selection because changes at these sites provide some advantage.
    • PSS and LCRs are often found at the ends of genes in tetrapods.
    • PSS at the centre of genes tend to be involved in defence mechanisms, such as the immune response, while PSS at the ends of genes are associated with more general functions.
  2. Characteristics of LCR-Containing Genes:
    • Genes with LCRs in tetrapods tend to have a higher %GC content.
    • These genes show a lower ratio of non-synonymous to synonymous substitutions (ω or dN/dS), indicating strong purifying selection. Purifying selection removes harmful mutations, ensuring the gene remains functional.
    • Despite the rapid functional diversity that LCRs can provide, they are subject to intense purifying selection to maintain beneficial traits.
  3. Purity and Position of LCRs:
    • LCRs are common in genes, but most are of low purity, meaning the repeat is interrupted by more variation.
    • As the purity of LCRs increases (i.e., they become more uniform), they tend to be located in specific parts of the gene, suggesting their evolutionary role depends on their composition.
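The quantities in these findings can be written down in a few lines. The sketch below treats ω as the plain ratio dN/dS, purity as the fraction of an LCR taken up by its most common residue, and position as the LCR midpoint divided by gene length. These definitions and function names are assumptions for illustration; the study may define purity and position differently.

```python
from collections import Counter


def omega(dn: float, ds: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates (dN/dS)."""
    if ds <= 0:
        raise ValueError("dS must be positive for the ratio to be defined")
    return dn / ds


def lcr_purity(lcr: str) -> float:
    """Fraction of the LCR occupied by its most common residue (assumed definition)."""
    most_common_count = Counter(lcr).most_common(1)[0][1]
    return most_common_count / len(lcr)


def relative_position(start: int, end: int, gene_length: int) -> float:
    """Midpoint of the LCR as a fraction of gene length (0 = start, 1 = end)."""
    return ((start + end) / 2) / gene_length
```

Under these definitions, ω well below 1 is the usual signature of purifying selection, a perfect repeat has purity 1.0, and an LCR with a relative position near 0 or 1 sits at an end of the gene.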

Supporting the Robustness of the Patterns

The robustness of these patterns was supported through several methods:

  1. Consistency Across Data Sets:
    • The researchers observed these patterns across multiple tetrapod species, ensuring that the findings were not limited to a single group or dataset.
  2. Statistical Analysis:
    • They used statistical methods to test whether the patterns were significant and not due to random chance. This included analyzing the %GC content and ω ratios to confirm their observations.
  3. Replication and Validation:
    • The findings were validated by comparing different gene regions and across different species to ensure that the patterns were consistent and reproducible.
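One common way to check that a difference between two groups of genes is not due to chance is a permutation test. The sketch below is a generic illustration, not the statistical procedure used in the study; the function name and defaults are my own.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns (observed difference, p-value). Labels are shuffled
    n_permutations times; the p-value is the share of shuffles whose
    absolute difference is at least as large as the observed one,
    with a +1 correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Small tolerance guards against floating-point rounding.
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)
```

To use it, pass the ω values (or %GC values) of LCR-containing genes as one group and those of the remaining genes as the other; a small p-value suggests the difference is unlikely under random labelling.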

Understanding Underlying Processes

By identifying these patterns, the researchers gained insights into the underlying biological processes:

  1. Mutation and Selection Dynamics:
    • The study highlighted how mutations create variability in LCRs and how natural selection shapes these variations to enhance adaptability and function.
  2. Functional Roles of Gene Regions:
    • The position-specific roles of PSS within genes revealed how different parts of a gene can evolve to serve distinct functions, such as defence mechanisms or general cellular processes.
  3. Evolutionary Strategies:
    • The variation in %GC content and ω ratios between LCR-containing and non-LCR-containing genes illustrated different evolutionary strategies. LCRs contribute to rapid adaptation while being tightly controlled by purifying selection to prevent harmful mutations.

Building a Theoretical Framework

With the increasing availability of genomes, bioinformaticians can now analyze large datasets to discover genome-wide patterns, such as gene density and genetic diversity. Identifying and verifying the reliability of these patterns is a crucial first step. Once reliable patterns are known, the next step is understanding the underlying processes.

In this study, the researchers' findings contribute to building a theoretical framework for understanding genetic diversity and adaptation:

  1. Model Development:
    • The identified patterns can be used to develop models that explain how LCRs evolve and contribute to functional diversity in genes. These models can help predict how genes adapt to environmental pressures.
  2. Integration of Mechanisms:
    • By integrating the roles of mutation and selection, the framework can explain how genetic diversity is generated and maintained in different species.
  3. Predictive and Explanatory Power:
    • A robust theoretical framework can predict new evolutionary trends and explain observed patterns in genetic data. For instance, it can help predict which genes might evolve rapidly in response to new environmental challenges.
  4. Guiding Future Research:
    • The framework can guide future research by highlighting key areas for investigation, such as the specific mechanisms by which LCRs influence gene function and adaptation.

Moving from patterns to processes involves developing a strong theory to make testable predictions. Once such a theory is available, scientists can simulate scenarios and evaluate the theory's validity. A well-grounded theory is essential for advancing the scientific understanding of a field, as it provides a structured way to interpret data and predict future observations.
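As a flavour of what "simulating scenarios" could look like, here is a toy model of a repeat tract whose length changes by slippage and is constrained by purifying selection. Every part of it is an assumption made for illustration: the slippage probability proportional to tract length, the hard length bounds standing in for selection, and all parameter values.

```python
import random


def simulate_repeat_length(start_len=10, generations=1000, slip_rate=0.01,
                           min_len=5, max_len=20, seed=0):
    """Toy random walk of repeat-tract length along a single lineage.

    Each generation a slippage event occurs with probability
    slip_rate * length (capped at 1) and adds or removes one repeat unit.
    Changes that would leave [min_len, max_len] are discarded, a crude
    stand-in for purifying selection.
    """
    rng = random.Random(seed)
    length = start_len
    history = [length]
    for _ in range(generations):
        if rng.random() < min(1.0, slip_rate * length):
            proposal = length + rng.choice((-1, 1))
            if min_len <= proposal <= max_len:
                length = proposal
        history.append(length)
    return history
```

Comparing the length distributions produced with tight versus loose bounds is one simple way to ask what signature selection would leave, which can then be checked against real data.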

Conclusion

This study sheds light on the complex dynamics of low-complexity regions in the DNA of tetrapods. By revealing how these regions are influenced by mutation and selection and how they contribute to functional diversity and adaptation, the researchers have laid the groundwork for a theoretical framework to enhance our understanding of genetic evolution. The robustness of the findings, supported by rigorous statistical analysis and validation across multiple species, ensures that the patterns identified are reliable and meaningful. This framework will be invaluable for future studies exploring genetic diversity and the evolutionary processes that shape life on Earth.

* This blogpost was generated using ChatGPT.

We talked about how Open Science is changing the way science is done and the important role of the Royal Society in facilitating this. The terms Open and Science have come together to form a new coalition championed by those disillusioned by the academic research culture. For instance, Rachael Ainsworth discusses how the "Research Culture is Broken, and Open Science can Fix It".

Rachael Ainsworth is not the Royal Society. So it's not just established, well-known societies; even an individual with passion and dedication can advocate for open science. However, the world is the way it is, and it takes power, prestige, prominence in science and the credibility of a society to be heard. Michael Nielsen is somewhere in between the Royal Society and Rachael Ainsworth and describes himself as somebody who "helped pioneer...the modern open science movement". He uses interesting anecdotes involving important people to highlight the challenges of doing Open Science.


We are also moving from a text-dominated communication medium to a video-dominated medium. Given the buzz currently surrounding Open Science, it seems reasonable that the study by Teekas and colleagues be published in Open Biology, a fast, open-access Royal Society journal publishing high-impact biology at the molecular and cellular level. Of course, the manuscript spent over a year on bioRxiv, and it had seen multiple journals before making its way to the pre-print server in July 2023. Here is a brief timeline:

XXXX-X-22-00131 on Wed, Sep 28, 2022, 3:04 PM
XXXX-X-22-00131R1 on Thu, May 4, 2023, 2:39 PM
XXXX-X-22-00131R2 on Fri, Oct 20, 2023, 3:53 PM
XXXX-X-22-00131R3 on Mon, Nov 20, 2023, 1:25 PM
XXXX-X-22-00131R4 on Wed, Nov 22, 2023, 12:26 PM

XXXX-23-1023 on Fri, Nov 24, 2023, 2:23 PM

XXXX-2023-2689 on Wed, Nov 29, 2023, 1:06 PM