Monday, December 22, 2025

What AI Can Never Do — And What It Might Soon Surprise Us With

Every few months, someone declares that AI is about to do everything humans can. And every time, it feels almost believable — until you look closely. For all its brilliance, AI still stumbles in strange, deeply human ways. It writes symphonies but doesn’t feel the silence between notes. It mimics love poems without ever knowing heartbreak. It can predict our choices, but it doesn’t choose.

So let’s take a sober look — not at what AI is, but at what it isn’t, and never will be. And then, let’s contrast that with what it may become in 5 years, 10 years, and beyond.


What AI Can Never Do

  1. It Cannot Feel.
    AI can generate words for sadness, joy, or longing, but it doesn’t feel the tremor in the throat or the ache behind those words.
    Emotions aren’t data points — they’re lived experiences, marinated in time, memory, and vulnerability. Machines can model them, but never inhabit them.

  2. It Cannot Suffer or Desire.
    Suffering and desire give human life its arc — the hunger for meaning, the search for love, the fear of death.
    An AI will never yearn, because it lacks a self to yearn from. Its outputs are predictions, not prayers.

  3. It Cannot Be Original in the Human Sense.
    Creativity, at its core, is rebellion — the act of saying no to what exists and imagining something that doesn’t.
    AI creates by recombination; humans create by contradiction. A Picasso breaks rules. An AI uses them.

  4. It Cannot Understand Mortality.
    The awareness of our finitude gives depth to everything we do.
    Machines do not die — and so they cannot know urgency, nostalgia, or the quiet courage of knowing that time is running out.

  5. It Cannot Truly Care.
    Compassion requires empathy, and empathy requires the capacity to hurt when someone else does.
    AI can simulate concern — but it cannot grieve, cannot forgive, cannot love in return.


What AI Might Be Able to Do in 5 Years

Five years from now, AI will likely be astonishingly competent — not sentient, but almost seamless.

  • Perfect multimodal reasoning: You’ll speak, gesture, and show an image — and AI will respond with context-aware understanding.

  • Scientific copilots: It will autonomously design and test hypotheses in silico, accelerating discovery in biology, materials, and medicine.

  • Emotional adaptation: It will read your tone, microexpressions, and word choices to adjust its manner — not because it feels empathy, but because it predicts it helps.

  • Personalized education: Every learner will have a tireless mentor, adjusting lessons to their exact rhythm, making learning intimate again.

But in five years, AI will still lack intuition — that mysterious, wordless knowing that comes from being in the world, not just observing it.


What AI Might Be Able to Do in 10 Years

Ten years from now, AI may become less a tool and more a partner.

  • Collaborative creativity: It might co-compose symphonies, co-author novels, or design films that blend machine logic with human emotion.

  • Global reasoning systems: It could integrate scientific, social, and ecological data to simulate the consequences of policy decisions — becoming an oracle of sorts for complex problems.

  • Augmented humanity: Brain-computer interfaces may blur lines between cognition and computation, allowing humans to think with AI.

But even then, it won’t understand beauty — it will model our patterns of finding things beautiful. It won’t dream; it will calculate.


And Beyond — 25 Years, Maybe?

Perhaps, in a quarter-century, we’ll live in a world where AI has a kind of emergent inner world — not consciousness, but complexity dense enough to look like it.
It will anticipate needs before we articulate them, generate new scientific paradigms, and perhaps even manage global coordination far better than we do.

But the human spirit will still have one thing machines cannot replicate: the awareness of existence itself — the quiet, wordless realization that I am.
That single fact, inexplicable and private, separates everything living from everything artificial.


The Paradox of Progress

The paradox is this: AI will become more human-like while humans risk becoming more machine-like — efficient, optimized, distracted.
The challenge ahead is not to stop AI from becoming too powerful, but to stop ourselves from becoming too mechanical in response.


So What Should We Hold On To?

Hold on to what AI cannot do — the trembling uncertainty before a first kiss, the irrational act of forgiveness, the laughter that erupts from nowhere.
Hold on to art that doesn’t make sense.
Hold on to awe.

Because that is where we remain irreplaceably human — not in what we know, but in what we feel despite not knowing.


AI will perfect logic.
But it will never write a poem from heartbreak, or cry at a sunrise.
That, forever, will be our gift — and our burden.

Sunday, December 21, 2025

The Multilingual Whisper of Happiness — A Poetic Reflection on the Endless Pursuit

We spend our lives chasing happiness — sometimes with desperation, sometimes with grace.

It is that elusive star just beyond the horizon, the one we swear we’ll touch after the next achievement, after the next tomorrow.

And yet, happiness never seems to stay. It flickers, it teases, and it fades.
To understand this, I wrote a poem that speaks across languages — English, Hindi, Sanskrit, Kannada, Tamil, and Telugu — as if happiness itself were a conversation among cultures, slipping through tongues the way it slips through time.


Poem: “Anantānanda — The Endless Joy”

In the silence between heartbeats I chase,
“Khushī milī kya?” — asks the weary face.
“Sukham vā asti?” — murmurs the Sanskrit wind,
As though bliss were hidden where the breath rescinds.

“Santōṣa enū illa,” whispers a Kannada night,
Happiness not absent — just out of sight.
“Inbam varum, pogum, marum,” — hums a Tamil rain,
Joy comes, it goes, it comes again.

“Ānandaṁ nityam kaadhu,” sighs the Telugu sun,
Happiness is not forever — yet we run.
And still I smile, half-knowing, half-blind,
For the chase itself is what keeps me aligned.


Philosophical Reflection: The Pursuit as the Path

This poem weaves six languages into one voice — not to confuse, but to capture the universality of the question: what is happiness, and why do we chase it so endlessly?

Let’s unfold its meaning layer by layer.

1. “Khushī milī kya?” — Did you find happiness?

This simple Hindi question is something we ask ourselves again and again — often after a milestone.
Did the promotion bring joy? Did the journey to self-improvement make us whole?
But even as we ask, we sense the answer slipping away. The act of questioning itself suggests incompleteness.

2. “Sukham vā asti?” — Is there truly bliss?

The Sanskrit phrase elevates the question into philosophy. In ancient Indian thought, sukha (happiness) is not something to be seized, but a state of alignment with the ātman — the inner self.
To ask whether happiness “exists” is to question whether the self can ever be still enough to experience it.

3. “Santōṣa enū illa” — There is no satisfaction.

In Kannada, this phrase carries both melancholy and acceptance. It doesn’t deny joy, but acknowledges its transience.
Santōṣa (contentment) isn’t absent — it’s impermanent. The night whispers not of despair, but of impermanence as truth.

4. “Inbam varum, pogum, marum” — Joy comes, it goes, it returns.

Tamil holds the wisdom of cycles.
Happiness is like the monsoon — arriving, vanishing, and returning again. It teaches that joy isn’t lost; it’s rhythmic. It has seasons, like the heart.

5. “Ānandaṁ nityam kaadhu” — Happiness is not eternal.

In Telugu, this line accepts the limits of human experience. We crave permanence in a transient world — yet it is this very impermanence that gives sweetness to the fleeting moment.
If joy were constant, it would cease to be joy; it would be monotony.


The Final Verse — “For the chase itself is what keeps me aligned.”

Here lies the philosophical pivot.
Happiness is not the destination; it’s the movement itself.
The pursuit is the pulse of existence — the reminder that we are alive, curious, still yearning.

In Vedantic philosophy, this is the paradox of ānanda: it cannot be found outside oneself, yet the search itself awakens it within.
In Western thought, too, we find echoes — as Kierkegaard wrote, “Happiness is the greatest hiding place for despair,” and yet, through the very act of seeking, we transcend despair itself.


The Synthesis: Many Tongues, One Truth

Six languages — six shades of the same truth:

  • Hindi gives us the emotional immediacy.

  • Sanskrit gives us the metaphysical depth.

  • Kannada gives us introspection.

  • Tamil gives us cyclical understanding.

  • Telugu gives us realism.

  • English binds them together with reason.

They are not translations of each other, but reflections — like six mirrors showing the same face of the human spirit from different angles.


Conclusion: The Harmony of Impermanence

The pursuit of happiness is not a curse — it is the rhythm of consciousness.
To chase happiness is to stay alive to the world’s beauty, to remain porous to wonder.
The problem arises only when we demand it to stay.

The poem, at its heart, reminds us that joy is not meant to be captured — only witnessed.
We are not here to hold happiness forever; we are here to meet it again and again, in new forms, across the many languages of our being.

In chasing happiness, we find meaning.
In losing it, we learn gratitude.
And in the endless pursuit, we find ourselves.

Untangling the Mammalian Tree Wars

 Why the tree of mammals remains disputed — and what we’ve learned from decades of phylogenomics

Introduction

When we think of the mammalian family tree, we often imagine a tidy branching diagram: monotremes split off first, then marsupials, then placentals, which split into the familiar primates, rodents, carnivores, hoofed mammals, bats, etc. In reality, the deeper branches of this tree have been hotly contested for decades. Researchers have proposed multiple competing topologies for how major mammal groups (especially within placentals) relate to each other, the timing of divergences, and how to interpret conflicts among gene phylogenies. That’s what’s often referred to as the “mammalian tree wars.”

In this post I trace the major controversies: their historical roots, major competing hypotheses, how genome-scale data have changed the picture (but not resolved everything), the key conceptual challenges (e.g., incomplete lineage sorting, rapid radiations), and where things stand today. I highlight important taxa, landmark papers, and what the persistent disagreements tell us about evolutionary genomics.


A brief historical overview

Morphology and early molecular work

Traditionally, mammalian systematics was based on morphological characters: dental patterns, ear bones, skull morphology, and so on. By the late 20th century, molecular sequence data (e.g., mitochondrial genes, a few nuclear markers) began to be used, and soon major surprises appeared. Examples include the debated affinities of rodents and lagomorphs, the unexpected placement of hedgehogs, and the recognition of Afrotheria, an African-endemic placental clade that morphology had not revealed.

In the early 2000s, key molecular-phylogenetic studies such as Murphy et al. 2001 – Resolution of the Early Placental Mammal Radiation (Science) found strong support (with Bayesian methods) for novel groupings of placental mammals.

The first major controversies: root of placental mammals

One of the earliest “wars” was about the root of the placental mammals: what is the first branch among the three major superorders (or clades) of placentals? The three competing hypotheses were:

  • Xenarthra‐first: xenarthrans (armadillos, sloths, anteaters) diverged first, then the rest of the placentals.

  • Afrotheria‐first: Afrotheria (elephants, hyraxes, sea cows, tenrecs, etc.) diverged first.

  • Atlantogenata: a clade combining Afrotheria + Xenarthra sister to Boreoeutheria (the other placentals).

Different studies found support for each of these, depending on gene sampling, taxon sampling, and methods. For instance, a critical review by Tarver et al. 2016 – Interrelationships of Placental Mammals and the Limits of Phylogenetic Inference pointed out how challenging it is to infer this root reliably.

Divergence timing and post-KPg radiation

Another major piece of the war is “when” the major placental mammal lineages diversified. Did they radiate before or after the end-Cretaceous (KPg) extinction (~66 Ma)? Some studies (e.g., Meredith et al. 2011 – Impacts of the Cretaceous Terrestrial Revolution and KPg Extinction on Mammal Diversification) suggested many ordinal groups started diverging in the Late Cretaceous; others argued for a rapid post-KPg explosion. The timing has strong implications for how we interpret the fossil record and ecological context of mammal diversification.

Rapid radiations and short internodes

One thing many studies found: the branch lengths among the major clades are very short in absolute time, meaning the divergences happened in quick succession. That rapid radiation leaves very little phylogenetic “signal” and plenty of room for conflicting gene trees. This is a major reason why “more data” has not simply solved the tree.


Key competing hypotheses and major taxa

Placental mammal major clades

A convenient summary: Most analyses now accept three big clades within placentals:

  • Afrotheria: mammals of African origin such as elephants (Proboscidea), hyraxes (Hyracoidea), sirenians (sea cows), tenrecs and golden moles (Afrosoricida), and elephant-shrews (Macroscelidea).

  • Xenarthra: armadillos, sloths, anteaters (South American origin).

  • Boreoeutheria: the rest, typically split into Euarchontoglires (primates, rodents, lagomorphs) and Laurasiatheria (carnivores, bats, perissodactyls or odd-toed ungulates, and cetartiodactyls, i.e. even-toed ungulates plus whales).

The root question is: which branch came off first, and what is the sister group relationship among them?

Hypothesis 1: Xenarthra-first

Some early morphological and molecular studies suggested Xenarthra as the first-diverging branch. Under this hypothesis Xenarthra split off first, and the remaining placentals then divided into Afrotheria and Boreoeutheria.

Hypothesis 2: Afrotheria-first

Other studies placed Afrotheria first, meaning the “African” mammals are basal, with xenarthrans and boreoeutherians later.

Hypothesis 3: Atlantogenata (Afrotheria + Xenarthra) sister to Boreoeutheria

This is now often considered the leading hypothesis: Afrotheria and Xenarthra form a clade (Atlantogenata), which is sister to Boreoeutheria. Some analyses support this strongly.

Divergence of Laurasiatheria and Euarchontoglires within Boreoeutheria

Even within Boreoeutheria, relationships have been contested. Open questions include the placement of bats, the relationship of perissodactyls (horses, rhinos) to cetartiodactyls (cows, pigs, whales), and the order in which the earliest splits occurred.


Landmark papers and turning-points

  • Murphy et al. (2001, Science): “Resolution of the early placental mammal radiation using Bayesian phylogenetics.” One of the first large molecular datasets to propose novel placental relationships.

  • Hallström & Janke (2008, BMC Evol Biol): used genome data to resolve many inter-ordinal relationships in placentals.

  • McCormack et al. (2012, Genome Res): used ultraconserved elements (UCEs) to provide a larger genomic marker set for mammal sequence capture.

  • Romiguier et al. (2013, Mol Biol Evol): “Less is more in mammalian phylogenomics: AT-rich genes minimise tree conflicts…” showing that gene choice matters a lot.

  • Liu et al. (2017, PNAS): “Genomic evidence reveals a radiation of placental mammals uninterrupted by the KPg boundary.” Suggests a somewhat earlier diversification than a pure post-KPg scenario.

  • Tarver et al. (2016, Genome Biol Evol): “The interrelationships of placental mammals and the limits of phylogenetic inference.” A sober survey of how hard this is.

  • More recently, reviews such as Murphy et al. 2021 – Phylogenomics and the Genetic Architecture of the Placental Mammal Radiation have summarised the state of the field.


Conceptual challenges: why the wars persist

Rapid divergences produce short internodes

When many lineages diverge in rapid succession, the period between splits is short. That means less time for unique mutations to accumulate, making the phylogenetic signal weak and gene tree discordance high. This is especially relevant for early placental divergences.

Incomplete lineage sorting (ILS)

Because ancestral populations may be large and polymorphic, different genes can trace different paths of inheritance. A gene tree may not match the species tree (the actual organismal branching). With short internodes, ILS is especially problematic. Some studies (e.g., Scornavacca & Galtier 2017 – Incomplete lineage sorting in mammalian phylogenomics) document this in mammals.
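
The link between short internodes and ILS can be made concrete with the standard three-taxon result from coalescent theory: if the internal branch of the species tree lasts T coalescent units, a gene tree disagrees with the species tree with probability (2/3) * exp(-T). The sketch below is illustrative only. The function names, the generation time, and the effective population size are assumptions chosen for the example, not values taken from any of the studies cited here.

```python
import math


def internode_in_coalescent_units(years, generation_time_years, effective_pop_size):
    """Convert an internode duration in years to coalescent units.

    For a diploid population, one coalescent unit equals 2 * Ne generations.
    """
    generations = years / generation_time_years
    return generations / (2.0 * effective_pop_size)


def discordance_probability(internode_coalescent_units):
    """Probability that a three-taxon gene tree conflicts with the species tree."""
    if internode_coalescent_units < 0:
        raise ValueError("internode length must be non-negative")
    return (2.0 / 3.0) * math.exp(-internode_coalescent_units)


if __name__ == "__main__":
    # Hypothetical parameters: 5-year generations, Ne = 100,000.
    for years in (200_000, 1_000_000, 5_000_000):
        t = internode_in_coalescent_units(years, 5.0, 100_000)
        p = discordance_probability(t)
        print(f"{years:>9} yr internode: T = {t:.2f}, P(discordant gene tree) = {p:.3f}")
```

With these made-up parameters, a 1-million-year internode corresponds to T = 1, for which roughly a quarter of gene trees are expected to conflict with the species tree from ILS alone.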

Model misspecification and compositional bias

Genes evolve under different rates, compositional biases (GC content), and substitution patterns. If models don’t handle these correctly, you can get artefactual relationships. For example, Romiguier et al. (2013) found that AT-rich genes gave fewer conflicts.
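
One simple diagnostic for compositional bias is how much GC content varies among taxa within each locus. The sketch below assumes alignments are already loaded as plain dictionaries mapping taxon names to aligned sequences. The function names and the toy sequences are hypothetical, and whole-sequence GC content is used as a simplification; this is not a reimplementation of the Romiguier et al. analysis.

```python
def gc_content(sequence):
    """Fraction of G/C among unambiguous bases; gaps and Ns are ignored."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence has no unambiguous bases")
    return gc / (gc + at)


def gc_spread(alignment):
    """Difference between the highest and lowest GC content among taxa."""
    values = [gc_content(seq) for seq in alignment.values()]
    return max(values) - min(values)


def rank_loci_by_gc_spread(loci):
    """Return locus names sorted from most to least compositionally uniform."""
    return sorted(loci, key=lambda name: gc_spread(loci[name]))


if __name__ == "__main__":
    # Toy alignments, invented for illustration.
    loci = {
        "locus_A": {"human": "ATGCATTA", "mouse": "ATGCTTTA", "cow": "ATACATTA"},
        "locus_B": {"human": "GCGCATGC", "mouse": "ATATATGC", "cow": "GCGCGCGC"},
    }
    for name in rank_loci_by_gc_spread(loci):
        print(name, round(gc_spread(loci[name]), 3))
```

Loci at the top of the ranking are the ones least likely to mislead a model that assumes a single base composition across the tree; whether to filter on such a ranking is a judgment call for each dataset.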

Gene choice, taxon sampling, missing data

More data isn’t automatically better. Poorly sampled taxa or genes with lots of missing data can mislead. The choice of orthologs (single-copy vs duplicated) matters. And as Romiguier et al. (2013) show, a smaller set of carefully chosen genes may give better results.

Reticulate evolution, introgression, gene flow

Although major mammalian divergences probably don’t involve rampant hybridisation (as in some plants), there is still the possibility of gene flow or incomplete reproductive isolation early in these radiations, which can leave confusing signals.

Fossil calibration and timing issues

Part of the conflict isn’t just who is sister to whom, but when. Dating divergences relies on fossils, molecular clocks, and substitution-rate models. Divergence estimates for placentals swing widely depending on calibration choices.
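
To see why calibration choices move the estimates so much, consider the simplest possible case: a strict molecular clock, where a pairwise distance d accumulated along two lineages since their split at time t implies a rate r = d / (2t). The sketch below uses invented distances and fossil ages. Real analyses use relaxed clocks and several calibrations with uncertainty, so this only illustrates the sensitivity.

```python
def rate_from_calibration(pairwise_distance, calibration_age_myr):
    """Substitutions per site per Myr implied by one calibrated split."""
    if calibration_age_myr <= 0:
        raise ValueError("calibration age must be positive")
    return pairwise_distance / (2.0 * calibration_age_myr)


def divergence_time_myr(pairwise_distance, rate):
    """Age of a split (Myr) under a strict clock with the given rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return pairwise_distance / (2.0 * rate)


if __name__ == "__main__":
    # Hypothetical distances (substitutions per site), not real data.
    calibration_distance = 0.20  # between the two calibrated lineages
    target_distance = 0.36       # between the lineages we want to date
    for fossil_age in (50.0, 55.0, 60.0):
        rate = rate_from_calibration(calibration_distance, fossil_age)
        age = divergence_time_myr(target_distance, rate)
        print(f"fossil at {fossil_age:.0f} Myr: inferred split at {age:.1f} Myr")
```

In this toy case, moving the assumed fossil age from 50 to 60 Myr shifts the inferred split from about 90 to about 108 Myr, because the error in the calibration is scaled up along with the distance.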

Rooting the tree and outgroup choice

Especially for ancient splits, where you root the tree and how you treat the outgroup (e.g., marsupials, monotremes) matters. Mis‐rooting can flip major relationships.


What has genome-scale data solved, and what remains unresolved

Progress made

  • Many of the “easy” nodes are now well-supported. The monophyly of Afrotheria, Xenarthra, Boreoeutheria is widely accepted.

  • Better resolution for many inter-ordinal relationships within Euarchontoglires and Laurasiatheria.

  • Ultraconserved elements (UCEs) and other large marker sets (McCormack et al. 2012) have expanded the pool of available markers.

  • Increasing use of coalescent‐aware methods (species tree estimation) to cope with gene tree heterogeneity.

Persistent issues

  • The exact root of the placental mammals still sees competing support sets (Afrotheria first vs Atlantogenata vs Xenarthra first).

  • The timing of diversification remains debated: many analyses find divergences pre-KPg; others support post-KPg rapid radiation. Liu et al. (2017) suggest the radiation was uninterrupted by the KPg boundary.

  • Some “hard” nodes within Laurasiatheria and within Euarchontoglires still lack consensus.

  • The magnitude of ILS and potential gene-flow has been underappreciated, meaning that even large datasets can produce strongly supported but conflicting trees (see Tarver et al. 2016).

  • Model adequacy: some large datasets are still being analysed with models that may not fit the data well.

  • Gene tree conflict: even in well-sampled data, concordance factors (genes or sites supporting a branch) may be low, meaning high bootstrap support is not always trustworthy.
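
A gene concordance factor can be sketched as the share of gene trees that contain a given clade. The example below represents each gene tree simply as a set of clades (each clade a frozenset of taxon names), which is an assumption made for brevity; tools such as IQ-TREE compute concordance factors on the branches of unrooted trees and account for missing taxa. The helper names and the gene-tree counts are invented for illustration.

```python
def make_gene_tree(*clades):
    """Build a toy gene tree as a set of clades (frozensets of taxon names)."""
    return {frozenset(clade) for clade in clades}


def gene_concordance_factor(clade, gene_trees):
    """Fraction of gene trees that contain `clade` as a monophyletic group."""
    if not gene_trees:
        raise ValueError("need at least one gene tree")
    target = frozenset(clade)
    supporting = sum(1 for gene_tree in gene_trees if target in gene_tree)
    return supporting / len(gene_trees)


if __name__ == "__main__":
    # Ten hypothetical gene trees split among the three rooting hypotheses.
    atlantogenata = make_gene_tree(("Afrotheria", "Xenarthra"))
    afrotheria_first = make_gene_tree(("Xenarthra", "Boreoeutheria"))
    xenarthra_first = make_gene_tree(("Afrotheria", "Boreoeutheria"))
    gene_trees = [atlantogenata] * 5 + [afrotheria_first] * 3 + [xenarthra_first] * 2

    hypotheses = {
        "Atlantogenata": ("Afrotheria", "Xenarthra"),
        "Afrotheria-first": ("Xenarthra", "Boreoeutheria"),
        "Xenarthra-first": ("Afrotheria", "Boreoeutheria"),
    }
    for name, clade in hypotheses.items():
        print(f"{name}: gCF = {gene_concordance_factor(clade, gene_trees):.2f}")
```

A branch can receive 100% bootstrap support in a concatenated analysis while only half of the individual gene trees contain it, which is why the two numbers are best reported side by side.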


The current consensus (such as it is)

Although not everything is settled, many analysts now favour the following view:

  • The first split in placentals is between Boreoeutheria and Atlantogenata (Afrotheria + Xenarthra).

  • Afrotheria and Xenarthra together form Atlantogenata.

  • Boreoeutheria then splits into Euarchontoglires and Laurasiatheria.

  • The diversification of major placental lineages is close to the KPg boundary (~66 Ma), although the exact timing is unresolved.

But even this may change with better sampling, better fossil calibration, and better models.


Why these “wars” matter

You might ask: “Does it matter if we swap the order of Afrotheria vs Xenarthra first?” Yes — it matters for many reasons:

  1. Biogeography & palaeontology: If Afrotheria is basal, then African origin for many placentals is emphasised; if Xenarthra is basal, then South-American origins become more important.

  2. Comparative genomics: Understanding which lineages diverged when helps interpret gene family expansions, genome duplications, adaptation and regulatory evolution.

  3. Molecular rate inference and divergence time estimation: Root placement and branch ordering affect rate estimates, which in turn influence how we interpret macroevolution.

  4. Trait evolution: If major clades diverged in particular sequences, trait states (e.g., sensory, metabolic, developmental) get different ancestral reconstructions.

  5. Methodological implications: These mammalian tree wars are a case‐study in how phylogenetics works (and fails) in the era of big genomic data—ILS, rapid radiations, model misspecification.


Major taxa to watch and why

  • Elephants (Proboscidea), Sirenians, Tenrecs: key members of Afrotheria, so their genome placements matter for the “Afrotheria first” hypothesis.

  • Armadillos, Sloths, Anteaters (Xenarthra): similarly critical for rooting placentals.

  • Primates + Rodents + Lagomorphs (Euarchontoglires) vs Carnivores + Bats + Perissodactyls + Cetartiodactyls (Laurasiatheria): the interplay between these two large clades within Boreoeutheria holds many secrets about mammalian adaptation and diversification.

  • Whales & Hippos (Cetartiodactyla): their placement influences how we think about rapid adaptation, aquatic transitions, and gene family evolution in mammals.

  • Marsupials + Monotremes: Though the main wars focus on placentals, how we position non-placentals as outgroups (and how we root the mammal tree) affects the entire topology.


Lessons learned from the mammalian tree wars

  1. More data is necessary but not sufficient. As Romiguier et al. (2013) found, gene selection matters: fewer, better genes might yield a more reliable tree than thousands of poorly chosen loci.

  2. Coalescent methods matter. Accounting for gene tree heterogeneity (e.g., via ASTRAL, SVDquartets) is now standard.

  3. Model fit is crucial. Even large datasets can give misleading trees if models are inadequate. Tarver et al. (2016) emphasise this.

  4. Taxon sampling still matters. Some nodes remain uncertain simply because of poor sampling of lineages (especially extinct ones).

  5. Conflict is informative. Rather than “ignoring” conflicting gene trees, current best practice is to explore what conflicts tell us about early speciation, hybridisation, or biases.

  6. Integration of fossil, genomic and morphological data remains key. Relaxed molecular clock studies, fossil calibrations, and morphological traits all play a role; divergence timing remains one of the more contentious aspects.

  7. Transparency, reproducibility, and metadata matter. Many papers now include concordance factors, gene tree summaries, and exploratory analyses of bias (e.g., site composition), which is good practice emerging from the mammalian wars.


Where next?

The mammalian tree wars are not over yet. Several frontiers remain:

  • Better genome assemblies: For many taxa (especially non-model ones) the sequences are still fragmented, misassembled, or poorly annotated. The era of high‐quality reference genomes across all mammalian orders will help.

  • Expanded taxon sampling (especially extinct): Incorporating fossil taxa (via ancient DNA or morphological placement) can help constrain divergence timing and rooting.

  • New marker types: Ultraconserved elements (UCEs), retroposons/retrotransposon insertions, rare genomic changes (RGCs) are being used to complement classical sequence alignments. For example McCormack et al. (2012) used UCEs.

  • Network approaches: As some speciation events may involve reticulation (hybridisation, introgression), purely tree-based models may not suffice. This may become more relevant in mammals than previously thought.

  • Improved models: Models that can better handle compositional bias, heterotachy (rate variation across time), partitioning, incomplete lineage sorting, and gene duplication/loss will produce more reliable results.

  • Better chronological frameworks: Precise divergence time estimates need not only molecular clocks but better fossil calibration and model integration. A recent paper by Foley et al. (2023) suggests new timescales.

  • Trait‐genome‐phylogeny integration: Linking phylogeny to genome architecture, adaptive gene families, ecology, and morphology will enrich our understanding of mammalian evolution beyond just “who is related to whom”.


Conclusion

The saga of the mammalian tree wars reminds us that even with genomes, phylogenetics remains hard. Rapid ancient radiations, conflicting gene trees, model issues, and incomplete sampling mean that resolving even well-studied groups like mammals challenges our methods and assumptions.

Yet enormous progress has been made: broad clades are stable, many interrelationships are clearer, and the debates now concern the finer branches rather than wholesale restructuring. The “wars” haven’t disappeared; they’ve matured into rigorous scientific discourse about method, data quality, and inference limits rather than an explosion of contradictory trees.

For evolutionary genomics, the mammalian tree wars provide a case study in how to do phylogenomics (and how not to). They underscore that data, method, biology, and interpretation must all align.

If you’re working on a mammalian phylogeny — or any deep phylogenetic question — remember: assembling thousands of genes is just the beginning. The real challenge lies in interrogating tree confidence, exploring conflict, and aligning genomic inference with biological reality.

Saturday, December 20, 2025

Post 5: The Road Ahead — What the Next Decade Holds for Phylogenomics

 

The genomic era gave us unprecedented data; the next decade will decide how wisely we use it.


1. A Revolution Still Unfolding

Twenty years ago, building a phylogenetic tree from a few genes was a feat. Today, terabytes of genome data flow through pipelines that infer thousands of trees overnight.
Yet even with supercomputers and sophisticated models, major questions remain unresolved:
Where exactly do sponges sit in the animal tree? When did placental mammals first diversify? Why do bacterial lineages exchange genes so promiscuously?

These open puzzles remind us that phylogenomics is not a solved problem—it’s a moving frontier.
The next decade will bring not only faster algorithms and bigger datasets but also deeper integration across disciplines: ecology, paleontology, machine learning, and even synthetic biology.


2. The Data Explosion Continues

a. The Earth BioGenome Project and Beyond

The Earth BioGenome Project (EBP) aims to sequence all eukaryotic life—some 1.8 million described species—within the next decade.
Its sub-projects (like the Vertebrate Genomes Project and Darwin Tree of Life) are already delivering reference-quality assemblies across the taxonomic spectrum.

For phylogenetics, this means:

  • Unprecedented sampling across deep and shallow branches.

  • Denser taxon coverage, reducing long-branch attraction artifacts.

  • Comparative potential for traits, ecology, and adaptation studies.

If the first genome era gave us depth (model organisms), the coming decade will give us breadth—the genomic diversity of life itself.

b. Metagenomes and Environmental DNA

We’re no longer limited to cultured or described species. Environmental DNA (eDNA) and metagenome-assembled genomes (MAGs) are revealing entire branches of the tree of life that were previously invisible, such as the Candidate Phyla Radiation (CPR) bacteria and Asgard archaea—potentially close relatives of eukaryotes.

The challenge: MAGs are often fragmented or contaminated. The opportunity: filling evolutionary gaps that no microscope ever could.


3. New Models for a New Scale

a. Moving Beyond the Concatenation–Coalescent Divide

Historically, researchers chose between concatenated supermatrices and coalescent species-tree methods.
The next wave—integrated gene/species tree models like GeneRax, SpeciesRax, and BuckyBayes—simultaneously infer both, incorporating duplication, loss, and transfer events.

These models acknowledge that evolution is not a single bifurcating tree but a tapestry of overlapping histories.

b. Modeling Reticulate Evolution

Hybridization, introgression, and horizontal transfer are pervasive.
Network-based methods (e.g., PhyloNet, SNaQ, PhyloNetworks) are emerging to infer phylogenetic networks that capture this reticulate structure.
Expect future phylogenomic pipelines to output not one tree, but a family of interlinked graphs—an explicit model of evolutionary complexity.

c. Integrative “Total-Evidence” Frameworks

We will see increasing synthesis of molecular, morphological, and fossil data within Bayesian and probabilistic frameworks.
Tools like RevBayes and BEAST2 already allow flexible modeling of fossils, traits, and genomes together, reconstructing not just relationships but evolutionary scenarios.


4. Computational Frontiers

a. Cloud-Native Phylogenetics

Just as genomics migrated to the cloud (e.g., Terra, DNAnexus), phylogenetics is following suit.
Large-scale analyses will run on distributed architectures with workflow managers (Nextflow, Snakemake) orchestrating hundreds of parallel jobs.
Instead of downloading datasets, researchers will analyze them where they live—inside cloud-hosted repositories.

b. GPU and HPC Acceleration

Programs like IQ-TREE 2, RAxML-NG, and ExaBayes are being optimized for GPU acceleration, cutting inference times from days to hours.
As matrix sizes grow into tens of thousands of taxa, algorithmic efficiency—not sequencing—becomes the limiting factor.

c. Automation and AI-Driven Optimization

Machine learning is quietly infiltrating phylogenetics.
Neural networks trained on simulated alignments can:

  • Predict best-fit substitution models.

  • Approximate likelihoods without full matrix computation (e.g., DeepTree, Phyloformer).

  • Classify alignment quality and detect rogue taxa.

We’re witnessing the birth of differentiable phylogenetics, where tree topology, model parameters, and branch lengths are optimized simultaneously using gradient-based methods.
Such frameworks blur the boundary between classical statistics and AI.


5. The Era of Real-Time Phylogenomics

a. From Deep Time to Real Time

The COVID-19 pandemic illustrated a paradigm shift: phylogenies can evolve almost as fast as the organisms they track.
Tools like Nextstrain, UShER, and Pangolin updated global viral trees daily, guiding public health decisions.

The same principles—rapid sequencing, automated tree updates, and online visualization—are spreading to other systems: influenza, antimicrobial-resistant bacteria, even invasive species.
Real-time phylogenomics transforms the tree from a static picture into a living surveillance tool.

b. Streaming Data and Incremental Updates

Traditional pipelines rebuild trees from scratch whenever new data arrive.
Next-generation algorithms perform incremental inference, updating existing trees dynamically.
This is essential for maintaining the “living Tree of Life,” where new genomes appear daily.


6. Interdisciplinary Convergence

a. Phylogenomics Meets Ecology

Phylogenetic trees are increasingly coupled with ecological and functional data to explore how diversity is structured in space and time.
Projects like PhyloMap and EcoPhylo integrate genomic trees with geospatial distributions, revealing how evolutionary lineages respond to climate change and habitat fragmentation.

b. Functional and Structural Integration

By linking phylogenies to protein structures, gene expression, and metabolic networks, researchers can map evolutionary trajectories of function.
Machine learning approaches predict ancestral protein states, enabling in silico resurrection of ancient enzymes—turning phylogenomics into an experimental science.

c. Paleogenomics and Fossil Integration

Ancient DNA is bridging deep evolutionary time with recent history.
When fossil calibrations and ancient genomes meet high-resolution phylogenies, we can reconstruct not just when lineages split, but how fast they diversified and why.


7. The Data Ethics and Sustainability Challenge

As sequencing scales exponentially, so do data storage, energy consumption, and metadata debt.
The next decade must confront the environmental and ethical costs of “big phylogenomics.”

a. Energy-Aware Computation

Cloud providers now offer carbon-accounting dashboards. Phylogenetic software may soon include “green metrics”—tracking energy usage alongside likelihood scores.

b. Indigenous Data Sovereignty

Many genomes originate from biodiversity hotspots in developing regions or from culturally significant species.
The CARE Principles (Collective Benefit, Authority to Control, Responsibility, Ethics) complement the FAIR framework, ensuring equitable data governance.

c. Long-Term Preservation

Genomic trees and metadata need archiving standards akin to those for genomic sequences.
Efforts like Dryad, Zenodo, and PhyloShare are steps toward sustainable, citable tree repositories.


8. The Future of Phylogenetic Databases

The next generation of databases will behave more like social networks than static archives.

a. Version-Controlled Trees

Just as software evolves through Git commits, future phylogenies will carry version histories—documenting every change in taxon sampling, alignment, or model.
Researchers will be able to “diff” two trees and see precisely what changed.
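
As a rough illustration of what "diffing" two trees could mean, the sketch below compares two rooted trees by the clades they contain. The nested-tuple representation and the function names are assumptions made for this example, and a production tool would also need to track branch lengths, support values, and taxon additions.

```python
def clades(tree):
    """Collect every clade of a rooted tree as a frozenset of leaf names.

    A tree is either a leaf name (str) or a tuple of subtrees.
    """
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found

def diff_trees(old, new):
    """Report clades present in only one of the two tree versions."""
    old_clades, new_clades = clades(old), clades(new)
    return {"lost": old_clades - new_clades, "gained": new_clades - old_clades}

# Two hypothetical versions of the same four-taxon tree.
version_1 = ((("human", "chimp"), "gorilla"), "orangutan")
version_2 = ((("human", "gorilla"), "chimp"), "orangutan")
changes = diff_trees(version_1, version_2)
```

The total number of lost and gained clades is the rooted Robinson-Foulds distance between the two versions, so the same comparison yields both a summary number and an itemized changelog.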

b. Interconnected Knowledge Graphs

The Open Tree of Life, PhylomeDB, and Ensembl Compara are gradually converging toward linked infrastructures where gene, species, and functional data interoperate through ontologies.
Expect the rise of phylogenetic knowledge graphs connecting genes, traits, publications, and environmental metadata.

c. Citizen Science and Interactive Trees

As visualization platforms improve, large public datasets will become educational tools.
Interactive browsers (iTOL, OneZoom, LifeMap) will let anyone explore evolution dynamically, zooming from kingdoms down to genes.
This democratization of phylogenomics could mirror what Galaxy and UCSC did for genomics.


9. Rethinking “Tree” Thinking

A profound conceptual shift is underway: from viewing evolution as a tree to viewing it as a network or process.

a. The Web of Life

Horizontal transfer, endosymbiosis, and hybridization show that evolution is more reticulate than bifurcating.
Network approaches acknowledge that genes, not species, are the true units of evolutionary change.

b. Phylogenies as Dynamic Models

Rather than static diagrams, future phylogenies will be parameterized models describing rates of duplication, transfer, and introgression.
These models will integrate population genetics directly—blending micro- and macroevolution into a unified framework.

c. Predictive Evolutionary Biology

Once evolution is modeled quantitatively, it becomes predictive.
Phylogenomic models could forecast how lineages might adapt to future environments, or how pathogens might evolve resistance—linking evolutionary theory to applied forecasting.


10. Training the Next Generation

Tomorrow’s phylogenomicists will need hybrid expertise:

Skill Domain | Why It Matters
Evolutionary theory | To interpret trees as hypotheses, not mere outputs
Statistics & modeling | To evaluate uncertainty and complexity
High-performance computing | To manage and optimize large analyses
Data curation & reproducibility | To ensure others can verify results
Communication & visualization | To convey complex histories clearly

Graduate programs are beginning to respond with computational phylogenetics tracks, bridging bioinformatics, genomics, and systematics.
Open-source communities—IQ-TREE, ASTRAL, RevBayes—serve as informal classrooms for this new generation.


11. Philosophy of the Next Decade: From Certainty to Transparency

The goal is shifting from producing one definitive tree to producing a transparent, data-driven representation of uncertainty.

Instead of asking, “Which topology is right?”, we’ll ask:

  • “Which branches are stable across data types and models?”

  • “What biological processes explain the conflicts?”

  • “How do we quantify confidence in evolutionary hypotheses?”

This cultural shift—valuing transparency over finality—marks the maturation of phylogenomics as a scientific discipline.


12. Looking Ten Years Ahead

Let’s imagine it’s 2035.

You log into a web portal—perhaps called LifeNet.
Every sequenced organism has a node on an interactive, continuously updated Tree of Life.
Each node links to its genome, transcriptome, phenotype, habitat, and literature.
Hovering over a branch reveals:

  • The genes supporting it,

  • The confidence metrics (gCF, posterior probabilities),

  • Fossil calibration points,

  • Estimated divergence time, and

  • An energy footprint of the computation.

When a new genome is uploaded, LifeNet’s pipelines automatically realign relevant orthologs, rerun analyses, and push an updated tree version—complete with citation and changelog.

This isn’t science fiction; all the pieces already exist in nascent form.
The coming decade’s challenge is to connect them coherently, ethically, and sustainably.


13. The Human Element

Behind every algorithm stands human curiosity.
Phylogenetics is ultimately a story about understanding where we come from and how life diversifies.
Even as automation expands, interpretation remains an art—balancing data, model, and biological intuition.

The future will belong to researchers who can bridge these worlds: biologists fluent in computation, and computer scientists fluent in evolution.


14. Conclusion: Toward a Living, Learning Tree

The next decade of phylogenomics will not just refine our view of life—it will redefine how we do evolutionary biology.

  • Scale will expand from thousands to millions of genomes.

  • Models will evolve from static trees to dynamic networks.

  • Infrastructure will shift from local computation to interconnected, living databases.

  • Philosophy will move from chasing certainty to embracing transparency.

Darwin sketched his first tree with the words “I think.”
The next generation of scientists will annotate theirs with “We know—and we’re still learning.”


Series Epilogue

With this, we complete the five-part series:

  1. From Morphology to Molecules: A Brief History of Phylogenetics

  2. The Power and Pitfalls of Genome-Wide Phylogenies

  3. Tools of the Trade: How Scientists Build Genomic Trees

  4. Databases of Life: Where Genome-Wide Phylogenies Live

  5. The Road Ahead: What the Next Decade Holds for Phylogenomics

Together, they trace a journey from the first comparative observations to the coming era of real-time, AI-assisted, globally integrated phylogenetics—a story still being written, one genome at a time.

Friday, December 19, 2025

Post 4: Databases of Life — Where Genome-Wide Phylogenies Live

 

From individual analyses to global repositories, the challenge is no longer building trees, but connecting them.


1. From Individual Trees to Shared Knowledge

In the early days of molecular phylogenetics, every research group built their own tree, published it as a figure, and moved on. The underlying alignments and models were rarely shared.
Reproducibility was limited to retyping data from printed tables.

Fast forward two decades: genome-scale datasets, massive alignments, and cloud storage have transformed this landscape. Today, we can store thousands of phylogenies, gene trees, and species trees in public databases accessible to anyone.
In principle, this means that every published tree can become part of a global, continually updated Tree of Life.

But as with everything in phylogenomics, the reality is more complex. Different databases serve different goals — from storing individual alignments to assembling comprehensive synthetic trees. Some emphasize reproducibility and metadata; others focus on integration across studies.

In this post, we’ll explore the main databases that host genome-wide phylogenies, what makes them powerful, and what still holds them back.


2. The Ecosystem of Phylogenetic Databases

Phylogenetic data don’t live in a single place. Instead, they exist across a distributed ecosystem of databases, each with its own focus:

  1. TreeBASE – The oldest and most general repository of phylogenetic trees and alignments.

  2. Open Tree of Life (OTL) – A massive synthetic tree integrating published phylogenies.

  3. PhylomeDB – A database of complete genome-wide phylogenies (“phylomes”) for selected species.

  4. Ensembl Compara – A curated collection of gene and species trees across model organisms.

  5. NCBI Taxonomy & Genome Tree – A standardized, genome-based reference phylogeny for all sequenced species.

  6. TimeTree – A resource for divergence times across the tree of life.

  7. Other specialized resources – e.g., OrthoDB, EggNOG, and Open Tree’s TNRS.

Let’s look at each in turn — what they store, how they’re built, and what they’re best suited for.


3. TreeBASE: The Foundational Repository

Overview

TreeBASE (https://treebase.org) is the oldest dedicated phylogenetic database, established in the 1990s with the goal of making phylogenetic data reproducible and reusable.
Researchers can deposit alignments, tree files (Newick, Nexus), and metadata linked to publications.

What It Offers

  • Stores both gene-level and species-level trees.

  • Provides searchable metadata (taxa, loci, authors, DOI).

  • Supports downloading of alignments and trees in standardized formats.

  • Integrates with the Open Tree of Life project.
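
Because repositories like TreeBASE hand back trees in text formats such as Newick, it is worth seeing how little code a basic reader needs. The parser below is a minimal sketch: it assumes plain labels and optional branch lengths, and does not handle quoted labels or bracketed comments. The function names are illustrative.

```python
def parse_newick(text):
    """Parse a simple Newick string into (name, length, children) tuples."""
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1                                   # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos].strip()
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (name, length, children)

    root = parse_node()
    if pos != len(text):
        raise ValueError("trailing characters after the tree")
    return root

def leaf_names(node):
    """List leaf labels in left-to-right order."""
    name, _, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaf_names(child)]

tree = parse_newick("((human:0.1,chimp:0.1):0.05,gorilla:0.2);")
```

For real datasets, an established library is a safer choice than a hand-rolled parser, since deposited files often use the parts of the format this sketch skips.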

Strengths

  • Longest history and broadest scope — thousands of published studies.

  • Format standardization using NEXUS and NeXML.

  • Encourages transparency and data sharing.

Limitations

  • Inconsistent metadata: many old entries lack details on models, partitions, or software versions.

  • Not all datasets are genome-wide — many are from pre-genomic era single-locus studies.

  • Submission process can be cumbersome, discouraging full participation.

Best For

Archiving and retrieving published phylogenies with underlying alignments — ideal for meta-analyses, benchmarking, or teaching.


4. Open Tree of Life (OTL): The Global Synthesis

Overview

The Open Tree of Life (https://tree.opentreeoflife.org) is an ambitious project to synthesize all published phylogenetic information into a single, comprehensive tree of life.

It combines data from TreeBASE, Dryad, and published trees, reconciled with a unified taxonomy (the Open Tree Taxonomy, OTT).

How It Works

  • Integrates thousands of source trees using a graph-based approach.

  • Resolves conflicts among trees using a consensus algorithm and manual curation.

  • Provides APIs for programmatic access and data downloads.

  • Offers tools like the Taxonomic Name Resolution Service (TNRS) for consistent naming.
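
As a sketch of what programmatic access can look like, the snippet below prepares a name-resolution request for the Open Tree TNRS and pulls OTT identifiers out of the reply. The endpoint URL and the response layout (results, then matches, then taxon, then ott_id) reflect my understanding of the v3 API and should be checked against the current API documentation. The helper names are my own.

```python
import json
import urllib.request

# Assumed v3 endpoint; verify against the Open Tree API documentation.
TNRS_URL = "https://api.opentreeoflife.org/v3/tnrs/match_names"

def build_tnrs_request(names):
    """Prepare (but do not send) a POST request asking TNRS to match names."""
    payload = json.dumps({"names": list(names)}).encode("utf-8")
    return urllib.request.Request(
        TNRS_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_ott_ids(response):
    """Map each queried name to the OTT id of its first match, or None."""
    ids = {}
    for result in response.get("results", []):
        matches = result.get("matches", [])
        ids[result.get("name")] = matches[0]["taxon"]["ott_id"] if matches else None
    return ids

def resolve_names(names):
    """Send the request over the network and return name-to-OTT-id pairs."""
    with urllib.request.urlopen(build_tnrs_request(names), timeout=30) as reply:
        return extract_ott_ids(json.load(reply))
```

Calling resolve_names(["Homo sapiens", "Pan troglodytes"]) would contact the live service. Keeping request building and response parsing in separate functions means both can be tested offline.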

Strengths

  • Scale: includes over 2.4 million taxa.

  • Openness: fully open-source, API-accessible, and community-driven.

  • Reproducibility: each node can be traced back to its supporting studies.

Limitations

  • The synthetic tree is not built directly from genomic data — it integrates published topologies, which may differ in quality.

  • Gene-level detail is lost in the synthesis process.

  • Rapidly changing taxonomies can cause name inconsistencies.

Best For

Macro-level visualization and integration — a bird’s-eye view of evolutionary relationships across all life, with links back to supporting literature.


5. PhylomeDB: Phylogenomics Done Right

Overview

PhylomeDB (http://phylomedb.org) takes a genome-centered approach. For each species (“seed genome”), it builds a complete collection of gene phylogenies, covering every protein-coding gene and its homologs across other species. Each collection is called a phylome.

Pipeline

  1. Homology search (BLAST/Smith–Waterman).

  2. Multiple sequence alignment (MUSCLE, MAFFT).

  3. Alignment trimming.

  4. Phylogenetic inference (PhyML or IQ-TREE).

  5. Orthology and paralogy assignment using tree reconciliation.

  6. Storage of gene trees, alignments, and orthology predictions.
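
The middle of this pipeline (steps 2 to 4) can be sketched as a function that assembles the command lines for one gene family. This is not PhylomeDB's actual code: the choice of trimAl for trimming, the specific options, and the file-naming scheme are all assumptions for illustration. Homology search and reconciliation are left out.

```python
from pathlib import Path

def gene_tree_commands(fasta, workdir="work"):
    """Build align, trim, and infer commands for one gene family.

    Only constructs the command lines; nothing is executed here.
    Tool choices and options are illustrative assumptions.
    """
    stem = Path(fasta).stem
    out = Path(workdir)
    aligned = out / f"{stem}.aln.fasta"
    trimmed = out / f"{stem}.trimmed.fasta"
    return [
        # MAFFT writes the alignment to standard output.
        {"step": "align",
         "argv": ["mafft", "--auto", str(fasta)],
         "stdout": str(aligned)},
        # trimAl removes poorly aligned, gap-rich columns.
        {"step": "trim",
         "argv": ["trimal", "-in", str(aligned), "-out", str(trimmed), "-gappyout"]},
        # IQ-TREE 2 with model selection and 1000 ultrafast bootstrap replicates.
        {"step": "infer",
         "argv": ["iqtree2", "-s", str(trimmed), "-m", "MFP", "-B", "1000",
                  "--prefix", str(out / stem)]},
    ]

commands = gene_tree_commands("families/fam001.fasta")
```

Each entry could then be passed to subprocess.run, or translated into a rule for a workflow manager, once the tools are installed.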

Strengths

  • Fully automated and standardized.

  • Provides gene-level and species-level trees.

  • Includes duplication, loss, and orthology information.

  • Cross-linked to Ensembl and UniProt.

Limitations

  • Limited to selected species groups (e.g., vertebrates, fungi, plants).

  • High computational demands mean updates lag behind the latest genomes.

  • Some gene families remain unresolved due to complex duplication histories.

Best For

Gene family evolution, orthology detection, and comparative genomics using pre-computed, quality-controlled trees.


6. Ensembl Compara: Orthology Meets Reference Genomics

Overview

Ensembl Compara (https://ensembl.org/info/genome/compara) is part of the Ensembl genome browser ecosystem, maintained by EMBL-EBI. It provides phylogenetic trees and orthology predictions across model organisms and high-quality reference genomes.

Pipeline

  • Orthology and paralogy assignments via GeneTrees pipeline (Ensembl’s internal workflow).

  • Phylogenetic inference using TreeBeST and later IQ-TREE/RAxML for accuracy.

  • Integration with genome annotations and synteny information.

Strengths

  • Seamlessly linked with genome browsers — you can visualize gene trees directly within Ensembl.

  • Updated with each Ensembl release.

  • Provides confidence scores for orthology/paralogy calls.

Limitations

  • Focused on well-annotated genomes (mostly vertebrates, some plants and fungi).

  • Does not include non-model or draft assemblies.

  • Less emphasis on conflict visualization or tree uncertainty.

Best For

Model organism comparative genomics, gene evolution analysis, and functional annotation studies.


7. NCBI Taxonomy and Genome Tree: The Official Reference

Overview

The NCBI Taxonomy is not merely a naming system — it’s the backbone of nearly all biological databases.
Recently, NCBI has expanded it with a Genome Tree (https://www.ncbi.nlm.nih.gov/genome/tree), built from universal single-copy marker genes across thousands of genomes.

How It Works

  • Marker selection (usually ~120 bacterial or ~50 archaeal proteins).

  • Multiple sequence alignment and concatenation.

  • Tree inference using maximum likelihood.

  • Regular updates as new genomes are added.
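
The concatenation step is simple enough to sketch directly. The function below joins per-marker alignments into one supermatrix, padding taxa that lack a marker with gaps, and records where each marker sits so that partitioned models can be applied later. The data layout, taxon labels, and sequences are illustrative assumptions.

```python
def concatenate(alignments):
    """Join per-marker alignments into a supermatrix.

    alignments: list of dicts mapping taxon name to aligned sequence.
    Returns (supermatrix, partitions), with 1-based inclusive coordinates.
    """
    taxa = sorted(set().union(*alignments))
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for index, marker in enumerate(alignments):
        lengths = {len(seq) for seq in marker.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {index} is empty or has unequal row lengths")
        width = lengths.pop()
        for taxon in taxa:
            # Taxa missing this marker are filled with gap characters.
            pieces[taxon].append(marker.get(taxon, "-" * width))
        partitions.append((start, start + width - 1))
        start += width
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions

# Toy protein alignments with made-up sequences.
marker_a = {"E_coli": "MKV-L", "B_subtilis": "MKIAL"}
marker_b = {"E_coli": "GTW", "M_jannaschii": "GSW"}
matrix, parts = concatenate([marker_a, marker_b])
```

In this toy input, B_subtilis lacks the second marker and M_jannaschii lacks the first, so both rows are padded to the full width of eight columns.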

Strengths

  • Standardized and curated, serving as the reference classification across GenBank and RefSeq (GTDB maintains its own separate genome-based taxonomy).

  • Scalable and automated.

  • Provides backbone for taxonomic classification.

Limitations

  • Limited marker set — not a full phylogenomic reconstruction.

  • Conservative in scope — includes only reference-quality assemblies.

  • Tree topology sometimes constrained by existing taxonomy.

Best For

Reference-level taxonomic placement and integration with other NCBI resources (BLAST, GenBank, RefSeq).


8. TimeTree: The Chronology of Life

Overview

TimeTree (http://www.timetree.org) focuses on one key dimension of evolution — time.
It compiles published divergence time estimates across taxa and provides consensus estimates and confidence intervals.

Features

  • Query any pair of species to get estimated divergence times and references.

  • Visualize timetrees with geological context.

  • Download time-calibrated phylogenies for use in comparative analyses.

Strengths

  • User-friendly, web-based, and well-cited.

  • Aggregates published molecular clock studies.

  • Emphasizes temporal calibration and uncertainty.

Limitations

  • Based on heterogeneous sources (different methods, models, fossil calibrations).

  • Not a primary phylogenetic inference database.

  • May not reflect the most recent genomic data.

Best For

Evolutionary timing, macroevolutionary rate studies, and educational visualization.


9. Other Notable Databases

Database | Focus | Highlights
OrthoDB | Hierarchical catalog of orthologs | Integrates genome data across taxa, useful for comparative genomics
EggNOG | Orthology and functional annotation | Combines phylogenetic orthology with GO and KEGG annotation
PANTHER | Gene families and pathways | Links evolutionary relationships with biological function
GTDB (Genome Taxonomy Database) | Genome-based bacterial and archaeal taxonomy | Replaces traditional taxonomy with standardized, tree-based classification

Each of these fills a niche — from orthology to taxonomy — and together they illustrate how phylogenetic data now permeate every level of genomics infrastructure.


10. The Challenges of Phylogenetic Data Integration

Despite this rich ecosystem, integrating genome-wide phylogenies remains difficult. The main barriers are less about raw computing power than about inconsistent data, standards, and naming.

a. Heterogeneity of Input Data

Different studies use different:

  • Gene sets (mitochondrial vs. nuclear vs. transcriptomic)

  • Taxon sampling

  • Models and partitioning schemes

  • Tree-building algorithms

This makes direct comparison or synthesis non-trivial. Two trees may share taxa but differ in structure simply because they use different data sources or model assumptions.

b. Inconsistent Metadata and Standards

Metadata about alignments, models, software versions, and parameters are often incomplete or inconsistent.
Efforts like MIAPA (Minimum Information About a Phylogenetic Analysis) aim to standardize this, but adoption remains limited.

c. Taxonomic Instability

Species names and identifiers change frequently.
Without consistent identifiers (like NCBI TaxIDs or OTT IDs), cross-database integration can fail silently.

d. Provenance and Reproducibility

Even when trees are available, reconstructing exactly how they were generated can be impossible without the full pipeline.
Linking phylogenies to their computational workflows (e.g., via workflow managers or containers) is an emerging best practice.
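
One lightweight way to approach this is to store a small provenance record next to every tree. The sketch below hashes the input alignment and the resulting tree and records the software, model, and parameters used. The field names and example values are hypothetical and do not follow any formal standard such as MIAPA.

```python
import hashlib
import json

def provenance_record(alignment_text, tree_newick, software, model, parameters):
    """Bundle checksums and analysis settings into one serializable record."""
    def digest(text):
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    return {
        "alignment_sha256": digest(alignment_text),
        "tree_sha256": digest(tree_newick),
        "software": software,        # tool name mapped to version string
        "model": model,
        "parameters": parameters,
    }

# Hypothetical example values.
record = provenance_record(
    alignment_text=">human\nACGT\n>chimp\nACGA\n",
    tree_newick="(human:0.1,chimp:0.1);",
    software={"iqtree2": "2.2.0"},
    model="GTR+G",
    parameters={"bootstrap_replicates": 1000, "seed": 42},
)
serialized = json.dumps(record, indent=2, sort_keys=True)
```

If the alignment changes by even one character, its checksum changes too, so a mismatch between record and data is immediately visible.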


11. The Advantages of Centralized Phylogenomic Resources

Despite these challenges, the benefits of shared phylogenetic databases are profound:

  • Reproducibility: Others can rerun analyses or test alternative models.

  • Data reuse: Large-scale meta-analyses become feasible — e.g., studying gene family evolution across hundreds of studies.

  • Interoperability: Databases like OTL and Ensembl link phylogenies directly to functional and ecological data.

  • Education and outreach: Publicly accessible trees foster understanding of biodiversity and evolution.

  • Cross-validation: Independent datasets can be compared, revealing robust patterns or inconsistencies.

The future of phylogenetics depends on making trees as discoverable and reusable as genome sequences themselves.


12. Emerging Directions: Linked, FAIR, and Living Trees

a. FAIR Principles

Modern phylogenetic databases increasingly adhere to FAIR principles — data should be Findable, Accessible, Interoperable, and Reusable.
This means persistent identifiers (DOIs, ORCIDs), open APIs, machine-readable formats, and linked metadata.

b. Linked Data and Knowledge Graphs

Efforts are underway to represent phylogenetic information as linked data, connecting taxa, genes, publications, and traits through semantic web technologies.
Projects like the OpenBioLink initiative and OToL Graph Store are early examples.

c. Continuous Integration and Versioning

Future databases may adopt version-controlled phylogenies, where trees are updated automatically as new genomes are added — similar to how software evolves.
Imagine a “GitHub for Trees” where each update documents what changed and why.

d. Cloud-Connected Analysis

Instead of downloading data, researchers will increasingly run analyses directly within cloud databases, using standardized APIs and scalable compute resources.


13. The Big Picture: Toward a Living Tree of Life

The ultimate goal of these efforts is not just to store phylogenies, but to create a dynamic, living Tree of Life — one that updates continuously as new genomes are sequenced and analyses are improved.

Imagine:

  • Clicking any node to see which genes support it, what models were used, and how confident we are.

  • Tracing a gene’s evolutionary path across thousands of species, annotated with function, structure, and expression.

  • Watching the tree evolve as new species are added and outdated analyses are replaced.

Such a system would transform evolutionary biology into a real-time, data-integrated science, bridging genomics, ecology, and paleontology.


14. Conclusion: Sharing Trees, Sharing Understanding

Genome-wide phylogenies are not just outputs — they’re data assets in their own right.
Storing them properly, annotating them thoroughly, and linking them transparently are now essential for the progress of the field.

Databases like TreeBASE, Open Tree of Life, PhylomeDB, Ensembl Compara, and others each contribute to this vision from different angles.
Together, they move us toward a more unified, open, and interconnected view of life’s history.

But this vision depends on us — the researchers — to share, annotate, and curate our trees responsibly.
After all, every tree we publish today is a branch on a much larger structure — the ever-growing, ever-updating Tree of Life.


Next Up:

In the final post — “The Road Ahead: What the Next Decade Holds for Phylogenomics” — we’ll look at the frontiers shaping the future: the Earth BioGenome Project, AI-assisted phylogenetics, reticulate evolution, and what it means to build a living, evolving understanding of life itself.