Nagarjun's blog: Recent Expansion: The Bias That Both Reveals and Distorts Editing

Thursday, July 2, 2026

Recent Expansion: The Bias That Both Reveals and Distorts Editing

“recently hyperedited elements”

Source: Knisbacher and Levanon

Recent repeat expansion is the central gremlin in APOBEC dating. It helps detection because young copies preserve sharp editing signals. It hurts interpretation because many young copies can inflate counts, blur source relationships, and make a single ancestral editing event look like a crowd.

The detection advantage is straightforward. Suppose an APOBEC-edited LTR element inserts into a genome. At that moment, it is nearly identical to the source element except for the G-to-A edits. A pairwise detector can align the two and see the burst clearly. Ten million years later, both copies have accumulated unrelated substitutions. Some edited sites may be overwritten by additional mutations. Other mismatch classes accumulate. The alignment still contains the ancient edits, but the burst is harder to distinguish from background divergence.
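To make the detection argument concrete, here is a minimal sketch of what a pairwise comparison sees. It is not the detector Knisbacher and Levanon used. The function names and the toy sequences are invented for illustration, and the "old" copy is compared against an unchanged source for simplicity, whereas a real source element drifts too.

```python
from collections import Counter

def mismatch_classes(source: str, copy: str) -> Counter:
    """Count substitution classes (e.g. 'G>A') between two pre-aligned sequences."""
    if len(source) != len(copy):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for s, c in zip(source.upper(), copy.upper()):
        # Skip matches, gaps, and ambiguous bases.
        if s == c or "-" in (s, c) or "N" in (s, c):
            continue
        counts[f"{s}>{c}"] += 1
    return counts

def g_to_a_fraction(source: str, copy: str) -> float:
    """Fraction of all mismatches that are G>A (0.0 if there are no mismatches)."""
    counts = mismatch_classes(source, copy)
    total = sum(counts.values())
    return counts["G>A"] / total if total else 0.0

# Hypothetical toy sequences, not real data.
source = "GGATGGCTGGAAGGTCGGAT"
young  = "AAATGACTGAAAGATCGAAT"  # six G>A edits, nothing else
old    = "ATACGATTGAAAGAGCGAAT"  # same burst plus unrelated changes, one edit overwritten

young_fraction = g_to_a_fraction(source, young)
old_fraction = g_to_a_fraction(source, old)
print(young_fraction, old_fraction)
```

In the young copy every mismatch is G>A. In the old copy the burst is still there, but it is diluted by other mismatch classes and one edited site has been overwritten, which is the masking effect described above.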

Knisbacher and Levanon explicitly address this by noting that random mutations eventually mask the editing signal. They also show that edited elements are enriched among species-specific copies and that editing abundance correlates with young, intact retroelement content. In other words, the detection method sees best where the fossil dust is thinnest.

But recent expansion has a darker side. Imagine a repeat family undergoes a rapid burst in one lineage. The genome now contains thousands of very similar copies. Because the copies are young, even modest APOBEC bursts are detectable. A comparison across species may conclude that this lineage has unusually high APOBEC editing. That may be true, but it may also reflect more young substrate, better detectability, or both.

Now imagine one edited copy gives rise to descendants. Every descendant inherits the same edited positions. If the detector reports edited copies, the count rises. But if the biological question is “how often did APOBEC edit retroelement cDNA?”, those descendants may represent one original editing episode plus subsequent copying. This is the difference between edited-copy abundance and independent editing-event abundance.

How should a pipeline handle this?

First, report denominators. Counts of edited elements are nearly meaningless without counts of available elements, base pairs, subfamilies, and age classes. A species with more young LTR sequence is expected to yield more detected editing even if its per-element editing rate is no higher.

Second, stratify by repeat age. Use species specificity, divergence from consensus, LTR-LTR divergence, subfamily age, and polymorphism where available. Compare edited and unedited copies within the same age bins.
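The first two steps can be sketched together. The example below is a rough illustration, assuming each copy has already been annotated with a species, a divergence from consensus (used here as the only age proxy), a length, and an edited/unedited call. The `RepeatCopy` class, the bin edges, and the toy records are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class RepeatCopy:
    species: str
    divergence: float  # fractional divergence from subfamily consensus (age proxy)
    length_bp: int
    edited: bool

def age_bin(divergence: float, edges=(0.02, 0.05, 0.10)) -> int:
    """Return the index of the first bin edge the divergence falls below.
    The edges are arbitrary placeholders, not recommended cutoffs."""
    for i, edge in enumerate(edges):
        if divergence < edge:
            return i
    return len(edges)

def stratified_rates(copies):
    """Tally edited copies against denominators per (species, age bin)."""
    totals = defaultdict(lambda: {"copies": 0, "edited": 0, "bp": 0})
    for c in copies:
        t = totals[(c.species, age_bin(c.divergence))]
        t["copies"] += 1
        t["edited"] += int(c.edited)
        t["bp"] += c.length_bp
    rates = {}
    for key, t in totals.items():
        rates[key] = {
            **t,
            "edited_fraction": t["edited"] / t["copies"],
            "edited_per_kb": 1000 * t["edited"] / t["bp"],
        }
    return rates

# Made-up records: species A simply has more young copies than species B.
copies = [
    RepeatCopy("A", 0.010, 500, True),
    RepeatCopy("A", 0.010, 500, True),
    RepeatCopy("A", 0.015, 500, False),
    RepeatCopy("A", 0.012, 500, False),
    RepeatCopy("A", 0.080, 500, False),
    RepeatCopy("B", 0.010, 500, True),
    RepeatCopy("B", 0.018, 500, False),
    RepeatCopy("B", 0.070, 500, False),
]
rates = stratified_rates(copies)
for key in sorted(rates):
    print(key, rates[key])
```

In this toy input, species A has twice as many edited copies as species B, yet within the youngest bin both have the same edited fraction and the same edited count per kilobase. The raw difference is entirely a difference in young substrate.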

Third, collapse likely descendants. Cluster edited copies by shared derived G-to-A sites and by flanking sequence context. If multiple copies share an improbable block of identical edited sites, they may descend from a common edited ancestor.
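One simple way to approximate this collapsing step is single-linkage clustering on shared edited positions. The sketch below assumes edited sites are already expressed in consensus coordinates. It ignores flanking sequence context entirely, and the `min_shared` and `min_jaccard` thresholds are illustrative guesses rather than calibrated values. A real pipeline would want a model of how improbable a shared block is.

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two site sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def collapse_edited_copies(edited_sites, min_shared=3, min_jaccard=0.8):
    """Group copies whose edited-site sets overlap strongly (single linkage).

    edited_sites maps a copy id to the set of its G-to-A positions.
    Returns a sorted list of clusters, each a sorted list of copy ids.
    """
    ids = list(edited_sites)
    parent = {i: i for i in ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ids, 2):
        sa, sb = edited_sites[a], edited_sites[b]
        if len(sa & sb) >= min_shared and jaccard(sa, sb) >= min_jaccard:
            parent[find(a)] = find(b)

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())

# Hypothetical copies: c1-c3 share one block of edits, c4 is unrelated.
edited_sites = {
    "c1": {10, 42, 57, 88, 130},
    "c2": {10, 42, 57, 88, 130},
    "c3": {10, 42, 57, 88, 130, 201},
    "c4": {15, 60, 77, 140},
}
clusters = collapse_edited_copies(edited_sites)
print(clusters)  # [['c1', 'c2', 'c3'], ['c4']]
```

Here four edited copies collapse to two candidate editing events. The extra site in c3 could be a later independent mutation or a second edit, which clustering alone cannot resolve.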

Fourth, use local phylogenies. Build a tree of repeat copies within a subfamily. Map edited sites onto the tree. Independent editing events should appear on terminal branches or on distinct internal branches. Inherited edits should instead map to a single internal branch ancestral to every copy that carries them.
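As a rough illustration of mapping edits onto a tree, the sketch below applies Fitch parsimony to each edited site as a presence/absence character. This is a technique swapped in for illustration, not a method taken from the source. It assumes a rooted binary tree written as nested tuples of copy names, and it counts state changes without distinguishing gains from losses, so a count above one is only a hint of independent origin.

```python
def fitch_changes(tree, has_edit):
    """Fitch pass for one binary character on a nested-tuple binary tree.

    tree is either a leaf name (str) or a pair (left, right).
    has_edit maps each leaf name to True/False.
    Returns (possible ancestral states, minimum number of state changes).
    """
    if isinstance(tree, str):
        return {has_edit[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch_changes(left, has_edit)
    right_states, right_changes = fitch_changes(right, has_edit)
    common = left_states & right_states
    if common:
        return common, left_changes + right_changes
    # Children disagree: one change is required on a branch below this node.
    return left_states | right_states, left_changes + right_changes + 1

def changes_per_site(tree, edited_sites):
    """Minimum number of state changes for every edited site seen in any copy."""
    all_sites = set().union(*edited_sites.values())
    result = {}
    for site in sorted(all_sites):
        has_edit = {leaf: site in sites for leaf, sites in edited_sites.items()}
        _, n_changes = fitch_changes(tree, has_edit)
        result[site] = n_changes
    return result

# Hypothetical subfamily tree and edit calls.
tree = ((("c1", "c2"), "c3"), ("c4", "c5"))
edited_sites = {
    "c1": {42},
    "c2": {42, 77},
    "c3": {42},
    "c4": set(),
    "c5": {77},
}
changes = changes_per_site(tree, edited_sites)
print(changes)  # {42: 1, 77: 2}
```

Site 42 needs one change, consistent with a single edit inherited by the c1/c2/c3 clade. Site 77 needs two, which fits independent editing in c2 and c5, although recurrent mutation at a hotspot or an error in the tree would produce the same pattern.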

Fifth, separate metrics. Publish at least four columns: edited sites, edited copies, edited subfamilies, and inferred independent editing events. These answer different questions.

Sixth, include sensitivity analysis. Ask how conclusions change if duplicates are aggressively collapsed, moderately collapsed, or not collapsed. If species rankings change dramatically, the result is copy-number-sensitive.
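A sensitivity check of this kind can be as small as re-ranking species under each collapsing scheme. The sketch below assumes event counts have already been computed at three collapsing levels and that a young-repeat denominator is available per species. All numbers are made up for illustration.

```python
def rank_species(counts, young_kb):
    """Rank species by events per kb of young repeat, highest first.
    Ties are broken alphabetically so the ranking is deterministic."""
    rates = {sp: counts[sp] / young_kb[sp] for sp in counts}
    return sorted(rates, key=lambda sp: (-rates[sp], sp))

def ranking_sensitivity(counts_by_level, young_kb):
    """Return the ranking at each collapsing level and whether all rankings agree."""
    rankings = {
        level: rank_species(counts, young_kb)
        for level, counts in counts_by_level.items()
    }
    stable = len({tuple(r) for r in rankings.values()}) == 1
    return rankings, stable

# Toy inputs: species A has many near-identical edited copies.
young_kb = {"A": 1000, "B": 500, "C": 500}
counts_by_level = {
    "none":       {"A": 120, "B": 40, "C": 30},
    "moderate":   {"A": 35,  "B": 32, "C": 25},
    "aggressive": {"A": 12,  "B": 25, "C": 20},
}
rankings, stable = ranking_sensitivity(counts_by_level, young_kb)
for level, order in rankings.items():
    print(level, order)
print("stable:", stable)
```

In this toy case species A leads only when nothing is collapsed and drops to last once descendants are merged, so its apparent lead is copy-number-sensitive.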

Seventh, avoid false precision in dating. Recent expansion can make insertion windows look narrow, but the true editing event may belong to a source lineage that predates observed copies. Conversely, multiple independent editing events may occur during a burst, making a narrow window biologically real.

This is why recent expansion is not merely a confounder. It is part of the biology. APOBEC activity matters most when mobile elements are active. A recent repeat burst provides both substrate and evolutionary pressure. The technical challenge is to decide whether the observed signal reflects more target material, stronger editing, better preservation, or repeated descent from a few edited ancestors.

Key technical takeaway: Recent expansion increases APOBEC detectability but can inflate apparent event counts. Analyses should normalize by young-repeat content and distinguish edited copies from independent editing episodes.
