Tuesday, January 14, 2014

Telo-seq: Estimating telomere lengths with Next Generation Sequencing

Telomere length has been associated with age and disease. The need for telomere length measurements has led to the development of many methods, each with its own advantages and drawbacks. Using the latest advances in sequencing technology to measure telomere length is therefore an appealing idea. The Perl script Telo-seq is designed to demonstrate that whole genome sequencing data can be used to estimate relative telomere lengths. Applied to human samples from different age groups, Telo-seq shows the age-related reduction in telomere length that has been demonstrated in humans with other methods: the relative telomere length falls from about 0.0003 in newborns to about 0.0001 in a centenarian.

Availability and implementation: The Perl script Telo-seq is freely available at http://code.google.com/p/telo-seq/


The ends of chromosomes contain repetitive regions known as telomeres, which protect chromosomes from degradation and from fusion with other chromosomes. During the normal life of a cell, the telomeres shorten at each cell division. Hence, over time, cells lose their telomeres, and this is followed by loss of the DNA located at the ends of the chromosomes. Eventually cells die due to the loss of their telomeres. This phenomenon is characterized by the Hayflick limit [1], which sets a limit on the number of times a cell with linear chromosomes can divide. Many unicellular organisms, such as most bacteria, have circular chromosomes, making it possible for them to replicate indefinitely. Telomere length has been widely studied and is known to change with factors such as age, disease and stress.

Telomere lengths have been measured using various methods such as Terminal Restriction Fragment (TRF) Southern blot, FISH, Q-FISH and real-time PCR. New methods of measurement that are robust, easy to use and sensitive are always being developed. However, methods for telomere length measurement can have specific requirements such as cell type, amount of DNA and cell growth conditions. Each method has its own set of drawbacks and advantages, which have been reviewed earlier [2].

Next-generation sequencing data have been used to estimate relative telomere length in male versus female pools by counting reads [3]. However, the effect of PCR duplicates, the dynamic range of the estimates and the variance between sequencing runs have not been evaluated. Here we present an easy-to-use stand-alone tool, Telo-seq, that can estimate relative telomere length, and demonstrate that next-generation sequencing data can be used to measure age-related changes in relative telomere length.

Whole genome sequencing data from the cord blood of a newborn, a 26-year-old female and a 103-year-old male [4] are available in the Sequence Read Archive (SRA). Raw FASTQ files for each of the datasets were downloaded and analyzed. Reads were classified as telomeric (Table 1) if the 6-base vertebrate-specific telomeric repeat "TTAGGG" occurred at least 5 times sequentially in the 90 bp read. Varying the required number of occurrences of the repeat does not affect the comparison, as all reads analyzed are 90 bp long. When comparing samples with different read lengths, this parameter has to be tuned based on the frequency of the telomeric repeat k-mer.
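The classification rule described above can be sketched in a few lines. This is a minimal illustration in Python, not the Telo-seq Perl script itself: the function names `is_telomeric` and `relative_telomere_length` are hypothetical, "occurred sequentially" is read here as back-to-back copies of the repeat, and only the forward-strand repeat named in the text is searched.

```python
import re

TELOMERE_REPEAT = "TTAGGG"  # vertebrate telomeric repeat named in the text


def is_telomeric(read, repeat=TELOMERE_REPEAT, min_repeats=5):
    """Return True if `read` holds at least `min_repeats` back-to-back copies of `repeat`.

    Assumption: "sequential" means uninterrupted tandem copies.
    """
    pattern = "(?:%s){%d,}" % (re.escape(repeat), min_repeats)
    return re.search(pattern, read.upper()) is not None


def relative_telomere_length(fastq_lines, min_repeats=5):
    """Count telomeric and total reads in an iterable of FASTQ lines.

    Assumes the plain 4-line FASTQ layout, where the sequence is the
    second line of every record. Returns (telomeric, total, ratio).
    """
    telomeric = 0
    total = 0
    for index, line in enumerate(fastq_lines):
        if index % 4 != 1:  # skip header, '+' separator and quality lines
            continue
        total += 1
        if is_telomeric(line.strip(), min_repeats=min_repeats):
            telomeric += 1
    ratio = telomeric / total if total else 0.0
    return telomeric, total, ratio
```

A file can be passed directly, for example `relative_telomere_length(open("sample.fastq"))`, where `sample.fastq` is a placeholder name. For reads of a different length, `min_repeats` is the parameter that would need tuning.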

The estimation of relative telomere lengths using next generation sequencing has several advantages over the methods that are currently used. However, this method is not without limitations.


Advantages:

  1. Samples from most tissues or cell types can be used to obtain DNA for sequencing. With the availability of single-cell sequencing methods, it will also be possible to measure telomere length differences between individual cells.
  2. Species-specific reagents such as antibodies, and special cell preparation procedures, are not required.
  3. Prior knowledge of the karyotype or a genome assembly is not required, as the measurements are relative to the total amount of DNA sequenced.


Limitations:

  1. The dynamic range of these measurements is very narrow, with just 100 to 300 telomeric reads per million reads sequenced. Pooling of samples is not possible because such a small fraction of reads is telomeric.
  2. Because telomeric reads are repetitive, removing PCR duplicates without introducing biases in the estimates is almost impossible.
  3. Large variance between sequencing runs (see Figure 1) reduces the sensitivity of the method.

Further improvements in read length, and the removal of artefacts such as PCR duplicates from these technologies, will make this method more robust. While the cost of sequencing has come down drastically, sequencing a sample solely to measure telomere lengths is still very expensive. However, this can be an attractive option for measuring telomere length in samples that have already been sequenced for other purposes.

Future development:

Coverage at different k-mer frequencies can be used to estimate actual telomere lengths, which is especially useful for newly sequenced species. Large-insert mate-pair libraries can also be used along with the k-mer frequency estimates to obtain chromosome-specific telomere lengths. Being able to correctly map one of the mates of a mate-pair is also useful for anchoring next-generation genome assemblies.
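As a rough illustration of how a relative measurement might be turned into an absolute one, the sketch below scales the telomeric fraction by genome size and divides by the number of chromosome ends. This is a back-of-the-envelope assumption, not the k-mer coverage approach outlined above and not part of Telo-seq: the function name is hypothetical, and it assumes that reads are sampled uniformly, that all reads have the same length, and that every read classified as telomeric really comes from a telomere.

```python
def estimate_mean_telomere_length(telomeric_fraction, genome_size_bp, n_chromosome_ends):
    """Rough mean telomere length in bp (hypothetical helper).

    telomeric_fraction : telomeric reads / total reads
    genome_size_bp     : size of the sequenced genome in base pairs
    n_chromosome_ends  : number of telomeres in that genome
                         (two per linear chromosome)
    """
    if not 0.0 <= telomeric_fraction <= 1.0:
        raise ValueError("telomeric_fraction must be between 0 and 1")
    if genome_size_bp <= 0 or n_chromosome_ends <= 0:
        raise ValueError("genome size and number of ends must be positive")
    # Total telomeric DNA is approximated as fraction * genome size,
    # then shared equally among all chromosome ends.
    return telomeric_fraction * genome_size_bp / n_chromosome_ends
```

The genome size and the number of chromosome ends must refer to the same ploidy (both haploid or both diploid), otherwise the estimate is off by a factor of two.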


  1. Shay JW & Wright WE, Nat Rev Mol Cell Biol. 2000. 1(1):72-6 [PMID:11413492].
  2. Vera E & Blasco MA, Aging (Albany NY). 2012. 4(6):379-392 [PMCID:PMC3409675].
  3. Castle JC et al, BMC Genomics. 2010. 11:244 [PMID:20398377].
  4. Heyn H et al, Proc Natl Acad Sci USA. 2012. 109(26):10522-7 [PMID:22689993].


    Table 1: Details of the cell type and number of telomere reads in each replicate for all the samples analyzed.

    Run       | Age       | Telomeric count | Total count | Ratio        | Cell type         | Sex    | Ethnicity | Source
    SRR330574 | 103 years | 41736           | 271797160   | 0.0001535557 | CD4+ T cells      | male   | Caucasian | Peripheral blood
    SRR330575 | 103 years | 26572           | 212553692   | 0.0001250131 | CD4+ T cells      | male   | Caucasian | Peripheral blood
    SRR330576 | 103 years | 39640           | 353387672   | 0.0001121714 | CD4+ T cells      | male   | Caucasian | Peripheral blood
    SRR330577 | 103 years | 31968           | 304147936   | 0.0001051067 | CD4+ T cells      | male   | Caucasian | Peripheral blood
    SRR389249 | 26 years  | 104180          | 577517320   | 0.0001803929 | mononuclear cells | female | Caucasian | Peripheral blood
    SRR389248 | 26 years  | 174812          | 568524684   | 0.0003074836 | mononuclear cells | female | Caucasian | Peripheral blood
    SRR330578 | newborn   | 87940           | 313907576   | 0.0002801462 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
    SRR330579 | newborn   | 45680           | 165409208   | 0.0002761636 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
    SRR394135 | newborn   | 20900           | 85356804    | 0.0002448545 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
    SRR394136 | newborn   | 19340           | 85493676    | 0.0002262156 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
    SRR394137 | newborn   | 22332           | 85526356    | 0.0002611125 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
    SRR394138 | newborn   | 21432           | 85524236    | 0.0002505956 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
    SRR394139 | newborn   | 24724           | 79929072    | 0.0003093242 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
    SRR394140 | newborn   | 25080           | 79929524    | 0.0003137764 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
    SRR394141 | newborn   | 26636           | 79891228    | 0.0003334033 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
    SRR394142 | newborn   | 27300           | 79885412    | 0.0003417395 | CD4+ T cells      | male   | Caucasian | Umbilical cord blood
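The run-to-run spread in Table 1 can be summarised per age group with a short script. This is a minimal sketch, not part of Telo-seq: the counts are transcribed from Table 1, while the grouping by age and the use of the coefficient of variation as a measure of spread are illustrative choices, and the names `TABLE_1` and `summarise` are hypothetical.

```python
from statistics import mean, stdev

# (age group, telomeric count, total count), transcribed from Table 1
TABLE_1 = [
    ("103 years", 41736, 271797160),
    ("103 years", 26572, 212553692),
    ("103 years", 39640, 353387672),
    ("103 years", 31968, 304147936),
    ("26 years", 104180, 577517320),
    ("26 years", 174812, 568524684),
    ("newborn", 87940, 313907576),
    ("newborn", 45680, 165409208),
    ("newborn", 20900, 85356804),
    ("newborn", 19340, 85493676),
    ("newborn", 22332, 85526356),
    ("newborn", 21432, 85524236),
    ("newborn", 24724, 79929072),
    ("newborn", 25080, 79929524),
    ("newborn", 26636, 79891228),
    ("newborn", 27300, 79885412),
]


def summarise(rows):
    """Return {age: (number of runs, mean ratio, coefficient of variation)}."""
    ratios_by_age = {}
    for age, telomeric, total in rows:
        ratios_by_age.setdefault(age, []).append(telomeric / total)
    summary = {}
    for age, ratios in ratios_by_age.items():
        average = mean(ratios)
        # The sample standard deviation needs at least two runs.
        spread = stdev(ratios) if len(ratios) > 1 else 0.0
        summary[age] = (len(ratios), average, spread / average)
    return summary


for age, (runs, average, cv) in summarise(TABLE_1).items():
    print(f"{age}: runs={runs} mean ratio={average:.7f} CV={cv:.2f}")
```

With only two to ten runs per group, these figures describe the spread in this dataset and should not be read as a formal statistical test.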
Figure 1: Relative telomere length measurements for next-generation sequencing samples from different age groups.
