Telomere length is known to change with age and disease. The need
for telomere length measurements has led to the development of many
methods, each with its own set of advantages and drawbacks. Using the
latest advances in sequencing technology to measure telomere length
is therefore an appealing idea. The Perl script Telo-seq is designed
to demonstrate the possibility of utilizing whole genome sequencing
data to estimate relative telomere lengths. Application of Telo-seq
to human samples from different age groups shows the age-related
reduction in telomere length that has been demonstrated in humans
with other methods. The relative telomere length decreases from about
0.0003 in the newborn to about 0.0001 in the centenarian.
Availability and implementation:
The Perl script Telo-seq is freely available at
http://code.google.com/p/telo-seq/
Background:
The ends of
chromosomes contain repetitive regions known as telomeres that
protect chromosomes from degradation or fusion with other
chromosomes. During the normal life of a cell, the telomeres shorten
at each cell division. Hence, over time cells lose their telomeres.
This is followed by loss of the DNA located at the ends of the
chromosomes, and eventually the cells die. This phenomenon is
described by the Hayflick limit [1], the maximum number of times a
cell with linear chromosomes can undergo cell division. Most
prokaryotes have circular chromosomes, making it possible for them to
replicate indefinitely. Telomere length has been widely studied and
is known to change with factors such as age, disease and stress.
Telomere lengths have been measured using various methods such as
Terminal Restriction Fragment (TRF) Southern blot, FISH, Q-FISH and
real-time PCR. New methods of measurement that are robust, easy to
use and sensitive are always being developed. However, methods for
telomere length measurement can have specific requirements such as
cell type, amount of DNA and cell growth condition. Each method has
its own set of drawbacks and advantages, which have been reviewed
earlier [2].
Next-generation sequencing data have been used to estimate relative
telomere length in male versus female pools by counting reads [3].
However, the effect of PCR duplicates, the dynamic range of the
estimates and the variance between sequencing runs have not been
evaluated. Here we present an easy-to-use stand-alone tool, Telo-seq,
that can estimate relative telomere length, and demonstrate that
next-generation sequencing data can be used to measure age-related
changes in relative telomere length.
Whole genome sequencing data from the cord blood of a newborn, a
26-year-old female and a 103-year-old male [4] are available in the
SRA (Short Read Archive). Raw FASTQ files for each of the datasets
were downloaded and analyzed. A read was classified as telomeric
(Table 1) if the 6-base vertebrate-specific telomeric repeat "TTAGGG"
occurred at least 5 times consecutively in the 90 bp read. Varying
the required number of occurrences of the repeat does not affect the
analysis, as all reads analyzed are 90 bp long. When comparing
samples with different read lengths, this parameter has to be tuned
based on the frequency of the telomeric repeat k-mer.
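The classification rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Telo-seq Perl script: the function names are hypothetical, the input is assumed to be uncompressed FASTQ with four lines per record, and only the forward-strand repeat is matched because the text does not say whether the reverse complement is also counted.

```python
TELOMERE_REPEAT = "TTAGGG"  # vertebrate telomeric repeat named in the text


def is_telomeric(read, repeat=TELOMERE_REPEAT, min_repeats=5):
    """True if `read` contains at least `min_repeats` back-to-back copies of `repeat`.

    Assumption: "sequentially" in the text means uninterrupted tandem copies,
    and only the forward strand is checked.
    """
    return (repeat * min_repeats) in read.upper()


def relative_telomere_length(fastq_lines, repeat=TELOMERE_REPEAT, min_repeats=5):
    """Count telomeric reads in an iterable of FASTQ lines.

    Returns (telomeric_reads, total_reads, ratio). Assumes strict
    4-line FASTQ records, where the sequence is the second line.
    """
    telomeric = total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # skip header, separator and quality lines
            continue
        total += 1
        if is_telomeric(line.strip(), repeat, min_repeats):
            telomeric += 1
    ratio = telomeric / total if total else 0.0
    return telomeric, total, ratio
```

With a real dataset, an open file handle can be passed directly, for example `relative_telomere_length(open("sample.fastq"))`, where the file name is a placeholder. For datasets with other read lengths, `min_repeats` is the parameter the text says would need tuning.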
The
estimation of relative telomere lengths using next generation
sequencing has several advantages over the methods that are currently
used. However, this method is not without limitations.
Advantages:
- Samples from most tissues or cell types can be used to obtain DNA for sequencing. With the availability of single-cell sequencing methods, it will also be possible to measure telomere length differences between individual cells.
- Species-specific requirements such as specific antibodies and special cell preparation procedures are not needed.
- Prior knowledge of the karyotype or genome assembly is not required, as the measurements are relative to the total amount of DNA sequenced.
Limitations:
- The dynamic range of these measurements is very narrow, with just 100 to 300 telomeric reads per million reads sequenced. Pooling of samples is not possible due to the small fraction of reads that are telomeric.
- Because telomeric reads are repetitive, removing PCR duplicates without introducing biases in the estimates is almost impossible.
- Large variance (see figure 1) between sequencing runs reduces the sensitivity of the method.
Further improvements in read length and the removal of artefacts such
as PCR duplicates from these technologies will make this method more
robust. While the cost of sequencing has come down drastically,
sequencing a sample solely to measure telomere length is still very
expensive. However, this can be an attractive option for measuring
telomere length in samples that have been sequenced for other
purposes.
Future development:
Coverage at different k-mer frequencies can be used to estimate the
actual telomere lengths and has special utility for newly sequenced
species. Large insert size mate-pair libraries can also be used along
with the k-mer frequency estimates to obtain telomere lengths that
are specific to chromosomes. Being able to correctly map one of the
mates of a mate-pair also has utility in anchoring next-generation
genome assemblies.
References:
1. Shay JW & Wright WE, Nat Rev Mol Cell Biol. 2000. 1(1):72-6 [PMID:11413492].
2. Vera E & Blasco MA, Aging (Albany NY). 2012. 4(6):379–392 [PMCID:PMC3409675].
3. Castle JC et al, BMC Genomics. 2010. 11:244 [PMID:20398377].
4. Heyn H et al, Proc Natl Acad Sci USA. 2012. 109(26):10522-7 [PMID:22689993].
Tables:
Table 1: Details of the cell type and number of telomere reads in each replicate for all the samples analyzed.
| Sample | Age | Telomeric reads | Total reads | Ratio | Cell type | Gender | Racial classification | Source tissue |
|---|---|---|---|---|---|---|---|---|
| SRR330574 | 103 years | 41736 | 271797160 | 0.0001535557 | CD4+T Cells | male | Caucasian | Peripheral blood |
| SRR330575 | 103 years | 26572 | 212553692 | 0.0001250131 | CD4+T Cells | male | Caucasian | Peripheral blood |
| SRR330576 | 103 years | 39640 | 353387672 | 0.0001121714 | CD4+T Cells | male | Caucasian | Peripheral blood |
| SRR330577 | 103 years | 31968 | 304147936 | 0.0001051067 | CD4+T Cells | male | Caucasian | Peripheral blood |
| SRR389249 | 26 years | 104180 | 577517320 | 0.0001803929 | mononuclear cells | female | Caucasian | Peripheral blood |
| SRR389248 | 26 years | 174812 | 568524684 | 0.0003074836 | mononuclear cells | female | Caucasian | Peripheral blood |
| SRR330578 | newborn | 87940 | 313907576 | 0.0002801462 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
| SRR330579 | newborn | 45680 | 165409208 | 0.0002761636 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
| SRR394135 | newborn | 20900 | 85356804 | 0.0002448545 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
| SRR394136 | newborn | 19340 | 85493676 | 0.0002262156 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
| SRR394137 | newborn | 22332 | 85526356 | 0.0002611125 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
| SRR394138 | newborn | 21432 | 85524236 | 0.0002505956 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
| SRR394139 | newborn | 24724 | 79929072 | 0.0003093242 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
| SRR394140 | newborn | 25080 | 79929524 | 0.0003137764 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
| SRR394141 | newborn | 26636 | 79891228 | 0.0003334033 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
| SRR394142 | newborn | 27300 | 79885412 | 0.0003417395 | CD4+T Cells | male | Caucasian | Umbilical cord blood |
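
The Ratio column in Table 1 is the telomeric read count divided by the total read count for each run. The short Python sketch below recomputes that ratio from the counts in the table, converts it to telomeric reads per million, and summarizes each age group. The function names are hypothetical and this is not part of Telo-seq; the counts are copied from Table 1.

```python
from collections import defaultdict
from statistics import mean

# (run accession, age group, telomeric reads, total reads), copied from Table 1
RUNS = [
    ("SRR330574", "103 years", 41736, 271797160),
    ("SRR330575", "103 years", 26572, 212553692),
    ("SRR330576", "103 years", 39640, 353387672),
    ("SRR330577", "103 years", 31968, 304147936),
    ("SRR389249", "26 years", 104180, 577517320),
    ("SRR389248", "26 years", 174812, 568524684),
    ("SRR330578", "newborn", 87940, 313907576),
    ("SRR330579", "newborn", 45680, 165409208),
    ("SRR394135", "newborn", 20900, 85356804),
    ("SRR394136", "newborn", 19340, 85493676),
    ("SRR394137", "newborn", 22332, 85526356),
    ("SRR394138", "newborn", 21432, 85524236),
    ("SRR394139", "newborn", 24724, 79929072),
    ("SRR394140", "newborn", 25080, 79929524),
    ("SRR394141", "newborn", 26636, 79891228),
    ("SRR394142", "newborn", 27300, 79885412),
]


def reads_per_million(telomeric, total):
    """Telomeric reads per million reads sequenced."""
    return 1e6 * telomeric / total


def summarize(runs):
    """Return {age group: (min ratio, mean ratio, max ratio)} across runs."""
    by_age = defaultdict(list)
    for _run, age, telomeric, total in runs:
        by_age[age].append(telomeric / total)
    return {age: (min(r), mean(r), max(r)) for age, r in by_age.items()}


for age, (lo, avg, hi) in summarize(RUNS).items():
    print(f"{age}: min={lo:.7f} mean={avg:.7f} max={hi:.7f}")
```

The per-group minimum and maximum give a rough view of the run-to-run spread that the limitations section describes.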