DNA sequencing is a process of determining the order of the four chemical building blocks - called "nitrogenous bases" - that make up the DNA molecule.
One classic process used to sequence DNA is known as chain termination sequencing, or Sanger DNA sequencing. It relies on a modified form of the polymerase chain reaction.
Sanger sequencing method (Chain Termination Sequencing)
Along with the nucleotides and polymerase used in the standard PCR process, the medium prepared for the chain termination reaction contains variants of each of the four DNA nucleotides that are known as dideoxynucleotides.
These dideoxynucleotides resemble regular DNA nucleotides, but lack the 3′ hydroxyl group.
Once a dideoxynucleotide has been added to an elongating DNA strand, DNA polymerase cannot add any more nucleotides.
Elongation of that strand thus ceases, leaving a DNA fragment that ends at the dideoxynucleotide.
The replication medium contains only a small quantity of the dideoxynucleotide variants of each of the four DNA nucleotides.
As the polymerase chain reaction proceeds, there is a high probability that the polymerase enzyme will add a regular nucleotide to the growing chain and that the replication process will continue. Occasionally, however, the polymerase will add a dideoxynucleotide to the chain instead, and elongation of that chain will terminate.
In a suspension that contains billions of elongating fragments, the end result is a series of fragments of different lengths, each ending with one of the four dideoxynucleotides (dd nucleotides).
Together, these fragments represent all the possible A, C, G and T nucleotide locations on the elongating strand.
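The following is a minimal Python sketch of this idea, not a model of real reaction chemistry. It assumes a short, made-up strand ("GATTACAGGC"), a made-up 5 percent chance of adding a dideoxynucleotide at each position, and it ignores primers. The function name simulate_chain_termination and its parameters are hypothetical. With enough strands, a terminated fragment appears for every position on the strand.

```python
import random

def simulate_chain_termination(new_strand, dd_fraction=0.05, n_strands=10000, seed=0):
    """Return the distinct terminated fragments produced from many strands.

    new_strand:  the strand being synthesized, written 5' to 3' (illustrative input)
    dd_fraction: assumed chance that a dideoxynucleotide is added at each position
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_strands):
        for i in range(len(new_strand)):
            if rng.random() < dd_fraction:
                # A dideoxynucleotide was added: this strand stops growing here.
                fragments.add(new_strand[: i + 1])
                break
    return fragments

fragments = simulate_chain_termination("GATTACAGGC")
lengths = sorted(len(f) for f in fragments)
print(lengths)  # one fragment length for each position on the strand
```

Because termination is rare at any single position, most strands grow past it, which is why fragments of every length can accumulate.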
Each of the four dideoxynucleotide variants can also be tagged with a different marker (for example, a dye that fluoresces a particular colour under ultraviolet light) to make the different nucleotides easily identifiable.
As the fragments separate by length and mass during gel electrophoresis, the markers indicate which nucleotide ends each fragment.
The gel can then be read from bottom to top to identify the nucleotide sequence. This step usually involves the use of an automated DNA sequencer, which speeds up the reading process.
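Reading the gel amounts to sorting the fragments by length and noting the marker on each one. A small Python sketch, assuming each fragment has already been reduced to a (length, terminal base) pair; the function name read_gel and the sample data are illustrative only:

```python
def read_gel(fragments):
    """Reconstruct a sequence from (length, terminal_base) pairs.

    The shortest fragments travel farthest through the gel, so reading
    from bottom to top is the same as sorting by increasing length.
    """
    ordered = sorted(fragments, key=lambda fragment: fragment[0])
    return "".join(base for _, base in ordered)

# Illustrative data: fragment lengths and the dye-labelled base ending each one.
observed = [(3, "T"), (1, "G"), (4, "T"), (2, "A"), (5, "A")]
print(read_gel(observed))  # GATTA
```

The sequence read this way is that of the newly synthesized strand, which is complementary to the template strand.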
The Sanger method of DNA sequencing can be used to sequence DNA samples of up to about one thousand base pairs in a single reaction.
The sequencing of a large genome
The sequencing of a large genome now involves the following three basic steps:
1. Genome mapping
The entire genome is first randomly broken into smaller pieces of about 100 000 to 300 000 base pairs.
These sections of DNA are then cloned in a bacterial vector called a bacterial artificial chromosome or BAC.
By repeating this cycle several times, researchers obtain a series of overlapping BACs.
These BACs are then run through gel electrophoresis to determine their individual DNA fingerprints.
By studying the pattern of these fingerprints, researchers can determine the original order of the BACs within the genome.
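One way to picture this ordering step is to treat each fingerprint as a set of band sizes and to assume that BACs sharing several bands overlap in the genome. The Python sketch below is a simplified illustration under that assumption, not the procedure used by real mapping software. The clone names, band sizes, and the min_shared threshold are all invented for the example.

```python
def shared_bands(fp_a, fp_b):
    """Count the bands that two fingerprints have in common."""
    return len(fp_a & fp_b)

def order_clones(fingerprints, min_shared=2):
    """Greedy ordering of clones from their fingerprints.

    Starts at a clone that overlaps only one other clone (an end of the
    series), then repeatedly steps to the unvisited overlapping clone
    that shares the most bands with the current one.
    """
    names = list(fingerprints)
    neighbours = {
        a: [b for b in names
            if b != a and shared_bands(fingerprints[a], fingerprints[b]) >= min_shared]
        for a in names
    }
    start = next(n for n in names if len(neighbours[n]) == 1)
    order = [start]
    while len(order) < len(names):
        current = order[-1]
        candidates = [n for n in neighbours[current] if n not in order]
        if not candidates:
            break  # no further overlap found; the series has a gap
        order.append(max(candidates,
                         key=lambda n: shared_bands(fingerprints[current], fingerprints[n])))
    return order

# Invented fingerprints: each set holds band sizes from gel electrophoresis.
fingerprints = {
    "bac_B": {12, 7, 30, 4, 18},
    "bac_A": {5, 9, 12, 7},
    "bac_C": {4, 18, 22, 15},
    "bac_D": {22, 15, 40, 3},
}
print(order_clones(fingerprints))  # ['bac_A', 'bac_B', 'bac_C', 'bac_D']
```

The reverse order (bac_D to bac_A) would describe the same arrangement; fingerprints alone do not fix the direction.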
2. Sequencing DNA
Once the original order of the BACs has been mapped, each BAC is broken by restriction endonucleases into much smaller fragments that can be sequenced using the chain termination reaction.
This sequencing step is sometimes referred to as BAC-to-BAC sequencing.
3. Analyzing the results
The pattern among the resulting overlapping DNA sequences is used to determine the order of the fragments within each BAC.
This procedure uses a number of different computer programs that can analyze DNA sequences.
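As a rough illustration of how overlaps can be used to order fragments, here is a small greedy assembly sketch in Python. It is an assumption-laden toy: real assembly programs must handle sequencing errors, repeats, and both DNA strands, none of which appear here. The function names and the three sample reads are made up.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that matches a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no remaining overlaps; leave the pieces separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Made-up overlapping reads from one stretch of DNA.
reads = ["GATTACAG", "ACAGGCTT", "GCTTAACC"]
print(greedy_assemble(reads))  # ['GATTACAGGCTTAACC']
```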
Whole Genome Shotgun Sequencing
This method skips the genome mapping stage entirely.
Instead, it breaks the entire genome into random fragments, first of about 2000 base pairs and then of about 1000 base pairs. (Having fragments of different lengths helps make the nucleotide sequence assembly that follows more accurate.)
These fragments, which number in the millions, are then sequenced and analyzed, after which nucleotide sequences corresponding to chromosomes are assembled.
All of this is done with the aid of powerful computers and sophisticated software programs.
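The random fragmenting step can be mimicked on a small scale. The Python sketch below draws fragments from random positions in a made-up 1000-base "genome" and checks how much of it the fragments cover. The sizes (50-base fragments, 200 of them) and the function names are assumptions chosen only to keep the example fast; they are far smaller than anything used in practice.

```python
import random

def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Draw fragments of a fixed length from random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [(s, genome[s:s + read_len]) for s in starts]

def covered_fraction(genome_len, reads):
    """Fraction of genome positions that fall inside at least one fragment."""
    covered = [False] * genome_len
    for start, seq in reads:
        for pos in range(start, start + len(seq)):
            covered[pos] = True
    return sum(covered) / genome_len

# A made-up random sequence stands in for the genome.
genome_rng = random.Random(1)
genome = "".join(genome_rng.choice("ACGT") for _ in range(1000))

reads = shotgun_reads(genome, read_len=50, n_reads=200)
print(covered_fraction(len(genome), reads))
```

Because the fragments start at random positions, many of them overlap, and it is these overlaps that the assembly software relies on.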
Next-generation sequencing (high-throughput DNA sequencing)
Next-generation DNA sequencing (NGS) is the term used to describe a number of different modern sequencing technologies including:
- Illumina (Solexa) DNA sequencing
- Roche 454 sequencing
- Ion Torrent: Proton / PGM sequencing
- SOLiD sequencing
Next-generation sequencing platforms perform massively parallel sequencing, during which millions of fragments of DNA from a single sample are sequenced in unison.
Massively parallel sequencing technology facilitates high-throughput sequencing, which allows an entire genome to be sequenced in less than one day.
In the past decade, several NGS platforms have been developed that provide low-cost, high-throughput sequencing.
The Human Genome Project
A complete draft of the human genome was first published in February 2001, making it the first mammalian genome to be sequenced.
The Human Genome Project (HGP) determined the sequence of the three billion base pairs that make up the human genome.
Among the project’s immediate results was the discovery that the DNA of all humans (Homo sapiens) is more than 99.9 percent identical.
Put another way, this means that all the differences among individuals across humanity result from variations in fewer than one in 1000 nucleotides in each individual’s genome.
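The arithmetic behind this statement can be checked in a few lines of Python, using only the two figures given above (three billion base pairs and a difference rate of fewer than one in 1000). The variable names are illustrative.

```python
GENOME_SIZE_BP = 3_000_000_000   # base pairs in the human genome (figure from the text)
DIFFERENCE_RATE_DENOMINATOR = 1000  # "fewer than one in 1000" nucleotides differ

# Upper bound on the number of nucleotide positions that can differ
# between two individuals under these figures.
max_differing = GENOME_SIZE_BP // DIFFERENCE_RATE_DENOMINATOR
print(f"{max_differing:,}")  # 3,000,000
```

In other words, a difference of less than 0.1 percent still leaves room for up to about three million differing nucleotides.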