Exact tally of human genes remains elusive
BOSTON — No one really knows all the genetic parts needed to make a human being.
Exactly how many genes make up the human genome remains a mystery, even though scientists announced the completion of the Human Genome Project a decade ago. The project to decipher the genetic blueprint of humans was supposed to reveal all of the protein-producing genes needed to build a human body.
“Not only do we not know what all the genes are, we don’t even know how many there are,” Steven Salzberg of the University of Maryland in College Park said October 11 during a keynote address at the Beyond the Genome conference, held in Boston. Most estimates place the human gene count in the neighborhood of 22,000 genes, which falls between the number of genes in a chicken and the number in a grape.
Grape plants have 30,434 genes, by the latest count. Chickens have 16,736 genes, a number Salzberg said will likely grow as scientists put the finishing touches on the chicken genome. As in humans, the gene totals for each species are not as precise as they seem and are subject to revision.
The most accurate estimate of the human gene count comes from the RefSeq database maintained by the U.S. National Institutes of Health, Salzberg said. He laid out his arguments for favoring this estimate, such as its inclusion of all genes confirmed to date, in a paper published in May in Genome Biology. By the RefSeq count, humans have 22,333 genes. But another government database lists 38,621 human genes, and a different project called Gencode currently recognizes 21,671.
Such disparate numbers stem largely from how hard genes are to find: they account for only about 1 percent of the 3 billion As, Ts, Gs and Cs that make up the human genetic instruction book. And the genes aren’t conveniently laid out as single, continuous stretches of genetic code. Instead, human genes are broken into protein-encoding pieces called exons, interspersed with stretches of DNA that don’t make protein. These spacers are called introns.
To make matters worse, each exon in a gene codes for only a portion of a protein. Cells can mix and match different combinations of exons to make various proteins.
Traditionally, scientists have used computer programs to sift through billions of DNA letters and pinpoint the locations of genes. The programs have improved over the years, but they still aren’t as good as people at plucking exons from the sea of introns and figuring out how those protein-encoding segments are spliced together, said Clara Amid, a computational biologist at the Wellcome Trust Sanger Institute in Hinxton, England.
Amid is involved in the Gencode project, an effort to identify all the human genes and the many permutations of those genes that can lead to a dizzying number of proteins. She and her colleagues pick out genes the old-fashioned way: by hand. The researchers get plenty of clues about where genes are from computerized gene-finders, from studies that sequence the RNA produced by genes and from comparisons of human DNA with the genomes of other animals. Synthesizing all that information allows people to accurately find and mark the locations of genes, a process scientists call annotation. “The best computerized methods could replicate the manual annotation only 40 to 50 percent of the time,” Amid said October 12 at the Beyond the Genome conference.
The Gencode team isn’t finished with its work; several chromosomes still need the human touch. Gencode’s current count is 21,671 human genes. “The number will go up, definitely,” Amid said. Already the team has located several new genes on chromosome 4 thanks to data from RNA-sequencing projects, she said.
Exactly how many new genes might be located by sequencing RNA instead of DNA is anyone’s guess. Scientists who sequenced RNA from fruit flies discovered 1,938 new genes, Brenton Graveley from the University of Connecticut Health Center in Farmington said at the conference.
The Mammalian Gene Collection, one effort to catalog all of the full-length RNA versions of genes, lists 18,877 human genes. That number likely represents a lower bound on the human gene count, Salzberg said.
If new RNA sequencing methods detect the same proportion of new genes in people as were found in fruit flies, the human genome could gain about 3,000 more genes in addition to those already confirmed by RefSeq. “That would be an exciting result,” Salzberg said. “I’d be surprised, but we like surprises in science.”
Figure: Estimated gene counts for a few different organisms. Though simple organisms generally have relatively small genomes, gene number is not necessarily correlated with complexity. Source: M. Pertea and S. Salzberg/Genome Biology 2010; Credit: T. Dubé