The author of this book defines genomics as the study of whole genomes through the integration of cytology and Mendelian, quantitative, population, and molecular genetics with bioinformatics and automated sequencing. It attempts to answer, the author states, questions about the physical and chemical requirements of genomes, whether genes must be located at certain sites to function normally, which particular DNA sequences and structures are needed for gene function, the total number of functional genes necessary for a biological system, and the homology between the DNA sequences of different species. He addresses the book to biologists interested in statistical issues in genomics, to mathematicians in genomics, and to students of genetics.
To satisfy these three classes of readers, he lists three different "tracks", each consisting of a recommended sequence of chapters. Another good feature of the book is the inclusion of exercises at the end of each chapter, an absolute necessity for understanding this field. For those interested specifically in the efficacy of transgenic strategies in the design of viable breeds of plants and animals, this book will be helpful, for the author emphasizes that the major application of genomics will be in finding optimal breeding strategies in agriculture and forestry.
After a brief introduction in chapter 1, the author outlines Mendelian, population, quantitative, and molecular genetics in chapter 2, each presented as a separate discipline. The author never really defines what a gene "is" in the context of molecular genetics, as he does in the other disciplines, but instead views it as a sequence of base pairs in the DNA strands that has the potential to be expressed, via transcription, RNA splicing, and translation, as a particular protein. Fortunately, in one of the exercises at the end of the chapter, the reader finds a connection between molecular and classical genetics by examining a trait (viewed as a simple compound) of a hypothetical plant.
Genomics as a discipline is introduced in chapter 3, and defined as the analysis of data from nuclear genomes, with the intent of learning about their structure, function, and evolution. Genome structure and sources of genome variation are discussed. Biological techniques in genomics are briefly discussed for interested readers. The author is careful to point out that complex traits cannot yet be related to the currently available DNA sequences, since little is known about the molecular identity of most genes controlling these traits. Helpful diagrams illustrate the important concepts, including mating schemes, chromosome rearrangements, "natural" populations used in genomic research, RAPD and AFLP markers, and the history of genetic markers.
Chapter 4 is a summary of the mathematical statistics needed in the book, and the author illustrates the methods with an example that deals with mapping a gene for resistance to fusiform rust disease.
The statistical modeling of a single locus is carried out in chapter 5, as an example of what can be done, and as a warm-up for the multiple-locus models that follow later in the book. The author outlines how to detect segregation distortion using chi-square and log-likelihood tests, and gives methods for determining the sample size for marker screening using controlled crosses. The reader can get an idea of the importance of using PCR and RFLP to screen polymorphic genetic markers. A disequilibrium coefficient is defined and log-likelihood methods are used to estimate it. Heterozygosity is defined in terms of allelic frequencies and estimated statistically. The author also details the use of Monte Carlo simulation to screen polymorphic markers.
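To give a flavor of the segregation-distortion test mentioned above, here is a minimal Python sketch of a Pearson chi-square test of observed genotype counts against a Mendelian ratio. It is not taken from the book: the function names and the counts are made up for illustration, and the p-value helper only covers one and two degrees of freedom, where the chi-square tail has a simple closed form.

```python
import math

def chi_square_segregation(observed, expected_ratio):
    """Pearson chi-square statistic for genotype counts against a Mendelian ratio."""
    n = sum(observed)
    total = sum(expected_ratio)
    expected = [n * r / total for r in expected_ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df

def chi_square_p_value(stat, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")

# Hypothetical F2 counts at a codominant marker, expected ratio 1:2:1.
f2_stat, f2_df = chi_square_segregation([30, 50, 20], [1, 2, 1])
f2_p = chi_square_p_value(f2_stat, f2_df)

# Hypothetical backcross counts, expected ratio 1:1.
bc_stat, bc_df = chi_square_segregation([60, 40], [1, 1])
bc_p = chi_square_p_value(bc_stat, bc_df)
```

With these invented counts the F2 marker shows no evidence of distortion, while the backcross marker falls just under the conventional 5% level.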
In chapter 6, the author uses goodness of fit, likelihood ratio tests, and recombination fraction estimation to perform two-locus linkage analysis. Newton-Raphson methods are used to solve the (non-linear) likelihood equations for obtaining maximum likelihood estimates. The author answers the question of how large the sample should be for detecting linkage, using the expected log-likelihood ratio test statistic to determine it. This analysis is generalized to the more difficult case of natural populations in chapter 7. Linkage disequilibrium is then used in chapter 8, again to study two-locus models. The transmission/disequilibrium test and other tests are discussed in the light of finding markers linked to disease genes.
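For readers who have not seen Newton-Raphson applied to a linkage likelihood, the following Python sketch may help. It rests on assumptions of my own rather than on the book's worked examples: an F2 population in coupling phase with two dominant markers, so that the four phenotype classes have probabilities (2 + t)/4, (1 - t)/4, (1 - t)/4 and t/4, where t = (1 - r)^2 and r is the recombination fraction. The counts and function names are illustrative only.

```python
import math

def score_and_hessian(theta, counts):
    """First and second derivatives of the log likelihood with respect to theta."""
    a, b, c, d = counts
    score = a / (2 + theta) - (b + c) / (1 - theta) + d / theta
    hessian = (-a / (2 + theta) ** 2
               - (b + c) / (1 - theta) ** 2
               - d / theta ** 2)
    return score, hessian

def newton_raphson_theta(counts, theta=0.5, tol=1e-10, max_iter=50):
    """Iterate theta <- theta - score / hessian until the update is negligible."""
    for _ in range(max_iter):
        score, hessian = score_and_hessian(theta, counts)
        new_theta = theta - score / hessian
        # Keep the iterate strictly inside (0, 1), where the likelihood is defined.
        new_theta = min(max(new_theta, 1e-9), 1 - 1e-9)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    raise RuntimeError("Newton-Raphson did not converge")

# Illustrative phenotype counts for classes A_B_, A_bb, aaB_, aabb.
counts = (125, 18, 20, 34)
theta_hat = newton_raphson_theta(counts)
r_hat = 1 - math.sqrt(theta_hat)  # back-transform to the recombination fraction
```

Because the log likelihood is concave in t, the iteration settles in a handful of steps from the starting value 0.5; for these counts the estimated recombination fraction comes out at roughly 0.21.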
Linkage groups, defined as groups of loci inherited together according to statistical criteria, are studied in chapter 9. Locus ordering is treated, interestingly, as an instance of the traveling salesman problem, and several algorithms are proposed for its solution, such as seriation, simulated annealing, and branch-and-bound. Likelihood and bootstrap approaches are also discussed.
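The traveling-salesman view of locus ordering can be made concrete with a small Python sketch. It does not reproduce any of the book's algorithms: it simply enumerates every order of a handful of loci and keeps the one with the smallest sum of adjacent recombination fractions, an objective I am assuming here for illustration. The recombination matrix is invented, and exhaustive search is only feasible for a few loci, which is exactly why heuristics such as simulated annealing are needed in practice.

```python
from itertools import permutations

def sum_adjacent_rf(order, rf):
    """Sum of recombination fractions between neighbouring loci in an order."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))

def best_order_exhaustive(rf):
    """Try every locus order and return (score, order) with the smallest sum."""
    n = len(rf)
    best = None
    for perm in permutations(range(n)):
        if perm[0] > perm[-1]:
            continue  # an order and its mirror image describe the same map
        score = sum_adjacent_rf(perm, rf)
        if best is None or score < best[0]:
            best = (score, perm)
    return best

# Hypothetical pairwise recombination fractions for four loci whose
# true order is 0, 2, 1, 3.
rf = [
    [0.00, 0.25, 0.10, 0.40],
    [0.25, 0.00, 0.15, 0.15],
    [0.10, 0.15, 0.00, 0.30],
    [0.40, 0.15, 0.30, 0.00],
]
best_score, best_order = best_order_exhaustive(rf)
```

The number of distinct orders grows as n!/2, so even twenty loci are far beyond the reach of this brute-force approach.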
Multi-locus models and the important concept of map distance are considered in chapter 10. This chapter is the most interesting and helpful in the book, for it discusses in detail the relationship between multi-point map distance and physical distance. Morgan's, Haldane's, Kosambi's, and other map functions are discussed. Also, and most importantly, the quality of a genomic map is quantified, using the confidence of estimated locus order and locus distribution on the map.
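The two most widely used map functions are short enough to write down. The Python sketch below uses the standard forms of Haldane's function (no interference) and Kosambi's function (partial interference), together with their inverses; the function names are mine, and distances are returned in centimorgans.

```python
import math

def haldane_cm(r):
    """Haldane map distance in cM for a recombination fraction 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def haldane_rf(d_cm):
    """Inverse Haldane function: recombination fraction from distance in cM."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))

def kosambi_rf(d_cm):
    """Inverse Kosambi function: recombination fraction from distance in cM."""
    return 0.5 * math.tanh(2.0 * d_cm / 100.0)

r = 0.10
d_haldane = haldane_cm(r)   # about 11.2 cM
d_kosambi = kosambi_cm(r)   # about 10.1 cM
```

For small recombination fractions both functions stay close to Morgan's simple identity between recombination fraction and distance, and they diverge as r approaches 0.5.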
The pooling or merging of linkage maps is considered in chapter 11, followed by the study of QTLs in chapter 12. Regression techniques are used for single-marker analysis, and interval mapping is used to locate QTLs. QTL mapping in natural populations is discussed, and the author considers the "statistical power" of QTL detection experiments. The question of what QTLs really are is addressed, particularly the role of molecular biology and genomic mapping, and the limitations of QTL mapping. The author ends his discussion of QTLs by asking what would be the best approach for modeling a quantitative trait. A brief discussion of computer methods ends the book.
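As an aside on the single-marker analysis mentioned above, the idea can be sketched in a few lines of Python. This is my own minimal illustration, not the book's procedure: phenotypes from a hypothetical backcross are regressed on a marker genotype coded 0 or 1, and the t statistic of the slope measures the evidence for a QTL linked to the marker. All data and names are invented.

```python
import math

def single_marker_regression(genotypes, phenotypes):
    """Ordinary least squares of phenotype on marker genotype; returns
    (intercept, slope, t statistic of the slope)."""
    n = len(genotypes)
    mean_x = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(genotypes, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(genotypes, phenotypes))
    mse = sse / (n - 2)
    t_stat = slope / math.sqrt(mse / sxx)
    return intercept, slope, t_stat

# Invented backcross data: marker genotype (0 or 1) and trait value.
genotypes = [0, 0, 0, 1, 1, 1]
phenotypes = [10.0, 11.0, 12.0, 14.0, 15.0, 16.0]
intercept, slope, t_stat = single_marker_regression(genotypes, phenotypes)
```

The slope estimates the difference between the two marker-class means, which in a real experiment confounds the QTL effect with its distance from the marker; that confounding is the limitation interval mapping is designed to address.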
After finishing the book and noting the explosive influence of molecular biology, it is natural to ask: Will statistical methods in genomics fade away and be replaced by deterministic methods based on molecular and metabolic models?