This book is open access under a CC BY 4.0 license
This open access book brings together the latest genome-based prediction models currently being used by statisticians, breeders and data scientists. It provides an accessible way to understand the theory behind each statistical learning tool, the required pre-processing, the basics of model building, how to train statistical learning methods, the basic R scripts needed to implement each statistical learning tool, and the output of each tool. To do so, for each tool the book provides background theory, some elements of the R statistical software for its implementation, the conceptual underpinnings, and at least two illustrative examples with data from real-world genomic selection experiments. Lastly, worked-out examples help readers check their own comprehension.
The book will greatly appeal to readers in plant (and animal) breeding, geneticists and statisticians, as it provides in a very accessible way the necessary theory, the appropriate R code, and illustrative examples for a complete understanding of each statistical learning tool. In addition, it weighs the advantages and disadvantages of each tool.
Preface.- Chapter 1.- General elements of genomic selection and statistical learning.- Chapter 2.- Preprocessing tools for data preparation.- Chapter 3.- Elements for building supervised statistical machine learning models.- Chapter 4.- Overfitting, model tuning and evaluation of prediction performance.- Chapter 5.- Linear Mixed Models.- Chapter 6.- Bayesian Genomic Linear Regression.- Chapter 7.- Bayesian and classical prediction models for categorical and count data.- Chapter 8.- Reproducing Kernel Hilbert Spaces Regression and Classification Methods.- Chapter 9.- Support vector machines and support vector regression.- Chapter 10.- Fundamentals of artificial neural networks and deep learning.- Chapter 11.- Artificial neural networks and deep learning for genomic prediction of continuous outcomes.- Chapter 12.- Artificial neural networks and deep learning for genomic prediction of binary, ordinal and mixed outcomes.- Chapter 13.- Convolutional neural networks.- Chapter 14.- Functional regression.- Chapter 15.- Random forest for genomic prediction.
Dr. Osval Antonio Montesinos López earned a PhD in Statistics and Biometry from the University of Nebraska-Lincoln, USA, in 2014. He is currently a Professor of Statistics, Probability and Statistical Learning Methods at the Facultad de Telemática, University of Colima, México. His areas of interest include the development of novel genomic prediction models for plant breeding, high-dimensional data analysis, generalized linear mixed models and Bayesian analysis, multivariate analysis and experimental designs. He has contributed univariate and multivariate genomic prediction models for predicting breeding values in plants with normal, binary, count and ordinal phenotypes.
Dr. Abelardo Montesinos López holds a PhD in Probability and Statistics from the Centro de Investigación en Matemáticas (CIMAT), Guanajuato, México. He is currently a Professor of Statistical Inference, Probability and Statistical Learning Methods at the Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, México. His areas of interest include the development of novel genomic prediction models for plant breeding, high-dimensional data analysis, generalized linear mixed models, survival analysis, Bayesian analysis and multivariate analysis. He has contributed univariate and multivariate genomic prediction models for predicting breeding values in plants with normal, binary, count and ordinal phenotypes.
Dr. José Crossa is a Distinguished Scientist at the Biometrics and Statistics Unit of the International Maize and Wheat Improvement Center (CIMMYT). He has contributed to the statistical analyses of plant breeding trials with an emphasis on modeling genotype x environment interactions, QTL x environment interactions and genomic x environment interactions. He has significantly advanced the integration of essential factors such as pedigree and trial data into genomic selection for crop breeding, by creating and describing sophisticated statistical models of proven effectiveness that have since been widely adopted. He is a Fellow of the American Society of Agronomy and of the Crop Science Society of America, Member of the Mexican Academy of Science, Member of the Mexican National Research System of the National Council of Research and Technology, invited professor at universities in Mexico and Uruguay, and Adjunct Professor at the Department of Statistics and Department of Plant Science at the University of Nebraska-Lincoln, USA.