"Data Science and Predictive Analytics is an effective resource for those desiring to extend their knowledge of data science, R or both. The book is comprehensive and serves as a reference guide for data analytics, especially relating to the biomedical, health care and social fields." (Mindy Capaldi, International Statistical Review, Vol. 87 (1), 2019)
1 Introduction.- 2 Foundations of R.- 3 Managing Data in R.- 4 Data Visualization.- 5 Linear Algebra & Matrix Computing.- 6 Dimensionality Reduction.- 7 Lazy Learning: Classification Using Nearest Neighbors.- 8 Probabilistic Learning: Classification Using Naive Bayes.- 9 Decision Tree Divide and Conquer Classification.- 10 Forecasting Numeric Data Using Regression Models.- 11 Black Box Machine-Learning Methods: Neural Networks and Support Vector Machines.- 12 Apriori Association Rules Learning.- 13 k-Means Clustering.- 14 Model Performance Assessment.- 15 Improving Model Performance.- 16 Specialized Machine Learning Topics.- 17 Variable/Feature Selection.- 18 Regularized Linear Modeling and Controlled Variable Selection.- 19 Big Longitudinal Data Analysis.- 20 Natural Language Processing/Text Mining.- 21 Prediction and Internal Statistical Cross Validation.- 22 Function Optimization.- 23 Deep Learning Neural Networks.- 24 Summary.- 25 Glossary.- 26 Index.- 27 Errata.
Dr. Ivo Dinov is the Director of the Statistics Online Computational Resource (SOCR) at the University of Michigan and is an expert in mathematical modeling, statistical analysis, high-throughput computational processing and scientific visualization of large datasets (Big Data). His applied research is focused on neuroscience, nursing informatics, multimodal biomedical image analysis, and distributed genomics computing. Examples of specific brain research projects Dr. Dinov is involved in include longitudinal morphometric studies of development (e.g., autism, schizophrenia), maturation (e.g., depression, pain) and aging (e.g., Alzheimer’s disease, Parkinson’s disease). He also studies the intricate relations between genetic traits (e.g., SNPs), clinical phenotypes (e.g., disease, behavioral and psychological tests) and subject demographics (e.g., race, gender, age) in a variety of brain- and heart-related disorders. Dr. Dinov is developing, validating and disseminating novel technology-enhanced pedagogical approaches for science education and active learning.
Over the past decade, Big Data have become ubiquitous in all economic sectors, scientific disciplines, and human activities. They have led to striking technological advances, affecting all human experiences. Our ability to manage, understand, interrogate, and interpret such extremely large, multisource, heterogeneous, incomplete, multiscale, and incongruent data has not kept pace with the rapid increase in the volume, complexity and proliferation of digital information. There are three reasons for this shortfall. First, the volume of data is increasing much faster than the corresponding rise of our computational processing power (Kryder’s law > Moore’s law). Second, traditional disciplinary boundaries inhibit expeditious progress. Third, our education and training activities have fallen behind the accelerated trend of scientific, information, and communication advances. There are very few rigorous instructional resources, interactive learning materials, and dynamic training environments that support active data science learning. The textbook balances the mathematical foundations with dexterous demonstrations and examples of data, tools, modules and workflows that serve as pillars for the urgently needed bridge to close the gap between the supply of and demand for predictive analytic skills.
Exposing the enormous opportunities presented by the tsunami of Big Data, this textbook aims to identify specific knowledge gaps, educational barriers, and workforce readiness deficiencies. Specifically, it focuses on the development of a transdisciplinary curriculum integrating modern computational methods, advanced data science techniques, innovative biomedical applications, and impactful health analytics.
The content of this graduate-level textbook fills a substantial gap in integrating modern engineering concepts, computational algorithms, mathematical optimization, statistical computing and biomedical inference. Big Data analytic techniques and predictive scientific methods demand broad transdisciplinary knowledge, appeal to an extremely wide spectrum of readers and learners, and provide incredible opportunities for engagement across academia, industry, and regulatory and funding agencies.