ISBN-13: 9780195089653 / English / Hardcover / 2011 / 792 pp.
The recent dramatic rise in the number of public datasets available free on the Internet, coupled with the evolution of the Open Source software movement, which makes powerful analysis packages like R freely available, has greatly increased both the range of opportunities for exploratory data analysis and the variety of tools that support this type of analysis.
This book provides a thorough introduction to a useful subset of these analysis tools, illustrating what they are, what they do, and when and how they fail. Specific topics covered include descriptive characterizations like summary statistics (mean, median, standard deviation, MAD scale estimate), graphical techniques like boxplots and nonparametric density estimates, various forms of regression modeling (standard linear regression models, logistic regression, and highly robust techniques like least trimmed squares), and the recognition and treatment of important data anomalies like outliers and missing data. The unique combination of topics presented in this book separates it from any other book of its kind. Intended for use as an introductory textbook for an exploratory data analysis course or as a self-study companion for professionals and graduate students, this book assumes familiarity with calculus and linear algebra, though no previous exposure to probability or statistics is required. Both simulation-based and real data examples are included, as are end-of-chapter exercises, R code, and datasets.