ISBN-13: 9783642535611 / English / Paperback / 2012 / 477 pp.
This book presents a new, grade-based methodology for intelligent data analysis. It introduces the specific infrastructure of concepts needed to describe grade data analysis models and methods. The monograph is currently the only book covering both the theory and the applications of grade data analysis, and it is therefore aimed at researchers and students as well as applied practitioners. The text is richly illustrated with examples and case studies and includes a short introduction to software implementing grade methods, which is available from the editors.
Contents:
1 Grade Data Analysis — A First Look.- 1.1 “Questions” from clients.- 1.2 About “Grade Models and Methods for Data Analysis”.- 1.3 Addressing the practitioner.- 1.4 Addressing the theorist.- 1.5 Regarding the analysis of data populations.- 1.6 Overview of Grade Data Analysis algorithms.- 1.7 Returning to the clients from the first page.- 1.8 Conclusion — Chapter 1.
2 The Grade Approach.- 2.1 Introduction.- 2.2 Part 1: Quick start to the understanding of grade concepts.- 2.2.1 A simplified case of the grade approach.- 2.2.2 Examples of data distribution sources.- 2.3 Steps to making a concentration curve.- 2.4 Quick Start summary.- 2.5 Preview of Part 2, and suggestions before your eventual study of the multivariate material.- 2.6 Part 2: Understanding concentration curves.- 2.6.1 Introduction.- 2.6.2 Two identical distributions.- 2.6.3 Cylinder with partitions: cells of equal length, gas in equal proportions.- 2.6.4 Constructing a concentration curve from individual category segments.- 2.6.5 When proportions do not correspond between distributions.- 2.6.6 Using the concentration curve to introduce the concept of overrepresentation.- 2.6.7 Overrepresentation.- 2.6.8 When we manipulate both distributions: gas (unequal proportions) and cylinder (unequal cell sizes).- 2.6.9 Example application — Winners versus losers in the car sales market.- 2.6.10 Example application — Historic perspective (then vs. now) of the car sales market.- 2.6.11 Reordering (prioritizing) categories and an introduction to the maximal concentration index.- 2.6.12 Part 2 summary.- 2.7 Chapter Summary.
3 Univariate Lilliputian Model I.- 3.1 Introduction.- 3.2 Lilliputian variables and their basic parameters.- 3.2.1 The cdf of a Lilliputian variable.- 3.2.2 The expectation of a Lilliputian variable and the index ar.- 3.2.3 The first moment Lilliputian variable, its variance, and the Gini Index.- 3.2.4 Discontinuity measures.- 3.3 The main equivalence relation which creates the Univariate Lilliputian Model.- 3.3.1 Preliminary definitions and examples.- 3.3.2 Equivalent pairs of random variables.- 3.3.3 Grade transformations of univariate distributions.- 3.4 Grade parameters.- 3.4.1 The parameter ar.- 3.4.2 Normal concentration pattern.- 3.4.3 Likelihood ratio and local concentration.- 3.5 Appendix.- 3.5.1 Monotone grade probability transition function.- 3.5.2 Properties of concentration measures.
4 Univariate Lilliputian Model II.- 4.1 Introduction.- 4.2 Lorenz Curve and Gini Index.- 4.2.1 Ratio variables and related concentration curves.- 4.2.2 First moment distribution and Lorenz curve.- 4.2.3 Lorenz Curves with horizontal and/or vertical segments.- 4.2.4 The variable called overrepresentation and its Lorenz curve.- 4.2.5 Diagram of over- and underrepresentation.- 4.2.6 Lorenz Curve and Gini Index for density transform of categorical variables.- 4.3 Order oriented concentration curves.- 4.3.1 Basic definitions.- 4.3.2 The maximal concentration curve and the maximal concentration index.- 4.3.3 Order oriented Lorenz Curve and inequality (Gini) index.- 4.3.4 Order oriented Lorenz Curve and Gini Index for the density transforms of categorical variables.- 4.3.5 Link with the two-class discriminant analysis.- 4.4 Dual concentration curve.- 4.4.1 Definition of the dual concentration curve and dual Lorenz curve.- 4.4.2 Random variable dual to a ratio variable.- 4.4.3 Dual links between overrepresentation and underrepresentation.- 4.4.4 Towards advantage problems in interpopulation comparisons.- 4.5 Appendix.- 4.5.1 Measurement scales.- 4.5.2 Supplement to Section 4.2 (the inequality measures).- 4.5.3 Supplement to Section 4.3.2 (the maximal concentration measures).- 4.5.4 Supplement to Section 4.3.3 (the ordered Lorenz Curve and Gini Index).- 4.5.5 Supplement to Section 4.4.2 (the random variable dual to a ratio variable).- 4.5.6 Bibliographical remarks to Chapters 3 and 4.
5 Asymmetry and the inverse concentration set.- 5.1 Introduction.- 5.2 Concentration curves with a common value of the concentration index.- 5.3 Links between asymmetry and opposite orderings.- 5.4 Asymmetry in the Univariate Lilliputian Model.- 5.4.1 Asymmetry curves.- 5.4.2 Asymmetry index.- 5.4.3 Families of curves with special properties.- 5.5 Relative asymmetry.- 5.5.1 Links with measurement scales.- 5.5.2 Relative asymmetry measures.- 5.5.3 Examples.- 5.6 Appendix.- 5.6.1 The inverse concentration set.- 5.6.2 Asymmetry indices.- 5.6.3 Bibliographical remarks.
6 Discretization and regularity.- 6.1 Introduction.- 6.2 Discretization framework.- 6.3 Optimal discretization for a given number of categories.- 6.4 Ideally regular concentration curves.- 6.5 On the determination of the number of categories.- 6.6 A parametric family of ideally regular Lilliputian curves.- 6.7 Appendix.- 6.7.1 Optimal discretization.- 6.7.2 Algorithm of optimal discretization.- 6.7.3 Bibliographical remarks.
7 Preliminary concepts of bivariate dependence.- 7.1 Introduction.- 7.2 Contingency tables with m rows and k columns.- 7.3 Quadrant dependence.- 7.4 Matrices of ar’s for pairs of profiles. Total positivity of order two.- 7.5 The regression function.- 7.6 The monotone dependence function and the Gini Index.- 7.7 Appendix — Bibliographical remarks.
8 Dependence Lilliputian Model.- 8.1 Introduction.- 8.2 Grade bivariate distributions and overrepresentation maps for probability tables.- 8.3 Lilliputian surfaces with uniform marginal distributions.- 8.4 Spearman’s rho and Kendall’s tau expressed by volumes and masses in the unit cube.- 8.5 Grade regression functions and related measures.- 8.6 On permuting rows and columns of m × k probability tables.- 8.6.1 Maximal grade correlation.- 8.6.2 Ordered Gini indices for marginal density transforms.- 8.6.3 Maximal Kendall’s tau.- 8.7 The hinged sequences of rows and columns.- 8.8 Appendix: Bibliographical remarks.
9 Grade Correspondence Analysis and outlier detection.- 9.1 Introduction.- 9.2 Algorithms of GCA.- 9.2.1 GCA algorithm based on Spearman’s ρ*.- 9.2.2 GCA algorithm based on Kendall’s τ.- 9.2.3 GCA algorithm based on τsgn.- 9.2.4 GCA and a mixture of permuted discretized binormal tables.- 9.2.5 Folds.- 9.3 Algorithm for Smooth Grade Correspondence Analysis (SGCA).- 9.4 Examples of GCA and SGCA results.- 9.4.1 A mixture of binormals.- 9.4.2 BRIT7×7 and CARS16×16.- 9.5 Detection of rows and columns outlying the main trend.- 9.5.1 Scatterplots for rows and for columns.- 9.5.2 Measures of departure from TP2.- 9.5.3 Rejecting outlying rows and columns.- 9.6 Appendix — Bibliographical remarks.
10 Cluster analysis based on GCA.- 10.1 Introduction.- 10.2 Single and double grade clustering.- 10.3 Optimal grade clustering.- 10.4 Cluster analysis in the detection of mixtures.- 10.4.1 Straight and reverse regular structures.- 10.4.2 Survey of small business servicing firms.- 10.4.3 SGCL results for the whole sample.- 10.4.4 SGCL results for the particular branches.- 10.4.5 Some final remarks.- 10.5 Cluster analysis and the detection of an imprecisely defined trend.- 10.5.1 The use of sources of capital by retail trade firms in Poland.- 10.5.2 Typology of firms for the pooled, three-year data.- 10.5.3 Firm typologies for annual data.- 10.5.4 Relationship between the generated firm typology and the firm profitability.- 10.6 On GCCA application to various data sets.- 10.7 Appendix.- 10.7.1 An algorithm for optimal clustering.- 10.7.2 Bibliographical remarks.
11 Regularity and the number of clusters.- 11.1 Introduction.- 11.2 Generalization of the parabola family from the 𝕌𝕃𝕄.- 11.3 The ideal regularity of two-way data tables.- 11.4 Regularity and cluster detection.- 11.5 Cluster detection in finite data tables.- 11.6 Appendix — Bibliographical remarks.
12 Grade approach to the analysis of finite data matrices.- 12.1 Introduction.- 12.2 Insight Examples.- 12.2.1 The Competitors-Judges Data (C/J Example).- 12.2.2 The Annual Bonus Data (A/B Example).- 12.3 Applicability of GCA.- 12.4 A revisit of the univariate data.- 12.5 Finite multivariate datasets and related inequality measures.- 12.5.1 Finite data tables and their grade regression functions.- 12.5.2 Lorenz Surfaces.- 12.5.3 Global differentiation and its decomposition.- 12.5.4 Decomposition of Difx.- 12.6 Transformations of variables.- 12.7 Detection of outliers and decomposition of a dataset.
13 Inequality measures for multivariate distributions.- 13.1 Introduction.- 13.2 Inequality measures for multivariate distributions with finite sets of records.- 13.3 Inequality measures for multivariate distributions with nonfinite sets of records.- 13.4 Inequality measures for continuous bivariate distributions.- 13.4.1 A pair of independent uniform Lilliputian variables.- 13.4.2 A pair of functionally dependent Lilliputian variables.- 13.4.3 A family of TP2 distributions from 𝔹𝕃𝕄.- 13.4.4 Grade binormal distributions.- 13.5 Inequality measures for grade multinormal distributions.- 13.6 Inequality measures for the Moran distributions.- 13.7 Appendix — link between grade similarity and dissimilarity of two regularly dependent random variables.
14 Case studies with multivariate data.- 14.1 Introduction.- 14.2 Case Study 1 — Main Trend of Questionnaire Data.- 14.2.1 The Questionnaire.- 14.2.2 The goal of the analysis.- 14.2.3 The Overrepresentation Map for Main Trend in dataset TOTAL.- 14.2.4 Interpretation of the results (with some general hints).- 14.3 Case Study 1 — Decomposition of the dataset into regular subpopulations.- 14.3.1 The Overrepresentation Maps for FIT-MT and OUT-MT.- 14.3.2 The grade strip charts for FIT-MT and OUT-MT.- 14.3.3 Two-way ordered clustering.- 14.4 Case Study 2 — Analysis of Engineering Data (Strength of Concrete).- 14.4.1 The variables.- 14.4.2 The goal of the analysis.- 14.4.3 The Overrepresentation Map for Main Trend in the dataset TOTAL.- 14.5 Case Study 2 — Decomposition of concrete mixtures into FIT-MT and OUT-MT.- 14.5.1 The Overrepresentation Maps for FIT-MT and OUT-MT.- 14.5.2 The grade strip charts for FIT-MT and OUT-MT.- 14.6 Final remarks for the two case studies.- 14.7 Appendix.- 14.7.1 Case Study 1 — further details of the analysis.- 14.7.2 Case Study 2 — further details of the analysis.- 14.7.3 Bibliographical remarks.
15 The GradeStat program.- 15.1 Introduction.- 15.2 Main implemented features.- 15.2.1 Data overview.- 15.2.2 Charts.- 15.2.3 Preprocessing.- 15.2.4 Ordering.- 15.2.5 Clustering.
References.
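Concentration (Lorenz) curves and the Gini index are the basic building blocks of the grade approach developed in Chapters 2 to 4 above. As a rough, self-contained illustration of these classical ingredients only (a minimal Python sketch, not the GradeStat program described in Chapter 15; the function names lorenz_curve and gini_index are invented for the example), the fragment below computes an empirical Lorenz curve and the corresponding Gini index for a small sample of nonnegative values:

    import numpy as np

    def lorenz_curve(values):
        # Sort the observations and accumulate their shares:
        # x = cumulative share of units, y = cumulative share of the total value.
        v = np.sort(np.asarray(values, dtype=float))
        x = np.arange(0, len(v) + 1) / len(v)
        y = np.concatenate(([0.0], np.cumsum(v) / v.sum()))
        return x, y

    def gini_index(values):
        # Gini index as twice the area between the diagonal and the Lorenz curve;
        # the area under the curve is obtained with the trapezoidal rule.
        x, y = lorenz_curve(values)
        area_under_curve = np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0
        return 1.0 - 2.0 * area_under_curve

    # Toy data (hypothetical): five incomes with one dominant earner.
    incomes = [1, 2, 2, 3, 10]
    print(round(gini_index(incomes), 3))   # about 0.422 for this sample

In the grade framework, analogous constructions applied to pairs of distributions and to grade-transformed variables underlie the concentration curves, overrepresentation diagrams and order-oriented Gini indices listed in the table of contents above.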