ISBN-13: 9781119674689 / English / Hardcover / 2021 / 400 pp.
List of Figures
List of Tables
Preface
1 Background of Data Science
1.1 Introduction
1.2 Origin of Data Science
1.3 Who is a Data Scientist?
1.4 Big Data
1.4.1 Characteristics of Big Data
1.4.2 Big Data Architectures
2 Matrix Algebra and Random Vectors
2.1 Introduction
2.2 Some Basics of Matrix Algebra
2.2.1 Vectors
2.2.2 Matrices
2.3 Random Variables and Distribution Functions
2.3.1 The Dirichlet Distribution
2.3.2 Multinomial Distribution
2.3.3 Multivariate Normal Distribution
2.4 Problems
3 Multivariate Analysis
3.1 Introduction
3.2 Multivariate Analysis: Overview
3.3 Mean Vectors
3.4 Variance-Covariance Matrices
3.5 Correlation Matrices
3.6 Linear Combinations of Variables
3.6.1 Linear Combinations of Sample Means
3.6.2 Linear Combinations of Sample Variance and Covariance
3.6.3 Linear Combinations of Sample Correlation
3.7 Problems
4 Time Series Forecasting
4.1 Introduction
4.2 Terminologies
4.3 Components of Time Series
4.3.1 Seasonal
4.3.2 Trend
4.3.3 Cyclical
4.3.4 Random
4.4 Transformations to Achieve Stationarity
4.5 Elimination of Seasonality via Differencing
4.6 Additive and Multiplicative Models
4.7 Measuring Accuracy of Different Time Series Techniques
4.7.1 Mean Absolute Deviation
4.7.2 Mean Absolute Percent Error
4.7.3 Mean Square Error
4.7.4 Root Mean Square Error
4.8 Averaging and Exponential Smoothing Forecasting Methods
4.8.1 Averaging Methods
4.8.1.1 Simple Moving Averages
4.8.1.2 Weighted Moving Averages
4.8.2 Exponential Smoothing Methods
4.8.2.1 Simple Exponential Smoothing
4.8.2.2 Adjusted Exponential Smoothing
4.9 Problems
5 Introduction to R
5.1 Introduction
5.2 Basic Data Types
5.2.1 Numeric Data Type
5.2.2 Integer Data Type
5.2.3 Character
5.2.4 Complex Data Types
5.2.5 Logical Data Types
5.3 Simple Manipulations - Numbers and Vectors
5.3.1 Vectors and Assignment
5.3.2 Vector Arithmetic
5.3.3 Vector Index
5.3.4 Logical Vectors
5.3.5 Missing Values
5.3.6 Index Vectors
5.3.6.1 Indexing with Logicals
5.3.6.2 A Vector of Positive Integral Quantities
5.3.6.3 A Vector of Negative Integral Quantities
5.3.6.4 Named Indexing
5.3.7 Other Types of Objects
5.3.7.1 Matrices
5.3.7.2 List
5.3.7.3 Factor
5.3.7.4 Data Frames
5.3.8 Data Import
5.3.8.1 Excel File
5.3.8.2 CSV File
5.3.8.3 Table File
5.3.8.4 Minitab File
5.3.8.5 SPSS File
5.4 Problems
6 Introduction to Python
6.1 Introduction
6.2 Basic Data Types
6.2.1 Number Data Type
6.2.1.1 Integer
6.2.1.2 Floating-Point Numbers
6.2.1.3 Complex Numbers
6.2.2 Strings
6.2.3 Lists
6.2.4 Tuples
6.2.5 Dictionaries
6.3 Number Type Conversion
6.4 Python Conditions
6.4.1 If Statements
6.4.2 The Else and Elif Clauses
6.4.3 The While Loop
6.4.3.1 The Break Statement
6.4.3.2 The Continue Statement
6.4.4 For Loops
6.4.4.1 Nested Loops
6.5 Python File Handling: Open, Read, and Close
6.6 Python Functions
6.6.1 Calling a Function in Python
6.6.2 Scope and Lifetime of Variables
6.7 Problems
7 Algorithms
7.1 Introduction
7.2 Algorithm - Definition
7.3 How to Write an Algorithm
7.3.1 Algorithm Analysis
7.3.2 Algorithm Complexity
7.3.3 Space Complexity
7.3.4 Time Complexity
7.4 Asymptotic Analysis of an Algorithm
7.4.1 Asymptotic Notations
7.4.1.1 Big O Notation
7.4.1.2 The Omega Notation, Ω
7.4.1.3 The Theta Notation
7.5 Examples of Algorithms
7.6 Flowchart
7.7 Problems
8 Data Preprocessing and Data Validations
8.1 Introduction
8.2 Definition - Data Preprocessing
8.3 Data Cleaning
8.3.1 Handling Missing Data
8.3.2 Types of Missing Data
8.3.2.1 Missing Completely at Random
8.3.2.2 Missing at Random
8.3.2.3 Missing Not at Random
8.3.3 Techniques for Handling the Missing Data
8.3.3.1 Listwise Deletion
8.3.3.2 Pairwise Deletion
8.3.3.3 Mean Substitution
8.3.3.4 Regression Imputation
8.3.3.5 Multiple Imputation
8.3.4 Identifying Outliers and Noisy Data
8.3.4.1 Binning
8.3.4.2 Box and Whisker plot
8.4 Data Transformations
8.4.1 Min-Max Normalization
8.4.2 Z-score Normalization
8.5 Data Reduction
8.6 Data Validations
8.6.1 Methods for Data Validation
8.6.1.1 Simple Statistical Criterion
8.6.1.2 Fourier Series Modeling and SSC
8.6.1.3 Principal Component Analysis and SSC
8.7 Problems
9 Data Visualizations
9.1 Introduction
9.2 Definition - Data Visualization
9.2.1 Scientific Visualization
9.2.2 Information Visualization
9.2.3 Visual Analytics
9.3 Data Visualization Techniques
9.3.1 Time Series Data
9.3.2 Statistical Distributions
9.3.2.1 Stem-and-Leaf Plots
9.3.2.2 Q-Q Plots
9.4 Data Visualization Tools
9.4.1 Tableau
9.4.2 Infogram
9.4.3 Google Charts
9.5 Problems
10 Binomial and Trinomial Trees
10.1 Introduction
10.2 The Binomial Tree Method
10.2.1 One Step Binomial Tree
10.2.2 Using the Tree to Price a European Option
10.2.3 Using the Tree to Price an American Option
10.2.4 Using the Tree to Price Any Path Dependent Option
10.3 Binomial Discrete Model
10.3.1 One-Step Method
10.3.2 Multi-step Method
10.3.2.1 Example: European Call Option
10.4 Trinomial Tree Method
10.4.1 What is the Meaning of Little o and Big O?
10.5 Problems
11 Principal Component Analysis
11.1 Introduction
11.2 Background of Principal Component Analysis
11.3 Motivation
11.3.1 Correlation and Redundancy
11.3.2 Visualization
11.4 The Mathematics of PCA
11.4.1 The Eigenvalues and Eigenvectors
11.5 How PCA Works
11.5.1 Algorithm
11.6 Application
11.7 Problems
12 Discriminant and Cluster Analysis
12.1 Introduction
12.2 Distance
12.3 Discriminant Analysis
12.3.1 Kullback-Leibler Divergence
12.3.2 Chernoff Distance
12.3.3 Application - Seismic Time Series
12.3.4 Application - Financial Time Series
12.4 Cluster Analysis
12.4.1 Partitioning Algorithms
12.4.2 k-Means Algorithm
12.4.3 k-Medoids Algorithm
12.4.4 Application - Seismic Time Series
12.4.5 Application - Financial Time Series
12.5 Problems
13 Multidimensional Scaling
13.1 Introduction
13.2 Motivation
13.3 Number of Dimensions and Goodness of Fit
13.4 Proximity Measures
13.5 Metric Multidimensional Scaling
13.5.1 The Classical Solution
13.6 Nonmetric Multidimensional Scaling
13.6.1 Shepard-Kruskal Algorithm
13.7 Problems
14 Classification and Tree-Based Methods
14.1 Introduction
14.2 An Overview of Classification
14.2.1 The Classification Problem
14.2.2 Logistic Regression Model
14.2.2.1 l1 Regularization
14.2.2.2 l2 Regularization
14.3 Linear Discriminant Analysis
14.3.1 Optimal Classification and Estimation of Gaussian Distribution
14.4 Tree-Based Methods
14.4.1 One Single Decision Tree
14.4.2 Random Forest
14.5 Applications
14.6 Problems
15 Association Rules
15.1 Introduction
15.2 Market Basket Analysis
15.3 Terminologies
15.3.1 Itemset and Support Count
15.3.2 Frequent Itemset
15.3.3 Closed Frequent Itemset
15.3.4 Maximal Frequent Itemset
15.3.5 Association Rule
15.3.6 Rule Evaluation Metrics
15.4 The Apriori Algorithm
15.4.1 An example of the Apriori Algorithm
15.5 Applications
15.5.1 Confidence
15.5.2 Lift
15.5.3 Conviction
15.6 Problems
16 Support Vector Machines
16.1 Introduction
16.2 The Maximal Margin Classifier
16.3 Classification Using a Separating Hyperplane
16.4 Kernel Functions
16.5 Applications
16.6 Problems
17 Neural Networks
17.1 Introduction
17.2 Perceptrons
17.3 Feed Forward Neural Network
17.4 Recurrent Neural Networks
17.5 Long Short-Term Memory
17.5.1 Residual Connections
17.5.2 Loss Functions
17.5.3 Stochastic Gradient Descent
17.5.4 Regularization - Ensemble Learning
17.6 Application
17.6.1 Emergent and Developed Market
17.6.2 The Lehman Brothers Collapse
17.6.3 Methodology
17.6.4 Analyses of Data
17.6.4.1 Results of the Emergent Market Index
17.6.4.2 Results of the Developed Market Index
17.7 Significance of Study
17.8 Problems
18 Fourier Analysis
18.1 Introduction
18.2 Definition
18.3 Discrete Fourier Transform
18.4 The Fast Fourier Transform (FFT) Method
18.5 Dynamic Fourier Analysis
18.5.1 Tapering
18.5.2 Daniell Kernel Estimation
18.6 Applications of the Fourier Transform
18.6.1 Modeling Power Spectrum of Financial Returns Using Fourier Transforms
18.6.2 Image Compression
18.7 Problems
19 Wavelets Analysis
19.1 Introduction
19.1.1 Wavelets Transform
19.2 Discrete Wavelets Transforms
19.2.1 Haar Wavelets
19.2.1.1 Haar Functions
19.2.1.2 Haar Transform Matrix
19.2.2 Daubechies Wavelets
19.3 Applications of the Wavelets Transform
19.3.1 Discriminating Between Mining Explosions and Cluster of Earthquakes
19.3.1.1 Background of Data
19.3.1.2 Results
19.3.2 Finance
19.3.3 Damage Detection in Frame Structures
19.3.4 Image Compression
19.3.5 Seismic Signals
19.4 Problems
20 Stochastic Analysis
20.1 Introduction
20.2 Necessary Definitions from Probability Theory
20.3 Stochastic Processes
20.3.1 The Index Set
20.3.2 The State Space
20.3.3 Stationary and Independent Components
20.3.4 Stationary and Independent Increments
20.3.5 Filtration and Standard Filtration
20.4 Examples of Stochastic Processes
20.4.1 Markov Chains
20.4.1.1 Examples of Markov Processes
20.4.1.2 The Chapman-Kolmogorov Equation
20.4.1.3 Classification of States
20.4.1.4 Limiting Probabilities
20.4.1.5 Branching Processes
20.4.1.6 Time Homogeneous Chains
20.4.2 Martingales
20.4.3 Simple Random Walk
20.4.4 The Brownian Motion (Wiener Process)
20.5 Measurable Functions and Expectations
20.5.1 Radon-Nikodym Theorem and Conditional Expectation
20.6 Problems
21 Fractal Analysis - Lévy, Hurst, DFA, DEA
21.1 Introduction and Definitions
21.2 Lévy Processes
21.2.1 Examples of Lévy Processes
21.2.1.1 The Poisson Process (Jumps)
21.2.1.2 The Compound Poisson Process
21.2.1.3 Inverse Gaussian (IG) Process
21.2.1.4 The Gamma Process
21.2.2 Exponential Lévy Models
21.2.3 Subordination of Lévy Processes
21.2.4 Stable Distributions
21.3 Lévy Flight Models
21.4 Rescaled Range Analysis (Hurst Analysis)
21.5 Detrended Fluctuation Analysis (DFA)
21.6 Diffusion Entropy Analysis (DEA)
21.6.1 Estimation Procedure
21.6.1.1 The Shannon Entropy
21.6.2 The H-alpha Relationship for the Truncated Lévy Flight
21.7 Application - Characterization of Volcanic Time Series
21.7.1 Background of Volcanic Data
21.7.2 Results
21.8 Problems
22 Stochastic Differential Equations
22.1 Introduction
22.2 Stochastic Differential Equations
22.2.1 Solution Methods of SDEs
22.3 Examples
22.3.1 Modeling Asset Prices
22.3.2 Modeling Magnitude of Earthquake Series
22.4 Multidimensional Stochastic Differential Equations
22.4.1 The multidimensional Ornstein-Uhlenbeck Processes
22.4.2 Solution of the Ornstein-Uhlenbeck Process
22.5 Simulation of Stochastic Differential Equations
22.5.1 Euler-Maruyama Scheme for Approximating Stochastic Differential Equations
22.5.2 Euler-Milstein Scheme for Approximating Stochastic Differential Equations
22.6 Problems
23 Ethics: With Great Power Comes Great Responsibility
23.1 Introduction
23.2 Data Science Ethical Principles
23.2.1 Enhance Value in Society
23.2.2 Avoiding Harm
23.2.3 Professional Competence
23.2.4 Increasing Trustworthiness
23.2.5 Maintaining Accountability and Oversight
23.3 Data Science Code of Professional Conduct
23.4 Application
23.4.1 Project Planning
23.4.2 Data Preprocessing
23.4.3 Data Management
23.4.4 Analysis and Development
23.5 Problems
Bibliography
Index
MARIA CRISTINA MARIANI, PHD, is Shigeko K. Chan Distinguished Professor and Chair in the Department of Mathematical Sciences at The University of Texas at El Paso. She currently focuses her research on Stochastic Analysis, Differential Equations and Machine Learning with applications to Big Data and Complex Data sets arising in Public Health, Geophysics, Finance and others. Dr. Mariani is co-author of other Wiley books including Quantitative Finance.

OSEI KOFI TWENEBOAH, PHD, is Assistant Professor of Data Science at Ramapo College of New Jersey. His main research is Stochastic Analysis, Machine Learning and Scientific Computing with applications to Finance, Health Sciences, and Geophysics.

MARIA PIA BECCAR-VARELA, PHD, is Associate Professor of Instruction in the Department of Mathematical Sciences at the University of Texas at El Paso. Her research interests include Differential Equations, Stochastic Differential Equations, Wavelet Analysis and Discriminant Analysis applied to Finance, Health Sciences, and Earthquake Studies.