ISBN-13: 9780367346010 / English / Hardcover / 2019 / 356 pp.
"This book organizes in one place the mathematics, probability, statistics and machine learning information that is required for a practitioner of cybersecurity analytics, as well as the basics of cybersecurity needed for a practitioner"--
Preface
1 Introduction
2 What is Data Analytics?
  2.1 Data Ingestion
  2.2 Data Processing and Cleaning
  2.3 Visualization and Exploratory Analysis
    2.3.1 Scatterplots
  2.4 Pattern Recognition
    2.4.1 Classification
    2.4.2 Clustering
  2.5 Feature extraction
    2.5.1 Feature Selection
    2.5.2 Random Projections
  2.6 Modeling
    2.6.1 Model Specification
    2.6.2 Model Selection and Fitting
  2.7 Evaluation
  2.8 Strengths and Limitations
    2.8.1 The Curse of Dimensionality
3 Security: Basics and Security Analytics
  3.1 Basics of Security
    3.1.1 Know Thy Enemy – Attackers and Their Motivations
    3.1.2 Security Goals
  3.2 Mechanisms for Ensuring Security Goals
    3.2.1 Confidentiality
    3.2.2 Integrity
    3.2.3 Availability
    3.2.4 Authentication
    3.2.5 Access Control
    3.2.6 Accountability
    3.2.7 Non-repudiation
  3.3 Threats, Attacks and Impacts
    3.3.1 Passwords
    3.3.2 Malware
    3.3.3 Spam, Phishing and its Variants
    3.3.4 Intrusions
    3.3.5 Internet Surfing
    3.3.6 System Maintenance and Firewalls
    3.3.7 Other Vulnerabilities
    3.3.8 Protecting Against Attacks
  3.4 Applications of Data Science to Security Challenges
    3.4.1 Cybersecurity Datasets
    3.4.2 Data Science Applications
    3.4.3 Passwords
    3.4.4 Malware
    3.4.5 Intrusions
    3.4.6 Spam/Phishing
    3.4.7 Credit Card Fraud/Financial Fraud
    3.4.8 Opinion Spam
    3.4.9 Denial of Service
  3.5 Security Analytics and Why Do We Need It
4 Statistics
  4.1 Probability Density Estimation
  4.2 Models
    4.2.1 Poisson
    4.2.2 Uniform
    4.2.3 Normal
  4.3 Parameter Estimation
    4.3.1 The Bias-Variance Trade-Off
  4.4 The Law of Large Numbers and the Central Limit Theorem
  4.5 Confidence Intervals
  4.6 Hypothesis Testing
  4.7 Bayesian Statistics
  4.8 Regression
    4.8.1 Logistic Regression
  4.9 Regularization
  4.10 Principal Components
  4.11 Multidimensional Scaling
  4.12 Procrustes
  4.13 Nonparametric Statistics
  4.14 Time Series
5 Data Mining – Unsupervised Learning
  5.1 Data Collection
  5.2 Types of Data and Operations
    5.2.1 Properties of Datasets
  5.3 Data Exploration and Preprocessing
    5.3.1 Data Exploration
    5.3.2 Data Preprocessing/Wrangling
  5.4 Data Representation
  5.5 Association Rule Mining
    5.5.1 Variations on the Apriori Algorithm
  5.6 Clustering
    5.6.1 Partitional Clustering
    5.6.2 Choosing K
    5.6.3 Variations on K-means Algorithm
    5.6.4 Hierarchical Clustering
    5.6.5 Other Clustering Algorithms
    5.6.6 Measuring the Clustering Quality
    5.6.7 Clustering Miscellany: Clusterability, Robustness, Incremental,
  5.7 Manifold Discovery
    5.7.1 Spectral Embedding
  5.8 Anomaly Detection
    5.8.1 Statistical Methods
    5.8.2 Distance-based Outlier Detection
    5.8.3 kNN based approach
    5.8.4 Density-based Outlier Detection
    5.8.5 Clustering-based Outlier Detection
    5.8.6 One-class learning based Outliers
  5.9 Security Applications and Adaptations
    5.9.1 Data Mining for Intrusion Detection
    5.9.2 Malware Detection
    5.9.3 Stepping-stone Detection
    5.9.4 Malware Clustering
    5.9.5 Directed Anomaly Scoring for Spear Phishing Detection
  5.10 Concluding Remarks and Further Reading
6 Machine Learning – Supervised Learning
  6.1 Fundamentals of Supervised Learning
  6.2 The Bayes Classifier
    6.2.1 Naïve Bayes
  6.3 Nearest Neighbors Classifiers
  6.4 Linear Classifiers
  6.5 Decision Trees and Random Forests
    6.5.1 Random Forest
  6.6 Support Vector Machines
  6.7 Semi-Supervised Classification
  6.8 Neural Networks and Deep Learning
    6.8.1 Perceptron
    6.8.2 Neural Networks
    6.8.3 Deep Networks
  6.9 Topological Data Analysis
  6.10 Ensemble Learning
    6.10.1 Majority
    6.10.2 Adaboost
  6.11 One-class Learning
  6.12 Online Learning
  6.13 Adversarial Machine Learning
    6.13.1 Adversarial Examples
    6.13.2 Adversarial Training
    6.13.3 Adversarial Generation
    6.13.4 Beyond Continuous Data
  6.14 Evaluation of Machine Learning
    6.14.1 Cost-sensitive Evaluation
    6.14.2 New Metrics for Unbalanced Datasets
  6.15 Security Applications and Adaptations
    6.15.1 Intrusion Detection
    6.15.2 Malware Detection
    6.15.3 Spam and Phishing Detection
  6.16 For Further Reading
7 Text Mining
  7.1 Tokenization
  7.2 Preprocessing
  7.3 Bag-Of-Words
  7.4 Vector space model
    7.4.1 Weighting
  7.5 Latent Semantic Indexing
  7.6 Embedding
  7.7 Topic Models: Latent Dirichlet Allocation
  7.8 Sentiment Analysis
8 Natural Language Processing
  8.1 Challenges of NLP
  8.2 Basics of Language Study and NLP Techniques
  8.3 Text Preprocessing
  8.4 Feature Engineering on Text Data
    8.4.1 Morphological, Word and Phrasal Features
    8.4.2 Clausal and Sentence Level Features
    8.4.3 Statistical Features
  8.5 Corpus-based Analysis
  8.6 Advanced NLP Tasks
    8.6.1 Part of Speech Tagging
    8.6.2 Word sense Disambiguation
    8.6.3 Language Modeling
    8.6.4 Topic Modeling
  8.7 Sequence to Sequence Tasks
  8.8 Knowledge Bases and Frameworks
  8.9 Natural Language Generation
  8.10 Issues with Pipelining
  8.11 Security Applications of NLP
    8.11.1 Password Checking
    8.11.2 Email Spam Detection
    8.11.3 Phishing Email Detection
    8.11.4 Malware Detection
    8.11.5 Attack Generation
9 Big Data Techniques and Security
  9.1 Key terms
  9.2 Ingesting the Data
  9.3 Persistent Storage
  9.4 Computing and Analyzing
  9.5 Techniques for Handling Big Data
  9.6 Visualizing
  9.7 Streaming Data
  9.8 Big Data Security
    9.8.1 Implications of Big Data Characteristics on Security and Privacy
    9.8.2 Mechanisms for Big Data Security Goals
A Linear Algebra Basics
  A.1 Vectors
  A.2 Matrices
    A.2.1 Eigenvectors and Eigenvalues
    A.2.2 The Singular Value Decomposition
B Graphs
  B.1 Graph Invariants
  B.2 The Laplacian
C Probability
  C.1 Probability
    C.1.1 Conditional Probability and Bayes’ Rule
    C.1.2 Base Rate Fallacy
    C.1.3 Expected Values and Moments
    C.1.4 Distribution Functions and Densities
  C.2 Models
    C.2.1 Bernoulli and Binomial
    C.2.2 Multinomial
    C.2.3 Uniform
Bibliography
Author Index
Index