ISBN-13: 9781119642145 / English / Paperback / 2020 / 432 pp.
Introduction

Chapter 1 What is Machine Learning?
History of Machine Learning; Alan Turing; Arthur Samuel; Tom M. Mitchell; Summary Definition; Algorithm Types for Machine Learning; Supervised Learning; Unsupervised Learning; The Human Touch; Uses for Machine Learning; Software; Stock Trading; Robotics; Medicine and Healthcare; Advertising; Retail and E-commerce; Gaming Analytics; The Internet of Things; Languages for Machine Learning; Python; R; Matlab; Scala; Ruby; Software Used in This Book; Checking the Java Version; Weka Toolkit; DeepLearning4J; Kafka; Spark and Hadoop; Text Editors and IDEs; Data Repositories; UC Irvine Machine Learning Repository; Kaggle; Summary

Chapter 2 Planning for Machine Learning
The Machine Learning Cycle; It All Starts with a Question; I Don't Have Data!; Starting Local; Transfer Learning; Competitions; One Solution Fits All?; Defining the Process; Planning; Developing; Testing; Reporting; Refining; Production; Avoiding Bias; Building a Data Team; Mathematics and Statistics; Programming; Graphic Design; Domain Knowledge; Data Processing; Using Your Computer; A Cluster of Machines; Cloud-Based Services; Data Storage; Physical Discs; Cloud-Based Storage; Data Privacy; Cultural Norms; Generational Expectations; The Anonymity of User Data; Don't Cross the "Creepy Line"; Data Quality and Cleaning; Presence Checks; Type Checks; Length Checks; Range Checks; Format Checks; The Britney Dilemma; What's in a Country Name?; Dates and Times; Final Thoughts on Data Cleaning; Thinking About Input Data; Raw Text; Comma-Separated Variables; JSON; YAML; XML; Spreadsheets; Databases; Thinking About Output Data; Don't Be Afraid to Experiment; Summary

Chapter 3 Data Acquisition Techniques
Scraping Data; Copy and Paste; Google Sheets; Using an API; Acquiring Weather Data; Migrating Data; Installing Embulk; Using the Quick Run; Installing Plugins; Migrating Files to Database; Bulk Converting CSV to JSON; Summary

Chapter 4 Statistics, Linear Regression, and Randomness
Working with a Basic Dataset; Loading and Converting the Dataset; Introducing Basic Statistics; Minimum and Maximum Values; Sum; Mean; Arithmetic Mean; Harmonic Mean; Geometric Mean; The Relationship Between the Three Averages; Mode; Median; Range; Interquartile Ranges; Variance; Standard Deviation; Using Simple Linear Regression; Using Your Spreadsheet; Writing a Program; Embracing Randomness; Finding Pi with Random Numbers; Using Monte Carlo Pi in Clojure; Summary

Chapter 5 Working with Decision Trees
The Basics of Decision Trees; Uses for Decision Trees; Advantages of Decision Trees; Limitations of Decision Trees; Different Algorithm Types; How Decision Trees Work; Decision Trees in Weka; The Requirement; Training Data; Using Weka to Create a Decision Tree; Creating Java Code from the Classification; Testing the Classifier Code; Thinking About Future Iterations; Summary

Chapter 6 Clustering
What is Clustering?; Where is Clustering Used?; The Internet; Business and Retail; Law Enforcement; Computing; Clustering Models; How the K-Means Works; Calculating the Number of Clusters in a Dataset; K-Means Clustering with Weka; Preparing the Data; The Workbench Method; The Command-Line Method; Converting CSV File to ARFF; The Coded Method; Summary

Chapter 7 Association Rules Learning
Where is Association Rules Learning Used?; Web Usage Mining; Beer and Diapers; How Association Rules Learning Works; Support; Confidence; Lift; Conviction; Defining the Process; Algorithms; Apriori; FP-Growth; Mining the Baskets--A Walk-Through; The Raw Basket Data; Using the Weka Application; Inspecting the Results; Summary

Chapter 8 Support Vector Machines
What is a Support Vector Machine?; Where are Support Vector Machines Used?; The Basic Classification Principles; Binary and Multiclass Classification; Linear Classifiers; Confidence; Maximizing and Minimizing to Find the Line; How Support Vector Machines Approach Classification; Using Linear Classification; Using Non-Linear Classification; Using Support Vector Machines in Weka; Installing LibSVM; A Classification Walk-Through; Implementing LibSVM with Java; Summary

Chapter 9 Artificial Neural Networks
What is a Neural Network?; Artificial Neural Network Uses; High-Frequency Trading; Credit Applications; Data Center Management; Robotics; Medical Monitoring; Trusting the Black Box; Breaking Down the Artificial Neural Network; Perceptrons; Activation Functions; Multilayer Perceptrons; Back Propagation; Data Preparation for Artificial Neural Networks; Artificial Neural Networks with Weka; Generating a Dataset; Loading the Data into Weka; Configuring the Multilayer Perceptron; Training the Network; Altering the Network; Increasing the Test Data Size; Implementing a Neural Network in Java; Creating the Project; Writing the Code; Converting from CSV to Arff; Running the Neural Network; Developing Neural Networks with DeepLearning4J; Modifying the Data; Viewing Maven Dependencies; Handling the Training Data; Normalizing Data; Building the Model; Evaluating the Model; Saving the Model; Building and Executing the Program; Summary

Chapter 10 Machine Learning with Text Documents
Preparing Text for Analysis; Apache Tika; Cleaning the Text Data; Stopwords; Stemming; N-grams; TF/IDF; Loading the Documents; Calculating the Term Frequency; Calculating the Inverse Document Frequency; Computing the TF/IDF Score; Reviewing the Final Code Listing; Word2Vec; Loading the Raw Text Data; Tokenizing the Strings; Creating the Model; Evaluating the Model; Reviewing the Final Code; Basic Sentiment Analysis; Loading Positive and Negative Words; Loading Sentences; Calculating the Sentiment Score; Reviewing the Final Code; Performing a Test Run; Further Development; Summary

Chapter 11 Machine Learning with Images
What is an Image?; Introducing Color Depth; Images in Machine Learning; Basic Classification with Neural Networks; Basic Settings; Loading the MNIST Images; Model Configuration; Model Training; Model Evaluation; Convolutional Neural Networks; How CNNs Work; CNN Demonstration; Downloading the Image Data; Basic Setup; Handling the Training and Test Data; Image Preparation; CNN Model Configuration; Model Training; Model Evaluation; Saving the Model; Transfer Learning; Summary

Chapter 12 Machine Learning Streaming with Kafka
What You Will Learn in This Chapter; From Machine Learning to Machine Learning Engineer; From Batch Processing to Streaming Data Processing; What is Kafka?; How Does It Work?; Fault Tolerance; Further Reading; Installing Kafka; Kafka as a Single-Node Cluster; Kafka as a Multinode Cluster; Topics Management; Creating Topics; Finding Out Information About Existing Topics; Deleting Topics; Sending Messages from the Command Line; Receiving Messages from the Command Line; Kafka Tool UI; Writing Your Own Producers and Consumers; Producers in Java; Consumers in Java; Building and Running the Applications; The Streaming API; Building a Streaming Machine Learning System; Planning the System; Continuous Training; Determining Which Models to Use for Predictions; Determining Which Algorithms to Use; Simple Linear Regression; Neural Network; Kafka Topics; Creating the Topics; Kafka Connect; Why Persist the Event Data?; The REST API Microservice; Processing Commands and Events; Finding Kafka Brokers; A Command or an Event?; Making Predictions; Prediction Streaming API; Prediction Functions; Predicting Linear Regression; Predicting the Neural Network Model; Running the Project; Run MySQL; Run Zookeeper; Run Kafka; Create the Topics; Run Kafka Connect; Model Builds; Run Events Streaming Application; Run Prediction Streaming Application; Start the API; Send JSON Training Data; Train a Model; Make a Prediction; Summary

Chapter 13 Apache Spark
Spark: A Hadoop Replacement?; Java, Scala, or Python?; Downloading and Installing Spark; A Quick Intro to Spark; Starting the Shell; Data Sources; Testing Spark; Spark Monitor; Comparing Hadoop MapReduce to Spark; Writing Stand-Alone Programs with Spark; Spark Programs in Java; Spark Program Summary; Spark SQL; Basic Concepts; Wrapping Up SparkSQL; Spark Streaming; Basic Concepts; Creating Your First Spark Stream; Spark Streams from Kafka; MLib: The Machine Learning Library; Dependencies; Decision Trees; Clustering; Association Rules with FP-Growth; Summary

Chapter 14 Machine Learning with R
Installing R; macOS; Windows; Linux; Your First Run; Installing R-Studio; The R Basics; Variables and Vectors; Matrices; Lists; Data Frames; Installing Packages; Loading in Data; Plotting Data; Simple Statistics; Simple Linear Regression; Creating the Data; The Initial Graph; Regression with the Linear Model; Making a Prediction; Basic Sentiment Analysis; Using Functions to Load in Word Lists; Writing a Function to Score Sentiment; Testing the Function; Apriori Association Rules; Installing the arules Package; Gathering the Training Data; Importing the Transaction Data; Running the Apriori Algorithm; Inspecting the Results; Accessing R from Java; Installing the rJava Package; Creating Your First Java Code in R; Calling R from Java Programs; Setting Up an Eclipse Project; Creating the Java/R Class; Running the Example; Extending Your R Implementations; Connecting to Social Media with R; Summary

Appendix A Kafka Quick Start
Installing Kafka; Starting Zookeeper; Starting Kafka; Creating Topics; Listing Topics; Describing a Topic; Deleting Topics; Running a Console Producer; Running a Console Consumer

Appendix B The Twitter API Developer Application Configuration

Appendix C Useful Unix Commands
Using Sample Data; Showing the Contents: cat, more, and less; Example Command; Expected Output; Filtering Content: grep; Example Command for Finding Text; Example Output; Sorting Data: sort; Example Command for Basic Sorting; Example Output; Finding Unique Occurrences: uniq; Showing the Top of a File: head; Counting Words: wc; Locating Anything: find; Combining Commands and Redirecting Output; Picking a Text Editor; Colon Frenzy: Vi and Vim; Nano; Emacs

Appendix D Further Reading
Machine Learning; Statistics; Big Data and Data Science; Visualization; Making Decisions; Datasets; Blogs; Useful Websites; The Tools of the Trade

Index
JASON BELL has worked in software development for over thirty years. He now focuses on large-volume data solutions and on helping retail and finance customers gain insight from that data with machine learning. He is also an active committee member for several international technology conferences.