ISBN-13: 9781119680239 / English / Paperback / 2022 / 336 pp.
Preface
Acknowledgments
1 Introduction
1.1 The Role of Computational Analysis in the Social Sciences
1.2 Why Python and/or R?
1.3 How to Use This Book
1.4 Installing R and Python
1.4.1 Installing R and RStudio
1.4.2 Installing Python and Jupyter Notebook
1.5 Installing Third-Party Packages
2 Getting Started: Fun with Data and Visualizations
2.1 Fun With Tweets
2.2 Fun With Textual Data
2.3 Fun With Visualizing Geographic Information
2.4 Fun With Networks
3 Programming Concepts for Data Analysis
3.1 About Objects and Data Types
3.1.1 Storing Single Values: Integers, Floating-Point Numbers, Booleans
3.1.2 Storing Text
3.1.3 Combining Multiple Values: Lists, Vectors, And Friends
3.1.4 Dictionaries
3.1.5 From One to More Dimensions: Matrices and n-Dimensional Arrays
3.1.6 Making Life Easier: Data Frames
3.2 Simple Control Structures: Loops and Conditions
3.2.1 Loops
3.2.2 Conditional Statements
3.3 Functions and Methods
4 How to Write Code
4.1 Re-using Code: How Not to Re-Invent the Wheel
4.2 Understanding Errors and Getting Help
4.2.1 Error Messages
4.2.2 Debugging Strategies
4.3 Best Practice: Beautiful Code, GitHub, and Notebooks
5 From File to Data Frame and Back
5.1 Why and When Do We Use Data Frames?
5.2 Reading and Saving Data
5.2.1 The Role of Files
5.2.2 Encodings and Dialects
5.2.3 File Handling Beyond Data Frames
5.3 Data from Online Sources
6 Data Wrangling
6.1 Filtering, Selecting, and Renaming
6.2 Calculating Values
6.3 Grouping and Aggregating
6.3.1 Combining Multiple Operations
6.3.2 Adding Summary Values
6.4 Merging Data
6.4.1 Equal Units of Analysis
6.4.2 Inner and Outer Joins
6.4.3 Nested Data
6.5 Reshaping Data: Wide To Long And Long To Wide
6.6 Restructuring Messy Data
7 Exploratory Data Analysis
7.1 Simple Exploratory Data Analysis
7.2 Visualizing Data
7.2.1 Plotting Frequencies and Distributions
7.2.2 Plotting Relationships
7.2.3 Plotting Geospatial Data
7.2.4 Other Possibilities
7.3 Clustering and Dimensionality Reduction
7.3.1 k-means Clustering
7.3.2 Hierarchical Clustering
7.3.3 Principal Component Analysis and Singular Value Decomposition
8 Statistical Modeling and Supervised Machine Learning
8.1 Statistical Modeling and Prediction
8.2 Concepts and Principles
8.3 Classical Machine Learning: From Naïve Bayes to Neural Networks
8.3.1 Naïve Bayes
8.3.2 Logistic Regression
8.3.3 Support Vector Machines
8.3.4 Decision Trees and Random Forests
8.3.5 Neural Networks
8.4 Deep Learning
8.4.1 Convolutional Neural Networks
8.5 Validation and Best Practices
8.5.1 Finding a Balance Between Precision and Recall
8.5.2 Train, Validate, Test
8.5.3 Cross-validation and Grid Search
9 Processing Text
9.1 Text as a String of Characters
9.1.1 Methods for Dealing With Text
9.2 Regular Expressions
9.2.1 Regular Expression Syntax
9.2.2 Example Patterns
9.3 Using Regular Expressions in Python and R
9.3.1 Splitting and Joining Strings, and Extracting Multiple Matches
10 Text as Data
10.1 The Bag of Words and the Term-Document Matrix
10.1.1 Tokenization
10.1.2 The DTM as a Sparse Matrix
10.1.3 The DTM as a "Bag of Words"
10.1.4 The (Unavoidable) Word Cloud
10.2 Weighting and Selecting Documents and Terms
10.2.1 Removing stop words
10.2.2 Removing Punctuation and Noise
10.2.3 Trimming a DTM
10.2.4 Weighting a DTM
10.3 Advanced Representation of Text
10.3.1 n-grams
10.3.2 Collocations
10.3.3 Word Embeddings
10.3.4 Linguistic Preprocessing
10.4 Which Preprocessing to Use?
11 Automatic Analysis of Text
11.1 Deciding on the Right Method
11.2 Obtaining a Review Dataset
11.3 Dictionary Approaches to Text Analysis
11.4 Supervised Text Analysis: Automatic Classification and Sentiment Analysis
11.4.1 Putting Together a Workflow
11.4.2 Finding the Best Classifier
11.4.3 Using the Model
11.4.4 Deep Learning
11.5 Unsupervised Text Analysis: Topic Modeling
11.5.1 Latent Dirichlet Allocation (LDA)
11.5.2 Fitting an LDA Model
11.5.3 Analyzing Topic Model Results
11.5.4 Validating and Inspecting Topic Models
11.5.5 Beyond LDA
12 Scraping Online Data
12.1 Using Web APIs: From Open Resources to Twitter
12.2 Retrieving and Parsing Web Pages
12.2.1 Retrieving and Parsing an HTML Page
12.2.2 Crawling Websites
12.2.3 Dynamic Web Pages
12.3 Authentication, Cookies, and Sessions
12.3.1 Authentication and APIs
12.3.2 Authentication and Webpages
12.4 Ethical, Legal, and Practical Considerations
13 Network Data
13.1 Representing and Visualizing Networks
13.2 Social Network Analysis
13.2.1 Paths and Reachability
13.2.2 Centrality Measures
13.2.3 Clustering and Community Detection
14 Multimedia Data
14.1 Beyond Text Analysis: Images, Audio and Video
14.2 Using Existing Libraries and APIs
14.3 Storing, Representing, and Converting Images
14.4 Image Classification
14.4.1 Basic Classification with Shallow Algorithms
14.4.2 Deep Learning for Image Analysis
14.4.3 Re-using an Open Source CNN
15 Scaling Up and Distributing
15.1 Storing Data in SQL and noSQL Databases
15.1.1 When to Use a Database
15.1.2 Choosing the Right Database
15.1.3 A Brief Example Using SQLite
15.2 Using Cloud Computing
15.3 Publishing Your Source
15.4 Distributing Your Software as Container
16 Where to Go Next
16.1 How Far Have We Come?
16.2 Where To Go Next?
16.3 Open, Transparent, and Ethical Computational Science
Bibliography
Index
Dr. Wouter van Atteveldt is an Associate Professor of Political Communication at Vrije Universiteit, Amsterdam. He is co-founder of the Computational Methods division of the International Communication Association, and Founding Chief Editor of Computational Communication Research. He has published extensively on innovative methods for analyzing political text and contributed to a number of relevant R and Python packages.
Dr. Damian Trilling is an Associate Professor, Department of Communication Science, at the University of Amsterdam, and Associate Editor of Computational Communication Research. His research uses computational methods such as the analysis of digital trace data and large-scale text analysis to study the use and effects of news media. He has developed extensive teaching materials to introduce social scientists to the Python programming language.
Dr. Carlos Arcila Calderón is an Associate Professor, Department of Sociology and Communication, at the University of Salamanca, Chief Editor of the journal Disertaciones, and member of the Editorial Board of Computational Communication Research. He has published extensively on new media and social media studies, and has led the prototype Autocop, a Spark-based environment to run distributed supervised sentiment analysis of Twitter messages.