ISBN-13: 9781505725933 / English / Paperback / 2014 / 114 pages
Controlling outbreaks of epidemic diseases such as influenza has long been a concern for the United States. Traditional surveillance systems, such as ILINet (the U.S. Outpatient Influenza-like Illness Surveillance Network) and Virologic (laboratory) surveillance, provide the Centers for Disease Control and Prevention (CDC) with influenza statistics only after a lag of 1 to 2 weeks, so the CDC needs a tool that can forecast the level of influenza activity. Meanwhile, the rising popularity of social media websites such as Flickr, Twitter, and Facebook has turned the web into an interactive sharing platform, and the large volume of unstructured data these sites generate has become an invaluable source for detecting patterns and novelties. This book explores the correlation between Twitter messages (tweets) and the CDC's influenza-like illness (ILI) and Virologic surveillance data. Using 17 months of tweets, it develops regression models to predict influenza-related statistics. The proposed approach aggregates the weekly frequencies of hand-chosen words indicative of influenza infection, with each word entered as a separate predictor variable. Predictions from the best models achieve a Pearson's correlation coefficient of 0.900 (95% CI: 0.732, 0.965) against the CDC ILI surveillance data and 0.833 (95% CI: 0.574, 0.940) against the CDC Virologic surveillance data.
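The approach summarized above (weekly keyword frequencies as separate predictors, a regression fit, and Pearson's r with a 95% confidence interval) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the book's implementation: the keyword list, the regex tokenizer, grouping by ISO week, ordinary least squares as the regression model, and the Fisher z-transformation for the confidence interval are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import math
import re
from collections import Counter
from datetime import date
from typing import Dict, Iterable, List, Sequence, Tuple

# Hypothetical keyword list: the book's actual hand-chosen words are not given here.
KEYWORDS = ["flu", "fever", "cough"]

Week = Tuple[int, int]  # (ISO year, ISO week number)


def weekly_keyword_counts(
    tweets: Iterable[Tuple[date, str]], keywords: Sequence[str] = KEYWORDS
) -> Dict[Week, List[int]]:
    """Count each keyword per ISO week, keeping one column per keyword
    so that every word becomes a separate predictor variable."""
    counts: Dict[Week, List[int]] = {}
    for day, text in tweets:
        iso = day.isocalendar()
        week = (iso[0], iso[1])
        # Assumed tokenizer: lowercase alphabetic runs (apostrophes kept).
        tokens = Counter(re.findall(r"[a-z']+", text.lower()))
        row = counts.setdefault(week, [0] * len(keywords))
        for i, kw in enumerate(keywords):
            row[i] += tokens[kw]
    return counts


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved from the normal
    equations (X^T X) b = X^T y by Gauss-Jordan elimination.
    Assumes X^T X is non-singular (no perfectly collinear keywords)."""
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        # Partial pivoting: move the row with the largest entry in this column up.
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        # Eliminate this column from every other row.
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                for c in range(col, p + 1):
                    A[k][c] -= factor * A[col][c]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Apply fitted coefficients (intercept first) to one week's keyword counts."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def fisher_ci(r: float, n: int, z_crit: float = 1.959964) -> Tuple[float, float]:
    """Approximate 95% CI for Pearson's r via the Fisher z-transformation.
    Requires n > 3 and |r| < 1."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

In use, one would run `weekly_keyword_counts` over the tweet stream, align each week's row with the corresponding weekly CDC value, fit `fit_ols` on a training period, and compare `predict` outputs with held-out weeks using `pearson_r` and `fisher_ci`. For instance, `fisher_ci(0.9, 16)` returns approximately (0.730, 0.965); n = 16 is purely hypothetical here, since the number of evaluation weeks and the interval method are not stated above.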