Chapter Goal: Introduce the readers to the manifestations of falsehood in Big Data and its ramifications.
No of pages: 30
Sub-Topics:
1. The Big Data Phenomenon
2. The Four V’s
3. Veracity – the fourth ‘V’
4. Tracing truth in human endeavors
5. Veracity in the context of the Web
Chapter 2: Mathematical Abstraction
Chapter Goal: Present the math behind the method and develop a mathematical framework within which the problem and its solution can be discussed.
No of pages: 30
Sub-Topics:
1. A fruit vendor example
2. Building the abstraction
3. Twitter Example – Sentiment Analysis
4. Solution Space
Chapter 3: Tools and Techniques
Chapter Goal: Introduce the Machine Learning and mathematical tools to solve the problem.
No of pages: 30
Sub-Topics:
1. Machine Learning Algorithms – a quick primer
2. Kalman Filter
3. Statistical Techniques
Chapter 4: Veracity of Web Information
Chapter Goal: Use the concepts, tools, and techniques described in Chapter 3 to examine the truthfulness of microblogs.
No of pages: 50
Sub-Topics:
1. Machine Learning the truthfulness of Twitter data
2. Statistical approaches to detect veiled attacks
3. Applying the Kalman Filter to analyze sentiment fluctuations
Chapter 5: Future Directions
Chapter Goal: Explore ideas that the readers can consider for further delving into the topic, given that this is a niche area.
1. Natural Language Processing methods
2. Knowledge Representation Techniques
3. Ensemble Methods
Vishnu Pendyala is a Senior Member of IEEE and of the Computer Society of India (CSI), with over two decades of software experience with industry leaders such as Cisco, Synopsys, Informix (now IBM), and Electronics Corporation of India Limited. He is on the executive council of CSI, is a member of the Special Interest Group on Big Data Analytics, and is the founding editor of its flagship publication, Visleshana. He recently taught a short-term course on “Big Data Analytics for Humanitarian Causes,” which was sponsored by the Ministry of Human Resources, Government of India under the GIAN scheme, and he has delivered multiple keynote presentations at IEEE-sponsored international conferences. Vishnu has been living and working in Silicon Valley for over two decades.
Examine the problem of maintaining the quality of big data and discover novel solutions. You will learn the four V’s of big data, including veracity, and study the problem from various angles. The solutions discussed are drawn from diverse areas of engineering and math, including machine learning, statistics, formal methods, and Blockchain technology.
Veracity of Big Data serves as an introduction to machine learning algorithms and diverse techniques such as the Kalman filter, SPRT, CUSUM, fuzzy logic, and Blockchain, showing how they can be used to solve problems in the veracity domain. Using examples, the math behind the techniques is explained in easy-to-understand language.
Determining the truth of big data in real-world applications involves using various tools to analyze the available information. This book delves into some of the techniques that can be used. Microblogging websites such as Twitter have played a major role in public life, including during presidential elections. The book uses examples of microblogs posted on a particular topic to demonstrate how veracity can be examined and established. Some of the techniques are described in the context of detecting veiled attacks on microblogging websites to influence public opinion.
What You'll Learn:
Understand the problem concerning data veracity and its ramifications
Develop the mathematical foundation needed to help minimize the impact of the problem using easy-to-understand language and examples
Use diverse tools and techniques such as machine learning algorithms, Blockchain, and the Kalman filter to address veracity issues