The first chapter lays the foundation of the subject for students. As in many other books, this introductory chapter covers the basic concepts. The following concepts will be introduced:
• Data Science
• Importance of data science
• Applications of data science
• Data Driven Decision Making
• Data analysis
Chapter-2: Widely used techniques in data science
This chapter will discuss the concepts required to start working on data analysis. It covers what a student should know before performing any data analysis task, along with some of the tasks that can be performed as part of data analysis. The following concepts will be discussed:
• Supervised vs Unsupervised data
• Data understanding
• Data preparation
• Modeling
• Overfitting
• Random sampling
• Cross Validation
• Feature selection
• Outlier detection
• Rule extraction
Section-2: Data science: The “How”
Chapter-3: Statistical Inference
Every part of data analysis relies on statistics and statistical inference to use data properly and support decision making. This chapter will provide the statistical concepts that students need for the data analysis tasks they perform when making decisions with real-life data. The following topics will be discussed:
• Probability theory
• Transformations and expectations
• Common families of distribution
• Random variables
• Preparation of random samples
• Asymptotic evaluations
• Regression and regression models
Chapter-4: Supervised Learning
In the real world, we come across two types of data: supervised (labeled) and unsupervised (unlabeled). In this chapter, we will discuss the concepts, tools and techniques for processing supervised data, with examples, and for making decisions based on it. The following concepts will be discussed:
• Supervised Learning
• Classification and Regression
• Generalization, Overfitting and Underfitting
• Evaluation models
• Supervised learning algorithms
Chapter-5: Unsupervised Learning
Unsupervised data forms the other half of the data available in real-world applications. Like the previous chapter, this chapter will cover the concepts, tools and techniques related to unsupervised data, with examples. The following contents will be included:
• Challenges of unsupervised learning
• Processing and scaling
• Clustering
• Dimensionality reduction, feature extraction and manifold learning
• Unsupervised learning algorithms
Chapter-6: Natural language processing
In this chapter, we will focus on one particular sort of data that has become extremely common, i.e. text data. We will cover the fundamental principles of natural language processing and look at one of the common applications of NLP, namely sentiment analysis. The following contents will be discussed:
• Why Text Is Important
• Why Text Is Difficult
• Representation
• Sentiment Analysis
• Lexicon-based Approaches for Text Mining
Section-3: Data Science – The “Where”
Chapter-7: Customers Analytics
In this chapter, we will introduce the use of analytics for understanding customers and predicting their behaviour in different situations. This includes understanding loyalty programs, market research and customer lifetime value, predicting churn, and identifying potential defaulters. These are a few examples of what this chapter will contain.
Chapter-8: Operations Analytics
In this chapter, we will help readers understand and appreciate the use of data science for improving business operations. For example, we will discuss how analyzing data can help avoid service outages, or at least predict an outage so that contingency plans can be prepared. Analyzing data can also help identify redundancies that can be removed to significantly reduce operational costs. We will give examples of how various manufacturing and service industries are using real-time sensor data to track the wear and tear of their systems. This helps them improve their mean time to repair by forecasting the breakdown of different components well ahead of time.
Dr. Usman Qamar has over 15 years of experience in data engineering and decision sciences in both academia and industry. He has a Master's in Computer Systems Design from the University of Manchester Institute of Science and Technology (UMIST), UK. His MPhil in Computer Systems was a joint degree between UMIST and the University of Manchester and focused on feature selection in big data. In 2008 he was awarded a PhD from the University of Manchester, UK. His postdoctoral work at the University of Manchester involved various research projects, including hybrid mechanisms for statistical disclosure (feature selection merged with outlier analysis) for the Office for National Statistics (ONS), London, UK, churn prediction for Vodafone UK, and customer profile analysis for shopping with the University of Ghent, Belgium. He is currently Associate Professor of Data Engineering at the National University of Sciences and Technology (NUST), Pakistan. He has authored over 200 peer-reviewed publications, which include 3 books published by Springer & Co. He is on the editorial board of many journals, including Applied Soft Computing, Neural Computing and Applications, Computers in Biology and Medicine, and Array. He has successfully supervised 5 PhD students and over 100 master's students.
Dr. Muhammad Summair Raza has been affiliated with the Virtual University of Pakistan for more than 8 years and has taught a number of subjects to graduate-level students. He has authored several articles in quality journals and is currently working in the fields of data analysis and big data, with a focus on rough sets.
This book comprehensively covers the topic of data science. Data science is an umbrella term that encompasses data analytics, data mining, machine learning, and several other related disciplines. This book synthesizes both fundamental and advanced topics of a research area that has now reached maturity. The chapters of this book are organized into three sections:
The first section is an introduction to data science. Starting from the basic concepts, the book highlights the types of data, their use, their importance, and the issues normally faced in data analytics. This is followed by a discussion of the wide range of applications of data science and of the techniques widely used in it.
The second section is devoted to the tools and techniques of data science. It covers data pre-processing, feature selection, classification and clustering concepts, as well as an introduction to text mining and opinion mining.
Finally, the third section of the book focuses on two programming languages commonly used for data science projects, i.e. Python and R.
Although this book primarily serves as a textbook, it will also appeal to industrial practitioners and researchers due to its focus on applications and references. The book is suitable for both undergraduate and postgraduate students, as well as those carrying out research in data science. It can be used as a textbook for undergraduate students in computer science, engineering and mathematics. It is also accessible to undergraduate students from other areas who have an adequate background. The more advanced chapters can be used by postgraduate researchers seeking a deeper theoretical understanding.