Chapter 1. Introduction.- Chapter 2. Data Annotation and Preprocessing.- Chapter 3. Text Representation.- Chapter 4. Text Representation with Pretraining and Fine-tuning.- Chapter 5. Text Classification.- Chapter 6. Text Clustering.- Chapter 7. Topic Model.- Chapter 8. Sentiment Analysis and Opinion Mining.- Chapter 9. Topic Detection and Tracking.- Chapter 10. Information Extraction.- Chapter 11. Automatic Text Summarization.
Chengqing Zong is a Professor at the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), and an adjunct professor in the School of Artificial Intelligence (SAIU) at the University of Chinese Academy of Sciences (UCAS). He authored the book “Statistical Natural Language Processing” (in Chinese, with more than 32K copies sold) and has published more than 200 papers on machine translation, natural language processing and cognitive linguistics. He has served as a chair for numerous prestigious conferences, such as ACL, COLING, AAAI and IJCAI, as an associate editor for journals such as ACM TALLIP and Acta Automatica Sinica, and as an editorial board member for journals including IEEE Intelligent Systems, Journal of Computer Science and Technology, and Machine Translation. He is currently the President of the Asian Federation of Natural Language Processing (AFNLP) and a member of the International Committee on Computational Linguistics (ICCL).
Rui Xia is a Professor at the School of Computer Science and Engineering, Nanjing University of Science and Technology, China. He has published more than 50 papers in high-quality journals and top-tier conferences in the field of natural language processing and text data mining. He serves as area chair and senior program committee member for several top conferences, such as EMNLP, COLING, IJCAI and AAAI. He received the Outstanding Paper Award of ACL 2019 and, in 2020, the Distinguished Young Scholar award from the Natural Science Foundation of Jiangsu Province, China.
Jiajun Zhang is a Professor at NLPR, CASIA, and an adjunct professor in the SAIU of UCAS. He has published more than 80 conference papers and journal articles on natural language processing and text mining, and has received five best paper awards. He has served as area chair or senior program committee member for several top conferences, such as ACL, EMNLP, COLING, AAAI and IJCAI. He is the deputy director of the Machine Translation Technical Committee of the Chinese Information Processing Society of China. He received the Qian Wei-Chang Science and Technology Award of Chinese Information Processing and the CIPS Hanvon Youth Innovation Award. He was supported by the Elite Scientists Sponsorship Program of the China Association for Science and Technology (CAST).
This book discusses various aspects of text data mining. Unlike other books that focus on machine learning or databases, it approaches text data mining from a natural language processing (NLP) perspective.
The book offers a detailed introduction to the fundamental theories and methods of text data mining, ranging from preprocessing (for both Chinese and English texts), text representation and feature selection to text classification and text clustering. It also presents the predominant applications of text data mining, such as topic modeling, sentiment analysis and opinion mining, topic detection and tracking, information extraction, and automatic text summarization. Bringing all the related concepts and algorithms together, it offers a comprehensive, authoritative and coherent overview.
Written by three leading experts, it is valuable both as a textbook and as a reference resource for students, researchers and practitioners interested in text data mining. It can also be used for classes on text data mining or NLP.