Part I Foundation
1. Introduction
2. File Systems and File Processing
3. Python Native Data Structures
4. Regular Expressions
Part II Data Systems: The Data Models
5. Data Systems Models
6. Tabular Model: Structure and Formats
7. Tabular Model: Access Operations and pandas
8. Tabular Model: Advanced Operations and pandas
9. Tabular Model: Transformations and Constraints
10. Relational Model: Structure and Architecture
11. Relational Operations: Single Table
12. Relational Operations: Multiple Tables
13. Relational Database Programming
14. Relational Model: Design, Constraints, and Creation
15. Hierarchical Model: Structure and Formats
16. Hierarchical Model: Operations and Programming
17. Hierarchical Model: Constraints
Part III Data Systems: The Data Sources
18. Overview of Data Systems Sources
19. Networking and Client-Server
20. The HyperText Transfer Protocol
21. Interlude: Client Data Acquisition
22. Web Scraping
23. RESTful Application Programming Interfaces
24. Authentication and Authorization
Thomas Bressoud is Associate Professor of computer science and data analytics at Denison University, where he has taught since 2002. Dr. Bressoud worked outside academia both before and after completing his MS and PhD degrees at Cornell University in 1996, including seven years at MIT Lincoln Laboratory working on real-time radar systems. After his PhD, Dr. Bressoud worked for the startup Isis Distributed Systems and, through the acquisition frenzy of the 1990s, ended up at Lucent Technologies, where he transferred to its research arm, Bell Laboratories in Murray Hill, NJ. In both teaching and research, Dr. Bressoud focuses on the systems area of computer science, specializing in high-performance data systems, parallel systems, and fault tolerance.
David White is Associate Professor of computer science, data analytics, and mathematics at Denison University. After earning his undergraduate degree at Bowdoin College, he carried out applied data analysis work for the Department of Defense. He went on to earn an MS in computer science and a PhD in mathematics from Wesleyan University in 2014. His research has resulted in over fifteen publications in mathematics, applied statistics, computer science, economics, and data science. In addition to publications on data science pedagogy and a chapter in the book Data Science for Mathematicians, he has applied data science techniques to research on the opioid epidemic, gun violence, and biomedical treatments.
Encompassing a broad range of forms and sources of data, this textbook introduces data systems through a progressive presentation. Introduction to Data Systems covers data acquisition starting with local files, then progresses to data acquired from relational databases, from REST APIs, and through web scraping. It teaches data forms and formats, from tidy data to relationally defined sets of tables to hierarchical structures like XML and JSON, using data models to convey the structure, operations, and constraints of each data form.
The book assumes only a foundation in Python programming of the kind found in introductory computer science classes or short courses on the language, so it does not require prerequisites such as data structures, algorithms, or other courses. This makes the material accessible to a wide variety of students early in their educational careers and equips them with understanding and skills that can be applied in computer science, data science/data analytics, and information technology programs, as well as in internships and research experiences. By drawing together content normally spread across upper-level computer science courses, the book offers a single source providing the essentials for data science practitioners. In our increasingly data-centric world, students from all domains will benefit from the "data-aptitude" built by the material in this book.