ISBN-13: 9781118834817 / English / Hardcover / 2015 / 480 pp.
A hands-on guide to web scraping and text mining for both beginners and experienced users of R
Preface xv
1 Introduction 1
1.1 Case study: World Heritage Sites in Danger 1
1.2 Some remarks on web data quality 7
1.3 Technologies for disseminating, extracting, and storing web data 9
1.3.1 Technologies for disseminating content on the Web 9
1.3.2 Technologies for information extraction from web documents 11
1.3.3 Technologies for data storage 12
1.4 Structure of the book 13
Part One A Primer on Web and Data Technologies 15
2 HTML 17
2.1 Browser presentation and source code 18
2.2 Syntax rules 19
2.2.1 Tags, elements, and attributes 20
2.2.2 Tree structure 21
2.2.3 Comments 22
2.2.4 Reserved and special characters 22
2.2.5 Document type definition 23
2.2.6 Spaces and line breaks 23
2.3 Tags and attributes 24
2.3.1 The anchor tag 24
2.3.2 The metadata tag 25
2.3.3 The external reference tag 26
2.3.4 Emphasizing tags 26
2.3.5 The paragraphs tag 27
2.3.6 Heading tags
2.3.7 Listing content with
2.3.8 The organizational tags
2.3.9 The
2.3.10 The foreign script tag