Genre is a complex but intuitively understood concept. Home pages, FAQs and blogs are examples of genres currently thriving on the web. Automatically identifying web genres would help us find documents that are more relevant to our information needs. The aim of the research described in this book is to develop automatic genre classification algorithms. Several challenges, however, affect the modelling of these algorithms. First, genres on the web are instantiated in web pages, which can be considered documents of a new type, much more unpredictable and individualised than...
