ISBN-13: 9783639122633 / English / Paperback / 2009 / 228 pp.
Little is known about the content of the major search engines. We present an automatic ontology learning method that trains an ontology with world knowledge of hundreds of different subjects, organized in a three-level taxonomy that covers all the documents in our training set. We then mine this ontology for important classification rules and use these rules to perform an extensive analysis of the content of the largest general-purpose Internet search engines in use today. In addition, instead of representing documents and collections as sets of terms, we represent them as sets of subjects, which gives a more robust representation of information and reduces the effect of synonymy.
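The difference between a term-based and a subject-based representation can be illustrated with a minimal Python sketch. It is not the method described in the book: the `TERM_TO_SUBJECT` table, the subject names, and both function names are hypothetical stand-ins for what a learned ontology would supply. It only shows why two documents that use different synonyms can end up with the same subject representation.

```python
from collections import Counter

# Hypothetical term-to-subject lookup (an assumption for illustration only).
# In the approach described above, this mapping would come from the learned ontology.
TERM_TO_SUBJECT = {
    "car": "Automobiles",
    "automobile": "Automobiles",
    "doctor": "Medicine",
    "physician": "Medicine",
}


def term_representation(text):
    """Represent a document as a bag of its lowercase terms."""
    return Counter(text.lower().split())


def subject_representation(text):
    """Represent a document as a bag of subjects; unknown terms are skipped."""
    subjects = Counter()
    for term in text.lower().split():
        subject = TERM_TO_SUBJECT.get(term)
        if subject is not None:
            subjects[subject] += 1
    return subjects


doc_a = "car doctor"
doc_b = "automobile physician"

# The two documents share no terms, so their term representations differ.
terms_differ = term_representation(doc_a) != term_representation(doc_b)

# Synonyms map to the same subject, so the subject representations coincide.
subjects_match = subject_representation(doc_a) == subject_representation(doc_b)
```

In this toy case the two documents have no terms in common, yet both reduce to one occurrence each of the hypothetical subjects `Automobiles` and `Medicine`.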