ISBN-13: 9783639216448 / English / Paperback / 2009 / 104 pp.
Biomedical data mining is an ever-growing area of Natural Language Processing. This book provides an introduction to the field together with an experimental study that applies machine learning techniques to it. It describes a novel method for producing training data automatically from a biomedical ontology, an alternative to the traditional approach of labour-intensive manual annotation. More specifically, we address the task of gene name disambiguation. In the biomedical literature the same gene name is often used to refer to several entities, e.g. the gene itself, the RNA sequence, the protein it produces, or some other product. Therefore, when performing information extraction it is not sufficient to identify gene names; it is also necessary to distinguish which biological entity each occurrence refers to. We derive a set of rules from a biomedical ontology and apply them to tag the data. This data is then used to train a maximum entropy classifier, which proves capable of learning new information and improving on the ontology-based knowledge specified a priori. The machine learning techniques described in this book can be applied to text mining in any domain.
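The pipeline outlined above (rules derived from an ontology tag the data, then a maximum entropy classifier is trained on the tagged data) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the book's implementation: the rule table `RULES`, the cue words, the helper names and the tiny corpus are all hypothetical; features are simply bags of context words; and the classifier is a plain conditional maximum entropy model (multinomial logistic regression) trained by batch gradient ascent with L2 regularisation.

```python
import math
from collections import defaultdict

# HYPOTHETICAL stand-ins for rules derived from a biomedical ontology:
# each cue word in a gene name's context maps to the entity it signals.
RULES = {
    "expression": "gene", "promoter": "gene",
    "mrna": "rna", "transcript": "rna",
    "binds": "protein", "phosphorylated": "protein",
}


def rule_label(context):
    """Label a context with the first matching rule cue, or None if no rule fires."""
    for word in context:
        if word in RULES:
            return RULES[word]
    return None


def auto_tag(contexts):
    """Produce training data automatically: keep only contexts some rule can label."""
    data = []
    for context in contexts:
        label = rule_label(context)
        if label is not None:
            data.append((frozenset(context), label))
    return data


class MaxEnt:
    """Conditional maximum entropy model: P(y|x) proportional to exp(sum_f w[y, f])."""

    def __init__(self, labels, lr=1.0, epochs=200, l2=0.01):
        self.labels = sorted(labels)
        self.lr, self.epochs, self.l2 = lr, epochs, l2
        self.w = {}  # (label, feature) -> weight; missing keys count as 0

    def probs(self, feats):
        scores = {y: sum(self.w.get((y, f), 0.0) for f in feats) for y in self.labels}
        top = max(scores.values())  # subtract the max for numerical stability
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def fit(self, data):
        for _ in range(self.epochs):
            # Gradient of the mean log-likelihood: observed minus expected feature counts.
            grad = defaultdict(float)
            for feats, gold in data:
                p = self.probs(feats)
                for f in feats:
                    for y in self.labels:
                        grad[(y, f)] += ((1.0 if y == gold else 0.0) - p[y]) / len(data)
            # Gradient ascent step with L2 regularisation (a Gaussian prior on the weights).
            for key in set(grad) | set(self.w):
                w = self.w.get(key, 0.0)
                self.w[key] = w + self.lr * (grad[key] - self.l2 * w)
        return self

    def predict(self, feats):
        p = self.probs(feats)
        return max(p, key=p.get)


# HYPOTHETICAL toy contexts around ambiguous gene-name mentions.
corpus = [
    ["expression", "upstream", "enhancer"],
    ["promoter", "upstream", "chromosome"],
    ["mrna", "northern", "blot"],
    ["transcript", "spliced", "northern"],
    ["binds", "receptor", "kinase"],
    ["phosphorylated", "kinase", "domain"],
    ["unrelated", "words"],  # no rule fires, so it is left out of training
]
train = auto_tag(corpus)
model = MaxEnt({label for _, label in train}).fit(train)

# No rule cue here, yet the classifier has learned "upstream"/"chromosome" signal a gene.
print(rule_label(["upstream", "chromosome"]))  # None
print(model.predict({"upstream", "chromosome"}))  # gene
```

The last two lines show, in miniature, how a classifier trained on rule-tagged data can go beyond the rules: words that merely co-occur with rule cues in the automatically tagged data become evidence the model can use on contexts where no rule fires.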