ISBN-13: 9783639121292 / English / Paperback / 2009 / 280 pp.
The advent of low-cost mass-sequencing of genomes presents significant data management difficulties. These will grow worse as it becomes routine to sequence the genomes of individual people and organisms, because existing systems store and search each genome separately. This approach is not feasible for searching and comparing the genomes of millions or billions of individual organisms. This book seeks to solve this problem by describing the DASH sequence alignment and compression algorithms. DASH makes use of the overwhelming similarities amongst genomes of a given species to compress not only the database size but also the index size and search time. The resulting novel approach to database compression, index compression, bioinformatics and information retrieval should be of particular interest to anyone concerned with the storage and efficient searching of large data sets, whether DNA or any other subject matter that offers some degree of redundancy, such as natural language text or web pages.