ISBN-13: 9783656659815 / English / Paperback / 2014 / 44 pp.
Bachelor Thesis from the year 2014 in the subject Computer Science - Bioinformatics, grade: 8.26, Lovely Professional University, Punjab, course: B.Tech Honors Biotechnology, language: English, abstract: As the number of sequenced genomes is increasing at a high rate, there is a need for a gene prediction method that is quick, reliable, and inexpensive. Under these conditions, a computational tool can serve as an alternative to wet-lab methods. The confidence level of the tool's annotation can be enhanced by preparing exhaustive training data sets. The aim is to develop a tool that reads a DNA sequence file in FASTA format and annotates it. For this purpose, the Saccharomyces Genome Database (SGD) was used to retrieve the input data, and the tool was developed in Perl. To increase the confidence level of the annotation, the data were validated against multiple sources. A Perl script was written to find promoter regions, repeats, transcription factor binding sites, base periodicity, and nucleotide frequency. The program was also run to identify repeats, poly(A) signals, CpG islands, and autonomously replicating sequences (ARS). The tool annotates the DNA by predicting gene structure based on the consensus sequences of important regulatory elements. The confidence level of the annotation of predicted genes, non-coding regions, ARS, repeats, etc. was checked by running a test dataset, which consisted of annotated data as reported by the genome database and by computational tools. Gene prediction on the regions reported as non-coding by SGD was performed with existing tools; the regions that these tools also identified as non-coding were then analyzed for the presence of repeats. BLAST was used to annotate on the basis of sequence similarity to already annotated genes. GeneMark.hmm and FGENESH were used for gene prediction. In order to validate the predicted results, annotations of the genome of Saccharomyces cerevisiae from the SGD database, and output of different comput
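
The kind of sequence scan the abstract describes (reading a FASTA file, computing nucleotide frequency, locating a poly(A) signal, and scoring CpG content) can be illustrated with a minimal sketch. It is written in Python rather than the Perl used in the thesis, and it does not reproduce the author's scripts, which are not shown here. All function names are hypothetical. The motif `AATAAA` is the canonical poly(A) signal consensus and is an assumption, since the thesis may use different consensus sequences. The CpG score is the commonly used observed/expected ratio, again an assumption about how CpG islands might be assessed.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading '>'
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


def nucleotide_frequency(seq):
    """Fraction of A, C, G and T in the sequence (empty dict if seq is empty)."""
    if not seq:
        return {}
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


def find_motif(seq, motif):
    """0-based start positions of every occurrence, overlapping ones included."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


# Hypothetical two-line record, for illustration only (not real genome data).
demo = ">demo_record hypothetical\nACGCGAATAAAT\nCGCG\n"
seq = read_fasta(demo)["demo_record hypothetical"]
print(nucleotide_frequency(seq))
print(find_motif(seq, "AATAAA"))
print(cpg_obs_exp(seq))
```

A real annotation tool would apply such scans in sliding windows along each chromosome and combine the hits with gene predictions; this sketch only shows the per-sequence building blocks.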