ISBN-13: 9786202205443 / English / Paperback / 2017 / 60 pp.
Statistical n-gram language models are widely used for their state-of-the-art performance in continuous speech recognition systems. In a domain-specific scenario, speakers use widely varying word sequences to express the same context, yet holding all possible sequences in a training corpus for estimating n-gram probabilities is practically infeasible. Capturing long-distance dependencies in a sequence is an important capability of a language model, since it allows a non-zero probability to be assigned to a sparse sequence during recognition. A simple back-off n-gram model has difficulty estimating probabilities for sparse data as the n-gram order increases. Deducing knowledge from training patterns can also help a language model generalize to an unknown sequence or word through its linguistic properties, such as being a noun, being singular or plural, or its position in a sentence. Even for weak generalization, an n-gram model needs a very large training corpus. A simple recurrent neural network based language model is proposed here to efficiently overcome these difficulties for domain-based corpora.
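To illustrate the idea behind the proposed approach, the sketch below shows how an Elman-style simple recurrent network produces next-word probabilities. This is a minimal, untrained illustration only: the toy vocabulary, dimensions, and random weights are assumptions, not the book's actual model or data, and training by backpropagation through time is omitted. It shows why such a model can assign non-zero probabilities to unseen sequences and is not limited to a fixed n-gram window.

```python
# Minimal sketch of a simple recurrent neural network language model
# (illustrative assumptions: toy vocabulary, hidden size 8, random weights).
import numpy as np

rng = np.random.default_rng(0)

vocab = ["<s>", "book", "a", "flight", "ticket", "</s>"]   # toy domain vocabulary
word2id = {w: i for i, w in enumerate(vocab)}
V, H = len(vocab), 8                                        # vocabulary size, hidden size

# Randomly initialised parameters; a real model would learn these by
# backpropagation through time over a domain corpus.
W_xh = rng.normal(scale=0.1, size=(H, V))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(H, H))   # hidden -> hidden (recurrence)
W_hy = rng.normal(scale=0.1, size=(V, H))   # hidden -> output

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def next_word_distribution(history):
    """Run the recurrent network over a word history and return P(w | history).

    The hidden state summarises the entire history, so the context is not
    truncated to the last n-1 words, and the softmax assigns a non-zero
    probability to every vocabulary word even for a context never seen
    in training.
    """
    h = np.zeros(H)
    for w in history:
        x = np.zeros(V)
        x[word2id[w]] = 1.0                 # one-hot encoding of the current word
        h = np.tanh(W_xh @ x + W_hh @ h)    # recurrent state update
    return softmax(W_hy @ h)

p = next_word_distribution(["<s>", "book", "a"])
for w, prob in zip(vocab, p):
    print(f"P({w} | <s> book a) = {prob:.3f}")
```

In contrast, a back-off n-gram model with a window of n words must fall back to lower-order statistics whenever the exact context is absent from the training data, which is the sparsity problem the description refers to.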