Determining the structural properties of a natural language is essential for applications such as speech recognition and OCR. These properties can be analyzed in two categories: morphological and statistical. Statistical analysis requires a corpus that is a representative sample of the language. The word n-gram frequencies of that corpus can be computed with suitable algorithms, and the probabilities of n-grams missing from the corpus can be estimated with smoothing techniques. In this study, a corpus named TurCo was created in order to apply smoothing techniques to Turkish and compare them....
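As an illustration of the two steps mentioned above (counting word n-grams and estimating unseen ones), the following is a minimal sketch only. It assumes a toy whitespace-tokenized Turkish sentence rather than the TurCo corpus, and it uses add-one (Laplace) smoothing, which is just one possible technique and is not stated by the source to be the one studied. The function names `ngrams` and `laplace_bigram_prob` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def laplace_bigram_prob(w1, w2, bigram_counts, context_counts, vocab_size):
    """Add-one estimate of P(w2 | w1); unseen bigrams get a nonzero value."""
    return (bigram_counts[(w1, w2)] + 1) / (context_counts[w1] + vocab_size)


# Toy data (an assumption for illustration, not taken from TurCo).
tokens = "bir elma bir armut bir elma".split()

bigram_counts = Counter(ngrams(tokens, 2))
# How often each word occurs as the first element of a bigram.
context_counts = Counter(first for first, _ in ngrams(tokens, 2))
vocab_size = len(set(tokens))

seen = laplace_bigram_prob("bir", "elma", bigram_counts, context_counts, vocab_size)
unseen = laplace_bigram_prob("bir", "bir", bigram_counts, context_counts, vocab_size)
print(seen, unseen)
```

In this sketch the bigram "bir bir" never occurs in the toy data, yet it still receives a small nonzero probability, which is the effect smoothing is meant to achieve. Other smoothing techniques redistribute probability mass differently.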