Languages carry information. To fulfil this purpose, they employ a multitude of coding strategies. This book explores a core property of linguistic coding: lexical diversity. Parallel text corpora comprising more than 1800 texts written in more than 1200 languages form the basis for computational analyses. Different measures of lexical diversity are discussed and tested, and Shannon’s measure of uncertainty, the entropy, is chosen to assess differences in the distributions of words. To further explain this variation, a range of descriptive, explanatory, and grouping factors...