Preface ixAcknowledgments xiiiBasic Notations xv1 Guiding Ideas 11.1 The Motivating Question 11.2 Further Questions About Texts 51.3 Zipf's and Herdan's Laws 81.4 Markov and Finite-State Processes 141.5 More General Stochastic Processes 201.6 Two Interpretations of Probability 231.7 Insights from Information Theory 251.8 Estimation of Entropy Rate 281.9 Entropy of Natural Language 301.10 Algorithmic Information Theory 351.11 Descriptions of a Random World 371.12 Facts and Words Related 431.13 Repetitions and Entropies 471.14 Decay of Correlations 521.15 Recapitulation 542 Probabilistic Preliminaries 572.1 Probability Measures 592.2 Product Measurable Spaces 632.3 Discrete Random Variables 652.4 From IID to Finite-State Processes 68Problems 733 Probabilistic Toolbox 773.1 Borel sigma-Fields and a Fair Coin 793.2 Integral and Expectation 833.3 Inequalities and Corollaries 873.4 Semidistributions 923.5 Conditional Probability 943.6 Modes of Convergence 1013.7 Complete Spaces 103Problems 1064 Ergodic Properties 1094.1 Plain Relative Frequency 1114.2 Birkhoff Ergodic Theorem 1164.3 Ergodic and Mixing Criteria 1194.4 Ergodic Decomposition 125Problems 1285 Entropy and Information 1315.1 Shannon Measures for Partitions 1335.2 Block Entropy and Its Limits 1395.3 Shannon Measures for Fields 1455.4 Block Entropy Limits Revisited 1555.5 Convergence of Entropy 1595.6 Entropy as Self-Information 160Problems 1636 Equipartition and Universality 1676.1 SMB Theorem 1696.2 Universal Semidistributions 1716.3 PPM Probability 1726.4 SMB Theorem Revisited 1786.5 PPM-based Statistics 180Problems 1867 Coding and Computation 1897.1 Elements of Coding 1917.2 Kolmogorov Complexity 1977.3 Algorithmic Coding Theorems 2077.4 Limits of Mathematics 2157.5 Algorithmic Randomness 220Problems 2258 Power Laws for Information 2298.1 Hilberg Exponents 2318.2 Second Order SMB Theorem 2388.3 Probabilistic and Algorithmic Facts 2418.4 Theorems About Facts and Words 248Problems 2559 Power Laws for Repetitions 2599.1 Rényi-Arimoto Entropies 2619.2 Generalized Entropy Rates 2669.3 Recurrence Times 2689.4 Subword Complexity 2729.5 Two Maximal Lengths 2809.6 Logarithmic Power Laws 284Problems 28910 AMS Processes 29110.1 AMS and Pseudo AMS Measures 29310.2 Quasiperiodic Coding 29510.3 Synchronizable Coding 29810.4 Entropy Rate in the AMS Case 301Problems 30411 Toy Examples 30711.1 Finite and Ultrafinite Energy 30911.2 Santa Fe Processes and Alike 31511.3 Encoding into a Finite Alphabet 32311.4 Random Hierarchical Association 33411.5 Toward Better Models 345Problems 348Future Research 349Bibliography 351Index 365
Aukasz D'bowski, PhD, works at the Institute of Computer Science of the Polish Academy of Sciences in Poland. His doctorate is in mathematics and computer science and his primary research focus is in the areas of information theory and discrete stochastic processes. He is also interested in the theoretical properties of statistical and neural language models