ISBN-13: 9783642883903 / English / Paperback / 2012 / 459 pp.
In trying to give an account of the statistical properties of language, one is faced with the problem of having to find the common thread which would show the many and multifarious forms of language statistics - embodied in scattered papers written by linguists, philosophers, mathematicians, engineers, each using his own professional idiom - as belonging to one great whole: quantitative linguistics. This means that the investigator has to find the system of this branch of science which would enable him to arrange the vast material in an orderly fashion, and present it as an organic whole. Such a system is conceived in this book as comprising the following disciplines as the four main branches of literary statistics: Statistical Linguistics, Stylostatistics, Optimal Systems of Language Structure, and Linguistic Duality (Parts I-IV). The Introduction is meant to define the position of the book with regard to both linguistics and statistics.
1. Introduction.- 1.1. Historical sketch.- 1.2. Language as a mass phenomenon — ‘Quantity Survey’ of language.- 1.3. Chance as a factor of linguistic expression and language structure.- 1.4. Structuralism and statistical linguistics.- 1.5. Language as choice and chance.- 1.6. De Saussure’s ‘Principe Linéaire’ and geometrical duality.- 1.7. Literary statistics, a new branch of applied statistics.- 1.8. Plan of the book.- I. Language as Chance I — Statistical Linguistics.- 2. Stability of Linguistic Distributions.- 2.1. A fundamental law of communication.- 2.2. Frequency distributions of linguistics — Experimental data.- 2.3. The statistical interpretation of de Saussure’s ‘langue-parole’ dichotomy.- 2.4. Comparison by rule-of-thumb methods.- 2.5. Comparison by methods of statistical inference.- 2.5.1. Standard error test.- 2.5.2. Chi-square test.- 2.6. Interpretation of test results.- 2.7. Simple and complex distributions.- 2.8. A practical criterion of stability of linguistic distributions.- 3. Explanation of Stability of Linguistic Distributions.- 3.1. Overlap between texts in vocabulary and frequency of occurrence.- 3.2. The relation between grammar and lexicon.- 3.3. The ‘grammar load’ of a language — Methods of assessment.- 3.4. Grammar as a factor of the stability of linguistic distributions.- 3.5. The mutually limiting action of grammar and lexicon components.- 3.6. Doubts about the stability of the phonemic (alphabetic) distribution.- 4. Application of the Theory of Stability of Alphabetic Distributions to a Problem of Language Mixture.- 4.1. Problems in connection with language mixture.- 4.2. The alphabetic distribution of nouns.- 4.3. The multiplicative law of the noun-initial distribution.- 4.4. Comparison of the LR component of English with Mediaeval Latin.- II. Language as Choice I — Stylostatistics.- 5. Style as a Statistical Concept.- 5.1. Quantitative features of style.- 5.2. Using statistics for determining the chronological order of texts.- 5.3. 
Richness of vocabulary.- 5.4. How text length in English is accounted for by vocabulary.- 5.5. The general relation between vocabulary and text length.- 5.6. Vocabulary ratios.- 5.7. Special and total vocabulary — Romance vocabulary in Chaucer’s ‘Canterbury Tales’.- 5.8. Generalisation of the quantitative law of language mixture.- 5.8.1. Explanation of the quantitative law of language mixture.- 5.9. Unsuitable mathematical models in language statistics, and their consequences.- 5.9.1. The Zipf law as an unsuitable model.- 5.9.2. The Mandelbrot Canonical Law — Shortcomings from the theoretical and practical angles.- 5.9.3. The so-called ‘Law of Least Effort’ in language.- 6. Word Count Mathematics.- 6.1. Central values and values of dispersion.- 6.2. The frequency distribution of vocabulary.- 6.3. Sampling methods for word counts.- 6.4. Illustration — The Russian word count.- 6.5. A statistical paradox and its explanation.- 6.6. A new statistical parameter — The ‘Characteristic’.- 6.7. Style as a statistical concept.- 6.8. Yule’s experiment.- 6.9. vm as a measure of the ‘langue-parole’ duality.- 6.10. Characteristic and Entropy.- 6.11. Summary.- 6.12. Words and concepts — Professional codes.- 6.12.1. Size vs. content of concepts.- 6.13. Stability of the distribution of grammar forms — Recurrence of particular grammar forms as stabilising factor.- 6.13.1. The Russian grammar-form count.- 6.13.2. Discussion.- 6.14. The chance distribution of grammar forms.- 6.15. The sound and symbol duality (Chinese).- 6.15.1. The Chinese dictionary — Radical and Phonetic.- 6.15.2. The duality principle of a Chinese dictionary.- 6.15.3. Distribution of characters according to stroke number of phonetic.- 6.15.4. Distribution of sub-classes to radicals according to the number of ideograms per sub-class.- 6.15.5. Taxonomic structure of the Chinese dictionary — Chance as a factor of Chinese lexicography.- 7. Style Relationships — Bi-Variate Stylostatistics.- 7.1. 
Joint word occurrence in different authors.- 7.1.1. A statistical study of political vocabulary.- 7.1.2. Sampling methods.- 7.1.3. The distribution of political vocabulary.- 7.2. Correlation of authors through vocabulary.- 7.3. Vocabulary overlaps between authors — Significance tests.- 7.4. Correlation between authors through frequency of use of words.- 7.5. Interpretation of correlation between authors.- 7.6. Correlation and disputed authorship.- 8. A Guide to Stylo-statistical Investigations.- 8.1. Preparing the punched cards (or tape) for processing linguistic information.- 8.1.1. The word as the elementary unit of running texts.- 8.1.2. The word as elementary lexical unit.- 8.1.3. Conclusions.- 8.2. Word categories to be included, and the size of sample.- 8.2.1. Type of word categories to be included in the word count.- 8.2.2. Size of sample.- 8.3. The fallacy of determining style by differences in frequency of a few grammar (‘function’) words.- III. Language as Chance II — Optimal Systems of Language Structure.- III.(A) Combinatorics on the Phonemic (Alphabetic) Level.- 9. The Combinatorial Structure of Words.- 9.1. Linguistics as a branch of semiology.- 9.2. Combinatorial structure of composite alphabetic code symbols.- 9.3. A de-coding experiment.- 9.4. Comparison of alphabetic and phonemic codes.- 9.5. Discussion — Conformity vs. discrepancy of alphabetic and phonemic codes.- 9.6. Consonant combinations in Czech and German.- 9.7. Non-random sequences of phonemes.- 9.8. The patterning of Semitic verbal roots subjected to Combinatory Analysis.- 10. Optimality of the Word-Length Distribution.- 10.1. Redundancy of coding in natural languages.- 10.2. Lognormality of the word-length distribution.- 10.3. Lognormality and Optimality.- 11. Combinatorics applied to Problems of Classical Poetry.- 11.1. The sequence of dactyls and spondees in the Latin hexameter.- 11.2. 
Sentence length and caesurae in the early Greek hexameter.- III.(B) Combinatorics on the Lexicon Level.- 12. Random Partitioning of Vocabulary — Vocabulary Connectivity.- 12.1. The deterministic view of the use of words and some facts against it.- 12.2. Chance, the ever-present alternative.- 12.3. Fitting the Random Partitioning Function to the results of empirical vocabulary connectivity.- 13. The Generalised Random Partitioning Function and Stylostatistics.- A. The Pauline Epistles.- 13.1. Derivation of formula for the generalised Random Partitioning Function.- 13.2. Application to the Pauline Epistles.- 13.3. The mathematical definition of uniformity of style.- 13.4. Totals of vocabulary, observed and calculated, per Epistle.- 13.5. Graphical representation of the Random Partitioning Function.- B. The New Testament.- 13.6. Application of the Random Partitioning Function to the New Testament in Greek.- 13.7. Comparison of results with current Bible exegesis.- 13.8. Graphical presentation and vocabulary totals per part.- 14. The “New Statistics” on the Vocabulary Level.- 14.1. Quadratic vs. linear fluctuations.- 14.2. Quantum statistics of language.- 14.2.1. How the need for the “New Statistics” arose in Physics.- 14.2.2. The Norm of Vocabulary Connectivity as corresponding to Black Body radiation.- III.(C) Information Theory.- 15. Principles of Information Theory.- 15.1. Relation between information theory and statistical linguistics.- 15.2. The binary code — The Entropy.- 15.3. The linguistic interpretation of entropy and redundancy.- 15.4. Efficiency of a code.- 15.5. Derivation of the entropy from the multinomial law.- 15.6. An inequality relation between the entropy and the repeat rate (and its sample statistic K).- 15.7. Efficiency of coding — The law of optimal redundancy.- 15.7.1. The condition for optimal coding.- 15.7.2. Binary coding as optimum strategy of enquiry.- 16. Information-Theoretical Analysis as a Tool of Linguistic Research.- 16.1. 
Language as an efficient code.- 16.2. The statistical study of word-length.- 16.3. Pitman’s Shorthand as an efficient code.- 16.4. Stability of word-length distributions.- 16.5. The mechanism of the linguistic development towards monosyllabism in the light of information theory.- 16.6. Entropy and Ectropy.- 16.7. Word-length in terms of the number of phonemes (letters).- 16.8. Relation between syllable and letter number per word.- 16.9. Different interpretation of the entropy according to the linguistic unit.- 17. Language Translations as Bi-Variate Distributions of Coding Symbols.- 17.1. Bi-variate information theory.- 17.2. The criterion of quantitative relationships between original and translation.- 17.3. The experiment — Bi-variate syllable counts.- 17.4. Stability of bi-variate syllable counts.- 17.5. Interpretation of the stability of bi-variate distributions of word-length.- 17.6. The conditioned entropy on the lexicon level.- 17.6.1. Word counts in their relation to vocabulary, word association and grammar.- IV. Language as Choice II — Linguistic Duality.- 18. The Four-fold Root of Linguistic Duality.- 18.1. Boolean law of duality.- 18.2. Duality and probability.- 18.3. The principle of duality in higher mathematics.- 18.3.1. The principle of geometrical duality in language — Interchangeability of Type and Token in linguistic statements.- 18.4. The Type-Token duality — Combinatorics of sentence formation.- 18.4.1. Combinatorics and the Alphabet-Square.- 18.4.2. Discussion.- 18.4.3. The diachronic aspect of planned combination.- 19. Duality as Correcting Factor — Inadequacy of Truly Semiologic Codes.- 19.1. De Saussure’s ‘signifiant-signifié’ relation and linguistic duality.- 19.2. The restless universe of language.- 19.3. Stunted development of languages through lack of duality.- 20. Duality and Language Translation.- 20.1. Variability of translational equivalence.- 20.2. Relation between word-length and meaning.- 20.3. 
The translation matrix of meaning.- 20.4. Duality of meaning as an obstacle to machine translation.- 20.5. The concept of comparative stylistics.- 20.5.1. Description of G. Barth’s work.- 20.5.2. Statistical results.- 20.5.3. Graphical analysis (sequential sampling method).- 20.6. The qualitative aspect of style.- V. Statistics for the Language Seminary.- V.(A) Statistics of Language in the Mass.- 21. Descriptive Statistics.- 21.1. Statistical distributions and elementary statistical constants.- 21.2. Empirical facts about statistical constants.- 21.3. Arithmetic mean and standard deviation of composite statistical masses.- 21.4. The Gaussian or Normal Law.- 21.4.1. Form and statistical constants of the normal distribution.- 22. Statistical Inference — The Binomial Case.- 22.1. Mathematical tools for the combinatorial technique.- 22.2.1. The argument from text to sample.- 22.2.2. The argument from sample to text.- 22.2.3. The argument from one text sample to another.- 22.3. Statistical Inference for Great Collectives.- 22.3.1. Inference from a very great statistical collective (Bernoullian Problem).- 22.3.2. Inference from sample to very great statistical collective (Bayes’ Problem).- 22.3.3. The chance distribution of rare events — The law of small numbers.- 23. Statistical Inference in the Case of Multiple Classification of Events.- 23.1. Inference from total to sample.- 23.2. Inference from sample to total.- 23.3. Inference from one sample to another.- 23.4. Inference when dealing with great statistical masses.- 23.5. Testing two distributions for compatibility — The X-square test.- 23.6. Analysis of the internal structure of a statistical mass — Lexis’ L.- 24. Theory of Correlation.- 24.1. Functional relation vs. statistical correlationship.- 24.2. The line of regression.- 24.3. Fallacies of interpretation.- 24.4. The correlation coefficient.- 24.5. Significance of the correlation coefficient.- 24.6. Bernoullian correlation — The coefficient of contingency.-
V.(B) Statistics of Language in the Line.- 25. The Dimension of Time in Language Statistics.- 25.1. Statistics in the “Region of Lost Dimensions”.- 25.2. Statistics of language in the line.- 25.3. Sampling on the lexicon level.- 25.4. Random partitioning.- 25.5. A mathematical model of language mixture.- 26. Linguistic Duality and ‘Parity’.- 26.1. Language statistics and statistical physics.- 26.2. The problem of conservation of parity in fundamental physics.- 26.3. Laterality of the speech function in the brain and linguistic duality.- Appendix — A Survey of Past and Present-day Statistical Linguistics.- Author Index.