Overview
This work is a comprehensive introduction to the statistical analysis of word frequency distributions, intended for computational linguists, corpus linguists, psycholinguists, and researchers in quantitative stylistics. Word frequency distributions are characterized by very large numbers of rare words. This property leads to counterintuitive phenomena: mean frequencies that systematically change as the number of observations increases, relative frequencies that are not fully reliable estimators of population probabilities even in large samples, and model parameters that vary with text or corpus size. Special statistical techniques for the analysis of distributions with large numbers of rare events have been published in various technical journals. The aim of this book is to make these techniques more accessible to non-specialists, both theoretically, through a careful introduction to the underlying probabilistic and statistical concepts, and practically, by providing a program library that implements the main models for word frequency distributions.

Full Product Details
Author: R. Harald Baayen
Publisher: Springer
Imprint: Springer
Edition: 2001 ed.
Volume: 18
Dimensions: Width: 15.50 cm, Height: 2.00 cm, Length: 23.50 cm
Weight: 1.520 kg
ISBN: 9780792370178
ISBN 10: 0792370171
Pages: 335
Publication Date: 31 July 2001
Audience: College/higher education, Professional and scholarly, Postgraduate, Research & Scholarly, Professional & Vocational
Format: Hardback
Publisher's Status: Active
Availability: Out of print, replaced by POD
We will order this item for you from a manufactured-on-demand supplier.
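To illustrate the overview's claim that mean word frequency drifts with sample size, here is a minimal Python sketch. It is not taken from the book or its program library. It assumes a hypothetical synthetic "text" in which word type r is drawn with probability proportional to 1/r (a Zipf-like assumption), and the function names `frequency_spectrum`, `mean_frequency`, `hapax_ratio` and `zipf_like_sample` are illustrative only. The sketch computes the frequency spectrum (written here as V(m, N), the number of types that occur exactly m times in the first N tokens) and the mean frequency N/V(N) for growing prefixes of the same text.

```python
import random
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency m to V(m, N): the number of types seen exactly m times."""
    type_counts = Counter(tokens)  # word type -> frequency in this sample
    return dict(sorted(Counter(type_counts.values()).items()))


def mean_frequency(tokens):
    """Mean frequency per type: N / V(N), tokens divided by distinct types."""
    return len(tokens) / len(set(tokens))


def hapax_ratio(tokens):
    """V(1, N) / N: proportion of tokens belonging to types seen only once."""
    spectrum = frequency_spectrum(tokens)
    return spectrum.get(1, 0) / len(tokens)


def zipf_like_sample(n_types, n_tokens, seed=42):
    """Hypothetical synthetic text: type r is drawn with weight 1/r (assumption)."""
    rng = random.Random(seed)
    types = range(1, n_types + 1)
    weights = [1.0 / r for r in types]
    return rng.choices(types, weights=weights, k=n_tokens)


if __name__ == "__main__":
    text = zipf_like_sample(n_types=50_000, n_tokens=100_000)
    # Same text, growing prefixes: the mean frequency does not settle down.
    for n in (1_000, 10_000, 100_000):
        prefix = text[:n]
        print(f"N={n:>7}  V={len(set(prefix)):>6}  "
              f"N/V={mean_frequency(prefix):6.2f}  V(1)/N={hapax_ratio(prefix):.3f}")
```

Because new rare types keep appearing as N grows, V(N) increases more slowly than N, so N/V(N) rises with sample size instead of converging to a fixed value.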
Table of Contents
1 Word Frequencies: 1.1 Introduction; 1.2 The frequency spectrum; 1.3 Zipf; 1.4 The quest for characteristic constants; 1.5 The lognormal distribution; 1.6 Discussion; 1.7 Bibliographical Comments; 1.8 Questions
2 Non-parametric models: 2.1 Basic concepts; 2.2 The Urn model; 2.3 The Structural Type Distribution; 2.4 The LNRE zone; 2.5 Good-Turing estimates; 2.6 Interpolation and Extrapolation; 2.7 Discussion; 2.8 Bibliographical Comments; 2.9 Questions
3 Parametric models: 3.1 Introduction; 3.2 LNRE models; 3.3 Evaluating Goodness of Fit; 3.4 Parameter estimation; 3.5 A comparative study; 3.6 Comparing Lexical Measures Across Texts; 3.7 Discussion; 3.8 Bibliographical Comments; 3.9 Questions
4 Mixture distributions: 4.1 Introduction; 4.2 Expectations, variances, and covariances; 4.3 Examples of mixture distributions; 4.4 Morphological Productivity; 4.5 Discussion; 4.6 Bibliographical Comments; 4.7 Questions
5 The Randomness Assumption: 5.1 The Randomness Assumption; 5.2 Adjusted LNRE models; 5.3 Discussion; 5.4 Bibliographical Comments
6 Examples of Applications: 6.1 Distributional properties of the lexicon; 6.2 Morphological productivity; 6.3 Authorship and Style; 6.4 Beyond word frequency distributions; 6.5 Some practical guidelines
A List of Symbols
B Solutions to the exercises
C Software
D Data sets

Reviews
From the reviews: "Baayen's book must surely in the future become the standard point of departure for statistical studies of vocabulary." (Geoffrey Sampson, Computational Linguistics, 28:04)
Countries Available: All regions