Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/743
Title: | The exact rank-frequency function and size-frequency function of N-grams and N-word phrases with applications |
Authors: | EGGHE, Leo |
Issue Date: | 2005 |
Publisher: | ELSEVIER |
Source: | MATHEMATICAL AND COMPUTER MODELLING, 41. p. 807-823 |
Abstract: | N-grams are generalized words consisting of N consecutive symbols (letters), as they are used in a text. N-word phrases are general concepts consisting of N consecutive words, also as used in a text. Given the rank-frequency function of single letters (i.e. 1-grams) or of single words (i.e. 1-word phrases) being Zipfian, we determine in this paper the exact rank-frequency function (i.e. the occurrence of N-grams or N-word phrases on each rank) and size-frequency distribution (i.e. the density of N-grams or N-word phrases on each occurrence density) of these N-grams and N-word phrases. This paper distinguishes itself from other ones on this topic by allowing no approximations in the calculations. This leads to an intricate rank-frequency function for N-grams and N-word phrases (as we knew before from unpublished calculations) but leads surprisingly, to a very simple size-frequency function f(N) for N-grams or N-word phrases. |
Keywords: | N-gram; N-word phrase; Rank-frequency distribution; Size-frequency distribution; Zipfian distribution |
Document URI: | http://hdl.handle.net/1942/743 |
ISSN: | 0895-7177 |
DOI: | 10.1016/j.mcm.2003.12.016 |
ISI #: | 000229364100015 |
Category: | A1 |
Type: | Journal Contribution |
Validations: | ecoom 2006 |
Appears in Collections: | Research publications |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
exact 1.pdf (Restricted Access) | Published version | 824.36 kB | Adobe PDF |
exact 2.pdf | Peer-reviewed author version | 502.57 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.