Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/788
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | EGGHE, Leo | - |
dc.date.accessioned | 2005-05-30T12:24:47Z | - |
dc.date.available | 2005-05-30T12:24:47Z | - |
dc.date.issued | 2000 | - |
dc.identifier.citation | Scientometrics, 47(2). p. 237-252 | - |
dc.identifier.issn | 0138-9130 | - |
dc.identifier.uri | http://hdl.handle.net/1942/788 | - |
dc.description.abstract | N-grams are generalized words consisting of N consecutive symbols, as they are used in a text. This paper determines the rank-frequency distribution for redundant N-grams. For entire texts this is known to be Zipf's law (i.e., an inverse power law). For N-grams, however, we show that the rank ($r$)-frequency distribution is $P_N(r) = C / (\psi_N(r))^{\beta}$, where $\psi_N$ is the inverse function of $f_N(x) = x \ln^{N-1} x$. Here we assume that the rank-frequency distribution of the symbols follows Zipf's law with exponent $\beta$. | - |
dc.format.extent | 286280 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.language.iso | en | - |
dc.publisher | KLUWER ACADEMIC PUBL | - |
dc.subject.other | N-gram; law of Zipf; rank-frequency distribution | - |
dc.subject.other | CENTRAL-LIMIT-THEOREM; INFORMATION-RETRIEVAL; ZIPFS LAW; SIMILARITY | - |
dc.title | The Distribution of N-Grams | - |
dc.type | Journal Contribution | - |
dc.identifier.epage | 252 | - |
dc.identifier.issue | 2 | - |
dc.identifier.spage | 237 | - |
dc.identifier.volume | 47 | - |
local.bibliographicCitation.jcat | A1 | - |
local.type.refereed | Refereed | - |
local.type.specified | Article | - |
dc.bibliographicCitation.oldjcat | A1 | - |
dc.identifier.doi | 10.1023/A:1005634925734 | - |
dc.identifier.isi | 000089449100005 | - |
item.contributor | EGGHE, Leo | - |
item.fullcitation | EGGHE, Leo (2000) The Distribution of N-Grams. In: Scientometrics, 47(2). p. 237-252. | - |
item.accessRights | Open Access | - |
item.fulltext | With Fulltext | - |
item.validation | ecoom 2001 | - |
crisitem.journal.issn | 0138-9130 | - |
crisitem.journal.eissn | 1588-2861 | - |
Appears in Collections: | Research publications |
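
The rank-frequency formula in the abstract has no closed form for N ≥ 2, because $\psi_N$ is defined only as the inverse of $f_N(x) = x \ln^{N-1} x$. The following is a minimal sketch, not taken from the paper, of how one might evaluate it numerically. The function names (`f_n`, `psi_n`, `p_n`), the default `c=1.0`, and the choice of bisection on $[1, \infty)$ are illustrative assumptions. The sketch also assumes $r > 0$, for which $f_N$ is increasing on that interval.

```python
import math


def f_n(x: float, n: int) -> float:
    """f_N(x) = x * (ln x)^(N-1), the function named in the abstract."""
    return x * math.log(x) ** (n - 1)


def psi_n(r: float, n: int, iterations: int = 200) -> float:
    """Approximate psi_N(r), the inverse of f_N, by bisection.

    Assumption (not from the paper): we search on x >= 1, where f_N is
    non-decreasing, and require r > 0 and n >= 1.
    """
    if r <= 0 or n < 1:
        raise ValueError("this sketch assumes r > 0 and n >= 1")
    if n == 1:
        return r  # f_1(x) = x, so psi_1 is the identity (plain Zipf)
    lo, hi = 1.0, 2.0
    while f_n(hi, n) < r:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f_n(mid, n) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_n(r: float, n: int, beta: float, c: float = 1.0) -> float:
    """P_N(r) = C / psi_N(r)^beta; c is a hypothetical normalising constant."""
    return c / psi_n(r, n) ** beta


if __name__ == "__main__":
    # Print the first few ranks for bigrams (N = 2) with beta = 1.
    for rank in range(1, 6):
        print(rank, p_n(rank, n=2, beta=1.0))
```

For N = 1 the sketch reduces to the ordinary inverse power law $C / r^{\beta}$, which matches the abstract's assumption that single symbols follow Zipf's law.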
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
distribution.pdf | Peer-reviewed author version | 279.57 kB | Adobe PDF |
distribution 1.pdf (Restricted Access) | Published version | 393.51 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.