Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/788
Full metadata record
DC Field | Value | Language
dc.contributor.author | EGGHE, Leo | -
dc.date.accessioned | 2005-05-30T12:24:47Z | -
dc.date.available | 2005-05-30T12:24:47Z | -
dc.date.issued | 2000 | -
dc.identifier.citation | Scientometrics, 47(2). p. 237-252 | -
dc.identifier.issn | 0138-9130 | -
dc.identifier.uri | http://hdl.handle.net/1942/788 | -
dc.description.abstract | N-grams are generalized words consisting of N consecutive symbols, as they are used in a text. This paper determines the rank-frequency distribution for redundant N-grams. For entire texts this is known to be Zipf's law (i.e., an inverse power law). For N-grams, however, we show that the rank-frequency distribution (with rank $r$) is $P_N(r) = C/(\psi_N(r))^{\beta}$, where $\psi_N$ is the inverse function of $f_N(x) = x \ln^{N-1} x$. Here we assume that the rank-frequency distribution of the symbols follows Zipf's law with exponent $\beta$. | -
dc.format.extent | 286280 bytes | -
dc.format.mimetype | application/pdf | -
dc.language.iso | en | -
dc.publisher | KLUWER ACADEMIC PUBL | -
dc.subject.other | N-gram; law of Zipf; rank-frequency distribution | -
dc.subject.other | CENTRAL-LIMIT-THEOREM; INFORMATION-RETRIEVAL; ZIPFS LAW; SIMILARITY | -
dc.title | The Distribution of N-Grams | -
dc.type | Journal Contribution | -
dc.identifier.epage | 252 | -
dc.identifier.issue | 2 | -
dc.identifier.spage | 237 | -
dc.identifier.volume | 47 | -
local.bibliographicCitation.jcat | A1 | -
local.type.refereed | Refereed | -
local.type.specified | Article | -
dc.bibliographicCitation.oldjcat | A1 | -
dc.identifier.doi | 10.1023/A:1005634925734 | -
dc.identifier.isi | 000089449100005 | -
item.contributor | EGGHE, Leo | -
item.fullcitation | EGGHE, Leo (2000) The Distribution of N-Grams. In: Scientometrics, 47(2). p. 237-252. | -
item.accessRights | Open Access | -
item.fulltext | With Fulltext | -
item.validation | ecoom 2001 | -
crisitem.journal.issn | 0138-9130 | -
crisitem.journal.eissn | 1588-2861 | -
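The abstract's closed form, P_N(r) = C/(psi_N(r))^beta with psi_N the inverse of f_N(x) = x ln^{N-1} x, can be evaluated numerically. The sketch below is not taken from the paper. It assumes that ln^{N-1} x means (ln x)^{N-1}, inverts f_N by plain bisection on [1, infinity) (a generic root-finding choice; the paper may treat psi_N analytically), and leaves the constant C unnormalised with a default of 1. The helper names f_n, psi_n and p_n are hypothetical.

```python
import math


def f_n(x: float, n: int) -> float:
    """f_N(x) = x * (ln x)^(N-1), assuming ln^(N-1) x means (ln x)^(N-1)."""
    # For N = 1 the exponent is 0, so f_1(x) = x (Python gives 0.0 ** 0 == 1.0).
    return x * math.log(x) ** (n - 1)


def psi_n(r: float, n: int, tol: float = 1e-12) -> float:
    """Inverse of f_N on [1, inf), found by bisection so that f_N(psi_N(r)) ~= r.

    On x >= 1, f_N is strictly increasing, with f_N(1) = 1 for N = 1 and
    f_N(1) = 0 for N >= 2, so a unique solution exists for every rank r >= 1.
    """
    if n < 1:
        raise ValueError("N must be a positive integer")
    if r < 1:
        raise ValueError("rank r must be >= 1")
    lo, hi = 1.0, 2.0
    # Double the upper end until the bracket [lo, hi] contains the solution.
    while f_n(hi, n) < r:
        hi *= 2.0
    # Invariant: f_N(lo) <= r <= f_N(hi); halve the bracket until it is tight.
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f_n(mid, n) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def p_n(r: float, n: int, beta: float, c: float = 1.0) -> float:
    """P_N(r) = C / psi_N(r)^beta; C is a hypothetical, unnormalised constant."""
    return c / psi_n(r, n) ** beta


if __name__ == "__main__":
    # Compare the N-gram curves for N = 1, 2, 3 at a few ranks, with beta = 1.
    for n in (1, 2, 3):
        print(n, [p_n(r, n, beta=1.0) for r in (1, 10, 100, 1000)])
```

For N = 1 the sketch reduces to f_1(x) = x, so psi_1(r) = r and P_1(r) = C/r^beta, which is Zipf's law for the symbols themselves. For N >= 2, f_N(x) > x whenever ln x > 1, so psi_N(r) < r for ranks r > e, and the N-gram curve therefore falls off more slowly than the symbol-level Zipf curve with the same beta.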
Appears in Collections: Research publications
Files in This Item:
File | Description | Size | Format
distribution.pdf | Peer-reviewed author version | 279.57 kB | Adobe PDF
distribution 1.pdf (Restricted Access) | Published version | 393.51 kB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.