Please use this identifier to cite or link to this item: http://hdl.handle.net/1942/747
Full metadata record
DC Field | Value | Language
dc.contributor.author | EGGHE, Leo | -
dc.date.accessioned | 2005-04-28T11:42:16Z | -
dc.date.available | 2005-04-28T11:42:16Z | -
dc.date.issued | 2006 | -
dc.identifier.citation | JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 57(9). p. 1165-1177 | -
dc.identifier.issn | 1532-2882 | -
dc.identifier.uri | http://hdl.handle.net/1942/747 | -
dc.description.abstract | In the first part of this paper we define the n-overlap vector, whose coordinates consist of the fraction of the objects (e.g. books, N-grams, …) that belong to 1, 2, …, n sets (more generally: families) (e.g. libraries, databases, …). With the aid of the Lorenz concentration theory we build a theory of n-overlap similarity and corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in case n = 2). Next we determine the distributional form of the n-overlap vector, assuming certain distributions of the object sizes and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The final section is devoted to the n-overlap properties of objects indexed by a hierarchical system (e.g. books indexed by numbers from a UDC or Dewey system, or by N-grams). We show that the results of Section II can be applied here. We also show that the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g. the value N in N-grams). | -
dc.format.extent | 290890 bytes | -
dc.format.mimetype | application/pdf | -
dc.language.iso | en | -
dc.publisher | Wiley | -
dc.subject.other | n-overlap vector; Lorenz; Jaccard index; power law; N-gram | -
dc.title | Properties of the n-overlap vector and n-overlap similarity theory | -
dc.type | Journal Contribution | -
dc.identifier.epage | 1177 | -
dc.identifier.issue | 9 | -
dc.identifier.spage | 1165 | -
dc.identifier.volume | 57 | -
local.bibliographicCitation.jcat | A1 | -
local.type.refereed | Refereed | -
local.type.specified | Article | -
dc.bibliographicCitation.oldjcat | A1 | -
dc.identifier.doi | 10.1002/asi.v57:9 | -
dc.identifier.isi | 000238519600003 | -
item.contributor | EGGHE, Leo | -
item.fullcitation | EGGHE, Leo (2006) Properties of the n-overlap vector and n-overlap similarity theory. In: JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 57(9). p. 1165-1177. | -
item.accessRights | Open Access | -
item.fulltext | With Fulltext | -
item.validation | ecoom 2007 | -
crisitem.journal.issn | 1532-2882 | -
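
The n-overlap vector defined in the abstract can be illustrated with a minimal sketch. It assumes that the k-th coordinate is the fraction of objects in the union of the n sets that belong to exactly k of them; this reading, the function name `n_overlap_vector`, and the sample data are illustrative assumptions and are not taken from the paper itself.

```python
from collections import Counter
from fractions import Fraction


def n_overlap_vector(sets):
    """Return [x_1, ..., x_n], where x_k is the fraction of objects in the
    union of the n given sets that belong to exactly k of them.

    Assumption: "belong to k sets" is read as "exactly k sets".
    """
    n = len(sets)
    # Count, for every object, the number of sets that contain it.
    membership = Counter()
    for s in sets:
        membership.update(set(s))
    total = len(membership)
    if total == 0:
        raise ValueError("the union of the sets is empty")
    # Count how many objects occur in exactly k sets, for each k.
    objects_per_k = Counter(membership.values())
    return [Fraction(objects_per_k.get(k, 0), total) for k in range(1, n + 1)]


# Hypothetical holdings of three libraries (objects are book identifiers).
libraries = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]
vector = n_overlap_vector(libraries)
print([str(x) for x in vector])

# For n = 2 the last coordinate is |A ∩ B| / |A ∪ B|, the classical Jaccard index.
pair = n_overlap_vector([{1, 2, 3}, {2, 3, 4}])
print(str(pair[-1]))
```

Under the stated assumption the coordinates always sum to 1, since every object in the union belongs to at least one and at most n sets. Exact fractions are used so that this can be checked without rounding.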
Appears in Collections: Research publications
Files in This Item:
File | Description | Size | Format
properties 1.pdf (Restricted Access) | Published version | 149.98 kB | Adobe PDF
properties 2.pdf | Peer-reviewed author version | 622.91 kB | Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.