Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/747
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | EGGHE, Leo | - |
dc.date.accessioned | 2005-04-28T11:42:16Z | - |
dc.date.available | 2005-04-28T11:42:16Z | - |
dc.date.issued | 2006 | - |
dc.identifier.citation | JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 57(9). p. 1165-1177 | - |
dc.identifier.issn | 1532-2882 | - |
dc.identifier.uri | http://hdl.handle.net/1942/747 | - |
dc.description.abstract | In the first part of this paper we define the n-overlap vector whose coordinates consist of the fraction of the objects (e.g. books, N-grams,…) that belong to 1, 2,…, n sets (more generally: families) (e.g. libraries, databases,…). With the aid of the Lorenz concentration theory we build a theory of n-overlap similarity and corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in case n=2). Next we determine the distributional form of the n-overlap vector assuming certain distributions of the object’s and of the set (family) sizes. In this section the decreasing power law and decreasing exponential distributions are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The final section is devoted to the n-overlap properties of objects indexed by a hierarchical system (e.g. books indexed by numbers from a UDC or Dewey system or by N-grams). We show that the results of Section II can be applied here. We also show that the Lorenz-order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g. the value N in N-grams). | - |
dc.format.extent | 290890 bytes | - |
dc.format.mimetype | application/pdf | - |
dc.language.iso | en | - |
dc.publisher | Wiley | - |
dc.subject.other | n-overlap vector; Lorenz; Jaccard index; power law; N-gram | - |
dc.title | Properties of the n-overlap vector and n-overlap similarity theory | - |
dc.type | Journal Contribution | - |
dc.identifier.epage | 1177 | - |
dc.identifier.issue | 9 | - |
dc.identifier.spage | 1165 | - |
dc.identifier.volume | 57 | - |
local.bibliographicCitation.jcat | A1 | - |
local.type.refereed | Refereed | - |
local.type.specified | Article | - |
dc.bibliographicCitation.oldjcat | A1 | - |
dc.identifier.doi | 10.1002/asi.v57:9 | - |
dc.identifier.isi | 000238519600003 | - |
item.contributor | EGGHE, Leo | - |
item.fullcitation | EGGHE, Leo (2006) Properties of the n-overlap vector and n-overlap similarity theory. In: JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 57(9). p. 1165-1177. | - |
item.accessRights | Open Access | - |
item.fulltext | With Fulltext | - |
item.validation | ecoom 2007 | - |
crisitem.journal.issn | 1532-2882 | - |
Appears in Collections: | Research publications |
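The n-overlap vector defined in the abstract can be illustrated with a small sketch. This is not taken from the paper: the function name `overlap_vector` is hypothetical, and the sketch assumes that coordinate k is the fraction of distinct objects, counted over the union of all sets, that belong to exactly k of the n sets. Under that assumption, for n=2 the last coordinate coincides with the ordinary Jaccard index |A∩B| / |A∪B|.

```python
from collections import Counter
from fractions import Fraction


def overlap_vector(sets):
    """Return [f_1, ..., f_n] for n sets.

    Assumption (not confirmed by the source): f_k is the fraction of the
    distinct objects in the union that belong to exactly k of the sets.
    """
    n = len(sets)
    membership = Counter()
    for s in sets:
        # set(s) ensures each set counts an object at most once
        membership.update(set(s))
    total = len(membership)
    if total == 0:
        raise ValueError("the union of the sets is empty")
    # per_count[k] = number of objects that occur in exactly k sets
    per_count = Counter(membership.values())
    return [Fraction(per_count.get(k, 0), total) for k in range(1, n + 1)]


# Two small hypothetical "databases" of 2-grams
a = {"ab", "bc", "cd", "de"}
b = {"bc", "cd", "ef"}

vec = overlap_vector([a, b])
jaccard = Fraction(len(a & b), len(a | b))
print(vec, jaccard)
```

Exact fractions are used so that the coordinates sum to exactly 1 and can be compared with the Jaccard index without floating-point rounding.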
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
properties 1.pdf (Restricted Access) | Published version | 149.98 kB | Adobe PDF |
properties 2.pdf | Peer-reviewed author version | 622.91 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.