Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/747
Title: | Properties of the n-overlap vector and n-overlap similarity theory |
Authors: | EGGHE, Leo |
Issue Date: | 2006 |
Publisher: | Wiley |
Source: | Journal of the American Society for Information Science and Technology, 57(9), p. 1165-1177 |
Abstract: | In the first part of this paper we define the n-overlap vector, whose coordinates consist of the fractions of the objects (e.g., books, N-grams, …) that belong to 1, 2, …, n sets (more generally: families) (e.g., libraries, databases, …). With the aid of Lorenz concentration theory we build a theory of n-overlap similarity and corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in the case n = 2). Next we determine the distributional form of the n-overlap vector, assuming certain distributions of the object sizes and of the set (family) sizes. In this section the decreasing power law and the decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The final section is devoted to the n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system, or by N-grams). We show that the results of Section II can be applied here. We also show that the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams). |
Keywords: | n-overlap vector; Lorenz; Jaccard index; power law; N-gram |
Document URI: | http://hdl.handle.net/1942/747 |
ISSN: | 1532-2882 |
DOI: | 10.1002/asi.v57:9 |
ISI #: | 000238519600003 |
Category: | A1 |
Type: | Journal Contribution |
Validations: | ecoom 2007 |
Appears in Collections: | Research publications |
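The abstract's definition of the n-overlap vector can be illustrated with a small sketch. It rests on assumptions not confirmed by this record: coordinate k is read as the fraction of distinct objects that belong to exactly k of the n sets, fractions are taken relative to the union of all sets, and only the source (type) variant is shown. The function name `overlap_vector` and the sample data are hypothetical, and the paper's generalized Jaccard index is not reproduced here.

```python
from collections import Counter
from fractions import Fraction


def overlap_vector(sets):
    """Return [x_1, ..., x_n] for n sets.

    Assumed reading of the abstract: x_k is the fraction of distinct
    objects (relative to the union of all sets) that occur in exactly
    k of the n sets.
    """
    n = len(sets)
    # For every object, count the number of sets that contain it.
    sets_per_object = Counter(obj for s in sets for obj in set(s))
    total = len(sets_per_object)
    if total == 0:
        raise ValueError("at least one set must be non-empty")
    # Count how many objects occur in exactly k sets.
    objects_per_k = Counter(sets_per_object.values())
    return [Fraction(objects_per_k.get(k, 0), total) for k in range(1, n + 1)]


# Hypothetical holdings of three libraries (objects are book identifiers).
libraries = [
    {"a", "b", "c", "d"},
    {"b", "c", "e"},
    {"c", "f"},
]

vector = overlap_vector(libraries)
print(vector)

# For n = 2 the second coordinate is |A ∩ B| / |A ∪ B|,
# i.e. the classical Jaccard index under this reading.
pair = overlap_vector(libraries[:2])
print(pair[1])
```

Under this reading the coordinates always sum to 1, since every object in the union belongs to at least one set. Exact fractions are used rather than floats so that the coordinates can be compared without rounding issues.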
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
properties 1.pdf (Restricted Access) | Published version | 149.98 kB | Adobe PDF |
properties 2.pdf | Peer-reviewed author version | 622.91 kB | Adobe PDF |