Please use this identifier to cite or link to this item:
http://hdl.handle.net/1942/33510
Title: | Research and Application of Image Representation Based on Large-scale Datasets | Authors: | WANG, Qi | Advisors: | Claesen, Luc; Wenyin, Liu; Van Reeth, Frank |
Issue Date: | 2021 | Abstract: | With the rapid development of artificial intelligence and computer vision, image processing and analysis have become an indispensable part of modern scientific research. Especially in the era of big data, with the popularization of electronic products and the Internet, large amounts of image data are generated almost every moment, so analyzing and processing this image data is particularly important. The key to analyzing and understanding these data is image feature extraction, which is also an indispensable part of computer vision tasks. The resulting image representation serves as input for subsequent computer vision tasks, analysis, and computation. This dissertation studies this key factor: it extracts features from large-scale image data and obtains the final image representation for image processing and analysis. Specifically, this dissertation takes image retrieval as an example to discuss how to obtain a more robust image representation. The main research content focuses on the following issues in image retrieval over a large-scale dataset: 1) how to obtain image representations for unlabelled datasets, 2) how to obtain a lightweight image representation, and 3) how to learn image representations incrementally for product search applications. We explore each of these three issues in turn and summarize our solutions below. First, we aim to achieve effective image representation for image retrieval in an unsupervised manner. To this end, we propose a fully cross-dimensional weighting pooling method (FCroW). In particular, we aggregate multi-scale features extracted by convolutional neural networks using the proposed method, taking into account multiple aspects of the visual features captured by the networks.
Different weights can be assigned to the features extracted by different layers of the networks. To reduce the effort of parameter tuning, we propose an initial strategy to prune the search space of the weights, which is achieved by designing constraint rules based on prior knowledge of the relations between the layers of the networks. Based on this, we propose weighted multi-layer feature fusion for similar image representations. Extensive experiments conducted on four public real-world datasets demonstrate the effectiveness of the proposed FCroW method and the pruning strategy for image retrieval. Second, activated hidden units in convolutional neural networks, known as feature maps, are the dominant form of image representation, which is compact and discriminative. For ultra-large datasets, high-dimensional feature maps in floating-point format not only result in high computational complexity but also occupy massive memory space. To this end, a new image representation based on aggregating convolution kernels is proposed, in which the convolution kernels that capture certain patterns are activated. The indices of the top-n convolution kernels are extracted directly as the image representation in discrete integer values, which rebuilds the relationship between convolution kernels and the image. Furthermore, a distance measure is defined from the perspective of ordered sets to calculate position-sensitive similarities between image representations. Extensive experiments conducted on Oxford Buildings, Paris, Holidays, and other datasets show that the proposed method achieves competitive image retrieval performance at much lower computational cost, outperforming methods that use feature maps for image representation. Third, with the development of image processing and computer vision technology, content-based product search has been widely applied in daily life, such as in online shopping, automatic checkout systems, and intelligent logistics.
Given a query product image, existing product search systems mainly perform retrieval on predefined databases with fixed product categories. However, in real-world applications, we usually need to add new categories or update existing products in the product database. For existing product search methods, the image feature extraction and indexing models must be retrained on the whole updated dataset, which is expensive in terms of data annotation and training time. To this end, we propose a few-shot incremental product search framework with meta-learning, which needs very few annotated images and reasonable training time. In particular, our framework contains a multi-pooling-based product semantic extractor to learn a discriminative representation for each product. Moreover, a meta-learning-based feature adapter is designed to guarantee the robustness of few-shot features. Furthermore, when new categories are added in batches during product search, we reconstruct the few-shot features with the incremental weight combiner to accommodate the incremental search task. Finally, extensive experiments show that the proposed framework achieves excellent performance on new products while maintaining high search accuracy on base categories after gradually expanding to new product categories. In this dissertation, we first studied two different image representations, a feature fusion method, and a new distance measure. Then, we used part of this work to solve the newly defined problem of image retrieval with few incremental samples. Finally, we deployed parts of our proposed methods and technologies in the Jingdong artificial intelligence checkout counter application. | Document URI: | http://hdl.handle.net/1942/33510 | Category: | T1 | Type: | Theses and Dissertations |
Appears in Collections: | Research publications |
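Illustrative note on the abstract's second contribution: the abstract describes representing an image by the indices of its top-n activated convolution kernels and comparing two such ordered index lists with a position-sensitive measure, but it does not give the formulas. The sketch below is only one possible reading under stated assumptions, not the dissertation's method: kernels are ranked by the sum of their feature-map activations, and similarity is a rank-weighted overlap. The function names `top_n_kernel_indices` and `position_sensitive_similarity` are hypothetical.

```python
from typing import Dict, List, Sequence


def top_n_kernel_indices(feature_maps: Sequence[Sequence[Sequence[float]]],
                         n: int) -> List[int]:
    """Return the indices of the n kernels with the largest total activation.

    feature_maps[k] is the 2D activation map produced by kernel k.
    ASSUMPTION: a kernel's strength is the sum of its activations; the
    abstract does not say how kernels are ranked.
    """
    scores = [sum(sum(row) for row in fmap) for fmap in feature_maps]
    # Sort by descending score; ties are broken by the lower kernel index.
    order = sorted(range(len(scores)), key=lambda k: (-scores[k], k))
    return order[:n]


def position_sensitive_similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Rank-weighted overlap between two ordered lists of unique kernel indices.

    ASSUMPTION: this is a stand-in for the ordered-set distance mentioned in
    the abstract. A shared index contributes more when it sits at similar
    positions in both lists. Returns 1.0 for identical lists and 0.0 for
    lists with no index in common.
    """
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    position_in_b: Dict[int, int] = {k: i for i, k in enumerate(b)}
    total = 0.0
    for i, k in enumerate(a):
        j = position_in_b.get(k)
        if j is not None:
            total += 1.0 - abs(i - j) / n
    return total / n


if __name__ == "__main__":
    # Four toy 1x2 feature maps, one per kernel.
    maps = [[[0.0, 0.1]], [[2.0, 1.0]], [[0.5, 0.5]], [[3.0, 3.0]]]
    query = top_n_kernel_indices(maps, 2)
    print(query, position_sensitive_similarity(query, [1, 3]))
```

Storing a short list of integers per image instead of a high-dimensional floating-point vector is what would give the memory saving the abstract refers to; the ranking rule and the similarity function here are placeholders that could be swapped for the definitions in the full text.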
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Research and Application of Image Representation Based on Large-scale Datasets.pdf | Until 2026-02-10 | 16.31 MB | Adobe PDF | |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.