What is Content-Based Image Retrieval?

Image databases and collections can be enormous, containing hundreds, thousands, or even millions of images. The conventional method of image retrieval is to search for a keyword that matches a descriptive keyword assigned to the image by a human categorizer. An alternative, still under active development even though several working systems exist, is to retrieve images based on their content; this approach is called Content-Based Image Retrieval (CBIR).

While computationally more expensive, content-based retrieval is far more accurate than conventional keyword indexing, so there is a tradeoff between accuracy and computational cost. This tradeoff becomes less severe as more efficient algorithms are developed and computational power becomes cheaper [1]. Advances in data storage and image acquisition technologies have enabled the creation of large image datasets, and appropriate information systems are needed to manage these collections efficiently. The most common approaches use Content-Based Image Retrieval (CBIR). The goal of CBIR systems is to support image retrieval based on content, e.g., shape, color, and texture. CBIR has played a central role in application areas such as multimedia database systems in recent years. The work described here focuses on low-level features such as color, texture, shape, and spatial layout for image representation.

Importance of Content Based Image Retrieval


Content-based image retrieval, also known as query by image content and content-based visual information retrieval, is the application of computer vision to the image retrieval problem, that is, the problem of searching for digital images in large databases. "Content-based" means that the search makes use of the contents of the images themselves, rather than relying on human-input metadata such as captions or keywords. A content-based image retrieval system is a piece of software that implements CBIR.
·         CBIR, or Content-Based Image Retrieval, is the retrieval of images based on visual features such as colour, texture, and shape. CBIR originated in fields such as statistics, pattern recognition, signal processing, and image processing.
·         "Content-based" means that the search will analyze the actual contents of the image.
·         The term 'content' in this context refers to colors, shapes, textures, or any other information that can be derived from the image itself.
It involves two steps:
·         Feature Extraction: The first step is the process of extracting image features to a distinguishable extent.
·         Matching: The second step involves matching these features to yield results that are visually similar.
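The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed design: the 3-bins-per-channel color quantization and the L1 distance are assumptions chosen for simplicity, and real systems use richer features and similarity measures.

```python
# Minimal CBIR sketch: color-histogram feature extraction plus matching.
# Images are represented as lists of (r, g, b) pixel tuples; the feature is a
# quantized, normalized color histogram, and matching uses L1 distance.
# Both choices are illustrative assumptions.

def extract_histogram(pixels, bins_per_channel=3):
    """Step 1 (feature extraction): quantized, normalized color histogram."""
    hist = [0.0] * bins_per_channel ** 3
    step = 256 / bins_per_channel
    for r, g, b in pixels:
        idx = (int(r / step) * bins_per_channel + int(g / step)) \
              * bins_per_channel + int(b / step)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist]

def match(hist_a, hist_b):
    """Step 2 (matching): L1 distance; 0 means identical histograms."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))

# Usage: a reddish image is closer to another reddish image than to a bluish one.
red_1 = [(250, 10, 10)] * 100
red_2 = [(240, 20, 15)] * 100
blue  = [(10, 10, 250)] * 100

h1, h2, h3 = (extract_histogram(p) for p in (red_1, red_2, blue))
assert match(h1, h2) < match(h1, h3)
```

With coarse bins both red images fall into the same histogram bin, so their distance is 0, while the blue image lands in a different bin entirely.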
      



                                                Fig 1. Basic CBIR system
Fig 1 shows the basic CBIR model. CBIR systems search collections of images based on features that can be extracted from the image files themselves, without manual descriptive metadata. Many CBIR systems have been developed over the past decades; their common goal is to retrieve a desired image. Comparing two images and deciding whether they are similar is a relatively easy thing for a human to do. Getting a computer to do the same thing effectively is, however, a different matter.

"Content-based" means that the search analyzes the actual contents of the image: colors, shapes, textures, or any other information that can be derived from the image itself. Without the ability to examine image content, searches must rely on metadata such as captions or keywords.
The invention of the digital camera has given the common man the privilege of capturing his world in pictures and conveniently sharing them with others. One can today generate volumes of images with content as diverse as family get-togethers and national park visits. Low-cost storage and easy Web hosting have fueled the metamorphosis of the common man from a passive consumer of photography into a current-day active producer.
Today, searchable image data exists with extremely diverse visual and semantic content, spanning geographically disparate locations, and is rapidly growing in size. All these factors have created innumerable possibilities and hence considerations for real-world image search system designers.


Different Perspectives of Content Based Image Retrieval System




As far as technological advances are concerned, growth in content-based image retrieval has been unquestionably rapid. In recent years, there has been significant effort put into understanding the real-world implications, applications, and constraints of the technology. Yet, real-world application of the technology is currently limited. We devote this section to understanding image retrieval in the real world and discuss user expectations, system constraints and requirements, and the research effort needed to make image retrieval a reality in the not-too-distant future.
Designing an omnipotent real-world image search engine capable of serving all categories of users requires understanding and characterizing user-system interaction and image search, from both user and system points of view. In Figure 2.1, we propose one such dual characterization and attempt to represent all known possibilities of interaction and search. From a user perspective, embarking on an image search journey involves considering and making decisions on the following fronts:
(1) Clarity of the user about what she wants,
(2) Where she wants to search, and
(3) The form in which the user has her query.
In an alternative view from an image retrieval system perspective, a search translates to making arrangements as per the following factors:
(1) How does the user wish the results to be presented?
(2) Where does the user desire to search?
(3) What is the nature of user input/interaction?
In the proposed user and system spaces, real world image search instances can be considered as isolated points or point clouds, and search sessions can consist of trajectories while search engines can be thought of as surfaces. The intention of drawing cubes versus free 3D Cartesian spaces is to emphasize that the possibilities are indeed bounded by the size of the Web, the nature of user, and ways of user-system interaction. We believe that the proposed characterization will be useful for designing context-dependent search environments for real-world image retrieval systems.


          

    
                                            Fig 2.1: User and system perspectives

User Intent

We augment the search-type-based classification proposed in Smeulders et al. [2000] with a user-intent-based classification. When users search for pictures, their intent or clarity about what they desire may vary. We believe that clarity of intent plays a key role in a user’s expectation from a search system and the nature of her interaction. It can also act as a guideline for system design. We broadly characterize a user by clarity of her intent as follows.
·      Browser. This is a user browsing for pictures with no clear end-goal. A browser’s session would consist of a series of unrelated searches. A typical browser would jump across multiple topics during the course of a search session. Her queries would be incoherent and diverse in topic.
·      Surfer. A surfer is a user surfing with moderate clarity of an end-goal. A surfer’s actions may be somewhat exploratory in the beginning, with the difference that subsequent searches are expected to increase the surfer’s clarity of what she wants from the system.
·      Searcher. This is a user who is very clear about what she is searching for in the system. A searcher's session would typically be short, with coherent searches leading to an end-result.
A typical browser values ease of use and manipulation. A browser usually has plenty of time at hand and expects surprises and random search hints to elongate her session (e.g., picture of the day, week, etc.). On the other hand, a surfer would value a search environment which facilitates clarity of her goal. A surfer planning a holiday would value a hint such as "pictures of most popular destinations". At the other extreme, the searcher views an image retrieval system from a core utilitarian perspective. Completeness of results and clarity of representation would usually be the most important factors.
The impact of real-world usage from the user viewpoint has not been extensively studied. One of the few studies categorizes users as experts and novices and studies their interaction patterns with respect to a video library [Christel and Conescu 2005]. In Armitage and Enser [1997], an analysis of user needs for visual information retrieval was conducted. In the cited work, a categorization schema for user queries was proposed, with a potential to be embedded in the visual information retrieval system.
Discussion: In the end, all that matters to an end-user is her interaction with the system, and the corresponding response. The importance of building human-centered multimedia systems has been expressed lately [Jaimes et al. 2006]. In order to gain wide acceptance, image retrieval systems need to acquire a human-centered perspective as well.

 Data Scope

Understanding the nature and scope of image data plays a key role in the complexity of image search system design. Factors such as the diversity of user-base and expected user traffic for a search system also largely influence the design. Along this dimension, we classify search data into the following categories.
·      Personal Collection. This consists of a largely homogeneous collection generally small in size, accessible primarily to its owner, and usually stored on a local storage media.
·      Domain-Specific Collection. This is a homogeneous collection providing access to controlled users with very specific objectives. The collection may be large and hosted on distributed storage, depending upon the domain. Examples of such a collection are biomedical and satellite image databases.
·      Enterprise Collection. We define this as a heterogeneous collection of pictures accessible to users within an organization’s intranet. Pictures may be stored in many different locations. Access may be uniform or nonuniform, depending upon the Intranet design.
·      Archives. These are usually of historical interest and contain large volumes of structured or semi-structured homogeneous data pertaining to specific topics. Archives may be accessible to most people on the Internet, with some control of usage. Data is usually stored in multiple disks or large disk arrays.
·      Web. World Wide Web pictures are accessible to practically everyone with an Internet connection. Current WWW image search engines such as Google and Yahoo! Images have a key crawler component which regularly updates their local database to reflect the dynamic nature of the Web. The image collection is semi-structured, nonhomogeneous, and massive in volume, and is usually stored in large disk arrays.
An image retrieval system designed to serve a personal collection should focus on features such as personalization, flexibility of browsing, and display methodology. For example, Google's Picasa system [Picasa 2004] provides a chronological display of images, taking a user on a journey down memory lane. Domain-specific collections may impose specific standards for presentation of results. Searching an archive for content discovery could involve long user search sessions; good visualization and a rich query support system should be the design goals. A system designed for the Web should be able to support massive user traffic. One way to supplement software approaches for this purpose is to provide hardware support to the system architecture. Unfortunately, very little has been explored in this direction, partly due to the lack of agreed-upon indexing and retrieval methods. The notable few applications include an FPGA implementation of a color-histogram-based image retrieval system [Kotoulas and Andreadis 2003], an FPGA implementation for subimage retrieval within an image database [Nakano and Takamichi 2003], and a method for efficient retrieval in a network of imaging devices [Woodrow and Heinzelman 2002].
Discussion: Regardless of the nature of the collection, as the expected user-base grows factors such as concurrent query support, efficient caching, and parallel and distributed processing of requests become critical. For future real-world image retrieval systems, both software and hardware approaches to address these issues are essential.

More realistically, dedicated specialized servers, optimized memory and storage support, and highly parallelizable image search algorithms to exploit cluster computing powers are where the future of large-scale image search hardware support lies.


 Query Modalities and Processing
In the realm of image retrieval, an important parameter to measure user-system interaction level is the complexity of queries supported by the system. From a user perspective, this translates to the different modalities she can use to query a system. We describe next the various querying modalities, their characteristics, and the system support required thereof.
·      Keywords: This is a search in which the user poses a simple query in the form of a word or bigram. This is currently the most popular way to search images, for example, the Google and Yahoo! image search engines.
·      Free-Text: This is where the user frames a complex phrase, sentence, question, or story about what she desires from the system.
·      Image: Here, the user wishes to search for an image similar to a query image. Using an example image is perhaps the most representative way of querying a CBIR system in the absence of reliable metadata.
·      Graphics: Here, a hand-drawn or computer-generated picture or graphic is presented as the query.
·      Composite: These are methods that involve using one or more of the aforesaid modalities for querying a system. This also covers interactive querying, such as in relevance feedback systems.
The aforementioned query modalities require different processing methods and/or support for user interaction. The processing becomes more complex when visual queries and/or user interactions are involved.

We next broadly characterize query processing from a system perspective.
·      Text-Based: Text-based query processing usually boils down to performing one or more simple keyword-based searches and then retrieving matching pictures. Processing a free text could involve parsing, processing, and understanding the query as a whole. Some form of natural language processing may also be involved.
·      Content-Based: Content-based query processing lies at the heart of all CBIR systems. Processing of query (image or graphics) involves extraction of visual features and/or segmentation and search in the visual feature space for similar images. An appropriate feature representation and a similarity measure to rank pictures, given a query, are essential here. These will be discussed in detail in Section 3.
·      Composite: Composite processing may involve both content- and text-based processing in varying proportions. An example of a system which supports such processing is the story picturing engine [Joshi et al. 2006b].
·      Interactive-Simple: User interaction using a single modality needs to be supported by a system. An example is a relevance-feedback-based image retrieval system.
·      Interactive-Composite: The user may interact using more than one modality (e.g., text and images). This is perhaps the most advanced form of query processing that is required to be performed by an image retrieval system.
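Content-based query processing, as characterized above, reduces to extracting features from the query image and ranking database images by a similarity measure. The sketch below assumes precomputed feature vectors and uses Euclidean distance as the similarity measure; the feature dimensionality and image names are hypothetical.

```python
import math

# Content-based query processing sketch: rank database images by similarity
# of their (precomputed) feature vectors to the query's feature vector.
# Euclidean distance is an illustrative choice of similarity measure.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rank_by_content(query_features, database):
    """database maps image name -> feature vector; returns names, nearest first."""
    return sorted(database, key=lambda name: euclidean(query_features, database[name]))

# Usage with toy 3-dimensional feature vectors (hypothetical image names):
db = {
    "sunset.jpg": [0.9, 0.4, 0.1],
    "forest.jpg": [0.1, 0.8, 0.2],
    "beach.jpg":  [0.7, 0.5, 0.3],
}
print(rank_by_content([0.9, 0.5, 0.2], db))  # ['sunset.jpg', 'beach.jpg', 'forest.jpg']
```

A real system would replace the toy vectors with extracted color, texture, or shape features, and would index them for sublinear search rather than scanning the whole database.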
Processing text-based queries involves keyword matching using simple set-theoretic operations, and therefore a response can be generated very quickly. However, in very large systems working with millions of pictures and keywords, efficient indexing methods may be required. Indexing of text has been studied in database research for decades now. Efficient indexing is critical to the building and functioning of very large text based databases and search engines. Research on efficient ways to index images by content has been largely overshadowed by research on efficient visual representation and similarity measures. Most of the methods used for visual indexing are adopted from text-indexing research. In Petrakis et al. [2002], R-trees are used for indexing images represented as attributed relational graphs (ARGs). Retrieval of images using wavelet coefficients as image representations and R-trees for indexing has been studied in Natsev et al. [2004]. Visual content matching using graph-based image representation and an efficient metric indexing algorithm has been proposed in Berretti et al. [2001].
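The set-theoretic keyword matching mentioned above can be sketched with a simple inverted index. The index contents and image names below are hypothetical examples; production engines layer caching and ranking on top of this basic structure.

```python
# Text-based query processing sketch: an inverted index maps each keyword to
# the set of images tagged with it; a multi-keyword query is answered by
# set intersection. The tags and image names are hypothetical.
from collections import defaultdict

def build_inverted_index(tags):
    """tags maps image name -> iterable of keywords."""
    index = defaultdict(set)
    for image, words in tags.items():
        for word in words:
            index[word.lower()].add(image)
    return index

def keyword_search(index, query_words):
    """Return images matching ALL query words (set intersection)."""
    sets = [index.get(w.lower(), set()) for w in query_words]
    return set.intersection(*sets) if sets else set()

index = build_inverted_index({
    "img1.jpg": ["beach", "sunset"],
    "img2.jpg": ["beach", "palm"],
    "img3.jpg": ["mountain", "sunset"],
})
print(keyword_search(index, ["beach", "sunset"]))  # {'img1.jpg'}
```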
Composite querying methods provide the users with more flexibility for expressing themselves. Some recent innovations in querying include sketch-based retrieval of color images [Chalechale et al. 2005]. Querying using 3D models [Assfalg et al. 2002] has been motivated by the fact that 2D image queries are unable to capture the spatial arrangement of objects within the image. In another interesting work, a multimodal system involving hand gestures and speech for querying and relevance feedback was presented in Kaster et al. [2003]. Certain new interaction-based querying paradigms which statistically model the user’s interest [Fang et al. 2005], or help the user refine her queries by providing cues and hints [Jaimes et al. 2004; Nagamine et al. 2004], have been explored for image retrieval.
Use of mobile devices has become widespread lately. Mobile users have limited querying capabilities due to inherent scrolling and typing constraints. Relevance feedback has been explored for quickly narrowing down search to such user needs. However, mobile users can be expected to provide only limited feedback. Hence, it becomes necessary to design intelligent feedback methods to cater to users with small displays. The performance of different relevance feedback algorithms for small devices has been studied and compared in Vinay et al. [2005, 2004]. In the cited work, a tree-structured representation for all possible user-system actions was used to determine an upper bound on the performance gains that such systems can achieve.

Discussion:  A prerequisite for supporting text-based query processing is the presence of reliable metadata with pictures. However, pictures rarely come with reliable human tags. In recent years, there has been effort put into building interactive, public domain games for large-scale collection of high-level manual annotations. One such game (the ESP game) has become very popular and has helped accumulate human annotations for about a hundred thousand pictures [von Ahn and Dabbish 2004]. Collection of manual tags for pictures has the dual advantage of: (1) facilitating text-based querying, and (2) building reliable training datasets for content-based analysis and automatic annotation algorithms. As explored in Datta et al. [2007], it is possible to effectively bridge the paradigms of keyword- and content-based search through a unified framework to provide the user the flexibility of both, without losing out on the search scope.
Visualization
Presentation of search results is perhaps one of the most important factors in the acceptance and popularity of an image retrieval system.
We characterize common visualization schemes for image search as follows:
·      Relevance-Ordered: The most popular way to present search results is relevance ordered, as adopted by Google and Yahoo! for their image search engines. Results are ordered by some numeric measure of relevance to the query.
·      Time-Ordered: In time-ordered image search, pictures are shown in a chronological ordering rather than by relevance. Google’s Picasa system [Picasa 2004] for personal collections provides an option to visualize a chronological timeline using pictures.
·      Clustered: Clustering of images by their metadata or visual content has been an active research topic for several years. Clustering of search results, besides being an intuitive and desirable form of presentation, has also been used to improve retrieval performance [Chen et al. 2005].
·      Hierarchical: If metadata associated with images can be arranged in a tree order (e.g., WordNet topical hierarchies [Miller 1995]), it can be a very useful aid in visualization. Hierarchical visualization of search results is desirable for archives, especially for educational purposes.
·      Composite: This consists of combining two or more of the preceding visualization schemes, and is used especially for personalized systems. Hierarchical clustering and visualization of concept graphs are examples of composite visualizations.
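The first three presentation schemes can be illustrated by reordering a single toy result set. The records, scores, and timestamps below are hypothetical, and the cluster labels are assumed to be precomputed rather than produced by a real clustering algorithm.

```python
# Visualization sketch: the same search results presented three ways.
# Each result carries a relevance score, a timestamp, and a precomputed
# cluster label; all values are hypothetical.
from itertools import groupby

results = [
    {"name": "a.jpg", "score": 0.7, "time": 2001, "cluster": "beach"},
    {"name": "b.jpg", "score": 0.9, "time": 2004, "cluster": "city"},
    {"name": "c.jpg", "score": 0.8, "time": 2002, "cluster": "beach"},
]

# Relevance-ordered: highest score first (as in Google/Yahoo! image search).
relevance = sorted(results, key=lambda r: r["score"], reverse=True)

# Time-ordered: chronological, as in Picasa's timeline view.
chronological = sorted(results, key=lambda r: r["time"])

# Clustered: group results by their cluster label (groupby needs sorted input).
by_cluster = {
    label: [r["name"] for r in group]
    for label, group in groupby(sorted(results, key=lambda r: r["cluster"]),
                                key=lambda r: r["cluster"])
}

print([r["name"] for r in relevance])      # ['b.jpg', 'c.jpg', 'a.jpg']
print([r["name"] for r in chronological])  # ['a.jpg', 'c.jpg', 'b.jpg']
print(by_cluster)                          # {'beach': ['a.jpg', 'c.jpg'], 'city': ['b.jpg']}
```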
In order to design interfaces for image retrieval systems, it helps to understand factors like how people manage their digital photographs [Rodden and Wood 2003] or frame their queries for visual art images [Cunningham et al. 2004]. In Rodden et al. [2001], user studies on various ways of arranging images for browsing purposes are conducted, and the observation is that both visual-feature-based and concept-based arrangements have their own merits and demerits. Thinking beyond the typical grid-based arrangement of top matching images, spiral and concentric visualization of retrieval results have been explored in Torres et al. [2003]. For personal images, innovative arrangements of query results based on visual content, time-stamps, and efficient use of screen space add new dimensions to the browsing experience [Huynh et al. 2005].
Portable devices such as personal digital assistants (PDAs) and vehicle communication and control systems are becoming very popular as client-side systems for querying and accessing remote multimedia databases. A portable-device user is often constrained in the way she can formulate her query and interact with a remote image server. There are inherent scrolling and browsing constraints which can constrict user feedback.
Moreover, there are bandwidth limitations which need to be taken into consideration when designing retrieval systems for such devices. Some additional factors which become important here are size and color depth of display. Personalization of search for small displays by modeling interaction from the gathered usage data has been proposed in Bertini et al. [2005]. An image attention model for adapting images based on user attention for small displays has been proposed in Chen et al. [2003]. Efficient ways of browsing large images interactively, such as those encountered in pathology or remote sensing, using small displays over a communication channel are discussed in Li and Sun [2003]. User-log-based approaches to smarter ways of image browsing on mobile devices have been proposed in Xie et al. [2005].
Image transcoding techniques, which aim at adapting multimedia (image and video) content to the capabilities of the client device, have been studied extensively over the last several years [Shanableh and Ghanbari 2000; Vetro et al. 2003; Bertini et al. 2003; Cucchiara et al. 2003]. A class of methods known as semantic transcoding aims at designing intelligent transcoding systems which can adapt semantically to user requirements [Bertini et al. 2003; Cucchiara et al. 2003]. To achieve this, classes of relevance are constructed, and transcoding systems are programmed differently for different classes.
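One very simple building block of such adaptation is fitting an image's dimensions to a small display while preserving aspect ratio. The 320x240 display below is a hypothetical PDA screen, not a value from the cited work.

```python
# Display-adaptation sketch: compute target dimensions that fit an image
# inside a small display while preserving aspect ratio. This is a basic
# building block of image transcoding; the 320x240 display is a
# hypothetical PDA screen size.

def fit_to_display(width, height, max_w=320, max_h=240):
    scale = min(max_w / width, max_h / height, 1.0)  # never upscale
    return round(width * scale), round(height * scale)

print(fit_to_display(1600, 1200))  # (320, 240)
print(fit_to_display(1200, 1600))  # (180, 240) -- portrait image
print(fit_to_display(100, 80))     # (100, 80)  -- already fits, unchanged
```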
Discussion: Study of organizations which maintain image management and retrieval systems has provided useful insights into system design, querying, and visualization.
In Tope and Enser [2000], case studies on the design and implementation of many different electronic retrieval systems have been reported. The final verdict of acceptance or rejection for any visualization scheme comes from end-users. While simple, intuitive interfaces such as grid-based displays have become acceptable to most search engine users, advanced visualization techniques could still be in the making. It becomes critical for visualization designers to ensure that the added complexity does not become overkill [7].
Keywords: Image Processing, Content-Based Image Retrieval, Image Retrieval Methods, Perspectives of Content-Based Image Retrieval, User Intent, Data Scope.

References:

[1] Barbeau, J., Vignes-Lebbe, R., and Stamon, G., "A Signature based on Delaunay Graph and Co-occurrence Matrix," Laboratoire Informatique et Systematique, University of Paris, Paris, France, July 2002. http://www.math-info.univ-paris5.fr/sip-lab/barbeau/barbeau.pdf
[2] Suhasini, P. S., Sri Rama Krishna, K., and Murali Krishna, I. V., "CBIR Using Color Histogram Processing," Journal of Theoretical and Applied Information Technology, Vol. 6, No. 1, pp. 116-122. www.jatit.org
[3] Thakore, D. G., and Trivedi, A. I., "Content based image retrieval techniques - Issues, analysis and the state of the art."
[4] Sharma, N., Rawat, P., and Singh, J., "Efficient CBIR Using Color Histogram Processing," Signal & Image Processing: An International Journal (SIPIJ), Vol. 2, No. 1, March 2011. DOI: 10.5121/sipij.2011.2108
[5] http://www.studymode.com/essays/Content-Based-Image-Retrival-951834.html
[6] http://en.wikipedia.org/wiki/Content-based_image_retrieval
[7] Datta, R., Joshi, D., Li, J., and Wang, J. Z., "Image Retrieval: Ideas, Influences, and Trends of the New Age," The Pennsylvania State University.