The problem of information retrieval

Here we discuss the problem of web searches specifically, since that will be a major application of this technology. A web search for documents with a commercial system such as Google generally begins with a set of keywords (query terms). The search can return a large collection of documents, some of which are not the ones the user is looking for. The unwanted documents are generally viewed as irrelevant and the others as relevant. One of the main problems with web searches on systems such as Google is that a very large percentage of the retrieved documents might be irrelevant. In that case, the user has no choice but to search through the collection to find the right ones. From the user's point of view, current web search processes are therefore inefficient and inadequate, and there should be a faster way to retrieve relevant documents.

This problem of poor filtering by a search engine arises mainly because it is difficult for a user to specify the right combination of keywords (both positive and negative) to get the most relevant documents. Current commercial search engines compound the problem by not providing any other means of working with the user to reach the right set of documents. The only means available to the user is to add keywords to weed out irrelevant documents, and doing so can become a very tedious and intractable exercise.

Relevance feedback from users - a 30-year-old idea with no good method to exploit it

Search results are typically presented to the user as a ranked list of documents, in decreasing order of relevance to the user's query. In a system that involves user relevance feedback, the user is given an opportunity to inspect the ranked list (or a subset of it) and indicate which documents are relevant to the query and which are not.
This information is then used by the relevance feedback method to induce a new ranking of documents. The new ranking, possibly including new documents, is displayed to the user and the process repeats.

In relevance feedback systems, then, a user can give the system more information than a list of keywords or query terms, and can guide the search toward the right documents by means other than keywords. Although the relevance feedback idea has been around for more than thirty years and there is much ongoing research on these methods, none of the commercial search engines actually offers a relevance feedback mechanism. Why? Our hunch is that the existing methods are not robust enough to be used in commercial systems.

Our invention

We have invented a new method for relevance feedback that can be used safely in commercial systems. Our prototype system, based on Google, works well and retrieves relevant documents quickly.

Application areas

1. For web searches: Our relevance feedback system can be used with commercial search engines such as Google, Microsoft and Yahoo to expedite searches for relevant documents.
2. For searches in specialized databases: Such as medical, financial and manufacturing databases.
3. For searches in regular databases: Can be used with Oracle, IBM and Microsoft databases.
4. For searches using small devices: Web search is a huge problem on devices with small screens, such as cell phones, because one can view only a little at a time. Commercial web search engines (Google, Microsoft, Yahoo) are struggling with this problem and have no solution yet. Our new method could be extremely useful in getting to the right documents quickly on small devices with limited screen sizes.

Summary: Web search is a huge commercial application and it will continue to grow. Our invention could play a significant role in web searches.
Not only would users gain significantly in efficiency, but search engines would also benefit from reduced computational loads and infrastructure costs.
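
To make the relevance feedback loop described earlier concrete (rank, collect the user's judgments, re-rank), here is a minimal sketch in Python. It does not show the method of the invention, which this text does not describe. As a stand-in it uses the classic Rocchio formula, one of the long-standing relevance feedback techniques. The tiny document collection, the function names, and the weights alpha, beta and gamma are all illustrative assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_vec, docs):
    """Return document ids in decreasing order of similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query_vec, vectorize(docs[d])),
                  reverse=True)


def rocchio(query_vec, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio update (stand-in technique, weights are assumed values):
    move the query toward the mean of the relevant vectors and away from
    the mean of the irrelevant ones. Negative weights are kept as-is."""
    new = {t: alpha * w for t, w in query_vec.items()}
    for group, weight in ((relevant, beta), (irrelevant, -gamma)):
        for vec in group:
            for t, w in vec.items():
                new[t] = new.get(t, 0.0) + weight * w / len(group)
    return new


# Hypothetical collection: the keyword "jaguar" is ambiguous.
docs = {
    "d1": "jaguar car dealer price",
    "d2": "jaguar cat habitat rainforest",
    "d3": "jaguar car engine review",
    "d4": "big cat rainforest conservation",
}

query = vectorize("jaguar")
first = rank(query, docs)    # ['d1', 'd2', 'd3', 'd4']

# The user marks d2 as relevant and d1 as irrelevant.
updated = rocchio(query, [vectorize(docs["d2"])], [vectorize(docs["d1"])])
second = rank(updated, docs)  # ['d2', 'd4', 'd3', 'd1']

print(first)
print(second)
```

After one round of feedback the animal-related documents move to the top, including d4, which does not contain the original keyword at all. This illustrates how feedback can surface documents that the keywords alone would not.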
Published - Mar 15 2006