A personalised query expansion approach using context
Web users usually rely on search engines to find answers to a wide variety of questions. Although search engines can rapidly process a large number of Web documents, in many cases the answers they return are not relevant to the user’s information need, even though they contain the same keywords as the query. This is because the Web contains information sources created independently by numerous authors, whose vocabularies vary greatly. Furthermore, most words in natural languages are inherently ambiguous. This vocabulary mismatch between user queries and Web sources is often addressed through query expansion. Moreover, user queries are often short, and search results tend to improve when the query is longer. Various query expansion methods that add useful query-related terms before the query is processed have been proposed and shown to improve the quality of the results. Some of these methods add contextual information related to the user and the query. Human communication, by contrast, is usually successful and appears effortless. This is mainly due to the understanding of language and the world knowledge that humans have. Human communication is more successful when the participants have an implicit understanding of one another’s everyday situations. This implicit situational information, or “context”, that humans share enables more meaningful interaction among them. As in human–human communication, improving computers’ access to context can increase the richness of human–computer communication and provide more useful computational services to users. Based on these observations, this research proposes a method that makes use of context to understand and process user requests. Here, the term “context” means the meanings associated with key query terms and the preferences that have to be decided in order to process the query.
As in a natural environment, the results an automated system produces for the same query could vary from user to user. If the system knows a user’s preferences related to the query, it can use them when processing the query and so produce more relevant and useful results for that user. Hence, a new approach to personalised query expansion is proposed in this research, in which user queries are expanded with user preferences, so that the expanded queries used for processing differ between users. An architecture that allows a Web application to carry out personalised query expansion with contextual information is also proposed in the thesis. The preferences that can be used for the query expansion are therefore user-specific. Users have different sets of preferences depending on the tasks they want to perform. Similar tasks that share the same types of preferences can be grouped into task-based domains. Hence, a user’s preferences are the same within a domain and vary across domains. Furthermore, different types of subtasks can be performed within a domain. The set of preferences that can be used for each subtask may vary, and it is a subset of the set of preferences of the domain. Hence, an approach to personalised query expansion that adds user-, domain- and task-specific preferences to user queries is proposed in this research. The main stages of this expansion are identified and discussed in this thesis. Each stage requires different contextual information, which is represented in the context model. Of the main stages identified in the query expansion process, the first three (domain identification, task identification, and missing parameter identification) are explored in the thesis. As the preferences used for the expansion depend on the query domain, the domain of the query has to be identified first.
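The preference structure described above (a set of preference types per domain, with each subtask using a subset of it, and user-specific values appended to the query) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Domain` class, the `expand_query` function, the travel domain, and the preference names are hypothetical and are not taken from the thesis, and appending terms to the query string is only one simple way of expanding it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical task-based domain with its preference types."""
    name: str
    preference_types: set[str]
    # Maps each subtask to the preference types it uses.
    subtasks: dict[str, set[str]] = field(default_factory=dict)

    def add_subtask(self, task: str, types: set[str]) -> None:
        # A subtask may only use a subset of the domain's preference types.
        if not types <= self.preference_types:
            raise ValueError(f"{task}: preference types outside domain {self.name}")
        self.subtasks[task] = types


def expand_query(query: str, domain: Domain, task: str,
                 user_prefs: dict[str, str]) -> str:
    """Append the user's values for the preference types of the given subtask.

    Preference types with no stored value, or whose value already occurs
    in the query, are skipped.
    """
    terms = []
    for pref_type in sorted(domain.subtasks[task]):
        value = user_prefs.get(pref_type)
        if value and value.lower() not in query.lower():
            terms.append(value)
    return " ".join([query, *terms])


# Illustrative data only.
travel = Domain("travel", {"airline", "seat_class", "hotel_rating"})
travel.add_subtask("book_flight", {"airline", "seat_class"})

alice = {"airline": "Qantas", "seat_class": "economy", "hotel_rating": "4-star"}
bob = {"airline": "Emirates"}

print(expand_query("flights to Sydney", travel, "book_flight", alice))
print(expand_query("flights to Sydney", travel, "book_flight", bob))
```

In this sketch the same query yields different expanded queries for the two users, and the hotel preference is ignored because it does not belong to the flight-booking subtask.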
Hence, a domain identification algorithm that makes use of eight different features is proposed in the thesis to identify the domains of given queries. Domain identification also reduces the ambiguity of query terms: once the query domain is identified, the context, that is, the associated meanings of the query terms, is known, which limits the scope for misinterpreting them. The domain identification algorithm uses a domain ontology, a domain dictionary, and a user profile. The domain ontology consists of objects and their categories, attributes of objects and their categories, relationships among objects, and instances and their categories in the domain. The domain dictionary consists of objects and attributes and is created automatically from the domain ontology. The user profile holds the long-term preferences of the user, both domain-specific and general. Once the domain of the query is known, the task specified in the query has to be identified in order to decide the user’s preferences. This task identification process is found to be similar in domains with similar activities, so domains are grouped at this stage. The domain groups, and the rules that can be used to identify the tasks within them, are identified and discussed in the thesis. For each subtask in the domain groups, the types of preferences that can be used are identified and then used to expand user queries. An experiment is designed to evaluate the performance of the proposed approach. The first three stages of the query expansion (domain identification, task identification, and missing parameter identification) are implemented and evaluated. Samples of five domains are implemented, and queries in these domains are collected from various users. The system provides a wizard for creating new domains.
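The relationship between the domain ontology, the domain dictionary, and domain identification can be sketched as follows. This is not the eight-feature algorithm proposed in the thesis, whose features are not listed here. It is a deliberately simplified stand-in that uses a single assumed feature, the number of query words found in each domain dictionary. The ontology layout, the function names `build_dictionary` and `identify_domain`, and the sample domains are all hypothetical.

```python
from __future__ import annotations

import re


def build_dictionary(ontology: dict[str, list[str]]) -> set[str]:
    """Derive a domain dictionary (objects and attributes) from a toy ontology.

    The toy ontology maps each object name to a list of its attribute names.
    """
    terms: set[str] = set()
    for obj, attributes in ontology.items():
        terms.add(obj.lower())
        terms.update(attribute.lower() for attribute in attributes)
    return terms


def identify_domain(query: str, dictionaries: dict[str, set[str]]) -> str | None:
    """Return the domain whose dictionary shares the most words with the query.

    Returns None when no dictionary matches any query word. Ties go to the
    domain listed first in `dictionaries`.
    """
    tokens = set(re.findall(r"[a-z]+", query.lower()))
    scores = {name: len(tokens & terms) for name, terms in dictionaries.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Illustrative ontologies only.
ontologies = {
    "travel": {"flight": ["airline", "fare"], "hotel": ["rating", "price"]},
    "library": {"book": ["author", "title"], "journal": ["volume", "issue"]},
}
dictionaries = {name: build_dictionary(o) for name, o in ontologies.items()}

print(identify_domain("cheap flight with a good airline", dictionaries))
print(identify_domain("book by author Smith", dictionaries))
print(identify_domain("weather tomorrow", dictionaries))
```

A word such as "book" shows why a single feature is not enough: in "book a flight" it matches the library dictionary even though the query belongs to the travel domain, which is the kind of ambiguity that additional features and the user profile are meant to resolve.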
The system also allows the existing domains, domain groups, and the types of preferences in the subtasks of the domain groups to be edited. Instances of the attributes are identified manually and added to the system through the interface it provides. At each stage of the query expansion, the results for the queries are identified manually and compared with the results produced by the system. The results confirm that the proposed method has a positive impact on query expansion. The experiments, results and evaluation of the proposed query expansion approach are also presented in the thesis. The proposed approach could be used by search engines, by organisations with a limited set of task domains, and by any application that can be improved by personalised query expansion.