Personalised Search
Personalized search refers to search experiences that are tailored specifically to an individual’s interests by incorporating information about the individual beyond specific query provided.
Two general approaches to personalizing search results:
Modifying the user’s query is also called query expansion, query expansion involves evaluting a user’s input (typed into the search query area and sometimes other type of data), and expanding the search to match additional documents.
Re-ranking search results are also called Search results filtering/adaptation are based on personal preference, therefore, the data type can be various.
Personal information gathering:
Typically use supervised machine learning:
Relevant(Binary: Yes or No)
The personalised search narrows the search space. Therefore, it might limits the diversity of search result.
Privacy issues. Selling users’ profile or preference to other companies.
Motivation: It’s hard for user to type on Mobile. So the query can be short. PMSE aims to tailor search result to user’s preference.
Ontology: Formal naming, defining and ordering of the entities in a given domain, and their interrelationships.
ARR: Average Relevant Ranking
Entropy: The uncertainty about the information content
Top-N Precision: Precision of results, considering only those in the top $N$ entities
Client: The client is the user’s interface, it collects user’s data (clickthrough data, ontology data, history) and send them to the server. It displays the results. (Naive Bayes & Voting System)
Server: Responsile for forwarding queries to a search engine and extracts the ontological information from the results and stores it. Then, the ontological information combines with the input vectors in RSVM training to create weighted content. Then, combines with GPS data. In the end, use these information to re-rank the results.
Evaluation of Personalised search system
Quantitative & Qualitative
User rank all search results $\Rightarrow$ ARR and Top-N precision. There is no qualitative data collected in this experiment.
Terms
Forward links: a hyperlink to a page
Back links: a hyperlink to the current page
PageRank is based on two hypothesis:
Quantitative hypothesis: More back links, more important the page
Qualitative hypothesis: The link from highly ranked Page has high weight
Initial Stage: Built a network based on links. Assign same PageRank value to each page.
Update Stage: Every Page divide its PageRank value equally by the number of its forward links. Therefore, every link has its weight.
$Link weight = \frac{PageRank}{\parallel forward links \parallel}$
Because there are some dangling links which has 0 forward link. The PageRank formula makes use of damping factor $q$. The meaning of $q$ is the probability of a user continue browsing from current page. $1 - q$ is the probability that user stop browsing.
PageRank formula:
$PR = \sum_{n} \text{backlink’s weight} \times q \times (1 - q)$