Search (112 results, page 5 of 6)

  • Filter: theme_ss:"Retrievalalgorithmen"
  1. Shah, B.; Raghavan, V.; Dhatric, P.; Zhao, X.: A cluster-based approach for efficient content-based image retrieval using a similarity-preserving space transformation method (2006) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 118) [ClassicSimilarity], result of:
          0.03869732 = score(doc=118,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 118, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=118)
      0.25 = coord(1/4)
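
    The score explanation above (repeated with a different document ID for each hit below) is Lucene's ClassicSimilarity TF-IDF breakdown for the single matching term "have" with an in-field frequency of 2. A minimal sketch that reproduces the listed numbers from their components; queryNorm and fieldNorm are taken directly from the listing:

      # Reproduces the ClassicSimilarity explanation above from its components
      # (idf and tf follow Lucene's classic TF-IDF formulas).
      import math

      max_docs, doc_freq, freq = 44421, 5157, 2.0
      idf = 1.0 + math.log((max_docs + 1) / (doc_freq + 1))   # 3.1531634
      tf = math.sqrt(freq)                                    # 1.4142135
      query_norm = 0.07045517                                 # as listed
      field_norm = 0.0390625                                  # length norm, as listed

      query_weight = idf * query_norm                         # 0.22215667
      field_weight = tf * idf * field_norm                    # 0.17418933
      term_weight = query_weight * field_weight               # 0.03869732
      score = term_weight * 0.25                              # coord(1/4)
      print(round(score, 8))                                  # ~0.00967433, the listed score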
    
    Abstract
    The techniques of clustering and space transformation have been successfully used in the past to solve a number of pattern recognition problems. In this article, the authors propose a new approach to content-based image retrieval (CBIR) that uses (a) a newly proposed similarity-preserving space transformation method to transform the original low-level image space into a high-level vector space that enables efficient query processing, and (b) a clustering scheme that further improves the efficiency of the retrieval system. This combination is unique, and the resulting system provides the synergistic advantages of both clustering and space transformation. The proposed space transformation method is shown to preserve the order of the distances in the transformed feature space. This makes the retrieval approach generic, as it can be applied to object types other than images and to feature spaces more general than metric spaces. The CBIR approach uses the inexpensive "estimated" distance in the transformed space, rather than the computationally expensive "real" distance in the original space, to retrieve the desired results for a given query image. The authors also provide a theoretical analysis of the complexity of their CBIR approach when used for color-based retrieval, which shows that it is computationally more efficient than other comparable approaches. An extensive set of experiments to test the efficiency and effectiveness of the proposed approach has been performed. The results show that the approach offers superior response time (an improvement of 1-2 orders of magnitude over retrieval approaches that use either pruning techniques such as indexing and clustering, or space transformation, but not both) with sufficiently high retrieval accuracy.
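
    As a rough illustration of the pattern described above (map objects into a vector space whose cheap distances respect the ordering of the expensive original distances, then cluster that space to prune the search), here is a generic pivot-embedding sketch; the pivot-based transformation, the Voronoi-style clustering, and all parameters are illustrative assumptions, not the authors' specific method:

      # Generic sketch of a distance-order-preserving transformation plus clustering.
      import numpy as np

      rng = np.random.default_rng(0)
      features = rng.random((1000, 64))                 # stand-in for low-level image features

      def real_dist(a, b):                              # expensive "real" distance
          return float(np.linalg.norm(a - b))

      pivots = features[rng.choice(len(features), 8, replace=False)]

      def transform(x):
          # Map an object to its vector of distances to fixed pivot objects.
          return np.array([real_dist(x, p) for p in pivots])

      embedded = np.vstack([transform(x) for x in features])
      clusters = embedded.argmin(axis=1)                # crude clustering: nearest pivot

      def estimated_dist(qe, xe):
          # Cheap estimate; by the triangle inequality it never exceeds the real
          # distance, so the distance ordering is largely preserved.
          return float(np.max(np.abs(qe - xe)))

      def search(query, top_k=5, probe_clusters=2):
          qe = transform(query)
          nearest = np.argsort(qe)[:probe_clusters]     # only probe the closest clusters
          candidates = np.flatnonzero(np.isin(clusters, nearest))
          ranked = sorted(candidates, key=lambda i: estimated_dist(qe, embedded[i]))
          return ranked[:top_k]

      print(search(features[42]))                       # the query object itself ranks first
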
  2. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 1056) [ClassicSimilarity], result of:
          0.03869732 = score(doc=1056,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 1056, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1056)
      0.25 = coord(1/4)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
  3. Dannenberg, R.B.; Birmingham, W.P.; Pardo, B.; Hu, N.; Meek, C.; Tzanetakis, G.: A comparative evaluation of search techniques for query-by-humming using the MUSART testbed (2007) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 1269) [ClassicSimilarity], result of:
          0.03869732 = score(doc=1269,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 1269, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1269)
      0.25 = coord(1/4)
    
    Abstract
    Query-by-humming systems offer content-based searching for melodies and require no special musical training or knowledge. Many such systems have been built, but there has not been much useful evaluation and comparison in the literature due to the lack of shared databases and queries. The MUSART project testbed allows various search algorithms to be compared using a shared framework that automatically runs experiments and summarizes results. Using this testbed, the authors compared algorithms based on string alignment, melodic contour matching, a hidden Markov model, n-grams, and CubyHum. Retrieval performance is very sensitive to distance functions and the representation of pitch and rhythm, which raises questions about some previously published conclusions. Some algorithms are particularly sensitive to the quality of queries. Our queries, which are taken from human subjects in a realistic setting, are quite difficult, especially for n-gram models. Finally, simulations on query-by-humming performance as a function of database size indicate that retrieval performance falls only slowly as the database size increases.
  4. Li, X.: A new robust relevance model in the language model framework (2008) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 3076) [ClassicSimilarity], result of:
          0.03869732 = score(doc=3076,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 3076, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=3076)
      0.25 = coord(1/4)
    
    Abstract
    In this paper, a new robust relevance model is proposed that can be applied to both pseudo and true relevance feedback in the language-modeling framework for document retrieval. There are at least three main differences between this new relevance model and other relevance models. First, the proposed model brings the original query back into the relevance model by treating it as a short, special document, in addition to a number of top-ranked documents returned from the first-round retrieval for pseudo feedback, or a number of relevant documents for true relevance feedback. Second, instead of using a uniform prior as in the original relevance model proposed by Lavrenko and Croft, documents are assigned different priors according to their lengths (in terms) and their ranks in the first-round retrieval. Third, the probability of a term in the relevance model is further adjusted by its probability in a background language model. In both pseudo and true relevance cases, we compared the performance of our model to that of two baselines: the original relevance model and a linear combination model. Our experimental results show that the proposed new model outperforms both baselines in terms of mean average precision.
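
    A minimal sketch of the general shape of such a relevance model for the pseudo-feedback case; the length/rank prior and the background adjustment below are illustrative assumptions rather than the paper's exact formulas:

      # Sketch: relevance model estimated from top-ranked feedback documents plus
      # the query treated as one more short document; priors are assumptions.
      from collections import Counter

      def relevance_model(query_terms, feedback_docs, collection_tokens, mu=0.6):
          # Treat the query itself as an additional (short) feedback document.
          docs = [list(query_terms)] + [list(d) for d in feedback_docs]
          # Simple prior: shorter documents and higher ranks get more weight.
          priors = [1.0 / ((rank + 1) * max(len(d), 1)) for rank, d in enumerate(docs)]
          z = sum(priors)
          priors = [p / z for p in priors]

          background = Counter(collection_tokens)
          bg_total = sum(background.values())

          model = Counter()
          for d, prior in zip(docs, priors):
              for w, c in Counter(d).items():
                  model[w] += prior * c / len(d)        # P(w|d), weighted by the prior

          # Adjust each term by its background probability (down-weights common words).
          adjusted = {w: p * (1.0 - mu * background[w] / bg_total) for w, p in model.items()}
          norm = sum(adjusted.values())
          return {w: p / norm for w, p in adjusted.items()}

      rm = relevance_model(
          ["query", "expansion"],
          [["query", "terms", "expansion", "feedback"], ["terms", "weighting", "feedback"]],
          ["query", "terms", "expansion", "feedback", "weighting", "the", "the", "the"],
      )
      print(sorted(rm.items(), key=lambda kv: -kv[1])[:3])
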
  5. Lavrenko, V.: A generative theory of relevance (2009) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 293) [ClassicSimilarity], result of:
          0.03869732 = score(doc=293,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 293, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=293)
      0.25 = coord(1/4)
    
    Abstract
    A modern information retrieval system must have the capability to find, organize and present very different manifestations of information - such as text, pictures, videos or database records - any of which may be of relevance to the user. However, the concept of relevance, while seemingly intuitive, is actually hard to define, and it's even harder to model in a formal way. Lavrenko does not attempt to bring forth a new definition of relevance, nor provide arguments as to why any particular definition might be theoretically superior or more complete. Instead, he takes a widely accepted, albeit somewhat conservative definition, makes several assumptions, and from them develops a new probabilistic model that explicitly captures that notion of relevance. With this book, he makes two major contributions to the field of information retrieval: first, a new way to look at topical relevance, complementing the two dominant models, i.e., the classical probabilistic model and the language modeling approach, and which explicitly combines documents, queries, and relevance in a single formalism; second, a new method for modeling exchangeable sequences of discrete random variables which does not make any structural assumptions about the data and which can also handle rare events. Thus his book is of major interest to researchers and graduate students in information retrieval who specialize in relevance modeling, ranking algorithms, and language modeling.
  6. Lee, J.-T.; Seo, J.; Jeon, J.; Rim, H.-C.: Sentence-based relevance flow analysis for high accuracy retrieval (2011) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 746) [ClassicSimilarity], result of:
          0.03869732 = score(doc=746,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=746)
      0.25 = coord(1/4)
    
    Abstract
    Traditional ranking models for information retrieval lack the ability to make a clear distinction between relevant and nonrelevant documents at top ranks if both have similar bag-of-words representations with regard to a user query. We aim to go beyond the bag-of-words approach to document ranking in a new perspective, by representing each document as a sequence of sentences. We begin with an assumption that relevant documents are distinguishable from nonrelevant ones by sequential patterns of relevance degrees of sentences to a query. We introduce the notion of relevance flow, which refers to a stream of sentence-query relevance within a document. We then present a framework to learn a function for ranking documents effectively based on various features extracted from their relevance flows and leverage the output to enhance existing retrieval models. We validate the effectiveness of our approach by performing a number of retrieval experiments on three standard test collections, each comprising a different type of document: news articles, medical references, and blog posts. Experimental results demonstrate that the proposed approach can improve the retrieval performance at the top ranks significantly as compared with the state-of-the-art retrieval models regardless of document type.
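
    The notion of a relevance flow can be illustrated as follows; the sentence-query similarity and the summary features are illustrative assumptions, not the paper's learned feature set:

      # Sketch: compute a per-sentence relevance stream for a document and
      # summarize it into features a downstream ranker could consume.
      import re

      def sentence_relevance(sentence, query_terms):
          tokens = set(re.findall(r"\w+", sentence.lower()))
          return len(tokens & query_terms) / max(len(query_terms), 1)

      def relevance_flow_features(document, query):
          query_terms = set(query.lower().split())
          sentences = re.split(r"(?<=[.!?])\s+", document)
          flow = [sentence_relevance(s, query_terms) for s in sentences]
          return {
              "mean": sum(flow) / len(flow),
              "max": max(flow),
              "first": flow[0],
              "last": flow[-1],
              "peak_position": flow.index(max(flow)) / len(flow),  # where relevance peaks
          }

      doc = "Ranking models score documents. Relevance flow looks at sentences. The query matters."
      print(relevance_flow_features(doc, "relevance ranking"))
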
  7. Liu, R.-L.; Huang, Y.-C.: Ranker enhancement for proximity-based ranking of biomedical texts (2011) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 947) [ClassicSimilarity], result of:
          0.03869732 = score(doc=947,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 947, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=947)
      0.25 = coord(1/4)
    
    Abstract
    Biomedical decision making often requires relevant evidence from the biomedical literature. Retrieval of the evidence calls for a system that receives a natural language query for a biomedical information need and, among the huge number of texts retrieved for the query, ranks relevant texts higher for further processing. However, state-of-the-art text rankers have weaknesses in dealing with biomedical queries, which often consist of several correlated concepts and favor texts that discuss all of the concepts together. In this article, we present a technique, Proximity-Based Ranker Enhancer (PRE), to enhance text rankers with term-proximity information. PRE assesses the term frequency (TF) of each term in the text by integrating three types of term proximity to measure the contextual completeness of query terms appearing in nearby areas of the text being ranked. Therefore, PRE may serve as a preprocessor for (or supplement to) rankers that consider TF in ranking, without the need to change the algorithms and development processes of those rankers. Empirical evaluation shows that PRE significantly improves various kinds of text rankers, and when compared with several state-of-the-art techniques that enhance rankers with term-proximity information, PRE enhances the rankers more stably and significantly.
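
    A sketch of the underlying idea, a proximity-adjusted term frequency that a TF-based ranker could consume unchanged; the window size and the bonus formula are illustrative assumptions, not the three PRE proximity types:

      # Sketch: each occurrence of a query term counts more when other query
      # terms occur nearby; the adjusted counts replace raw TF in a ranker.
      def proximity_adjusted_tf(doc_tokens, query_terms, window=5):
          positions = {t: [i for i, tok in enumerate(doc_tokens) if tok == t]
                       for t in query_terms}
          adjusted = {}
          for term in query_terms:
              score = 0.0
              for pos in positions[term]:
                  # Count distinct *other* query terms appearing within the window.
                  nearby = sum(
                      1 for other in query_terms if other != term
                      and any(abs(pos - p) <= window for p in positions[other])
                  )
                  score += 1.0 + nearby          # raw occurrence plus a proximity bonus
              adjusted[term] = score
          return adjusted

      doc = "gene expression regulates protein expression in the cell".split()
      print(proximity_adjusted_tf(doc, ["gene", "expression", "protein"]))
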
  8. Lalmas, M.: XML retrieval (2009) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 998) [ClassicSimilarity], result of:
          0.03869732 = score(doc=998,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 998, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=998)
      0.25 = coord(1/4)
    
    Abstract
    Documents usually have a content and a structure. The content refers to the text of the document, whereas the structure refers to how a document is logically organized. An increasingly common way to encode the structure is through the use of a mark-up language. Nowadays, the most widely used mark-up language for representing structure is the eXtensible Mark-up Language (XML). XML can be used to provide focused access to documents, i.e. returning XML elements, such as sections and paragraphs, instead of whole documents in response to a query. Such focused strategies are of particular benefit for information repositories containing long documents, or documents covering a wide variety of topics, where users are directed to the most relevant content within a document. The increased adoption of XML to represent document structure requires the development of tools to effectively access documents marked up in XML. This book provides a detailed description of the query languages, indexing strategies, ranking algorithms, and presentation scenarios developed to access XML documents. Major advances in XML retrieval were seen from 2002 as a result of INEX, the Initiative for the Evaluation of XML Retrieval. INEX, also described in this book, provided test sets for evaluating XML retrieval effectiveness. Many of the developments and results described in this book were investigated within INEX.
  9. Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 2283) [ClassicSimilarity], result of:
          0.03869732 = score(doc=2283,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 2283, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2283)
      0.25 = coord(1/4)
    
    Abstract
    While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance.
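
    The mixture idea can be sketched as scoring the context of a query-term occurrence with both unigram and bigram evidence, using IDF weights and pruning high-document-frequency bigrams; the mixture weight, window size, pruning threshold, and toy statistics are illustrative assumptions:

      # Sketch: context score for a query-term occurrence from a mixture of
      # unigram and bigram evidence, with IDF weighting and bigram pruning.
      import math

      def idf(term, doc_freq, n_docs):
          return math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1))

      def context_score(tokens, position, doc_freq, n_docs,
                        window=4, lam=0.7, max_bigram_df=1000):
          lo, hi = max(0, position - window), min(len(tokens), position + window + 1)
          context = tokens[lo:hi]
          uni = sum(idf(t, doc_freq, n_docs) for t in context)
          bigrams = [" ".join(pair) for pair in zip(context, context[1:])]
          # Prune noisy bigrams whose document frequency is too large.
          bigrams = [b for b in bigrams if doc_freq.get(b, 0) <= max_bigram_df]
          bi = sum(idf(b, doc_freq, n_docs) for b in bigrams)
          return lam * uni + (1 - lam) * bi      # mixture of unigram and bigram evidence

      doc_freq = {"retrieval": 5000, "context": 800, "dependent": 600, "term": 7000,
                  "weights": 2000, "improve": 3000,
                  "context dependent": 40, "dependent term": 15, "term weights": 90}
      tokens = "context dependent term weights improve retrieval".split()
      print(round(context_score(tokens, tokens.index("term"), doc_freq, n_docs=100000), 2))
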
  10. Ozdemiray, A.M.; Altingovde, I.S.: Explicit search result diversification using score and rank aggregation methods (2015) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 2856) [ClassicSimilarity], result of:
          0.03869732 = score(doc=2856,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 2856, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2856)
      0.25 = coord(1/4)
    
    Abstract
    Search result diversification is one of the key techniques to cope with the ambiguous and underspecified information needs of web users. In the last few years, strategies that are based on the explicit knowledge of query aspects emerged as highly effective ways of diversifying search results. Our contributions in this article are two-fold. First, we extensively evaluate the performance of a state-of-the-art explicit diversification strategy and pin-point its potential weaknesses. We propose basic yet novel optimizations to remedy these weaknesses and boost the performance of this algorithm. As a second contribution, inspired by the success of the current diversification strategies that exploit the relevance of the candidate documents to individual query aspects, we cast the diversification problem into the problem of ranking aggregation. To this end, we propose to materialize the re-rankings of the candidate documents for each query aspect and then merge these rankings by adapting the score(-based) and rank(-based) aggregation methods. Our extensive experimental evaluations show that certain ranking aggregation methods are superior to existing explicit diversification strategies in terms of diversification effectiveness. Furthermore, these ranking aggregation methods have lower computational complexity than the state-of-the-art diversification strategies.
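
    The rank-aggregation step can be illustrated with Borda count, a standard rank-based method (used here only for illustration; the paper adapts several score- and rank-based methods). Each query aspect contributes its own re-ranking of the candidate documents, and the merged order balances them:

      # Sketch: merge per-aspect rankings of candidate documents with Borda count.
      def borda_merge(aspect_rankings, aspect_weights=None):
          if aspect_weights is None:
              aspect_weights = [1.0] * len(aspect_rankings)
          scores = {}
          for ranking, w in zip(aspect_rankings, aspect_weights):
              n = len(ranking)
              for rank, doc in enumerate(ranking):
                  scores[doc] = scores.get(doc, 0.0) + w * (n - rank)  # top rank -> n points
          return sorted(scores, key=scores.get, reverse=True)

      # One ranking per aspect of an ambiguous query such as "jaguar".
      rankings = [
          ["d3", "d1", "d2", "d4"],   # aspect: animal
          ["d2", "d4", "d1", "d3"],   # aspect: car
          ["d1", "d3", "d2", "d4"],   # aspect: operating system
      ]
      print(borda_merge(rankings))
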
  11. Bhansali, D.; Desai, H.; Deulkar, K.: A study of different ranking approaches for semantic search (2015) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 3696) [ClassicSimilarity], result of:
          0.03869732 = score(doc=3696,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 3696, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=3696)
      0.25 = coord(1/4)
    
    Abstract
    Search engines have become an integral part of our day-to-day life, and our reliance on them increases with every passing day. With the amount of data available on the Internet growing exponentially, it becomes important to develop new methods and tools that return results relevant to the queries and reduce the time spent searching. The results should be diverse but at the same time focused on the queries asked. Relation-based PageRank [4] algorithms are considered the next frontier in improving Semantic Web search. The probability of finding relevance in the search results, as posited by the user while entering the query, is used to measure relevance. However, their application is limited by the complexity of determining the relations between terms and assigning an explicit meaning to each term. TrustRank is one of the most widely used ranking algorithms for Semantic Web search. A few other ranking algorithms, such as HITS and PageRank, are also used for Semantic Web searching. In this paper, we provide a comparison of a few ranking approaches.
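
    Of the algorithms surveyed above, PageRank has the most compact formulation; a minimal power-iteration sketch over a toy link graph (the damping factor 0.85 is the conventional choice and the graph is illustrative):

      # Minimal PageRank power iteration over a toy link graph.
      def pagerank(links, damping=0.85, iterations=50):
          nodes = list(links)
          rank = {n: 1.0 / len(nodes) for n in nodes}
          for _ in range(iterations):
              new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
              for n, outgoing in links.items():
                  if not outgoing:                       # dangling node: spread evenly
                      for m in nodes:
                          new_rank[m] += damping * rank[n] / len(nodes)
                  else:
                      for m in outgoing:
                          new_rank[m] += damping * rank[n] / len(outgoing)
              rank = new_rank
          return rank

      links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
      print({n: round(r, 3) for n, r in pagerank(links).items()})
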
  12. Jiang, X.; Sun, X.; Yang, Z.; Zhuge, H.; Lapshinova-Koltunski, E.; Yao, J.: Exploiting heterogeneous scientific literature networks to combat ranking bias : evidence from the computational linguistics area (2016) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 4017) [ClassicSimilarity], result of:
          0.03869732 = score(doc=4017,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 4017, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=4017)
      0.25 = coord(1/4)
    
    Abstract
    It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computational linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, FutureRank, and P-Rank, in ranking papers, both in improving ranking effectiveness and in alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
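
    The mutual-reinforcement idea, papers, researchers, and venues lending each other scores across their linking networks, can be sketched as below; the mixing weights, the update rules, and the toy data are illustrative assumptions, not the MutualRank formulation:

      # Sketch: iterative mutual reinforcement between papers, researchers, venues.
      def mutual_rank(cited_by, authors, venue, iterations=30):
          papers = list(authors)
          p_score = {p: 1.0 / len(papers) for p in papers}
          for _ in range(iterations):
              # Researchers and venues are scored by the papers linked to them.
              a_score, v_score = {}, {}
              for p in papers:
                  for a in authors[p]:
                      a_score[a] = a_score.get(a, 0.0) + p_score[p]
                  v_score[venue[p]] = v_score.get(venue[p], 0.0) + p_score[p]
              # Papers mix citation support with their authors' and venue's scores.
              new_p = {}
              for p in papers:
                  cite = sum(p_score[q] for q in cited_by.get(p, []))
                  auth = sum(a_score[a] for a in authors[p]) / len(authors[p])
                  new_p[p] = 0.5 * cite + 0.3 * auth + 0.2 * v_score[venue[p]]
              total = sum(new_p.values())
              p_score = {p: s / total for p, s in new_p.items()}
          return p_score

      cited_by = {"p1": ["p2"], "p2": ["p1", "p3"], "p3": []}   # papers citing each paper
      authors = {"p1": ["alice"], "p2": ["alice", "bob"], "p3": ["bob"]}
      venue = {"p1": "ACL", "p2": "ACL", "p3": "EMNLP"}
      print({p: round(s, 3) for p, s in mutual_rank(cited_by, authors, venue).items()})
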
  13. Hubert, G.; Pitarch, Y.; Pinel-Sauvagnat, K.; Tournier, R.; Laporte, L.: TournaRank : when retrieval becomes document competition (2018) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 87) [ClassicSimilarity], result of:
          0.03869732 = score(doc=87,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 87, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=87)
      0.25 = coord(1/4)
    
    Abstract
    Numerous feature-based models have been recently proposed by the information retrieval community. The capability of features to express different relevance facets (query- or document-dependent) can explain such a success story. Such models are most of the time supervised, thus requiring a learning phase. To leverage the advantages of feature-based representations of documents, we propose TournaRank, an unsupervised approach inspired by real-life game and sport competition principles. Documents compete against each other in tournaments using features as evidences of relevance. Tournaments are modeled as a sequence of matches, which involve pairs of documents playing in turn their features. Once a tournament is ended, documents are ranked according to their number of won matches during the tournament. This principle is generic since it can be applied to any collection type. It also provides great flexibility since different alternatives can be considered by changing the tournament type, the match rules, the feature set, or the strategies adopted by documents during matches. TournaRank was experimented on several collections to evaluate our model in different contexts and to compare it with related approaches such as Learning To Rank and fusion ones: the TREC Robust2004 collection for homogeneous documents, the TREC Web2014 (ClueWeb12) collection for heterogeneous web documents, and the LETOR3.0 collection for comparison with supervised feature-based models.
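
    A sketch of the tournament principle: documents meet in pairwise matches, play their features in turn, and are finally ranked by matches won. The round-robin schedule and the majority-of-features match rule are simple illustrative choices among the many tournament types and match strategies the model allows:

      # Sketch: round-robin document tournament with a majority-of-features rule.
      from itertools import combinations

      def play_match(feats_a, feats_b):
          wins_a = sum(1 for fa, fb in zip(feats_a, feats_b) if fa > fb)
          wins_b = sum(1 for fa, fb in zip(feats_a, feats_b) if fb > fa)
          return 1 if wins_a > wins_b else (-1 if wins_b > wins_a else 0)

      def tournarank_sketch(documents):
          wins = {doc: 0 for doc in documents}
          for a, b in combinations(documents, 2):        # round-robin tournament
              outcome = play_match(documents[a], documents[b])
              if outcome > 0:
                  wins[a] += 1
              elif outcome < 0:
                  wins[b] += 1
          return sorted(wins, key=wins.get, reverse=True)

      # Feature vectors acting as evidences of relevance (e.g. BM25, PageRank, freshness).
      documents = {"d1": [0.9, 0.2, 0.5], "d2": [0.4, 0.8, 0.7], "d3": [0.3, 0.1, 0.6]}
      print(tournarank_sketch(documents))
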
  14. Jiang, J.-D.; Jiang, J.-Y.; Cheng, P.-J.: Cocluster hypothesis and ranking consistency for relevance ranking in web search (2019) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 247) [ClassicSimilarity], result of:
          0.03869732 = score(doc=247,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 247, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=247)
      0.25 = coord(1/4)
    
    Abstract
    Conventional approaches to relevance ranking typically optimize ranking models by each query separately. The traditional cluster hypothesis also does not consider the dependency between related queries. The goal of this paper is to leverage similar search intents to perform ranking consistency so that the search performance can be improved accordingly. Different from the previous supervised approach, which learns relevance by click-through data, we propose a novel cocluster hypothesis to bridge the gap between relevance ranking and ranking consistency. A nearest-neighbors test is also designed to measure the extent to which the cocluster hypothesis holds. Based on the hypothesis, we further propose a two-stage unsupervised approach, in which two ranking heuristics and a cost function are developed to optimize the combination of consistency and uniqueness (or inconsistency). Extensive experiments have been conducted on a real and large-scale search engine log. The experimental results not only verify the applicability of the proposed cocluster hypothesis but also show that our approach is effective in boosting the retrieval performance of the commercial search engine and reaches a comparable performance to the supervised approach.
  15. González-Ibáñez, R.; Esparza-Villamán, A.; Vargas-Godoy, J.C.; Shah, C.: A comparison of unimodal and multimodal models for implicit detection of relevance in interactive IR (2019) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 417) [ClassicSimilarity], result of:
          0.03869732 = score(doc=417,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 417, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=417)
      0.25 = coord(1/4)
    
    Abstract
    Implicit detection of relevance has been approached by many during the last decade. From the use of individual measures to the use of multiple features from different sources (multimodality), studies have shown that it is feasible to automatically detect whether a document is relevant. Despite promising results, it is not yet clear to what extent multimodality constitutes an effective approach compared to unimodality. In this article, we hypothesize that it is possible to build unimodal models capable of outperforming multimodal models in the detection of perceived relevance. To test this hypothesis, we conducted three experiments to compare unimodal and multimodal classification models built using a combination of 24 features. Our classification experiments showed that a univariate unimodal model based on the left-click feature supports our hypothesis. On the other hand, our prediction experiment suggests that multimodality slightly improves early classification compared to the best unimodal models. Based on our results, we argue that the feasibility of practical applications of state-of-the-art multimodal approaches may be strongly constrained by technological, cultural, ethical, and legal aspects, in which case unimodality may offer a better alternative today for supporting relevance detection in interactive information retrieval systems.
  16. Krasakis, A.M.; Yates, A.; Kanoulas, E.: Corpus-informed Retrieval Augmented Generation of Clarifying Questions (2024) 0.01
    0.00967433 = product of:
      0.03869732 = sum of:
        0.03869732 = weight(_text_:have in 2369) [ClassicSimilarity], result of:
          0.03869732 = score(doc=2369,freq=2.0), product of:
            0.22215667 = queryWeight, product of:
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.07045517 = queryNorm
            0.17418933 = fieldWeight in 2369, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1531634 = idf(docFreq=5157, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2369)
      0.25 = coord(1/4)
    
    Abstract
    This study aims to develop models that generate corpus-informed clarifying questions for web search, in a way that ensures the questions align with the available information in the retrieval corpus. We demonstrate the effectiveness of Retrieval Augmented Language Models (RAG) in this process, emphasising their ability to (i) jointly model the user query and retrieval corpus to pinpoint the uncertainty and ask for clarifications end-to-end and (ii) model more evidence documents, which can be used towards increasing the breadth of the questions asked. However, we observe that in current datasets search intents are largely unsupported by the corpus, which is problematic both for training and evaluation. This causes question generation models to "hallucinate", i.e., suggest intents that are not in the corpus, which can have detrimental effects on performance. To address this, we propose dataset augmentation methods that align the ground truth clarifications with the retrieval corpus. Additionally, we explore techniques to enhance the relevance of the evidence pool during inference, but find that identifying ground truth intents within the corpus remains challenging. Our analysis suggests that this challenge is partly due to the bias of current datasets towards clarification taxonomies and calls for data that can support generating corpus-informed clarifications.
  17. Hüther, H.: Selix im DFG-Projekt Kascade (1998) 0.01
    0.009572854 = product of:
      0.038291417 = sum of:
        0.038291417 = weight(_text_:und in 6151) [ClassicSimilarity], result of:
          0.038291417 = score(doc=6151,freq=2.0), product of:
            0.15626246 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.07045517 = queryNorm
            0.24504554 = fieldWeight in 6151, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.078125 = fieldNorm(doc=6151)
      0.25 = coord(1/4)
    
    Source
    Knowledge Management und Kommunikationssysteme: Proceedings des 6. Internationalen Symposiums für Informationswissenschaft (ISI '98) Prag, 3.-7. November 1998 / Hochschulverband für Informationswissenschaft (HI) e.V. Konstanz ; Fachrichtung Informationswissenschaft der Universität des Saarlandes, Saarbrücken. Hrsg.: Harald H. Zimmermann u. Volker Schramm
  18. Oberhauser, O.; Labner, J.: Relevance Ranking in Online-Katalogen : Informationsstand und Perspektiven (2003) 0.01
    0.009476642 = product of:
      0.03790657 = sum of:
        0.03790657 = weight(_text_:und in 3188) [ClassicSimilarity], result of:
          0.03790657 = score(doc=3188,freq=4.0), product of:
            0.15626246 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.07045517 = queryNorm
            0.24258271 = fieldWeight in 3188, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0546875 = fieldNorm(doc=3188)
      0.25 = coord(1/4)
    
    Source
    Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare. 56(2003) H.3/4, S.49-63
  19. Weller, K.; Stock, W.G.: Transitive meronymy : automatic concept-based query expansion using weighted transitive part-whole relations (2008) 0.01
    0.009476642 = product of:
      0.03790657 = sum of:
        0.03790657 = weight(_text_:und in 2835) [ClassicSimilarity], result of:
          0.03790657 = score(doc=2835,freq=4.0), product of:
            0.15626246 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.07045517 = queryNorm
            0.24258271 = fieldWeight in 2835, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0546875 = fieldNorm(doc=2835)
      0.25 = coord(1/4)
    
    Abstract
    This theoretically oriented work isolates transitive part-whole relations. We discuss the use of meronymy for automatic concept-based query expansion in information retrieval. For practical reasons, we propose specifying the different kinds of part-whole relation and assigning each kind its own weighting value, which is then used during retrieval. For the design of knowledge organization systems it is significant that, within the hierarchy of an abstraction relation, a concept passes all of its parts (as well as all transitive parts of those parts) on to its narrower concepts.
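
    The weighting scheme can be sketched as a query expansion that walks the part-whole graph and multiplies the relation weights along each chain, so transitive parts enter the expansion with decayed weights; the relation weights and the toy meronymy are illustrative assumptions:

      # Sketch: weighted transitive part-whole query expansion.
      def expand_with_parts(concept, has_part, weight=1.0, min_weight=0.1, seen=None):
          seen = seen or set()
          expansion = {}
          for part, w in has_part.get(concept, []):
              combined = weight * w                      # weight decays along the chain
              if combined < min_weight or part in seen:
                  continue
              expansion[part] = max(expansion.get(part, 0.0), combined)
              deeper = expand_with_parts(part, has_part, combined, min_weight, seen | {concept})
              for p, cw in deeper.items():
                  expansion[p] = max(expansion.get(p, 0.0), cw)
          return expansion

      # Toy meronymy with weights per relation kind (e.g. component-of vs. member-of).
      has_part = {
          "car": [("engine", 0.9), ("wheel", 0.8)],
          "engine": [("piston", 0.9), ("spark plug", 0.7)],
          "wheel": [("tyre", 0.9)],
      }
      print(expand_with_parts("car", has_part))
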
    Source
    Information - Wissenschaft und Praxis. 59(2008) H.3, S.165-170
  20. Reimer, U.: Empfehlungssysteme (2023) 0.01
    0.009476642 = product of:
      0.03790657 = sum of:
        0.03790657 = weight(_text_:und in 1520) [ClassicSimilarity], result of:
          0.03790657 = score(doc=1520,freq=4.0), product of:
            0.15626246 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.07045517 = queryNorm
            0.24258271 = fieldWeight in 1520, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0546875 = fieldNorm(doc=1520)
      0.25 = coord(1/4)
    
    Abstract
    With the growing flood of information, the demands on information systems rise: from the mass of potentially relevant information they must select what is most relevant in a given context. Recommender systems play a special role here because they can filter out relevant information in a personalized way, i.e., specifically for the context and the individual user. Definition: In a defined context, a recommender system recommends to a user a subset of a given set of recommendation objects as relevant. Recommender systems draw users' attention to objects they might otherwise never have found, either because they would not have searched for them or because the objects would have drowned in the sheer mass of relevant information overall.
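
    A minimal sketch of the definition above in the style of user-based collaborative filtering: from the set of recommendation objects, select a subset the user has not seen but that similar users favored. The similarity measure and the toy data are illustrative assumptions:

      # Sketch: user-based collaborative filtering over sets of liked objects.
      def jaccard(a, b):
          return len(a & b) / len(a | b) if a | b else 0.0

      def recommend(user, liked_by_user, top_n=3):
          seen = liked_by_user[user]
          scores = {}
          for other, items in liked_by_user.items():
              if other == user:
                  continue
              sim = jaccard(seen, items)                  # how similar the two users are
              for item in items - seen:                   # only objects the user hasn't seen
                  scores[item] = scores.get(item, 0.0) + sim
          return sorted(scores, key=scores.get, reverse=True)[:top_n]

      liked_by_user = {
          "alice": {"retrieval", "ranking", "thesauri"},
          "bob": {"retrieval", "ranking", "recommenders"},
          "carol": {"thesauri", "ontologies"},
      }
      print(recommend("alice", liked_by_user))
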
    Source
    Grundlagen der Informationswissenschaft. Hrsg.: Rainer Kuhlen, Dirk Lewandowski, Wolfgang Semar und Christa Womser-Hacker. 7., völlig neu gefasste Ausg

Languages

  • e 75
  • d 36
  • m 1

Types

  • a 93
  • m 7
  • x 7
  • el 2
  • r 2
  • s 2
  • p 1