Search (64 results, page 1 of 4)

  • theme_ss:"Retrievalalgorithmen"
  1. Reddaway, S.: High speed text retrieval from large databases on a massively parallel processor (1991) 0.07
    0.06864205 = product of:
      0.2745682 = sum of:
        0.2745682 = weight(_text_:high in 7744) [ClassicSimilarity], result of:
          0.2745682 = score(doc=7744,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.8428791 = fieldWeight in 7744, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.125 = fieldNorm(doc=7744)
      0.25 = coord(1/4)
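
     The explain trees in this listing follow Lucene's ClassicSimilarity (TF-IDF) formula: score = queryWeight × fieldWeight × coord, with tf = sqrt(freq) and idf = 1 + ln(maxDocs/(docFreq+1)). As a rough check, the score of the first entry can be reproduced from the factors listed above; the snippet below is a minimal sketch of that arithmetic in Python, not Lucene's own code.

       import math

       # Factors copied from the explain output of result 1 (doc 7744, term "high").
       freq       = 2.0          # term frequency in the field
       idf        = 4.7680445    # = 1 + ln(44421 / (1025 + 1))
       query_norm = 0.06831949
       field_norm = 0.125
       coord      = 0.25         # 1 of 4 query clauses matched

       tf           = math.sqrt(freq)           # 1.4142135...
       query_weight = idf * query_norm          # ~0.32575038
       field_weight = tf * idf * field_norm     # ~0.8428791
       score        = query_weight * field_weight * coord
       print(score)                             # ~0.0686420, matching the listed score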
    
  2. Cecchini, R.L.; Lorenzetti, C.M.; Maguitman, A.G.; Brignole, N.B.: Multiobjective evolutionary algorithms for context-based search (2010) 0.04
    0.044584323 = product of:
      0.17833729 = sum of:
        0.17833729 = weight(_text_:high in 469) [ClassicSimilarity], result of:
          0.17833729 = score(doc=469,freq=6.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.5474661 = fieldWeight in 469, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.046875 = fieldNorm(doc=469)
      0.25 = coord(1/4)
    
    Abstract
    Formulating high-quality queries is a key aspect of context-based search. However, determining the effectiveness of a query is challenging because multiple objectives, such as high precision and high recall, are usually involved. In this work, we study techniques that can be applied to evolve contextualized queries when the criteria for determining query quality are based on multiple objectives. We report on the results of three different strategies for evolving queries: (a) single-objective, (b) multiobjective with Pareto-based ranking, and (c) multiobjective with aggregative ranking. After a comprehensive evaluation with a large set of topics, we discuss the limitations of the single-objective approach and observe that both the Pareto-based and aggregative strategies are highly effective for evolving topical queries. In particular, our experiments lead us to conclude that the multiobjective techniques are superior to a baseline as well as to well-known and ad hoc query reformulation techniques.
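
     In the Pareto-based strategy mentioned in this abstract, candidate queries are compared on several objectives at once (for example precision and recall) and the non-dominated ones are preferred. The Python sketch below illustrates that dominance test on invented candidate scores; an aggregative strategy would instead collapse the objectives into a single value such as an F-measure.

       # Each candidate query is scored on two objectives: (precision, recall).
       candidates = {
           "q1": (0.80, 0.20),   # precise but narrow
           "q2": (0.60, 0.50),
           "q3": (0.55, 0.55),   # broad but less precise
           "q4": (0.50, 0.30),   # worse than q2 on both objectives
       }

       def dominates(a, b):
           """True if a is at least as good as b everywhere and better somewhere."""
           return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

       pareto_front = [
           q for q, score in candidates.items()
           if not any(dominates(other, score) for o, other in candidates.items() if o != q)
       ]
       print(pareto_front)   # ['q1', 'q2', 'q3'] -- q4 is dominated by q2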
  3. He, J.; Meij, E.; Rijke, M. de: Result diversification based on query-specific cluster ranking (2011) 0.04
    0.0371536 = product of:
      0.1486144 = sum of:
        0.1486144 = weight(_text_:high in 355) [ClassicSimilarity], result of:
          0.1486144 = score(doc=355,freq=6.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.45622173 = fieldWeight in 355, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=355)
      0.25 = coord(1/4)
    
    Abstract
     Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round-robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework, ranking and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework.
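
     The round-robin selection described above can be sketched in a few lines: given clusters ranked by their estimated share of relevant documents, and documents ranked within each cluster, take the next best document from each cluster in turn. The cluster contents below are invented for illustration.

       from itertools import zip_longest

       # Clusters already ranked by how likely they are to contain relevant documents;
       # inside each cluster, documents are ordered by their retrieval score.
       ranked_clusters = [
           ["d3", "d7", "d1"],   # best cluster
           ["d9", "d2"],
           ["d5", "d8", "d4"],
       ]

       def round_robin(clusters, k):
           """Interleave documents from the ranked clusters until k results are selected."""
           result = []
           for layer in zip_longest(*clusters):      # one document per cluster per round
               for doc in layer:
                   if doc is not None and len(result) < k:
                       result.append(doc)
           return result

       print(round_robin(ranked_clusters, 5))   # ['d3', 'd9', 'd5', 'd7', 'd2']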
  4. Kwok, K.L.: A network approach to probabilistic information retrieval (1995) 0.04
    0.036402944 = product of:
      0.14561178 = sum of:
        0.14561178 = weight(_text_:high in 6696) [ClassicSimilarity], result of:
          0.14561178 = score(doc=6696,freq=4.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.44700417 = fieldWeight in 6696, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.046875 = fieldNorm(doc=6696)
      0.25 = coord(1/4)
    
    Abstract
     Shows how probabilistic information retrieval based on document components may be implemented as a feedforward (feedbackward) artificial neural network. The network supports adaptation of connection weights as well as the growing of new edges between queries and terms based on user relevance feedback data for training, and it reflects query modification and expansion in information retrieval. A learning rule is applied that can also be viewed as supporting sequential learning using a harmonic sequence learning rate. Experimental results with 4 standard small collections and a large Wall Street Journal collection show that small query expansion levels of about 30 terms can achieve most of the gains at the low-recall high-precision region, while larger expansion levels continue to provide gains at the high-recall low-precision region of a precision-recall curve.
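
     A "harmonic sequence learning rate", as mentioned above, simply means that the step size of the k-th weight update shrinks like 1/k. The fragment below is a generic sketch of such an update rule, not Kwok's actual network; with a 1/k rate the weight converges to the running mean of the observed targets.

       def harmonic_updates(weight, observations):
           """Sequentially move a connection weight towards each observed target,
           using a learning rate of 1/k for the k-th update (harmonic sequence)."""
           for k, target in enumerate(observations, start=1):
               weight += (target - weight) / k
           return weight

       print(harmonic_updates(0.0, [1.0, 0.0, 1.0, 1.0]))   # 0.75, the mean of the targets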
  5. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.04
    0.036402944 = product of:
      0.14561178 = sum of:
        0.14561178 = weight(_text_:high in 5457) [ClassicSimilarity], result of:
          0.14561178 = score(doc=5457,freq=4.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.44700417 = fieldWeight in 5457, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.046875 = fieldNorm(doc=5457)
      0.25 = coord(1/4)
    
    Abstract
     Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text matching techniques in order to get the quality of information retrieval results provided by Google.
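
     For readers comparing PageRank with plain inlink counts, the sketch below runs standard power iteration on a toy link graph and prints both orderings; the graph and the damping factor of 0.85 are illustrative only and unrelated to the sites studied in the paper.

       def pagerank(links, damping=0.85, iterations=50):
           """Plain power iteration; `links` maps each page to the pages it links to."""
           pages = list(links)
           n = len(pages)
           rank = {p: 1.0 / n for p in pages}
           for _ in range(iterations):
               new = {p: (1.0 - damping) / n for p in pages}
               for p, outlinks in links.items():
                   if not outlinks:                      # dangling page: spread evenly
                       for q in pages:
                           new[q] += damping * rank[p] / n
                   else:
                       for q in outlinks:
                           new[q] += damping * rank[p] / len(outlinks)
               rank = new
           return rank

       toy_site = {"home": ["a", "b"], "a": ["home"], "b": ["home", "a"], "orphan": ["home"]}
       ranks = pagerank(toy_site)
       inlinks = {p: sum(p in out for out in toy_site.values()) for p in toy_site}
       print(sorted(ranks, key=ranks.get, reverse=True))      # PageRank ordering
       print(sorted(inlinks, key=inlinks.get, reverse=True))  # inlink-count ordering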
  6. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.03
    0.030335786 = product of:
      0.12134314 = sum of:
        0.12134314 = weight(_text_:high in 1278) [ClassicSimilarity], result of:
          0.12134314 = score(doc=1278,freq=4.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.37250346 = fieldWeight in 1278, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1278)
      0.25 = coord(1/4)
    
    Abstract
    State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative to precompute term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantages of producing high-quality results, but avoids the costs of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impact using any set of relevance evidence at any text collection, whereas previous research articles do not. The precomputed impact values are indexed and used later for computing document ranking at query-processing time. By doing so, our method effectively reduces the query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, also can lead to a significant decrease in computational costs during query processing.
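
     The central point of LePrEF as summarised above is that, once a single fused impact value has been precomputed per (term, document) pair at indexing time, query processing collapses into additions. A minimal sketch of that query-time step follows; the impact values are made up for the example.

       from collections import defaultdict

       # Precomputed at indexing time: for each term, a posting list of (doc_id, fused_impact),
       # where the impact already combines all relevance evidence.
       impact_index = {
           "learn":  [(1, 0.42), (3, 0.10), (7, 0.31)],
           "fusion": [(1, 0.25), (2, 0.55)],
           "query":  [(2, 0.15), (3, 0.60), (7, 0.05)],
       }

       def score(query_terms):
           """Query processing reduces to summing precomputed impacts per document."""
           totals = defaultdict(float)
           for term in query_terms:
               for doc_id, impact in impact_index.get(term, []):
                   totals[doc_id] += impact
           return sorted(totals.items(), key=lambda x: x[1], reverse=True)

       print([(doc, round(s, 2)) for doc, s in score(["learn", "fusion", "query"])])
       # e.g. [(2, 0.7), (3, 0.7), (1, 0.67), (7, 0.36)]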
  7. Carpineto, C.; Romano, G.: Information retrieval through hybrid navigation of lattice representations (1996) 0.03
    0.030030895 = product of:
      0.12012358 = sum of:
        0.12012358 = weight(_text_:high in 503) [ClassicSimilarity], result of:
          0.12012358 = score(doc=503,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.3687596 = fieldWeight in 503, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0546875 = fieldNorm(doc=503)
      0.25 = coord(1/4)
    
    Abstract
     Presents a comprehensive approach to automatic organization and hybrid navigation of text databases. An organizing stage builds a particular lattice representation of the data, through text indexing followed by lattice clustering of the indexed texts. The lattice representation supports the navigation state of the system, a visual retrieval interface that combines 3 main retrieval strategies: browsing, querying, and bounding. Such a hybrid paradigm permits high flexibility in trading off information exploration and retrieval, and offers good retrieval performance. Compares information retrieval using lattice-based hybrid navigation with conventional Boolean querying. Experiments conducted on 2 medium-sized bibliographic databases showed that the performance of lattice retrieval was comparable to or better than Boolean retrieval.
  8. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 0.03
    0.030030895 = product of:
      0.12012358 = sum of:
        0.12012358 = weight(_text_:high in 2678) [ClassicSimilarity], result of:
          0.12012358 = score(doc=2678,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.3687596 = fieldWeight in 2678, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0546875 = fieldNorm(doc=2678)
      0.25 = coord(1/4)
    
    Abstract
    Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this article, we review the principal approaches to inversion, analyze their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single-pass inversion method that, in contrast to previous approaches, does not require the complete vocabulary of the indexed collection in main memory, can operate within limited resources, and does not sacrifice speed with high temporary storage requirements. We show that the performance of the single-pass approach can be improved by constructing inverted files in segments, reducing the cost of disk accesses during inversion of large volumes of data.
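
     The single-pass idea summarised above can be sketched as: invert documents into an in-memory index until a memory budget is reached, flush the partial index as a sorted segment (run), and merge the runs at the end. The toy version below keeps the "segments" in memory and uses a tiny postings budget purely for illustration.

       from collections import defaultdict

       def build_index(docs, memory_budget=4):
           """Single-pass, segment-based inversion (toy version).
           `docs` is an iterable of (doc_id, text); `memory_budget` caps the number
           of postings held in memory before a segment is flushed."""
           segments, in_memory, postings_held = [], defaultdict(list), 0

           def flush():
               nonlocal in_memory, postings_held
               if in_memory:
                   segments.append(dict(sorted(in_memory.items())))   # write a sorted run
                   in_memory, postings_held = defaultdict(list), 0

           for doc_id, text in docs:
               for term in set(text.lower().split()):
                   in_memory[term].append(doc_id)
                   postings_held += 1
               if postings_held >= memory_budget:
                   flush()
           flush()

           # Merge the flushed runs into the final inverted file.
           merged = defaultdict(list)
           for seg in segments:
               for term, plist in seg.items():
                   merged[term].extend(plist)
           return dict(merged)

       docs = [(1, "fast index construction"), (2, "index text data"), (3, "fast text search")]
       print(build_index(docs))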
  9. Vechtomova, O.; Karamuftuoglu, M.: Elicitation and use of relevance feedback information (2006) 0.03
    0.030030895 = product of:
      0.12012358 = sum of:
        0.12012358 = weight(_text_:high in 1966) [ClassicSimilarity], result of:
          0.12012358 = score(doc=1966,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.3687596 = fieldWeight in 1966, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0546875 = fieldNorm(doc=1966)
      0.25 = coord(1/4)
    
    Abstract
    The paper presents two approaches to interactively refining user search formulations and their evaluation in the new High Accuracy Retrieval from Documents (HARD) track of TREC-12. The first method consists of asking the user to select a number of sentences that represent documents. The second method consists of showing to the user a list of noun phrases extracted from the initial document set. Both methods then expand the query based on the user feedback. The TREC results show that one of the methods is an effective means of interactive query expansion and yields significant performance improvements. The paper presents a comparison of the methods and detailed analysis of the evaluation results.
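
     Both elicitation methods described above end in the same step: terms drawn from the user-selected sentences or noun phrases are added to the original query. The sketch below shows only that generic expansion step, with invented feedback; it is not the authors' exact weighting scheme.

       from collections import Counter

       original_query = ["relevance", "feedback"]
       # Noun phrases (or sentences) the user marked as relevant:
       user_selection = ["interactive query expansion", "user relevance judgements",
                         "query expansion terms"]

       # Rank candidate terms by how often they occur in the selected text,
       # skipping terms already in the query, and append the best ones.
       counts = Counter(w for phrase in user_selection for w in phrase.lower().split()
                        if w not in original_query)
       expanded_query = original_query + [term for term, _ in counts.most_common(3)]
       print(expanded_query)   # ['relevance', 'feedback', 'query', 'expansion', ...]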
  10. Joss, M.W.; Wszola, S.: The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.03
    0.025740769 = product of:
      0.102963075 = sum of:
        0.102963075 = weight(_text_:high in 5191) [ClassicSimilarity], result of:
          0.102963075 = score(doc=5191,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.31607968 = fieldWeight in 5191, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.046875 = fieldNorm(doc=5191)
      0.25 = coord(1/4)
    
    Abstract
     Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high-capacity storage media, from CD-ROM to multi-gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching, fuzzy searching and matching; relevance ranking; proximity searching and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively slow random seek times compared with hard discs, and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching.
  11. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.03
    0.025740769 = product of:
      0.102963075 = sum of:
        0.102963075 = weight(_text_:high in 3419) [ClassicSimilarity], result of:
          0.102963075 = score(doc=3419,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.31607968 = fieldWeight in 3419, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.046875 = fieldNorm(doc=3419)
      0.25 = coord(1/4)
    
    Abstract
     The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information-seeking novices to experts. The results are promising, as they support the chosen model.
  12. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.02
    0.02145064 = product of:
      0.08580256 = sum of:
        0.08580256 = weight(_text_:high in 2428) [ClassicSimilarity], result of:
          0.08580256 = score(doc=2428,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.26339972 = fieldWeight in 2428, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2428)
      0.25 = coord(1/4)
    
    Abstract
     Humans can make hasty, but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high dimensional semantic space, which is automatically constructed from a text corpus. Two approaches were presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the semantic space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
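
     The high dimensional semantic space referred to above is typically built from term co-occurrence. The sketch below constructs a tiny co-occurrence space and primes one concept vector with another by simple component-wise boosting; it is only a schematic stand-in for the concept combination heuristic and information flow computations in the paper.

       from collections import defaultdict

       corpus = [
           "penguin antarctic bird swims",
           "sparrow bird flies sings",
           "penguin bird colony antarctic",
       ]

       # Build a word-by-word co-occurrence space (one dimension per context word).
       space = defaultdict(lambda: defaultdict(float))
       for sentence in corpus:
           words = sentence.split()
           for w in words:
               for c in words:
                   if c != w:
                       space[w][c] += 1.0

       def combine(head, modifier, weight=0.5):
           """Prime the head concept with the modifier: keep the head's dimensions,
           boosting those the modifier also has (a crude concept-combination heuristic)."""
           return {dim: val + weight * space[modifier].get(dim, 0.0)
                   for dim, val in space[head].items()}

       # Dimensions shared with "penguin" are boosted in the combined vector.
       print(sorted(combine("bird", "penguin").items(), key=lambda x: -x[1])[:3])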
  13. Hoenkamp, E.: Unitary operators on the document space (2003) 0.02
    0.02145064 = product of:
      0.08580256 = sum of:
        0.08580256 = weight(_text_:high in 4457) [ClassicSimilarity], result of:
          0.08580256 = score(doc=4457,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.26339972 = fieldWeight in 4457, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=4457)
      0.25 = coord(1/4)
    
    Abstract
     When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique to do so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts needed to represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient, and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational costs, it also opens a spectrum of possibilities for new research.
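
     The Haar transform proposed above as an alternative to LSI can be computed in linear time by repeatedly replacing pairs of values with their averages and differences. A minimal sketch for a toy term vector whose length is a power of two:

       def haar(vector):
           """Unnormalised Haar transform: averages in the first half, differences
           in the second. Runs in linear time; input length must be a power of two."""
           data = list(vector)
           n = len(data)
           while n > 1:
               half = n // 2
               averages = [(data[2 * i] + data[2 * i + 1]) / 2 for i in range(half)]
               details  = [(data[2 * i] - data[2 * i + 1]) / 2 for i in range(half)]
               data[:n] = averages + details
               n = half
           return data

       # The leading coefficients carry the coarse structure, the trailing ones the detail.
       print(haar([4, 2, 5, 5, 1, 1, 0, 2]))   # [2.5, 1.5, -1.0, 0.0, 1.0, 0.0, 0.0, -1.0]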
  14. Widyantoro, D.H.; Ioerger, T.R.; Yen, J.: Learning user Interest dynamics with a three-descriptor representation (2001) 0.02
    0.02145064 = product of:
      0.08580256 = sum of:
        0.08580256 = weight(_text_:high in 185) [ClassicSimilarity], result of:
          0.08580256 = score(doc=185,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.26339972 = fieldWeight in 185, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=185)
      0.25 = coord(1/4)
    
    Abstract
     The use of documents ranked high by user feedback to profile user interests is commonly done with Rocchio's algorithm, which uses a single list of attribute-value pairs called a descriptor to carry term value weights for an individual. Negative feedback on old preferences or positive feedback on new preferences adjusts the descriptor at a fixed, predetermined, and often slow pace. Widyantoro et al. suggest a three-descriptor representation (TDR) which adds two short-term interest descriptors, one each for positive and negative feedback. User short-term interest in a particular document is computed by subtracting the similarity measure with the negative descriptor from the similarity measure with the positive descriptor. Using a constant to represent the desired impact of long- and short-term interests, these values may be summed for a single interest value. Using the Reuters 21578 1.0 test collection split into training and test sets, topics with at least 100 documents in a tight cluster were chosen. The TDR handles change well, showing better recovery speed and accuracy than the single-descriptor model. The nearest-neighbor update strategy appears to keep the category concept relatively consistent when multiple TDRs are used.
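
     The scoring rule summarised above can be written down directly: short-term interest is the similarity to the positive descriptor minus the similarity to the negative one, and the final value blends long- and short-term interest with a weighting constant. The cosine similarity and the example descriptors below are illustrative, not taken from the paper.

       import math

       def cosine(a, b):
           dims = set(a) | set(b)
           dot = sum(a.get(d, 0.0) * b.get(d, 0.0) for d in dims)
           na = math.sqrt(sum(v * v for v in a.values()))
           nb = math.sqrt(sum(v * v for v in b.values()))
           return dot / (na * nb) if na and nb else 0.0

       def interest(doc, long_term, short_pos, short_neg, alpha=0.4):
           """Three-descriptor interest score: long-term interest blended with the
           difference between positive and negative short-term interest."""
           short_term = cosine(doc, short_pos) - cosine(doc, short_neg)
           return alpha * cosine(doc, long_term) + (1.0 - alpha) * short_term

       doc       = {"football": 2.0, "league": 1.0}
       long_term = {"sport": 1.0, "football": 1.0}
       short_pos = {"league": 1.0, "results": 1.0}
       short_neg = {"politics": 1.0}
       print(round(interest(doc, long_term, short_pos, short_neg), 3))   # 0.443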
  15. Shah, B.; Raghavan, V.; Dhatric, P.; Zhao, X.: A cluster-based approach for efficient content-based image retrieval using a similarity-preserving space transformation method (2006) 0.02
    0.02145064 = product of:
      0.08580256 = sum of:
        0.08580256 = weight(_text_:high in 118) [ClassicSimilarity], result of:
          0.08580256 = score(doc=118,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.26339972 = fieldWeight in 118, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=118)
      0.25 = coord(1/4)
    
    Abstract
     The techniques of clustering and space transformation have been successfully used in the past to solve a number of pattern recognition problems. In this article, the authors propose a new approach to content-based image retrieval (CBIR) that uses (a) a newly proposed similarity-preserving space transformation method to transform the original low-level image space into a high-level vector space that enables efficient query processing, and (b) a clustering scheme that further improves the efficiency of our retrieval system. This combination is unique and the resulting system provides synergistic advantages of using both clustering and space transformation. The proposed space transformation method is shown to preserve the order of the distances in the transformed feature space. This strategy makes the approach to retrieval generic, as it can be applied to object types other than images and to feature spaces more general than metric spaces. The CBIR approach uses the inexpensive "estimated" distance in the transformed space, as opposed to the computationally inefficient "real" distance in the original space, to retrieve the desired results for a given query image. The authors also provide a theoretical analysis of the complexity of their CBIR approach when used for color-based retrieval, which shows that it is computationally more efficient than other comparable approaches. An extensive set of experiments to test the efficiency and effectiveness of the proposed approach has been performed. The results show that the approach offers superior response time (improvement of 1-2 orders of magnitude compared to retrieval approaches that either use pruning techniques like indexing, clustering, etc., or space transformation, but not both) with sufficiently high retrieval accuracy.
  16. Cheng, C.-S.; Chung, C.-P.; Shann, J.J.-J.: Fast query evaluation through document identifier assignment for inverted file-based information retrieval systems (2006) 0.02
    0.02145064 = product of:
      0.08580256 = sum of:
        0.08580256 = weight(_text_:high in 1979) [ClassicSimilarity], result of:
          0.08580256 = score(doc=1979,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.26339972 = fieldWeight in 1979, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1979)
      0.25 = coord(1/4)
    
    Abstract
     Compressing an inverted file can greatly improve query performance of an information retrieval system (IRS) by reducing disk I/Os. We observe that a good document identifier assignment (DIA) can make the document identifiers in the posting lists more clustered, and result in better compression as well as shorter query processing time. In this paper, we tackle the NP-complete problem of finding an optimal DIA to minimize the average query processing time in an IRS when the probability distribution of query terms is given. We indicate that the greedy nearest neighbor (Greedy-NN) algorithm can provide excellent performance for this problem. However, the Greedy-NN algorithm is inappropriate if used in large-scale IRSs, due to its high complexity O(N² × n), where N denotes the number of documents and n denotes the number of distinct terms. In real-world IRSs, the distribution of query terms is skewed. Based on this fact, we propose a fast O(N × n) heuristic, called partition-based document identifier assignment (PBDIA) algorithm, which can efficiently assign consecutive document identifiers to those documents containing frequently used query terms, and improve compression efficiency of the posting lists for those terms. This can result in reduced query processing time. The experimental results show that the PBDIA algorithm can yield a competitive performance versus the Greedy-NN for the DIA problem, and that this optimization problem has significant advantages for both long queries and parallel information retrieval (IR).
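
     The benefit described above comes from d-gap compression: if the documents containing frequently used query terms receive consecutive identifiers, the gaps in those terms' posting lists shrink and compress better. The sketch below reassigns identifiers in that spirit and compares gap sizes; it illustrates the idea only and is not the PBDIA algorithm itself.

       def d_gaps(postings):
           """Differences between consecutive document identifiers in a posting list."""
           return [b - a for a, b in zip(postings, postings[1:])]

       # Which documents contain the most frequent query term (made-up collection).
       docs_with_term = {2, 5, 9, 13, 17}
       all_docs = list(range(1, 21))

       # Partition-style assignment: give consecutive new identifiers first to the
       # documents that contain the frequent term, then to the rest.
       new_id, next_id = {}, 1
       for doc in sorted(all_docs, key=lambda d: d not in docs_with_term):
           new_id[doc] = next_id
           next_id += 1

       old_postings = sorted(docs_with_term)
       new_postings = sorted(new_id[d] for d in docs_with_term)
       print(d_gaps(old_postings))   # [3, 4, 4, 4] -- larger, irregular gaps
       print(d_gaps(new_postings))   # [1, 1, 1, 1] -- small gaps, cheaper to compress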
  17. Lee, J.-T.; Seo, J.; Jeon, J.; Rim, H.-C.: Sentence-based relevance flow analysis for high accuracy retrieval (2011) 0.02
    0.02145064 = product of:
      0.08580256 = sum of:
        0.08580256 = weight(_text_:high in 746) [ClassicSimilarity], result of:
          0.08580256 = score(doc=746,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.26339972 = fieldWeight in 746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=746)
      0.25 = coord(1/4)
    
  18. Wiggers, G.; Verberne, S.; Loon, W. van; Zwenne, G.-J.: Bibliometric-enhanced legal information retrieval : combining usage and citations as flavors of impact relevance (2023) 0.02
    0.02145064 = product of:
      0.08580256 = sum of:
        0.08580256 = weight(_text_:high in 2024) [ClassicSimilarity], result of:
          0.08580256 = score(doc=2024,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.26339972 = fieldWeight in 2024, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2024)
      0.25 = coord(1/4)
    
    Abstract
    Bibliometric-enhanced information retrieval uses bibliometrics (e.g., citations) to improve ranking algorithms. Using a data-driven approach, this article describes the development of a bibliometric-enhanced ranking algorithm for legal information retrieval, and the evaluation thereof. We statistically analyze the correlation between usage of documents and citations over time, using data from a commercial legal search engine. We then propose a bibliometric boost function that combines usage of documents with citation counts. The core of this function is an impact variable based on usage and citations that increases in influence as citations and usage counts become more reliable over time. We evaluate our ranking function by comparing search sessions before and after the introduction of the new ranking in the search engine. Using a cost model applied to 129,571 sessions before and 143,864 sessions after the intervention, we show that our bibliometric-enhanced ranking algorithm reduces the time of a search session of legal professionals by 2 to 3% on average for use cases other than known-item retrieval or updating behavior. Given the high hourly tariff of legal professionals and the limited time they can spend on research, this is expected to lead to increased efficiency, especially for users with extremely long search sessions.
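
     A boost function of the kind described above can be sketched as a multiplier on the textual ranking score, in which an impact value built from usage and citations gains influence as the document ages (because counts for young documents are unreliable). The weights and the ageing curve below are invented for illustration and are not the authors' actual function.

       def bibliometric_boost(base_score, usage, citations, age_years,
                              usage_weight=0.5, citation_weight=0.5, max_influence=0.3):
           """Multiply the textual ranking score by a bibliometric factor whose
           influence grows with document age."""
           impact = usage_weight * usage + citation_weight * citations
           reliability = min(age_years / 3.0, 1.0)      # fully trusted after ~3 years
           boost = 1.0 + max_influence * reliability * impact / (impact + 1.0)
           return base_score * boost

       # A heavily used, well-cited older document gets a modest boost;
       # a brand-new document is ranked almost purely on its textual score.
       print(bibliometric_boost(2.0, usage=40, citations=12, age_years=5))    # ~2.58
       print(bibliometric_boost(2.0, usage=0,  citations=0,  age_years=0.1))  # 2.0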
  19. Ghali, M.-K.; Farrag, A.; Won, D.; Jin, Y.: Enhancing knowledge retrieval with in-context learning and semantic search through Generative AI (2024) 0.02
    0.02145064 = product of:
      0.08580256 = sum of:
        0.08580256 = weight(_text_:high in 2367) [ClassicSimilarity], result of:
          0.08580256 = score(doc=2367,freq=2.0), product of:
            0.32575038 = queryWeight, product of:
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.06831949 = queryNorm
            0.26339972 = fieldWeight in 2367, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7680445 = idf(docFreq=1025, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2367)
      0.25 = coord(1/4)
    
    Abstract
    Retrieving and extracting knowledge from extensive research documents and large databases presents significant challenges for researchers, students, and professionals in today's information-rich era. Existing retrieval systems, which rely on general-purpose Large Language Models (LLMs), often fail to provide accurate responses to domain-specific inquiries. Additionally, the high cost of pretraining or fine-tuning LLMs for specific domains limits their widespread adoption. To address these limitations, we propose a novel methodology that combines the generative capabilities of LLMs with the fast and accurate retrieval capabilities of vector databases. This advanced retrieval system can efficiently handle both tabular and non-tabular data, understand natural language user queries, and retrieve relevant information without fine-tuning. The developed model, Generative Text Retrieval (GTR), is adaptable to both unstructured and structured data with minor refinement. GTR was evaluated on both manually annotated and public datasets, achieving over 90% accuracy and delivering truthful outputs in 87% of cases. Our model achieved state-of-the-art performance with a Rouge-L F1 score of 0.98 on the MSMARCO dataset. The refined model, Generative Tabular Text Retrieval (GTR-T), demonstrated its efficiency in large database querying, achieving an Execution Accuracy (EX) of 0.82 and an Exact-Set-Match (EM) accuracy of 0.60 on the Spider dataset, using an open-source LLM. These efforts leverage Generative AI and In-Context Learning to enhance human-text interaction and make advanced AI capabilities more accessible. By integrating robust retrieval systems with powerful LLMs, our approach aims to democratize access to sophisticated AI tools, improving the efficiency, accuracy, and scalability of AI-driven information retrieval and database querying.
  20. Mandl, T.: Web- und Multimedia-Dokumente : Neuere Entwicklungen bei der Evaluierung von Information Retrieval Systemen (2003) 0.02
    0.018190257 = product of:
      0.07276103 = sum of:
        0.07276103 = weight(_text_:und in 2734) [ClassicSimilarity], result of:
          0.07276103 = score(doc=2734,freq=12.0), product of:
            0.15152574 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.06831949 = queryNorm
            0.48018923 = fieldWeight in 2734, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0625 = fieldNorm(doc=2734)
      0.25 = coord(1/4)
    
    Abstract
     The amount of data on the Internet continues to grow rapidly. With it grows the need for high-quality information retrieval services for orientation and problem-oriented searching. Deciding whether to use or acquire information retrieval software requires meaningful evaluation results. This contribution presents recent developments in the evaluation of information retrieval systems and shows the trend towards specialisation and diversification of evaluation studies, which increases the realism of their results. The focus is on the retrieval of specialised texts, Internet pages and multimedia objects.
    Source
    Information - Wissenschaft und Praxis. 54(2003) H.4, S.203-210

Languages

  • d 36
  • e 27
  • m 1

Types

  • a 52
  • x 7
  • m 3
  • r 2
  • el 1
  • s 1