Search (58 results, page 1 of 3)

  • theme_ss:"Retrievalalgorithmen"
  1. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.04
    0.044460382 = product of:
      0.17784153 = sum of:
        0.17784153 = weight(_text_:human in 3591) [ClassicSimilarity], result of:
          0.17784153 = score(doc=3591,freq=12.0), product of:
            0.30094394 = queryWeight, product of:
              4.3671384 = idf(docFreq=1531, maxDocs=44421)
              0.068911016 = queryNorm
            0.5909457 = fieldWeight in 3591, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.3671384 = idf(docFreq=1531, maxDocs=44421)
              0.0390625 = fieldNorm(doc=3591)
      0.25 = coord(1/4)
    
    Abstract
    Purpose: In a system-based approach, replicating the web would require large test collections, and having human assessors judge the relevance of every document per topic is infeasible. The large number of documents requiring judgment also introduces errors through assessor disagreement. The paper aims to discuss these issues.
    Design/methodology/approach: This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods cope with the large number of documents to be judged while avoiding errors from human disagreement during the judgment process. The study exploits two key factors: the number of occurrences of each document per topic across all system runs, and document rankings.
    Findings: The effectiveness of the proposed methods is evaluated by correlating system rankings, based on mean average precision scores, under the original Text REtrieval Conference (TREC) relevance judgments and under the pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative that reduces the human effort and disagreement errors involved in generating TREC-like relevance judgments.
    Originality/value: The simple methods proposed here improve the correlation coefficient when generating alternative relevance judgments without human assessors, contributing to information retrieval evaluation.
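    The occurrence-counting idea described above can be sketched as follows. This is an illustrative reconstruction, not the authors' exact method; the `min_votes` threshold and the data layout are assumptions.

```python
from collections import Counter

def pseudo_qrels(runs, pool_depth=100, min_votes=2):
    """Build pseudo relevance judgments from system runs alone.

    `runs` maps a system name to {topic: ranked list of doc ids}.
    A document is marked pseudo-relevant for a topic when at least
    `min_votes` systems retrieve it within `pool_depth` (the threshold
    is an illustrative assumption, not taken from the paper).
    """
    topics = {t for ranking in runs.values() for t in ranking}
    qrels = {}
    for topic in topics:
        votes = Counter()
        for ranking in runs.values():
            votes.update(ranking.get(topic, [])[:pool_depth])
        qrels[topic] = {doc for doc, n in votes.items() if n >= min_votes}
    return qrels
```

    System rankings produced against such pseudo judgments can then be correlated with rankings under official judgments, as the abstract describes.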
  2. Zhu, B.; Chen, H.: Validating a geographical image retrieval system (2000) 0.04
    
    Abstract
    This paper summarizes a prototype geographical image retrieval system that demonstrates how to integrate image processing and information analysis techniques to support large-scale content-based image retrieval. By using an image as its interface, the prototype system addresses a troublesome aspect of traditional retrieval models, which require users to have complete knowledge of the low-level features of an image. In addition, we describe an experiment that validates the system's performance against that of human subjects, addressing the scarcity of research evaluating the performance of an algorithm against that of human beings. The results of the experiment indicate that the system could do as well as human subjects in accomplishing the tasks of similarity analysis and image categorization. We also found that under some circumstances the texture features of an image are insufficient to represent a geographic image. We believe, however, that our image retrieval system provides a promising approach to integrating image processing techniques and information retrieval algorithms.
  3. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 0.03
    
    Abstract
    Ranking information retrieval (IR) systems with respect to their effectiveness is a crucial operation during IR evaluation, as well as during data fusion. This article offers a novel method of approaching the system-ranking problem, based on the widely studied idea of polyrepresentation. The principle of polyrepresentation suggests that a single information need can be represented by many query articulations-what we call query aspects. By skimming the top k (where k is small) documents retrieved by a single system for multiple query aspects, we collect a set of documents that are likely to be relevant to a given test topic. Labeling these skimmed documents as putatively relevant lets us build pseudorelevance judgments without undue human intervention. We report experiments where using these pseudorelevance judgments delivers a rank ordering of IR systems that correlates highly with rankings based on human relevance judgments.
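    The skim-and-pool step described above can be sketched as a short reconstruction. The function names and the precision-based system ordering are illustrative assumptions, not the authors' code.

```python
def skim(aspect_rankings, k=5):
    """Pool the top-k documents each query aspect retrieves; the union
    serves as the putatively relevant set (pseudorelevance judgments)."""
    pooled = set()
    for ranking in aspect_rankings:
        pooled.update(ranking[:k])
    return pooled

def rank_systems(system_runs, pseudo_relevant, depth=10):
    """Order systems by precision at `depth` against the pseudo judgments
    (a simple stand-in for the effectiveness measure used in evaluation)."""
    def prec(run):
        top = run[:depth]
        return sum(d in pseudo_relevant for d in top) / len(top) if top else 0.0
    return sorted(system_runs, key=lambda name: prec(system_runs[name]),
                  reverse=True)
```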
  4. Liddy, E.D.; Paik, W.; McKenna, M.; Yu, E.S.: ¬A natural language text retrieval system with relevance feedback (1995) 0.03
    
    Abstract
    Outlines a fully integrated retrieval engine that processes documents and queries at the multiple, complex linguistic levels that humans use to construe meaning. Currently undergoing beta site trials, the DR-LINK natural language text retrieval system allows searchers to state queries as fully formed, natural sentences. The meaning and matching of both queries and documents is accomplished at the conceptual level of human expression, not by the simple co-occurrence of keywords. Furthermore, the natural browsing behaviour of information searchers is accommodated by allowing documents identified as potentially relevant by the explicit semantics of the system to be used as relevance feedback queries, which provide an appropriate implicit semantic representation of the information seeker's need.
  5. Carpineto, C.; Romano, G.: Information retrieval through hybrid navigation of lattice representations (1996) 0.03
    
    Source
    International journal of human-computer studies. 45(1996) no.5, S.553-578
  6. French, J.C.; Powell, A.L.; Schulman, E.: Using clustering strategies for creating authority files (2000) 0.02
    
    Abstract
    As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographical entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files
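    The approximate word matching described above can be approximated with a standard edit-similarity measure; the 0.8 threshold and the greedy grouping below are assumptions for illustration, not the paper's algorithm.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.8):
    """Flag two name strings as likely variant forms when their
    normalized edit similarity exceeds `threshold` (assumed value)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def group_variants(names, threshold=0.8):
    """Greedy single-pass clustering of variant forms into authority
    groups: each name joins the first group whose representative it
    resembles, otherwise starts a new group."""
    groups = []
    for name in names:
        for g in groups:
            if similar(name, g[0], threshold):
                g.append(name)
                break
        else:
            groups.append([name])
    return groups
```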
  7. Chen, H.; Zhang, Y.; Houston, A.L.: Semantic indexing and searching using a Hopfield net (1998) 0.02
    
    Abstract
    Presents a neural network approach to document semantic indexing. Reports results of a study to apply a Hopfield net algorithm to simulate human associative memory for concept exploration in the domain of computer science and engineering. The INSPEC database, consisting of 320,000 abstracts from leading periodical articles, was used as the document test bed. Benchmark tests confirmed that three parameters (maximum number of activated nodes, maximum allowable error, and maximum number of iterations) were useful in positively influencing network convergence behaviour without negatively impacting central processing unit performance. Another series of benchmark tests was performed to determine the effectiveness of various filtering techniques in reducing the negative impact of noisy input terms. Preliminary user tests confirmed expectations that the Hopfield net is potentially useful as an associative memory technique to improve document recall and precision by resolving discrepancies between indexer vocabularies and end user vocabularies.
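    The associative-memory mechanism the study simulates can be sketched with a minimal Hopfield net: Hebbian storage of bipolar patterns and synchronous threshold updates. This is a toy illustration, not the study's actual network or parameters.

```python
def train_hopfield(patterns):
    """Hebbian storage: W[i][j] = sum over patterns of x_i * x_j,
    with a zero diagonal. Patterns are lists of +1/-1."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, probe, iterations=5):
    """Synchronous updates drive a noisy probe toward a stored pattern,
    the associative-recall behaviour exploited for concept exploration."""
    state = list(probe)
    for _ in range(iterations):
        state = [1 if sum(w * s for w, s in zip(row, state)) >= 0 else -1
                 for row in W]
    return state
```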
  8. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.02
    
    Abstract
    Introduces several new versions of PageRank (the link-based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced on the basis of these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that included pages from different Web sites, but it did not work well in ranking pages from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
  9. Fu, X.: Towards a model of implicit feedback for Web search (2010) 0.02
    
    Abstract
    This research investigated several important issues in using implicit feedback techniques to assist searchers with difficulties in formulating effective search strategies. It focused on examining the relationship between types of behavioral evidence that can be captured from Web searches and searchers' interests. A carefully crafted observation study was conducted to capture, examine, and elucidate the analytical processes and work practices of human analysts when they simulated the role of an implicit feedback system by trying to infer searchers' interests from behavioral traces. Findings provided rare insight into the complexities and nuances in using behavioral evidence for implicit feedback and led to the proposal of an implicit feedback model for Web search that bridged previous studies on behavioral evidence and implicit feedback measures. A new level of analysis termed an analytical lens emerged from the data and provides a road map for future research on this topic.
  10. Mandl, T.: Web- und Multimedia-Dokumente : Neuere Entwicklungen bei der Evaluierung von Information Retrieval Systemen (2003) 0.02
    
    Abstract
    The amount of data on the Internet continues to grow rapidly. With it grows the need for high-quality information retrieval services for orientation and problem-oriented searching. The decision to use or procure information retrieval software requires meaningful evaluation results. This article presents recent developments in the evaluation of information retrieval systems and shows the trend toward specialization and diversification of evaluation studies, which increase the realism of the results. The focus is on the retrieval of domain-specific texts, Internet pages, and multimedia objects.
    Source
    Information - Wissenschaft und Praxis. 54(2003) H.4, S.203-210
  11. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.02
    
    Abstract
    Humans can make hasty, but generally robust, judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
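    The corpus-built semantic space can be caricatured with window co-occurrence vectors and cosine similarity. This sketch omits the paper's conceptual-space machinery, priming heuristics, and information-flow computation; the window size is an assumption.

```python
from collections import defaultdict
import math

def term_vectors(corpus, window=2):
    """Build sparse term vectors from co-occurrence counts within a
    sliding window, a crude stand-in for automatic construction of a
    high-dimensional semantic space from a text corpus."""
    vecs = defaultdict(lambda: defaultdict(float))
    for doc in corpus:
        tokens = doc.lower().split()
        for i, t in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    vecs[t][tokens[j]] += 1.0
    return vecs

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```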
  12. Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003) 0.02
    
    Abstract
    Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even those documents that purport to be "medically-related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems which require human mediation for currency. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better at perceived cluster recall.
  13. Dannenberg, R.B.; Birmingham, W.P.; Pardo, B.; Hu, N.; Meek, C.; Tzanetakis, G.: ¬A comparative evaluation of search techniques for query-by-humming using the MUSART testbed (2007) 0.02
    
    Abstract
    Query-by-humming systems offer content-based searching for melodies and require no special musical training or knowledge. Many such systems have been built, but there has not been much useful evaluation and comparison in the literature due to the lack of shared databases and queries. The MUSART project testbed allows various search algorithms to be compared using a shared framework that automatically runs experiments and summarizes results. Using this testbed, the authors compared algorithms based on string alignment, melodic contour matching, a hidden Markov model, n-grams, and CubyHum. Retrieval performance is very sensitive to distance functions and the representation of pitch and rhythm, which raises questions about some previously published conclusions. Some algorithms are particularly sensitive to the quality of queries. Our queries, which are taken from human subjects in a realistic setting, are quite difficult, especially for n-gram models. Finally, simulations on query-by-humming performance as a function of database size indicate that retrieval performance falls only slowly as the database size increases.
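    Melodic contour matching via string alignment, one of the technique families compared above, can be sketched as follows (toy MIDI pitch numbers; not the MUSART implementation):

```python
def contour(pitches):
    """Reduce a melody to its up/down/same contour symbols, the pitch
    representation used by contour-matching searchers."""
    return ["U" if b > a else "D" if b < a else "S"
            for a, b in zip(pitches, pitches[1:])]

def edit_distance(a, b):
    """String-alignment (Levenshtein) distance between two contour
    sequences; lower distance means a closer melodic match."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]
```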
  14. Silva, R.M.; Gonçalves, M.A.; Veloso, A.: ¬A Two-stage active learning method for learning to rank (2014) 0.02
    
    Abstract
    Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can later be used to rank new query results. These training sets are costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning algorithms are able to reduce the labeling effort by selectively sampling an unlabeled set and choosing data instances that maximize a learning function's effectiveness. In this article, we propose a novel two-stage active learning method for L2R that combines and exploits interesting properties of its constituent parts, thus being effective and practical. In the first stage, an association rule active sampling algorithm is used to select a very small but effective initial training set. In the second stage, a query-by-committee strategy trained with the first-stage set is used to iteratively select more examples until a preset labeling budget is met or a target effectiveness is achieved. We test our method with various LETOR benchmarking data sets and compare it with several baselines to show that it achieves good results using only a small portion of the original training sets.
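    The second-stage query-by-committee selection can be sketched with vote entropy as the disagreement measure. The measure and the plain-function committee members are assumptions for illustration; the paper's committees are trained rankers.

```python
from collections import Counter
import math

def vote_entropy(votes):
    """Entropy of the committee's label votes for one candidate:
    zero when unanimous, maximal when evenly split."""
    counts = Counter(votes)
    total = len(votes)
    return -sum(c / total * math.log(c / total) for c in counts.values())

def select_next(candidates, committee):
    """Query-by-committee: pick the unlabeled candidate the committee
    disagrees on most. `committee` is a list of functions x -> label."""
    return max(candidates,
               key=lambda x: vote_entropy([m(x) for m in committee]))
```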
  15. Purpura, A.; Silvello, G.; Susto, G.A.: Learning to rank from relevance judgments distributions (2022) 0.02
    
    Abstract
    LEarning TO Rank (LETOR) algorithms are usually trained on annotated corpora where a single relevance label is assigned to each available document-topic pair. Within the Cranfield framework, relevance labels result from merging either multiple expertly curated or crowdsourced human assessments. In this paper, we explore how to train LETOR models with relevance judgments distributions (either real or synthetically generated) assigned to document-topic pairs instead of single-valued relevance labels. We propose five new probabilistic loss functions to deal with the higher expressive power provided by relevance judgments distributions and show how they can be applied both to neural and gradient boosting machine (GBM) architectures. Moreover, we show how training a LETOR model on a sampled version of the relevance judgments from certain probability distributions can improve its performance when relying either on traditional or probabilistic loss functions. Finally, we validate our hypothesis on real-world crowdsourced relevance judgments distributions. Overall, we observe that relying on relevance judgments distributions to train different LETOR models can boost their performance and even outperform strong baselines such as LambdaMART on several test collections.
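    Drawing single training labels from relevance-judgment distributions, as proposed above, might look like the following sketch (the dictionary layout for document-topic pairs is an assumption):

```python
import random

def sample_labels(judgment_dists, seed=0):
    """Draw one training label per document-topic pair from its
    relevance-judgment distribution (a dict grade -> probability),
    producing a sampled training set instead of single-valued labels."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    labels = {}
    for pair, dist in judgment_dists.items():
        grades = list(dist)
        weights = [dist[g] for g in grades]
        labels[pair] = rng.choices(grades, weights=weights, k=1)[0]
    return labels
```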
  16. Ghali, M.-K.; Farrag, A.; Won, D.; Jin, Y.: Enhancing knowledge retrieval with in-context learning and semantic search through Generative AI (2024) 0.02
    
    Abstract
    Retrieving and extracting knowledge from extensive research documents and large databases presents significant challenges for researchers, students, and professionals in today's information-rich era. Existing retrieval systems, which rely on general-purpose Large Language Models (LLMs), often fail to provide accurate responses to domain-specific inquiries. Additionally, the high cost of pretraining or fine-tuning LLMs for specific domains limits their widespread adoption. To address these limitations, we propose a novel methodology that combines the generative capabilities of LLMs with the fast and accurate retrieval capabilities of vector databases. This advanced retrieval system can efficiently handle both tabular and non-tabular data, understand natural language user queries, and retrieve relevant information without fine-tuning. The developed model, Generative Text Retrieval (GTR), is adaptable to both unstructured and structured data with minor refinement. GTR was evaluated on both manually annotated and public datasets, achieving over 90% accuracy and delivering truthful outputs in 87% of cases. Our model achieved state-of-the-art performance with a Rouge-L F1 score of 0.98 on the MSMARCO dataset. The refined model, Generative Tabular Text Retrieval (GTR-T), demonstrated its efficiency in large database querying, achieving an Execution Accuracy (EX) of 0.82 and an Exact-Set-Match (EM) accuracy of 0.60 on the Spider dataset, using an open-source LLM. These efforts leverage Generative AI and In-Context Learning to enhance human-text interaction and make advanced AI capabilities more accessible. By integrating robust retrieval systems with powerful LLMs, our approach aims to democratize access to sophisticated AI tools, improving the efficiency, accuracy, and scalability of AI-driven information retrieval and database querying.
  17. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.02
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contain the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking.
For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
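    The "ask the web itself" idea in this abstract can be sketched as a small power iteration over a link graph. The four-page graph, the damping factor of 0.85, and the iteration count below are illustrative assumptions, not values from the article:

```python
# Hypothetical four-page web; the real index covers billions of pages.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, damping=0.85, iterations=50):
    """Power iteration on the PageRank equation:
    PR(p) = (1 - d)/N + d * sum(PR(q)/outdegree(q) over q linking to p)."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from a uniform ranking
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for q, outs in links.items():
            share = rank[q] / len(outs)          # q splits its rank among outlinks
            for p in outs:
                new[p] += damping * share
        rank = new
    return rank

ranks = pagerank(links)
# Result pages are then sorted by rank, most important first.
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

    In this toy graph, page C ends up ranked highest because it is linked to by three of the four pages, while D, which nothing links to, ends up last.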
  18. Nagelschmidt, M.: Verfahren zur Anfragemodifikation im Information Retrieval (2008) 0.02
    0.017765133 = product of:
      0.07106053 = sum of:
        0.07106053 = weight(_text_:und in 3774) [ClassicSimilarity], result of:
          0.07106053 = score(doc=3774,freq=20.0), product of:
            0.15283768 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.068911016 = queryNorm
            0.4649412 = fieldWeight in 3774, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.046875 = fieldNorm(doc=3774)
      0.25 = coord(1/4)
    
    Abstract
    Information retrieval offers many possibilities for modifying search queries. After an introductory account of the interplay between information need and search query, a conceptual and typological approach to query modification methods is given. Following a brief characterization of fact retrieval and information retrieval, as well as of the vector space model and the probabilistic model, intellectual, automatic, and interactive modification methods are presented. Besides classical intellectual methods, such as the building-block strategy and the "citation pearl growing" strategy, the account of automatic and interactive methods covers modification options at the levels of the morphology, syntax, and semantics of search terms. In addition, relevance feedback, the utility of informetric analyses, and the idea of associative retrieval based on clustering and terminological techniques as well as citation-analytic methods are pursued. Finally, five application examples are intended to convey an impression of the practical design possibilities of the methods discussed.
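    One classic interactive modification method in the vector space model is Rocchio relevance feedback, which shifts the query vector toward documents the user judged relevant and away from non-relevant ones. A minimal sketch follows; the alpha/beta/gamma weights and the toy vectors are illustrative assumptions, not taken from the work:

```python
# Rocchio relevance feedback (sketch): q' = a*q + b*mean(R) - g*mean(NR)
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query) | {t for d in relevant + nonrelevant for t in d}
    def centroid(docs):
        if not docs:
            return {}
        return {t: sum(d.get(t, 0.0) for d in docs) / len(docs) for t in terms}
    r, nr = centroid(relevant), centroid(nonrelevant)
    new_q = {}
    for t in terms:
        w = alpha * query.get(t, 0.0) + beta * r.get(t, 0.0) - gamma * nr.get(t, 0.0)
        if w > 0:            # negative term weights are usually clipped to zero
            new_q[t] = w
    return new_q

# Illustrative term-weight vectors for one query and two judged documents.
query = {"retrieval": 1.0}
relevant = [{"retrieval": 0.8, "feedback": 0.6}]
nonrelevant = [{"retrieval": 0.1, "boolean": 0.9}]
print(rocchio(query, relevant, nonrelevant))
```

    The modified query picks up the term "feedback" from the relevant document and suppresses "boolean", which only occurred in the non-relevant one.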
  19. Fuhr, N.: Zur Überwindung der Diskrepanz zwischen Retrievalforschung und -praxis (1990) 0.02
    0.016749129 = product of:
      0.066996515 = sum of:
        0.066996515 = weight(_text_:und in 6624) [ClassicSimilarity], result of:
          0.066996515 = score(doc=6624,freq=10.0), product of:
            0.15283768 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.068911016 = queryNorm
            0.4383508 = fieldWeight in 6624, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0625 = fieldNorm(doc=6624)
      0.25 = coord(1/4)
    
    Abstract
    This contribution presents several research results from information retrieval that can be applied directly to improve retrieval quality for existing databases: linguistic algorithms for reduction to base and stem forms support searching for inflected and derived forms of search terms. Ranking algorithms that weight query and document terms lead to significantly better retrieval results than Boolean retrieval. Relevance feedback can further increase retrieval quality and, in addition, support the user in successively modifying the query formulation. A user-friendly interface for a system based on these concepts is presented.
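    The kind of term weighting the abstract contrasts with Boolean retrieval can be sketched with a toy tf-idf ranker. The smoothed idf variant and the three miniature documents are illustrative assumptions, not the paper's formulas:

```python
import math

# Toy tf-idf ranking: weight query and document terms instead of exact
# Boolean matching (illustrative formula variant and documents).
docs = [
    "stemming reduces terms to stem forms",
    "boolean retrieval matches terms exactly",
    "ranking weights query and document terms",
]
tokenized = [d.split() for d in docs]
n = len(tokenized)

def idf(term):
    df = sum(term in d for d in tokenized)      # document frequency
    return math.log((n + 1) / (df + 1)) + 1     # smoothed inverse document frequency

def score(query, doc):
    return sum(doc.count(t) * idf(t) for t in query.split())

query = "ranking query terms"
ranked = sorted(range(n), key=lambda i: -score(query, tokenized[i]))
print(ranked)  # document indices, best match first
```

    Under Boolean AND, no document matches all three query terms; the weighted ranking still orders the documents sensibly, with the third document first because it contains the rarest query terms.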
  20. Tober, M.; Hennig, L.; Furch, D.: SEO Ranking-Faktoren und Rang-Korrelationen 2014 : Google Deutschland (2014) 0.02
    0.016749129 = product of:
      0.066996515 = sum of:
        0.066996515 = weight(_text_:und in 2484) [ClassicSimilarity], result of:
          0.066996515 = score(doc=2484,freq=10.0), product of:
            0.15283768 = queryWeight, product of:
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.068911016 = queryNorm
            0.4383508 = fieldWeight in 2484, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.217899 = idf(docFreq=13141, maxDocs=44421)
              0.0625 = fieldNorm(doc=2484)
      0.25 = coord(1/4)
    
    Abstract
    This whitepaper deals with the definition and evaluation of factors that show a high rank-correlation coefficient with organic search results, and serves the purpose of a deeper analysis of search-engine algorithms. The data collection and its evaluation refer to ranking factors for Google Germany in 2014. In addition, the correlations and factors were interpreted with respect to their relevance for top search-result positions, using among other things mean and median values as well as development trends relative to previous years.
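    The rank-correlation coefficient such studies report is typically Spearman's rho, computed on the ranks of a factor's values against search positions. A self-contained sketch with tie-aware ranking follows; the sample data is invented for illustration, not from the study:

```python
# Spearman rank correlation (sketch; illustrative data, not from the study).
def rank(values):
    """Assign ranks 1..n, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# e.g. a hypothetical ranking factor's values against positions 1..5
factor = [0.9, 0.8, 0.7, 0.4, 0.1]
positions = [1, 2, 3, 4, 5]
print(round(spearman(factor, positions), 3))
```

    Here the factor decreases monotonically as the position number grows, so the coefficient is -1.0: a perfect (negative) rank correlation with position.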

Languages

  • d 36
  • e 21
  • m 1

Types

  • a 45
  • x 7
  • m 3
  • el 2
  • r 2
  • s 1