-
Fuhr, N.: Modelle im Information Retrieval (2013)
0.01
0.013538062 = product of:
0.054152247 = sum of:
0.054152247 = weight(_text_:und in 1724) [ClassicSimilarity], result of:
0.054152247 = score(doc=1724,freq=4.0), product of:
0.15626246 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.07045517 = queryNorm
0.34654674 = fieldWeight in 1724, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.078125 = fieldNorm(doc=1724)
0.25 = coord(1/4)
- Source
- Grundlagen der praktischen Information und Dokumentation. Handbuch zur Einführung in die Informationswissenschaft und -praxis. 6., völlig neu gefaßte Ausgabe. Hrsg. von R. Kuhlen, W. Semar u. D. Strauch. Begründet von Klaus Laisiepen, Ernst Lutterbeck, Karl-Heinrich Meyer-Uhlenried
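The score breakdowns attached to each record follow Lucene's ClassicSimilarity: tf is the square root of the term frequency, fieldWeight = tf * idf * fieldNorm, the term score is queryWeight * fieldWeight, and the result is scaled by the coordination factor. A minimal sketch reproducing the first breakdown above from its leaf values (the function and argument names are ours, not Lucene's):

```python
import math

def classic_similarity(freq, idf, field_norm, query_norm, coord):
    """Recompute a ClassicSimilarity explain tree from its leaf values."""
    tf = math.sqrt(freq)                  # 2.0 = tf(freq=4.0)
    query_weight = idf * query_norm       # 0.15626246 = queryWeight
    field_weight = tf * idf * field_norm  # 0.34654674 = fieldWeight
    score = query_weight * field_weight   # 0.054152247 = weight(_text_:und in 1724)
    return coord * score                  # 0.013538062 = final score with coord(1/4)

# Leaf values copied from the explain output of the record above (doc=1724).
print(classic_similarity(freq=4.0, idf=2.217899, field_norm=0.078125,
                         query_norm=0.07045517, coord=0.25))   # ~0.013538
```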
-
Ziegler, B.: ESS: ein schneller Algorithmus zur Mustersuche in Zeichenfolgen (1996)
0.01
0.013401995 = product of:
0.05360798 = sum of:
0.05360798 = weight(_text_:und in 612) [ClassicSimilarity], result of:
0.05360798 = score(doc=612,freq=2.0), product of:
0.15626246 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.07045517 = queryNorm
0.34306374 = fieldWeight in 612, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.109375 = fieldNorm(doc=612)
0.25 = coord(1/4)
- Source
- Informatik: Forschung und Entwicklung. 11(1996) no.2, S.69-83
-
Hora, M.: Methoden für das Ranking in Discovery-Systemen (2018)
0.01
0.013401995 = product of:
0.05360798 = sum of:
0.05360798 = weight(_text_:und in 968) [ClassicSimilarity], result of:
0.05360798 = score(doc=968,freq=8.0), product of:
0.15626246 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.07045517 = queryNorm
0.34306374 = fieldWeight in 968, product of:
2.828427 = tf(freq=8.0), with freq of:
8.0 = termFreq=8.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.0546875 = fieldNorm(doc=968)
0.25 = coord(1/4)
- Abstract
- Discovery systems usually offer sorting by relevance as the default setting. How relevance is determined is often opaque. Yet, from the user's perspective, knowledge of this would be an important element of information literacy, while libraries should make sure that the ranking fits their own collection and audience. This article describes how discovery systems select and score hits. This includes indexing, processing, text matching and further relevance criteria such as popularity or availability. Finally, all of the criteria considered have to be combined into a single central score. Particular attention is paid to the ranking in EBSCO Discovery Service, Primo and Summon.
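The final step described in the abstract, collapsing several criteria into one central score, can be pictured as a weighted combination. A minimal sketch with invented weights and criteria names; it illustrates the idea only and does not reproduce the scoring of any actual discovery system:

```python
def combined_score(text_match, popularity, availability, weights=(0.7, 0.2, 0.1)):
    """Collapse normalised relevance criteria (each in [0, 1]) into one score."""
    w_text, w_pop, w_avail = weights
    return w_text * text_match + w_pop * popularity + w_avail * availability

# Example: a strong text match on a moderately popular, currently available title.
print(combined_score(text_match=0.85, popularity=0.4, availability=1.0))  # 0.775
```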
-
Behnert, C.; Plassmeier, K.; Borst, T.; Lewandowski, D.: Evaluierung von Rankingverfahren für bibliothekarische Informationssysteme (2019)
0.01
0.013401995 = product of:
0.05360798 = sum of:
0.05360798 = weight(_text_:und in 23) [ClassicSimilarity], result of:
0.05360798 = score(doc=23,freq=8.0), product of:
0.15626246 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.07045517 = queryNorm
0.34306374 = fieldWeight in 23, product of:
2.828427 = tf(freq=8.0), with freq of:
8.0 = termFreq=8.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.0546875 = fieldNorm(doc=23)
0.25 = coord(1/4)
- Abstract
- This article describes a study on the development and evaluation of ranking procedures for library information systems. Starting from the methods used in web search engines, possible factors for relevance ranking were identified, transferred to the library context and systematically evaluated. Various relevance factors (e.g. popularity combined with recency) were tested with a test system built on the ZBW information portal EconBiz and a web-based software package for evaluating search systems. Although the ranking procedures tested are diverse on a theoretical level, no consistent improvements over the baseline rankings could be measured. The results suggest that adapting the ranking to individual users or usage contexts may be necessary to achieve higher performance.
- Source
- Information - Wissenschaft und Praxis. 70(2019) H.1, S.14-23
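Measuring improvements over baseline rankings, as in the study above, is usually done with a graded ranking metric. As an illustration, here is a small nDCG sketch; the article does not state that this particular metric was used, so treat the choice as an assumption:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevance judgements."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    """Normalise DCG by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

# Graded judgements (0-3) for the top five hits of a test ranking and a baseline.
print(ndcg([3, 2, 0, 1, 2]))   # candidate ranking
print(ndcg([2, 3, 2, 1, 0]))   # baseline ranking
```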
-
Wilhelmy, A.: Phonetische Ähnlichkeitssuche in Datenbanken (1991)
0.01
0.012843332 = product of:
0.05137333 = sum of:
0.05137333 = weight(_text_:und in 6684) [ClassicSimilarity], result of:
0.05137333 = score(doc=6684,freq=10.0), product of:
0.15626246 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.07045517 = queryNorm
0.3287631 = fieldWeight in 6684, product of:
3.1622777 = tf(freq=10.0), with freq of:
10.0 = termFreq=10.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.046875 = fieldNorm(doc=6684)
0.25 = coord(1/4)
- Abstract
- In interactive information retrieval systems (IRS), the interplay between human and computer can, roughly speaking, be understood as an iterative process of increasing the precision of the results on the one hand and their recall on the other. The article presents a machine-applicable procedure that goes back to the phonological work of the linguist Nikolaj S. Trubetzkoy (1890-1938). Even in its basic form it can contribute considerably to improving recall. Because it includes the 'similarity neighbourhoods' of search terms in the query, it proves advantageous above all for systems with coordinate machine indexing. For alphabetic terms, introducing such a procedure, which is at first oriented only towards the user, also turns out to be favourable from a technical point of view, since it keeps the number of accesses during searching low even for large data volumes.
- Source
- Bibliotheken mit und ohne Grenzen: Informationsgesellschaft und Bibliothek. Der österreichische Bibliothekartag 1990, Bregenz, 4.-8.9.1990, Vorträge und Kommissionssitzungen
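The abstract does not spell out the Trubetzkoy-based procedure itself; as a generic illustration of how a phonetic key places spelling variants into the same 'similarity neighbourhood', here is a simplified Soundex-style sketch (a stand-in technique, not the one proposed in the paper):

```python
def phonetic_key(word):
    """Simplified Soundex-style key: similar-sounding consonants share a digit."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    word = word.upper()
    key, last = word[0], codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != last:
            key += code
        if ch not in "HW":          # H and W do not separate equal codes
            last = code
    return (key + "000")[:4]

# Spelling variants of the same name end up in one similarity neighbourhood.
print(phonetic_key("Meyer"), phonetic_key("Maier"), phonetic_key("Mayr"))  # M600 M600 M600
```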
-
Mutschke, P.: Autorennetzwerke : Verfahren zur Netzwerkanalyse als Mehrwertdienste für Informationssysteme (2004)
0.01
0.012843332 = product of:
0.05137333 = sum of:
0.05137333 = weight(_text_:und in 5050) [ClassicSimilarity], result of:
0.05137333 = score(doc=5050,freq=10.0), product of:
0.15626246 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.07045517 = queryNorm
0.3287631 = fieldWeight in 5050, product of:
3.1622777 = tf(freq=10.0), with freq of:
10.0 = termFreq=10.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.046875 = fieldNorm(doc=5050)
0.25 = coord(1/4)
- Abstract
- Virtual libraries contain a wealth of information whose variety and depth is not exhaustively covered by standard search engines. This working report describes developments at the IZ that aim to exploit knowledge about the interactions within scientific communities and the social status of their actors for retrieval. The basis for this are social networks that are constituted by the cooperation of scientific actors and are represented in the documents of the database, for example, as co-author relationships (author networks). The studies on the small-world topology of author networks described in the report show that these networks have considerable potential for information systems. The report discusses scenarios describing how author networks, and in particular the concept of actor centrality, can be put to good use for searching databases. The core approach of these retrieval models is the search for experts and the ranking of documents on the basis of the centrality of authors in author networks.
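A toy sketch of the core idea, ranking authors by a simple centrality measure (degree centrality) in a co-author graph built from document metadata; the data is invented, and the report's own models also cover other centrality concepts:

```python
from collections import defaultdict
from itertools import combinations

# Each document contributes co-author edges between all of its authors (invented data).
documents = [["Mueller", "Schmidt"], ["Mueller", "Weber", "Fischer"], ["Schmidt", "Weber"]]

coauthors = defaultdict(set)
for authors in documents:
    for a, b in combinations(authors, 2):
        coauthors[a].add(b)
        coauthors[b].add(a)

# Degree centrality: number of distinct co-authors, normalised by n - 1.
n = len(coauthors)
centrality = {a: len(nbrs) / (n - 1) for a, nbrs in coauthors.items()}
for author, c in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{author:8s} {c:.2f}")
```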
-
Weiß, B.: Verwandte Seiten finden : "Ähnliche Seiten" oder "What's Related" (2005)
0.01
0.012663696 = product of:
0.050654784 = sum of:
0.050654784 = weight(_text_:und in 993) [ClassicSimilarity], result of:
0.050654784 = score(doc=993,freq=14.0), product of:
0.15626246 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.07045517 = queryNorm
0.32416478 = fieldWeight in 993, product of:
3.7416575 = tf(freq=14.0), with freq of:
14.0 = termFreq=14.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.0390625 = fieldNorm(doc=993)
0.25 = coord(1/4)
- Abstract
- Link structure analysis (LSA) is one of the most important analysis techniques not only for crawling, page ranking, delimiting geographic regions, predicting link usage, finding mirror pages, categorising web pages and generating web page statistics, but also for finding related pages. According to the prevailing view, it is the main ingredient for identifying high-quality similar pages within topic-specific graphs of interlinked documents. Two assumptions are always made: links between two documents imply that the content of both documents is related, and if the documents come from different sources (different authors, hosts, domains, .), a link means that one source recommends the other. Building on this idea, Kleinberg developed the HITS algorithm in 1998 to determine related pages via link structure analysis. This approach was extended by Bharat and Henzinger and later pursued further in algorithms such as the Companion and Cocitation algorithms, which find related pages based on a single query URL. This seminar paper explains the algorithms behind these ideas and then presents more recent research approaches in this area.
- Content
- Term paper written for the seminar "Suchmaschinen und Suchalgorithmen", Institut für Wirtschaftsinformatik - Praktische Informatik in der Wirtschaft, Westfälische Wilhelms-Universität Münster. - See: http://www-wi.uni-muenster.de/pi/lehre/ss05/seminarSuchen/Ausarbeitungen/BurkhardWei%DF.pdf
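The HITS iteration that the seminar paper starts from can be sketched compactly: authorities are pages pointed to by good hubs, hubs are pages pointing to good authorities. The link graph below is made up; the Companion and Cocitation refinements are not shown:

```python
import math

def hits(links, iterations=50):
    """Basic HITS: alternate hub and authority updates, normalising each round."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of the pages linking here.
        auth = {p: sum(hub[q] for q, ts in links.items() if p in ts) for p in pages}
        # Hub update: sum of the authority scores of the pages linked to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        for vec in (auth, hub):
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for p in vec:
                vec[p] /= norm
    return hub, auth

links = {"a": ["c", "d"], "b": ["c", "d"], "c": ["d"]}   # tiny made-up web graph
hub, auth = hits(links)
print(max(auth, key=auth.get))   # 'd' emerges as the strongest authority
```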
-
Evans, R.: Beyond Boolean : relevance ranking, natural language and the new search paradigm (1994)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 192) [ClassicSimilarity], result of:
0.046436783 = score(doc=192,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 192, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=192)
0.25 = coord(1/4)
- Abstract
- New full-text search engines that employ relevance ranking have become available as online services. These software tools provide increased ease of use by making natural language queries possible, and deliver superior recall. Even inexperienced end users can execute searches with good results. For experienced database searchers, the ranked search engines offer a technology that is complementary to a structured Boolean strategy, not necessarily a replacement. Even traditional Boolean queries become useful when the results are ranked by probable relevance; such ranking can free users from overwhelming output. Relevance ranking also permits the use of statistical inference methods to find related terms. Using such tools to their best advantage requires rethinking some basic techniques, such as progressively narrowing queries until the retrieved set is small enough. Users should broaden their search to maximize recall, then browse the retrieved documents or pare the set down from the top.
-
Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 5191) [ClassicSimilarity], result of:
0.046436783 = score(doc=5191,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 5191, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=5191)
0.25 = coord(1/4)
- Abstract
- Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high-capacity storage media, from CD-ROM to multi-gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching, fuzzy searching and matching; relevance ranking; proximity searching and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively slower random seek times than hard discs, and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random-access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching.
-
Fox, E.; Betrabet, S.; Koushik, M.; Lee, W.: Extended Boolean models (1992)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 4512) [ClassicSimilarity], result of:
0.046436783 = score(doc=4512,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 4512, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=4512)
0.25 = coord(1/4)
- Abstract
- The classical interpretation of Boolean operators in an information retrieval system is in general too strict. A standard Boolean query rarely comes close to retrieving all and only those documents which are relevant to a query. Many models have been proposed with the aim of softening the interpretation of the Boolean operators in order to improve the precision and recall of the search results. This chapter discusses three such models: the Mixed Min and Max (MMM), the Paice, and the P-norm models. The MMM and Paice models are essentially variations of the classical fuzzy-set model, while the P-norm scheme is a distance-based approach. Our experimental results indicate that each of the above models provides better performance than the classical Boolean model in terms of retrieval effectiveness.
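For a two-term query, the MMM and P-norm interpretations can be written down directly. These are the standard textbook formulations of the two models (with illustrative parameter values), not code taken from the chapter:

```python
def mmm_or(weights, lam=0.6):
    """Mixed Min and Max OR: a soft blend of the maximum and minimum term weight."""
    return lam * max(weights) + (1 - lam) * min(weights)

def mmm_and(weights, lam=0.6):
    """Mixed Min and Max AND: emphasises the minimum instead of the maximum."""
    return lam * min(weights) + (1 - lam) * max(weights)

def pnorm_or(weights, p=2.0):
    """P-norm OR: a generalised mean; large p approaches the strict Boolean OR."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1 / p)

def pnorm_and(weights, p=2.0):
    """P-norm AND: distance-based; large p approaches the strict Boolean AND."""
    return 1 - (sum((1 - w) ** p for w in weights) / len(weights)) ** (1 / p)

w = [0.9, 0.2]   # weights of the two query terms in one document
print(mmm_or(w), mmm_and(w), pnorm_or(w), pnorm_and(w))
```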
-
Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 5295) [ClassicSimilarity], result of:
0.046436783 = score(doc=5295,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 5295, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=5295)
0.25 = coord(1/4)
- Abstract
- The issue of reducing the space overhead when indexing large text databases is becoming more and more important as text collections grow in size. Another subject, which is gaining importance as text databases grow and become more heterogeneous and error prone, is that of flexible string matching. One of the best tools to make the search more flexible is to allow a limited number of differences between the words found and those sought. This is called 'approximate text searching', which is becoming more and more popular. In recent years some indexing schemes with very low space overhead have appeared, some of them dealing with approximate searching. These low-overhead indices (whose most notorious exponent is Glimpse) are modified inverted files, where space is saved by making the lists of occurrences point to text blocks instead of exact word positions. Despite their existence, little is known about the expected behaviour of these 'block addressing' indices, and even less is known when it comes to coping with approximate search. Our main contribution is an analytical study of the space-time trade-offs for indexed text searching.
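The block addressing idea itself fits in a few lines: the inverted lists store block numbers instead of word positions, and candidate blocks are re-scanned at query time. The sketch below uses invented toy text and only exact matching inside the blocks; Glimpse-style approximate matching is not shown:

```python
from collections import defaultdict

def build_block_index(text, block_size=8):
    """Inverted index whose postings are block numbers rather than word positions."""
    words = text.lower().split()
    blocks = [words[i:i + block_size] for i in range(0, len(words), block_size)]
    index = defaultdict(set)
    for block_no, block in enumerate(blocks):
        for word in block:
            index[word].add(block_no)     # space saved: at most one entry per block
    return index, blocks

def search(index, blocks, term):
    """Find candidate blocks via the index, then re-scan them sequentially."""
    term = term.lower()
    return [b for b in sorted(index.get(term, ())) if term in blocks[b]]

text = "block addressing indices trade a little search time for a much smaller index " * 4
index, blocks = build_block_index(text)
print(search(index, blocks, "index"))     # the block numbers containing the word
```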
-
Zhu, B.; Chen, H.: Validating a geographical image retrieval system (2000)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 5769) [ClassicSimilarity], result of:
0.046436783 = score(doc=5769,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 5769, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=5769)
0.25 = coord(1/4)
- Abstract
- This paper summarizes a prototype geographical image retrieval system that demonstrates how to integrate image processing and information analysis techniques to support large-scale content-based image retrieval. By using an image as its interface, the prototype system addresses a troublesome aspect of traditional retrieval models, which require users to have complete knowledge of the low-level features of an image. In addition, we describe an experiment that validates the system's performance against that of human subjects, in an effort to address the scarcity of research evaluating the performance of algorithms against that of human beings. The results of the experiment indicate that the system could do as well as human subjects in accomplishing the tasks of similarity analysis and image categorization. We also found that under some circumstances the texture features of an image are insufficient to represent a geographic image. We believe, however, that our image retrieval system provides a promising approach to integrating image processing techniques and information retrieval algorithms.
-
French, J.C.; Powell, A.L.; Schulman, E.: Using clustering strategies for creating authority files (2000)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 5811) [ClassicSimilarity], result of:
0.046436783 = score(doc=5811,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 5811, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=5811)
0.25 = coord(1/4)
- Abstract
- As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographical entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve the detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files.
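Approximate string matching of the kind referred to here is commonly built on edit distance; a minimal sketch that groups variant name forms whose distance to a target form is small (the threshold and the example strings are ours, not the authors'):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance computed by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

variants = ["Tschaikowsky", "Tchaikovsky", "Chaikovskii", "Beethoven"]
target = "Tchaikovsky"
# Variant forms whose edit distance to the target stays under a small threshold.
print([v for v in variants if edit_distance(v.lower(), target.lower()) <= 3])
```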
-
Aigrain, P.; Longueville, V.: ¬A model for the evaluation of expansion techniques in information retrieval systems (1994)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 6331) [ClassicSimilarity], result of:
0.046436783 = score(doc=6331,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 6331, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=6331)
0.25 = coord(1/4)
- Abstract
- We describe an evaluation model for expansion systems in information retrieval, that is, systems expanding a user selection of documents in order to provide the user with a larger set of documents sharing the same or related characteristics. Our model leads to a test protocol and practical estimates of the efficiency of an expansion system, provided that it is possible for a sample of users to exhaustively scan the content of a subset of the database in order to decide which documents would have been selected by an 'ideal' expansion system. This condition is met only by databases whose unit contents can be quickly apprehended, such as still image databases or synthetic bibliographical references. We compare our model with other types of possible indicators, and discuss the precision to which our measure can be estimated, using data from experimentation with an image database system developed by our research team.
-
Ponte, J.M.: Language models for relevance feedback (2000)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 1035) [ClassicSimilarity], result of:
0.046436783 = score(doc=1035,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 1035, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=1035)
0.25 = coord(1/4)
- Abstract
- The language modeling approach to Information Retrieval (IR) is a conceptually simple model of IR originally developed by Ponte and Croft (1998). In this approach, the query is treated as a random event and documents are ranked according to the likelihood that the query would be generated via a language model estimated for each document. The intuition behind this approach is that users have a prototypical document in mind and will choose query terms accordingly. The intuitive appeal of this method is that inferences about the semantic content of documents do not need to be made, resulting in a conceptually simple model. In this paper, techniques for relevance feedback and routing are derived from the language modeling approach in a straightforward manner and their effectiveness is demonstrated empirically. These experiments provide further proof of concept for the language modeling approach to retrieval.
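The query-likelihood core of the approach can be written down in a few lines; Jelinek-Mercer smoothing is used here purely for illustration, and the paper's feedback and routing derivations are not reproduced:

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.5):
    """log P(query | document) under a smoothed unigram language model."""
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_tf[term] / len(doc)
        p_coll = coll_tf[term] / len(collection)
        p = lam * p_doc + (1 - lam) * p_coll        # Jelinek-Mercer smoothing
        score += math.log(p) if p > 0 else float("-inf")
    return score

docs = [["ranking", "with", "language", "models"], ["boolean", "retrieval", "basics"]]
collection = [t for d in docs for t in d]
query = ["language", "ranking"]
# Rank documents by the likelihood that their language model generates the query.
print(sorted(range(len(docs)), key=lambda i: -query_likelihood(query, docs[i], collection)))
```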
-
Nie, J.-Y.: Query expansion and query translation as logical inference (2003)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 2425) [ClassicSimilarity], result of:
0.046436783 = score(doc=2425,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 2425, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=2425)
0.25 = coord(1/4)
- Abstract
- A number of studies have examined the problems of query expansion in monolingual Information Retrieval (IR), and query translation for cross-language IR. However, no link has been made between them. This article first shows that query translation is a special case of query expansion. There is also another set of studies on inferential IR. Again, no relationship has been established with query translation or query expansion. The second claim of this article is that logical inference is a general form that covers query expansion and query translation. This analysis provides a unified view of different subareas of IR. We further develop the inferential IR approach in two particular contexts: using fuzzy logic and probability theory. The evaluation formulas obtained are shown to strongly correspond to those used in other IR models. This indicates that inference is indeed the core of advanced IR.
-
Bodoff, D.; Robertson, S.: ¬A new unified probabilistic model (2004)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 3129) [ClassicSimilarity], result of:
0.046436783 = score(doc=3129,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 3129, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=3129)
0.25 = coord(1/4)
- Abstract
- This paper proposes a new unified probabilistic model. Two previous models, Robertson et al.'s "Model 0" and "Model 3," each have strengths and weaknesses. The strength of Model 0, not found in Model 3, is that it does not require relevance data about the particular document or query, and, related to that, its probability estimates are straightforward. The strength of Model 3, not found in Model 0, is that it can utilize feedback information about the particular document and query in question. In this paper we introduce a new unified probabilistic model that combines these strengths: the expression of its probabilities is straightforward, it does not require that data be available for the particular document or query in question, but it can utilize such specific data if it is available. The model is one way to resolve the difficulty of combining two marginal views in probabilistic retrieval.
-
Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 3239) [ClassicSimilarity], result of:
0.046436783 = score(doc=3239,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 3239, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=3239)
0.25 = coord(1/4)
- Abstract
- Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
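In this setting a fitness function scores a candidate ranking function on training queries; below is a minimal sketch using average precision as the fitness, which is one common design and not necessarily one of those compared in the article (the toy query and documents are invented):

```python
def average_precision(ranked_doc_ids, relevant):
    """Average precision of one ranked list against a set of relevant doc ids."""
    hits, precisions = 0, []
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def fitness(ranking_function, training_queries):
    """Mean average precision of a candidate ranking function over training queries."""
    total = 0.0
    for query, docs, relevant in training_queries:
        ranked = [doc_id for doc_id, counts in
                  sorted(docs, key=lambda item: -ranking_function(query, item[1]))]
        total += average_precision(ranked, relevant)
    return total / len(training_queries)

# Toy data: documents as (id, term counts); the candidate function is a plain tf sum.
candidate = lambda query, counts: sum(counts.get(t, 0) for t in query)
training_queries = [(["web", "search"],
                     [("d1", {"web": 3, "search": 1}), ("d2", {"library": 2}),
                      ("d3", {"search": 2})],
                     {"d1", "d3"})]
print(fitness(candidate, training_queries))   # 1.0: d1 and d3 are ranked on top
```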
-
Dominich, S.; Skrop, A.: PageRank and interaction information retrieval (2005)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 4268) [ClassicSimilarity], result of:
0.046436783 = score(doc=4268,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 4268, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=4268)
0.25 = coord(1/4)
- Abstract
- The PageRank method is used by the Google Web search engine to compute the importance of Web pages. Two different views have been developed for the interpretation of the PageRank method and values: (a) stochastic (random surfer): the PageRank values can be conceived as the steady-state distribution of a Markov chain, and (b) algebraic: the PageRank values form the eigenvector corresponding to eigenvalue 1 of the Web link matrix. The Interaction Information Retrieval (I²R) method is a nonclassical information retrieval paradigm, which represents a connectionist approach based on dynamic systems. In the present paper, a different interpretation of PageRank is proposed, namely, a dynamic systems viewpoint, by showing that the PageRank method can be formally interpreted as a particular case of the Interaction Information Retrieval method; thus, the PageRank values may be interpreted as neutral equilibrium points of the Web.
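Both readings rest on the same underlying iteration; here is a minimal power-iteration sketch of PageRank on a made-up three-page graph (damping factor 0.85; the mapping onto the I²R formalism is not reproduced):

```python
def pagerank(links, damping=0.85, iterations=50):
    """PageRank by power iteration: repeatedly redistribute rank along out-links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            share = damping * rank[page] / len(targets)
            for target in targets:
                new[target] += share
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}   # tiny made-up link graph
for page, r in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(r, 3))
```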
-
Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003)
0.01
0.011609196 = product of:
0.046436783 = sum of:
0.046436783 = weight(_text_:have in 5457) [ClassicSimilarity], result of:
0.046436783 = score(doc=5457,freq=2.0), product of:
0.22215667 = queryWeight, product of:
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.07045517 = queryNorm
0.20902719 = fieldWeight in 5457, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.1531634 = idf(docFreq=5157, maxDocs=44421)
0.046875 = fieldNorm(doc=5457)
0.25 = coord(1/4)
- Abstract
- Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method, and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with powerful text matching techniques in order to get the quality of information retrieval results provided by Google.