-
Houston, R.D.; Harmon, G.: Vannevar Bush and Memex (2007)
0.23
0.2292732 = product of:
  0.9170928 = sum of:
    0.9170928 = weight(_text_:harmon in 5197) [ClassicSimilarity], result of:
      0.9170928 = score(doc=5197,freq=2.0), product of:
        0.55196565 = queryWeight, product of:
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.058726728 = queryNorm
        1.6615034 = fieldWeight in 5197, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.125 = fieldNorm(doc=5197)
  0.25 = coord(1/4)
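The tree above is Lucene's explain() output for its ClassicSimilarity (tf-idf) model: queryWeight = idf x queryNorm on the query side, fieldWeight = sqrt(tf) x idf x fieldNorm on the document side, and the product of the two is scaled by coord, the fraction of query clauses that matched (here 1 of 4). As a sanity check, the listed score can be recomputed from the leaf values alone; a minimal sketch in plain Python (the helper name is ours, and Python's float64 may differ from Lucene's float32 in the last digit):

```python
import math

def classic_similarity(term_freq, idf, query_norm, field_norm, coord):
    """Recompute a Lucene ClassicSimilarity clause score from explain() leaves."""
    query_weight = idf * query_norm        # 9.398883 * 0.058726728 = 0.55196565
    tf = math.sqrt(term_freq)              # ClassicSimilarity uses sqrt(termFreq)
    field_weight = tf * idf * field_norm   # fieldNorm encodes field length, byte-quantised
    return query_weight * field_weight * coord

# Leaf values from the top result (term "harmon" in doc 5197):
print(classic_similarity(term_freq=2.0, idf=9.398883,
                         query_norm=0.058726728, field_norm=0.125,
                         coord=0.25))      # ~0.2292732, the listed score
```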
-
Houston, R.D.; Harmon, E.G.: Re-envisioning the information concept : systematic definitions (2002)
0.11
0.1146366 = product of:
  0.4585464 = sum of:
    0.4585464 = weight(_text_:harmon in 1136) [ClassicSimilarity], result of:
      0.4585464 = score(doc=1136,freq=2.0), product of:
        0.55196565 = queryWeight, product of:
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.058726728 = queryNorm
        0.8307517 = fieldWeight in 1136, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.0625 = fieldNorm(doc=1136)
  0.25 = coord(1/4)
-
Harmon, J.C.; Burk, B.L.: Better service through flexible rules : cataloging a collection of annual reports in a most un-CONSER-like manner (2000)
0.10
0.100307025 = product of:
  0.4012281 = sum of:
    0.4012281 = weight(_text_:harmon in 398) [ClassicSimilarity], result of:
      0.4012281 = score(doc=398,freq=2.0), product of:
        0.55196565 = queryWeight, product of:
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.058726728 = queryNorm
        0.72690773 = fieldWeight in 398, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.0546875 = fieldNorm(doc=398)
  0.25 = coord(1/4)
-
Harmon, G.: Remembering William Goffman : mathematical information science pioneer (2008)
0.09
0.08597746 = product of:
  0.34390983 = sum of:
    0.34390983 = weight(_text_:harmon in 3110) [ClassicSimilarity], result of:
      0.34390983 = score(doc=3110,freq=2.0), product of:
        0.55196565 = queryWeight, product of:
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.058726728 = queryNorm
        0.6230638 = fieldWeight in 3110, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.398883 = idf(docFreq=9, maxDocs=44421)
          0.046875 = fieldNorm(doc=3110)
  0.25 = coord(1/4)
-
Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002)
0.07
0.06786877 = product of:
  0.27147508 = sum of:
    0.27147508 = weight(_text_:judge in 2851) [ClassicSimilarity], result of:
      0.27147508 = score(doc=2851,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.59792763 = fieldWeight in 2851, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.0546875 = fieldNorm(doc=2851)
  0.25 = coord(1/4)
- Abstract
- This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries for construction of test collections for cross-language multilingual text retrieval is also discussed in this paper. As well, a tool is designed to help assessors judge relevance and gather the events of relevance judgment. The log file created by this tool will be used to analyze the behaviors of assessors in the future.
-
Seadle, M.: Project ethnography : an anthropological approach to assessing digital library services (2000)
0.07
0.06786877 = product of:
  0.27147508 = sum of:
    0.27147508 = weight(_text_:judge in 2162) [ClassicSimilarity], result of:
      0.27147508 = score(doc=2162,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.59792763 = fieldWeight in 2162, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.0546875 = fieldNorm(doc=2162)
  0.25 = coord(1/4)
- Abstract
- Often libraries try to assess digital library service for their user populations in comprehensive terms that judge its overall success or failure. This article's key assumption is that the people involved must be understood before services can be assessed, especially if evaluators and developers intend to improve a digital library product. Its argument is simply that anthropology can provide the initial understanding, the intellectual basis, on which informed choices about sample population, survey design, or focus group selection can reasonably be made. As an example, this article analyzes the National Gallery of the Spoken Word (NGSW). It includes brief descriptions of nine NGSW micro-cultures and three pairs of dichotomies within these micro-cultures.
-
Kabel, S.; Hoog, R. de; Wielinga, B.J.; Anjewierden, A.: ¬The added value of task and ontology-based markup for information retrieval (2004)
0.06
0.058173236 = product of:
  0.23269294 = sum of:
    0.23269294 = weight(_text_:judge in 3210) [ClassicSimilarity], result of:
      0.23269294 = score(doc=3210,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.5125094 = fieldWeight in 3210, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.046875 = fieldNorm(doc=3210)
  0.25 = coord(1/4)
- Abstract
- In this report, we investigate how retrieving information can be improved through task-related indexing of documents based on ontologies. Different index types, varying from content-based keywords to structured task-based indexing ontologies, are compared in an experiment that simulates the task of creating instructional material from a database of source material. To be able to judge the added value of task- and ontology-related indexes, traditional information retrieval performance measures are extended with new measures reflecting the quality of the material produced with the retrieved information. The results of the experiment show that a structured task-based indexing ontology improves the quality of the product created from retrieved material only to some extent, but that it certainly improves the efficiency and effectiveness of search and retrieval and the precision of use.
-
Holsapple, C.W.; Joshi, K.D.: ¬A formal knowledge management ontology : conduct, activities, resources, and influences (2004)
0.06
0.058173236 = product of:
  0.23269294 = sum of:
    0.23269294 = weight(_text_:judge in 3235) [ClassicSimilarity], result of:
      0.23269294 = score(doc=3235,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.5125094 = fieldWeight in 3235, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.046875 = fieldNorm(doc=3235)
  0.25 = coord(1/4)
- Abstract
- This article describes a collaboratively engineered general-purpose knowledge management (KM) ontology that can be used by practitioners, researchers, and educators. The ontology is formally characterized in terms of nearly one hundred definitions and axioms that evolved from a Delphi-like process involving a diverse panel of over 30 KM practitioners and researchers. The ontology identifies and relates knowledge manipulation activities that an entity (e.g., an organization) can perform to operate on knowledge resources. It introduces a taxonomy for these resources, which indicates classes of knowledge that may be stored, embedded, and/or represented in an entity. It recognizes factors that influence the conduct of KM both within and across KM episodes. The Delphi panelists judged the ontology favorably overall with respect to its ability to unify KM concepts, its comprehensiveness, and its utility. Moreover, various implications of the ontology for the KM field are examined as indicators of its utility for practitioners, educators, and researchers.
-
Maglaughlin, K.L.; Sonnenwald, D.H.: User perspectives on relevance criteria : a comparison among relevant, partially relevant, and not-relevant judgements (2002)
0.06
0.058173236 = product of:
  0.23269294 = sum of:
    0.23269294 = weight(_text_:judge in 201) [ClassicSimilarity], result of:
      0.23269294 = score(doc=201,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.5125094 = fieldWeight in 201, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.046875 = fieldNorm(doc=201)
  0.25 = coord(1/4)
- Abstract
- In this issue Maglaughlin and Sonnenwald provided 12 graduate students with searches related to the students' own work and asked them to judge the twenty most recent retrieved representations by highlighting passages thought to contribute to relevance, marking out passages detracting from relevance, and providing a relevant, partially relevant, or not-relevant judgement on each. In recorded interviews they were asked how these decisions were made and to describe the three classes of judgement. The union of criteria identified in past studies did not seem to fully capture the information supplied, so a new set was produced and coding agreement was found to be adequate. Twenty-nine criteria were identified and grouped into six categories based upon the focus of the criterion. Multiple criteria are used for most judgements, and most criteria may have either a positive or negative effect. Content was the most frequently mentioned criterion.
-
Karamuftuoglu, M.: Information arts and information science : time to unite? (2006)
0.06
0.058173236 = product of:
  0.23269294 = sum of:
    0.23269294 = weight(_text_:judge in 330) [ClassicSimilarity], result of:
      0.23269294 = score(doc=330,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.5125094 = fieldWeight in 330, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.046875 = fieldNorm(doc=330)
  0.25 = coord(1/4)
- Abstract
- This article explicates the common ground between two currently independent fields of inquiry, namely information arts and information science, and suggests a framework that could unite them as a single field of study. The article defines and clarifies the meaning of information art and presents an axiological framework that could be used to judge the value of works of information art. The axiological framework is applied to examples of works of information art to demonstrate its use. The article argues that both information arts and information science could be studied under a common framework; namely, the domain-analytic or sociocognitive approach. It also is argued that the unification of the two fields could help enhance the meaning and scope of both information science and information arts and therefore be beneficial to both fields.
-
Hobson, S.P.; Dorr, B.J.; Monz, C.; Schwartz, R.: Task-based evaluation of text summarization using Relevance Prediction (2007)
0.06
0.058173236 = product of:
  0.23269294 = sum of:
    0.23269294 = weight(_text_:judge in 1938) [ClassicSimilarity], result of:
      0.23269294 = score(doc=1938,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.5125094 = fieldWeight in 1938, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.046875 = fieldNorm(doc=1938)
  0.25 = coord(1/4)
- Abstract
- This article introduces a new task-based evaluation measure called Relevance Prediction that is a more intuitive measure of an individual's performance on a real-world task than interannotator agreement. Relevance Prediction parallels what a user does in the real world task of browsing a set of documents using standard search tools, i.e., the user judges relevance based on a short summary and then that same user - not an independent user - decides whether to open (and judge) the corresponding document. This measure is shown to be a more reliable measure of task performance than LDC Agreement, a current gold-standard based measure used in the summarization evaluation community. Our goal is to provide a stable framework within which developers of new automatic measures may make stronger statistical statements about the effectiveness of their measures in predicting summary usefulness. We demonstrate - as a proof-of-concept methodology for automatic metric developers - that a current automatic evaluation measure has a better correlation with Relevance Prediction than with LDC Agreement and that the significance level for detected differences is higher for the former than for the latter.
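To make the measure concrete: Relevance Prediction asks whether the same person's summary-based relevance call predicts that person's own judgment of the full document, rather than comparing two different annotators. A toy sketch under our own assumptions (the function name and data layout are invented for illustration, not the authors' implementation):

```python
def relevance_prediction(summary_judgments, document_judgments):
    """Fraction of items where a user's summary-based relevance call
    matches that same user's judgment of the corresponding full document."""
    assert len(summary_judgments) == len(document_judgments)
    matches = sum(s == d for s, d in zip(summary_judgments, document_judgments))
    return matches / len(summary_judgments)

# One user judges five summaries, then the five corresponding documents:
summaries = [True, False, True, True, False]
documents = [True, False, False, True, False]
print(relevance_prediction(summaries, documents))  # 0.8
```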
-
Díaz, A.; Gervás, P.: User-model based personalized summarization (2007)
0.06
0.058173236 = product of:
  0.23269294 = sum of:
    0.23269294 = weight(_text_:judge in 1952) [ClassicSimilarity], result of:
      0.23269294 = score(doc=1952,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.5125094 = fieldWeight in 1952, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.046875 = fieldNorm(doc=1952)
  0.25 = coord(1/4)
- Abstract
- The potential of summary personalization is high, because a summary that would be useless for deciding the relevance of a document when generated in a generic manner may be useful if the right sentences are selected to match the user's interests. In this paper we defend the use of a personalized summarization facility to maximize the density of relevance of selections sent by a personalized information system to a given user. The personalization is applied to the digital newspaper domain and uses a user model that stores long- and short-term interests using four reference systems: sections, categories, keywords, and feedback terms. On the other hand, it is crucial to measure how much information is lost during the summarization process, and how this information loss may affect the ability of the user to judge the relevance of a given document. The results obtained in two personalization systems show that personalized summaries perform better than generic and generic-personalized summaries in terms of identifying documents that satisfy user preferences. We also considered a user-centred direct evaluation that showed a high level of user satisfaction with the summaries.
-
Moreira Orengo, V.; Huyck, C.: Relevance feedback and cross-language information retrieval (2006)
0.06
0.058173236 = product of:
  0.23269294 = sum of:
    0.23269294 = weight(_text_:judge in 1970) [ClassicSimilarity], result of:
      0.23269294 = score(doc=1970,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.5125094 = fieldWeight in 1970, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.046875 = fieldNorm(doc=1970)
  0.25 = coord(1/4)
- Abstract
- This paper presents a study of relevance feedback in a cross-language information retrieval environment. We have performed an experiment in which Portuguese speakers are asked to judge the relevance of English documents, documents hand-translated to Portuguese, and documents automatically translated to Portuguese. The goals of the experiment were to answer two questions: (i) how well native Portuguese searchers can recognise relevant documents written in English, compared to documents that are hand-translated and automatically translated to Portuguese; and (ii) what the impact of misjudged documents is on the performance improvement that can be achieved by relevance feedback. Surprisingly, the results show that machine translation is as effective as hand translation in aiding users to assess relevance in the experiment. In addition, the impact of misjudged documents on the performance of relevance feedback is overall just moderate, and varies greatly for different query topics.
-
Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: ¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008)
0.06
0.058173236 = product of:
  0.23269294 = sum of:
    0.23269294 = weight(_text_:judge in 2998) [ClassicSimilarity], result of:
      0.23269294 = score(doc=2998,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.5125094 = fieldWeight in 2998, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.046875 = fieldNorm(doc=2998)
  0.25 = coord(1/4)
- Abstract
- Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
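The pipeline this abstract describes, word counts as features feeding a naïve Bayes model over three difficulty labels, maps directly onto standard tooling. A hedged sketch with scikit-learn (the training texts, labels, and parameters below are placeholders, not the authors' corpus or vocabulary):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder training data: texts labelled with one of three difficulty levels.
texts = ["a cold usually goes away on its own",
         "drink water and rest to feel better",
         "high blood pressure raises the risk of stroke",
         "this medication may interact with blood thinners",
         "myocardial infarction pathophysiology and prognosis",
         "idiopathic thrombocytopenic purpura aetiology"]
levels = ["easy", "easy", "intermediate", "intermediate", "hard", "hard"]

# Vocabulary-based classifier: token counts feed a multinomial naive Bayes model.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, levels)
print(clf.predict(["discuss dosage adjustments with your physician"]))
```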
-
Cosijn, E.: Relevance judgments and measurements (2009)
0.06
0.058173236 = product of:
  0.23269294 = sum of:
    0.23269294 = weight(_text_:judge in 842) [ClassicSimilarity], result of:
      0.23269294 = score(doc=842,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.5125094 = fieldWeight in 842, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.046875 = fieldNorm(doc=842)
  0.25 = coord(1/4)
- Abstract
- Users intuitively know which documents are relevant when they see them. Formal relevance assessment, however, is a complex issue. In this entry, relevance assessments are described both from a human perspective and a systems perspective. Humans judge relevance in terms of the relation between the documents retrieved and the way in which these documents are understood and used. This is a subjective and personal judgment and is called user relevance. Systems compute a function between the query and the document features that the system builders believe will cause documents to be ranked by the likelihood that a user will find them relevant. This is an objective measurement of relevance in terms of relations between the query and the documents retrieved; this is called system relevance (or sometimes similarity).
-
Information ethics : privacy, property, and power (2005)
0.05
0.054119907 = product of:
  0.108239815 = sum of:
    0.09695539 = weight(_text_:judge in 3392) [ClassicSimilarity], result of:
      0.09695539 = score(doc=3392,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.21354558 = fieldWeight in 3392, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.01953125 = fieldNorm(doc=3392)
    0.011284425 = weight(_text_:und in 3392) [ClassicSimilarity], result of:
      0.011284425 = score(doc=3392,freq=4.0), product of:
        0.13024996 = queryWeight, product of:
          2.217899 = idf(docFreq=13141, maxDocs=44421)
          0.058726728 = queryNorm
        0.086636685 = fieldWeight in 3392, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.217899 = idf(docFreq=13141, maxDocs=44421)
          0.01953125 = fieldNorm(doc=3392)
  0.5 = coord(2/4)
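This is the only entry in the list where two query clauses match (the term "judge" and the term "und" from the German classification fields), so the sum contains two weight clauses and coord rises to 2/4. Reusing the classic_similarity helper sketched after the first result above:

```python
# Both clauses share queryNorm and fieldNorm; coord is applied once to the sum.
judge = classic_similarity(term_freq=2.0, idf=7.731176,
                           query_norm=0.058726728, field_norm=0.01953125,
                           coord=1.0)
und = classic_similarity(term_freq=4.0, idf=2.217899,
                         query_norm=0.058726728, field_norm=0.01953125,
                         coord=1.0)
print((judge + und) * 2 / 4)  # ~0.054119907, the listed score
```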
- BK
- 06.00 / Information und Dokumentation: Allgemeines
- Classification
- 06.00 / Information und Dokumentation: Allgemeines
- Footnote
- Review in: JASIST 58(2007) no.2, p.302 (L.A. Ennis): "This is an important and timely anthology of articles "on the normative issues surrounding information control" (p. 11). Using an interdisciplinary approach, Moore's work takes a broad look at the relatively new field of information ethics. Covering a variety of disciplines including applied ethics, intellectual property, privacy, free speech, and more, the book provides information professionals of all kinds with a valuable and thought-provoking resource. Information Ethics is divided into five parts and twenty chapters or articles. At the end of each of the five parts, the editor has included a few "discussion cases," which allow the users to apply what they just read to potential real-life examples. Part I, "An Ethical Framework for Analysis," provides readers with an introduction to reasoning and ethics. This complex and philosophical section of the book contains five articles and four discussion cases. All five of the articles are really thought-provoking and challenging writings on morality. For instance, in the first article, "Introduction to Moral Reasoning," Tom Regan examines how not to answer a moral question. For example, he thinks using what the majority believes as a means of determining what is and is not moral is flawed. "The Metaphysics of Morals" by Immanuel Kant looks at the reasons behind actions. According to Kant, to be moral one has to do the right thing for the right reasons. By including materials that force the reader to think more broadly and deeply about what is right and wrong, Moore has provided an important foundation and backdrop for the rest of the book. Part II, "Intellectual Property: Moral and Legal Concerns," contains five articles and three discussion cases for tackling issues like ownership, patents, copyright, and biopiracy. This section takes a probing look at intellectual and intangible property from a variety of viewpoints. For instance, in "Intellectual Property is Still Property," Judge Frank Easterbrook argues that intellectual property is no different than physical property and should not be treated any differently by law. Tom Palmer's article, "Are Patents and Copyrights Morally Justified," however, uses historical examples to show how intellectual and physical properties differ.
-
Spink, A.; Greisdorf, H.: Regions and levels : Measuring and mapping users' relevance judgements (2001)
0.05
0.048477694 = product of:
  0.19391078 = sum of:
    0.19391078 = weight(_text_:judge in 6586) [ClassicSimilarity], result of:
      0.19391078 = score(doc=6586,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.42709115 = fieldWeight in 6586, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.0390625 = fieldNorm(doc=6586)
  0.25 = coord(1/4)
- Abstract
- The dichotomous bipolar approach to relevance has produced an abundance of information retrieval (IR) research. However, relevance studies that include consideration of users' partial relevance judgments are moving toward greater relevance clarity and congruity and can impact the design of more effective IR systems. The study reported in this paper investigates the various regions across a distribution of users' relevance judgments, including how these regions may be categorized, measured, and evaluated. An instrument was designed using four scales for collecting, measuring, and describing end-user relevance judgments. The instrument was administered to 21 end-users who conducted searches on their own information problems and made relevance judgments on a total of 1059 retrieved items. Findings include: (1) overlapping regions of relevance were found to impact the usefulness of precision ratios as a measure of IR system effectiveness, (2) both positive and negative levels of relevance are important to users as they make relevance judgments, (3) topicality was used more to reject than to accept items as highly relevant, (4) utility was used more to judge items highly relevant, and (5) the nature of the relevance judgment distribution suggested a new IR evaluation measure, the median effect. Findings suggest that the middle region of a distribution of relevance judgments, also called "partial relevance," represents a key avenue for ongoing study. The findings provide implications for relevance theory and the evaluation of IR systems.
-
Pu, H.-T.; Chuang, S.-L.; Yang, C.: Subject categorization of query terms for exploring Web users' search interests (2002)
0.05
0.048477694 = product of:
  0.19391078 = sum of:
    0.19391078 = weight(_text_:judge in 1587) [ClassicSimilarity], result of:
      0.19391078 = score(doc=1587,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.42709115 = fieldWeight in 1587, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.0390625 = fieldNorm(doc=1587)
  0.25 = coord(1/4)
- Abstract
- Subject content analysis of Web query terms is essential to understand Web searching interests. Such analysis includes exploring search topics and observing changes in their frequency distributions with time. To provide a basis for in-depth analysis of users' search interests on a larger scale, this article presents a query categorization approach to automatically classifying Web query terms into broad subject categories. Because a query is short in length and simple in structure, its intended subject(s) of search is difficult to judge. Our approach, therefore, combines the search processes of real-world search engines to obtain highly ranked Web documents based on each unknown query term. These documents are used to extract co-occurring terms and to create a feature set. An effective ranking function has also been developed to find the most appropriate categories. Three search engine logs in Taiwan were collected and tested. They contained over 5 million queries from different periods of time. The achieved performance is quite encouraging compared with that of human categorization. The experimental results demonstrate that the approach is efficient in dealing with large numbers of queries and adaptable to the dynamic Web environment. Through good integration of human and machine efforts, the frequency distributions of subject categories in response to changes in users' search interests can be systematically observed in real time. The approach has also shown potential for use in various information retrieval applications, and provides a basis for further Web searching studies.
-
Nicholson, S.: Bibliomining for automated collection development in a digital library setting : using data mining to discover Web-based scholarly research works (2003)
0.05
0.048477694 = product of:
  0.19391078 = sum of:
    0.19391078 = weight(_text_:judge in 2867) [ClassicSimilarity], result of:
      0.19391078 = score(doc=2867,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.42709115 = fieldWeight in 2867, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.0390625 = fieldNorm(doc=2867)
  0.25 = coord(1/4)
- Abstract
- This research creates an intelligent agent for automated collection development in a digital library setting. It uses a predictive model based on facets of each Web page to select scholarly works. The criteria came from the academic library selection literature, and a Delphi study was used to refine the list to 41 criteria. A Perl program was designed to analyze a Web page for each criterion and applied to a large collection of scholarly and nonscholarly Web pages. Bibliomining, or data mining for libraries, was then used to create different classification models. Four techniques were used: logistic regression, nonparametric discriminant analysis, classification trees, and neural networks. Accuracy and return were used to judge the effectiveness of each model on test datasets. In addition, a set of problematic pages that were difficult to classify because of their similarity to scholarly research was gathered and classified using the models. The resulting models could be used in the selection process to automatically create a digital library of Web-based scholarly research works. In addition, the technique can be extended to create a digital library of any type of structured electronic information.
-
White, H.D.: Combining bibliometrics, information retrieval, and relevance theory : part 2: some implications for information science (2007)
0.05
0.048477694 = product of:
  0.19391078 = sum of:
    0.19391078 = weight(_text_:judge in 1437) [ClassicSimilarity], result of:
      0.19391078 = score(doc=1437,freq=2.0), product of:
        0.45402667 = queryWeight, product of:
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.058726728 = queryNorm
        0.42709115 = fieldWeight in 1437, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.731176 = idf(docFreq=52, maxDocs=44421)
          0.0390625 = fieldNorm(doc=1437)
  0.25 = coord(1/4)
- Abstract
- When bibliometric data are converted to term frequency (tf) and inverse document frequency (idf) values, plotted as pennant diagrams, and interpreted according to Sperber and Wilson's relevance theory (RT), the results evoke major variables of information science (IS). These include topicality, in the sense of intercohesion and intercoherence among texts; cognitive effects of texts in response to people's questions; people's levels of expertise as a precondition for cognitive effects; processing effort as textual or other messages are received; specificity of terms as it affects processing effort; relevance, defined in RT as the effects/effort ratio; and authority of texts and their authors. While such concerns figure automatically in dialogues between people, they become problematic when people create or use or judge literature-based information systems. The difficulty of achieving worthwhile cognitive effects and acceptable processing effort in human-system dialogues explains why relevance is the central concern of IS. Moreover, since relevant communication with both systems and unfamiliar people is uncertain, speakers tend to seek cognitive effects that cost them the least effort. Yet hearers need greater effort, often greater specificity, from speakers if their responses are to be highly relevant in their turn. This theme of mismatch manifests itself in vague reference questions, underdeveloped online searches, uncreative judging in retrieval evaluation trials, and perfunctory indexing. Another effect of least effort is a bias toward topical relevance over other kinds. RT can explain these outcomes as well as more adaptive ones. Pennant diagrams, applied here to a literature search and a Bradford-style journal analysis, can model them. Given RT and the right context, bibliometrics may predict psychometrics.