-
Chen, H.; Martinez, J.; Kirchhoff, A.; Ng, T.D.; Schatz, B.R.: Alleviating search uncertainty through concept associations : automatic indexing, co-occurrence analysis, and parallel computing (1998)
0.05
0.05479776 = product of:
0.10959552 = sum of:
0.022471312 = weight(_text_:und in 6202) [ClassicSimilarity], result of:
0.022471312 = score(doc=6202,freq=2.0), product of:
0.15283768 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.068911016 = queryNorm
0.14702731 = fieldWeight in 6202, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.046875 = fieldNorm(doc=6202)
0.087124206 = weight(_text_:human in 6202) [ClassicSimilarity], result of:
0.087124206 = score(doc=6202,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.2895031 = fieldWeight in 6202, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.046875 = fieldNorm(doc=6202)
0.5 = coord(2/4)
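The explanation tree above follows Lucene's ClassicSimilarity: each matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = sqrt(tf) × idf × fieldNorm, and the sum over matching clauses is scaled by the coordination factor. A minimal sketch that reproduces the numbers of this first entry:

```python
import math

def classic_term_score(tf, idf, query_norm, field_norm):
    """One term's contribution in Lucene ClassicSimilarity:
    queryWeight * fieldWeight = (idf * queryNorm) * (sqrt(tf) * idf * fieldNorm)."""
    query_weight = idf * query_norm
    field_weight = math.sqrt(tf) * idf * field_norm
    return query_weight * field_weight

QUERY_NORM = 0.068911016  # shared by all clauses of this query

# weight(_text_:und in 6202): freq=2.0, idf=2.217899, fieldNorm=0.046875
s_und = classic_term_score(2.0, 2.217899, QUERY_NORM, 0.046875)
# weight(_text_:human in 6202): freq=2.0, idf=4.3671384, fieldNorm=0.046875
s_human = classic_term_score(2.0, 4.3671384, QUERY_NORM, 0.046875)

coord = 2 / 4          # 2 of the 4 query clauses matched this document
score = (s_und + s_human) * coord
```

With these inputs the sketch reproduces the listed partial scores (about 0.02247 and 0.08712) and the final score 0.05479776.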
- Abstract
- In this article, we report research on an algorithmic approach to alleviating search uncertainty in a large information space. Grounded in object filtering, automatic indexing, and co-occurrence analysis, we performed a large-scale experiment using a parallel supercomputer (SGI Power Challenge) to analyze 400,000+ abstracts in an INSPEC computer engineering collection. Two system-generated thesauri, one based on a combined object filtering and automatic indexing method, and the other based on automatic indexing only, were compared with the human-generated INSPEC subject thesaurus. Our user evaluation revealed that the system-generated thesauri were better than the INSPEC thesaurus in 'concept recall', but in 'concept precision' the 3 thesauri were comparable. Our analysis also revealed that the terms suggested by the 3 thesauri were complementary and could be used to significantly increase 'variety' in search terms and thereby reduce search uncertainty.
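The co-occurrence analysis underlying the system-generated thesauri described above associates terms that frequently appear in the same abstracts. A toy sketch of the idea (the corpus, tokenization, and the normalization by document frequency are illustrative assumptions, not the authors' exact algorithm):

```python
from collections import Counter
from itertools import combinations
import math

def cooccurrence_associations(docs, min_count=1):
    """Count how often term pairs co-occur in the same document and rank
    associations by a simple normalized score: co-occurrence count divided
    by the square root of the product of the document frequencies."""
    df = Counter()
    pair = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        df.update(terms)
        pair.update(combinations(terms, 2))  # pairs in alphabetical order
    assoc = {}
    for (a, b), n in pair.items():
        if n >= min_count:
            assoc[(a, b)] = n / math.sqrt(df[a] * df[b])
    return assoc

# hypothetical mini-corpus of abstracts
docs = [
    "parallel computing on supercomputer hardware",
    "parallel computing and automatic indexing",
    "automatic indexing of abstracts",
]
assoc = cooccurrence_associations(docs)
```

Terms that always appear together, such as "parallel"/"computing", receive the maximum score of 1.0 and would become thesaurus associations.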
- Theme
- Konzeption und Anwendung des Prinzips Thesaurus
-
Boyack, K.W.; Wylie, B.N.; Davidson, G.S.: Information Visualization, Human-Computer Interaction, and Cognitive Psychology : Domain Visualizations (2002)
0.04
0.036301754 = product of:
0.14520702 = sum of:
0.14520702 = weight(_text_:human in 2352) [ClassicSimilarity], result of:
0.14520702 = score(doc=2352,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.4825052 = fieldWeight in 2352, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.078125 = fieldNorm(doc=2352)
0.25 = coord(1/4)
-
Landauer, T.K.; Foltz, P.W.; Laham, D.: ¬An introduction to Latent Semantic Analysis (1998)
0.03
0.03080306 = product of:
0.12321224 = sum of:
0.12321224 = weight(_text_:human in 2162) [ClassicSimilarity], result of:
0.12321224 = score(doc=2162,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.40941924 = fieldWeight in 2162, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.046875 = fieldNorm(doc=2162)
0.25 = coord(1/4)
- Abstract
- Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. The adequacy of LSA's reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word-word and passage-word lexical priming data; and as reported in 3 following articles in this issue, it accurately estimates passage coherence, learnability of passages by individual students, and the quality and quantity of knowledge contained in an essay.
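The dimension reduction at the heart of LSA is a truncated singular value decomposition of the term-document matrix; words that never co-occur directly can still end up with similar latent vectors. A self-contained sketch using power iteration on a toy matrix (a real application would use an optimized SVD routine; the corpus and the rank are illustrative assumptions):

```python
import math

def top_eigen(B, iters=200):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    n = len(B)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        if lam == 0.0:
            break
        v = [x / lam for x in w]
    return lam, v

def lsa_word_vectors(A, k=2):
    """Rank-k LSA sketch: eigenvectors of the doc-doc matrix A^T A are the
    right singular vectors v_i; word t's latent coordinates are rows of
    U = A V S^-1 in the decomposition A = U S V^T."""
    n_terms, n_docs = len(A), len(A[0])
    B = [[sum(A[t][i] * A[t][j] for t in range(n_terms))
          for j in range(n_docs)] for i in range(n_docs)]
    axes = []
    for _ in range(k):
        lam, v = top_eigen(B)
        axes.append((math.sqrt(lam), v))
        for i in range(n_docs):          # deflate: B -= lam * v v^T
            for j in range(n_docs):
                B[i][j] -= lam * v[i] * v[j]
    return [[sum(A[t][j] * v[j] for j in range(n_docs)) / sigma
             for sigma, v in axes] for t in range(n_terms)]

def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

# toy term-document counts; "computer" and "interface" never share a document
terms = ["human", "computer", "interface", "user", "graph", "tree"]
A = [
    [1, 1, 0, 0],  # human
    [1, 0, 0, 0],  # computer
    [0, 1, 0, 0],  # interface
    [0, 1, 0, 0],  # user
    [0, 0, 1, 1],  # graph
    [0, 0, 1, 1],  # tree
]
W = lsa_word_vectors(A, k=2)
```

In the rank-2 space, "computer" and "interface" become highly similar because both co-occur with "human", while "computer" and "graph" stay unrelated; this is the mutual-constraint effect the abstract describes.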
-
Baofu, P.: ¬The future of information architecture : conceiving a better way to understand taxonomy, network, and intelligence (2008)
0.03
0.025669215 = product of:
0.10267686 = sum of:
0.10267686 = weight(_text_:human in 3257) [ClassicSimilarity], result of:
0.10267686 = score(doc=3257,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.34118268 = fieldWeight in 3257, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=3257)
0.25 = coord(1/4)
- Abstract
- The Future of Information Architecture examines issues surrounding why information is processed, stored, and applied in the way that it has been, since time immemorial. Contrary to the conventional wisdom held by many scholars in human history, the recurrent debate on the explanation of the most basic categories of information (e.g. space, time, causation, quality, quantity) has been misconstrued, to the effect that there exist some deeper categories and principles behind these categories of information - with enormous implications for our understanding of reality in general. To understand this, the book is organised into four main parts: Part I begins with the vital question concerning the role of information within the context of the larger theoretical debate in the literature. Part II provides a critical examination of the nature of data taxonomy from the main perspectives of culture, society, nature and the mind. Part III constructively investigates the world of information network from the main perspectives of culture, society, nature and the mind. Part IV proposes six main theses in the author's synthetic theory of information architecture, namely, (a) the first thesis on the simpleness-complicatedness principle, (b) the second thesis on the exactness-vagueness principle, (c) the third thesis on the slowness-quickness principle, (d) the fourth thesis on the order-chaos principle, (e) the fifth thesis on the symmetry-asymmetry principle, and (f) the sixth thesis on the post-human stage.
-
Jiang, Y.; Zhang, X.; Tang, Y.; Nie, R.: Feature-based approaches to semantic similarity assessment of concepts using Wikipedia (2015)
0.03
0.025669215 = product of:
0.10267686 = sum of:
0.10267686 = weight(_text_:human in 3682) [ClassicSimilarity], result of:
0.10267686 = score(doc=3682,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.34118268 = fieldWeight in 3682, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=3682)
0.25 = coord(1/4)
- Abstract
- Semantic similarity assessment between concepts is an important task in many language-related applications. In the past, several approaches to assess similarity by evaluating the knowledge modeled in one or more ontologies have been proposed. However, the existing measures have some limitations, such as relying on predefined ontologies and fitting only non-dynamic domains. Wikipedia provides a very large domain-independent encyclopedic repository and semantic network for computing semantic similarity of concepts, with more coverage than usual ontologies. In this paper, we propose some novel feature-based similarity assessment methods that are fully dependent on Wikipedia and can avoid most of the limitations and drawbacks introduced above. To implement feature-based similarity assessment using Wikipedia, we first present a formal representation of Wikipedia concepts. We then give a framework for feature-based similarity built on this formal representation. Lastly, we investigate several feature-based approaches to semantic similarity measures resulting from instantiations of the framework. The evaluation, based on several widely used benchmarks and a benchmark developed by ourselves, sustains the intuitions with respect to human judgements. Overall, several methods proposed in this paper correlate well with human judgements and constitute effective ways of determining similarity between Wikipedia concepts.
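A classic instance of the feature-based similarity family discussed above is Tversky's ratio model, in which shared features raise similarity and distinctive features lower it. A minimal sketch (the feature sets stand in for, e.g., the links or categories of a Wikipedia concept and are invented for illustration; this is not the paper's exact measure):

```python
def tversky(a, b, alpha=0.5, beta=0.5):
    """Tversky ratio model over concept feature sets: common features count
    toward similarity, distinctive features count against it."""
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    return common / (common + alpha * only_a + beta * only_b)

# hypothetical feature sets (e.g. outgoing links of a Wikipedia article)
car  = {"vehicle", "wheel", "engine", "road"}
bus  = {"vehicle", "wheel", "engine", "passenger"}
bird = {"animal", "wing", "feather"}
```

With alpha = beta = 0.5 the model reduces to the Dice coefficient; asymmetric weights let one concept's distinctive features matter more than the other's.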
-
Ahn, J.-w.; Brusilovsky, P.: Adaptive visualization for exploratory information retrieval (2013)
0.03
0.025669215 = product of:
0.10267686 = sum of:
0.10267686 = weight(_text_:human in 3717) [ClassicSimilarity], result of:
0.10267686 = score(doc=3717,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.34118268 = fieldWeight in 3717, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=3717)
0.25 = coord(1/4)
- Abstract
- As the volume and breadth of online information are rapidly increasing, ad hoc search systems become less and less efficient at answering the information needs of modern users. To support the growing complexity of search tasks, researchers in the field of information retrieval developed and explored a range of approaches that extend the traditional ad hoc retrieval paradigm. Among these approaches, personalized search systems and exploratory search systems attracted many followers. Personalized search explored the power of artificial intelligence techniques to provide tailored search results according to different user interests, contexts, and tasks. In contrast, exploratory search capitalized on the power of human intelligence by providing users with more powerful interfaces to support the search process. As these approaches are not contradictory, we believe that they can reinforce each other. We argue that the effectiveness of personalized search systems may be increased by allowing users to interact with the system and learn/investigate the problem in order to reach the final goal. We also suggest that an interactive visualization approach could offer a good ground to combine the strong sides of personalized and exploratory search approaches. This paper proposes a specific way to integrate interactive visualization and personalized search and introduces an adaptive visualization based search system, Adaptive VIBE, that implements it. We tested the effectiveness of Adaptive VIBE and investigated its strengths and weaknesses by conducting a full-scale user study. The results show that Adaptive VIBE can improve the precision and the productivity of the personalized search system while helping users to discover more diverse sets of information.
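The VIBE display underlying Adaptive VIBE places each document among "points of interest" (POIs), such as query and user-model terms, so that its screen position reflects its relative similarity to each POI. A minimal sketch of that similarity-weighted placement (the coordinates and similarity values are invented; this is not the authors' implementation):

```python
def vibe_position(similarities, poi_positions):
    """Place a document at the similarity-weighted average of the
    point-of-interest anchors, so it drifts toward the POIs it matches best."""
    total = sum(similarities)
    if total == 0:
        return None  # unrelated to every POI; nothing to display
    x = sum(s * px for s, (px, _) in zip(similarities, poi_positions)) / total
    y = sum(s * py for s, (_, py) in zip(similarities, poi_positions)) / total
    return (x, y)

pois = [(0.0, 0.0), (1.0, 0.0)]              # e.g. a query POI and a user-model POI
doc_equal = vibe_position([0.5, 0.5], pois)  # equally similar: lands midway
doc_query = vibe_position([1.0, 0.0], pois)  # only query-relevant: lands on the query POI
```

Because only the ratios of the similarities matter, moving a POI (as the adaptive system does when the user model changes) re-layouts all documents consistently.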
- Footnote
- Contribution within a special section on Human-Computer Information Retrieval.
-
Jiang, Y.; Bai, W.; Zhang, X.; Hu, J.: Wikipedia-based information content and semantic similarity computation (2017)
0.03
0.025669215 = product of:
0.10267686 = sum of:
0.10267686 = weight(_text_:human in 3877) [ClassicSimilarity], result of:
0.10267686 = score(doc=3877,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.34118268 = fieldWeight in 3877, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=3877)
0.25 = coord(1/4)
- Abstract
- The Information Content (IC) of a concept is a fundamental dimension in computational linguistics. It enables a better understanding of a concept's semantics. In the past, several approaches to compute the IC of a concept have been proposed. However, the existing methods have some limitations, such as relying on corpus availability, manual tagging, or predefined ontologies, and fitting only non-dynamic domains. Wikipedia provides a very large domain-independent encyclopedic repository and semantic network for computing the IC of concepts, with more coverage than usual ontologies. In this paper, we propose some novel methods for IC computation of a concept to solve the shortcomings of existing approaches. The presented methods focus on the IC computation of a concept (i.e., a Wikipedia category) drawn from the Wikipedia category structure. We propose several new IC-based measures to compute the semantic similarity between concepts. The evaluation, based on several widely used benchmarks and a benchmark developed by ourselves, sustains the intuitions with respect to human judgments. Overall, some methods proposed in this paper correlate well with human judgments and constitute effective ways of determining IC values for concepts and semantic similarity between concepts.
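In the corpus-based formulation, IC(c) = -log p(c); structure-based variants, like the category-derived measures described above, instead estimate a concept's specificity from the taxonomy itself. A minimal sketch of one well-known intrinsic formulation (the counts are invented, and this is not necessarily the paper's exact definition):

```python
import math

def intrinsic_ic(n_descendants, n_concepts):
    """Structure-based (intrinsic) information content: categories with many
    descendants are general (IC near 0), leaf categories are maximally
    specific (IC = 1).  IC(c) = 1 - log(hypo(c) + 1) / log(N)."""
    return 1.0 - math.log(n_descendants + 1) / math.log(n_concepts)

N = 1000                          # hypothetical size of the category structure
ic_leaf = intrinsic_ic(0, N)      # a leaf category: maximally specific
ic_root = intrinsic_ic(N - 1, N)  # the root subsumes everything: no information
```

Resnik-style similarity then scores a concept pair by the IC of their least common subsuming category, so pairs meeting only at a generic category score low.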
-
Qu, R.; Fang, Y.; Bai, W.; Jiang, Y.: Computing semantic similarity based on novel models of semantic representation using Wikipedia (2018)
0.03
0.025669215 = product of:
0.10267686 = sum of:
0.10267686 = weight(_text_:human in 52) [ClassicSimilarity], result of:
0.10267686 = score(doc=52,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.34118268 = fieldWeight in 52, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=52)
0.25 = coord(1/4)
- Abstract
- Computing Semantic Similarity (SS) between concepts is one of the most critical issues in many domains such as Natural Language Processing and Artificial Intelligence. Over the years, several SS measurement methods have been proposed by exploiting different knowledge resources. Wikipedia provides a large domain-independent encyclopedic repository and a semantic network for computing SS between concepts. Traditional feature-based measures rely on linear combinations of different properties with two main limitations, the insufficient information and the loss of semantic information. In this paper, we propose several hybrid SS measurement approaches by using the Information Content (IC) and features of concepts, which avoid the limitations introduced above. Considering integrating discrete properties into one component, we present two models of semantic representation, called CORM and CARM. Then, we compute SS based on these models and take the IC of categories as a supplement of SS measurement. The evaluation, based on several widely used benchmarks and a benchmark developed by ourselves, sustains the intuitions with respect to human judgments. In summary, our approaches are more efficient in determining SS between concepts and have a better human correlation than previous methods such as Word2Vec and NASARI.
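One simple way to hybridize IC and features, in the spirit of the approach above, is to weight each shared feature by its information content, so that sharing a specific category counts more than sharing a generic one. A minimal sketch (the IC values and the Dice-style combination are illustrative assumptions, not the paper's CORM/CARM definitions):

```python
def ic_weighted_overlap(feats_a, feats_b, ic):
    """Dice-style overlap where each feature contributes its information
    content: sharing a specific feature outweighs sharing a generic one."""
    shared = sum(ic[f] for f in feats_a & feats_b)
    total = sum(ic[f] for f in feats_a) + sum(ic[f] for f in feats_b)
    return 2 * shared / total if total else 0.0

# hypothetical IC values for Wikipedia-style categories
ic = {"Entity": 0.1, "Vehicle": 0.6, "Sports_car": 0.95}
sim_specific = ic_weighted_overlap({"Entity", "Sports_car"}, {"Sports_car"}, ic)
sim_generic  = ic_weighted_overlap({"Entity", "Sports_car"}, {"Entity"}, ic)
```

Sharing the specific category "Sports_car" yields a much higher score than sharing only the generic "Entity", which a plain set-overlap measure could not distinguish.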
-
Beaulieu, M.: Experiments on interfaces to support query expansion (1997)
0.03
0.025411226 = product of:
0.1016449 = sum of:
0.1016449 = weight(_text_:human in 5704) [ClassicSimilarity], result of:
0.1016449 = score(doc=5704,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.33775362 = fieldWeight in 5704, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0546875 = fieldNorm(doc=5704)
0.25 = coord(1/4)
- Abstract
- Focuses on the user and human-computer interaction (HCI) aspects of the research based on the Okapi text retrieval system. Describes 3 experiments using different approaches to query expansion, highlighting the relationship between the functionality of a system and different interface designs. These experiments involve both automatic and interactive query expansion, and both character-based and GUI (graphical user interface) environments. The effectiveness of the search interaction for query expansion depends on resolving opposing interface and functional aspects, e.g. automatic vs. interactive query expansion, explicit vs. implicit use of a thesaurus, and document vs. query space.
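A common way to generate expansion candidates in either mode is to rank terms that co-occur with the query in the top-ranked (assumed relevant) documents; an automatic design adds the top candidates directly, while an interactive design shows them to the user for selection. A minimal sketch (the documents and the ranking heuristic are invented, not Okapi's term weighting):

```python
from collections import Counter

def expansion_candidates(query_terms, top_docs, k=3):
    """Rank candidate expansion terms by how often they occur in the
    top-ranked documents; in an interactive interface the user picks from
    this list, in an automatic one the top k are added to the query."""
    query = set(query_terms)
    counts = Counter(t for doc in top_docs for t in doc.lower().split()
                     if t not in query)
    return [t for t, _ in counts.most_common(k)]

top_docs = [
    "thesaurus based query expansion improves recall",
    "query expansion with a thesaurus and relevance feedback",
]
cands = expansion_candidates(["query", "expansion"], top_docs)
```

Here "thesaurus", occurring in both top documents, surfaces as the strongest candidate, mirroring the explicit-vs-implicit thesaurus use discussed in the abstract.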
-
Hemmje, M.: LyberWorld - a 3D graphical user interface for fulltext retrieval (1995)
0.03
0.025411226 = product of:
0.1016449 = sum of:
0.1016449 = weight(_text_:human in 3385) [ClassicSimilarity], result of:
0.1016449 = score(doc=3385,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.33775362 = fieldWeight in 3385, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0546875 = fieldNorm(doc=3385)
0.25 = coord(1/4)
- Source
- Proceedings of the CHI '95 Conference Companion on Human Factors in Computing Systems
-
Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002)
0.02
0.023634411 = product of:
0.094537646 = sum of:
0.094537646 = weight(_text_:java in 2211) [ClassicSimilarity], result of:
0.094537646 = score(doc=2211,freq=2.0), product of:
0.4856509 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.068911016 = queryNorm
0.19466174 = fieldWeight in 2211, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.01953125 = fieldNorm(doc=2211)
0.25 = coord(1/4)
- Content
- A prototype of this interface, implemented as a Java applet, is available at <http://ella.slis.indiana.edu/~junzhang/dlib/IV.html>. The D-Lib search interface is available at <http://www.dlib.org/Architext/AT-dlib2query.html>.
-
Chen, H.; Zhang, Y.; Houston, A.L.: Semantic indexing and searching using a Hopfield net (1998)
0.02
0.021781052 = product of:
0.087124206 = sum of:
0.087124206 = weight(_text_:human in 6704) [ClassicSimilarity], result of:
0.087124206 = score(doc=6704,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.2895031 = fieldWeight in 6704, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.046875 = fieldNorm(doc=6704)
0.25 = coord(1/4)
- Abstract
- Presents a neural network approach to document semantic indexing. Reports results of a study to apply a Hopfield net algorithm to simulate human associative memory for concept exploration in the domain of computer science and engineering. The INSPEC database, consisting of 320,000 abstracts from leading periodical articles, was used as the document test bed. Benchmark tests confirmed that 3 parameters (maximum number of activated nodes, maximum allowable error, and maximum number of iterations) were useful in positively influencing network convergence behaviour without negatively impacting central processing unit performance. Another series of benchmark tests was performed to determine the effectiveness of various filtering techniques in reducing the negative impact of noisy input terms. Preliminary user tests confirmed expectations that the Hopfield net is potentially useful as an associative memory technique to improve document recall and precision by resolving discrepancies between indexer vocabularies and end user vocabularies.
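The associative recall that the Hopfield net provides can be illustrated with a tiny binary network: patterns are stored by Hebbian learning, and a noisy input state settles into the nearest stored pattern. A minimal sketch (toy size; the paper's network and parameters are far larger):

```python
def train_hopfield(patterns):
    """Hebbian storage: W[i][j] = sum over patterns of x_i * x_j, zero diagonal."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def recall(W, state, iters=10):
    """Synchronous +-1 updates until the state settles into a stored attractor."""
    n = len(state)
    s = list(state)
    for _ in range(iters):
        s_new = [1 if sum(W[i][j] * s[j] for j in range(n)) >= 0 else -1
                 for i in range(n)]
        if s_new == s:
            break
        s = s_new
    return s

# two stored "concept" activation patterns over 8 units
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [-1, -1, -1, -1, 1, 1, 1, 1]
W = train_hopfield([p1, p2])
noisy = [1, 1, -1, 1, -1, -1, -1, -1]  # p1 with one unit flipped
```

Feeding the corrupted pattern recovers the stored one, which is the mechanism that lets a searcher's imperfect vocabulary activate the indexer's concept.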
-
Huang, L.; Milne, D.; Frank, E.; Witten, I.H.: Learning a concept-based document similarity measure (2012)
0.02
0.021781052 = product of:
0.087124206 = sum of:
0.087124206 = weight(_text_:human in 1372) [ClassicSimilarity], result of:
0.087124206 = score(doc=1372,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.2895031 = fieldWeight in 1372, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.046875 = fieldNorm(doc=1372)
0.25 = coord(1/4)
- Abstract
- Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning techniques. Experiments show that the new measure produces values for documents that are more consistent with people's judgments than people are with each other. We also use it to classify and cluster large document sets covering different genres and topics, and find that it improves both classification and clustering performance.
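The combination step described above, learning from human judgments how to weight the lexical and semantic levels, can be sketched as a least-squares fit of a linear combination (the toy data stand in for human similarity ratings; the paper uses richer machine-learning techniques):

```python
def fit_weights(X, y):
    """Least-squares fit of y ~ w_lex*x0 + w_sem*x1 via the 2x2 normal equations."""
    s00 = sum(x[0] * x[0] for x in X)
    s01 = sum(x[0] * x[1] for x in X)
    s11 = sum(x[1] * x[1] for x in X)
    b0 = sum(x[0] * yy for x, yy in zip(X, y))
    b1 = sum(x[1] * yy for x, yy in zip(X, y))
    det = s00 * s11 - s01 * s01
    return (s11 * b0 - s01 * b1) / det, (s00 * b1 - s01 * b0) / det

# hypothetical (lexical overlap, semantic relatedness) -> human judgment pairs
X = [(0.9, 0.8), (0.1, 0.7), (0.5, 0.2), (0.0, 0.1)]
y = [0.9, 0.6, 0.35, 0.05]
w_lex, w_sem = fit_weights(X, y)

def combined_sim(lex, sem):
    """Learned combination of the lexical and semantic similarity levels."""
    return w_lex * lex + w_sem * sem
```

On this toy data both levels receive positive weight, reflecting the paper's finding that lexical and semantic evidence are complementary rather than redundant.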
-
Lund, K.; Burgess, C.: Producing high-dimensional semantic spaces from lexical co-occurrence (1996)
0.02
0.021781052 = product of:
0.087124206 = sum of:
0.087124206 = weight(_text_:human in 2704) [ClassicSimilarity], result of:
0.087124206 = score(doc=2704,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.2895031 = fieldWeight in 2704, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.046875 = fieldNorm(doc=2704)
0.25 = coord(1/4)
- Abstract
- A procedure is presented that processes a corpus of text and produces, for each word, a numeric vector containing information about that word's meaning. This procedure is applied to a large corpus of natural language text taken from Usenet, and the resulting vectors are examined to determine what information is contained within them. These vectors provide the coordinates in a high-dimensional space in which word relationships can be analyzed. Analyses of both vector similarity and multidimensional scaling demonstrate that there is significant semantic information carried in the vectors. A comparison of vector similarity with human reaction times in a single-word priming experiment is presented. These vectors provide the basis for a representational model of semantic memory, the hyperspace analogue to language (HAL).
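HAL builds its high-dimensional vectors by sliding a window over the corpus and accumulating co-occurrence counts weighted by proximity. A minimal sketch (the window size, the linear weighting scheme, and the toy text are illustrative assumptions):

```python
from collections import defaultdict

def hal_vectors(tokens, window=3):
    """HAL-style vectors: for each word, accumulate weighted counts of the
    words appearing before it within the window; nearer words get higher
    weight (window - distance + 1)."""
    vecs = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            if i - d >= 0:
                vecs[w][tokens[i - d]] += window - d + 1
    return vecs

tokens = "the barn owl flew over the barn".split()
V = hal_vectors(tokens)
```

Each row of the resulting sparse matrix is a word's meaning vector; comparing rows (e.g. by cosine) yields the similarity and priming analyses the abstract describes.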
-
Schek, M.: Automatische Klassifizierung und Visualisierung im Archiv der Süddeutschen Zeitung (2005)
0.02
0.020201197 = product of:
0.08080479 = sum of:
0.08080479 = weight(_text_:und in 5884) [ClassicSimilarity], result of:
0.08080479 = score(doc=5884,freq=76.0), product of:
0.15283768 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.068911016 = queryNorm
0.5286968 = fieldWeight in 5884, product of:
8.717798 = tf(freq=76.0), with freq of:
76.0 = termFreq=76.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.02734375 = fieldNorm(doc=5884)
0.25 = coord(1/4)
- Abstract
- Since its founding in 1945, the Süddeutsche Zeitung (SZ) has maintained a press archive that documents the texts of its own editors as well as those of numerous national and international publications and makes them available on request for research purposes. Computerization began in the early 1990s with the digital storage of, initially, the SZ's own data. The technical development from the mid-1990s onward served two goals: (1) the complete switch from paper filing to digital storage, and (2) the transformation from an in-house documentation and reference service into an information service provider also active on the market. To distribute the resulting effort and at the same time exploit synergies between thematically related archives, the Süddeutscher Verlag and Bayerischer Rundfunk founded the Dokumentations- und Informationszentrum (DIZ) München GmbH in 1998, which merged the press archives of the two shareholders and the picture archive of the Süddeutscher Verlag. The jointly developed press database enabled cross-site indexing, browser-based searching for editors and external customers on the intranet and Internet, and customer-specific content feeds for publishers, broadcasters, and portals. The DIZ press database currently contains 6.9 million articles, each retrievable as HTML or PDF. About 3,500 articles are added every day, of which about 1,000 are indexed. Indexing at DIZ is performed not by assigning keywords to individual documents but by linking articles to 'virtual folders', the dossiers. These represent the electronic counterpart of a paper folder and are the central indexing object. In contrast to static classification systems, the dossier structure is dynamic and driven by the volume of coverage, i.e. new dossiers are created mainly on the basis of current reporting.
In total, the DIZ press database contains about 90,000 dossiers, of which 68,000 cover subject topics, persons, and institutions. The dossiers are interlinked to form the 'DIZ knowledge network'.
DIZ regards the knowledge network as its unique selling point and devotes considerable staff resources to keeping the dossiers up to date and assuring their quality. After the switch to a fully digitized workflow in April 2001, DIZ identified four starting points for optimizing the effort on the input side (indexing) while marketing the knowledge network better on the output side (searching): 1. (semi-)automatic classification of press texts (a suggestion system); 2. visualization of the knowledge network (topic mapping); 3. (fully) automatic classification and optimization of the knowledge network; 4. new retrieval options (clustering, concept search). Projects 1 and 2, 'automatic classification and visualization', started first and were accelerated by two developments: Bayerischer Rundfunk (BR), originally co-founder and 50% shareholder of DIZ München GmbH, decided for strategic reasons to withdraw from the cooperation at the end of 2003; and the media crisis, caused by the massive decline in advertising revenue, forced substantial cost cutting at the Süddeutscher Verlag as well, along with a search for new sources of revenue. Both led to press documentation capacity falling from originally about 20 staff (SZ only, without the BR share) to about 13 by 1 January 2004, while at the same time the effort spent on maintaining the knowledge network came under increased pressure to justify itself. For projects 1 and 2, this yielded three quantitative and qualitative goals: increased productivity in indexing; improved consistency in indexing; and better marketing and more intensive use of the dossiers in searching. All three goals were achieved, with productivity in indexing in particular having risen. Projects 1 and 2, 'automatic classification and visualization', were successfully completed at the beginning of 2004.
The follow-up projects 3 and 4 have been running since mid-2004 and are scheduled for completion by mid-2005. In the following, Section 2 describes the product selection and the mode of operation of the automatic classification. Section 3 describes the use of knowledge network visualization in indexing and searching. Section 4 summarizes the results of projects 1 and 2 and gives an outlook on the goals of projects 3 and 4.
-
Hauer, M.: Neue OPACs braucht das Land ... dandelon.com (2006)
0.02
0.019460732 = product of:
0.07784293 = sum of:
0.07784293 = weight(_text_:und in 47) [ClassicSimilarity], result of:
0.07784293 = score(doc=47,freq=24.0), product of:
0.15283768 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.068911016 = queryNorm
0.50931764 = fieldWeight in 47, product of:
4.8989797 = tf(freq=24.0), with freq of:
24.0 = termFreq=24.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.046875 = fieldNorm(doc=47)
0.25 = coord(1/4)
- Abstract
- In dandelon.com, in contrast to previous federated search portal approaches, the titles of media are newly indexed with intelligentCAPTURE in a decentralized and collaborative manner and are substantially enriched in content. intelligentCAPTURE so far performs machine indexing of book tables of contents, books, blurbs, articles, and websites; it takes over bibliographic data from libraries (XML, Z39.50), from publishers (ONIX + cover pages), from serials agents (Swets), and from the book trade (SOAP), and exports machine-generated index terms and processed documents to library catalogues (MAB, MARC, XML) or documentation systems, to dandelon.com, and in part also to subject portals. The data are obtained through scanning and OCR, through file import and look-ups on servers, and through web spidering/crawling. The quality of searching in dandelon.com is markedly better than in previous library systems. The semantic, multilingual search, currently covering 1.2 million subject terms, contributes strongly to the good search results.
- Source
- Spezialbibliotheken zwischen Auftrag und Ressourcen: 6.-9. September 2005 in München, 30. Arbeits- und Fortbildungstagung der ASpB e.V. / Sektion 5 im Deutschen Bibliotheksverband. Red.: M. Brauer
-
Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003)
0.02
0.018150877 = product of:
0.07260351 = sum of:
0.07260351 = weight(_text_:human in 2428) [ClassicSimilarity], result of:
0.07260351 = score(doc=2428,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.2412526 = fieldWeight in 2428, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=2428)
0.25 = coord(1/4)
- Abstract
- Humans can make hasty, but generally robust, judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the high-dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
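The first priming approach described above, adjusting a concept's vector in the light of another concept, can be sketched as a weighted vector combination followed by renormalization (the weight and the two-dimensional toy vectors are illustrative assumptions, not the article's heuristic):

```python
import math

def prime(concept_vec, context_vec, weight=0.5):
    """Shift a concept's vector toward the context it appears with, then
    renormalize to unit length; the primed vector stays a point in the
    same semantic space."""
    combined = [c + weight * x for c, x in zip(concept_vec, context_vec)]
    norm = math.sqrt(sum(v * v for v in combined))
    return [v / norm for v in combined]

bank  = [1.0, 0.0]          # toy 2-d stand-ins for high-dimensional vectors
river = [0.0, 1.0]
bank_primed = prime(bank, river)
```

After priming, the concept's vector has a nonzero projection onto the context direction, which is exactly what lets downstream information-flow computations become context sensitive.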
-
Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003)
0.02
0.018150877 = product of:
0.07260351 = sum of:
0.07260351 = weight(_text_:human in 2615) [ClassicSimilarity], result of:
0.07260351 = score(doc=2615,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.2412526 = fieldWeight in 2615, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=2615)
0.25 = coord(1/4)
- Abstract
- Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even those documents that purport to be "medically-related". This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems which require human mediation for currency. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better in perceived cluster recall.
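The spidering comparison above contrasts breadth-first crawling with spiders that expand promising pages first. A minimal best-first sketch (the link graph and page scores are invented; the paper's spiders derive their scores from PageRank and Hopfield-style spreading activation rather than a fixed table):

```python
import heapq

def best_first_crawl(graph, scores, start, budget=4):
    """Best-first spider: always expand the highest-scoring known page next,
    instead of the oldest one as breadth-first search would."""
    frontier = [(-scores[start], start)]  # max-heap via negated scores
    visited = []
    seen = {start}
    while frontier and len(visited) < budget:
        _, page = heapq.heappop(frontier)
        visited.append(page)
        for nxt in graph.get(page, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-scores[nxt], nxt))
    return visited

graph = {"home": ["a", "b"], "a": ["c"], "b": ["d"]}
scores = {"home": 1.0, "a": 0.2, "b": 0.9, "c": 0.8, "d": 0.1}
order = best_first_crawl(graph, scores, "home")
```

With the same crawl budget, the best-first spider reaches the high-scoring pages "b" and "c" while skipping the low-scoring "d", which breadth-first search would have fetched first.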
-
Ng, K.B.: Toward a theoretical framework for understanding the relationship between situated action and planned action models of behavior in information retrieval contexts : contributions from phenomenology (2002)
0.02
0.018150877 = product of:
0.07260351 = sum of:
0.07260351 = weight(_text_:human in 3588) [ClassicSimilarity], result of:
0.07260351 = score(doc=3588,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.2412526 = fieldWeight in 3588, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=3588)
0.25 = coord(1/4)
- Abstract
- In human-computer interaction (HCI), a successful interaction sequence can take on its own momentum and drift away from what the user originally planned. However, this does not mean that planned actions play no important role in the overall performance. In this paper, the author constructs a line of argument to demonstrate that it is impossible to consider an action without an a priori plan, even according to the phenomenological position taken for granted by situated action theory. Based on the phenomenological analysis of problematic situations and typification, the author argues that, just like "situated-ness," the "planned-ness" of an action should also be understood in the context of the situation. Successful plans can be developed and executed for familiar contexts. The first part of the paper treats information-seeking behavior as a special type of social action and applies Alfred Schutz's phenomenology of sociology to understand the importance and necessity of plans. The second part reports the results of a quasi-experiment focusing on plan deviation within an information-seeking context. It was found that when the searcher's situation changed from problematic to non-problematic, the degree of plan deviation decreased significantly. These results support the argument proposed in the first part of the paper.
-
Zenz, G.; Zhou, X.; Minack, E.; Siberski, W.; Nejdl, W.: Interactive query construction for keyword search on the Semantic Web (2012)
0.02
0.018150877 = product of:
0.07260351 = sum of:
0.07260351 = weight(_text_:human in 1430) [ClassicSimilarity], result of:
0.07260351 = score(doc=1430,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.2412526 = fieldWeight in 1430, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=1430)
0.25 = coord(1/4)
- Abstract
- With the advance of the Semantic Web, increasing amounts of data are available in a structured and machine-understandable form. This opens opportunities for users to employ semantic queries instead of simple keyword-based ones to accurately express their information need. However, constructing semantic queries is a demanding task for human users [11]. To compose a valid semantic query, a user has to (1) master a query language (e.g., SPARQL) and (2) acquire sufficient knowledge about the ontology or the schema of the data source. While there are systems which support this task with visual tools [21, 26] or natural language interfaces [3, 13, 14, 18], the process of query construction can still be complex and time-consuming. According to [24], users prefer keyword search, and struggle with the construction of semantic queries even when supported by a natural language interface. Several keyword search approaches have already been proposed to ease information seeking on semantic data [16, 32, 35] or databases [1, 31]. However, keyword queries lack the expressivity to precisely describe the user's intent. As a result, ranking can at best put the query intentions of the majority on top, making it impossible to take the intentions of all users into consideration.
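The ambiguity that motivates interactive query construction can be made concrete: each keyword may denote a class, a property, or a literal value, so the number of candidate semantic-query readings grows multiplicatively with query length. A toy enumeration (the schema dict, role names, and function are illustrative assumptions, not the paper's algorithm):

```python
from itertools import product

def interpretations(keywords, schema):
    """Enumerate naive readings of a keyword query against a schema:
    each keyword could be a class, a property, or a literal value.
    `schema` is a hypothetical dict with "classes" and "properties" sets."""
    per_keyword = []
    for kw in keywords:
        roles = [("literal", kw)]  # any keyword can match a literal
        if kw in schema["classes"]:
            roles.append(("class", kw))
        if kw in schema["properties"]:
            roles.append(("property", kw))
        per_keyword.append(roles)
    # one candidate semantic query per combination of keyword roles
    return list(product(*per_keyword))
```

Even three keywords against a two-term schema yield several competing readings; an interactive approach narrows them down with user feedback instead of guessing a ranking for everyone at once.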