-
Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002)
0.06
0.06396042 = product of:
0.12792084 = sum of:
0.094950035 = weight(_text_:java in 2211) [ClassicSimilarity], result of:
0.094950035 = score(doc=2211,freq=2.0), product of:
0.48776937 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.06921162 = queryNorm
0.19466174 = fieldWeight in 2211, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.01953125 = fieldNorm(doc=2211)
0.032970794 = weight(_text_:however in 2211) [ClassicSimilarity], result of:
0.032970794 = score(doc=2211,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.114709064 = fieldWeight in 2211, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.01953125 = fieldNorm(doc=2211)
0.5 = coord(2/4)
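The explain tree above is Lucene's ClassicSimilarity (TF-IDF) breakdown and can be re-derived with a few lines of arithmetic. The sketch below, written in Python rather than Lucene's own Java, recomputes the contribution of the term "java" and the final document score from the values shown; the constants are copied from the explain output, everything else is generic TF-IDF.

```python
import math

# Values copied from the explain tree for doc 2211 and the term "java".
freq = 2.0              # term frequency in the field
idf = 7.0475073         # idf(docFreq=104, maxDocs=44421)
query_norm = 0.06921162
field_norm = 0.01953125
coord = 2 / 4           # 2 of 4 query clauses matched

tf = math.sqrt(freq)                      # 1.4142135 = tf(freq=2.0)
query_weight = idf * query_norm           # ~0.48776937
field_weight = tf * idf * field_norm      # ~0.19466174
term_score = query_weight * field_weight  # ~0.094950035

# Adding the analogous contribution for "however" (0.032970794) and
# applying the coordination factor reproduces the entry's score.
doc_score = (term_score + 0.032970794) * coord
print(doc_score)  # ~0.06396, the score reported for this entry
```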
- Abstract
- From the user's perspective, however, it is still difficult to use current information retrieval systems. Users frequently have problems expressing their information needs and translating those needs into queries. This is partly due to the fact that information needs cannot be expressed appropriately in system terms. It is not unusual for users to input search terms that are different from the index terms information systems use. Various methods have been proposed to help users choose search terms and articulate queries. One widely used approach is to incorporate into the information system a thesaurus-like component that represents both the important concepts in a particular subject area and the semantic relationships among those concepts. Unfortunately, the development and use of thesauri is not without problems. The thesaurus employed in a specific information system has often been developed for a general subject area and needs significant enhancement to be tailored to the information system where it is to be used. This thesaurus development process, if done manually, is both time consuming and labor intensive. Using a thesaurus in searching is complex and may raise barriers for the user. For illustration purposes, let us consider two scenarios of thesaurus usage. In the first scenario the user inputs a search term and the thesaurus then displays a matching set of related terms. Without an overview of the thesaurus - and without the ability to see the matching terms in the context of other terms - it may be difficult to assess the quality of the related terms in order to select the correct term. In the second scenario the user browses the whole thesaurus, which is organized as an alphabetically ordered list. The problem with this approach is that the list may be long, and it does not show users the global semantic relationships among all the listed terms.
- Content
- The Java applet and a prototype of this interface are available at <http://ella.slis.indiana.edu/~junzhang/dlib/IV.html>. The D-Lib search interface is available at <http://www.dlib.org/Architext/AT-dlib2query.html>.
-
Arenas, M.; Cuenca Grau, B.; Kharlamov, E.; Marciuska, S.; Zheleznyakov, D.: Faceted search over ontology-enhanced RDF data (2014)
0.03
0.027976645 = product of:
0.11190658 = sum of:
0.11190658 = weight(_text_:however in 3207) [ClassicSimilarity], result of:
0.11190658 = score(doc=3207,freq=4.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.38933545 = fieldWeight in 3207, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.046875 = fieldNorm(doc=3207)
0.25 = coord(1/4)
- Abstract
- An increasing number of applications rely on RDF, OWL2, and SPARQL for storing and querying data. SPARQL, however, is not targeted towards end-users, and suitable query interfaces are needed. Faceted search is a prominent approach for end-user data access, and several RDF-based faceted search systems have been developed. There is, however, a lack of rigorous theoretical underpinning for faceted search in the context of RDF and OWL2. In this paper, we provide such solid foundations. We formalise faceted interfaces for this context, identify a fragment of first-order logic capturing the underlying queries, and study the complexity of answering such queries for RDF and OWL2 profiles. We then study interface generation and update, and devise efficiently implementable algorithms. Finally, we have implemented and tested our faceted search algorithms for scalability, with encouraging results.
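A code sketch may make the connection between a faceted interface and the underlying queries more concrete. The function below is an illustrative assumption, not the authors' system: it compiles a set of facet-value selections into a conjunctive SPARQL query of the kind such an interface could issue against RDF data; the class and property IRIs are placeholders.

```python
def facets_to_sparql(type_iri, selections):
    """Compile facet selections {property_iri: value} into a SPARQL query.

    Each selected facet value becomes one triple pattern, which matches the
    conjunctive, tree-shaped queries faceted interfaces typically capture.
    """
    patterns = [f"?x a <{type_iri}> ."]
    for prop, value in sorted(selections.items()):
        obj = f"<{value}>" if value.startswith("http") else f'"{value}"'
        patterns.append(f"?x <{prop}> {obj} .")
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT ?x WHERE {{\n  {body}\n}}"

# Example with hypothetical IRIs: action films from Italy.
print(facets_to_sparql(
    "http://example.org/Film",
    {"http://example.org/genre": "action",
     "http://example.org/country": "http://example.org/Italy"}))
```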
-
Kruschwitz, U.; AI-Bakour, H.: Users want more sophisticated search assistants : results of a task-based evaluation (2005)
0.02
0.023313873 = product of:
0.09325549 = sum of:
0.09325549 = weight(_text_:however in 5575) [ClassicSimilarity], result of:
0.09325549 = score(doc=5575,freq=4.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.32444623 = fieldWeight in 5575, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.0390625 = fieldNorm(doc=5575)
0.25 = coord(1/4)
- Abstract
- The Web provides a massive knowledge source, as do intranets and other electronic document collections. However, much of that knowledge is encoded implicitly and cannot be applied directly without processing into some more appropriate structures. Searching, browsing, and question answering, for example, could all benefit from domain-specific knowledge contained in the documents, and in applications such as simple search we do not actually need very "deep" knowledge structures such as ontologies, but we can get a long way with a model of the domain that consists of term hierarchies. We combine domain knowledge automatically acquired by exploiting the documents' markup structure with knowledge extracted on the fly to assist a user with ad hoc search requests. Such a search system can suggest query modification options derived from the actual data and thus guide a user through the space of documents. This article gives a detailed account of a task-based evaluation that compares a search system that uses the outlined domain knowledge with a standard search system. We found that users do use the query modification suggestions proposed by the system. The main conclusion we can draw from this evaluation, however, is that users prefer a system that can suggest query modifications over a standard search engine, which simply presents a ranked list of documents. Most interestingly, we observe this user preference despite the fact that the baseline system even performs slightly better under certain criteria.
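As an illustration of the kind of domain model described above, the sketch below stores a simple term hierarchy, such as one derived from document markup, and offers narrower terms as query modification options. It is a generic assumption about the idea, not the authors' system, and all terms are invented examples.

```python
# A toy term hierarchy of the kind that can be extracted from markup
# (headings, lists, link anchors); every term here is an invented example.
hierarchy = {
    "search": ["web search", "desktop search", "enterprise search"],
    "web search": ["query expansion", "relevance ranking"],
}

def suggest_modifications(query_term, hierarchy):
    """Return narrower terms to offer as query refinement options."""
    return hierarchy.get(query_term.lower(), [])

print(suggest_modifications("search", hierarchy))
# ['web search', 'desktop search', 'enterprise search']
```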
-
Zenz, G.; Zhou, X.; Minack, E.; Siberski, W.; Nejdl, W.: Interactive query construction for keyword search on the Semantic Web (2012)
0.02
0.023313873 = product of:
0.09325549 = sum of:
0.09325549 = weight(_text_:however in 1430) [ClassicSimilarity], result of:
0.09325549 = score(doc=1430,freq=4.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.32444623 = fieldWeight in 1430, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.0390625 = fieldNorm(doc=1430)
0.25 = coord(1/4)
- Abstract
- With the advance of the semantic Web, increasing amounts of data are available in a structured and machine-understandable form. This opens opportunities for users to employ semantic queries instead of simple keyword-based ones to accurately express the information need. However, constructing semantic queries is a demanding task for human users [11]. To compose a valid semantic query, a user has to (1) master a query language (e.g., SPARQL) and (2) acquire sufficient knowledge about the ontology or the schema of the data source. While there are systems which support this task with visual tools [21, 26] or natural language interfaces [3, 13, 14, 18], the process of query construction can still be complex and time consuming. According to [24], users prefer keyword search and struggle with the construction of semantic queries even when supported by a natural language interface. Several keyword search approaches have already been proposed to ease information seeking on semantic data [16, 32, 35] or databases [1, 31]. However, keyword queries lack the expressivity to precisely describe the user's intent. As a result, ranking can at best put query intentions of the majority on top, making it impossible to take the intentions of all users into consideration.
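A small sketch can illustrate why keyword queries are ambiguous with respect to a schema: the same keyword can map to a class, a property, or an instance, and each combination of such mappings is one candidate semantic query. The schema and mappings below are invented for illustration and are not the authors' system.

```python
from itertools import product

# A toy schema vocabulary; every entry is an invented example.
schema = {
    "author":   [("class", "ex:Author"), ("property", "ex:hasAuthor")],
    "semantic": [("instance", "ex:SemanticWeb")],
}

def candidate_interpretations(keywords, schema):
    """Enumerate one schema element per keyword; each tuple is one candidate
    structured query the user would otherwise have to write by hand."""
    options = [schema.get(k, [("literal", k)]) for k in keywords]
    return list(product(*options))

for cand in candidate_interpretations(["author", "semantic"], schema):
    print(cand)
```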
-
Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998)
0.02
0.023079555 = product of:
0.09231822 = sum of:
0.09231822 = weight(_text_:however in 2319) [ClassicSimilarity], result of:
0.09231822 = score(doc=2319,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.32118538 = fieldWeight in 2319, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.0546875 = fieldNorm(doc=2319)
0.25 = coord(1/4)
- Abstract
- Keyword-based querying has been an immediate and efficient way for users to specify and retrieve the information they seek. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes integrating two existing techniques, query expansion and relevance feedback, to achieve concept-based information search for the Web.
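The classical way to combine the two techniques named above is Rocchio-style relevance feedback over a vector space model; the sketch below illustrates that general idea and is not the authors' concrete method. The vocabulary and weights are invented.

```python
import numpy as np

def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Reweight a query vector from user relevance feedback (Rocchio).

    query_vec and the rows of `relevant` / `nonrelevant` are term-weight
    vectors over the same vocabulary.
    """
    q = alpha * query_vec
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return np.clip(q, 0.0, None)  # negative weights are usually dropped

# Toy vocabulary: [web, retrieval, java, personalization]
query = np.array([1.0, 1.0, 0.0, 0.0])
rel = np.array([[0.8, 0.9, 0.0, 0.7]])     # documents judged relevant
nonrel = np.array([[0.1, 0.0, 0.9, 0.0]])  # documents judged non-relevant
print(rocchio(query, rel, nonrel))
```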
-
Bean, C.: ¬The semantics of hierarchy : explicit parent-child relationships in MeSH tree structures (1998)
0.02
0.023079555 = product of:
0.09231822 = sum of:
0.09231822 = weight(_text_:however in 1042) [ClassicSimilarity], result of:
0.09231822 = score(doc=1042,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.32118538 = fieldWeight in 1042, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.0546875 = fieldNorm(doc=1042)
0.25 = coord(1/4)
- Abstract
- Parent-Child relationships in MeSH trees were surveyed and described, and their patterns in the relational structure were determined for selected broad subject categories and subcategories. Is-a relationships dominated and were more prevalent overall than previously reported; however, an additional 67 different relationships were also seen, most of them nonhierarchical. Relational profiles were found to vary both within and among subject subdomains, but tended to display characteristic domain patterns. The implications for inferential reasoning and other cognitive and computational operations on hierarchical structures are considered
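For readers unfamiliar with MeSH tree structures: the hierarchy is encoded positionally in dotted tree numbers, and a descriptor's parent is obtained by dropping the last dotted segment. The sketch below illustrates this mechanism only; the semantic relationship behind each such link is what the article surveys, and the example numbers are generic.

```python
def mesh_parent(tree_number):
    """Return the parent tree number, or None at a category root.

    MeSH encodes hierarchy positionally, e.g. 'C04.557.337' is a child
    of 'C04.557'.
    """
    return tree_number.rsplit(".", 1)[0] if "." in tree_number else None

for tn in ["C04", "C04.557", "C04.557.337"]:
    print(tn, "-> parent:", mesh_parent(tn))
```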
-
Lobin, H.; Witt, A.: Semantic and thematic navigation in electronic encyclopedias (1999)
0.02
0.023079555 = product of:
0.09231822 = sum of:
0.09231822 = weight(_text_:however in 1624) [ClassicSimilarity], result of:
0.09231822 = score(doc=1624,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.32118538 = fieldWeight in 1624, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.0546875 = fieldNorm(doc=1624)
0.25 = coord(1/4)
- Abstract
- In the field of electronic publishing, encyclopedias represent a unique sort of text for investigating advanced methods of navigation. The user of an electronic encyclopedia normally expects special methods for accessing the entries in an encyclopedia database. Navigation through printed encyclopedias in the traditional sense focuses on the alphabetic order of the entries. In electronic encyclopedias, however, thematic structuring of lemmas and, of course, extensive (hyper-)linking mechanisms have been added. This paper focuses on developments that go beyond these navigational structures. We concentrate on the semantic space formed by lemmas to build a network of semantic distances and thematic trails through the encyclopedia.
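To make the notions of a network of semantic distances and of thematic trails more tangible, here is a minimal sketch assuming that pairwise distances between lemmas are already available (in the article they come from the encyclopedia's semantic space); a thematic trail is then simply a short path through the resulting graph. Lemmas and distances below are invented.

```python
import heapq

# Invented lemmas with invented pairwise semantic distances.
edges = {
    ("music", "opera"): 0.3, ("opera", "theatre"): 0.4,
    ("music", "instrument"): 0.5, ("theatre", "drama"): 0.2,
}
graph = {}
for (a, b), d in edges.items():
    graph.setdefault(a, []).append((b, d))
    graph.setdefault(b, []).append((a, d))

def thematic_trail(start, goal):
    """Shortest semantic path between two lemmas (Dijkstra)."""
    queue, seen = [(0.0, start, [start])], set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, d in graph.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (dist + d, nxt, path + [nxt]))
    return None

print(thematic_trail("music", "drama"))
# (0.9, ['music', 'opera', 'theatre', 'drama'])
```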
-
Hu, K.; Luo, Q.; Qi, K.; Yang, S.; Mao, J.; Fu, X.; Zheng, J.; Wu, H.; Guo, Y.; Zhu, Q.: Understanding the topic evolution of scientific literatures like an evolving city : using Google Word2Vec model and spatial autocorrelation analysis (2019)
0.02
0.022842837 = product of:
0.09137135 = sum of:
0.09137135 = weight(_text_:however in 102) [ClassicSimilarity], result of:
0.09137135 = score(doc=102,freq=6.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.3178911 = fieldWeight in 102, product of:
2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.03125 = fieldNorm(doc=102)
0.25 = coord(1/4)
- Abstract
- Topic evolution has been described by many approaches, from the macro level down to fine detail, by extracting topic dynamics from text in the literature and other media types. However, why the evolution happens is less studied. In this paper, we focus on whether and how keyword semantics can invoke or affect topic evolution. We assume that the semantic relatedness among keywords can affect topic popularity during the literature surveying and citing process, thus invoking evolution. This assumption, however, needs to be confirmed by an approach that fully considers the semantic interactions among topics. Traditional topic evolution analyses in scientometric domains cannot provide such support because they rely on limited semantic information. To address this problem, we apply Google Word2Vec, a deep learning language model, to enrich the keywords with more complete semantic information. We further model the semantic space as an urban geographic space. We analyze topic evolution geographically using measures of spatial autocorrelation, as if keywords were the changing parcels of land in an evolving city. Keyword citations (a keyword citation is counted each time a paper containing that keyword receives a citation) are used as an indicator of keyword popularity. Using bibliographic datasets from the field of geographical natural hazards, experimental results demonstrate that in some local areas the popularity of a keyword affects that of the surrounding keywords. However, no significant impact is observed on the evolution of all keywords. The spatial autocorrelation analysis identifies interaction patterns (including High-High leading and High-Low suppressing) among keywords in local areas. This approach can be regarded as an analytical framework borrowed from geospatial modeling. Moreover, the prediction results in local areas are shown to be more accurate when spatial autocorrelation is taken into account.
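The spatial autocorrelation measure used in analyses of this kind is typically Moran's I. The sketch below is a generic illustration, assuming 2-D keyword coordinates (in the paper these would come from Word2Vec embeddings laid out as a "city") and citation counts as the attribute; it is not the authors' code, and the data are invented.

```python
import numpy as np

def morans_i(values, coords):
    """Global Moran's I with inverse-distance spatial weights."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = len(values)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    w = np.where(dist > 0, 1.0 / np.maximum(dist, 1e-12), 0.0)  # no self-weight
    z = values - values.mean()
    num = n * (w * np.outer(z, z)).sum()
    den = w.sum() * (z ** 2).sum()
    return num / den

# Invented example: keyword citation counts on a tiny 2-D semantic map.
coords = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
citations = [10, 12, 11, 2, 1]
print(morans_i(citations, coords))  # positive: similar values cluster spatially
```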
-
Schek, M.: Automatische Klassifizierung und Visualisierung im Archiv der Süddeutschen Zeitung (2005)
0.02
0.020289319 = product of:
0.081157275 = sum of:
0.081157275 = weight(_text_:und in 5884) [ClassicSimilarity], result of:
0.081157275 = score(doc=5884,freq=76.0), product of:
0.15350439 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.06921162 = queryNorm
0.5286968 = fieldWeight in 5884, product of:
8.717798 = tf(freq=76.0), with freq of:
76.0 = termFreq=76.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.02734375 = fieldNorm(doc=5884)
0.25 = coord(1/4)
- Abstract
- The Süddeutsche Zeitung (SZ) has maintained a press archive since its founding in 1945; it documents the texts of its own editors and of numerous national and international publications and makes them available on request for research purposes. The introduction of electronic data processing began in the early 1990s with the digital storage of, initially, the SZ's own data. The technical development from the mid-1990s onwards served two goals: (1) the complete switch from paper filing to digital storage and (2) the transformation from an in-house documentation and reference service into an information service provider also active on the market. To spread the resulting costs and at the same time tap synergies between thematically related archives, the Süddeutscher Verlag and Bayerischer Rundfunk founded the Dokumentations- und Informationszentrum (DIZ) München GmbH in 1998, in which the press archives of the two partners and the picture archive of the Süddeutscher Verlag were merged. The jointly developed press database enabled cross-site indexing, browser-based retrieval for editors and external customers on the intranet and Internet, and customer-specific content feeds for publishers, broadcasters, and portals. The DIZ press database currently contains 6.9 million articles, each of which can be retrieved as HTML or PDF. About 3,500 articles are added daily, of which about 1,000 are indexed intellectually. At DIZ, indexing is done not by assigning subject headings to the individual document but by linking articles to "virtual folders", the dossiers. These are the electronic representation of a paper folder and are the central object of subject indexing. In contrast to static classification systems, the dossier structure is dynamic and driven by coverage, i.e. new dossiers are created mainly on the basis of current reporting. In total, the DIZ press database contains about 90,000 dossiers, of which 68,000 are subject topics, persons, and institutions. The dossiers are linked to one another to form the "DIZ knowledge network" (DIZ-Wissensnetz).
DIZ regards the knowledge network as its unique selling point and devotes considerable staff resources to keeping the dossiers up to date and assuring their quality. After the switch to a fully digitized workflow in April 2001, DIZ identified four starting points for optimizing the effort on the input side (indexing) while at the same time marketing the knowledge network better on the output side (retrieval): 1. (semi-)automatic classification of press texts (a suggestion system); 2. visualization of the knowledge network (topic mapping); 3. (fully) automatic classification and optimization of the knowledge network; 4. new retrieval options (clustering, concept search). Projects 1 and 2, "automatic classification and visualization", started first and were accelerated by two developments: - Bayerischer Rundfunk (BR), originally co-founder and 50% shareholder of DIZ München GmbH, decided for strategic reasons to withdraw from the cooperation at the end of 2003. - The media crisis, caused by the massive decline in advertising revenue, also forced substantial cuts at the Süddeutscher Verlag and a search for new sources of revenue. Both developments meant that staffing in press documentation fell from originally about 20 (SZ only, without the BR share) to about 13 by 1 January 2004, while at the same time the effort spent on maintaining the knowledge network came under increased pressure to justify itself. For projects 1 and 2 this resulted in three quantitative and qualitative goals: - higher productivity in indexing; - better consistency in indexing; - better marketing and more intensive use of the dossiers in retrieval. All three goals were achieved, with productivity in indexing in particular having risen. Projects 1 and 2, "automatic classification and visualization", were successfully completed at the beginning of 2004. The follow-up projects 3 and 4 have been running since mid-2004 and are scheduled to be completed by mid-2005. In the following, Section 2 describes the product selection and the way the automatic classification works. Section 3 describes the use of the knowledge-network visualization in indexing and retrieval. Section 4 summarizes the results of projects 1 and 2 and gives an outlook on the goals of projects 3 and 4.
-
Poynder, R.: Web research engines? (1996)
0.02
0.019782476 = product of:
0.079129905 = sum of:
0.079129905 = weight(_text_:however in 6698) [ClassicSimilarity], result of:
0.079129905 = score(doc=6698,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.27530175 = fieldWeight in 6698, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.046875 = fieldNorm(doc=6698)
0.25 = coord(1/4)
- Abstract
- Describes the shortcomings of search engines for the WWW, comparing their current capabilities to those of first-generation CD-ROM products. Some allow phrase searching and most are improving their Boolean searching. Few allow truncation, wild cards or nested logic. They are stateless, losing previous search criteria. Unlike the indexing and classification systems for today's CD-ROMs, those for Web pages are random, unstructured and of variable quality. Considers that at best Web search engines can only offer free-text searching. Discusses whether automatic data classification systems such as Infoseek Ultra can overcome the haphazard nature of the Web with neural network technology, and whether Boolean search techniques may be made redundant by technology such as the Euroferret search engine. However, artificial intelligence is rarely successful on huge, varied databases. Relevance ranking and automatic query expansion still use the same simple inverted indexes. Most Web search engines do nothing more than word counting. Further complications arise with foreign languages.
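The "simple inverted indexes" and "word counting" criticized above can be illustrated in a few lines. This generic sketch builds a term-to-postings index and ranks documents by raw query-term counts, roughly the baseline the article has in mind; the toy documents are invented.

```python
from collections import Counter, defaultdict

docs = {
    1: "web search engines index web pages",
    2: "boolean search on cd-rom products",
    3: "relevance ranking by word counting",
}

index = defaultdict(Counter)          # term -> {doc_id: term frequency}
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] += 1

def rank(query):
    """Rank documents by summed query-term counts (no idf, no structure)."""
    scores = Counter()
    for term in query.split():
        scores.update(index.get(term, Counter()))
    return scores.most_common()

print(rank("web search"))  # [(1, 3), (2, 1)]
```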
-
Nie, J.-Y.: Query expansion and query translation as logical inference (2003)
0.02
0.019782476 = product of:
0.079129905 = sum of:
0.079129905 = weight(_text_:however in 2425) [ClassicSimilarity], result of:
0.079129905 = score(doc=2425,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.27530175 = fieldWeight in 2425, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.046875 = fieldNorm(doc=2425)
0.25 = coord(1/4)
- Abstract
- A number of studies have examined the problems of query expansion in monolingual information retrieval (IR) and of query translation for cross-language IR. However, no link has been made between them. This article first shows that query translation is a special case of query expansion. There is also another set of studies on inferential IR. Again, no relationship has been established with query translation or query expansion. The second claim of this article is that logical inference is a general form that covers both query expansion and query translation. This analysis provides a unified view of different subareas of IR. We further develop the inferential IR approach in two particular contexts: using fuzzy logic and probability theory. The evaluation formulas obtained are shown to correspond strongly to those used in other IR models. This indicates that inference is indeed the core of advanced IR.
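The inferential view can be made concrete with a hedged formula sketch: under a probabilistic reading, a document supports a query to the extent that each query term can be inferred from the document through related terms, where the term relation plays the role of expansion (monolingual IR) or translation (cross-language IR). The notation below is a generic illustration, not necessarily the article's own.

```latex
% Hedged sketch: t' ranges over expansion terms (monolingual IR) or
% translation candidates (cross-language IR); P(t | t') encodes the
% term relation used for the inference from document to query.
\[
  \operatorname{score}(d, q) \;\approx\; \prod_{t \in q} \sum_{t'}
      P(t \mid t')\, P(t' \mid d)
\]
```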
-
Zazo, A.F.; Figuerola, C.G.; Berrocal, J.L.A.; Rodriguez, E.: Reformulation of queries using similarity-thesauri (2005)
0.02
0.019782476 = product of:
0.079129905 = sum of:
0.079129905 = weight(_text_:however in 2043) [ClassicSimilarity], result of:
0.079129905 = score(doc=2043,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.27530175 = fieldWeight in 2043, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.046875 = fieldNorm(doc=2043)
0.25 = coord(1/4)
- Abstract
- One of the major problems in information retrieval is the formulation of queries on the part of the user. This entails specifying a set of words or terms that express the user's information need. However, it is well known that two people can assign different terms to refer to the same concepts. The techniques that attempt to reduce this problem as much as possible generally start from a first search and then study how the initial query can be modified to obtain better results. In general, the construction of the new query involves expanding the terms of the initial query and recalculating the importance of each term in the expanded query. Depending on the technique used to formulate the new query, several strategies can be distinguished. These strategies are based on the idea that if two terms are similar (with respect to some criterion), the documents in which both terms appear frequently will also be related. The technique we used in this study is known as query expansion using similarity thesauri.
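A minimal sketch of the general idea, assuming a small term-document weight matrix: term-term similarities are computed once (the "similarity thesaurus"), and a query is then expanded with the terms most similar to its original terms. The vocabulary and weights are invented, and this is not the authors' exact formulation.

```python
import numpy as np

terms = ["car", "automobile", "vehicle", "opera"]
# Rows: terms, columns: documents (invented term weights).
T = np.array([
    [0.9, 0.8, 0.0, 0.7],
    [0.8, 0.9, 0.1, 0.6],
    [0.5, 0.4, 0.2, 0.9],
    [0.0, 0.1, 0.9, 0.0],
])

# Term-term cosine similarities: the "similarity thesaurus".
unit = T / np.linalg.norm(T, axis=1, keepdims=True)
sim = unit @ unit.T

def expand(query_terms, k=1):
    """Add the k most similar thesaurus terms to each query term."""
    expanded = list(query_terms)
    for qt in query_terms:
        i = terms.index(qt)
        ranked = np.argsort(-sim[i])
        added = [terms[j] for j in ranked if terms[j] not in expanded][:k]
        expanded += added
    return expanded

print(expand(["car"]))  # ['car', 'automobile']
```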
-
Wang, Y.-H.; Jhuo, P.-S.: ¬A semantic faceted search with rule-based inference (2009)
0.02
0.019782476 = product of:
0.079129905 = sum of:
0.079129905 = weight(_text_:however in 1540) [ClassicSimilarity], result of:
0.079129905 = score(doc=1540,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.27530175 = fieldWeight in 1540, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.046875 = fieldNorm(doc=1540)
0.25 = coord(1/4)
- Abstract
- Semantic search has become an active research area of the Semantic Web in recent years. Classification methodology plays a critical role at the beginning of the search process in filtering out irrelevant information. However, applications based on folksonomy suffer from many obstacles. This study attempts to eliminate the problems resulting from folksonomy by using existing semantic technology. We also focus on how to effectively integrate heterogeneous ontologies over the Internet so as to achieve the integrity of domain knowledge. A faceted logic layer is abstracted in order to strengthen the category framework and organize existing available ontologies according to a series of steps based on the methodology of faceted classification and ontology construction. The results show that our approach can facilitate the integration of inconsistent or even heterogeneous ontologies. This paper also generalizes the principles of picking appropriate facets, with which our facet browser fully complies, so that better semantic search results can be obtained.
-
Narock, T.; Zhou, L.; Yoon, V.: Semantic similarity of ontology instances using polarity mining (2013)
0.02
0.019782476 = product of:
0.079129905 = sum of:
0.079129905 = weight(_text_:however in 1620) [ClassicSimilarity], result of:
0.079129905 = score(doc=1620,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.27530175 = fieldWeight in 1620, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.046875 = fieldNorm(doc=1620)
0.25 = coord(1/4)
- Abstract
- Semantic similarity is vital to many areas, such as information retrieval. Various methods have been proposed with a focus on comparing unstructured text documents. Several of these have been enhanced with ontology; however, they have not been applied to ontology instances. With the growth in ontology instance data published online through, for example, Linked Open Data, there is an increasing need to apply semantic similarity to ontology instances. Drawing on ontology-supported polarity mining (OSPM), we propose an algorithm that enhances the computation of semantic similarity with polarity mining techniques. The algorithm is evaluated with online customer review data. The experimental results show that the proposed algorithm outperforms the baseline algorithm in multiple settings.
-
Jindal, V.; Bawa, S.; Batra, S.: ¬A review of ranking approaches for semantic search on Web (2014)
0.02
0.019782476 = product of:
0.079129905 = sum of:
0.079129905 = weight(_text_:however in 3799) [ClassicSimilarity], result of:
0.079129905 = score(doc=3799,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.27530175 = fieldWeight in 3799, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.046875 = fieldNorm(doc=3799)
0.25 = coord(1/4)
- Abstract
- With ever-increasing amounts of information available to end users, search engines have become the most powerful tools for obtaining useful information scattered across the Web. However, it is very common for even the most renowned search engines to return result sets containing pages of little use to the user. Research on semantic search aims to improve traditional information search and retrieval methods, where the basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work explores different semantics-based relevance ranking approaches considered appropriate for the retrieval of relevant information. In this paper, various pilot projects and their corresponding outcomes are investigated on the basis of the methodologies adopted and their most distinctive characteristics with respect to ranking. An overview of selected approaches and their comparison by means of the classification criteria is presented. With the help of this comparison, some common concepts and outstanding features are identified.
-
Li, N.; Sun, J.: Improving Chinese term association from the linguistic perspective (2017)
0.02
0.019782476 = product of:
0.079129905 = sum of:
0.079129905 = weight(_text_:however in 4381) [ClassicSimilarity], result of:
0.079129905 = score(doc=4381,freq=2.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.27530175 = fieldWeight in 4381, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.046875 = fieldNorm(doc=4381)
0.25 = coord(1/4)
- Abstract
- The study aims to work out how to construct semantic relations between domain-specific terms by applying linguistic rules. Semantic structure analysis at the morpheme level was used as the semantic measure, and a morpheme-based term association model was proposed by improving and combining a literal similarity algorithm with co-occurrence relatedness methods. This study provides novel insight into semantic analysis and computation by morpheme parsing, and the proposed solution is feasible for the automatic association of compound terms. The results show that this approach can be used to construct appropriate term associations and form a reasonable structural knowledge graph. However, due to linguistic differences, the viability and effectiveness of our method in non-Chinese linguistic environments remain to be verified.
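A minimal sketch of the general combination strategy, assuming a literal (character/morpheme overlap) score and a co-occurrence score are each available and merged by a weighted sum. Everything here, including the weights and the toy co-occurrence table, is an illustrative assumption rather than the authors' model.

```python
def literal_similarity(a, b):
    """Dice coefficient over characters, a stand-in for morpheme overlap."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))

# Toy co-occurrence relatedness scores between term pairs (invented).
cooccurrence = {frozenset(["数据挖掘", "数据分析"]): 0.6}

def association(a, b, alpha=0.5):
    """Weighted combination of literal and co-occurrence relatedness."""
    co = cooccurrence.get(frozenset([a, b]), 0.0)
    return alpha * literal_similarity(a, b) + (1 - alpha) * co

print(association("数据挖掘", "数据分析"))  # 0.55 with these toy values
```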
-
Hauer, M.: Neue OPACs braucht das Land ... dandelon.com (2006)
0.02
0.019545622 = product of:
0.07818249 = sum of:
0.07818249 = weight(_text_:und in 47) [ClassicSimilarity], result of:
0.07818249 = score(doc=47,freq=24.0), product of:
0.15350439 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.06921162 = queryNorm
0.50931764 = fieldWeight in 47, product of:
4.8989797 = tf(freq=24.0), with freq of:
24.0 = termFreq=24.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.046875 = fieldNorm(doc=47)
0.25 = coord(1/4)
- Abstract
- In dandelon.com, in contrast to previous federated search portal approaches, the titles of media are newly indexed in a decentralized, collaborative fashion by means of intelligentCAPTURE and are substantially enriched in content. intelligentCAPTURE so far performs machine indexing of book tables of contents, books, blurbs, articles, and websites; it takes over bibliographic data from libraries (XML, Z39.50), from publishers (ONIX + cover pages), from serials agents (Swets), and from the book trade (SOAP), and it exports machine-generated index terms and processed documents to library catalogues (MAB, MARC, XML) or documentation systems, to dandelon.com, and in part also to subject portals. The data are obtained by scanning and OCR, by importing files and looking them up on servers, and by web spidering/crawling. The quality of searching in dandelon.com is markedly better than in previous library systems. The semantic, multilingual search, with currently 1.2 million subject terms, contributes strongly to the good search results.
- Source
- Spezialbibliotheken zwischen Auftrag und Ressourcen: 6.-9. September 2005 in München, 30. Arbeits- und Fortbildungstagung der ASpB e.V. / Sektion 5 im Deutschen Bibliotheksverband. Red.: M. Brauer
-
Brambilla, M.; Ceri, S.: Designing exploratory search applications upon Web data sources (2012)
0.02
0.018651098 = product of:
0.07460439 = sum of:
0.07460439 = weight(_text_:however in 1428) [ClassicSimilarity], result of:
0.07460439 = score(doc=1428,freq=4.0), product of:
0.28742972 = queryWeight, product of:
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.06921162 = queryNorm
0.25955698 = fieldWeight in 1428, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.1529117 = idf(docFreq=1897, maxDocs=44421)
0.03125 = fieldNorm(doc=1428)
0.25 = coord(1/4)
- Abstract
- Search is the preferred method to access information in today's computing systems. The Web, accessed through search engines, is universally recognized as the source for answering users' information needs. However, offering a link to a Web page does not cover all information needs. Even simple problems, such as "Which theater offers an at least three-star action movie in London close to a good Italian restaurant," can only be solved by searching the Web multiple times, e.g., by extracting a list of the recent action movies filtered by ranking, then looking for movie theaters, then looking for Italian restaurants close to them. While search engines hint at useful information, the user's brain is the fundamental platform for information integration. An important trend is the availability of new, specialized data sources-the so-called "long tail" of the Web of data. Such carefully collected and curated data sources can be much more valuable than information currently available in Web pages; however, many sources remain hidden or insulated, in the absence of software solutions for bringing them to the surface and making them usable in the search context. A new class of tailor-made systems, designed to satisfy the needs of users with specific aims, will support the publishing and integration of data sources for vertical domains; the user will be able to select sources based on individual or collective trust, and systems will be able to route queries to such sources and to provide easy-to-use interfaces for combining them within search strategies, at the same time rewarding the data source owners for each contribution to effective search. Efforts such as Google's Fusion Tables show that the technology for bringing hidden data sources to the surface is feasible.
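The movie/theater/restaurant example can be read as a pipeline over registered data sources. The sketch below is purely illustrative: invented in-memory "sources" stand in for the specialized, curated services the article envisions, and the join logic mimics the integration step that would otherwise happen in the user's head; it is not any real system's API.

```python
# Invented in-memory stand-ins for specialized, curated data sources.
movies = [{"title": "Sky Chase", "stars": 4, "genre": "action"}]
theaters = [{"name": "Odeon", "city": "London", "shows": ["Sky Chase"]}]
restaurants = [{"name": "Trattoria Roma", "city": "London",
                "cuisine": "italian", "rating": 4.5, "near": "Odeon"}]

# One exploratory "search strategy": filter each source, then join results.
good_movies = {m["title"] for m in movies
               if m["genre"] == "action" and m["stars"] >= 3}
for t in theaters:
    if t["city"] == "London" and good_movies & set(t["shows"]):
        for r in restaurants:
            if (r["near"] == t["name"] and r["cuisine"] == "italian"
                    and r["rating"] >= 4):
                print(t["name"], "->", r["name"])
```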
-
Schmitz-Esser, W.: EXPO-INFO 2000 : Visuelles Besucherinformationssystem für Weltausstellungen (2000)
0.02
0.016953107 = product of:
0.06781243 = sum of:
0.06781243 = weight(_text_:und in 2404) [ClassicSimilarity], result of:
0.06781243 = score(doc=2404,freq=26.0), product of:
0.15350439 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.06921162 = queryNorm
0.44176215 = fieldWeight in 2404, product of:
5.0990195 = tf(freq=26.0), with freq of:
26.0 = termFreq=26.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.0390625 = fieldNorm(doc=2404)
0.25 = coord(1/4)
- Abstract
- The current knowledge of the world as mirrored by a world exposition: how does one represent it, and how does one make it accessible to interested people - in the exhibition itself, in publications, on radio and television, and over the Internet? What one can see and experience at a world exposition on the threshold of the third millennium exceeds, in its abundance and variety, any frame an individual can grasp. In his book, Schmitz-Esser shows how visitors can experience the world exposition in any of four languages and take its quintessence away with them. This is made possible by the concept of virtual "knowledge in a capsule", prepared in such a way that it can be used in all common media forms and for the most diverse modes of appropriation. The solution is not only a matter of informatics and information technology; rather, it is also a challenge for information science and computational linguistics. The book presents the goal, the approach, the components, and the prerequisites for this.
- Content
- A welcome stimulus right at the entrance.- Deepening one's knowledge during the exhibition.- Everything for the visitor's well-being.- The system structure and its individual elements.- What it all starts from.- Structuring the material as topics and subtopics.- The nutshells.- The proxy text.- The thesaurus.- Journeys through the space of ideas.- And back to the real world.- Further products.- The EXPO information system at a glance.- Index.- Bibliography.
- Theme
- Konzeption und Anwendung des Prinzips Thesaurus
-
Rahmstorf, G.: Integriertes Management inhaltlicher Datenarten (2001)
0.02
0.016927004 = product of:
0.067708015 = sum of:
0.067708015 = weight(_text_:und in 6856) [ClassicSimilarity], result of:
0.067708015 = score(doc=6856,freq=18.0), product of:
0.15350439 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.06921162 = queryNorm
0.44108194 = fieldWeight in 6856, product of:
4.2426405 = tf(freq=18.0), with freq of:
18.0 = termFreq=18.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.046875 = fieldNorm(doc=6856)
0.25 = coord(1/4)
- Abstract
- Content data, in contrast to measurement data, numbers, analogue signals, and other information, are data that can also be interpreted linguistically. They carry content that can be named. Content data include, for example, order data, advertising copy, product names, and patent classifications. Most of the data communicated on the Internet are content data. Content data can be assigned to four classes: * knowledge data - formatted data (facts and other data in structured form), - unformatted data (predominantly texts); * access data - naming data (vocabulary, terminology, topics, etc.), - concept data (ordering and meaning structures). Knowledge organization is mainly concerned with ordering the unmanageable abundance of knowledge and making it retrievable. The field therefore deals not only with knowledge itself but also with the means used to order knowledge and make it findable.
- Series
- Tagungen der Deutschen Gesellschaft für Informationswissenschaft und Informationspraxis; 4
- Source
- Information Research & Content Management: Orientierung, Ordnung und Organisation im Wissensmarkt; 23. DGI-Online-Tagung der DGI und 53. Jahrestagung der Deutschen Gesellschaft für Informationswissenschaft und Informationspraxis e.V. DGI, Frankfurt am Main, 8.-10.5.2001. Proceedings. Hrsg.: R. Schmidt