Search (13355 results, page 13 of 668)

  1. Cavalcante Dourado, Í.; Galante, R.; Gonçalves, M.A.; Silva Torres, R. de: Bag of textual graphs (BoTG) : a general graph-based text representation model (2019) 0.04
    0.038618177 = product of:
      0.15447271 = sum of:
        0.15447271 = weight(_text_:handle in 291) [ClassicSimilarity], result of:
          0.15447271 = score(doc=291,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.36142063 = fieldWeight in 291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.0390625 = fieldNorm(doc=291)
      0.25 = coord(1/4)
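    Note: the score breakdowns shown for the hits on this page follow Lucene's ClassicSimilarity (TF-IDF) explain format. As a minimal illustration, the arithmetic for this first hit can be reproduced directly from the values printed above; the short Python sketch below uses only those numbers and nothing from the underlying index.

      import math

      # Values taken verbatim from the explanation above (doc=291, term "handle").
      freq = 2.0               # termFreq within the matched field
      idf = 6.5424123          # idf(docFreq=173, maxDocs=44421)
      query_norm = 0.06532823  # queryNorm
      field_norm = 0.0390625   # fieldNorm(doc=291)
      coord = 0.25             # coord(1/4): one of four query terms matched

      tf = math.sqrt(freq)                   # 1.4142135
      query_weight = idf * query_norm        # 0.42740422
      field_weight = tf * idf * field_norm   # 0.36142063
      weight = query_weight * field_weight   # 0.15447271
      score = coord * weight
      print(score)                           # ~ 0.038618177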
    
    Abstract
    Text representation models are the fundamental basis for information retrieval and text mining tasks. Although different text models have been proposed, they typically target specific task aspects in isolation, such as time efficiency, accuracy, or applicability to different scenarios. Here we present Bag of Textual Graphs (BoTG), a general text representation model that addresses these three requirements at the same time. The proposed textual representation is based on a graph-based scheme that encodes term proximity and term ordering, and maps text documents into an efficient vector space that addresses all these aspects as well as providing discriminative textual patterns. Extensive experiments are conducted in two experimental scenarios, classification and retrieval, considering multiple well-known text collections, and we compare our model against several methods from the literature. Experimental results demonstrate that our model is generic enough to handle different tasks and collections. It is also more efficient than the widely used state-of-the-art methods in textual classification and retrieval tasks, with competitive effectiveness, sometimes with gains by large margins.
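    As an illustrative sketch only, and not the authors' exact BoTG construction, a directed graph that encodes term proximity and term ordering can be built over a sliding window; the window size and the 1/distance edge weighting below are assumptions.

      from collections import defaultdict

      def build_term_graph(tokens, window=3):
          """Link each term to the terms that follow it within a small window,
          preserving term ordering and weighting closer terms more heavily."""
          graph = defaultdict(lambda: defaultdict(float))
          for i, term in enumerate(tokens):
              for j in range(i + 1, min(i + window, len(tokens))):
                  graph[term][tokens[j]] += 1.0 / (j - i)  # closer = heavier edge
          return {term: dict(neighbours) for term, neighbours in graph.items()}

      doc = "a graph based scheme encodes term proximity and term ordering".split()
      print(build_term_graph(doc))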
  2. Saarti, J.: Fictional literature : classification and indexing (2019) 0.04
    0.038618177 = product of:
      0.15447271 = sum of:
        0.15447271 = weight(_text_:handle in 315) [ClassicSimilarity], result of:
          0.15447271 = score(doc=315,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.36142063 = fieldWeight in 315, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.0390625 = fieldNorm(doc=315)
      0.25 = coord(1/4)
    
    Abstract
    Fiction content analysis and retrieval are interesting specific topics for two major reasons: 1) the extensive use of fictional works; and 2) the multimodality and interpretational nature of fiction. The primary challenge in the analysis of fictional content is that there is no single meaning to be analysed; the analysis is an ongoing process involving an interaction between the text produced by the author, the reader, and the society in which the interaction occurs. Furthermore, different audiences have specific needs to be taken into consideration. This article explores the topic of fiction knowledge organization, including both classification and indexing. It provides a broad and analytical overview of the literature as well as describing several experimental approaches and developmental projects for the analysis of fictional content. Traditional fiction indexing has been mainly based on the factual aspects of the work; this has since been expanded to handle different aspects of the fictional work. Attempts have been made to develop vocabularies for fiction indexing. All the major classification schemes use the genre and language/culture of fictional works when subdividing fictional works into subclasses. The evolution of shelf classification of fiction and the appearance of different types of digital tools have revolutionized the classification of fiction, making it possible to integrate both indexing and classification of fictional works.
  3. Bidwell, S.: Curiosities of light and sight (1899) 0.04
    0.038618177 = product of:
      0.15447271 = sum of:
        0.15447271 = weight(_text_:handle in 783) [ClassicSimilarity], result of:
          0.15447271 = score(doc=783,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.36142063 = fieldWeight in 783, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.0390625 = fieldNorm(doc=783)
      0.25 = coord(1/4)
    
    Abstract
    The following chapters are based upon notes of several unconnected lectures addressed to audiences of very different classes in the theatres of the Royal Institution, the London Institution, the Leeds Philosophical and Literary Society, and Caius House, Battersea. In preparing the notes for publication the matter has been re-arranged with the object of presenting it, as far as might be, in methodical order; additions and omissions have been freely made, and numerous diagrams, illustrative of the apparatus and experiments described, have been provided. I do not know that any apology is needed for offering the collection as thus re-modelled to a larger public. Though the essays are, for the most part, of a popular and informal character, they touch upon a number of curious matters of which no readily accessible account has yet appeared, while, even in the most elementary parts, an attempt has been made to handle the subject with some degree of freshness. The interesting subjective phenomena which are associated with the sense of vision do not appear to have received in this country the attention they deserve. This little book may perhaps be of some slight service in suggesting to experimentalists, both professional and amateur, an attractive field of research which has hitherto been only partially explored.
  4. Peponakis, M.; Mastora, A.; Kapidakis, S.; Doerr, M.: Expressiveness and machine processability of Knowledge Organization Systems (KOS) : an analysis of concepts and relations (2020) 0.04
    0.038618177 = product of:
      0.15447271 = sum of:
        0.15447271 = weight(_text_:handle in 787) [ClassicSimilarity], result of:
          0.15447271 = score(doc=787,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.36142063 = fieldWeight in 787, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.0390625 = fieldNorm(doc=787)
      0.25 = coord(1/4)
    
    Abstract
    This study considers the expressiveness (that is the expressive power or expressivity) of different types of Knowledge Organization Systems (KOS) and discusses its potential to be machine-processable in the context of the Semantic Web. For this purpose, the theoretical foundations of KOS are reviewed based on conceptualizations introduced by the Functional Requirements for Subject Authority Data (FRSAD) and the Simple Knowledge Organization System (SKOS); natural language processing techniques are also implemented. Applying a comparative analysis, the dataset comprises a thesaurus (Eurovoc), a subject headings system (LCSH) and a classification scheme (DDC). These are compared with an ontology (CIDOC-CRM) by focusing on how they define and handle concepts and relations. It was observed that LCSH and DDC focus on the formalism of character strings (nomens) rather than on the modelling of semantics; their definition of what constitutes a concept is quite fuzzy, and they comprise a large number of complex concepts. By contrast, thesauri have a coherent definition of what constitutes a concept, and apply a systematic approach to the modelling of relations. Ontologies explicitly define diverse types of relations, and are by their nature machine-processable. The paper concludes that the potential of both the expressiveness and machine processability of each KOS is extensively regulated by its structural rules. It is harder to represent subject headings and classification schemes as semantic networks with nodes and arcs, while thesauri are more suitable for such a representation. In addition, a paradigm shift is revealed which focuses on the modelling of relations between concepts, rather than the concepts themselves.
  5. Díez Platas, M.L.; Muñoz, S.R.; González-Blanco, E.; Ruiz Fabo, P.; Álvarez Mellado, E.: Medieval Spanish (12th-15th centuries) named entity recognition and attribute annotation system based on contextual information (2021) 0.04
    0.038618177 = product of:
      0.15447271 = sum of:
        0.15447271 = weight(_text_:handle in 1094) [ClassicSimilarity], result of:
          0.15447271 = score(doc=1094,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.36142063 = fieldWeight in 1094, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1094)
      0.25 = coord(1/4)
    
    Abstract
    The recognition of named entities in Spanish medieval texts presents great complexity, involving specific challenges: first, the complex morphosyntactic characteristics of proper-noun use in medieval texts; second, the lack of strict orthographic standards; and finally, diachronic and geographical variation in Spanish from the 12th to the 15th century. In this period, named entities usually appear as complex text structures; for example, it was frequent to add nicknames and information about the person's role in society and geographic origin. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses contextual cues based on semantics to detect entities and assign a type. Given the occurrence of entities with attached attributes, entity contexts are also parsed to determine entity-type-specific dependencies for these attributes. Moreover, it uses a variant generator to handle the diachronic evolution of Spanish medieval terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its proper-noun lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role name attributes with an overall F1 of 0.75.
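    A variant generator for historical orthography, as mentioned above, can be approximated with simple substitution rules. The rules below (u/v and i/j interchange, optional initial h) are illustrative assumptions and not the system's actual rule set.

      # Hypothetical orthographic alternations; the real system's rules differ.
      ALTERNATIONS = [("u", "v"), ("v", "u"), ("i", "j"), ("j", "i")]

      def spelling_variants(word):
          """Generate naive spelling variants by applying character swaps."""
          variants = {word}
          for old, new in ALTERNATIONS:
              for variant in list(variants):
                  if old in variant:
                      variants.add(variant.replace(old, new))
          # Optional initial h: add the h-less or h-prefixed counterpart.
          variants.add(word[1:] if word.startswith("h") else "h" + word)
          return sorted(variants)

      print(spelling_variants("iuan"))  # candidate forms to match against a gazetteer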
  6. Ilhan, A.; Fietkiewicz, K.J.: Data privacy-related behavior and concerns of activity tracking technology users from Germany and the USA (2021) 0.04
    0.038618177 = product of:
      0.15447271 = sum of:
        0.15447271 = weight(_text_:handle in 1181) [ClassicSimilarity], result of:
          0.15447271 = score(doc=1181,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.36142063 = fieldWeight in 1181, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1181)
      0.25 = coord(1/4)
    
    Abstract
    Purpose This investigation aims to examine the differences and similarities between activity tracking technology users from two regions (the USA and Germany) in their intended privacy-related behavior. The focus lies on data handling after hypothetical discontinuance of use, data protection and privacy policy seeking, and privacy concerns. Design/methodology/approach The data was collected through an online survey in 2019. In order to identify significant differences between participants from Germany and the USA, the chi-squared test and the Mann-Whitney U test were applied. Findings The intensity of several privacy-related concerns was significantly different between the two groups. The majority of the participants did not inform themselves about the respective data privacy policies or terms and conditions before installing an activity tracking application. The majority of the German participants knew that they could request the deletion of all their collected data. In contrast, only 35% of the 68 participants from the US knew about this option. Research limitations/implications This study intends to raise awareness about managing the collected health and fitness data after discontinuing use of activity tracking technologies. Furthermore, to reduce privacy and security concerns, the involvement of government, companies and users is necessary to handle and share data more considerately and in a sustainable way. Originality/value This study sheds light on users of activity tracking technologies from a broad perspective (here, participants from the USA and Germany). It incorporates not only concerns and the privacy paradox but also (intended) user behavior, including seeking information on data protection and privacy policy and handling data after hypothetical discontinuance of use of the technology.
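    The abstract names the chi-squared test and the Mann-Whitney U test as the methods used to compare the two groups. A minimal sketch of both tests on invented placeholder numbers (not the study's data) using scipy:

      import numpy as np
      from scipy.stats import chi2_contingency, mannwhitneyu

      # Hypothetical contingency table: rows = country (DE, US),
      # columns = "knew deletion of collected data could be requested" (yes, no).
      table = np.array([[80, 40],
                        [24, 44]])
      chi2, p_chi2, dof, _ = chi2_contingency(table)

      # Hypothetical ordinal concern ratings (1-5) per group.
      concerns_de = [3, 4, 4, 2, 5, 3, 4]
      concerns_us = [2, 3, 2, 1, 3, 2, 4]
      u_stat, p_u = mannwhitneyu(concerns_de, concerns_us, alternative="two-sided")

      print(f"chi2={chi2:.2f}, p={p_chi2:.3f}; U={u_stat:.1f}, p={p_u:.3f}")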
  7. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.04
    0.038618177 = product of:
      0.15447271 = sum of:
        0.15447271 = weight(_text_:handle in 1979) [ClassicSimilarity], result of:
          0.15447271 = score(doc=1979,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.36142063 = fieldWeight in 1979, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.0390625 = fieldNorm(doc=1979)
      0.25 = coord(1/4)
    
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
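    As a rough stand-in for what a TF-IDF suggestion backend does, and explicitly not Annif's actual code or command-line interface, subject profiles can be indexed with scikit-learn and the nearest subjects suggested for new text; the training snippets and subject labels below are invented.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      # Hypothetical training data: text snippets already indexed with a subject.
      training = [
          ("Cataloguing rules and bibliographic description", "Descriptive cataloging"),
          ("Thesauri and controlled vocabularies for indexing", "Subject headings"),
          ("Neural networks applied to text classification", "Machine learning"),
      ]
      texts, subjects = zip(*training)

      vectorizer = TfidfVectorizer(stop_words="english")
      subject_matrix = vectorizer.fit_transform(texts)

      def suggest(text, top_n=2):
          """Return the top_n subjects whose training texts are closest to `text`."""
          sims = cosine_similarity(vectorizer.transform([text]), subject_matrix)[0]
          return sorted(zip(subjects, sims), key=lambda pair: -pair[1])[:top_n]

      print(suggest("automated subject indexing with machine learning"))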
  8. Ghali, M.-K.; Farrag, A.; Won, D.; Jin, Y.: Enhancing knowledge retrieval with in-context learning and semantic search through Generative AI (2024) 0.04
    0.038618177 = product of:
      0.15447271 = sum of:
        0.15447271 = weight(_text_:handle in 2367) [ClassicSimilarity], result of:
          0.15447271 = score(doc=2367,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.36142063 = fieldWeight in 2367, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2367)
      0.25 = coord(1/4)
    
    Abstract
    Retrieving and extracting knowledge from extensive research documents and large databases presents significant challenges for researchers, students, and professionals in today's information-rich era. Existing retrieval systems, which rely on general-purpose Large Language Models (LLMs), often fail to provide accurate responses to domain-specific inquiries. Additionally, the high cost of pretraining or fine-tuning LLMs for specific domains limits their widespread adoption. To address these limitations, we propose a novel methodology that combines the generative capabilities of LLMs with the fast and accurate retrieval capabilities of vector databases. This advanced retrieval system can efficiently handle both tabular and non-tabular data, understand natural language user queries, and retrieve relevant information without fine-tuning. The developed model, Generative Text Retrieval (GTR), is adaptable to both unstructured and structured data with minor refinement. GTR was evaluated on both manually annotated and public datasets, achieving over 90% accuracy and delivering truthful outputs in 87% of cases. Our model achieved state-of-the-art performance with a Rouge-L F1 score of 0.98 on the MSMARCO dataset. The refined model, Generative Tabular Text Retrieval (GTR-T), demonstrated its efficiency in large database querying, achieving an Execution Accuracy (EX) of 0.82 and an Exact-Set-Match (EM) accuracy of 0.60 on the Spider dataset, using an open-source LLM. These efforts leverage Generative AI and In-Context Learning to enhance human-text interaction and make advanced AI capabilities more accessible. By integrating robust retrieval systems with powerful LLMs, our approach aims to democratize access to sophisticated AI tools, improving the efficiency, accuracy, and scalability of AI-driven information retrieval and database querying.
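    The retrieval half of such a system can be sketched with a deliberately toy embedding (hash-based bag of words); the embedding, corpus, and prompt template below are assumptions, and the LLM call itself is omitted.

      import hashlib
      import numpy as np

      def toy_embed(text, dim=64):
          """Stand-in embedding: hash each token into a fixed-size vector."""
          vec = np.zeros(dim)
          for token in text.lower().split():
              vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
          norm = np.linalg.norm(vec)
          return vec / norm if norm else vec

      corpus = [
          "Quarterly revenue figures are stored in the finance table.",
          "The HR handbook describes the onboarding procedure.",
          "Vector databases index embeddings for nearest-neighbour search.",
      ]
      index = np.vstack([toy_embed(doc) for doc in corpus])

      def retrieve(query, k=2):
          """Return the k corpus passages most similar to the query."""
          scores = index @ toy_embed(query)
          return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

      passages = retrieve("How are embeddings indexed for search?")
      prompt = "Answer using only these passages:\n" + "\n".join(passages) + "\nQuestion: ..."
      print(prompt)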
  9. Hou, Y.; Pascale, A.; Carnerero-Cano, J.; Sattigeri, P.; Tchrakian, T.; Marinescu, R.; Daly, E.; Padhi, I.: WikiContradict : a benchmark for evaluating LLMs on real-world knowledge conflicts from Wikipedia (2024) 0.04
    0.038618177 = product of:
      0.15447271 = sum of:
        0.15447271 = weight(_text_:handle in 2368) [ClassicSimilarity], result of:
          0.15447271 = score(doc=2368,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.36142063 = fieldWeight in 2368, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.0390625 = fieldNorm(doc=2368)
      0.25 = coord(1/4)
    
    Abstract
    Retrieval-augmented generation (RAG) has emerged as a promising solution to mitigate the limitations of large language models (LLMs), such as hallucinations and outdated information. However, it remains unclear how LLMs handle knowledge conflicts arising from different augmented retrieved passages, especially when these passages originate from the same source and have equal trustworthiness. In this work, we conduct a comprehensive evaluation of LLM-generated answers to questions that have varying answers based on contradictory passages from Wikipedia, a dataset widely regarded as a high-quality pre-training resource for most LLMs. Specifically, we introduce WikiContradict, a benchmark consisting of 253 high-quality, human-annotated instances designed to assess LLM performance when augmented with retrieved passages containing real-world knowledge conflicts. We benchmark a diverse range of both closed and open-source LLMs under different QA scenarios, including RAG with a single passage, and RAG with 2 contradictory passages. Through rigorous human evaluations on a subset of WikiContradict instances involving 5 LLMs and over 3,500 judgements, we shed light on the behaviour and limitations of these models. For instance, when provided with two passages containing contradictory facts, all models struggle to generate answers that accurately reflect the conflicting nature of the context, especially for implicit conflicts requiring reasoning. Since human evaluation is costly, we also introduce an automated model that estimates LLM performance using a strong open-source language model, achieving an F-score of 0.8. Using this automated metric, we evaluate more than 1,500 answers from seven LLMs across all WikiContradict instances. To facilitate future work, we release WikiContradict on: https://ibm.biz/wikicontradict.
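    A benchmark instance of this kind pairs a question with two passages that disagree; the sketch below assembles such an instance into a prompt and applies a crude conflict check. The fields, wording, and check are assumptions, and no actual model is called.

      # Hypothetical instance with two contradictory passages.
      instance = {
          "question": "In what year was the observatory founded?",
          "passage_1": "The observatory was founded in 1874.",
          "passage_2": "The observatory was founded in 1876.",
      }

      def build_prompt(item):
          """Assemble a RAG-style prompt containing both contradictory passages."""
          return (
              "Answer the question using the passages below. If they contradict "
              "each other, say so and report both answers.\n"
              f"Passage 1: {item['passage_1']}\n"
              f"Passage 2: {item['passage_2']}\n"
              f"Question: {item['question']}"
          )

      def reflects_conflict(answer):
          """Crude automatic check: does the answer mention both conflicting values?"""
          return all(year in answer for year in ("1874", "1876"))

      print(build_prompt(instance))
      print(reflects_conflict("Sources disagree: 1874 in one passage, 1876 in the other."))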
  10. Noerr, P.: ¬The Digital Library Tool Kit (2001) 0.04
    0.035848998 = product of:
      0.14339599 = sum of:
        0.14339599 = weight(_text_:java in 774) [ClassicSimilarity], result of:
          0.14339599 = score(doc=774,freq=2.0), product of:
            0.4604012 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06532823 = queryNorm
            0.31145877 = fieldWeight in 774, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.03125 = fieldNorm(doc=774)
      0.25 = coord(1/4)
    
    Footnote
    This Digital Library Tool Kit was sponsored by Sun Microsystems, Inc. to address some of the leading questions that academic institutions, public libraries, government agencies, and museums face in trying to develop, manage, and distribute digital content. The evolution of Java programming, digital object standards, Internet access, electronic commerce, and digital media management models is causing educators, CIOs, and librarians to rethink many of their traditional goals and modes of operation. New audiences, continuous access to collections, and enhanced services to user communities are enabled. As one of the leading technology providers to education and library communities, Sun is pleased to present this comprehensive introduction to digital libraries
  11. Herrero-Solana, V.; Moya Anegón, F. de: Graphical Table of Contents (GTOC) for library collections : the application of UDC codes for the subject maps (2003) 0.04
    0.035848998 = product of:
      0.14339599 = sum of:
        0.14339599 = weight(_text_:java in 3758) [ClassicSimilarity], result of:
          0.14339599 = score(doc=3758,freq=2.0), product of:
            0.4604012 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06532823 = queryNorm
            0.31145877 = fieldWeight in 3758, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.03125 = fieldNorm(doc=3758)
      0.25 = coord(1/4)
    
    Abstract
    The representation of information contents by graphical maps is an extended ongoing research topic. In this paper we introduce the application of UDC codes for the development of subject maps. We use the following graphic representation methodologies: 1) Multidimensional scaling (MDS), 2) Cluster analysis, 3) Neural networks (Self Organizing Map - SOM). Finally, we draw conclusions about the viability of applying each kind of map. 1. Introduction Advanced techniques for Information Retrieval (IR) currently make up one of the most active areas for research in the field of library and information science. New models representing document content are replacing the classic systems in which the search terms supplied by the user were compared against the indexing terms existing in the inverted files of a database. One of the topics most often studied in recent years is bibliographic browsing, a good complement to querying strategies. Since the 80's, many authors have treated this topic. For example, Ellis establishes that browsing is based on three different types of tasks: identification, familiarization and differentiation (Ellis, 1989). On the other hand, Cove indicates three different browsing types: searching browsing, general purpose browsing and serendipity browsing (Cove, 1988). Marcia Bates presents six different types (Bates, 1989), although the classification of Bawden is the one that really interests us: 1) similarity comparison, 2) structure driven, 3) global vision (Bawden, 1993). Global vision browsing implies the use of graphic representations, which we will call map displays, that allow the user to get a global idea of the nature and structure of the information in the database. In the 90's, several authors worked on this research line, developing different types of maps. One of the most active was Xia Lin, who introduced the concept of Graphical Table of Contents (GTOC), comparing the maps to true tables of contents based on graphic representations (Lin 1996). Lin applied the SOM algorithm to his own personal bibliography, analyzed as a function of the words in the title and abstract fields, and represented in a two-dimensional map (Lin 1997). Later on, Lin applied this type of map to create website GTOCs, through a Java application.
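    Of the three mapping techniques listed (MDS, cluster analysis, SOM), MDS is the simplest to sketch. The example below projects a few UDC classes into two dimensions from an invented dissimilarity matrix; the numbers are placeholders, not the paper's data.

      import numpy as np
      from sklearn.manifold import MDS

      # Hypothetical dissimilarities between five UDC classes (symmetric, zero diagonal).
      labels = ["004", "02", "51", "53", "8"]
      dissimilarity = np.array([
          [0.0, 0.4, 0.6, 0.7, 0.9],
          [0.4, 0.0, 0.7, 0.8, 0.6],
          [0.6, 0.7, 0.0, 0.3, 0.9],
          [0.7, 0.8, 0.3, 0.0, 0.9],
          [0.9, 0.6, 0.9, 0.9, 0.0],
      ])

      mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
      coords = mds.fit_transform(dissimilarity)

      for label, (x, y) in zip(labels, coords):
          print(f"UDC {label}: ({x:+.2f}, {y:+.2f})")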
  12. Vlachidis, A.; Binding, C.; Tudhope, D.; May, K.: Excavating grey literature : a case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources (2010) 0.04
    0.035848998 = product of:
      0.14339599 = sum of:
        0.14339599 = weight(_text_:java in 935) [ClassicSimilarity], result of:
          0.14339599 = score(doc=935,freq=2.0), product of:
            0.4604012 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06532823 = queryNorm
            0.31145877 = fieldWeight in 935, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.03125 = fieldNorm(doc=935)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - This paper sets out to discuss the use of information extraction (IE), a natural language-processing (NLP) technique to assist "rich" semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic-aware "rich" indexing of diverse natural language resources with properties capable of satisfying information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project. Design/methodology/approach - The paper proposes use of the English Heritage extension (CRM-EH) of the standard core ontology in cultural heritage, CIDOC CRM, and exploitation of domain thesauri resources for driving and enhancing an Ontology-Oriented Information Extraction process. The process of semantic indexing is based on a rule-based Information Extraction technique, which is facilitated by the General Architecture of Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules. Findings - Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic-aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms. Originality/value - The value of the paper lies in the semantic indexing of 535 unpublished online documents often referred to as "Grey Literature", from the Archaeological Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and P19.Physical Object.
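    The actual pipeline uses GATE gazetteers and JAPE rules; as a greatly simplified analogue in Python (not GATE's or JAPE's syntax), a gazetteer lookup plus one pattern rule for time appellations might look like this. The term list and patterns are assumptions.

      import re

      # Hypothetical gazetteer of period terms, as a domain thesaurus might supply.
      TIME_TERMS = {"bronze age", "iron age", "roman", "medieval"}

      def annotate_time_appellations(text):
          """Tag gazetteer matches and simple '<nth> century' patterns,
          loosely mirroring a rule targeting CRM E49.Time Appellation."""
          spans = []
          lowered = text.lower()
          for term in TIME_TERMS:
              for match in re.finditer(re.escape(term), lowered):
                  spans.append((match.start(), match.end(), "E49.Time_Appellation"))
          for match in re.finditer(r"\b\d{1,2}(?:st|nd|rd|th) century\b", lowered):
              spans.append((match.start(), match.end(), "E49.Time_Appellation"))
          return sorted(spans)

      print(annotate_time_appellations("Pottery of Roman date overlying a 4th century ditch."))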
  13. Radhakrishnan, A.: Swoogle : an engine for the Semantic Web (2007) 0.04
    0.035848998 = product of:
      0.14339599 = sum of:
        0.14339599 = weight(_text_:java in 709) [ClassicSimilarity], result of:
          0.14339599 = score(doc=709,freq=2.0), product of:
            0.4604012 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06532823 = queryNorm
            0.31145877 = fieldWeight in 709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.03125 = fieldNorm(doc=709)
      0.25 = coord(1/4)
    
    Content
    "Swoogle, the Semantic web search engine, is a research project carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland. It's an engine tailored towards finding documents on the semantic web. The whole research paper is available here. Semantic web is touted as the next generation of online content representation where the web documents are represented in a language that is not only easy for humans but is machine readable (easing the integration of data as never thought possible) as well. And the main elements of the semantic web include data model description formats such as Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, Turtle, N-Triples), and notations such as RDF Schema (RDFS), the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain (Wikipedia). And Swoogle is an attempt to mine and index this new set of web documents. The engine performs crawling of semantic documents like most web search engines and the search is available as web service too. The engine is primarily written in Java with the PHP used for the front-end and MySQL for database. Swoogle is capable of searching over 10,000 ontologies and indexes more that 1.3 million web documents. It also computes the importance of a Semantic Web document. The techniques used for indexing are the more google-type page ranking and also mining the documents for inter-relationships that are the basis for the semantic web. For more information on how the RDF framework can be used to relate documents, read the link here. Being a research project, and with a non-commercial motive, there is not much hype around Swoogle. However, the approach to indexing of Semantic web documents is an approach that most engines will have to take at some point of time. When the Internet debuted, there were no specific engines available for indexing or searching. The Search domain only picked up as more and more content became available. One fundamental question that I've always wondered about it is - provided that the search engines return very relevant results for a query - how to ascertain that the documents are indeed the most relevant ones available. There is always an inherent delay in indexing of document. Its here that the new semantic documents search engines can close delay. Experimenting with the concept of Search in the semantic web can only bore well for the future of search technology."
  14. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.04
    0.035848998 = product of:
      0.14339599 = sum of:
        0.14339599 = weight(_text_:java in 3301) [ClassicSimilarity], result of:
          0.14339599 = score(doc=3301,freq=2.0), product of:
            0.4604012 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06532823 = queryNorm
            0.31145877 = fieldWeight in 3301, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.03125 = fieldNorm(doc=3301)
      0.25 = coord(1/4)
    
    Abstract
    Analytico-synthetic and faceted classifications, such as Universal Decimal Classification (UDC) express content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations would be stored into an intermediate format (in this case, in XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is now available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of software as a service. This would result in the algorithm being able to be employed both in existing and future library systems to analyse UDC numbers without any significant programming effort.
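    The splitting of a pre-combined notation into components can be illustrated in a few lines; the sketch below handles only the common connecting symbols (+, /, :) and is a simplified illustration of the idea, not the interpreter's actual algorithm or output schema.

      import re
      import xml.etree.ElementTree as ET

      def udc_to_xml(notation):
          """Break a pre-combined UDC notation into component numbers and
          connecting symbols, and emit a simple XML structure."""
          root = ET.Element("udc", attrib={"notation": notation})
          for part in re.split(r"([+/:])", notation):  # keep the delimiters
              if not part:
                  continue
              tag = "connector" if part in "+/:" else "number"
              ET.SubElement(root, tag).text = part
          return ET.tostring(root, encoding="unicode")

      print(udc_to_xml("004.8:025.4"))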
  15. Rosenfeld, L.; Morville, P.: Information architecture for the World Wide Web : designing large-scale Web sites (1998) 0.03
    0.031367872 = product of:
      0.12547149 = sum of:
        0.12547149 = weight(_text_:java in 1493) [ClassicSimilarity], result of:
          0.12547149 = score(doc=1493,freq=2.0), product of:
            0.4604012 = queryWeight, product of:
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.06532823 = queryNorm
            0.2725264 = fieldWeight in 1493, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.0475073 = idf(docFreq=104, maxDocs=44421)
              0.02734375 = fieldNorm(doc=1493)
      0.25 = coord(1/4)
    
    Abstract
    Some web sites "work" and some don't. Good web site consultants know that you can't just jump in and start writing HTML, the same way you can't build a house by just pouring a foundation and putting up some walls. You need to know who will be using the site, and what they'll be using it for. You need some idea of what you'd like to draw their attention to during their visit. Overall, you need a strong, cohesive vision for the site that makes it both distinctive and usable. Information Architecture for the World Wide Web is about applying the principles of architecture and library science to web site design. Each web site is like a public building, available for tourists and regulars alike to breeze through at their leisure. The job of the architect is to set up the framework for the site to make it comfortable and inviting for people to visit, relax in, and perhaps even return to someday. Most books on web development concentrate either on the aesthetics or the mechanics of the site. This book is about the framework that holds the two together. With this book, you learn how to design web sites and intranets that support growth, management, and ease of use. Special attention is given to: * The process behind architecting a large, complex site * Web site hierarchy design and organization Information Architecture for the World Wide Web is for webmasters, designers, and anyone else involved in building a web site. It's for novice web designers who, from the start, want to avoid the traps that result in poorly designed sites. It's for experienced web designers who have already created sites but realize that something "is missing" from their sites and want to improve them. It's for programmers and administrators who are comfortable with HTML, CGI, and Java but want to understand how to organize their web pages into a cohesive site. The authors are two of the principals of Argus Associates, a web consulting firm. At Argus, they have created information architectures for web sites and intranets of some of the largest companies in the United States, including Chrysler Corporation, Barron's, and Dow Chemical.
  16. Multilingual information management : current levels and future abilities. A report Commissioned by the US National Science Foundation and also delivered to the European Commission's Language Engineering Office and the US Defense Advanced Research Projects Agency, April 1999 (1999) 0.03
    0.03089454 = product of:
      0.12357816 = sum of:
        0.12357816 = weight(_text_:handle in 68) [ClassicSimilarity], result of:
          0.12357816 = score(doc=68,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.2891365 = fieldWeight in 68, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.03125 = fieldNorm(doc=68)
      0.25 = coord(1/4)
    
    Abstract
    This picture will rapidly change. The twin challenges of massive information overload via the web and ubiquitous computers present us with an unavoidable task: developing techniques to handle multilingual and multi-modal information robustly and efficiently, with as high quality performance as possible. The most effective way for us to address such a mammoth task, and to ensure that our various techniques and applications fit together, is to start talking across the artificial research boundaries. Extending the current technologies will require integrating the various capabilities into multi-functional and multi-lingual natural language systems. However, at this time there is no clear vision of how these technologies could or should be assembled into a coherent framework. What would be involved in connecting a speech recognition system to an information retrieval engine, and then using machine translation and summarization software to process the retrieved text? How can traditional parsing and generation be enhanced with statistical techniques? What would be the effect of carefully crafted lexicons on traditional information retrieval? At which points should machine translation be interleaved within information retrieval systems to enable multilingual processing?
  17. Kolluri, V.; Metzler, D.P.: Knowledge guided rule learning (1999) 0.03
    0.03089454 = product of:
      0.12357816 = sum of:
        0.12357816 = weight(_text_:handle in 550) [ClassicSimilarity], result of:
          0.12357816 = score(doc=550,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.2891365 = fieldWeight in 550, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.03125 = fieldNorm(doc=550)
      0.25 = coord(1/4)
    
    Abstract
    Rule learning algorithms, developed by the traditional supervised machine learning research community, are being used as data analysis tools for generating accurate concept definitions, given a set of (pre-classified) instances and a goal task (concept class). Most rule learners use straightforward data-driven approaches based on information-theoretic principles to search for statistically defined "interesting" patterns in the data sets. There are two main drawbacks with such purely data-driven approaches. First, they perform poorly when insufficient data is available. Second, when large training data sets are available they tend to generate many uninteresting patterns, and it is usually left to the domain expert to distinguish the "useful" pieces of information from the rest. The size of this problem (a data mining issue unto itself) suggests the need to guide the learning system's search to relevant subspaces within the space of all possible hypotheses. This paper explores the utility of using prior domain knowledge (in the form of taxonomies over attributes, attribute values and concept classes) to constrain the rule learner's search by requiring it to be consistent with what is already known about the domain. Spreading Activation Learning (SAL), using the marker propagation techniques introduced by Aronis and Provost (1994), is used to learn efficiently over taxonomically structured attributes and attribute values. An extension to the SAL methodology to handle rule learning over concept class values is presented. By representing the range of numeric (continuous) values for attributes in the form of simplified IS-A taxonomies, the SAL methodology is shown to be capable of handling numeric (continuous) attribute values. Large taxonomies over value sets (especially taxonomies over numeric value sets) usually result in too many redundant rules. This problem can be addressed by pruning the rule set using "rule interest" measures. The focus of this study is to explore the utility of taxonomic structures in rule learning and, in particular, the use of taxonomic structures as a way of incorporating background knowledge in the rule learning process. Initial results obtained from ongoing research work are presented.
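    The core idea of constraining rule induction with value taxonomies can be illustrated by generalising raw attribute values up an IS-A hierarchy before measuring how well a candidate rule covers the data. This is a schematic sketch of that idea, not the SAL marker-propagation algorithm itself; the taxonomy and examples are invented.

      # Hypothetical IS-A taxonomy over an attribute's values: child -> parent.
      TAXONOMY = {
          "poodle": "dog", "beagle": "dog", "siamese": "cat",
          "dog": "mammal", "cat": "mammal", "mammal": "animal",
      }

      def ancestors(value):
          """Return the value and everything above it in the taxonomy."""
          chain = [value]
          while chain[-1] in TAXONOMY:
              chain.append(TAXONOMY[chain[-1]])
          return chain

      def coverage(rule_value, examples):
          """Count examples whose attribute value is the rule value or one of its
          descendants, so the rule generalises over the taxonomy, not exact strings."""
          return sum(1 for value, _label in examples if rule_value in ancestors(value))

      examples = [("poodle", "+"), ("beagle", "+"), ("siamese", "-")]
      print(coverage("dog", examples), coverage("mammal", examples))  # 2 3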
  18. Woods, X.B.: Envisioning the word : Multimedia CD-ROM indexing (2000) 0.03
    0.03089454 = product of:
      0.12357816 = sum of:
        0.12357816 = weight(_text_:handle in 1223) [ClassicSimilarity], result of:
          0.12357816 = score(doc=1223,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.2891365 = fieldWeight in 1223, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.03125 = fieldNorm(doc=1223)
      0.25 = coord(1/4)
    
    Abstract
    If you are an indexer who is accustomed to working in solitude with static words, you might face some big surprises in the production of a multimedia CD-ROM. You will not be working alone. You will not be working from a manuscript. Your dexterity with a dedicated software tool for indexing will be irrelevant. The coding or tagging might not be your worry either, because it will likely be done by members of a separate technical staff. The CD-ROM can currently hold 660 megabytes of data. Its production is a massive team effort. Because of the sheer volume of data involved, it is unlikely that one indexer working alone can handle the job in a reasonable period of time. The database for the actual index entries is likely to have been designed specifically for the project at hand, so the indexers will learn the software tools on the job. The entire project will probably be onscreen. So, if you choose to thrust yourself into this teeming amalgam of production, what are the prerequisites and what new things can you expect to learn? CD-ROM is an amorphous new medium with few rules. Your most important resume items might be your flexibility, imagination, and love of words. What remains unchanged from traditional back-of-the-book indexing is the need for empathy with the user; you will still need to come up with exactly the right word for the situation. What is new here is the situation: you might learn to envision the words that correspond to non-textual media such as graphics, photos, video clips, and musical passages. And because you will be dealing with vast amounts of textual and sensory data, you might find yourself rethinking the nature and purpose of an index as a whole. CD-ROM production can take many forms; three will be discussed here
  19. Mai, J.-E.: ¬The role of documents, domains and decisions in indexing (2004) 0.03
    0.03089454 = product of:
      0.12357816 = sum of:
        0.12357816 = weight(_text_:handle in 3653) [ClassicSimilarity], result of:
          0.12357816 = score(doc=3653,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.2891365 = fieldWeight in 3653, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.03125 = fieldNorm(doc=3653)
      0.25 = coord(1/4)
    
    Content
    1. Introduction The document at hand is often regarded as the most important entity for analysis in the indexing situation. The indexer's focus is directed to the "entity and its faithful description" (Soergel, 1985, 227) and the indexer is advised to "stick to the text and the author's claims" (Lancaster, 2003, 37). The indexer's aim is to establish the subject matter based on an analysis of the document, with the goal of representing the document as truthfully as possible and ensuring the subject representation's validity by remaining neutral and objective. To help indexers with their task they are guided towards particular and important attributes of the document that could help them determine the document's subject matter. The exact attributes the indexer is recommended to examine vary, but typical examples are: the title, the abstract, the table of contents, chapter headings, chapter subheadings, preface, introduction, foreword, the text itself, bibliographical references, index entries, illustrations, diagrams, and tables and their captions. The exact recommendations vary according to the type of document that is being indexed (monographs vs. periodical articles, for instance). It is clear that indexers should provide faithful descriptions, that indexers should represent the author's claims, and that the document's attributes are helpful points of analysis. However, indexers need much more guidance when determining the subject than simply the documents themselves. One approach that could be taken to handle the situation is a user-oriented approach, in which it is argued that the indexer should ask, "how should I make this document ... visible to potential users? What terms should I use to convey its knowledge to those interested?" (Albrechtsen, 1993, 222). The basic idea is that indexers need to have the users' information needs and terminology in mind when determining the subject matter of documents as well as when selecting index terms.
  20. Munzner, T.: Interactive visualization of large graphs and networks (2000) 0.03
    0.03089454 = product of:
      0.12357816 = sum of:
        0.12357816 = weight(_text_:handle in 5746) [ClassicSimilarity], result of:
          0.12357816 = score(doc=5746,freq=2.0), product of:
            0.42740422 = queryWeight, product of:
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.06532823 = queryNorm
            0.2891365 = fieldWeight in 5746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.5424123 = idf(docFreq=173, maxDocs=44421)
              0.03125 = fieldNorm(doc=5746)
      0.25 = coord(1/4)
    
    Abstract
    Many real-world domains can be represented as large node-link graphs: backbone Internet routers connect with 70,000 other hosts, mid-sized Web servers handle between 20,000 and 200,000 hyperlinked documents, and dictionaries contain millions of words defined in terms of each other. Computational manipulation of such large graphs is common, but previous tools for graph visualization have been limited to datasets of a few thousand nodes. Visual depictions of graphs and networks are external representations that exploit human visual processing to reduce the cognitive load of many tasks that require understanding of global or local structure. We assert that the two key advantages of computer-based systems for information visualization over traditional paper-based visual exposition are interactivity and scalability. We also argue that designing visualization software by taking the characteristics of a target user's task domain into account leads to systems that are more effective and scale to larger datasets than previous work. This thesis contains a detailed analysis of three specialized systems for the interactive exploration of large graphs, relating the intended tasks to the spatial layout and visual encoding choices. We present two novel algorithms for specialized layout and drawing that use quite different visual metaphors. The H3 system for visualizing the hyperlink structures of web sites scales to datasets of over 100,000 nodes by using a carefully chosen spanning tree as the layout backbone, 3D hyperbolic geometry for a Focus+Context view, and provides a fluid interactive experience through guaranteed frame rate drawing. The Constellation system features a highly specialized 2D layout intended to spatially encode domain-specific information for computational linguists checking the plausibility of a large semantic network created from dictionaries. The Planet Multicast system for displaying the tunnel topology of the Internet's multicast backbone provides a literal 3D geographic layout of arcs on a globe to help MBone maintainers find misconfigured long-distance tunnels. Each of these three systems provides a very different view of the graph structure, and we evaluate their efficacy for the intended task. We generalize these findings in our analysis of the importance of interactivity and specialization for graph visualization systems that are effective and scalable.
