-
Shachak, A.; Fine, S.: The Effect of training on biologists' acceptance of bioinformatics tools : a field experiment (2008)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 2606) [ClassicSimilarity], result of:
0.15447271 = score(doc=2606,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 2606, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=2606)
0.25 = coord(1/4)
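The explain tree above follows Lucene's ClassicSimilarity (TF-IDF) scoring: queryWeight = idf x queryNorm, fieldWeight = tf x idf x fieldNorm, the term weight is their product, and the coordination factor scales the result. A minimal Python sketch reproducing the figures shown above (the variable names are descriptive only, not Lucene identifiers); the same arithmetic applies to every entry in this list, which differ only in the document number:

```python
import math

# Figures copied from the explain output above (term "handle", doc 2606).
doc_freq, max_docs = 173, 44421
idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # ~6.5424123
query_norm = 0.06532823
tf = math.sqrt(2.0)            # ClassicSimilarity: tf = sqrt(termFreq), termFreq = 2
field_norm = 0.0390625
coord = 0.25                   # coord(1/4): one of four query clauses matched

query_weight = idf * query_norm             # ~0.42740422
field_weight = tf * idf * field_norm        # ~0.36142063
term_score = query_weight * field_weight    # ~0.15447271
final_score = coord * term_score            # ~0.038618177 (displayed as 0.04)

print(idf, query_weight, field_weight, final_score)
```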
- Abstract
- A recent development in biological research is the emergence of bioinformatics, which employs novel informatics techniques to handle biological data. Although the importance of bioinformatics training is widely recognized, little attention has been paid to its effect on the acceptance of bioinformatics by biologists. In this study, the effect of training on biologists' acceptance of bioinformatics tools was tested using the technology acceptance model (TAM) as a theoretical framework. Ninety individuals participated in a field experiment during seven bioinformatics workshops. Pre- and post-intervention tests were used to measure perceived usefulness, perceived ease of use, and intended use of bioinformatics tools for primer design and microarray analysis - a simple versus a complex tool that is used for a simple and a complex task, respectively. Perceived usefulness and ease of use were both significant predictors of intended use of bioinformatics tools. After hands-on experience, intention to use both tools decreased. The perceived ease of use of the primer design tool increased but that of the microarray analysis tool decreased. It is suggested that hands-on training helps researchers to form realistic perceptions of bioinformatics tools, thereby enabling them to make informed decisions about whether and how to use them.
-
Li, T.; Zhu, S.; Ogihara, M.: Text categorization via generalized discriminant analysis (2008)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3119) [ClassicSimilarity], result of:
0.15447271 = score(doc=3119,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3119, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3119)
0.25 = coord(1/4)
- Abstract
- Text categorization is an important research area and has been receiving much attention due to the growth of online information and the Internet. Automated text categorization is generally cast as a multi-class classification problem. Much previous work has focused on binary document classification problems. Support vector machines (SVMs) excel in binary classification, but the elegant theory behind large-margin hyperplanes cannot be easily extended to multi-class text classification. In addition, training time and scaling are also important concerns. On the other hand, other techniques that naturally extend to handle multi-class classification are generally not as accurate as SVMs. This paper presents a simple and efficient solution to multi-class text categorization. Classification problems are first formulated as optimization via discriminant analysis. Text categorization is then cast as the problem of finding coordinate transformations that reflect the inherent similarity of the data. While most previous approaches decompose a multi-class classification problem into multiple independent binary classification tasks, the proposed approach enables direct multi-class classification. By using the generalized singular value decomposition (GSVD), a coordinate transformation that reflects the inherent class structure indicated by the generalized singular values is identified. Extensive experiments demonstrate the efficiency and effectiveness of the proposed approach.
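The core step described above, finding a coordinate transformation that separates the classes, can be illustrated with a classical discriminant-analysis sketch. This is a hedged illustration only: it solves a regularized generalized eigenproblem on scatter matrices as a stand-in for the paper's GSVD formulation, and the toy data, regularization and component count are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def discriminant_transform(X, y, n_components=2, reg=1e-3):
    """X: (n_docs, n_terms) term-weight matrix; y: array of class labels."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))            # within-class scatter
    Sb = np.zeros((d, d))            # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        Sw += (Xc - mean_c).T @ (Xc - mean_c)
        diff = (mean_c - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)
    Sw += reg * np.eye(d)            # regularize so the problem is well posed
    # directions that maximize between-class relative to within-class scatter
    vals, vecs = eigh(Sb, Sw)
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order]            # projection for direct multi-class use

# toy usage with random term-weight data: 3 classes, 30 documents, 20 terms
rng = np.random.default_rng(0)
X = rng.random((30, 20))
y = np.repeat(np.arange(3), 10)
W = discriminant_transform(X, y)
print((X @ W).shape)                 # documents in the discriminant space
```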
-
Calegari, S.; Sanchez, E.: Object-fuzzy concept network : an enrichment of ontologies in semantic information retrieval (2008)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3393) [ClassicSimilarity], result of:
0.15447271 = score(doc=3393,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3393, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3393)
0.25 = coord(1/4)
- Abstract
- This article shows how a fuzzy ontology-based approach can improve semantic document retrieval. After formally defining a fuzzy ontology and a fuzzy knowledge base, a special type of new fuzzy relationship called (semantic) correlation, which links the concepts or entities in a fuzzy ontology, is discussed. These correlations, first assigned by experts, are updated after querying or when a document has been inserted into a database. Moreover, in order to define a dynamic knowledge of a domain that adapts itself to the context, it is shown how to handle a tradeoff between the correct definition of an object, taken from the ontology structure, and the actual meaning assigned by individuals. The notion of a fuzzy concept network is extended, incorporating database objects so that entities and documents can similarly be represented in the network. An information retrieval (IR) algorithm using an object-fuzzy concept network (O-FCN) is introduced and described. This algorithm allows us to derive a unique path among the entities involved in the query to obtain maximal semantic associations in the knowledge domain. Finally, the study has been validated by querying a database using fuzzy recall, fuzzy precision, and coefficient variant measures in the crisp and fuzzy cases.
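The path-based association mentioned above can be illustrated with a small sketch: concepts are linked by fuzzy correlation degrees, and the association between two concepts is taken as the strongest path under a max-min rule. The graph, the correlation values and the max-min choice are illustrative assumptions, not the O-FCN algorithm itself.

```python
# Edge values are correlation degrees in [0, 1].
correlations = {
    ("wine", "grape"): 0.9,
    ("grape", "vineyard"): 0.8,
    ("wine", "bottle"): 0.6,
    ("bottle", "vineyard"): 0.3,
}

def neighbours(node):
    for (a, b), w in correlations.items():
        if a == node:
            yield b, w
        elif b == node:
            yield a, w

def association(source, target, visited=None):
    """Strength of the best path from source to target (max-min composition)."""
    visited = (visited or set()) | {source}
    best = 0.0
    for nxt, w in neighbours(source):
        if nxt == target:
            best = max(best, w)
        elif nxt not in visited:
            best = max(best, min(w, association(nxt, target, visited)))
    return best

print(association("wine", "vineyard"))  # 0.8 via wine -> grape -> vineyard
```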
-
Koppel, M.; Schler, J.; Argamon, S.: Computational methods in authorship attribution (2009)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3683) [ClassicSimilarity], result of:
0.15447271 = score(doc=3683,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3683, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3683)
0.25 = coord(1/4)
- Abstract
- Statistical authorship attribution has a long history, culminating in the use of modern machine learning classification methods. Nevertheless, most of this work suffers from the limitation of assuming a small closed set of candidate authors and essentially unlimited training text for each. Real-life authorship attribution problems, however, typically fall short of this ideal. Thus, following detailed discussion of previous work, three scenarios are considered here for which solutions to the basic attribution problem are inadequate. In the first variant, the profiling problem, there is no candidate set at all; in this case, the challenge is to provide as much demographic or psychological information as possible about the author. In the second variant, the needle-in-a-haystack problem, there are many thousands of candidates for each of whom we might have a very limited writing sample. In the third variant, the verification problem, there is no closed candidate set but there is one suspect; in this case, the challenge is to determine if the suspect is or is not the author. For each variant, it is shown how machine learning methods can be adapted to handle the special challenges of that variant.
-
Yeganova, L.; Comeau, D.C.; Kim, W.; Wilbur, W.J.: How to interpret PubMed queries and why it matters (2009)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3712) [ClassicSimilarity], result of:
0.15447271 = score(doc=3712,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3712, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3712)
0.25 = coord(1/4)
- Abstract
- A significant fraction of queries in PubMed(TM) are multiterm queries without parsing instructions. Generally, search engines interpret such queries as collections of terms, and handle them as a Boolean conjunction of these terms. However, analysis of queries in PubMed(TM) indicates that many such queries are meaningful phrases, rather than simple collections of terms. In this study, we examine whether it makes a difference, in terms of retrieval quality, if such queries are interpreted as a phrase or as a conjunction of query terms and, if it does, what the optimal way of searching with such queries is. To address this question, we developed an automated retrieval evaluation method, based on machine learning techniques, that enables us to evaluate and compare various retrieval outcomes. We show that the class of records that contain all the search terms, but not the phrase, qualitatively differs from the class of records containing the phrase. We also show that the difference is systematic, depending on the proximity of query terms to each other within the record. Based on these results, one can establish the best retrieval order for the records. Our findings are consistent with studies in proximity searching.
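The contrast the study examines, records that merely contain all query terms versus records that contain the phrase itself, can be made concrete with a minimal sketch; the toy records, query and whitespace tokenizer are assumptions, not PubMed's actual query processing.

```python
def tokens(text):
    return text.lower().split()

def matches_and(record, query):
    """Boolean conjunction: every query term occurs somewhere in the record."""
    rec = set(tokens(record))
    return all(t in rec for t in tokens(query))

def matches_phrase(record, query):
    """Exact phrase: the query terms occur contiguously and in order."""
    rec, q = tokens(record), tokens(query)
    return any(rec[i:i + len(q)] == q for i in range(len(rec) - len(q) + 1))

records = [
    "breast cancer screening guidelines",   # AND match and phrase match
    "screening for cancer of the breast",   # AND match only
]
query = "breast cancer"
for r in records:
    print(matches_and(r, query), matches_phrase(r, query), "-", r)
```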
-
Lee, Y.-S.; Wu, Y.-C.; Yang, J.-C.: BVideoQA : Online English/Chinese bilingual video question answering (2009)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3739) [ClassicSimilarity], result of:
0.15447271 = score(doc=3739,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3739, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3739)
0.25 = coord(1/4)
- Abstract
- This article presents a bilingual video question answering (QA) system, BVideoQA, which allows users to retrieve Chinese videos through English or Chinese natural language questions. Our method first extracts an optimal one-to-one string pattern match based on the proposed dense and long N-gram matching. On the basis of the matched string patterns, it computes a passage score using our term-weighting scheme. The main contributions of this approach to the multimedia information retrieval literature include: (a) development of a truly bilingual video QA system, (b) presentation of a robust bilingual passage retrieval algorithm to handle languages without word boundaries, such as Chinese and Japanese, (c) development of a large-scale bilingual video QA corpus for system evaluation, and (d) comparison of seven top-performing retrieval methods under fair conditions. The experimental studies indicate that our method is superior to existing approaches in terms of precision and mean reciprocal rank. When ported to English, encouraging empirical results are also obtained. Our method is particularly relevant to Asian languages, since the development of a word tokenizer is optional.
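A rough illustration of N-gram-based passage scoring for languages without word boundaries is sketched below; it rewards passages that share many, and long, character N-grams with the question. This is a simple proxy for the dense and long N-gram match described above, not the BVideoQA algorithm, and the example passages are invented.

```python
def ngrams(text, n):
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def passage_score(question, passage, max_n=4):
    score = 0.0
    for n in range(2, max_n + 1):
        shared = ngrams(question, n) & ngrams(passage, n)
        score += n * len(shared)          # longer shared N-grams count more
    return score

passages = ["台北市今天天氣如何", "今天台北的天氣預報", "昨天的棒球比賽結果"]
question = "台北今天天氣"
best = max(passages, key=lambda p: passage_score(question, p))
print(best)   # the first passage shares the longest and densest N-grams
```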
-
Lavrenko, V.: A generative theory of relevance (2009)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 293) [ClassicSimilarity], result of:
0.15447271 = score(doc=293,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 293, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=293)
0.25 = coord(1/4)
- Abstract
- A modern information retrieval system must have the capability to find, organize and present very different manifestations of information - such as text, pictures, videos or database records - any of which may be of relevance to the user. However, the concept of relevance, while seemingly intuitive, is actually hard to define, and it's even harder to model in a formal way. Lavrenko does not attempt to bring forth a new definition of relevance, nor provide arguments as to why any particular definition might be theoretically superior or more complete. Instead, he takes a widely accepted, albeit somewhat conservative definition, makes several assumptions, and from them develops a new probabilistic model that explicitly captures that notion of relevance. With this book, he makes two major contributions to the field of information retrieval: first, a new way to look at topical relevance, complementing the two dominant models, i.e., the classical probabilistic model and the language modeling approach, and which explicitly combines documents, queries, and relevance in a single formalism; second, a new method for modeling exchangeable sequences of discrete random variables which does not make any structural assumptions about the data and which can also handle rare events. Thus his book is of major interest to researchers and graduate students in information retrieval who specialize in relevance modeling, ranking algorithms, and language modeling.
-
Nolin, J.: "Relevance" as a boundary concept : reconsidering early information retrieval (2009)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 600) [ClassicSimilarity], result of:
0.15447271 = score(doc=600,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 600, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=600)
0.25 = coord(1/4)
- Abstract
- Purpose - Throughout its history, information retrieval has struggled to handle the contradictory needs of system-oriented and user-oriented research. Information retrieval has gradually, starting in the 1960s, moved toward handling the needs of the user. This paper aims to consider the way boundaries toward the user and user-oriented perspectives are drawn, renegotiated and re-drawn. Design/methodology/approach - The central concept of relevance is seen as a boundary concept, complex and flexible, that is continuously redefined in order to manage boundaries. Five influential research papers from the 1960s and early 1970s are analysed in order to understand usage of the concept during a period when psychological and cognitive research tools began to be discussed as a possibility. Findings - Relevance does not only carry an explanatory function, but also serves a purpose relating to the identity of the field. Key contributions to research on relevance seem to, as a by-product, draw a boundary that gives legitimacy to certain theoretical resources while demarcating against others. The strategies that are identified in the key texts are intent on finding, representing, justifying and strengthening a boundary that includes and excludes a reasonable amount of complexity associated with the user. Originality/value - The paper explores a central concept within information retrieval and information science in a new way. It also supplies a fresh perspective on the development of information retrieval during the 1960s and 1970s.
-
Multi-source, multilingual information extraction and summarization (2013)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 1978) [ClassicSimilarity], result of:
0.15447271 = score(doc=1978,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 1978, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=1978)
0.25 = coord(1/4)
- Abstract
- Information extraction (IE) and text summarization (TS) are powerful technologies for finding relevant pieces of information in text and presenting them to the user in condensed form. The ongoing information explosion makes IE and TS critical for successful functioning within the information society. These technologies face particular challenges due to the inherent multi-source nature of the information explosion. The technologies must now handle not isolated texts or individual narratives, but rather large-scale repositories and streams (in general, in multiple languages) containing a multiplicity of perspectives, opinions, or commentaries on particular topics, entities or events. There is thus a need to adapt existing techniques and develop new ones to deal with these challenges. This volume contains a selection of papers that present a variety of methodologies for content identification and extraction, as well as for content fusion and regeneration. The chapters cover various aspects of the challenges, depending on the nature of the information sought (names vs. events) and the nature of the sources (news streams vs. image captions vs. scientific research papers, etc.). This volume aims to offer a broad and representative sample of studies from this very active research field.
-
Future of online catalogues : Essen symposium, 30.9.-3.10.1985 (1986)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3077) [ClassicSimilarity], result of:
0.15447271 = score(doc=3077,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3077, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3077)
0.25 = coord(1/4)
- Abstract
- In the late 1970s libraries suddenly recognized the importance of online catalogues. Advanced computer technology can handle massive bibliographic records and direct user inquiries (cataloguing and circulation), and online access is much more adequate than a card or COM catalogue. There are several problems associated with online public access catalogues, as they are designed primarily for direct use by library patrons without knowledge of library cataloguing rules. Yet the introduction of online catalogues extends the services that a library offers, in terms of efficiency, productivity and cooperation with other libraries, to both users and staff.
-
Francu, V.; Dediu, L.-I.: TinREAD - an integrative solution for subject authority control (2015)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3297) [ClassicSimilarity], result of:
0.15447271 = score(doc=3297,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3297, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3297)
0.25 = coord(1/4)
- Abstract
- The paper introduces TinREAD (The Information Navigator for Readers), an integrated library system produced by IME Romania. The main feature of interest is the way TinREAD can handle a classification-based thesaurus in which verbal index terms are mapped to classification notations. It supports subject authority control by interlinking the authority files (subject headings and the UDC system). Authority files are used for indexing consistency. Although intellectual indexing is said to be, unlike automated indexing, both subjective and inconsistent, TinREAD uses intellectual indexing as input (the UDC notations assigned to documents) for the automated indexing that results from the implementation of a thesaurus structure based on UDC. Each UDC notation is represented by a UNIMARC subject heading record as authority data. One classification notation can be used to search simultaneously in more than one corresponding thesaurus. This way natural language terms are used in indexing and, at the same time, the link with the corresponding classification notation is kept. Additionally, the system can also manage multilingual data for the authority files. This, together with other characteristics of TinREAD, is discussed and illustrated at length in the paper. Problems encountered and possible solutions to tackle them are shown.
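The mapping described above, one classification notation linked to verbal index terms in several languages, can be pictured as a small authority structure. A hedged sketch follows; the notations, labels and lookup helpers are illustrative, not TinREAD's data model.

```python
# Illustrative authority structure: classification notation -> multilingual terms.
authority = {
    "004.8": {                                    # illustrative UDC notation
        "en": ["Artificial intelligence"],
        "ro": ["Inteligenta artificiala"],
    },
    "02": {
        "en": ["Library science"],
        "ro": ["Biblioteconomie"],
    },
}

def terms_for(notation, language):
    """Verbal index terms attached to a classification notation."""
    return authority.get(notation, {}).get(language, [])

def notations_for(term):
    """Reverse lookup: classification notations indexed by a verbal term."""
    return [n for n, labels in authority.items()
            if any(term.lower() == t.lower() for ts in labels.values() for t in ts)]

print(terms_for("004.8", "ro"))
print(notations_for("Library science"))
```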
-
Kalman, Y.M.; Ravid, G.: Filing, piling, and everything in between : the dynamics of E-mail inbox management (2015)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3336) [ClassicSimilarity], result of:
0.15447271 = score(doc=3336,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3336, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3336)
0.25 = coord(1/4)
- Abstract
- Managing the constant flow of incoming messages is a daily challenge faced by knowledge workers who use technologies such as e-mail and other digital communication tools. This study focuses on the most ubiquitous of these technologies, e-mail, and unobtrusively explores the ongoing inbox-management activities of thousands of users worldwide over a period of 8 months. The study describes the dynamics of these inboxes throughout the day and the week as users strive to handle incoming messages, read them, classify them, respond to them in a timely manner, and archive them for future reference, all while carrying out the daily tasks of knowledge workers. It then tests several hypotheses about the influence of specific inbox-management behaviors in mitigating the causes of e-mail overload, and proposes a continuous index that quantifies one of these inbox-management behaviors. This inbox clearing index (ICI) expands on the widely cited trichotomous classification of users into frequent filers, spring cleaners, and no filers, as suggested by Whittaker and Sidner (1996). We propose that the ICI allows shifting the focus, from classifying users to characterizing a diversity of user behaviors and measuring the relationships between these behaviors and desired outcomes.
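The article proposes a continuous inbox clearing index (ICI), but its formula is not given in the abstract. The sketch below is a hypothetical stand-in only, measuring the fraction of newly arrived messages that are no longer in the inbox at the next snapshot, simply to illustrate how a continuous behavioural index could be derived from inbox counts.

```python
# Hypothetical stand-in for an inbox-clearing measure (NOT the ICI defined in
# the paper): the fraction of arriving messages cleared by the next snapshot,
# averaged over the observation period.
def clearing_index(snapshots, arrivals):
    """snapshots: inbox size at the start of each day;
    arrivals: messages received during each day (len(snapshots) - 1 values)."""
    cleared_fractions = []
    for day, received in enumerate(arrivals):
        if received == 0:
            continue
        cleared = snapshots[day] + received - snapshots[day + 1]
        cleared_fractions.append(max(0.0, min(1.0, cleared / received)))
    return sum(cleared_fractions) / len(cleared_fractions) if cleared_fractions else 0.0

# A "frequent filer" clears most mail daily; a "no filer" lets it accumulate.
print(clearing_index([120, 122, 121, 124], [30, 28, 35]))   # high clearing
print(clearing_index([120, 148, 172, 205], [30, 28, 35]))   # low clearing
```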
-
Pontis, S.; Blandford, A.: Understanding "influence" : an empirical test of the Data-Frame Theory of Sensemaking (2016)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 3847) [ClassicSimilarity], result of:
0.15447271 = score(doc=3847,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 3847, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=3847)
0.25 = coord(1/4)
- Abstract
- This paper reports findings from a study designed to gain broader understanding of sensemaking activities using the Data/Frame Theory as the analytical framework. Although this theory is one of the dominant models of sensemaking, it has not been extensively tested with a range of sensemaking tasks. The tasks discussed here focused on making sense of structures rather than processes or narratives. Eleven researchers were asked to construct understanding of how a scientific community in a particular domain is organized (e.g., people, relationships, contributions, factors) by exploring the concept of "influence" in academia. This topic was chosen because, although researchers frequently handle this type of task, it is unlikely that they have explicitly sought this type of information. We conducted a think-aloud study and semistructured interviews with junior and senior researchers from the human-computer interaction (HCI) domain, asking them to identify current leaders and rising stars in both HCI and chemistry. Data were coded and analyzed using the Data/Frame Model to both test and extend the model. Three themes emerged from the analysis: novices and experts' sensemaking activity chains, constructing frames through indicators, and characteristics of structure tasks. We propose extensions to the Data/Frame Model to accommodate structure sensemaking.
-
Bar-Hillel, Y.; Carnap, R.: An outline of a theory of semantic information (1952)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 4369) [ClassicSimilarity], result of:
0.15447271 = score(doc=4369,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 4369, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=4369)
0.25 = coord(1/4)
- Content
- Vgl.: https://dspace.mit.edu/bitstream/handle/1721.1/4821/RLE-TR-247-03150899.pdf?sequence=1.
-
Victorino, M.; Terto de Holanda, M.; Ishikawa, E.; Costa Oliveira, E.; Chhetri, S.: Transforming open data to linked open data using ontologies for information organization in big data environments of the Brazilian Government : the Brazilian database Government Open Linked Data - DBgoldbr (2018)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 532) [ClassicSimilarity], result of:
0.15447271 = score(doc=532,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 532, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=532)
0.25 = coord(1/4)
- Abstract
- The Brazilian Government has made a massive volume of structured, semi-structured and non-structured public data available on the web to ensure that the administration is as transparent as possible. Subsequently, providing applications with enough capability to handle this "big data environment", so that vital and decisive information is readily accessible, has become a tremendous challenge. In this environment, data processing is done via new approaches in the area of information and computer science, involving technologies and processes for collecting, representing, storing and disseminating information. Along these lines, this paper presents a conceptual model, the technical architecture and the prototype implementation of a tool, called DBgoldbr, designed to classify government public information with the help of ontologies, by transforming open data into linked open data. To achieve this objective, we used "soft systems methodology" to identify problems, to collect users' needs and to design solutions according to the objectives of specific groups. The DBgoldbr tool was designed to facilitate the search for open data made available by many Brazilian government institutions, so that this data can be reused to support the evaluation and monitoring of social programs, in order to support the design and management of public policies.
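The central transformation, turning rows of open government data into linked open data triples, can be sketched with rdflib; the namespace, properties and sample rows are assumptions for illustration, not the DBgoldbr ontology.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF, XSD

EX = Namespace("http://example.org/gov/")   # illustrative namespace, not DBgoldbr's

rows = [
    {"id": "prog-001", "name": "School Meals Programme", "budget": 1250000.0},
    {"id": "prog-002", "name": "Family Grant", "budget": 9800000.0},
]

g = Graph()
g.bind("ex", EX)
g.bind("foaf", FOAF)
for row in rows:
    subject = EX[row["id"]]
    g.add((subject, RDF.type, EX.SocialProgram))
    g.add((subject, FOAF.name, Literal(row["name"])))
    g.add((subject, EX.budget, Literal(row["budget"], datatype=XSD.decimal)))

print(g.serialize(format="turtle"))   # linked-open-data view of the open data rows
```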
-
Cavalcante Dourado, Í.; Galante, R.; Gonçalves, M.A.; Silva Torres, R. de: Bag of textual graphs (BoTG) : a general graph-based text representation model (2019)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 291) [ClassicSimilarity], result of:
0.15447271 = score(doc=291,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 291, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=291)
0.25 = coord(1/4)
- Abstract
- Text representation models are the fundamental basis for information retrieval and text mining tasks. Although different text models have been proposed, they typically target specific task aspects in isolation, such as time efficiency, accuracy, or applicability to different scenarios. Here we present Bag of Textual Graphs (BoTG), a general text representation model that addresses these three requirements at the same time. The proposed textual representation is based on a graph-based scheme that encodes term proximity and term ordering, and it represents text documents in an efficient vector space that addresses all these aspects as well as providing discriminative textual patterns. Extensive experiments are conducted in two experimental scenarios (classification and retrieval) considering multiple well-known text collections. We also compare our model against several methods from the literature. Experimental results demonstrate that our model is generic enough to handle different tasks and collections. It is also more efficient than widely used state-of-the-art methods in textual classification and retrieval tasks, with competitive effectiveness, sometimes with gains by large margins.
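The graph-building step, encoding term proximity and term ordering, can be illustrated with a short sketch in which each document becomes a directed graph whose edges link terms co-occurring within a sliding window; the window size and weighting here are assumptions rather than the BoTG specifics.

```python
import networkx as nx

def textual_graph(text, window=3):
    """Directed graph per document: edge direction preserves term ordering,
    the sliding window captures term proximity, edge weights count co-occurrences."""
    terms = text.lower().split()
    g = nx.DiGraph()
    for i, t in enumerate(terms):
        g.add_node(t)
        for j in range(i + 1, min(i + window, len(terms))):
            u, v = t, terms[j]
            w = g.get_edge_data(u, v, {"weight": 0})["weight"]
            g.add_edge(u, v, weight=w + 1)
    return g

g = textual_graph("graph based text representation encodes term proximity and term ordering")
print(g.number_of_nodes(), g.number_of_edges())
```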
-
Saarti, J.: Fictional literature : classification and indexing (2019)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 315) [ClassicSimilarity], result of:
0.15447271 = score(doc=315,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 315, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=315)
0.25 = coord(1/4)
- Abstract
- Fiction content analysis and retrieval are interesting specific topics for two major reasons: 1) the extensive use of fictional works; and 2) the multimodality and interpretational nature of fiction. The primary challenge in the analysis of fictional content is that there is no single meaning to be analysed; the analysis is an ongoing process involving an interaction between the text produced by the author, the reader and the society in which the interaction occurs. Furthermore, different audiences have specific needs to be taken into consideration. This article explores the topic of fiction knowledge organization, including both classification and indexing. It provides a broad and analytical overview of the literature as well as describing several experimental approaches and developmental projects for the analysis of fictional content. Traditional fiction indexing has been mainly based on the factual aspects of the work; this has then been expanded to handle different aspects of the fictional work. Attempts have been made to develop vocabularies for fiction indexing. All the major classification schemes use the genre and language/culture of fictional works when subdividing them into subclasses. The evolution of shelf classification of fiction and the appearance of different types of digital tools have revolutionized the classification of fiction, making it possible to integrate both indexing and classification of fictional works.
-
Bidwell, S.: Curiosities of light and sight (1899)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 783) [ClassicSimilarity], result of:
0.15447271 = score(doc=783,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 783, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=783)
0.25 = coord(1/4)
- Abstract
- The following chapters are based upon notes of several unconnected lectures addressed to audiences of very different classes in the theatres of the Royal Institution, the London Institution, the Leeds Philosophical and Literary Society, and Caius House, Battersea. In preparing the notes for publication the matter has been re-arranged with the object of presenting it, as far as might be, in methodical order; additions and omissions have been freely made, and numerous diagrams, illustrative of the apparatus and experiments described, have been provided. I do not know that any apology is needed for offering the collection as thus re-modelled to a larger public. Though the essays are, for the most part, of a popular and informal character, they touch upon a number of curious matters of which no readily accessible account has yet appeared, while, even in the most elementary parts, an attempt has been made to handle the subject with some degree of freshness. The interesting subjective phenomena which are associated with the sense of vision do not appear to have received in this country the attention they deserve. This little book may perhaps be of some slight service in suggesting to experimentalists, both professional and amateur, an attractive field of research which has hitherto been only partially explored.
-
Peponakis, M.; Mastora, A.; Kapidakis, S.; Doerr, M.: Expressiveness and machine processability of Knowledge Organization Systems (KOS) : an analysis of concepts and relations (2020)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 787) [ClassicSimilarity], result of:
0.15447271 = score(doc=787,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 787, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=787)
0.25 = coord(1/4)
- Abstract
- This study considers the expressiveness (that is, the expressive power or expressivity) of different types of Knowledge Organization Systems (KOS) and discusses its potential to be machine-processable in the context of the Semantic Web. For this purpose, the theoretical foundations of KOS are reviewed based on conceptualizations introduced by the Functional Requirements for Subject Authority Data (FRSAD) and the Simple Knowledge Organization System (SKOS); natural language processing techniques are also implemented. A comparative analysis is applied to a dataset comprising a thesaurus (Eurovoc), a subject headings system (LCSH) and a classification scheme (DDC). These are compared with an ontology (CIDOC-CRM) by focusing on how they define and handle concepts and relations. It was observed that LCSH and DDC focus on the formalism of character strings (nomens) rather than on the modelling of semantics; their definition of what constitutes a concept is quite fuzzy, and they comprise a large number of complex concepts. By contrast, thesauri have a coherent definition of what constitutes a concept, and apply a systematic approach to the modelling of relations. Ontologies explicitly define diverse types of relations, and are by their nature machine-processable. The paper concludes that the potential of both the expressiveness and machine processability of each KOS is extensively regulated by its structural rules. It is harder to represent subject headings and classification schemes as semantic networks with nodes and arcs, while thesauri are more suitable for such a representation. In addition, a paradigm shift is revealed which focuses on the modelling of relations between concepts, rather than the concepts themselves.
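The point about machine processability can be illustrated with a small SKOS fragment built with rdflib: concepts carry preferred labels and explicit broader/related links, which is the kind of node-and-arc structure the paper finds easier to obtain from thesauri than from subject headings or classification schemes. The vocabulary terms are standard SKOS; the concepts themselves are illustrative.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/kos/")   # illustrative concept scheme
g = Graph()
g.bind("skos", SKOS)

for concept in ("vehicles", "cars", "engines"):
    g.add((EX[concept], RDF.type, SKOS.Concept))
    g.add((EX[concept], SKOS.prefLabel, Literal(concept.capitalize(), lang="en")))

g.add((EX.cars, SKOS.broader, EX.vehicles))   # explicit hierarchical relation
g.add((EX.cars, SKOS.related, EX.engines))    # explicit associative relation

print(g.serialize(format="turtle"))
```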
-
Díez Platas, M.L.; Muñoz, S.R.; González-Blanco, E.; Ruiz Fabo, P.; Álvarez Mellado, E.: Medieval Spanish (12th-15th centuries) named entity recognition and attribute annotation system based on contextual information (2021)
0.04
0.038618177 = product of:
0.15447271 = sum of:
0.15447271 = weight(_text_:handle in 1094) [ClassicSimilarity], result of:
0.15447271 = score(doc=1094,freq=2.0), product of:
0.42740422 = queryWeight, product of:
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.06532823 = queryNorm
0.36142063 = fieldWeight in 1094, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
6.5424123 = idf(docFreq=173, maxDocs=44421)
0.0390625 = fieldNorm(doc=1094)
0.25 = coord(1/4)
- Abstract
- The recognition of named entities in medieval Spanish texts presents great complexity and involves specific challenges: first, the complex morphosyntactic characteristics of proper-noun use in medieval texts; second, the lack of strict orthographic standards; finally, diachronic and geographical variation in Spanish from the 12th to the 15th century. In this period, named entities usually appear as complex text structures. For example, it was common to add nicknames and information about the person's role in society and geographic origin. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses contextual cues based on semantics to detect entities and assign a type. Given the occurrence of entities with attached attributes, entity contexts are also parsed to determine entity-type-specific dependencies for these attributes. Moreover, it uses a variant generator to handle the diachronic evolution of medieval Spanish terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its proper lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role name attributes, with an overall F1 of 0.75.
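A toy sketch of the combination described above, gazetteer lookup, contextual cues and a variant generator, is given below; the cue words, gazetteer entries and variant rules are invented for illustration and are not the resources of the system itself.

```python
import re

GAZETTEER = {"castilla": "PLACE", "toledo": "PLACE", "alfonso": "PERSON"}
CUES = {"rey": "PERSON", "don": "PERSON", "villa": "PLACE"}   # cue word -> entity type

def variants(term):
    """Very rough orthographic variants (e.g. f/h alternation, doubled letters)."""
    forms = {term, term.replace("f", "h"), term.replace("h", "f")}
    forms |= {re.sub(r"(.)\1", r"\1", term)}      # collapse doubled letters
    return forms

def tag(tokens):
    entities = []
    for i, tok in enumerate(tokens):
        low = tok.lower()
        if any(v in GAZETTEER for v in variants(low)):
            etype = next(GAZETTEER[v] for v in variants(low) if v in GAZETTEER)
            entities.append((tok, etype))                     # gazetteer hit
        elif i > 0 and tokens[i - 1].lower() in CUES and tok[0].isupper():
            entities.append((tok, CUES[tokens[i - 1].lower()]))   # contextual cue
    return entities

print(tag("El rey Ferrando entro en Toledo con don Sancho".split()))
```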