-
Milstead, J.L.: Thesauri in a full-text world (1998)
0.06
0.06070148 = product of:
0.12140296 = sum of:
0.018726096 = weight(_text_:und in 3337) [ClassicSimilarity], result of:
0.018726096 = score(doc=3337,freq=2.0), product of:
0.15283768 = queryWeight, product of:
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.068911016 = queryNorm
0.12252277 = fieldWeight in 3337, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
2.217899 = idf(docFreq=13141, maxDocs=44421)
0.0390625 = fieldNorm(doc=3337)
0.10267686 = weight(_text_:human in 3337) [ClassicSimilarity], result of:
0.10267686 = score(doc=3337,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.34118268 = fieldWeight in 3337, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=3337)
0.5 = coord(2/4)
- Abstract
- Despite early claims to the contrary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contradiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing continues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future.
- Theme
- Konzeption und Anwendung des Prinzips Thesaurus
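The explain trees in this listing can be recomputed by hand. A minimal sketch, assuming Lucene's ClassicSimilarity conventions shown in the output itself (tf = sqrt(freq), queryWeight = idf x queryNorm, fieldWeight = tf x idf x fieldNorm, and a coord factor for the fraction of matched query clauses), that reproduces the 0.0607 score of this first record from the values printed above:

```python
import math

def term_score(freq, idf, query_norm, field_norm):
    """Recompute one weight(...) node of a ClassicSimilarity explain tree."""
    tf = math.sqrt(freq)                  # 1.4142135 for freq=2.0
    query_weight = idf * query_norm       # idf * queryNorm
    field_weight = tf * idf * field_norm  # tf(freq) * idf * fieldNorm
    return query_weight * field_weight

# values copied from the explain output for doc 3337 (Milstead 1998)
und   = term_score(freq=2.0, idf=2.217899,  query_norm=0.068911016, field_norm=0.0390625)
human = term_score(freq=4.0, idf=4.3671384, query_norm=0.068911016, field_norm=0.0390625)

coord = 2 / 4                             # 2 of 4 query clauses matched
print((und + human) * coord)              # ~0.06070148
```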
-
Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003)
0.06
0.056722585 = product of:
0.22689034 = sum of:
0.22689034 = weight(_text_:java in 2167) [ClassicSimilarity], result of:
0.22689034 = score(doc=2167,freq=2.0), product of:
0.4856509 = queryWeight, product of:
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.068911016 = queryNorm
0.46718815 = fieldWeight in 2167, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
7.0475073 = idf(docFreq=104, maxDocs=44421)
0.046875 = fieldNorm(doc=2167)
0.25 = coord(1/4)
- Abstract
- The Indiana University School of Library and Information Science opened a new research laboratory in January 2003: the Indiana University School of Library and Information Science Information Processing Laboratory [IU IP Lab]. The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus. These include grid and cluster computing, and a standard Java-based software platform to support plug-and-play research datasets, a selection of standard IR modules, and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.
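The abstract names OAI-PMH as the discovery mechanism. A minimal harvesting sketch, assuming only the standard ListRecords verb with the oai_dc metadata prefix; the BASE_URL is a hypothetical endpoint, not the IU IP Lab's actual service:

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BASE_URL = "http://example.org/oai"   # hypothetical repository endpoint

def list_records(metadata_prefix="oai_dc"):
    """Issue one OAI-PMH ListRecords request and yield Dublin Core titles."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    with urlopen(f"{BASE_URL}?{query}") as response:
        tree = ET.parse(response)
    ns = {"oai": "http://www.openarchives.org/OAI/2.0/",
          "dc": "http://purl.org/dc/elements/1.1/"}
    for record in tree.findall(".//oai:record", ns):
        title = record.find(".//dc:title", ns)
        yield title.text if title is not None else None

if __name__ == "__main__":
    for t in list_records():
        print(t)
```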
-
Ward, M.L.: ¬The future of the human indexer (1996)
0.05
0.048703913 = product of:
0.19481565 = sum of:
0.19481565 = weight(_text_:human in 313) [ClassicSimilarity], result of:
0.19481565 = score(doc=313,freq=10.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.64734864 = fieldWeight in 313, product of:
3.1622777 = tf(freq=10.0), with freq of:
10.0 = termFreq=10.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.046875 = fieldNorm(doc=313)
0.25 = coord(1/4)
- Abstract
- Considers the principles of indexing and the intellectual skills involved, in order to determine what would be required of automatic indexing systems to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and to what depth; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of the abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system, using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low-grade texts (should they be wanted in the database).
-
Anderson, J.D.; Pérez-Carballo, J.: ¬The nature of indexing: how humans and machines analyze messages and texts for retrieval : Part I: Research and the nature of human indexing (2001)
0.04
0.043562103 = product of:
0.17424841 = sum of:
0.17424841 = weight(_text_:human in 4136) [ClassicSimilarity], result of:
0.17424841 = score(doc=4136,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.5790062 = fieldWeight in 4136, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.09375 = fieldNorm(doc=4136)
0.25 = coord(1/4)
-
Jones, R.L.: Automatic document content analysis : the AIDA project (1992)
0.04
0.036301754 = product of:
0.14520702 = sum of:
0.14520702 = weight(_text_:human in 2606) [ClassicSimilarity], result of:
0.14520702 = score(doc=2606,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.4825052 = fieldWeight in 2606, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.078125 = fieldNorm(doc=2606)
0.25 = coord(1/4)
- Abstract
- The AIDA project is a research program being carried out by Computer Power in Canberra, Australia, in collaboration with the Australian Parliament. Its primary objective is to develop practical methods for carrying out document content analysis with minimal human intervention. The different techniques employed by AIDA to achieve its results are described
-
Anderson, J.D.; Pérez-Carballo, J.: ¬The nature of indexing: how humans and machines analyze messages and texts for retrieval : Part II: Machine indexing, and the allocation of human versus machine effort (2001)
0.04
0.036301754 = product of:
0.14520702 = sum of:
0.14520702 = weight(_text_:human in 1368) [ClassicSimilarity], result of:
0.14520702 = score(doc=1368,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.4825052 = fieldWeight in 1368, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.078125 = fieldNorm(doc=1368)
0.25 = coord(1/4)
-
Losee, R.M.: ¬A Gray code based ordering for documents on shelves : classification for browsing and retrieval (1992)
0.04
0.035936903 = product of:
0.14374761 = sum of:
0.14374761 = weight(_text_:human in 2334) [ClassicSimilarity], result of:
0.14374761 = score(doc=2334,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.47765577 = fieldWeight in 2334, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0546875 = fieldNorm(doc=2334)
0.25 = coord(1/4)
- Abstract
- A document classifier places documents together in a linear arrangement for browsing or high-speed access by human or computerised information retrieval systems. Requirements for document classification and browsing systems are developed from similarity measures, distance measures, and the notion of subject aboutness. A requirement that documents be arranged in decreasing order of similarity as the distance from a given document increases can often not be met. Based on these requirements, information-theoretic considerations, and the Gray code, a classification system is proposed that can classify documents without human intervention. A measure of classifier performance is developed, and used to evaluate experimental results comparing the distance between subject headings assigned to documents given classifications from the proposed system and the Library of Congress Classification (LCC) system.
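The shelving idea rests on the binary-reflected Gray code, in which successive codewords differ in exactly one bit. A minimal sketch, assuming hypothetical 3-bit document signatures (presence or absence of three index terms); Losee's full classifier derives its codes from richer document data and is not reproduced here:

```python
def gray_rank(code: int) -> int:
    """Position of `code` in the binary-reflected Gray code sequence (inverse Gray code)."""
    rank = 0
    while code:
        rank ^= code
        code >>= 1
    return rank

# hypothetical 3-bit signatures: presence/absence of three index terms per document
docs = {"doc_a": 0b011, "doc_b": 0b010, "doc_c": 0b110, "doc_d": 0b111}

# ordering by Gray-sequence rank places signatures that differ in a single
# term next to each other on the shelf whenever possible
shelf = sorted(docs, key=lambda d: gray_rank(docs[d]))
print(shelf)   # ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```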
-
Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998)
0.03
0.031438243 = product of:
0.12575297 = sum of:
0.12575297 = weight(_text_:human in 2794) [ClassicSimilarity], result of:
0.12575297 = score(doc=2794,freq=6.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.41786176 = fieldWeight in 2794, product of:
2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=2794)
0.25 = coord(1/4)
- Abstract
- In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and the controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.
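A minimal sketch of the association stage, assuming the "likelihood ratio statistic" is Dunning's log-likelihood (G²) over a 2x2 contingency table of lexical item versus assigned subject heading; the counts and term names below are invented for illustration:

```python
import math

def log_likelihood(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 contingency table.

    k11: records where the lexical item and the heading co-occur
    k12: item occurs, heading absent
    k21: heading assigned, item absent
    k22: neither occurs
    """
    total = k11 + k12 + k21 + k22
    observed = [k11, k12, k21, k22]
    row = [k11 + k12, k21 + k22]
    col = [k11 + k21, k12 + k22]
    expected = [row[0] * col[0] / total, row[0] * col[1] / total,
                row[1] * col[0] / total, row[1] * col[1] / total]
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

# invented counts: title/abstract term "neural" vs. heading "neural networks"
print(round(log_likelihood(k11=40, k12=60, k21=25, k22=4501), 2))
```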
-
Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009)
0.03
0.031438243 = product of:
0.12575297 = sum of:
0.12575297 = weight(_text_:human in 287) [ClassicSimilarity], result of:
0.12575297 = score(doc=287,freq=6.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.41786176 = fieldWeight in 287, product of:
2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=287)
0.25 = coord(1/4)
- Abstract
- Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated to show whether they are complementary to one another.
-
Bloomfield, M.: Indexing : neglected and poorly understood (2001)
0.03
0.03080306 = product of:
0.12321224 = sum of:
0.12321224 = weight(_text_:human in 439) [ClassicSimilarity], result of:
0.12321224 = score(doc=439,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.40941924 = fieldWeight in 439, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.046875 = fieldNorm(doc=439)
0.25 = coord(1/4)
- Abstract
- The growth of the Internet has highlighted the use of machine indexing. The difficulties in using the Internet as a searching device can be frustrating. The use of the term "Python" is given as an example. Machine indexing is noted as "rotten" and human indexing as "capricious." The problem seems to be a lack of a theoretical foundation for the art of indexing. What librarians have learned over the last hundred years has yet to yield a consistent approach to what really works best in preparing index terms and in the ability of our customers to search the various indexes. An attempt is made to consider the elements of indexing, their pros and cons. The argument is made that machine indexing is far too prolific in its production of index terms. Neither librarians nor computer programmers have made much progress to improve Internet indexing. Human indexing has had the same problems for over fifty years.
-
Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013)
0.03
0.03080306 = product of:
0.12321224 = sum of:
0.12321224 = weight(_text_:human in 3721) [ClassicSimilarity], result of:
0.12321224 = score(doc=3721,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.40941924 = fieldWeight in 3721, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.046875 = fieldNorm(doc=3721)
0.25 = coord(1/4)
- Abstract
- In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution is that the classification is performed automatically on the raw image contextual information extracted from any general webpage and is not solely based on image tags, as state-of-the-art solutions are. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.
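For reference, a minimal sketch of the plain tf-idf weighting used as a baseline above, computed over an invented toy set of contextual-text snippets; the paper's location-based variant, which additionally weights terms by where they occur in the page, is not reproduced:

```python
import math
from collections import Counter

docs = {
    "img1": "sunset over the harbour with small fishing boats",
    "img2": "portrait of a smiling child on the beach at sunset",
    "img3": "aerial view of the harbour and container terminal",
}

def tf_idf(docs):
    """Plain tf-idf: term frequency times log(N / document frequency)."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    n = len(docs)
    weights = {}
    for d, tokens in tokenized.items():
        tf = Counter(tokens)
        weights[d] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return weights

print(tf_idf(docs)["img1"]["harbour"])   # modest weight: "harbour" appears in 2 of 3 docs
```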
-
Koryconski, C.; Newell, A.F.: Natural-language processing and automatic indexing (1990)
0.03
0.029041402 = product of:
0.11616561 = sum of:
0.11616561 = weight(_text_:human in 2312) [ClassicSimilarity], result of:
0.11616561 = score(doc=2312,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.38600415 = fieldWeight in 2312, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0625 = fieldNorm(doc=2312)
0.25 = coord(1/4)
- Abstract
- The task of producing satisfactory indexes by automatic means has been tackled on two fronts: by statistical analysis of text and by attempting content analysis of the text in much the same way as a human indexer does. Though statistical techniques have a lot to offer for free-text database systems, neither method has had much success with back-of-the-book indexing. This review examines some problems associated with the application of natural-language processing techniques to book texts. - See also the reply by K.P. Jones
-
Wellisch, H.H.: ¬The art of indexing and some fallacies of its automation (1992)
0.03
0.029041402 = product of:
0.11616561 = sum of:
0.11616561 = weight(_text_:human in 3957) [ClassicSimilarity], result of:
0.11616561 = score(doc=3957,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.38600415 = fieldWeight in 3957, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0625 = fieldNorm(doc=3957)
0.25 = coord(1/4)
- Abstract
- Reviews the history of indexing, which began with the rise of the universities in the 13th century, before the invention of printing. Describes the different skills needed for indexing books, periodicals and databases. States the belief that the quest for fully automatic indexing is a futile endeavour; machine-generated indexes need the services of human post-editors if they are to be useful and acceptable
-
Malone, L.C.; Driscoll, J.R.; Pepe, J.W.: Modeling the performance of an automated keywording system (1991)
0.03
0.029041402 = product of:
0.11616561 = sum of:
0.11616561 = weight(_text_:human in 6681) [ClassicSimilarity], result of:
0.11616561 = score(doc=6681,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.38600415 = fieldWeight in 6681, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0625 = fieldNorm(doc=6681)
0.25 = coord(1/4)
- Abstract
- Presents a model for predicting the performance of a computerised keyword assigning and indexing system. Statistical procedures were investigated in order to protect against incorrect keywording by the system, which behaves as an expert system designed to mimic the behaviour of human keyword indexers and represents lessons learned from military exercises and operations.
-
Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995)
0.03
0.029041402 = product of:
0.11616561 = sum of:
0.11616561 = weight(_text_:human in 4777) [ClassicSimilarity], result of:
0.11616561 = score(doc=4777,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.38600415 = fieldWeight in 4777, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0625 = fieldNorm(doc=4777)
0.25 = coord(1/4)
- Abstract
- Proposes automatic linguistic knowledge acquisition from sublanguage corpora. The system combines existing linguistic knowledge and human intervention with corpus-based techniques. The algorithm involves a gradual approximation which works to converge linguistic knowledge gradually towards desirable results. The first experiment revealed the characteristics of this algorithm, and the others proved its effectiveness for a real corpus.
-
Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002)
0.03
0.025669215 = product of:
0.10267686 = sum of:
0.10267686 = weight(_text_:human in 1601) [ClassicSimilarity], result of:
0.10267686 = score(doc=1601,freq=4.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.34118268 = fieldWeight in 1601, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0390625 = fieldNorm(doc=1601)
0.25 = coord(1/4)
- Abstract
- This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, it is costly and time consuming to manually assign keyphrases to documents, motivating the development of tools that automatically perform this function. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well-known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author-specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and therefore suitable for use in training and evaluation of automatic keyphrasing systems.
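A minimal sketch of the two features Kea is usually described as using, TFxIDF and relative position of first occurrence, computed here for a single-word candidate in an invented document; Kea's candidate-phrase extraction and Naive Bayes scoring are omitted:

```python
import math

def keyphrase_features(doc_tokens, candidate, doc_freq, n_docs):
    """TFxIDF and first-occurrence features for one candidate phrase (unigram here)."""
    tf = doc_tokens.count(candidate) / len(doc_tokens)
    idf = math.log(n_docs / (1 + doc_freq.get(candidate, 0)))
    first_occurrence = doc_tokens.index(candidate) / len(doc_tokens)
    return tf * idf, first_occurrence

tokens = ("digital libraries rely on keyphrases to support browsing "
          "and retrieval in digital collections").split()
tfidf, first = keyphrase_features(tokens, "keyphrases",
                                  doc_freq={"keyphrases": 12}, n_docs=1000)
print(round(tfidf, 4), round(first, 3))
```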
-
Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: ¬The operation and performance of an artificially intelligent keywording system (1991)
0.03
0.025411226 = product of:
0.1016449 = sum of:
0.1016449 = weight(_text_:human in 6680) [ClassicSimilarity], result of:
0.1016449 = score(doc=6680,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.33775362 = fieldWeight in 6680, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0546875 = fieldNorm(doc=6680)
0.25 = coord(1/4)
- Abstract
- Presents a new approach to text analysis for automating the key phrase indexing process, using artificial intelligence techniques. This mimics the behaviour of human experts by using a rule base consisting of insertion and deletion rules generated by subject-matter experts. The insertion rules are based on the idea that some phrases found in a text imply or trigger other phrases. The deletion rules apply to semantically ambiguous phrases where text presence alone does not determine appropriateness as a key phrase. The insertion and deletion rules are used to transform a list of found phrases to a list of key phrases for indexing a document. Statistical data are provided to demonstrate the performance of this expert rule based system
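A minimal sketch of the insertion/deletion idea, with invented rules and phrases; in the system described, the rules are authored by subject-matter experts rather than hard-coded:

```python
# insertion rules: a phrase found in the text triggers an additional key phrase
insertion_rules = {
    "surface-to-air missile": ["air defence"],
    "night vision goggles": ["night operations"],
}

# deletion rules: ambiguous phrases are dropped unless a supporting phrase co-occurs
deletion_rules = {
    "exercise": {"military exercise", "field training"},
}

def apply_rules(found_phrases):
    """Transform a list of found phrases into key phrases via insertion/deletion rules."""
    phrases = set(found_phrases)
    for trigger, implied in insertion_rules.items():
        if trigger in phrases:
            phrases.update(implied)
    for ambiguous, supports in deletion_rules.items():
        if ambiguous in phrases and not (phrases & supports):
            phrases.discard(ambiguous)
    return sorted(phrases)

print(apply_rules(["surface-to-air missile", "exercise"]))
# ['air defence', 'surface-to-air missile']
```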
-
Krutulis, J.D.; Jacob, E.K.: ¬A theoretical model for the study of emergent structure in adaptive information networks (1995)
0.03
0.025411226 = product of:
0.1016449 = sum of:
0.1016449 = weight(_text_:human in 3421) [ClassicSimilarity], result of:
0.1016449 = score(doc=3421,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.33775362 = fieldWeight in 3421, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0546875 = fieldNorm(doc=3421)
0.25 = coord(1/4)
- Abstract
- Attempts to automate classification have focused on mimicking the intellectual processes whereby human classifiers assign entities to mutually exclusive groups that exhibit one or more shared characteristics. A more viable approach might be to construct an adaptive retrieval system that produces groupings of related entities by generating dynamic categories based on document content and on the system's emergent structure as it adapts to modifications in the database and to observed patterns of access. Presents a theoretical model for adaptive information networks using relevance feedback and genetic algorithms to generate emergent structure.
-
Hafer, M.A.; Weiss, S.F.: Word segmentation by letter successor varieties (1974)
0.03
0.025411226 = product of:
0.1016449 = sum of:
0.1016449 = weight(_text_:human in 5065) [ClassicSimilarity], result of:
0.1016449 = score(doc=5065,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.33775362 = fieldWeight in 5065, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0546875 = fieldNorm(doc=5065)
0.25 = coord(1/4)
- Abstract
- This paper describes a method for automatically segmenting words into their stems and affixes. The process uses certain statistical properties of a corpus (successor and predecessor letter variety counts) to indicate where words should be divided. Consequently, this process is less reliant on human intervention than are other methods for automated stemming. The segmentation system is used to construct stem dictionaries for document classification. Information retrieval experiments are then performed using documents and queries so classified. Results show not only that this method is capable of high-quality word segmentation, but also that its use in information retrieval produces results that are at least as good as those obtained using the more traditional stemming process.
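A minimal sketch of the successor variety idea: count how many distinct letters follow each prefix of a word across a corpus and cut where that count peaks. The toy corpus is invented, and the simple peak cutoff used here is only one of the segmentation strategies the paper examines:

```python
from collections import defaultdict

corpus = ["reader", "reads", "reading", "readable", "red", "rope", "ripe"]

def successor_varieties(word, corpus):
    """Number of distinct letters that follow each prefix of `word` in the corpus."""
    followers = defaultdict(set)
    for w in corpus:
        for i in range(1, len(w)):
            followers[w[:i]].add(w[i])
    return [len(followers[word[:i]]) for i in range(1, len(word))]

def segment(word, corpus):
    """Cut after prefixes whose successor variety rises to a local peak."""
    sv = successor_varieties(word, corpus)
    cuts = [i + 1 for i in range(1, len(sv) - 1) if sv[i] > sv[i - 1] and sv[i] >= sv[i + 1]]
    pieces, prev = [], 0
    for c in cuts:
        pieces.append(word[prev:c])
        prev = c
    pieces.append(word[prev:])
    return pieces

print(successor_varieties("reading", corpus))  # [3, 2, 1, 4, 1, 1]
print(segment("reading", corpus))              # ['read', 'ing']
```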
-
Hlava, M.M.K.: Machine aided indexing (MAI) in a multilingual environment (1993)
0.03
0.025411226 = product of:
0.1016449 = sum of:
0.1016449 = weight(_text_:human in 474) [ClassicSimilarity], result of:
0.1016449 = score(doc=474,freq=2.0), product of:
0.30094394 = queryWeight, product of:
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.068911016 = queryNorm
0.33775362 = fieldWeight in 474, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
4.3671384 = idf(docFreq=1531, maxDocs=44421)
0.0546875 = fieldNorm(doc=474)
0.25 = coord(1/4)
- Abstract
- The machine aided indexing (MAI) software developed by Access Innovations, Inc., is a semantic-based, Boolean-statement, rule-interpreting application with 3 modules: the MAI engine, which accepts input files, matches terms in the knowledge base, interprets rules, and outputs a text file with suggested indexing terms; a rule-building application allowing each Boolean-style rule in the knowledge base to be created or modified; and a statistical computation module which analyzes the performance of the MAI software against text manually indexed by professional human indexers. The MAI software can be applied across multiple languages and can be used where the text to be searched is in one language and the indexes to be output are in another.
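A minimal sketch of the rule-interpreting step, with invented rules standing in for the knowledge base's Boolean statements; the actual MAI engine's rule syntax and matching logic are not reproduced:

```python
# each rule pairs a Boolean condition over the text with indexing terms to suggest;
# simple ALL-of / ANY-of word tests stand in for the real Boolean statements
rules = [
    ({"all": ["solar", "panel"]},                    ["photovoltaic systems"]),
    ({"any": ["reactor", "uranium"]},                ["nuclear energy"]),
    ({"all": ["wind"], "any": ["turbine", "farm"]},  ["wind power"]),
]

def suggest_terms(text, rules):
    """Return indexing terms whose Boolean conditions are satisfied by the text."""
    words = set(text.lower().split())
    suggested = []
    for condition, terms in rules:
        ok_all = all(w in words for w in condition.get("all", []))
        ok_any = (not condition.get("any")) or any(w in words for w in condition["any"])
        if ok_all and ok_any:
            suggested.extend(terms)
    return suggested

print(suggest_terms("new wind farm planned next to the existing solar panel array", rules))
# ['photovoltaic systems', 'wind power']
```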