-
Vanderwende, L.; Suzuki, H.; Brockett, C.; Nenkova, A.: Beyond SumBasic : task-focused summarization with sentence simplification and lexical expansion (2007)
- Abstract
- In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score; and in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.
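The generic extractive component named here builds on SumBasic, whose selection loop is compact enough to sketch. Below is a minimal, assumption-laden re-implementation of that loop only (topic focusing, sentence simplification, and lexical expansion are not shown); function and parameter names are mine.

```python
from collections import Counter

def sumbasic(sentences, max_words=100):
    """Greedy SumBasic-style sentence selection (a simplified sketch).

    sentences: list of token lists, assumed lowercased and non-empty.
    """
    words = [w for s in sentences for w in s]
    prob = {w: c / len(words) for w, c in Counter(words).items()}
    chosen, used = [], set()
    while sum(len(s) for s in chosen) < max_words and len(used) < len(sentences):
        # Score each unused sentence by the mean probability of its words.
        i = max((j for j in range(len(sentences)) if j not in used),
                key=lambda j: sum(prob[w] for w in sentences[j]) / len(sentences[j]))
        chosen.append(sentences[i])
        used.add(i)
        # Squash probabilities of already-covered words to limit redundancy.
        for w in sentences[i]:
            prob[w] **= 2
    return chosen
```

The squaring step is what gives SumBasic its redundancy control: once a word has been covered by the summary, its probability drops, so later picks favor not-yet-covered content.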
-
Hirao, T.; Okumura, M.; Yasuda, N.; Isozaki, H.: Supervised automatic evaluation for summarization with voted regression model (2007)
- Abstract
- High-quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, its cost is huge and its results are difficult to reproduce. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization systems efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on 'corrected AIC' to avoid multicollinearity, and (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17-51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.
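The abstract names VRM's two ingredients precisely enough to illustrate: rank candidate regressions by corrected AIC (AICc) and average ("vote") the predictions of the best few. The sketch below is a generic rendering of that recipe, not the paper's exact estimator; all names are mine, and it assumes far more samples than features.

```python
import itertools
import numpy as np

def aicc(y, y_hat, k):
    """Corrected AIC for a Gaussian least-squares fit with k parameters
    (assumes n > k + 1)."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def voted_regression(X, y, n_vote=5):
    """Fit least squares on every feature subset, rank the fits by AICc,
    and vote by averaging the predictions of the best n_vote models.
    Exhaustive subsets are only feasible for a handful of features."""
    d = X.shape[1]
    models = []
    for size in range(1, d + 1):
        for cols in itertools.combinations(range(d), size):
            Xs = X[:, cols]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            models.append((aicc(y, Xs @ beta, len(cols) + 1), cols, beta))
    models.sort(key=lambda m: m[0])           # best (lowest) AICc first
    best = models[:n_vote]
    return lambda Xnew: np.mean([Xnew[:, c] @ b for _, c, b in best], axis=0)
```

Selecting by AICc penalizes overparameterized (often collinear) subsets, while averaging several good models damps the variance of any single one, which is the overfitting point the abstract makes.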
-
Johnson, F.: Automatic abstracting research (1995)
- Abstract
- Discusses the attraction, for researchers, of the prospect of automatically generating abstracts, but notes that the promise of superseding human effort has yet to be realized. Notes ways in which progress in automatic abstracting research may come about and suggests shifting the aim from reproducing the conventional benefits of abstracts to accentuating the advantages, for users, of the computerized representation of information in large textual databases.
-
Sjöbergh, J.: Older versions of the ROUGEeval summarization evaluation system were easier to fool (2007)
- Abstract
- We show some limitations of the ROUGE evaluation method for automatic summarization. We present a method for automatic summarization based on a Markov model of the source text. By a simple greedy word selection strategy, summaries with high ROUGE scores are generated. However, these summaries would not be considered good by human readers. The method can be adapted to trick different settings of the ROUGEeval package.
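The greedy strategy is simple enough to reconstruct in a few lines: estimate bigram frequencies from the source and repeatedly emit the most frequent remaining successor word. This is a rough re-creation under my own assumptions (the actual system differs in details such as how repetition is handled).

```python
from collections import Counter, defaultdict

def greedy_markov_summary(tokens, length=100):
    """Greedy walk over a bigram (first-order Markov) model of the source.
    The output is dense in frequent word pairs, which inflates n-gram
    recall metrics such as ROUGE while reading as disfluent text."""
    successors = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        successors[a][b] += 1
    word = Counter(tokens).most_common(1)[0][0]  # start at the most frequent word
    out = [word]
    for _ in range(length - 1):
        if not successors[word]:                 # dead end: word never had a successor
            break
        nxt, _ = successors[word].most_common(1)[0]
        successors[word][nxt] -= 1               # "spend" the pair to avoid tight loops
        word = nxt
        out.append(word)
    return " ".join(out)
```

Because ROUGE only counts n-gram overlap with reference summaries, text stitched from the source's most frequent word pairs scores well without being readable, which is exactly the weakness the paper exploits.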
-
Goh, A.; Hui, S.C.; Chan, S.K.: A text extraction system for news reports (1996)
- Abstract
- Describes the design and implementation of a text extraction tool, NEWS_EXT, which automatically produces summaries from news reports by extracting sentences to form indicative abstracts. Selection of sentences is based on sentence importance, measured by means of sentence scoring or simple linguistic analysis of sentence structure. Tests were conducted on 4 approaches to the functioning of the NEWS_EXT system: extraction by keyword frequency, extraction by title keywords, extraction by location, and extraction by indicative phrase. Reports the results of a study comparing the output of NEWS_EXT with manually produced extracts, using relevance as the criterion for effectiveness. 48 newspaper articles were assessed (The Straits Times, International Herald Tribune, Asian Wall Street Journal, and Financial Times). The evaluation was conducted in 2 stages: stage 1 involving abstracts produced manually by 2 human experts, and stage 2 involving the generation of abstracts using NEWS_EXT. The results of each of the 4 approaches were compared with the human-produced abstracts; the title and location approaches were found to give the best results for both local and foreign news. Reports plans to refine and enhance NEWS_EXT and incorporate it as a module within a larger newspaper clipping system.
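The four extraction approaches tested are standard sentence-scoring heuristics, sketched below as toy scoring functions. The equal-weight combination and the cue-phrase list are assumptions for illustration, not NEWS_EXT's actual parameters.

```python
from collections import Counter

def score_sentences(sentences, title, cue_phrases=("in conclusion", "according to")):
    """Toy versions of the four NEWS_EXT-style evidence sources:
    keyword frequency, title keywords, location, indicative phrase."""
    freq = Counter(w for s in sentences for w in s.lower().split())
    title_words = set(title.lower().split())
    scores = []
    for pos, s in enumerate(sentences):
        toks = s.lower().split()
        kw = sum(freq[w] for w in toks) / max(1, len(toks))   # keyword frequency
        tw = len(title_words.intersection(toks))              # title-keyword overlap
        loc = 1.0 if pos < 2 else 0.0                         # lead-position bonus
        cue = 1.0 if any(p in s.lower() for p in cue_phrases) else 0.0
        scores.append(kw + tw + loc + cue)                    # equal weights: an assumption
    return scores
```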
-
Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002)
- Abstract
- This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, it is costly and time consuming to manually assign keyphrases to documents, motivating the development of tools that automatically perform this function. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well-known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author-specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and therefore suitable for use in training and evaluation of automatic keyphrasing systems.
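For context, Kea's scoring rests on two features per candidate phrase, TF-IDF and first-occurrence position, which a Naive Bayes model trained on author keyphrases turns into a probability. Below is a simplified computation of those two features; the model itself is omitted, and this TF-IDF differs in details from Kea's exact formulation.

```python
import math

def kea_features(doc_text, phrase, doc_freq, n_docs):
    """The two classic Kea features for one candidate phrase.
    doc_freq maps phrases to the number of corpus documents containing them;
    the phrase is assumed to occur in the document."""
    text = doc_text.lower()
    n_words = max(1, len(text.split()))
    tf = text.count(phrase.lower()) / n_words
    idf = math.log((n_docs + 1) / (doc_freq.get(phrase.lower(), 0) + 1))
    first = text.find(phrase.lower()) / max(1, len(text))  # 0.0 = document start
    return tf * idf, first
```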
-
Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008)
- Abstract
- Many automatic text summarization models have been developed in the last decades. Related research in information science has shown that human abstractors extract sentences for summaries based on the hierarchical structure of documents; however, the existing automatic summarization models do not take into account the human abstractor's behavior of sentence extraction and only consider the document as a sequence of sentences during the process of extraction of sentences as a summary. In general, a document exhibits a well-defined hierarchical structure that can be described as fractals - mathematical objects with a high degree of redundancy. In this article, we introduce the fractal summarization model based on the fractal theory. The important information is captured from the source document by exploring the hierarchical structure and salient features of the document. A condensed version of the document that is informatively close to the source document is produced iteratively using the contractive transformation in the fractal theory. The fractal summarization model is the first attempt to apply fractal theory to document summarization. It significantly improves the divergence of information coverage of summary and the precision of summary. User evaluations have been conducted. Results have indicated that fractal summarization is promising and outperforms current summarization techniques that do not consider the hierarchical structure of documents.
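The contractive, hierarchy-following idea can be caricatured as recursive quota allocation: a sentence budget flows down the document tree in proportion to each node's weight. The node layout and the proportional rule below are my assumptions, not the paper's exact transformation.

```python
def allocate(node, quota):
    """Recursively split a sentence quota over a document tree.
    Nodes are dicts: {'weight': float, 'children': [...]} for internal
    nodes, {'sentences': [(score, text), ...]} for leaves."""
    if not node.get("children"):
        ranked = sorted(node["sentences"], reverse=True)   # best-scoring first
        return [text for _score, text in ranked[:quota]]
    total = sum(child["weight"] for child in node["children"]) or 1.0
    picked = []
    for child in node["children"]:
        share = round(quota * child["weight"] / total)     # proportional share
        picked.extend(allocate(child, share))
    return picked
```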
-
Saggion, H.; Lapalme, G.: Selective analysis for the automatic generation of summaries (2000)
- Abstract
- Selective Analysis is a new method for text summarization of technical articles whose design is based on the study of a corpus of professional abstracts and technical documents. The method emphasizes the selection of particular types of information and their elaboration, exploring the issue of dynamic summarization. A computer prototype was developed to demonstrate the viability of the approach, and the automatic abstracts were evaluated using human informants. The results obtained so far indicate that the summaries are acceptable in content and text quality.
-
Lee, J.-H.; Park, S.; Ahn, C.-M.; Kim, D.: Automatic generic document summarization based on non-negative matrix factorization (2009)
- Abstract
- In existing unsupervised methods, Latent Semantic Analysis (LSA) is used for sentence selection. However, the obtained results are less meaningful, because singular vectors are used as the bases for sentence selection from given documents, and singular vector components can have negative values. We propose a new unsupervised method using Non-negative Matrix Factorization (NMF) to select sentences for automatic generic document summarization. The proposed method uses non-negative constraints, which are more similar to the human cognition process. As a result, the method selects more meaningful sentences for generic document summarization than those selected using LSA.
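A sketch of the general recipe, using scikit-learn's NMF on a non-negative term-by-sentence count matrix: factor A ≈ WH, weight each semantic feature by its relative strength, and score sentences by their weighted loadings. The weighting scheme here is one plausible reading of the approach, not the paper's exact formulas.

```python
import numpy as np
from sklearn.decomposition import NMF

def nmf_select(term_sentence_matrix, n_topics=3, n_pick=5):
    """Select sentences via Non-negative Matrix Factorization.
    term_sentence_matrix: non-negative (terms x sentences) count matrix."""
    model = NMF(n_components=n_topics, init="nndsvda", max_iter=500)
    W = model.fit_transform(term_sentence_matrix)  # terms x topics
    H = model.components_                          # topics x sentences
    topic_weight = H.sum(axis=1) / H.sum()         # relative strength of each feature
    sentence_score = topic_weight @ H              # weighted loadings per sentence
    return np.argsort(sentence_score)[::-1][:n_pick]
```

The non-negativity is the point: unlike LSA's singular vectors, every factor is an additive combination of terms, so a high loading is directly interpretable as "this sentence expresses this topic strongly."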
-
Chen, H.-H.; Kuo, J.-J.; Huang, S.-J.; Lin, C.-J.; Wung, H.-C.: A summarization system for Chinese news from multiple sources (2003)
- Abstract
- This article proposes a summarization system for multiple documents. It employs not only named entities and other signatures to cluster news from different sources, but also punctuation marks, linking elements, and topic chains to identify meaningful units (MUs). Using nouns and verbs to identify similar MUs, focusing and browsing models are applied to present the summarization results. To reduce information loss during summarization, informative words in a document are introduced. For the evaluation, a question answering (QA) system is proposed as a substitute for human assessors. In large-scale experiments covering 140 questions over 17,877 documents, the results show that the models using informative words outperform the pure heuristic voting-only strategy by news reporters. This model can easily be extended to summarize multilingual news from multiple sources.
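The noun/verb-based matching of meaningful units can be illustrated with a simple set-overlap measure; Jaccard similarity below is my stand-in for whatever measure the system actually uses, and POS tagging is assumed to have happened upstream.

```python
def mu_similarity(nouns_verbs_a, nouns_verbs_b):
    """Similarity between two meaningful units (MUs), each given as the
    set of nouns and verbs it contains."""
    if not nouns_verbs_a or not nouns_verbs_b:
        return 0.0
    common = nouns_verbs_a & nouns_verbs_b
    return len(common) / len(nouns_verbs_a | nouns_verbs_b)
```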
-
Liang, S.-F.; Devlin, S.; Tait, J.: Investigating sentence weighting components for automatic summarisation (2007)
- Abstract
- The work described here initially formed part of a triangulation exercise to establish the effectiveness of the Query Term Order algorithm. It subsequently proved to be a reliable indicator for summarising English web documents. We utilised the human summaries from the Document Understanding Conference data, and generated queries automatically for testing the QTO algorithm. Six sentence weighting schemes that made use of Query Term Frequency and QTO were constructed to produce system summaries, and this paper explains the process of combining and balancing the weighting components. The summaries produced were evaluated by the ROUGE-1 metric, and the results showed that using QTO in a weighting combination resulted in the best performance. We also found that using a combination of more weighting components always produced improved performance compared to any single weighting component.
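Since ROUGE-1 drives the comparison, a minimal re-implementation of ROUGE-1 recall makes the evaluation concrete. (The official ROUGEeval package adds stemming, stopword options, and confidence intervals, none of which appear here.)

```python
from collections import Counter

def rouge_1(candidate, reference):
    """ROUGE-1 recall: clipped unigram overlap divided by reference length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    return overlap / max(1, sum(ref.values()))

print(rouge_1("the cat sat", "the cat sat on the mat"))  # 0.5
```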
-
Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007)
- Abstract
- Due to the large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it, along with several other state-of-the-art text summarization algorithms, on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach achieve an improvement of more than 5.0% compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text-based methods.
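The ensemble step can be read as score-level fusion across summarizers. A minimal sketch under that assumption (the paper may combine outputs differently):

```python
import numpy as np

def ensemble_summary(sentences, scorers, k=3):
    """Average the per-sentence scores of several summarizers, then keep
    the top-k sentences in document order. Each scorer maps a sentence
    list to one numeric score per sentence."""
    scores = np.mean([np.asarray(f(sentences), dtype=float) for f in scorers], axis=0)
    top = sorted(np.argsort(scores)[::-1][:k])
    return [sentences[i] for i in top]
```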
-
Kuhlen, R.: Abstracts, abstracting : intellektuelle und maschinelle Verfahren (1990)
- Source
- Grundlagen der praktischen Information und Dokumentation. 3rd ed. Ed.: M. Buder et al. Vol. 1
-
Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006)
- Abstract
- Document keyphrases provide a concise summary of a document's content, offering semantic metadata summarizing a document. They can be used in many applications related to knowledge management and text mining, such as automatic text summarization, development of search engines, document clustering, document classification, thesaurus construction, and browsing interfaces. Because only a small portion of documents have keyphrases assigned by authors, and it is time-consuming and costly to manually assign keyphrases to documents, it is necessary to develop an algorithm to automatically generate keyphrases for documents. This paper describes a Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human identified phrases to assign weights to the candidate keyphrases. The logic of our algorithm is: The more keywords a candidate keyphrase contains and the more significant these keywords are, the more likely this candidate phrase is a keyphrase. KIP's learning function can enrich the glossary database by automatically adding new identified keyphrases to the database. KIP's personalization feature will let the user build a glossary database specifically suitable for the area of his/her interest. The evaluation results show that KIP's performance is better than the systems we compared to and that the learning function is effective.
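The quoted logic translates almost directly into code. A toy version, where the keyword weights stand in for KIP's glossary database and the values are pure assumptions:

```python
def kip_score(phrase, keyword_weight):
    """KIP's core heuristic as stated in the abstract: a candidate phrase
    scores higher the more known keywords it contains and the more
    significant (heavier) those keywords are."""
    return sum(keyword_weight.get(w, 0.0) for w in phrase.lower().split())

glossary = {"summarization": 2.0, "automatic": 1.0, "text": 0.5}  # assumed weights
print(kip_score("Automatic text summarization", glossary))        # 3.5
```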
-
Ou, S.; Khoo, C.S.G.; Goh, D.H.: Multi-document summarization of news articles using an event-based framework (2006)
- Abstract
- Purpose - The purpose of this research is to develop a method for automatic construction of multi-document summaries of sets of news articles that might be retrieved by a web search engine in response to a user query. Design/methodology/approach - Based on the cross-document discourse analysis, an event-based framework is proposed for integrating and organizing information extracted from different news articles. It has a hierarchical structure in which the summarized information is presented at the top level and more detailed information given at the lower levels. A tree-view interface was implemented for displaying a multi-document summary based on the framework. A preliminary user evaluation was performed by comparing the framework-based summaries against the sentence-based summaries. Findings - In a small evaluation, all the human subjects preferred the framework-based summaries to the sentence-based summaries. It indicates that the event-based framework is an effective way to summarize a set of news articles reporting an event or a series of relevant events. Research limitations/implications - Limited to event-based news articles only, not applicable to news critiques and other kinds of news articles. A summarization system based on the event-based framework is being implemented. Practical implications - Multi-document summarization of news articles can adopt the proposed event-based framework. Originality/value - An event-based framework for summarizing sets of news articles was developed and evaluated using a tree-view interface for displaying such summaries.
-
Kim, H.H.; Kim, Y.H.: Generic speech summarization of transcribed lecture videos : using tags and their semantic relations (2016)
- Abstract
- We propose a tag-based framework that simulates human abstractors' ability to select significant sentences based on key concepts in a sentence as well as the semantic relations between key concepts to create generic summaries of transcribed lecture videos. The proposed extractive summarization method uses tags (viewer- and author-assigned terms) as key concepts. Our method employs Flickr tag clusters and WordNet synonyms to expand tags and detect the semantic relations between tags. This method helps select sentences that have a greater number of semantically related key concepts. To investigate the effectiveness and uniqueness of the proposed method, we compare it with an existing technique, latent semantic analysis (LSA), using intrinsic and extrinsic evaluations. The results of intrinsic evaluation show that the tag-based method is as or more effective than the LSA method. We also observe that in the extrinsic evaluation, the grand mean accuracy score of the tag-based method is higher than that of the LSA method, with a statistically significant difference. Elaborating on our results, we discuss the theoretical and practical implications of our findings for speech video summarization and retrieval.
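The two steps the abstract describes, expanding tags into related terms and scoring sentences by how many expanded key concepts they contain, can be sketched independently of Flickr and WordNet by passing any synonym mapping in:

```python
def expand_tags(tags, synonyms):
    """Expand viewer/author tags with related terms. The paper draws the
    expansion from Flickr tag clusters and WordNet synonyms; here
    `synonyms` is any mapping playing that role."""
    expanded = set(tags)
    for t in tags:
        expanded.update(synonyms.get(t, ()))
    return expanded

def sentence_score(sentence_tokens, key_concepts):
    """Sentences containing more (expanded) key concepts score higher."""
    return len(key_concepts & set(sentence_tokens))

tags = ["lecture", "algorithm"]
concepts = expand_tags(tags, {"algorithm": ["procedure", "method"]})
print(sentence_score("the method is a greedy procedure".split(), concepts))  # 2
```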
-
Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014)
- Abstract
- Automatic text summarization has been an active field of research for many years. Several approaches have been proposed, ranging from simple position and word-frequency methods to learning- and graph-based algorithms. The advent of human-generated knowledge bases like Wikipedia offers a further possibility in text summarization: they can be used to understand the input text in terms of salient concepts from the knowledge base. In this paper, we study a novel approach that leverages Wikipedia in conjunction with graph-based ranking. Our approach is to first construct a bipartite sentence-concept graph, and then rank the input sentences using iterative updates on this graph. We consider several models for the bipartite graph, and derive convergence properties under each model. Then, we take up personalized and query-focused summarization, where the sentence ranks additionally depend on user interests and queries, respectively. Finally, we present a Wikipedia-based multi-document summarization algorithm. An important feature of the proposed algorithms is that they enable real-time incremental summarization: users can first view an initial summary, and then request additional content if interested. We evaluate the performance of our proposed summarizer using the ROUGE metric, and the results show that leveraging Wikipedia can significantly improve summary quality. We also present results from a user study, which suggests that incremental summarization can help in better understanding news articles.
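One plausible instance of an iterative update scheme on a sentence-concept bipartite graph is damped mutual reinforcement: sentences gain rank from the concepts they mention, and concepts from the sentences mentioning them. The paper analyzes several such models; this particular update rule is an assumption, not the paper's exact algorithm.

```python
import numpy as np

def bipartite_rank(A, iters=50, damping=0.85):
    """Damped mutual reinforcement on a bipartite graph.
    A: (sentences x concepts) incidence matrix, e.g. A[i, j] = 1 if
    sentence i mentions Wikipedia concept j."""
    n_sent, n_con = A.shape
    s = np.ones(n_sent) / n_sent
    c = np.ones(n_con) / n_con
    for _ in range(iters):
        c = damping * (A.T @ s) + (1 - damping) / n_con
        c /= c.sum()
        s = damping * (A @ c) + (1 - damping) / n_sent
        s /= s.sum()
    return s  # higher = more central sentence
```

The damping term plays the same role as in PageRank: it keeps the iteration well defined on disconnected graphs and guarantees convergence to a unique ranking.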
-
Finegan-Dollak, C.; Radev, D.R.: Sentence simplification, compression, and disaggregation for summarization of sophisticated documents (2016)
- Abstract
- Sophisticated documents like legal cases and biomedical articles can contain unusually long sentences. Extractive summarizers can select such sentences (potentially adding hundreds of unnecessary words to the summary) or exclude them and lose important content. Sentence simplification or compression seems on the surface to be a promising solution. However, compression removes words before the selection algorithm can use them, and simplification generates sentences that may be ambiguous in an extractive summary. We therefore compare the performance of an extractive summarizer selecting from the sentences of the original document with that of the summarizer selecting from sentences shortened in three ways: simplification, compression, and disaggregation, which splits one sentence into several according to rules designed to keep all meaning. We find that on legal cases and biomedical articles, these shortening methods generate ungrammatical output. Human evaluators performed an extrinsic evaluation consisting of comprehension questions about the summaries. Evaluators given compressed, simplified, or disaggregated versions of the summaries answered fewer questions correctly than did those given summaries with unaltered sentences. Error analysis suggests two causes: altered sentences sometimes interact with the sentence selection algorithm, and alterations to sentences sometimes obscure information in the summary. We discuss future work to alleviate these problems.
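Disaggregation, as defined here, splits one sentence into several by meaning-preserving rules so the extractor can pick shorter units without discarding content. A toy regex-based stand-in for such rules, far cruder than the linguistically informed ones the paper uses:

```python
import re

def disaggregate(sentence):
    """Split a long sentence at crude clause boundaries (semicolons and
    ', and') and return each fragment as its own sentence."""
    parts = re.split(r";\s+|,\s+and\s+", sentence)
    return [p.strip().rstrip(".") + "." for p in parts if p.strip()]
```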
-
Ruda, S.: Abstracting: eine Auswahlbibliographie (1992)
- Abstract
- This selective bibliography is divided into 9 subject areas. The first section contains literature that discusses abstracts and abstracting methods in general and surveys the state of research. The next section reviews papers that describe the historical development of abstracting. The third part lists the abstracting guidelines of various institutions. Lexical, syntactic, and semantic text condensation methods are the subject of the works presented in section 4. Text structures of abstracts are considered under point 5, and the works of the next subject area deal with the problem of writing abstracts. The seventh section lists so-called 'machine' and machine-assisted abstracting methods. Finally, 'machine' and machine-assisted abstracting methods, abstracts in comparison with their source texts, and abstracts in general are evaluated. Bibliographies conclude the volume.
-
Kuhlen, R.: Abstracts, abstracting : intellektuelle und maschinelle Verfahren (1997)
- Source
- Grundlagen der praktischen Information und Dokumentation: ein Handbuch zur Einführung in die fachliche Informationsarbeit. 4th ed. Ed.: M. Buder et al.