Automatic Summarization

From Saltlux

1. Definition of Document Summarization
Document summarization, or automatic document summarization, is the task of using a computer program to create a condensed version of a document.
The resulting summary should retain the most important parts of the original document.
As the volume of available information keeps growing, so does the need for coherent and accurate summaries of documents. Google is one example of a search engine that makes use of document summarization technology.

2. Document Summarization Techniques
There are two main approaches: extractive summarization and abstractive summarization.
Extractive summarization selects the parts of the original document that the system judges to be important, in the form of phrases, sentences, or clauses, and presents them as they appear in the original.
Abstractive summarization, by contrast, paraphrases the content of the original document.
In other words, an abstractive summary condenses the original text more strongly than an extractive one. One difficulty is that abstractive summarization requires natural language generation technology, which is not yet fully developed.
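
The extraction-based approach described above can be illustrated with a minimal sketch. It is not the method of any particular system: it assumes a simple word-frequency heuristic, in which a sentence is considered important if it contains words that occur often in the document. The function names, the tiny stop-word list, and the naive sentence splitter are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on", "for"}


def split_sentences(text):
    """Naively split text at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep only words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOP_WORDS]


def extract_summary(text, max_sentences=2):
    """Return the highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    # Word frequencies over the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the words in the sentence.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score; ties keep document order.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]
```

Because the sentences are copied verbatim from the input, the output is an extract rather than an abstract. Producing a paraphrase would require a language generation component that this sketch does not attempt.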

Image:Summary.jpg

3. Utilization
When we search for a book in a library, we consult its bibliography. Because a bibliography is made up of key content and important sentences, it lets us grasp the outline of the book's contents.
Document summarization based on text mining techniques plays a similar role.
Document summarization can be applied to electronic libraries, knowledge management systems (KMS), book-related businesses, and so on.

4. Evaluation
The biggest issue in document summarization is evaluation. Automating the evaluation of summaries is very difficult because human judgment of what makes a “good” summary is rather subjective and differs from evaluator to evaluator.
Manual evaluation is time-consuming and labor-intensive, and reading both the summaries and the original texts is tedious. Coherence and coverage are further issues.
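
As a rough illustration of how coverage might be approximated automatically, the sketch below computes the fraction of words in a human-written reference summary that also appear in a system summary. This is an assumption made for illustration, similar in spirit to unigram recall measures such as ROUGE-1 recall. It does not capture coherence, and it does not remove the subjectivity of the reference summary itself. The function name `unigram_recall` is hypothetical.

```python
import re
from collections import Counter


def unigram_recall(candidate, reference):
    """Fraction of reference word occurrences also found in the candidate."""
    cand = Counter(re.findall(r"[a-z0-9']+", candidate.lower()))
    ref = Counter(re.findall(r"[a-z0-9']+", reference.lower()))
    if not ref:
        return 0.0
    # Count each reference word at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())
```

A score close to 1 means that most of the reference wording is covered by the system summary. Two evaluators who write different reference summaries will still obtain different scores for the same system summary.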