Intelligent Agent 03 Summary

Information Extraction

  1. Approaches
  2. Tasks and subtasks
  3. Evaluation

Intelligent Agent Summary

Information Extraction (IE) is the task of automatically extracting structured information from unstructured and semi-structured documents.

Approaches


Manually crafted patterns

  1. Regular expressions

  2. Automata

  3. NLP-based

Rule-based/knowledge-based

[person] [office] of [org]
[org] [person] [office]
[org] in [loc]
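
As a rough illustration of how patterns like these can be turned into executable rules, here is a minimal Python sketch using regular expressions. The `NAME` and `OFFICE` expressions, the fact labels (`holds_office`, `located_in`), and the optional comma after the person are assumptions made for this example, not part of the original notes. Treating any run of capitalized words as an entity is deliberately naive and will misfire on real text.

```python
import re

# Assumption: an entity is approximated as a run of capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
# Assumption: a tiny, hand-picked list of office titles.
OFFICE = r"(?:president|director|chairman|CEO)"

# [person] [office] of [org]
PERSON_OFFICE_ORG = re.compile(
    rf"(?P<person>{NAME}),? (?P<office>{OFFICE}) of (?P<org>{NAME})"
)
# [org] in [loc]
ORG_IN_LOC = re.compile(rf"(?P<org>{NAME}) in (?P<loc>{NAME})")


def extract(text):
    """Return a list of fact tuples matched by the two patterns."""
    facts = []
    for m in PERSON_OFFICE_ORG.finditer(text):
        facts.append(("holds_office", m["person"], m["office"], m["org"]))
    for m in ORG_IN_LOC.finditer(text):
        facts.append(("located_in", m["org"], m["loc"]))
    return facts
```

Calling `extract("Jane Doe, director of Acme Labs in Springfield, spoke today.")` yields one `holds_office` fact and one `located_in` fact. The names in that sentence are invented for the example.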

Machine learning approaches

  1. Label documents

  2. Learn rules and patterns
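
The two steps above can be sketched with a toy example: start from a few labeled sentences, then "learn" which connecting phrases link an organization to a location. This is only a stand-in to make the workflow concrete. Real systems typically use statistical or neural models, and the labeled data, the `NAME` expression, and the function names below are all hypothetical.

```python
from collections import Counter
import re

# Hypothetical labeled data: each sentence plus the (org, loc) pair it states.
LABELED = [
    ("Acme Labs is based in Springfield.", ("Acme Labs", "Springfield")),
    ("Globex is based in Cypress Creek.", ("Globex", "Cypress Creek")),
    ("Initech has offices in Austin.", ("Initech", "Austin")),
]

# Assumption: an entity is approximated as a run of capitalized words.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"


def learn_patterns(examples, min_count=1):
    """Collect the text that connects the two labeled entities."""
    contexts = Counter()
    for sentence, (org, loc) in examples:
        start = sentence.find(org)
        end = sentence.find(loc, start + len(org))
        if start == -1 or end == -1:
            continue  # a label does not occur in the sentence; skip it
        contexts[sentence[start + len(org):end]] += 1
    # Keep only contexts seen at least min_count times.
    return [c for c, n in contexts.items() if n >= min_count]


def apply_patterns(patterns, text):
    """Use each learned context as a pattern between two candidate entities."""
    facts = []
    for context in patterns:
        regex = re.compile(rf"({NAME}){re.escape(context)}({NAME})")
        facts.extend(m.groups() for m in regex.finditer(text))
    return facts
```

Raising `min_count` keeps only the connecting phrases that occur repeatedly in the labeled data, which trades recall for precision.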

Tasks and subtasks

Typical subtasks of IE include:

  1. Named entity recognition:

    a. Named entity extraction

    b. Coreference resolution

    c. Relationship extraction

  2. Semi-structured information extraction:

    • Table extraction: finding and extracting tables from documents

    • Comment extraction

  3. Language and vocabulary analysis

    • Terminology extraction
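
To make the table extraction subtask concrete, here is a minimal sketch that pulls tables out of an HTML document using Python's standard `html.parser` module. It assumes the tables are marked up with plain `<table>`, `<tr>`, `<td>` and `<th>` tags and does not handle nested tables. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> as a list of rows, each a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # all tables found so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed markup


def extract_tables(html):
    """Return all tables in the HTML string as nested lists."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Each extracted table is a list of rows, so the first row can be treated as a header when the source uses `<th>` cells.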

Evaluation

  1. Model (reference) solutions are given as labelled facts

  2. Precision: $\frac{\text{correctFoundFacts}}{\text{allFoundFacts}}$

  3. Recall: $\frac{\text{correctFoundFacts}}{\text{allFacts}}$

  4. F-measure

$\text{F-measure} = \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$
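
The three measures can be computed directly once both the extracted facts and the model solution are represented as sets. The sketch below assumes facts are hashable values such as tuples. The function name and the choice to return 0.0 when a denominator is zero are assumptions of this example.

```python
def evaluate(found_facts, gold_facts):
    """Return (precision, recall, f_measure) for extracted vs. labelled facts."""
    found, gold = set(found_facts), set(gold_facts)
    correct = found & gold  # correctFoundFacts

    # Assumption: report 0.0 instead of dividing by zero on empty input.
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0

    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For example, if the system finds 3 facts, 2 of them correct, and the model solution contains 4 facts, then precision is 2/3, recall is 2/4 = 0.5, and the F-measure is 4/7, roughly 0.57.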

Published 09 June 2015