
Intelligent Agent 03 Summary

Information Extraction

  1. Approaches
  2. Tasks and subtasks
  3. Evaluation

Intelligent Agent Summary

Information extraction (IE) is the task of automatically extracting structured information from unstructured and semi-structured documents.

Approaches


Manually crafted patterns

  1. Regular expressions

  2. Automata

  3. NLP-based
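
As a rough illustration of a manually crafted pattern, the sketch below uses a regular expression to turn date-like strings into structured fields. The pattern, the function name `extract_dates`, and the sample sentence are assumptions made for this example, not part of the original notes.

```python
import re

# Hypothetical hand-written pattern: day, capitalised month name, four-digit year.
# It does not check that the month name is a real month.
DATE_PATTERN = re.compile(
    r"\b(?P<day>\d{1,2})\s+(?P<month>[A-Z][a-z]+)\s+(?P<year>\d{4})\b"
)


def extract_dates(text):
    """Return one dict of named fields per date-like match in text."""
    return [match.groupdict() for match in DATE_PATTERN.finditer(text)]


sample = "The post was published 09 June 2015 and revised 3 July 2015."
for fields in extract_dates(sample):
    print(fields)
```

Named groups keep the output structured (day, month, year) rather than returning a raw matched string, which is the point of extraction as opposed to search.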

Rule-based/knowledge-based

  1. [person] [office] of [org]
  2. [org] [person] [office]
  3. [org] in [loc]
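
One way such rules might be applied is sketched below, assuming the text has already been split into tokens tagged with entity types (person, office, org, loc). The rule names, the token format, and the sample tokens are hypothetical choices made for this example.

```python
# Each token is (text, tag); tag is an entity type, or None for an ordinary word.
# The tagging itself is assumed to come from an earlier step.
RULES = {
    "holds_office": ["[person]", "[office]", "of", "[org]"],
    "located_in": ["[org]", "in", "[loc]"],
}


def match_at(tokens, start, pattern):
    """Match pattern against tokens beginning at start; return slots or None."""
    if start + len(pattern) > len(tokens):
        return None
    slots = {}
    for (text, tag), element in zip(tokens[start:], pattern):
        if element.startswith("[") and element.endswith("]"):
            # Slot element: the token must carry the named entity type.
            if tag != element[1:-1]:
                return None
            slots[tag] = text
        elif tag is not None or text.lower() != element:
            # Literal element: the token must be that plain word.
            return None
    return slots


def apply_rules(tokens, rules=RULES):
    """Try every rule at every position and collect the matched facts."""
    facts = []
    for name, pattern in rules.items():
        for start in range(len(tokens)):
            slots = match_at(tokens, start, pattern)
            if slots is not None:
                facts.append((name, slots))
    return facts


tokens = [
    ("Ada Lovelace", "person"),
    ("director", "office"),
    ("of", None),
    ("Example Corp", "org"),
    ("in", None),
    ("London", "loc"),
]
for fact in apply_rules(tokens):
    print(fact)
```

Each rule is just a sequence of entity slots and literal words, so adding a new pattern means adding one entry to `RULES`.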

Machine learning approaches

  1. Label documents

  2. Learn rules and patterns
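
The two steps above can be sketched in a deliberately simplified form. The example assumes annotators have labelled whether the word found between an [org] and a [loc] expresses a true "located in" fact, and it "learns" which connector words are reliable. The data, the threshold, and the function name `learn_connectors` are all hypothetical; real systems use far richer features and models.

```python
from collections import Counter


def learn_connectors(examples, min_precision=0.75):
    """Keep connector words that mostly appear in positively labelled examples."""
    seen = Counter()
    positive = Counter()
    for connector, is_fact in examples:
        seen[connector] += 1
        if is_fact:
            positive[connector] += 1
    return {c for c in seen if positive[c] / seen[c] >= min_precision}


# Step 1: labelled examples (connector between [org] and [loc], annotator label).
examples = [
    ("in", True), ("in", True), ("in", True), ("in", False),
    ("near", False), ("near", True),
    ("based in", True),
    ("left", False),
]

# Step 2: learn which connectors are reliable enough to use as patterns.
print(sorted(learn_connectors(examples)))
```

The learned connectors could then replace the hand-written literal in a rule such as [org] in [loc].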

Tasks and subtasks

Typical subtasks of IE include:

  1. Named entity recognition:

    a. Named entity extraction

    b. Coreference resolution

    c. Relationship extraction

  2. Semi-structured information extraction:

    • Table extraction: finding and extracting tables from documents

    • Comment extraction

  3. Language and vocabulary analysis

    • Terminology extraction

Evaluation

  1. Model solutions (the reference answers) are given as labelled facts

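  2. Precision: $\frac{\mathrm{correctFoundFacts}}{\mathrm{allFoundFacts}}$

  3. Recall: $\frac{\mathrm{correctFoundFacts}}{\mathrm{allFacts}}$

  4. F-measure

$$\mathrm{Fmeasure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$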
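
A minimal sketch of these measures, assuming facts can be represented as hashable tuples and that the model solution is available as a set. The function name `evaluate` and the sample facts are illustrative assumptions.

```python
def evaluate(found_facts, model_facts):
    """Return (precision, recall, f_measure) for extracted vs. reference facts."""
    found = set(found_facts)
    model = set(model_facts)
    correct = found & model  # correctFoundFacts

    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(model) if model else 0.0
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical reference facts (allFacts) and system output (allFoundFacts).
model_facts = {
    ("Example Corp", "located_in", "London"),
    ("Ada Lovelace", "director_of", "Example Corp"),
    ("Sample Ltd", "located_in", "Paris"),
    ("Alan Turing", "chair_of", "Sample Ltd"),
}
found_facts = {
    ("Example Corp", "located_in", "London"),
    ("Ada Lovelace", "director_of", "Example Corp"),
    ("Sample Ltd", "located_in", "Berlin"),
}

precision, recall, f_measure = evaluate(found_facts, model_facts)
print(f"precision={precision:.3f} recall={recall:.3f} f_measure={f_measure:.3f}")
```

Here two of the three found facts are correct and two of the four reference facts are recovered, so precision is 2/3, recall is 1/2, and the F-measure works out to 4/7. The zero checks are a choice made for this sketch to avoid division by zero when nothing is found or nothing is expected.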

Published 09 June 2015