Intelligent Agent 03 Summary

Information Extraction

Approaches
- Manually crafted patterns
- Machine learning Approaches
Tasks and subtasks
Evaluation


Information Extraction (IE) is the task of automatically extracting structured information from unstructured and semi-structured documents.

Approaches

Manually crafted patterns

Regular expressions
Automata
NLP-based

Rule-based/knowledge based

[person] [office] of [org]
[org] [person] [office]
[org] in [loc]
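
As an illustration, the last pattern above (an organization, the word "in", then a location) could be written as a regular expression. This is only a minimal sketch: the `ORGS` and `LOCS` lists, the helper `alternation`, and the function `extract_org_in_loc` are hypothetical names invented for this example, and a real system would recognize entities with far larger word lists or a separate NER step.

```python
import re

# Hypothetical toy word lists (assumption, not from the notes).
ORGS = ["Acme Corp", "Globex"]
LOCS = ["Berlin", "Paris"]

def alternation(names):
    # Build a regex alternation; longest names first so they win over their prefixes.
    return "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))

# Hand-crafted pattern for "[org] in [loc]".
ORG_IN_LOC = re.compile(
    rf"(?P<org>{alternation(ORGS)}) in (?P<loc>{alternation(LOCS)})"
)

def extract_org_in_loc(text):
    """Return (org, loc) pairs for every match of the pattern in text."""
    return [(m.group("org"), m.group("loc")) for m in ORG_IN_LOC.finditer(text)]

print(extract_org_in_loc("Acme Corp in Berlin hired staff, unlike Globex in Paris."))
# → [('Acme Corp', 'Berlin'), ('Globex', 'Paris')]
```

The sketch also shows the main weakness of manually crafted patterns: any organization or location missing from the word lists is silently skipped.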

Machine learning Approaches

Label documents
Learn rules and patterns
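
The two steps above can be sketched with a deliberately tiny learner. It assumes hand-labelled token sequences and learns one kind of rule only: which entity label most often follows a given word. The training data and the function names `learn_context_rules` and `apply_rules` are hypothetical, and practical systems use much richer features and statistical models.

```python
from collections import Counter, defaultdict

# Step 1: labelled documents (hypothetical toy data; "O" means "no entity").
TRAINING = [
    (["Anna", "works", "at", "Acme", "in", "Berlin"],
     ["person", "O", "O", "org", "O", "loc"]),
    (["Bob", "works", "at", "Globex", "in", "Paris"],
     ["person", "O", "O", "org", "O", "loc"]),
]

def learn_context_rules(examples):
    """Step 2: learn rules of the form 'previous word -> label of next token'."""
    counts = defaultdict(Counter)
    for tokens, labels in examples:
        for i in range(1, len(tokens)):
            counts[tokens[i - 1].lower()][labels[i]] += 1
    rules = {}
    for prev_word, label_counts in counts.items():
        label, _ = label_counts.most_common(1)[0]
        if label != "O":  # keep only rules that predict an entity
            rules[prev_word] = label
    return rules

def apply_rules(rules, tokens):
    """Label each token according to the word that precedes it."""
    labels = ["O"] * len(tokens)
    for i in range(1, len(tokens)):
        labels[i] = rules.get(tokens[i - 1].lower(), "O")
    return labels

rules = learn_context_rules(TRAINING)
print(rules)  # → {'at': 'org', 'in': 'loc'}
print(apply_rules(rules, ["Carol", "works", "at", "Initech", "in", "Rome"]))
# → ['O', 'O', 'O', 'org', 'O', 'loc']
```

Unlike a fixed word list, the learned rules generalize to unseen names such as "Initech", but they never label the first token of a sentence, so "Carol" is missed.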

Tasks and subtasks

Typical subtasks of IE include:

Named entity recognition:

a. Named entity extraction

b. Coreference resolution

c. Relationship extraction
Semi-structured information Extraction:
- Table extraction: finding and extracting tables from documents
- Comment extraction
Language and vocabulary analysis
- Terminology extraction

Evaluation

Model solutions as labelled facts
Precision: $\frac{\text{correctFoundFacts}}{\text{allFoundFacts}}$
Recall: $\frac{\text{correctFoundFacts}}{\text{allFacts}}$
F-measure:

$\text{F-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
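
A small worked example may make the three measures concrete. The sketch below assumes that facts are represented as tuples and compared by exact match. The function name `evaluate` and the sample facts are made up for illustration, and returning 0 when a denominator is empty is a convention chosen here, not something the definitions prescribe.

```python
def evaluate(found_facts, gold_facts):
    """Return (precision, recall, f_measure) for extracted vs. labelled facts."""
    found, gold = set(found_facts), set(gold_facts)
    correct = len(found & gold)  # correctFoundFacts
    precision = correct / len(found) if found else 0.0  # / allFoundFacts
    recall = correct / len(gold) if gold else 0.0       # / allFacts
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical model solution: four labelled facts.
gold = [
    ("located_in", "Acme Corp", "Berlin"),
    ("located_in", "Globex", "Paris"),
    ("works_for", "Anna", "Acme Corp"),
    ("works_for", "Bob", "Globex"),
]
# Hypothetical system output: three facts, two of them correct.
found = [
    ("located_in", "Acme Corp", "Berlin"),
    ("works_for", "Anna", "Acme Corp"),
    ("works_for", "Bob", "Acme Corp"),
]

p, r, f = evaluate(found, gold)
print(f"{p:.3f} {r:.3f} {f:.3f}")  # → 0.667 0.500 0.571
```

Here precision is 2/3 (two of three found facts are correct), recall is 2/4, and the F-measure is their harmonic mean, 4/7.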

Published 09 June 2015
