Information Extraction
Information Extraction (IE) is the task of automatically extracting structured information from unstructured and semi-structured documents.
regular expression
Automata
NLP-based
Rule-based/knowledge based
[person] [office] of [org]
[org] [person] [office]
[org] in [loc]
Label documents
Learn rules and patterns
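A minimal sketch of how a rule such as "[org] in [loc]" could be applied with regular expressions. The gazetteers `ORGS` and `LOCS`, the helper `alternation`, and the function `extract_org_locations` are hypothetical names introduced for illustration only; a real system would fill the [org] and [loc] slots from a named entity recognizer or much larger lists.

```python
import re

# Hypothetical gazetteers standing in for the [org] and [loc] slots.
ORGS = ["Acme Corp", "Globex"]
LOCS = ["Berlin", "Springfield"]


def alternation(names):
    # Escape each name; try longer names first so they win over shorter prefixes.
    ordered = sorted(names, key=len, reverse=True)
    return "|".join(re.escape(name) for name in ordered)


# Regular expression for the rule "[org] in [loc]".
ORG_IN_LOC = re.compile(
    rf"(?P<org>{alternation(ORGS)})\s+in\s+(?P<loc>{alternation(LOCS)})"
)


def extract_org_locations(text):
    """Return an (org, loc) pair for every match of the rule in the text."""
    return [(m.group("org"), m.group("loc")) for m in ORG_IN_LOC.finditer(text)]


text = "She joined Acme Corp in Berlin, while Globex in Springfield kept hiring."
print(extract_org_locations(text))
# [('Acme Corp', 'Berlin'), ('Globex', 'Springfield')]
```

Each match yields a structured (org, loc) fact. The other patterns can be written the same way by adding named groups for [person] and [office].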
Typical subtasks of IE include:
Named entity recognition:
a Named entity Extraction
b Coreference Resolution
c Relationship Extraction
Semi-structured information Extraction:
Table extraction: finding and extracting tables from documents
Comment extraction
Language and vocabulary analysis
Model solutions as labelled facts
Precision: $\frac{\text{correctFoundFacts}}{\text{allFoundFacts}}$
Recall: $\frac{\text{correctFoundFacts}}{\text{allFacts}}$
F-measure: $\frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
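A small sketch of how these three measures could be computed when the model solution and the system output are both sets of labelled facts. The function name `evaluate` and the example facts are made up for illustration; a fact counts as correct only if it matches a gold fact exactly.

```python
def evaluate(found_facts, gold_facts):
    """Return (precision, recall, f_measure) for extracted vs. gold facts."""
    found, gold = set(found_facts), set(gold_facts)
    correct = found & gold  # correctFoundFacts

    # Guard against division by zero when either set is empty.
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0

    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical model solution: 4 labelled facts.
gold = {
    ("Acme Corp", "located_in", "Berlin"),
    ("Globex", "located_in", "Springfield"),
    ("Jane Doe", "ceo_of", "Acme Corp"),
    ("John Roe", "cto_of", "Globex"),
}

# Hypothetical system output: 3 facts, 2 of them correct.
found = {
    ("Acme Corp", "located_in", "Berlin"),
    ("Jane Doe", "ceo_of", "Acme Corp"),
    ("Globex", "located_in", "Berlin"),
}

p, r, f = evaluate(found, gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
# precision=0.667 recall=0.500 F=0.571
```

Here 2 of the 3 found facts are correct (precision 2/3) and 2 of the 4 gold facts are found (recall 1/2), so the F-measure is 4/7.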