Information Extraction
Information Extraction (IE) is the task of automatically extracting structured information from unstructured and semi-structured documents.
Regular expressions
Automata
NLP-based
Rule-based/knowledge-based
- [person] [office] of [org]
- [org] [person] [office]
- [org] in [loc]
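The rule patterns above can be approximated with regular expressions once entity candidates are known. The following is a minimal sketch, not a definitive implementation: the gazetteers (`PERSONS`, `OFFICES`, `ORGS`, `LOCS`), the helper `alt`, the fact labels `holds_office` and `located_in`, and the optional comma between person and office are all assumptions made for illustration. A real system would obtain entity candidates from much larger lists or from a named entity recognizer.

```python
import re

# Hypothetical gazetteers, for illustration only.
PERSONS = ["Ada Lovelace", "Alan Turing"]
OFFICES = ["director", "chairman", "president"]
ORGS = ["Acme Corp", "Globex"]
LOCS = ["Berlin", "Paris"]


def alt(names):
    """Build a regex alternation matching any of the given names (longest first)."""
    return "|".join(re.escape(n) for n in sorted(names, key=len, reverse=True))


# Pattern: [person] [office] of [org]
# Assumption: an optional comma may separate the person from the office.
PERSON_OFFICE_OF_ORG = re.compile(
    rf"(?P<person>{alt(PERSONS)}),?\s+(?P<office>{alt(OFFICES)})\s+of\s+(?P<org>{alt(ORGS)})"
)

# Pattern: [org] in [loc]
ORG_IN_LOC = re.compile(rf"(?P<org>{alt(ORGS)})\s+in\s+(?P<loc>{alt(LOCS)})")


def extract_facts(text):
    """Return a list of fact tuples found by the two patterns."""
    facts = []
    for m in PERSON_OFFICE_OF_ORG.finditer(text):
        facts.append(("holds_office", m.group("person"), m.group("office"), m.group("org")))
    for m in ORG_IN_LOC.finditer(text):
        facts.append(("located_in", m.group("org"), m.group("loc")))
    return facts


if __name__ == "__main__":
    sentence = "Ada Lovelace, director of Acme Corp in Berlin, met Alan Turing."
    for fact in extract_facts(sentence):
        print(fact)
```

Each pattern is compiled and scanned separately, so one text span can contribute to more than one fact: here "Acme Corp" takes part in both the office fact and the location fact.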
Label documents
Learn rules and patterns
Typical subtasks of IE include:
Named entity recognition:
a. Named Entity Extraction
b. Coreference Resolution
c. Relationship Extraction
Semi-structured information Extraction:
Table extraction: finding and extracting tables from documents
Comment extraction
Language and vocabulary analysis
Model solutions as labelled facts
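Precision: $\mathrm{Precision} = \frac{\mathrm{correctFoundFacts}}{\mathrm{allFoundFacts}}$
Recall: $\mathrm{Recall} = \frac{\mathrm{correctFoundFacts}}{\mathrm{allFacts}}$
F-measure
$F\text{-measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$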
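The three measures can be computed directly from two sets of facts: the facts found by the system and the labelled facts of the model solution. This is a small sketch under the assumption that facts are hashable values (here tuples) and that a found fact counts as correct only if it matches a labelled fact exactly. The function name `evaluate` and the example facts are hypothetical.

```python
def evaluate(found_facts, gold_facts):
    """Return (precision, recall, f_measure) for found facts against labelled facts."""
    found = set(found_facts)
    gold = set(gold_facts)
    correct = found & gold  # correctFoundFacts

    # Guard against empty sets to avoid division by zero.
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(gold) if gold else 0.0

    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    gold = [
        ("located_in", "Acme Corp", "Berlin"),
        ("located_in", "Globex", "Paris"),
        ("holds_office", "Ada Lovelace", "director", "Acme Corp"),
        ("holds_office", "Alan Turing", "chairman", "Globex"),
    ]
    found = [
        ("located_in", "Acme Corp", "Berlin"),
        ("holds_office", "Ada Lovelace", "director", "Acme Corp"),
        ("located_in", "Globex", "Berlin"),  # incorrect fact
    ]
    p, r, f = evaluate(found, gold)
    print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

In this example two of the three found facts are correct and two of the four labelled facts are recovered, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.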