Category: Computer Sciences

NERC

10/2/2016

Anyone who has tried to program something at least once knows that it can be quite hard to write code for tasks that are easy for any human being. One example is NERC (Named Entity Recognition and Classification). The task sounds simple: scan a text for mentions of certain categories such as person and company names, temporal expressions, locations, etc.

Using this example sentence:
“On 2nd October, Pia was sitting with her Apple Macbook in a garden in Berlin and was writing this summary about a paper she found in Google scholar.”
a NERC algorithm could extract and classify the following parts of the text:
2nd October -> dates
Berlin -> locations
Pia -> person names
Apple, Google -> company names
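
To make the example concrete, here is a minimal sketch of how a very simple rule-based NERC could produce the result above. It is only an illustration: the tiny dictionaries, the date pattern and the function name `extract_entities` are assumptions made for this example and are not taken from any published system.

```python
import re

# Hypothetical toy dictionaries; a real system would use far larger lists.
DICTIONARIES = {
    "person names": {"Pia"},
    "company names": {"Apple", "Google"},
    "locations": {"Berlin"},
}

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Handcrafted rule for dates such as "2nd October" (assumed format: day, then month).
DATE_PATTERN = re.compile(r"\b\d{1,2}(?:st|nd|rd|th)?\s+(?:" + MONTHS + r")\b")


def extract_entities(text):
    """Return a dict that maps each category to the mentions found in text."""
    entities = {category: [] for category in DICTIONARIES}
    entities["dates"] = DATE_PATTERN.findall(text)
    # Look up every alphabetic token in the dictionaries.
    for token in re.findall(r"[A-Za-z]+", text):
        for category, names in DICTIONARIES.items():
            if token in names and token not in entities[category]:
                entities[category].append(token)
    return entities


SENTENCE = ("On 2nd October, Pia was sitting with her Apple Macbook in a garden "
            "in Berlin and was writing this summary about a paper she found in "
            "Google scholar.")

for category, mentions in extract_entities(SENTENCE).items():
    print(category, "->", mentions)
```

A pure dictionary lookup like this one cannot tell Apple the company from an apple at the start of a sentence, and it misses every name that is not in the lists. That is one reason why the task is harder than it sounds.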

One of the first papers about NERC was published in 1991. The author, Lisa F. Rau, described a system based on heuristics and handcrafted rules that aimed to “extract and recognize [company] names”. Since then, NERC has become more and more prominent. Today there are many papers about NERC for different categories and text genres, using different algorithms and methods. Handcrafted rules are being replaced by machine learning algorithms, and features such as dictionaries, word structure (e.g. capital letters and numbers) and text structure (e.g. newspaper articles beginning with location and date) improve NERC performance.
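
As a rough sketch of what such features can look like in code, the function below turns a single token into a small feature dictionary that a machine learning classifier could take as input. The feature names, the one-entry-per-company dictionary and the function `word_features` are assumptions chosen for illustration, not a standard feature set.

```python
# Hypothetical company dictionary used as one of the features.
COMPANY_DICTIONARY = {"Apple", "Google"}


def word_features(tokens, index):
    """Describe the token at position index by simple word-structure features."""
    token = tokens[index]
    return {
        "is_capitalized": token[:1].isupper(),           # e.g. "Berlin"
        "is_all_caps": token.isupper(),                  # e.g. "NERC"
        "has_digit": any(ch.isdigit() for ch in token),  # e.g. "2nd"
        "in_company_dictionary": token in COMPANY_DICTIONARY,
        "is_sentence_start": index == 0,                 # capital letter is less informative here
    }


tokens = ["Pia", "was", "sitting", "with", "her", "Apple", "Macbook"]
print(word_features(tokens, 5))
```

A classifier trained on labelled text would then learn which combinations of these features indicate a person, a company, a location and so on, instead of relying on rules written by hand.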
In summary: extracting and classifying information in texts, which seems easy for humans, has kept computer scientists busy for years and will continue to do so in the coming years. After all, text is not the only medium for information. Other media such as video and speech also contain information that can be extracted and classified.

A review of NERC research from 1991 to 2006:

"A survey of named entity recognition and classification"
David Nadeau and Satoshi Sekine
Lingvisticae Investigationes 30.1 (2007): 3-26.
