Anyone who has ever tried to program something knows that it can be quite hard to write code for tasks that are easy for any human being. One example is NERC (Named Entity Recognition and Classification). The task sounds simple: scan a text for mentions of certain categories such as person and company names, temporal expressions, locations, etc. Take the following text as an example:

One of the first papers about NERC was published in 1991. The author, Lisa F. Rau, described a system based on heuristics and handcrafted rules which aimed to “extract and recognize [company] names”. Since then, NERC has become more and more prominent. Today there are many papers about NERC for different categories and text genres, using different algorithms and methods. Handcrafted rules have been replaced by machine learning algorithms, and various features improve NERC performance, e.g. dictionaries, word structure (such as capital letters and numbers) and text structure (such as newspaper articles beginning with a location and date).

In summary: extracting and classifying information in texts, which seems easy for humans, has kept computer scientists busy for years and will continue to do so. Finally, text is not the only medium that carries information: videos and speech also contain information that can be extracted and classified.

A review of NERC from 1991 to 2006:
David Nadeau and Satoshi Sekine, "A survey of named entity recognition and classification", Lingvisticae Investigationes 30.1 (2007): 3-26.
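To make the kind of features mentioned above a bit more concrete, here is a minimal sketch of a rule-based tagger in Python. It combines a dictionary feature (a tiny list of first names) with word-structure features (four-digit numbers, capitalized words and initials). The name list, the regular expressions and the category labels TIME and PERSON are hypothetical choices for illustration only; this is not Rau's 1991 system, and modern NERC systems learn such patterns from data instead of hard-coding them.

```python
import re

# Hypothetical mini-gazetteer (dictionary feature); real systems use large name lists.
FIRST_NAMES = {"Lisa", "David", "Satoshi"}

# Tokenizer: keeps initials like "F." as one token, otherwise words or single punctuation marks.
TOKEN_RE = re.compile(r"[A-Z]\.|\w+|[^\w\s]")
# Word-structure feature: a four-digit number between 1800 and 2099 is treated as a year.
YEAR_RE = re.compile(r"(1[89]|20)\d\d")
# Word-structure feature: a capitalized word or a single-letter initial.
CAPITALIZED_RE = re.compile(r"[A-Z][a-z]+|[A-Z]\.")


def tag(text):
    """Return (entity_text, category) pairs found by simple hand-written rules."""
    tokens = TOKEN_RE.findall(text)
    entities = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if YEAR_RE.fullmatch(tok):
            entities.append((tok, "TIME"))
            i += 1
        elif tok in FIRST_NAMES:
            # Dictionary hit: extend the name over the following capitalized tokens.
            j = i + 1
            while j < len(tokens) and CAPITALIZED_RE.fullmatch(tokens[j]):
                j += 1
            entities.append((" ".join(tokens[i:j]), "PERSON"))
            i = j
        else:
            i += 1
    return entities


print(tag("One of the first papers about NERC was published in 1991. "
          "The author, Lisa F. Rau, described a system based on heuristics."))
# → [('1991', 'TIME'), ('Lisa F. Rau', 'PERSON')]
```

Even this toy version shows why the task is harder than it looks: a name that is missing from the dictionary, or a sentence that happens to start with a listed first name, immediately breaks the rules.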