Text analysis is an obvious route to semantic annotation and the extraction of structured knowledge. A basic task is recognizing references to entities (people, locations, organizations, etc.). The next step is relation extraction, e.g. identifying that an organization is located in a particular city. Automatic extraction of such relations is a hard linguistic problem: existing solutions are either very partial, expensive to implement, or slow. Yet relationships are crucial to making the extracted knowledge usable for navigation and search. We demonstrate how efficient co-occurrence analysis, performed on top of semantic annotation, can serve several purposes: relation extraction, faceted search, and popularity timelines. The faceted search interface offers an easy way to augment full-text search with entity references, derived through co-occurrence profiling and semantic relationships. Although this sort of analytics can be used in virtually any domain, its development within the KIM platform was driven by the requirements of news analysis and research. We demonstrate these interfaces on top of 1 million news articles, a corpus of major international news from the last five years. This sort of co-occurrence analysis also has the potential to aid identity resolution, which is recognized as a crucial problem for several tasks: cross-document co-reference resolution, record linkage, object linking, and data integration.
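As a rough illustration of the idea, the three uses of co-occurrence named in the abstract (candidate relations, facets, and popularity timelines) can each be sketched as a simple count over documents that already carry entity annotations. This is a minimal sketch and not the KIM platform's implementation: the corpus, the entity identifiers, and the function names `cooccurrence_counts`, `facet`, and `timeline` are all hypothetical, and the annotation step is assumed to have been done elsewhere.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotated corpus: each document is reduced to a publication
# period and the set of entity identifiers recognized in it (assumed input).
docs = [
    ("2006-01", {"org:Acme", "loc:London", "person:Smith"}),
    ("2006-01", {"org:Acme", "loc:London"}),
    ("2006-02", {"org:Acme", "person:Smith"}),
    ("2006-02", {"loc:Paris", "person:Smith"}),
]

def cooccurrence_counts(docs):
    """Count, for every unordered entity pair, the documents mentioning both.

    Frequent pairs are only candidates for a relation; co-occurrence alone
    does not say which relation holds.
    """
    pair_counts = Counter()
    for _, entities in docs:
        # Sorting gives each pair one canonical (a, b) key with a < b.
        for a, b in combinations(sorted(entities), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def facet(docs, selected):
    """Rank entities co-occurring with all `selected` entities by document count."""
    counts = Counter()
    for _, entities in docs:
        if selected <= entities:  # document mentions every selected entity
            counts.update(entities - selected)
    return counts.most_common()

def timeline(docs, entity):
    """Number of documents mentioning `entity` in each period, in period order."""
    counts = Counter(period for period, entities in docs if entity in entities)
    return sorted(counts.items())

pairs = cooccurrence_counts(docs)
acme_facet = facet(docs, {"org:Acme"})
smith_timeline = timeline(docs, "person:Smith")
```

Each function makes a single pass over the corpus, which is why counting of this kind is cheap compared with full linguistic relation extraction. At the scale of a million articles, a real system would presumably precompute and index such counts rather than rescan the documents for every query.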
Attribution: The Open Education Consortium
http://www.ocwconsortium.org/courses/view/9ea09cbd5711d13a88ea547294fa73fa/
Course Home http://videolectures.net/iswc06_popov_hoccs/