Motivation

The ever-growing flood of information has two consequences. On the one hand, it is hardly possible to offer users an effective, individualized approach to information retrieval. On the other hand, qualified subject-specific indexing of the wide range of publications is becoming more difficult and complex for specialized information centers and libraries. For this reason, the project will investigate automated processes for content indexing based on appropriate taxonomies and contextual information. The resulting indexing workflow is a possible first step toward the development of virtual research environments.

Challenges & Highlights

The TIB and FIZ Karlsruhe are important full-text and information providers in mathematics, engineering and the natural sciences. Both face the challenge of meeting the needs of customers in science and research, needs that arise from the trend toward digital information services and the associated global competition. In particular, the previously mentioned increase in digital information has to be taken into account.

The optimal use of information requires identifying the relevant subset within a large pool of potentially available information. However, identifying relevant information becomes more difficult the larger and more heterogeneous the existing data sources are. Thus, extensions along the search process and the integration of semantic methods are needed to build knowledge networks.

The focus of the project is:

  • The development of a semi-automatic procedure to support the creation and maintenance of controlled vocabularies and thesauri, which has previously been done manually.
  • The development of a fully automated process for indexing and selection / classification of unexplored mathematical documents with high precision and quality.
  • The development of innovative and individually configurable retrieval capabilities and ranking procedures for access to the information.