Manuela Speranza
Via Sommarive, 18 I-38050
Cognitive and Communication Technologies Division, ITC-irst
Phone: +39 0461 314 521
Fax: +39 0461 302 040
Email: manspera@itc.it

Publications
This paper presents the general objectives of the ONTOTEXT project (From Text to Knowledge for the Semantic Web) and the activities carried out during the first year of its development cycle. First, we introduce the task of annotating large amounts of textual data (e.g. data available on the Web or in local document collections), focusing on its importance for enhancing the interoperability of such data through ontology-based reasoning. Then, we discuss the main issues related to the annotation task. These include the choice of an adequate formalism to capture and describe the different types of relevant information contained in a text, and the adaptation of existing language-specific markup formalisms to a new language (Italian, in our case). Finally, we report the results of our experience in annotating information about people and temporal expressions for the Italian Content Annotation Bank (I-CAB), which is being developed at ITC-irst and CELCT.

We present CtxMatch, an algorithm that finds mappings between two heterogeneous, partially overlapping Classification Hierarchies (CHs), e.g. taxonomic structures used to organize documents. CtxMatch relies on the semantic interpretation of both the labels provided in the CHs and their hierarchical structure; it does not consider the content of the classified documents, which makes it applicable to the retrieval of any kind of document (e.g. text files, images, applications, videos). The Web Directories of Google and Yahoo! have been chosen as the evaluation set for discussing the performance of CtxMatch.

Classification Hierarchies (CHs) are widely used to organize documents in a way that makes their retrieval easier. Common examples of CHs are Web directories, marketplace catalogs, and file systems. In this paper we discuss and evaluate CtxMatch, an approach to interoperability that discovers mappings among CHs by considering the semantic interpretation of their nodes. CtxMatch performs linguistic processing of the labels attached to the nodes, including tokenization, part-of-speech tagging, multiword recognition, and word sense disambiguation. We present an evaluation of the overall performance of the approach over Web directories, as well as a systematic analysis of the linguistic modules involved.

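To make the label-interpretation idea concrete, here is a toy sketch of a CtxMatch-style pipeline. It is not the published implementation: the sense inventory (`SENSES`), the hypernym links (`HYPERNYMS`), and every function name are hypothetical stand-ins for WordNet-like resources, and part-of-speech tagging and multiword recognition are omitted. The step comments follow the three steps listed in the meaning-negotiation abstract on this page: a node's meaning is assumed to be the conjunction of the senses of the labels on its root path, ambiguous words prefer senses related to those chosen higher up, and two nodes are compared by checking whether every conjunct of one is entailed by some conjunct of the other.

```python
import re

# Hypothetical toy sense inventory standing in for a WordNet-like resource.
# Sense ids are invented; the first sense listed is the default.
SENSES = {
    "art": ["art.creation"],
    "arts": ["art.creation"],
    "images": ["image.mental", "image.picture"],
    "pictures": ["image.picture"],
    "photography": ["photography.art"],
    "italy": ["italy.country"],
    "europe": ["europe.continent"],
}
# Hypothetical hypernym / part-of links between toy senses.
HYPERNYMS = {
    "image.picture": "art.creation",
    "photography.art": "art.creation",
    "italy.country": "europe.continent",
}
STOPWORDS = {"and", "or", "of", "the"}


def ancestors(sense):
    """The sense itself followed by all of its toy hypernyms."""
    chain = [sense]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def label_senses(label):
    """Step 1: tokenize a label and list candidate senses for each content word."""
    words = [w for w in re.findall(r"[a-z]+", label.lower()) if w not in STOPWORDS]
    return [SENSES.get(w, [w]) for w in words]  # unknown words act as their own sense


def path_concept(path):
    """Step 2: contextualize. A node means the conjunction of the senses of the
    labels on its root path; each ambiguous word prefers a sense related to the
    senses already chosen above it, falling back to its first sense."""
    chosen = []
    for label in path:
        context = {a for s in chosen for a in ancestors(s)}
        for candidates in label_senses(label):
            related = [c for c in candidates if set(ancestors(c)) & context]
            chosen.append(related[0] if related else candidates[0])
    return frozenset(chosen)


def entails(specific, general):
    """True if each conjunct of `general` is a sense, or a hypernym of a sense, in `specific`."""
    return all(any(g in ancestors(s) for s in specific) for g in general)


def match(path_a, path_b):
    """Step 3: compare the contextualized meanings of two nodes."""
    a, b = path_concept(path_a), path_concept(path_b)
    a_in_b, b_in_a = entails(a, b), entails(b, a)
    if a_in_b and b_in_a:
        return "equivalent"
    if a_in_b:
        return "more specific"
    if b_in_a:
        return "more general"
    return "unknown"
```

Here `match(["Art", "Images"], ["Arts", "Pictures"])` returns `"equivalent"`: on its own, "images" would take its first (mental-image) sense, but under "Art" the picture sense is preferred. The set-inclusion test is a deliberate simplification; a fuller treatment would encode label meanings as logical formulas and use a reasoner to compare them.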
This paper describes the main characteristics of the ItalWordNet semantic database, built in the context of the SI-TAL Italian National Project, within which a set of integrated resources and tools for the automatic processing of the Italian language was developed. The database was created by extending the Italian wordnet developed within the EuroWordNet project, adding: i) adjectives, adverbs and proper nouns (not dealt with in EuroWordNet); ii) a terminological subset related to the economic-financial domain. The changes these extensions entail, both in the linguistic model and in the data structure, are also illustrated. In particular, we discuss: i) the overall architecture of the database; ii) the semantic relations used to encode information on synsets; iii) the changes made to the EuroWordNet Top Ontology structure; iv) the specific characteristics of the terminological subset and the solutions adopted to link it to the generic wordnet.

Most of the data stored in the Semantic Web is organized in schema models, which can be represented as labeled graphs whose labels are short natural language expressions. Examples of schema models include ER schemas, automata, ontologies, taxonomies, and Web Directories. The semantics of schema models is not explicit but is hidden in their structures and labels. To obtain semantic interoperability, we need to make their semantics explicit by taking into account both the interpretation of the labels and the structures described by the arcs. We propose a methodology for interpreting schema models on the basis of the taxonomic relations and the linguistic material they contain. We rely on a set of linguistic repositories, such as WordNet, and explore a number of crucial linguistic issues, such as the disambiguation of polysemous words, multiwords, and coordinations. The Web Directories of Google and Yahoo! have been chosen as an evaluation set. We show that there is a considerable amount of information to be made explicit, and discuss the performance of an implementation of our analysis.

Hierarchical classifications are concept hierarchies used to organize large amounts of documents. File systems, product taxonomies for marketplaces, and the directories provided by Web portals are common examples of hierarchical classifications. We propose a methodology for building a semantic interpretation of hierarchical classifications on the basis of an analysis of the taxonomic relations and the linguistic material they contain. We provide a formal semantics for hierarchical classifications and use it to interpret the implicit knowledge they represent. Relevant phenomena addressed include the disambiguation of polysemous words, the semantics of multiwords, and the interpretation of coordinations. We report on experiments performed on the Web Directories of Google and Yahoo!.

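The label phenomena listed above (multiwords, coordinations, and the role of the path from the root) can be illustrated with a small formula builder. This is a sketch under explicit assumptions, not the formal semantics defined in the paper: a node is read as the conjunction of the formulas of the labels on its root path; a coordination ("and", "or", or a comma) inside a label is read as a disjunction, on the plausible reading that a node such as "Arts and Entertainment" classifies documents about either topic; an expression found in the hypothetical `MULTIWORDS` list stays a single atom, while any other word sequence is read as a conjunction.

```python
import re

# Hypothetical toy list of multiword expressions that must not be split.
MULTIWORDS = {"new york", "united states", "web directories"}
COORDINATORS = {"and", "or"}


def _join(parts, op):
    """Join sub-formulas with an operator, parenthesizing when there is more than one."""
    return parts[0] if len(parts) == 1 else "(" + f" {op} ".join(parts) + ")"


def span_formula(words):
    """A coordination-free span: a known multiword is one atom, otherwise the
    words are read as a conjunction (e.g. 'italian restaurants')."""
    phrase = " ".join(words)
    if phrase in MULTIWORDS:
        return phrase.replace(" ", "_")
    return _join(words, "&")


def label_formula(label):
    """Split a (non-empty) label on coordinators and commas and read the items
    as a disjunction (an assumption of this sketch)."""
    spans, current = [], []
    for w in re.findall(r"[a-z]+", label.lower().replace(",", " and ")):
        if w in COORDINATORS:
            if current:
                spans.append(current)
            current = []
        else:
            current.append(w)
    if current:
        spans.append(current)
    return _join([span_formula(s) for s in spans], "|")


def node_formula(path):
    """A node denotes the conjunction of the formulas of all labels on its root path."""
    return _join([label_formula(label) for label in path], "&")
```

For example, `node_formula(["Arts and Entertainment", "Photography"])` yields `((arts | entertainment) & photography)`. Word senses are not resolved here; in a fuller pipeline each atom would be replaced by a disambiguated concept.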
This paper describes an automatic algorithm of meaning negotiation that enables semantic interoperability between local, overlapping, and heterogeneous ontologies. Rather than reconciling differences between heterogeneous ontologies, the algorithm searches for mappings between concepts of different ontologies. It is composed of three main steps: (i) computing the linguistic meaning of the labels occurring in the ontologies via natural language processing; (ii) contextualizing this linguistic meaning by considering the context, i.e. the ontology, in which a label occurs; (iii) comparing the contextualized linguistic meanings of two ontologies in order to find a possible match between them.

There is increasing interest in linguistic ontologies (e.g. WordNet) for a variety of content-based tasks, including conceptual indexing, word sense disambiguation, and cross-language information retrieval. A relevant contribution in this direction is represented by linguistic ontologies with domain-specific coverage, which are crucial for the development of concrete application systems. This paper goes a step further towards the interoperability of specialized linguistic ontologies by addressing the problem of their integration with global ontologies. This scenario is simpler than the general problem of merging ontologies, since it allows a strong precedence criterion to be defined, so that terminological information overshadows generic information whenever conflicts arise. We assume the EuroWordNet model and propose a methodology to "plug" specialized linguistic ontologies into global ontologies. Experimental data are provided for an implemented algorithm, which has been tested on a global and a specialized linguistic ontology for the Italian language.

This paper describes the main characteristics of the ItalWordNet semantic database, built within the SI-TAL Italian National Project. The database was created by extending the Italian wordnet developed within the EuroWordNet project, adding i) adjectives, adverbs and proper nouns (not covered by EuroWordNet); ii) a terminological subset related to the economic-financial domain. The changes these extensions entail, both in the linguistic model and in the data structure, are illustrated.

Although generic (i.e. domain-independent) and specialized (i.e. domain-specific) lexical resources are usually developed with different aims, many NLP-based applications seem to require consulting them in an integrated way. In this paper we describe an integration procedure based on the definition of plug-in relations, which are established to manage overlaps and inconsistencies between the two resources. The approach has been tested by connecting ItalWordNet, a generic lexical database for Italian, and Economic-WordNet, a specialized wordnet for the economic-financial domain.
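The precedence criterion described in the plug-in abstracts on this page (terminological information overshadows generic information whenever the two conflict) can be sketched with two toy wordnets and a table of plug-in links. Everything below is hypothetical: the synset ids, the lemmas, and the two relation kinds ("synonym", where a specialized synset replaces a generic one, and "hyponym", where a specialized root is attached below a generic synset) are illustrative assumptions, not the actual plug-in relations or data of ItalWordNet and Economic-WordNet.

```python
# Hypothetical toy data: synset ids and lemmas are invented for illustration.
GENERIC_WORDS = {
    "g:organization": {"organization"},
    "g:bank": {"bank"},
    "g:central_bank": {"central bank"},
    "g:riverbank": {"bank", "shore"},
    "g:asset": {"asset"},
    "g:security": {"security", "safety"},
}
GENERIC_HYPERNYM = {"g:bank": "g:organization", "g:central_bank": "g:bank"}

SPEC_WORDS = {
    "s:bank": {"bank", "credit institution"},
    "s:investment_bank": {"investment bank"},
    "s:security": {"security", "stock"},
}
SPEC_HYPERNYM = {"s:investment_bank": "s:bank"}

# Plug-in links (hypothetical relation names): specialized synset -> (relation, generic synset).
PLUG_INS = {
    "s:bank": ("synonym", "g:bank"),
    "s:security": ("hyponym", "g:asset"),
}


def overshadowed():
    """Map each generic synset that has a specialized synonym to that synonym."""
    return {g: s for s, (rel, g) in PLUG_INS.items() if rel == "synonym"}


def lookup(word):
    """Senses of `word` in the integrated view: specialized synsets first, then
    the generic synsets not overshadowed by a specialized synonym."""
    hidden = overshadowed()
    spec = sorted(s for s, lemmas in SPEC_WORDS.items() if word in lemmas)
    gen = sorted(g for g, lemmas in GENERIC_WORDS.items()
                 if word in lemmas and g not in hidden)
    return spec + gen


def hypernym(synset):
    """Parent of `synset` in the integrated view, or None at a root."""
    hidden = overshadowed()
    if synset in SPEC_WORDS:
        if synset in SPEC_HYPERNYM:            # specialized structure wins
            return SPEC_HYPERNYM[synset]
        rel, target = PLUG_INS.get(synset, (None, None))
        if rel == "hyponym":                   # specialized root hangs below target
            return target
        if rel != "synonym":                   # unplugged specialized root
            return None
        parent = GENERIC_HYPERNYM.get(target)  # inherit the replaced synset's parent
    else:
        parent = GENERIC_HYPERNYM.get(synset)
    # A generic parent replaced by a specialized synonym is redirected to it.
    return hidden.get(parent, parent)
```

With these links, `lookup("bank")` returns the specialized financial sense plus the unrelated generic "riverbank" sense, while the generic financial sense that the specialized synset replaces is hidden; likewise, a generic synset such as `g:central_bank`, whose parent was replaced, now points to the specialized synset.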