Via Sommarive, 18 I-38050
Povo-Trento, Italy

 
 Manuela Speranza
 

Publications


Cognitive and Communication Technologies Division

ITC-irst

Phone: +39 0461 314 521
Fax: +39 0461 302 040
Email: manspera@itc.it
 


2006:

  • Magnini B., Pianta E., Popescu O., Speranza M.
    Ontology Population from Textual Mentions: Task Definition and Benchmark. In: Proceedings of the 2nd Workshop on Ontology Learning and Population (OLP2): Bridging the Gap between Text and Knowledge, Sydney, Australia, 2006.

  • Popescu O., Magnini B., Pianta E., Serafini L., Speranza M., Tamilin A.
    From Mention to Ontology: A Pilot Study. In: Proceedings of the 3rd Italian Semantic Web Workshop on Semantic Web Applications and Perspectives (SWAP 2006), Pisa, Italy, December 18-20, 2006.

  • Magnini B., Cappelli A., Pianta E., Speranza M., Bartalesi Lenzi V., Sprugnoli R., Romano L., Girardi C., Negri M.
    Annotazione di contenuti concettuali in un corpus italiano: I-CAB. In: Proceedings of SILFI 2006 - X Congresso Internazionale della Società di Linguistica e Filologia Italiana, Florence, Italy, June 14-17, 2006.

  • Magnini B., Pianta E., Bartalesi Lenzi V., Girardi C., Negri M., Romano L., Speranza M., Sprugnoli R.
    I-CAB: the Italian Content Annotation Bank. In: Proceedings of LREC 2006, Genova, Italy, May 24-26, 2006.
  • In this paper we present work in progress for the creation of the Italian Content Annotation Bank (I-CAB), a corpus of Italian news annotated with semantic information at different levels. The first level is represented by temporal expressions, the second level is represented by different types of entities (i.e. persons, organizations, locations and geo-political entities), and the third level is represented by relations between entities (e.g. the affiliation relation connecting a person to an organization). So far I-CAB has been manually annotated with temporal expressions, person entities and organization entities. As we intend I-CAB to become a benchmark for various automatic Information Extraction tasks, we followed a policy of reusing already available markup languages. In particular, we adopted the annotation schemes developed for the ACE Entity Detection and Time Expressions Recognition and Normalization tasks. As the ACE guidelines were originally developed for English, part of the effort consisted in adapting them to the specific morpho-syntactic features of Italian. Finally, we have extended them to include a wider range of entities, such as conjunctions.

     

2005:

  • Magnini B., Negri M., Pianta E., Romano L., Speranza M., Sprugnoli R.
    From Text to Knowledge for the Semantic Web: the ONTOTEXT Project. SWAP-2005, Trento, Italy, 14-16 December, 2005.
  • This paper presents the general objectives of the ONTOTEXT project (From Text to Knowledge for the Semantic Web), and the activities carried out during the first year of its development cycle. First, the task of annotating huge amounts of textual data (e.g. those available on the Web or in local document collections) will be introduced, focusing on its importance in order to enhance the interoperability of such data through ontology-based reasoning. Then, the main issues related to the annotation task will be discussed. These include the choice of an adequate formalism to capture and describe different types of relevant information contained in a text, and the adaptation of existing language-specific markup formalisms to a new language (Italian in our case). Finally, the results of our experience in the concrete annotation of information about people and temporal expressions for the Italian Content Annotation Bank (I-CAB) being developed at ITC-irst and CELCT will be reported.
    [PDF document]

     

2004:

  • Bernardo Magnini, Luciano Serafini, Manuela Speranza.
    Semantic Coordination for Document Retrieval. KI 18(4): 18-23 (2004).
  • We present CtxMatch, an algorithm that finds mappings between two heterogeneous, partially overlapping Classification Hierarchies (CHs), e.g. taxonomic structures used to organize documents. CtxMatch relies on the semantic interpretation of both the labels provided in the CHs and the hierarchical structures of the CHs; it does not consider the content of classified documents, thus allowing the retrieval of any kind of document (e.g. text files, images, applications, videos, etc.). The Web Directories of Google and Yahoo! have been chosen as an evaluation set for discussing the performance of CtxMatch.

  • Bernardo Magnini, Manuela Speranza, Christian Girardi.
    A Semantic-based Approach to Interoperability of Classification Hierarchies: Evaluation of Linguistic Techniques. In: Proceedings of COLING-2004, Geneva, Switzerland, August 23 - 27, 2004.
  • Classification Hierarchies (CHs) are widely used to organize documents in a way that makes their retrieval easier. Common examples of CHs are Web directories, marketplace catalogs, and file systems. In this paper we discuss and evaluate CtxMatch, an approach to interoperability that discovers mappings among CHs considering the semantic interpretation of their nodes. CtxMatch performs linguistic processing of the labels attached to the nodes, including tokenization, Part of Speech tagging, multiword recognition and word sense disambiguation. We present an evaluation of the overall performance of the approach over Web directories as well as a systematic analysis of the linguistic modules involved.
    [PDF document]

     

2003:

  • A. Roventini, A. Alonge, F. Bertagna, N. Calzolari, J. Cancila, C. Girardi, B. Magnini, R. Marinelli, M. Speranza, A. Zampolli.
    ItalWordNet: Building a Large Semantic Database for the Automatic Treatment of the Italian Language. In Zampolli, A., Calzolari, N., Cignoni, L. (eds.), Computational Linguistics in Pisa, Special Issue of Linguistica Computazionale, Vol. XVIII-XIX, Istituto Editoriale e Poligrafico Internazionale, Pisa-Roma, 2003.
  • This paper describes the main characteristics of the ItalWordNet semantic database, built in the context of the SI-TAL Italian National Project, within which a set of integrated resources and tools for the automatic treatment of the Italian language was realized. The database was created by extending the Italian wordnet developed within the EuroWordNet project, by adding: i) adjectives, adverbs and proper nouns (not dealt with in EuroWordNet); ii) a terminological subset related to the economic-financial domain. The relevant changes involved by these extensions both in the linguistic model and in the data structure are also illustrated. In particular we discuss: i) the overall architecture of the database; ii) the semantic relations used to encode information on synsets; iii) the changes made to the EuroWordNet Top Ontology structure; iv) the specific characteristics of the terminological subset and the solutions adopted to link it to the generic wordnet.

  • Bernardo Magnini, Luciano Serafini, Manuela Speranza.
    Making explicit the Semantics Hidden in Schema Models. In: Proceedings of the Workshop on Human Language Technology for the Semantic Web and Web Services, held at ISWC-2003, Sanibel Island, Florida, October 20 - 23, 2003.
  • Most of the data stored in the Semantic Web is organized in schema models, which can be represented as labeled graphs where labels are short natural language expressions. Examples of schema models include ER-schema automata, ontologies, taxonomies, and Web Directories. The semantics of schema models is not explicit but is hidden in their structures and labels. To obtain semantic interoperability we need to make their semantics explicit by taking into account both the interpretation of the labels and the structures described by the arcs. We propose a methodology for interpreting schema models on the basis of the taxonomic relations and the linguistic material they contain. We rely on a set of linguistic repositories, such as WordNet, and explore a number of crucial linguistic issues such as disambiguation of polysemous words, multiwords, and coordinations. The Web Directories of Google and Yahoo! have been chosen as an evaluation set. We show that there is a considerable amount of information to be made explicit and discuss the performance of an implementation of our analysis.
    [PS document]

  • Bernardo Magnini, Luciano Serafini, Manuela Speranza.
    Making explicit the Hidden Semantics of Hierarchical Classifications. In: Atti dell'Ottavo Congresso Nazionale AI*IA, Pisa, Italy, September 23 - 26, 2003.
  • Hierarchical classifications are concept hierarchies used to organize large amounts of documents. File systems, product taxonomies for the marketplace and the directories provided by Web portals are common examples of hierarchical classifications. We propose a methodology for building a semantic interpretation of hierarchical classifications on the basis of the analysis of the taxonomic relations and the linguistic material they contain. We provide a formal semantics for hierarchical classifications and use it to interpret the implicit knowledge represented. Relevant phenomena addressed include the disambiguation of polysemous words, the semantics of multiwords, and the interpretation of coordinations. We report on experiments performed on the Web Directories of Google and Yahoo!.
    [PS document]

     

2002:

  • Bernardo Magnini, Luciano Serafini, Manuela Speranza.
    Using NLP Techniques for Meaning Negotiation. In: Proceedings of VIII Convegno AI*IA, Siena, Italy, September 10 - 13, 2002.
  • This paper describes an automatic algorithm of meaning negotiation that enables semantic interoperability between local overlapping and heterogeneous ontologies. Rather than reconciling differences between heterogeneous ontologies, this algorithm searches for mappings between concepts of different ontologies. The algorithm is composed of three main steps: (i) computing the linguistic meaning of the label occurring in the ontologies via natural language processing, (ii) contextualization of such a linguistic meaning by considering the context, i.e. the ontologies, where a label occurs; (iii) comparing contextualized linguistic meaning of two ontologies in order to find a possible matching between them.
    [PS document]

  • Bernardo Magnini, Luciano Serafini, Manuela Speranza.
    Linguistic Based Matching of Local Ontologies. In: Working notes of MeaN-02 (Workshop held in conjunction with AAAI-2002) , Edmonton, Alberta, Canada, July 28 - August 1, 2002.
  • This paper describes an automatic algorithm of meaning negotiation that enables semantic interoperability between local overlapping and heterogeneous ontologies. Rather than reconciling differences between heterogeneous ontologies, this algorithm searches for mappings between concepts of different ontologies. The algorithm is composed of three main steps: (i) computing the linguistic meaning of the label occurring in the ontologies via natural language processing, (ii) contextualization of such a linguistic meaning by considering the context, i.e. the ontologies, where a label occurs; (iii) comparing contextualized linguistic meaning of two ontologies in order to find a possible matching between them.
    [PS document]

  • Bernardo Magnini, Manuela Speranza.
    Merging Global and Specialized Linguistic Ontologies. In: Proceedings of Ontolex 2002 (Workshop held in conjunction with LREC-2002), Las Palmas, Canary Islands, Spain, May 27-31, 2002.
  • There is an increasing interest in linguistic ontologies (e.g. WordNet) for a variety of content-based tasks, including conceptual indexing, word sense disambiguation and cross-language information retrieval. A relevant contribution in this direction is represented by linguistic ontologies with domain specific coverage, which are a crucial topic for the development of concrete application systems. This paper tries to go a step further in the direction of the interoperability of specialized linguistic ontologies, by addressing the problem of their integration with global ontologies. This scenario poses some simplifications with respect to the general problem of merging ontologies, since it allows a strong precedence criterion to be defined, so that terminological information overshadows generic information whenever conflicts arise. We assume the EuroWordNet model and propose a methodology to "plug" specialized linguistic ontologies into global ontologies. Experimental data related to an implemented algorithm, which has been tested on a global and a specialized linguistic ontology for the Italian language, are provided.
    [PDF document]

  • A. Roventini, A. Alonge, F. Bertagna, N. Calzolari, R. Marinelli, B. Magnini, M. Speranza, A. Zampolli.
    ItalWordNet: a Large Semantic Database for the Automatic Treatment of the Italian Language. In: Proceedings of the First Global WordNet Conference, Mysore, India, January 21-25, 2002.
  • This paper describes the main characteristics of the ItalWordNet semantic database, built within the SI-TAL Italian National Project. The database was created by extending the Italian wordnet developed within the EuroWordNet project by adding i) adjectives, adverbs and proper nouns (not dealt with within EuroWordNet); ii) a terminological subset related to the economic-financial domain. The relevant changes involved by these extensions both in the linguistic model and in the data structure are illustrated.
    [Word document]

     

2001:

  • Bernardo Magnini, Manuela Speranza.
    Integrating Generic and Specialized Wordnets. In: Proceedings of the Euroconference RANLP 2001, Tzigov Chark, Bulgaria, September 5-7, 2001.
  • Although generic (i.e. domain independent) and specialized (i.e. domain specific) lexical resources are usually developed with different aims, an integrated consultation seems to be necessary for many NLP based applications. In this paper we describe an integration procedure based on the definition of plug-in relations that are established to manage overlaps and inconsistencies between the two resources. The approach was tested by connecting ItalWordNet, a generic lexical database for Italian, and Economic-WordNet, a specialized wordnet for the economic-financial domain.
    [Word document]