TCC Division
Cognitive and Communication Technologies
Via Sommarive, 18 I-38050
Povo-Trento, Italy

Seminars

There are no seminars scheduled at the moment.

 
 
 


Past Seminars in 2003

  • Friday, Oct 3rd 2003, 9.30 a.m.
    Octavian Popescu
    "Lexicalized Parsing for Word Sense Disambiguation"

    In computational linguistics, word sense disambiguation (WSD) has been seen as the cornerstone module of any large-scale natural language processing application. While total skepticism was expressed earlier, a major breakthrough has been achieved in the last decade thanks to corpus studies.
    Nevertheless, the lack of linguistically motivated techniques imposes a relatively low level of effectiveness (the state-of-the-art figure is below 70% for a fine-grained sense lexicon such as WordNet).
    We would like to maintain that the study of sense ambiguity cannot be divorced from the study of phrase structure.
    In particular, we propose a technique, both linguistically motivated and computationally tractable, for decomposing sentences into phrases such that sense ambiguity can be resolved locally.
    One of the main advantages of the proposed technique is that it can be separated into distinct components for which independent solutions can be found. Virtually all of these components have been investigated, and various solutions have been proposed in the literature.
    Our solution is based on many of these ideas and tries to overcome some of their drawbacks.


    Silvia Rossi
    "Bi-Directional Optimality Theory: An Application of Game Theory"

    This presentation is drawn from an article by Paul Dekker and Robert van Rooy, in which they point out some parallels between principles employed in optimality-theoretic interpretation and notions from the well-established field of Game Theory. Optimality-theoretic interpretation can be defined as what they call an "interpretation game", and optimality itself can be viewed as a solution concept for a game. More specifically, optimality can be characterized in terms of the game-theoretical notion of a "Nash Equilibrium".
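
    The notion of a Nash Equilibrium mentioned above can be illustrated with a minimal sketch. It is a generic check for pure-strategy equilibria in a two-player game, not the formalisation used by Dekker and van Rooy. The assumptions that the speaker chooses a form, the hearer chooses a meaning, and both share the invented payoff table below are made purely for illustration.

```python
from itertools import product


def pure_nash_equilibria(speaker_payoff, hearer_payoff):
    """Return all (form, meaning) index pairs from which neither player
    can improve their own payoff by deviating alone."""
    n_forms = len(speaker_payoff)
    n_meanings = len(speaker_payoff[0])
    equilibria = []
    for f, m in product(range(n_forms), range(n_meanings)):
        # The speaker cannot do better by switching form while the meaning stays fixed.
        speaker_best = all(
            speaker_payoff[f][m] >= speaker_payoff[f2][m] for f2 in range(n_forms)
        )
        # The hearer cannot do better by switching meaning while the form stays fixed.
        hearer_best = all(
            hearer_payoff[f][m] >= hearer_payoff[f][m2] for m2 in range(n_meanings)
        )
        if speaker_best and hearer_best:
            equilibria.append((f, m))
    return equilibria


# Hypothetical payoffs (rows = forms, columns = meanings), shared by both players.
payoff = [[2, 0],
          [0, 1]]
equilibria = pure_nash_equilibria(payoff, payoff)
print(equilibria)
```

    In this toy table the two pairs on the diagonal are equilibria: once form and meaning are matched, neither participant gains by changing their choice unilaterally.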

  • Tuesday July 29 2003, 10:30 a.m (Sala Riunioni)
    Marco Guerini
    "Persuasion Models for Automated Multimodal Generation"

Future intelligent interfaces will have contextual goals to pursue. As opposed to more traditional scenarios of Human Computer Interaction, the user interface may also aim at inducing the user, or more generally the audience, to perform some actions in the real world. Some application scenarios are dynamic advertisement, preventive medicine, social action and edutainment. With this in view, we are investigating persuasion mechanisms and how they are connected to related concepts such as natural argumentation. In modelling persuasion we distinguish argumentation as a subpart of it, because persuasion is also concerned with a-rational elements. We take a cognitive approach, considering the state of the participants on the basis of their beliefs, desires and intentions, but also their social relations, their emotions and the context of interaction. In this paper we propose a taxonomy of persuasive strategies and a meta-reasoning model that works on this taxonomy.
In this paper the focus is on the high-level planning of the proposed system: how it is structured and how it is combined with the adoption of appropriate rhetorical strategies (and other elements such as lexical choice) for producing an effective and context-adapted message. The approach also forms the basis of multimodal developments.

  • Wednesday, July 9 2003, 11:00 a.m (Sala Consiglio Scientifico)
    Manuela Speranza
    A Description of SUMO (Suggested Upper Merged Ontology)

In this talk I will describe the Suggested Upper Merged Ontology (SUMO), an upper-level ontology created as part of the IEEE Standard Upper Ontology Working Group.
SUMO has been completely aligned to WordNet and has been proposed as the initial version of an eventual Standard Upper Ontology (SUO).
SUMO was created by merging publicly available ontological content (two ontologies defining very high-level philosophical concepts, i.e. John Sowa's upper-level ontology and Russell and Norvig's upper-level ontology, and a number of ontologies defining lower-level notions) into a single, comprehensive and cohesive structure.
SUMO can be browsed online and can be freely downloaded (the language in which this ontology is expressed is SUO-KIF, a version of the Knowledge Interchange Format, but it is also available in other formats, including DAML, LOOM, XML, and Protege).

  • Wednesday, June 18 2003, 11:00 a.m (Sala Consiglio Scientifico)
    Hristo Tanev
    "Automatic Extraction of Syntactic Patterns for Question Answering"

    This seminar presents work in progress: A machine learning algorithm for extraction of syntactic patterns from parse trees is being developed.
    The algorithm is based on maximal subgraph isomorphism. In the context of Question Answering, the algorithm is being applied to the extraction of lexico-syntactic (and possibly semantic) patterns for answering specific question types. This machine learning algorithm can use different kinds of linguistic information in addition to the syntactic structures. A possible extension of the algorithm with WordNet classes is being considered.
    Apart from the machine learning application, the algorithm can also be applied for quantitative evaluation of the similarity between the question and the answer parse trees.
    Results from some experiments will be presented.

  • Thursday, June 12 2003, hrs 11.00 (Sala Consiglio Scientifico)
    Octavian Popescu
    "Formal Conceptual Analysis - a possible way to NLP"


    Formal Conceptual Analysis (FCA) is a relatively new theory which aims to formalize algebraically the concept of "concept".
    It is assumed that any ontology is determined by the structure of its entities and, moreover, that this structure can be adequately described by analyzing the attributes of its entities. Concepts are defined as relations between entities and their attributes.
    Basically, these relations can be represented in lattice terms. It may be the case that getting to the underlying concepts of an ontology amounts to analysing the structure of the associated lattice.
    I will give an overview of the theoretical framework of FCA and also present the first attempts I have made to apply it to NLP.
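
    As a rough illustration of how concepts arise from entities and their attributes, the sketch below enumerates the formal concepts of a tiny context by brute force. It follows the standard FCA construction, in which a concept pairs a set of entities (the extent) with the set of attributes they all share (the intent). It is not the method presented in the talk, and the entities and attributes are invented. The enumeration is exponential in the number of entities, so it only suits toy examples.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs of a formal context.

    `context` maps each entity to the set of attributes it has.
    """
    objects = list(context)
    all_attributes = set().union(*context.values())

    def common_attributes(objs):
        # Attributes shared by every entity in objs (all attributes if objs is empty).
        attrs = set(all_attributes)
        for o in objs:
            attrs &= context[o]
        return attrs

    def objects_having(attrs):
        # Entities that possess every attribute in attrs.
        return {o for o in objects if attrs <= context[o]}

    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)  # closure of the chosen subset
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


# Hypothetical context, chosen only for illustration.
context = {
    "dog": {"animate", "barks"},
    "cat": {"animate"},
    "stone": {"solid"},
}
concepts = formal_concepts(context)
for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```

    Ordering the resulting pairs by inclusion of their extents gives the lattice referred to in the abstract.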


  • Wednesday, June 4th 2003, 11:00 a.m (Sala Consiglio Scientifico)
    Alfio Gliozzo, ITC-irst
    Acquiring and Exploiting Semantic Domains for NLP Applications

    Recent studies in the area of word sense disambiguation have shown that domain information is very useful for sense discrimination and, in general, for dealing with ambiguity in texts. Moreover, domains constitute a bridge between lexicon and texts, making it possible to handle topicality in a uniform way across applications (such as text categorization, word sense disambiguation and information retrieval).

  • Monday, May 26th 2003, 2.30 p.m. (Sala Consiglio Scientifico)
    Lorenza Romano, ITC-irst
    "Machine Learning for Information Extraction. Learning relationships between entities and events: some recent approaches"

    The task of Information Extraction (IE) is the selective identification of predefined information in natural language texts.
    The type of information extraction performed ranges from simple identification of entities to more complex extraction of relationships between entities and events in which entities participate.
    One of the principal challenges of IE is the efficient customization of a system to new extraction tasks. The core problem is to learn a pattern base and/or text analysis rules for the new tasks.
    In this seminar we present some supervised and unsupervised approaches, focusing on results obtained on the extraction of relationships and events: Crystal (Soderland et al., 1997), WHISK (Soderland 1999), Relational learning (Califf 1998; Freitag 1998; Roth, Yih 2001), ExDisco (Yangarber et al., 2000), ESSENCE (Català et al., 2000).

  • Friday, May 23rd 2003, 11.00 a.m. (Sala Consiglio Scientifico)
    Bogdan Sacaleanu
    German Research Center for Artificial Intelligence, Saarbruecken
    "Learning Co-occurrence Patterns for Enriching WordNets"

    Extending lexical semantic resources calls for different methods, depending on the nature of the lexical information that is to be extended.
    Verbs, for example, are usually extended with subcategorisation frames or selectional preferences, whereas nouns are associated with other nouns or with other concepts through various lexical and conceptual relations.
    We describe a technique for extending the lexical inventory of nouns and consider the task of adding new entries for terms not already in the lexical resource. We approach the extension task through learning lexico-syntactic co-occurrence patterns of context on domain-specific unrestricted text.
    We present a system that uses patterns of lexico-syntactic context to discover semantic similarities between concepts in WordNet and words not currently in WordNet. The hypothesis we based our work on is that words used in similar syntactic contexts with a large overlap of lexical information will be semantically similar.
    In other words, we intend to classify words by means of their contexts, driven by syntactic considerations.
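
    The hypothesis above can be sketched as follows, under strong simplifying assumptions: each word is represented by a set of (syntactic relation, co-occurring word) features, similarity is measured with the Jaccard coefficient, and a new word is attached to the most similar known word. This is not the system described in the talk; the feature sets, relation names and choice of similarity measure are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two feature sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(new_word_contexts, known_contexts):
    """Return the known word whose context features overlap most with the new word's."""
    return max(
        known_contexts,
        key=lambda word: jaccard(new_word_contexts, known_contexts[word]),
    )


# Hypothetical lexico-syntactic contexts: (relation, co-occurring word).
known = {
    "car": {("obj_of", "drive"), ("obj_of", "park"), ("mod", "fast")},
    "apple": {("obj_of", "eat"), ("mod", "ripe")},
}
new_contexts = {("obj_of", "drive"), ("obj_of", "park"), ("mod", "electric")}

best = most_similar(new_contexts, known)
print(best, jaccard(new_contexts, known[best]))
```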

  • Thursday, May 22nd 2003, 11.00 a.m. (Sala Consiglio Scientifico)
    Giorgio Satta
    Department of Information Engineering, University of Padua
    "Statistical extension of parsing strategies for probabilistic context-free grammars"

    In the last decade, several parsing algorithms for natural language processing have been proposed, working with probabilistic formalisms based on context-free grammars. Most of this work assumes that purely symbolic context-free parsing strategies, such as Earley, Left-Corner and LR, can be extended to the probabilistic case, in which the source grammar is a probabilistic context-free grammar. This talk will present new results on the relation between purely symbolic context-free parsing strategies and their probabilistic counterparts.
    I will show that for some well-known parsing strategies, including LR, there are probability distributions, as defined by means of probabilistic context-free grammars, that cannot be preserved. I will also show that preservation of probability distribution is possible under two conditions, viz. the correct-prefix property and the property of strong predictiveness. These results generalize existing results in the literature that were obtained by considering parsing strategies in isolation.
    (work done in collaboration with Mark-Jan Nederhof)

  • Tuesday, May 6th 2003, 11.00 a.m. Conference room (main entrance hall)
    Roberto Pieraccini (Speech Works Int.)
    “Spoken Language Systems: from Research to Industry”


    For several decades spoken language research has been aiming at building machines that would approach human-like skills. Indeed, during the past few years, we have observed a tremendous improvement in the performance of prototypical conversational systems built and maintained by several research institutions.
    On a parallel path, less than ten years ago, the speech recognition and dialog industry started developing robust non-conversational systems. These commercial systems are slowly but relentlessly replacing human agents and touch-tone in high-volume telephone transactions. Both research and industry are aiming at the next step in human-machine communication: multi-modal dialog systems.
    The research and commercial spoken-language technical communities share similar visions and a common cultural background. However, the problems, assumptions, goals, and metrics used are often different.
    In this talk I will give an overview of the issues encountered in commercial spoken language systems, with some details on the core technology, design, development, and evaluation processes. I will also show how and why the success of commercial dialog systems is based not only on high-performance, state-of-the-art technology, but also on a high-quality design of the user interface that derives from a deep understanding of the psychology of the user. Finally, I will show how the speech industry is taking its first steps towards multi-modal applications. The talk will be complemented by several demos.

  • Tuesday, May 6th 2003, 2.30 p.m. (Sala Consiglio Scientifico)
    Christian Hempelmann, Purdue University

    Short seminar: "Paronomasic Puns: Target Recoverability towards Automatic Generation"

  • Thursday, April 3rd 2003, 2.30 p.m. (Sala Consiglio Scientifico)
    Cesare Rocchi, ITC-irst, TCC Division
    "Generation of Video Documentaries from Discourse Structures"


    Recent interest in the use of multimedia presentations and multimodal interfaces has raised the need for the automatic generation of graphics and especially temporal media. In this talk, we introduce an engine to build video documentaries from annotated audio commentaries. The engine, taking into consideration the discourse structure of the commentary, plans the segmentation into shots as well as the camera movements, and decides the transition effects between shots. The output is a complete script of a "video presentation", with instructions for synchronizing images and movements with the audio commentary.
    The language of cinematography and a set of strategies similar to those used in documentaries are the basic resources to plan the animation.


  • Friday, March 21st 2003, 11.00 a.m. (Conference room, main building, entrance)
    Milen Kouylekov, Institute of Mathematics and Informatics, Bulgarian Academy of Sciences
    "CLaRK - an XML-based system for corpora development"

CLaRK is an XML-based software system for corpora development. The main aim behind the design of the system is the minimization of human intervention during the creation of language resources. It incorporates several technologies:
1. XML technology;
2. Unicode;
3. Regular Cascaded Grammars;
4. Constraints over XML Documents.
For document management, storing and querying, we chose XML technology because of its popularity and its ease of understanding. The core of CLaRK is a Unicode XML Editor, which is the main interface to the system. Besides the XML language itself, we implemented an XPath language for navigation in documents and an XSLT language for transformation of XML documents.
For multilingual processing tasks, CLaRK is based on a Unicode encoding of the information inside the system. There is a mechanism for the creation of a hierarchy of tokenisers. They can be attached to the elements in the DTDs, so that different tokenisers apply to different parts of the documents.
The basic mechanism of CLaRK for linguistic processing of text corpora is the cascaded regular grammar processor. The main challenge for the grammars in question is how to apply them to an XML encoding of the linguistic information. The system offers a solution using an XPath language for constructing the input word to the grammar and an XML encoding of the categories of the recognized words.
Several mechanisms for imposing constraints over XML documents are available. These constraints cannot be stated by means of standard XML technology. The following types of constraints are implemented in CLaRK:
1. Regular expression constraints - additional constraints over the content of given elements based on a context;
2. Number restriction constraints - cardinality constraints over the content of a document;
3. Value constraints - restriction of the possible content or parent of an element in a document based on a context.
The constraints are used in two modes: checking the validity of a document against a set of constraints, and supporting the linguist in his/her work during the building of a corpus. The first mode allows the creation of constraints for the validation of a corpus according to given requirements.
The second mode supports the underlying strategy of minimizing human labour.
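
To give a flavour of the first mode (checking a document against a constraint), here is a minimal sketch of a number restriction constraint. It uses Python's standard `xml.etree.ElementTree` module and its limited XPath subset rather than CLaRK itself; the function name `check_cardinality`, the element names `s` and `w`, and the bounds are all hypothetical.

```python
import xml.etree.ElementTree as ET


def check_cardinality(root, context_path, child_tag, min_count, max_count):
    """Return the elements selected by context_path whose number of direct
    child_tag children falls outside [min_count, max_count]."""
    violations = []
    for element in root.findall(context_path):
        count = len(element.findall(child_tag))
        if not (min_count <= count <= max_count):
            violations.append(element)
    return violations


# Hypothetical corpus fragment: sentences (s) containing words (w).
doc = ET.fromstring(
    "<corpus>"
    "<s id='1'><w>CLaRK</w><w>works</w></s>"
    "<s id='2'></s>"
    "</corpus>"
)

# Assumed requirement: every sentence contains between 1 and 50 words.
bad = check_cardinality(doc, ".//s", "w", 1, 50)
bad_ids = [s.get("id") for s in bad]
print(bad_ids)
```

The list of offending elements could equally serve the second mode, by pointing the annotator to the places that still need attention.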

  • Friday, March 14th 2003, 11.00 a.m. (Sala Consiglio Scientifico)
    Roberto Basili, University of Rome Tor Vergata, Department of Computer Science, Systems and Production

    "Learning lexical and conceptual patterns"

    Lexical semantic patterns have a wide impact on a number of linguistic tasks, ranging from Information Extraction and Retrieval to Question Answering and Text Summarization.
    Pattern induction and learning from available resources, i.e. corpora, semantic networks as well as ontological resources, is the only reasonable approach for scaling up to realistic NLP applications.
    When multilinguality is critical, determining the nature and scope of lexical semantic information is even more important as portability of linguistic patterns across texts in different languages requires alignment at different levels (e.g. syntagmatic and semantic).
    When portability across applications and domains is a further requirement, the linguistic bias, i.e. principles for designing the suitable level of linguistic representation, is indeed critical to the quality of learning as well as to its feasibility. Here linguistic information at different levels (e.g. grammatical and word sense information) must be used as newer domains require, in general, creation, adjustments and alignment of ontological resources.
    In this talk, experiences in the acquisition of lexical semantic patterns in scenarios of multilingual Information Extraction, as well as within an ontology engineering task, will be presented. The two cases will also be discussed in light of future applications to QA.


  • Friday, March 7th 2003, 11.00 a.m. (Conference room)
    Ido DAGAN (Computer Science Department – Bar Ilan University)
    "Unsupervised Semantic Learning in Natural Language Processing"

The research field of natural language processing (NLP) has been receiving growing attention in recent years. In particular, the major focus on empirical methods, which learn vast knowledge and inferences from available text collections (corpora), has boosted the feasibility and robustness of language processing techniques and facilitated real-world applications.
This talk will describe an ongoing line of research for unsupervised semantic learning, addressing several disambiguation, inference and discovery tasks. Further, a novel approach to the (essentially supervised) task of text categorization will be presented, which establishes a different category specification scheme based on unsupervised learning, enjoying several practical advantages. I will conclude with some new directions in corpus-based semantic modeling and the roles they open for unsupervised and supervised learning.

  • Tuesday, February 18th 04.00 p.m. (Sala Consiglio Scientifico)
    Laure Vieu, IRIT-CNRS and LOA-ISTC-CNR
    "Rhetorical structure of dialogue and ontology of interaction"

In this talk I will present ongoing work on the extension of standard SDRT (Segmented Discourse Representation Theory) to dialogue. Examples are based on a corpus of route explanation dialogues conducted by telephone. The main issue concerns individuating and giving a semantics to the set of discourse (or rhetorical) relations specific to dialogue (e.g., question-answer pair). Some of these crucially involve the so-called cognitive level, i.e., the mental attitudes of the participants, a feature deliberately avoided in standard SDRT of narratives.
I will thus proceed by introducing a larger topic, the ontology of interaction. If we want to give an adequate semantics to cognitive-level dialogue relations, we must choose well-founded primitives, and this involves questioning the nature of mental attitudes, speech acts, dialogue conventions, etc. More generally, I believe it is time for the ontology of abstract entities, subjective entities, agents and social entities to be seriously addressed in order to improve formal models of interaction, not limited to SDRT.

  • Wednesday, January 13th 2003, - 11:00 a.m. - (Sala Consiglio Scientifico)
    R. Prevete, Università degli Studi di Napoli "Federico II"
    "A Pattern-Based Approach to Question Answering (Q/A) and a Possible Development of Multilingual Q/A Systems"

Q/A aims to build computer systems capable of extracting correct answers, expressed in natural language, from a collection of more or less structured texts, starting from questions that are also expressed in natural language.
The use of "syntactic patterns" in such systems appears to be one of the most convincing methodologies. Our approach has led to the construction of two different types of patterns: Question Patterns (QP) and Answer Patterns (AP). QPs have been used to classify questions according to a predefined taxonomy. APs have proved efficient for extracting a subset of possible answers to a specific question. One of the crucial issues of this approach is precisely the construction, so far carried out manually, of the "right" QPs and APs. Because of the predominantly "syntactic" nature of these patterns, we are studying the possibility of defining semi-supervised learning procedures for the automatic construction of QPs and APs. Moreover, such procedures appear to be partly or wholly independent of the language. This last aspect could be a key point for building multilingual Q/A systems.