Past Seminars in
the year
- Friday,
Oct 3rd 2003, 9.30 a.m.
Octavian Popescu
"Lexicalized Parsing for Word Sense Disambiguation"
In computational linguistics, word sense disambiguation (WSD) has been seen as the cornerstone module
of any large-scale natural language processing application. While previously
total skepticism was expressed, in the last decade a major breakthrough
has been achieved thanks to corpus studies.
Nevertheless, the lack of linguistically motivated techniques imposes a
relatively low level of effectiveness (the state-of-the-art figure is below
70% for a fine-grained sense lexicon such as WordNet).
We maintain that the study of sense ambiguity cannot be
divorced from the study of phrase structure.
In particular, we propose a technique, both linguistically motivated and computationally
tractable, for decomposing a sentence into phrases such that
sense ambiguity can be determined locally.
One of the main advantages of the proposed technique is that it is separable
into distinct components for which independent solutions can be found.
Virtually all of these components have been investigated, and various solutions
have been proposed in the literature.
Our solution is based on many of these ideas and tries to overcome some
of their drawbacks.
Silvia Rossi
"Bi-Directional Optimality Theory: An Application of Game Theory"
This presentation is drawn from an article by Paul Dekker and Robert
van Rooy in which they point out some parallels between principles employed
in optimality theoretic interpretation and notions from the well-established
field of Game Theory. Optimality theoretic interpretation can be defined
as what they call an "interpretation game", and optimality
itself can be viewed as a solution concept for a game. More specifically,
optimality can be characterized in terms of the game-theoretical notion
of a "Nash Equilibrium".
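As a rough illustration of the Nash Equilibrium notion mentioned above, the following sketch enumerates the pure-strategy equilibria of a toy interpretation game. It is not the formalization used by Dekker and van Rooy: the two meanings, the two forms, the form costs and the shared payoff function are all assumptions made up for this example.

```python
from itertools import product

# Hypothetical toy game: two meanings, two forms.
MEANINGS = ["m1", "m2"]
FORMS = ["f1", "f2"]
FORM_COST = {"f1": 0.0, "f2": 0.1}  # assumption: f2 is the marked, costlier form


def all_maps(domain, codomain):
    """Enumerate every function from domain to codomain as a dict."""
    return [dict(zip(domain, image))
            for image in product(codomain, repeat=len(domain))]


SPEAKER_STRATEGIES = all_maps(MEANINGS, FORMS)  # meaning -> form
HEARER_STRATEGIES = all_maps(FORMS, MEANINGS)   # form -> meaning


def payoff(speaker, hearer):
    """Shared payoff: 1 per recovered meaning, minus form cost, averaged."""
    total = 0.0
    for meaning in MEANINGS:
        form = speaker[meaning]
        success = 1.0 if hearer[form] == meaning else 0.0
        total += success - FORM_COST[form]
    return total / len(MEANINGS)


def is_nash(speaker, hearer):
    """True if neither player can gain by deviating unilaterally."""
    current = payoff(speaker, hearer)
    best_speaker = max(payoff(s, hearer) for s in SPEAKER_STRATEGIES)
    best_hearer = max(payoff(speaker, h) for h in HEARER_STRATEGIES)
    return current >= best_speaker - 1e-12 and current >= best_hearer - 1e-12


def nash_equilibria():
    """All pure-strategy profiles that are Nash equilibria."""
    return [(s, h)
            for s in SPEAKER_STRATEGIES
            for h in HEARER_STRATEGIES
            if is_nash(s, h)]


if __name__ == "__main__":
    for speaker, hearer in nash_equilibria():
        print(speaker, hearer, round(payoff(speaker, hearer), 3))
```

In this toy setting, the profile in which each meaning has its own form and the hearer decodes accordingly is an equilibrium, but so are some "pooling" profiles in which both meanings are expressed by the cheap form.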
- Tuesday
July 29 2003, 10:30 a.m (Sala Riunioni)
Marco Guerini
"Persuasion
Models for Automated Multimodal Generation"
Future intelligent interfaces will have contextual goals to pursue. As opposed
to more traditional scenarios of Human Computer Interaction, the user
interface may also aim at inducing the user, or in general the audience,
to perform some actions in the real world. Some application scenarios
are dynamic advertisement, preventive medicine, social action and edutainment.
From this perspective we are investigating persuasion mechanisms and how
these are connected to related concepts such as natural argumentation.
In modelling persuasion we treat argumentation as a subpart of
it, because persuasion is also concerned with a-rational elements. We
take a cognitive approach, considering the state of the participants
on the basis of their beliefs, desires and intentions, but also their social
relations, their emotions and the context of interaction. In this paper
we propose a taxonomy of persuasive strategies and a meta-reasoning model
that works on this taxonomy.
The focus is on the high-level planning of the proposed
system: how it is structured and how it is combined with the adoption
of appropriate rhetorical strategies (and other elements such as lexical
choice) for producing an effective and context-adapted message. The approach is
also at the basis of multimodal developments.
- Wednesday,
July 9 2003, 11:00 a.m (Sala Consiglio Scientifico)
Manuela Speranza
A Description of SUMO (Suggested Upper Merged Ontology)
In this talk I will describe the Suggested Upper Merged Ontology (SUMO),
an upper-level ontology created as part of the IEEE Standard Upper Ontology
Working Group.
SUMO has been completely aligned to WordNet and has been proposed as the
initial version of an eventual Standard Upper Ontology (SUO).
SUMO was created by merging publicly available ontological content (two ontologies
defining very high-level philosophical concepts, i.e. John Sowa's upper-level
ontology and Russell and Norvig's upper-level ontology, and a number
of ontologies defining lower-level notions) into a single, comprehensive
and cohesive structure.
SUMO can be browsed online and can be freely downloaded (the language in
which this ontology is expressed is SUO-KIF, a version of the Knowledge
Interchange Format, but it is also available in other formats, including
DAML, LOOM, XML, and Protege).
- Wednesday,
June 18 2003, 11:00 a.m (Sala Consiglio Scientifico)
Hristo Tanev
"Automatic Extraction of Syntactic Patterns for Question Answering"
This seminar presents work in progress: a machine learning algorithm
for extracting syntactic patterns from parse trees is being developed.
The algorithm is based on maximal subgraph isomorphism. In the context
of Question Answering, the algorithm is being applied to extract
lexico-syntactic (and possibly semantic) patterns for answering specific
question types. This machine learning algorithm can use different kinds of linguistic
information in addition to the syntactic structures. A possible
extension of the algorithm with WordNet classes is being considered.
Apart from the machine learning application, the algorithm can also
be applied to the quantitative evaluation of the similarity between the
question and the answer parse trees.
Results from some experiments will be presented.
- Thursday,
June 12 2003, hrs 11.00 (Sala Consiglio Scientifico)
Octavian Popescu
"Formal Conceptual Analysis - a possible way to NLP"
Formal Conceptual Analysis (FCA) is a relatively new theory which aims to formalize
algebraically the concept of "concept".
It is assumed that any ontology is determined by the structure of its
entities and, moreover, that this structure can be adequately described
by analyzing the attributes of its entities. Concepts are defined as relations
between entities and their attributes.
Basically, these relations can be represented in lattice terms. It may
be the case that getting to the underlying concepts of an ontology amounts
to analysing the structure of the associated lattice.
I would like to give an overview of the theoretical framework of FCA and also to
present the first attempts I have made to apply it to NLP.
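To make the idea of concepts as pairings of entities and attributes more concrete, here is a minimal sketch that enumerates the formal concepts of a tiny context by brute force. The context (four words and three attributes) is invented for illustration and is not taken from the talk; the function names are likewise hypothetical, and the exhaustive enumeration is only practical for very small contexts.

```python
from itertools import combinations

# Hypothetical toy context: each object (a word) with its set of attributes.
CONTEXT = {
    "dog": {"animate", "noun"},
    "cat": {"animate", "noun"},
    "stone": {"noun"},
    "run": {"verb"},
}

ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by every object in the set (all attributes if empty)."""
    shared = set(ALL_ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared


def objects_having(attributes):
    """Objects that carry every attribute in the set."""
    return {obj for obj, attrs in CONTEXT.items() if attributes <= attrs}


def formal_concepts():
    """Return all (extent, intent) pairs closed under the two derivations."""
    concepts = set()
    objects = sorted(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            extent = objects_having(intent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    # Print concepts from the smallest extent to the largest.
    for extent, intent in sorted(formal_concepts(), key=lambda c: len(c[0])):
        print(sorted(extent), sorted(intent))
```

Ordering the resulting pairs by inclusion of their extents gives the lattice referred to in the abstract.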
- Wednesday,
June 4th 2003, 11:00 a.m
(Sala Consiglio Scientifico)
Alfio Gliozzo, ITC-irst
Acquiring and Exploiting Semantic Domains for NLP Applications
Recent studies in the word sense disambiguation area have shown that domain
information is very useful for sense discrimination and, in general,
for dealing with ambiguity in texts. Moreover, domains constitute a bridge
between lexicon and texts, making it possible to handle topicality in a uniform
way across applications (such as text categorization, word sense disambiguation
and information retrieval).
- Monday,
May 26th 2003, 2.30 p.m. (Sala Consiglio Scientifico)
Lorenza Romano, ITC-irst
"Machine Learning for Information Extraction. Learning relationships
between entities and events: some recent approaches"
The task of Information Extraction (IE) is the selective identification
of predefined information in natural language texts.
The type of information extraction performed ranges from simple identification
of entities to more complex extraction of relationships between entities
and events in which entities participate.
One of the principal challenges of IE is the efficient customization of
a system to new extraction tasks. The core problem is to learn a pattern
base and/or text analysis rules for the new tasks.
In this seminar we present some supervised and unsupervised approaches,
focusing on results obtained on the extraction of relationships and
events: Crystal (Soderland et al., 1997), WHISK (Soderland 1999), Relational
learning (Califf 1998; Freitag 1998; Roth, Yih 2001), ExDisco (Yangarber
et al., 2000), ESSENCE (Catala` et al., 2000).
- Friday,
May 23rd 2003, 11.00 a.m. (Sala Consiglio Scientifico)
Bogdan Sacaleanu
German Research Center for Artificial Intelligence, Saarbruecken
"Learning Co-occurrence Patterns for Enriching WordNets"
Extension of lexical semantic resources assumes different methods and
is dependent on the nature of lexical information that is to be extended.
Verbs, for example, are usually extended with subcategorisation frames
or selectional preferences, whereas nouns are associated with other
nouns or with other concepts through various lexical and conceptual
relations.
We describe a technique for extending the lexical inventory of nouns
and consider the task of adding new entries for terms not already in
the lexical resource. We approach the extension task through learning
lexico-syntactic co-occurrence patterns of context on domain-specific
unrestricted text.
We present a system that uses patterns of lexico-syntactic context to
discover semantic similarities between concepts in WordNet and words
not currently in WordNet. The hypothesis we based our work on is that
words used in similar syntactic contexts with a large overlap of lexical
information will be semantically similar.
In other words, we intend to classify words by means of their contexts,
driven by syntactic considerations.
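The working hypothesis above (words sharing many lexico-syntactic contexts are semantically similar) can be sketched as a simple overlap measure. This is only an illustration under stated assumptions: the context features, the example words and the use of Jaccard overlap are invented here and are not claimed to be the measure or the data used in the system presented.

```python
# Hypothetical context features: (syntactic relation, co-occurring word) pairs.
KNOWN_WORDS = {
    "car": {("obj_of", "drive"), ("obj_of", "park"), ("mod_by", "fast")},
    "apple": {("obj_of", "eat"), ("obj_of", "peel"), ("mod_by", "ripe")},
}


def jaccard(a, b):
    """Overlap of two feature sets: |a & b| / |a | b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_known_word(new_contexts, known=KNOWN_WORDS):
    """Return the known word whose contexts overlap most with the new word's."""
    return max(known, key=lambda word: jaccard(new_contexts, known[word]))


if __name__ == "__main__":
    # Contexts observed for a word that is assumed not to be in the resource.
    unseen = {("obj_of", "drive"), ("mod_by", "fast"), ("obj_of", "rent")}
    print(closest_known_word(unseen))
```

A real system would presumably weight features and attach the new word to a concept rather than to a single word; the sketch only shows the overlap idea.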
- Thursday,
May 22nd 2003, 11.00 a.m. (Sala Consiglio Scientifico)
Giorgio Satta
Department of Information Engineering, University of Padua
"Statistical extension of parsing strategies for probabilistic
context-free grammars"
In the last decade, several parsing algorithms for natural language
processing have been proposed, working with probabilistic formalisms
based on context-free grammars. Most of these works assume that purely
symbolic context-free parsing strategies, such as Earley, Left-Corner,
LR, etc., can be extended to the probabilistic case, in which the source
grammar is a probabilistic context-free grammar. This talk will present
new results on the relation between purely symbolic context-free parsing
strategies and their probabilistic counterparts.
I will show that for some well-known parsing strategies, including LR,
there are probability distributions, as defined by means of probabilistic
context-free grammars, that cannot be preserved. I will also show that
preservation of probability distribution is possible under two conditions,
viz. the correct-prefix property and the property of strong predictiveness.
These results generalize existing results in the literature that were
obtained by considering parsing strategies in isolation.
(Work done in collaboration with Mark-Jan Nederhof.)
- Tuesday,
May 6th 2003, 11.00 a.m. Conference room (main entrance hall)
Roberto
Pieraccini (Speech Works Int.)
Spoken Language Systems: from Research to Industry
For
several decades spoken language research has been aiming at building
machines that would approach human-like skills. Indeed, during the
past few years, we observed a tremendous improvement in the performance
of prototypical conversational systems built and maintained by several
research institutions.
On a parallel path, less than ten years ago, the speech recognition
and dialog industry started developing robust non-conversational systems.
These commercial systems are slowly but relentlessly substituting
human agents and touch-tone in high-volume telephone transactions.
Both research and industry are aiming at the next step in human-machine
communication: multi-modal dialog systems.
The research and commercial spoken-language technical communities
share similar visions and a common cultural background. However, the
problems, assumptions, goals, and metrics used are often different.
In this talk I will give an overview of the issues encountered in
commercial spoken language systems with some details on the core technology,
design, development, and evaluation processes. I will also show how
and why the success of commercial dialog systems is based not only
on high-performance, state-of-the-art technology, but also on a
high-quality design of the user interface that derives from a deep understanding
of the psychology of the user. Finally, I will show how the speech
industry is taking its first steps towards multi-modal applications.
The talk will be complemented by several demos.
- Tuesday,
May 6th 2003, 2.30 p.m. (Sala Consiglio Scientifico)
Christian Hempelmann, Purdue University
Short
seminar: "Paronomasic Puns: Target Recoverability towards
Automatic Generation"
- Thursday,
April 3rd 2003, 2.30 p.m. (Sala Consiglio Scientifico)
Cesare
Rocchi, ITC-irst, TCC Division
"Generation of Video Documentaries from Discourse Structures"
Recent interest in the use of multimedia presentations and multimodal interfaces
has raised the need for the automatic generation of graphics and especially
temporal media. In this talk, we introduce an engine to
build video documentaries from annotated audio commentaries. The engine,
taking into consideration the discourse structure of the commentary,
plans the segmentation in shots as well as the camera movements
and decides the transition effects among the shots. The output is a
complete script of a "video presentation", with instructions
for synchronizing images and movements with the audio commentary.
The language of cinematography and a set of strategies similar to those
used in documentaries are the basic resources to plan the animation.
- Friday,
March 21st 2003, 11.00 a.m. (Conference room, main building, entrance)
Milen Kouylekov, Institute of Mathematics and Informatics, Bulgarian
Academy of Science
"CLaRK - an XML-based system for corpora development"
CLaRK
is an XML-based software system for corpora development. The main aim
behind the design of the system is the minimization of human intervention
during the creation of language resources. It incorporates several technologies:
1. XML technology;
2. Unicode;
3. Regular Cascaded Grammars;
4. Constraints over XML Documents.
For document management, storing and querying, we chose XML technology
because of its popularity and its ease of understanding. The core of
CLaRK is a Unicode XML Editor, which is the main interface to the system.
Besides the XML language itself, we implemented an XPath language for
navigation in documents and an XSLT language for transformation of XML
documents.
For multilingual processing tasks, CLaRK is based on a Unicode encoding
of the information inside the system. There is a mechanism for the creation
of a hierarchy of tokenisers. They can be attached to the elements in
the DTDs, so that different tokenisers apply to different
parts of the documents.
The basic mechanism of CLaRK for linguistic processing of text corpora
is the cascaded regular grammar processor. The main challenge for the
grammars in question is how to apply them to an XML encoding of the linguistic
information. The system offers a solution using an XPath language for
constructing the input word to the grammar and an XML encoding of the
categories of the recognized words.
Several mechanisms for imposing constraints over XML documents are available.
These constraints cannot be stated using standard XML technology. The
following types of constraints are implemented in CLaRK:
1. Regular expression constraints - additional constraints over the
content of given elements based on a context;
2. Number restriction constraints - cardinality constraints over the
content of a document;
3. Value constraints - restriction of the possible content or parent
of an element in a document based on a context.
The constraints are used in two modes: checking the validity of a document
against a set of constraints, and supporting the linguist in his/her work
during the building of a corpus. The first mode allows the creation
of constraints for the validation of a corpus according to given requirements.
The second mode supports the underlying strategy of minimizing
human labour.
- Friday,
March 14th 2003, 11.00 a.m. (Sala Consiglio Scientifico)
Roberto Basili, University of Rome Tor Vergata, Department of Computer
Science, Systems and Production,
"Learning lexical and conceptual patterns"
Lexical semantic patterns have a wide impact on a number of linguistic
tasks, ranging from Information Extraction and Retrieval to Question
Answering and Text Summarization.
Pattern induction and learning from available resources, i.e. corpora,
semantic networks as well as ontological resources, is the only reasonable
approach for scaling up to realistic NLP applications.
When multilinguality is critical, determining the nature and scope of
lexical semantic information is even more important as portability of
linguistic patterns across texts in different languages requires alignment
at different levels (e.g. syntagmatic and semantic).
When portability across applications and domains is a further requirement,
the linguistic bias, i.e. principles for designing the suitable level
of linguistic representation, is indeed critical to the quality of learning
as well as to its feasibility. Here linguistic information at different
levels (e.g. grammatical and word sense information) must be used as
newer domains require, in general, creation, adjustments and alignment
of ontological resources.
In this talk experiences in the acquisition of lexical semantic patterns
in scenarios of multilingual Information Extraction as well as within
an ontology engineering task will be presented. The two cases will be
also discussed in light of future applications to QA.
- Friday,
March 7th 2003, 11.00 a.m. (Conference room)
Ido DAGAN (Computer Science Department Bar Ilan University)
"Unsupervised Semantic Learning in Natural Language Processing"
The
research field of natural language processing (NLP) has been receiving
growing attention in recent years. In particular, the major focus on
empirical methods, which learn vast knowledge and inferences from available
text collections (corpora), boosted feasibility and robustness of language
processing techniques and facilitated real world applications.
This talk will describe an ongoing line of research for unsupervised
semantic learning, addressing several disambiguation, inference and
discovery tasks. Further, a novel approach to the (essentially supervised)
task of text categorization will be presented, which establishes a different
category specification scheme based on unsupervised learning, enjoying
several practical advantages. I will conclude with some new directions
in corpus-based semantic modeling and the roles they open for unsupervised
and supervised learning.
- Tuesday,
February 18th 04.00 p.m. (Sala Consiglio Scientifico)
Laure Vieu, IRIT-CNRS and LOA-ISTC-CNR
"Rhetorical structure of dialogue and ontology of interaction"
In this talk I will present on-going work on the extension of standard
SDRT (Segmented Discourse Representation Theory) to dialogue. Examples
are based on a corpus of route explanation dialogues conducted by telephone.
The main issue concerns identifying and giving a semantics to the set
of discourse (or rhetorical) relations specific to dialogue (e.g., question-answer
pair). Some of these crucially involve the so-called cognitive level,
i.e., the mental attitudes of the participants, a feature deliberately
avoided in standard SDRT of narratives.
I will thus proceed by introducing a larger topic, the ontology of interaction.
If we want to give an adequate semantics to cognitive-level dialogue
relations, we need to choose well-founded primitives, and this involves
questioning the nature of mental attitudes, speech acts, dialogue conventions,
etc. More generally, I believe it is time that the ontology of abstract
entities, subjective entities, agents and social entities be now seriously
addressed in order to improve formal models of interaction, not limited
to SDRT.
- Wednesday,
January 13th 2003, 11:00 a.m. (Sala Consiglio Scientifico)
R. Prevete, Università degli Studi di Napoli "Federico II"
"A Pattern-Based Approach to Question/Answering (Q/A) and
a Possible Development of Multilingual Q/A Systems"
Q/A aims to build computer systems capable of
extracting correct answers, expressed in natural language, from a
collection of more or less structured texts, starting from questions
that are also expressed in natural language.
The use of "syntactic patterns" in such systems appears to be
one of the most convincing methodologies. Our approach has
led to the construction of two different types of pattern: Question
Patterns (QP) and Answer Patterns (AP). QPs have been used to
classify questions according to a predefined taxonomy. APs
have proved efficient for extracting a subset of possible
answers given a specific question. One of the crucial points of this approach
is precisely the construction, so far carried out manually, of the
"right" QPs and APs. Because of the predominantly "syntactic" nature
of these patterns, we are studying the possibility of
defining semi-supervised learning procedures for the automatic
construction of QPs and APs. Moreover, these
procedures appear to be partly or wholly independent
of the language. This last aspect could be a fundamental point
for building multilingual Q/A systems.