The seminars will take place every Thursday, usually from 11:00 to 12:00, from December 2006 to June 2007, as specified in the program overview below.
The purpose of the seminars is to:
This page is updated regularly and kept as accurate as possible. However, seminar details may change at short notice, and some seminars may not take place as announced.
For more information, please contact: giuliano [at] itc.it
| Date | Time | Location | Speaker | Affiliation | Title | Type |
|---|---|---|---|---|---|---|
| December 7 | 14:30-15:30 | Sala Grande Est | Bernardo Magnini | ITC-irst | Question Answering at ITC-irst | introduction |
| December 14 | 14:30-15:30 | Sala Consiglio Scientifico | Ido Dagan | Bar Ilan University | Textual entailment as a framework for applied semantics | invited talk |
| December 21 | 14:30-15:30 | Sala Grande Est | Marcello Federico | ITC-irst | Machine Translation at ITC-irst | introduction |
| January 11 | 11:00-12:00 | Sala Grande Est | Alberto Lavelli | ITC-irst | Information Extraction at ITC-irst | introduction |
| January 18 | 11:00-12:00 | Sala Grande Est | Nicola Bertoldi | ITC-irst | Efficient Speech Translation through Confusion Network Decoding | research results |
| January 25 | 11:00-12:00 | Sala Grande Est | Claudio Giuliano | ITC-irst | Relation Extraction and the Effect of Automatic Entity Recognition | research results |
| February 1 | 11:00-12:00 | Sala Grande Est | Lorenza Romano | ITC-irst | Simple Information Extraction (SIE): A Portable and Effective IE System | research results |
| February 15 | 11:00-12:00 | Sala Grande Est | Carlo Strapparava | ITC-irst | Dances with Words | research results |
| February 22 | 11:00-12:00 | Sala Grande Est | Milen Kouylekov | ITC-irst | Recognising Textual Entailment with Tree Edit Distance: Application to Question Answering and Information Extraction | research results |
| March 1 | 11:00-12:00 | Sala Grande Est | Daniele Pighin | FBK-irst | Tree Kernels for Statistical Natural Language Processing | research results |
| March 8 | 11:00-12:00 | Sala Grande Est | Mauro Cettolo | FBK-irst | Handling Word-reordering Phenomena in Statistical Machine Translation | research results |
| March 22 | 11:00-12:00 | Sala Consiglio Scientifico | Deepa Gupta | FBK-irst | POS-based Reordering Models for Statistical Machine Translation | research results |
| March 29 | 11:00-12:00 | Sala Grande Est | Massimiliano Ciaramita | Yahoo! Research | Dependency Parsing with Semantic Information | invited talk |
| April 12 | 11:00-12:00 | Sala Grande Est | Marco Baroni | CIMeC, University of Trento | Building Very Large Corpora from the Web | invited talk |
| May 3 | 11:00-12:00 | Sala Consiglio Scientifico | Alfio Gliozzo | FBK-irst | Semantic Domains and Ontology Learning | research results |
| May 10 | 11:00-12:00 | Sala Consiglio Scientifico | Marcello Federico | FBK-irst | Efficient Handling of N-gram Language Models for Statistical Machine Translation | research results |
| May 17 | 11:00-12:00 | Sala Consiglio Scientifico | Fabio Brugnara | FBK-irst | Speech Recognition at FBK-irst | introduction |
| May 24 | 11:00-12:00 | Sala Grande Est | Charles Callaway | University of Edinburgh | The use of ontologies in knowledge acquisition | invited talk |
| May 31 | 11:00-12:00 | Sala Grande Est | Valentina Bartalesi Lenzi, Manuela Speranza and Rachele Sprugnoli | CELCT & FBK-irst | The Italian Content Annotation Bank (I-CAB) | research results |
| June 7 | 11:00-12:00 | Sala Grande Est | Octavian Popescu | FBK-irst | Cross Document Coreference | research results |
| June 14 | 11:00-12:00 | Sala Grande Est | Daniele Falavigna | FBK-irst | Use of Word Graphs in Automatic Speech Recognition | research results |
December 7, 14:30-15:30, Sala Grande Est
Question Answering at ITC-irst
Bernardo Magnini, ITC-irst
This seminar is intended to provide both an introduction to Open Domain Question Answering and an overview of the research carried out at ITC-irst on this topic in recent years. I will introduce the QA task as defined in the evaluation campaigns that have been running under TREC and CLEF, pointing out achievements and current limitations. The ITC-irst QA systems will be briefly described, as well as the contribution of related research topics such as answer validation, query expansion and automatic acquisition of answer patterns. Finally, I will mention the QALL-ME project, recently started at ITC-irst.
December 14, 14:30-15:30, Sala Consiglio Scientifico
Textual entailment as a framework for applied semantics
Ido Dagan, Bar Ilan University
We have recently proposed Recognizing Textual Entailment (RTE) as a generic task that captures major semantic inferences across different natural language processing applications. The talk will first review the motivation and definition of the textual entailment task and the benchmarks of the PASCAL RTE-1 and RTE-2 Challenges. Then we will demonstrate directions for building textual entailment systems and using them in concrete applications. Furthermore, we suggest that textual entailment modeling may become a comprehensive framework for applied semantics research. Such a framework introduces useful novel variants of known semantic problems and also highlights important new problems that have hardly been investigated so far within computational linguistics. This semantic modeling perspective will be reviewed and illustrated through a case study of an entailment variant of the word sense disambiguation problem.
December 21, 14:30-15:30, Sala Grande Est
Machine Translation at ITC-irst
Marcello Federico, ITC-irst
After a short introduction to statistical machine translation (SMT), I will give an overview of the research carried out and the results achieved at ITC-irst during the last five years. Finally, I will give an outlook on current and future SMT activities.
January 11, 11:00-12:00, Sala Grande Est
Information Extraction at ITC-irst
Alberto Lavelli, ITC-irst
The talk will provide both an introduction to Information Extraction (IE) and an overview of the research carried out in the TCC division on this topic in recent years. Various IE evaluation activities will be described, starting from the MUC conferences. The IE systems developed in the TCC division and the European projects involving the IE group will be briefly described, as well as other contributions to related research topics.
January 18, 11:00-12:00, Sala Grande Est
Efficient Speech Translation through Confusion Network Decoding
Nicola Bertoldi, ITC-irst
In the talk I will introduce the Spoken Language Translation (SLT) task, highlighting the issues that make SLT more complex than text translation. I will present a state-of-the-art system for SLT, which exploits confusion networks as the interface between automatic speech recognition and machine translation. In particular, I will describe a decoding algorithm for confusion networks, obtained by extending a state-of-the-art phrase-based text translation decoder.
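To make the notion of a confusion network concrete, the minimal Python sketch below represents one as a sequence of slots, each holding alternative words with posterior probabilities; an empty word (the symbol `*EPS*` is an assumption of this sketch) lets a slot contribute nothing. It shows how many word choices the network encodes and how a simple consensus hypothesis is read off. The posteriors are invented, and the sketch does not reproduce the decoding algorithm of the talk, which translates the network directly instead of first picking one word per slot.

```python
from math import prod

EPS = "*EPS*"  # hypothetical symbol for the empty word

# A confusion network: a list of slots, each a list of (word, posterior) pairs.
ConfusionNetwork = list[list[tuple[str, float]]]


def num_paths(cn: ConfusionNetwork) -> int:
    """Number of slot-by-slot word choices encoded by the network."""
    return prod(len(slot) for slot in cn)


def consensus(cn: ConfusionNetwork) -> list[str]:
    """Take the highest-posterior word in each slot, then drop empty words."""
    best = (max(slot, key=lambda alt: alt[1])[0] for slot in cn)
    return [word for word in best if word != EPS]


# Invented posteriors for a short utterance.
cn: ConfusionNetwork = [
    [("I", 0.9), ("eye", 0.1)],
    [("would", 0.6), ("wood", 0.3), (EPS, 0.1)],
    [("like", 0.8), ("bike", 0.2)],
    [(EPS, 0.7), ("a", 0.3)],
    [("coffee", 1.0)],
]

print(num_paths(cn))  # 24
print(consensus(cn))  # ['I', 'would', 'like', 'coffee']
```

The compactness is the point: five slots encode 24 alternative transcriptions, which a confusion-network decoder can weigh jointly with the translation model rather than committing to the 1-best ASR output.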
January 25, 11:00-12:00, Sala Grande Est
Relation Extraction and the Effect of Automatic Entity Recognition
Claudio Giuliano, ITC-irst
We present an approach for extracting relations between entities from natural-language documents. The approach is based solely on shallow linguistic processing, such as tokenization, sentence splitting, part-of-speech tagging and lemmatization. It uses a combination of kernel functions to integrate two different information sources: (i) the whole sentence where the relation appears, and (ii) the local contexts around the interacting entities. We present the results of experiments on extracting five different types of relations from a data set of newswire documents and show that each information source provides a useful contribution to the recognition task. Moreover, we performed a set of experiments to assess the influence of the accuracy of named-entity recognition on the performance of the relation extraction algorithm. These experiments used both the "correct" named entities (i.e., those manually annotated in the corpus) and the "noisy" named entities (i.e., those produced by a machine-learning-based named-entity recognizer). The results show that the approach is robust with respect to the noise introduced by a named-entity recognizer. Finally, where a comparison is possible, our approach significantly improves on previous results obtained on the same data set.
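The idea of integrating two information sources through a combination of kernels can be sketched in a few lines of Python. In this hedged illustration both sources are compared with a plain bag-of-words kernel, each kernel is normalized, and the two are summed; the `sentence` and `local` fields and the bag-of-words kernels themselves are assumptions of the sketch, not the kernels used in the work described above.

```python
import math
from collections import Counter
from typing import Callable

Tokens = list[str]
Kernel = Callable[[Tokens, Tokens], float]


def bow_kernel(a: Tokens, b: Tokens) -> float:
    """Dot product of bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    return float(sum(n * cb[w] for w, n in ca.items()))


def normalized(k: Kernel, x: Tokens, y: Tokens) -> float:
    """Scale k so that k(x, x) == 1, keeping sub-kernels on a comparable range."""
    denom = math.sqrt(k(x, x) * k(y, y))
    return k(x, y) / denom if denom else 0.0


def combined_kernel(x: dict, y: dict) -> float:
    """Sum of a whole-sentence kernel and a local-context kernel.

    A sum of valid kernels is itself a valid kernel, so each information
    source adds its own similarity term.
    """
    return (normalized(bow_kernel, x["sentence"], y["sentence"])
            + normalized(bow_kernel, x["local"], y["local"]))


x = {"sentence": "John Smith works for Acme Corp".split(), "local": ["works", "for"]}
y = {"sentence": "Mary Jones is employed by Foo Ltd".split(), "local": ["employed", "by"]}
print(combined_kernel(x, x))  # 2.0
print(combined_kernel(x, y))  # 0.0
```

Normalizing before summing keeps one source (for example, long sentences with many shared tokens) from dominating the other; such a combined kernel can then be plugged into any kernel machine, such as an SVM.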
February 1, 11:00-12:00, Sala Grande Est
Simple Information Extraction (SIE): A Portable and Effective IE System
Lorenza Romano, ITC-irst
In this talk we present SIE (Simple Information Extraction), an information extraction system designed and developed in the context of the IST-Dot.Kom project (2002-2005). SIE is a supervised modular system based on a general-purpose machine learning algorithm (SVM) combined with several customizable modules, and was designed with the goal of being easily and quickly portable across languages, tasks, and domains. A crucial role in the architecture is played by Instance Filtering, which increases efficiency without reducing effectiveness. Results obtained by SIE on several standard data sets, representative of different tasks and domains, are reported.
February 15, 11:00-12:00, Sala Grande Est
Dances with Words
Carlo Strapparava, ITC-irst
Animated text is an appealing field of creative graphical design. Manually designed text animation is widely employed in advertising, movie titles and web pages. In this talk we propose to link, through state-of-the-art NLP techniques, the detection of the affective content of a piece of text to the animation of the words in the text itself. This methodology allows us to automatically generate affective text animation and opens new perspectives for many Internet applications.
February 22, 11:00-12:00, Sala Grande Est
Recognising Textual Entailment with Tree Edit Distance: Application to Question Answering and Information Extraction
Milen Kouylekov, ITC-irst
This work addresses the problem of Recognizing Textual Entailment (i.e. recognizing that the meaning of one text entails the meaning of another) using a Tree Edit Distance algorithm between the syntactic trees of the two texts. A key aspect of the approach is the estimation of the costs of the edit operations (i.e. insertion, deletion, substitution) on words. Our aim is to compare the contribution of different resources providing entailment rules, including lexical rules from WordNet and the UniAlberta thesaurus, and syntactic rules automatically acquired by the DIRT and TEASE systems. We carried out a number of experiments on the PASCAL-RTE dataset in order to estimate the contribution of different combinations of the available resources. In addition, we have developed and evaluated an Answer Validation module for Question Answering and a Relation Extraction system, both based on textual entailment.
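The tree edit distance idea can be illustrated with the textbook recursive formulation over ordered forests, memoized so that small trees are handled quickly. In this minimal Python sketch the unit costs and the threshold-based `entails` rule are hypothetical placeholders: in the work above, estimating these costs from resources such as WordNet or DIRT is precisely the key contribution, and it is not reproduced here.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
Tree = tuple

DEL_COST = 1.0  # hypothetical uniform costs; the talk instead estimates
INS_COST = 1.0  # edit costs from entailment resources


def sub_cost(a: str, b: str) -> float:
    return 0.0 if a == b else 1.0


@lru_cache(maxsize=None)
def forest_dist(f: tuple, g: tuple) -> float:
    """Edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0.0
    if not g:  # only deletions remain: remove the rightmost root of f
        v = f[-1]
        return forest_dist(f[:-1] + v[1], g) + DEL_COST
    if not f:  # only insertions remain: add the rightmost root of g
        w = g[-1]
        return forest_dist(f, g[:-1] + w[1]) + INS_COST
    v, w = f[-1], g[-1]  # rightmost roots
    return min(
        forest_dist(f[:-1] + v[1], g) + DEL_COST,                 # delete v
        forest_dist(f, g[:-1] + w[1]) + INS_COST,                 # insert w
        forest_dist(f[:-1], g[:-1]) + forest_dist(v[1], w[1])     # map v to w
        + sub_cost(v[0], w[0]),
    )


def tree_edit_distance(t: Tree, h: Tree) -> float:
    return forest_dist((t,), (h,))


def entails(t: Tree, h: Tree, threshold: float = 1.0) -> bool:
    """Hypothetical decision rule: T entails H if H is cheap to derive from T."""
    return tree_edit_distance(t, h) <= threshold


T = ("saw", (("John", ()), ("dog", (("the", ()), ("black", ())))))
H = ("saw", (("John", ()), ("dog", (("the", ()),))))
print(tree_edit_distance(T, H))  # 1.0 (delete "black")
print(entails(T, H))             # True
```

Because the rules from lexical and syntactic resources enter only through the cost functions, the same search can be reused unchanged while different resource combinations are compared.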
March 1, 11:00-12:00, Sala Grande Est
Tree Kernels for Statistical Natural Language Processing
Daniele Pighin, FBK-irst
Statistical classifiers are widely used for many NLP tasks, including Information and Relation Extraction. The learning algorithms employed are typically trained on a combination of the linguistic information available for the text fragments under study. However, (a) such lexical, morphological and syntactic features generally have to be defined on a per-language basis, and (b) the selection and representation of features can be hampered by the lack of a sound interpretation of the underlying linguistic phenomenon. Tree kernel functions alleviate these problems, as they enable automatic feature selection and evaluate the similarity between two parse tree fragments without requiring the explicit design and extraction of an attribute-value representation of the encoded texts. Simple manipulations of the parse tree fragments can also be performed, resulting in very accurate models for specific classification tasks.
March 8, 11:00-12:00, Sala Grande Est
Handling Word-reordering Phenomena in Statistical Machine Translation
Mauro Cettolo, FBK-irst
In machine translation (MT), one of the main problems to handle is word reordering. A word is "reordered" when it and its translation occupy different positions within the corresponding sentences. In statistical MT, word reordering is addressed from two points of view: constraints and modeling. Constraints are introduced to limit the exponential number of possible word reorderings. Models, also known as distortion models, measure the plausibility of the allowed reorderings. In this talk, I present an overview of some of the reordering constraints and models widely employed in current MT systems.
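The two points of view above can each be illustrated with a textbook example. The Python sketch below counts how a simple window constraint (no word may move more than `max_jump` positions) shrinks the n! possible orderings, and computes the distance-based distortion cost commonly used in phrase-based decoders, which sums |start_i - end_(i-1) - 1| over consecutive phrases. Both are chosen here for illustration and may differ from the specific constraints and models surveyed in the talk.

```python
from itertools import permutations
from typing import Optional


def count_reorderings(n: int, max_jump: Optional[int] = None) -> int:
    """Count orderings of n source words; with max_jump, no word may move
    more than max_jump positions from where it started."""
    return sum(
        1
        for perm in permutations(range(n))
        if max_jump is None or all(abs(src - tgt) <= max_jump for tgt, src in enumerate(perm))
    )


def distortion_cost(spans: list[tuple[int, int]]) -> int:
    """Distance-based distortion: sum of |start_i - end_(i-1) - 1| over the
    source spans (inclusive, 0-based) of the phrases, taken in target order."""
    cost, prev_end = 0, -1
    for start, end in spans:
        cost += abs(start - prev_end - 1)
        prev_end = end
    return cost


print(count_reorderings(6))               # 720, i.e. 6!
print(count_reorderings(6, max_jump=1))   # 13
print(distortion_cost([(0, 1), (2, 3)]))  # 0 (monotone)
print(distortion_cost([(2, 3), (0, 1)]))  # 6 (phrases swapped)
```

Even a window of one position cuts 720 orderings down to 13, which is why decoders combine a hard constraint that prunes the search space with a soft model that ranks what remains.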
March 22, 11:00-12:00, Sala Consiglio Scientifico
POS-based Reordering Models for Statistical Machine Translation
Deepa Gupta, FBK-irst
We present a novel word reordering model for phrase-based statistical machine translation. In particular, the reordering of nouns, verbs and adjectives is modeled by exploiting inverse alignments, which take into account the distances between source as well as target words. The reordering model proved particularly effective for language pairs whose linguistic structures differ, namely Japanese-English and German-English. The proposed model was applied as a set of additional feature functions to rescore N-best translation candidates generated by a statistical machine translation system. Experiments showed relative BLEU score improvements of 4.7-6.2% on the BTEC Japanese-to-English task and 3.9-5.6% on the Europarl German-to-English task.
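Using a model "as a set of additional feature functions to rescore N-best translation candidates" usually means adding weighted feature values to a log-linear score and re-sorting the list. The Python sketch below shows that mechanism only; the feature names (`tm`, `lm`, `reorder`), their values and the weights are invented for illustration, and the POS-based reordering features of the talk are not reproduced. In practice the weights would be tuned on development data.

```python
Features = dict[str, float]


def loglinear_score(features: Features, weights: Features) -> float:
    """Weighted sum of feature values; features without a weight are ignored."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rescore(nbest: list[dict], weights: Features) -> list[dict]:
    """Return the N-best candidates sorted by log-linear score, best first."""
    return sorted(nbest, key=lambda c: loglinear_score(c["features"], weights), reverse=True)


# Invented candidates: tm = translation model, lm = language model,
# reorder = an extra reordering feature (all log-domain scores).
nbest = [
    {"text": "the station I want to go to",
     "features": {"tm": -2.0, "lm": -8.0, "reorder": -6.0}},
    {"text": "I want to go to the station",
     "features": {"tm": -2.5, "lm": -8.0, "reorder": -1.0}},
]

baseline = {"tm": 1.0, "lm": 1.0}
with_reordering = {**baseline, "reorder": 0.5}

print(rescore(nbest, baseline)[0]["text"])         # the station I want to go to
print(rescore(nbest, with_reordering)[0]["text"])  # I want to go to the station
```

Rescoring leaves the decoder untouched, which makes it a cheap way to test a new model: the extra feature only has to be computed for the N candidates already produced.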
March 29, 11:00-12:00, Sala Grande Est
Dependency Parsing with Semantic Information
Massimiliano Ciaramita, Yahoo! Research
In this talk I will present ongoing research that investigates new design options for the feature space of syntactic dependency parsers. We focus on one of the simplest parsing architectures, based on deterministic shift-reduce algorithms trained with the perceptron. We show that by adopting second-order feature maps, the primal form of the perceptron produces models with accuracy comparable to that of more complex parsers, without the need for approximations. Further, we explore the application of new features extracted from the annotations produced by a semantic tagger. These semantic features yield additional improvements in accuracy and provide the first promising evidence of the usefulness of annotated semantic information for syntactic parsing. We provide standard experimental evaluations on the Wall Street Journal Penn Treebank.
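A second-order feature map can be made explicit by adding every pairwise conjunction of first-order features, after which a plain primal-form multiclass perceptron can choose among the parser's transitions. The Python sketch below shows both pieces under stated assumptions: the action names and feature templates are hypothetical, and the parser state (stack, buffer, arcs) that would produce the features is omitted.

```python
from collections import defaultdict
from itertools import combinations

ACTIONS = ("SHIFT", "LEFT", "RIGHT")  # hypothetical shift-reduce transitions


def second_order(features: list[str]) -> list[str]:
    """Add every pairwise conjunction to the first-order features."""
    pairs = [f"{a}&{b}" for a, b in combinations(sorted(features), 2)]
    return list(features) + pairs


class Perceptron:
    """Multiclass perceptron in primal form: one weight per (feature, action)."""

    def __init__(self, actions=ACTIONS):
        self.actions = actions
        self.weights = defaultdict(float)

    def score(self, feats, action):
        return sum(self.weights.get((f, action), 0.0) for f in feats)

    def predict(self, feats):
        # Ties go to the first action in self.actions.
        return max(self.actions, key=lambda a: self.score(feats, a))

    def update(self, feats, gold):
        """Reward the gold action and penalize a wrong prediction."""
        pred = self.predict(feats)
        if pred != gold:
            for f in feats:
                self.weights[(f, gold)] += 1.0
                self.weights[(f, pred)] -= 1.0
        return pred


feats = second_order(["s0.word=saw", "b0.word=dog", "s0.pos=VBD"])
print(len(feats))                     # 6
model = Perceptron()
print(model.update(feats, "RIGHT"))   # SHIFT (wrong guess, so weights change)
print(model.predict(feats))           # RIGHT
```

Expanding the conjunctions explicitly keeps the model linear, so training and prediction stay as fast as with first-order features, at the cost of a quadratically larger feature set.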
April 12, 11:00-12:00, Sala Grande Est
Building Very Large Corpora from the Web
Marco Baroni, CIMeC, University of Trento
In this talk, I introduce a few initiatives I have recently been involved in (in particular, WaCky and CLEANEVAL) that aim at collecting, pre-processing, annotating and indexing large amounts of textual data crawled from the Web. I first motivate the idea of building corpora from the Web (vs. relying on a commercial search engine to gather linguistic data); then, I describe the Web corpus creation pipeline we developed and briefly present a few applications of our Web corpora. Finally, I discuss what I believe, based on our experience, to be the major challenges in this area for the near future.
May 3, 11:00-12:00, Sala Consiglio Scientifico
Semantic Domains and Ontology Learning
Alfio Gliozzo, FBK-irst
Knowledge acquisition from texts is an old problem in Artificial Intelligence. Recently, due to the increasing interest in the Semantic Web, the research community has been concentrating on learning domain ontologies from texts. Semantic Domains play a crucial role in the acquisition process, as they show many interesting properties that can be exploited to enhance the performance of terminology induction, relation extraction and ontology pruning algorithms. In addition, they allow us to design ontology learning algorithms that work on open-domain corpora, specifying the domain of interest by simply querying the system in an Information Retrieval style.
May 10, 11:00-12:00, Sala Consiglio Scientifico
Efficient Handling of N-gram Language Models for Statistical Machine Translation
Marcello Federico, FBK-irst
Statistical machine translation, as well as other areas of human language processing, has recently pushed toward the use of large-scale n-gram language models. This talk presents efficient algorithmic and architectural solutions that have been tested within the Moses decoder, an open-source toolkit for statistical machine translation. Experiments are reported with a high-performing baseline, trained on the Chinese-English NIST 2006 Evaluation task and running on a standard 64-bit Linux PC. Comparative tests show that our representation reduces the memory required by the SRI LM Toolkit by 58%, at the cost of 44% slower translation. However, as it can take advantage of memory mapping on disk, the proposed implementation seems to scale up much better to very large language models: decoding with a 289-million 5-gram language model runs in 2.1 GB of RAM.
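One general way to trade some speed for memory, in the spirit of the trade-off reported above, is to replace a hash table of n-gram objects with sorted, contiguous arrays searched by binary search. The toy Python sketch below illustrates only that idea, with hypothetical word ids and log-probabilities; it is not the data structure used in the talk, and a compact implementation would go further, for example by packing the keys into flat integer arrays and, as the abstract notes, memory-mapping the tables from disk.

```python
import bisect
from array import array
from typing import Optional

NGram = tuple[int, ...]


class SortedNgramTable:
    """Toy n-gram store: sorted word-id tuples searched with binary search."""

    def __init__(self, logprobs: dict[NGram, float]):
        self.keys = sorted(logprobs)  # tuples sort lexicographically
        # array("f") stores the values as packed 4-byte floats, not Python objects.
        self.values = array("f", [logprobs[k] for k in self.keys])

    def lookup(self, ngram: NGram) -> Optional[float]:
        """Return the stored log-probability, or None if the n-gram is absent."""
        i = bisect.bisect_left(self.keys, ngram)
        if i < len(self.keys) and self.keys[i] == ngram:
            return self.values[i]
        return None


# Hypothetical word ids and log-probabilities.
table = SortedNgramTable({(1,): -1.5, (1, 2): -0.5, (2, 3): -2.25})
print(table.lookup((1, 2)))  # -0.5
print(table.lookup((3, 1)))  # None
```

Each lookup costs O(log n) comparisons instead of a single hash probe, which is the kind of speed penalty that compact representations accept in exchange for a much smaller memory footprint.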
May 17, 11:00-12:00, Sala Consiglio Scientifico
Speech Recognition at FBK-irst
Fabio Brugnara, FBK-irst
This seminar will provide an introduction to the technologies involved in large-vocabulary continuous speech recognition. Topics include hints on acoustic modeling, language modeling and their integration for the purpose of efficiently decoding speech signals, with an outline of the main algorithms and data structures used in a speech recognition system. The challenges raised by large tasks such as the transcription of spontaneous, unconstrained speech will be briefly discussed. The seminar will include a demonstration of the performance of automatic transcription systems developed at FBK-irst on some representative domains.
May 24, 11:00-12:00, Sala Grande Est
The use of ontologies in knowledge acquisition
Charles Callaway, University of Edinburgh
Automatic Knowledge Acquisition (KA) comprises a diverse collection of approaches for creating represented semantic knowledge more efficiently than manual knowledge engineering alone can. Producing an automatic text-based KA system requires solving a number of problems in text processing, ontology learning, formal knowledge representation, knowledge extraction, and verification. In this talk I focus on the scaffolding role that pre-existing and learned ontologies play in text-based KA by presenting examples where no, minimal, and extensive ontologies have different effects on incoming knowledge. I also present a prototype text-based KA system that extracts knowledge from unrestricted text in a concretely represented form, discuss different methods to evaluate its accuracy (both potential and implemented), and describe what ontological processing is sufficient and/or necessary for this KA system.
May 31, 11:00-12:00, Sala Grande Est
The Italian Content Annotation Bank (I-CAB)
Valentina Bartalesi Lenzi, Manuela Speranza and Rachele Sprugnoli, CELCT & FBK-irst
In this talk we will present the Italian Content Annotation Bank (I-CAB), a corpus of Italian news stories annotated with semantic information at three different levels: temporal expressions, different types of entities (i.e. persons, organizations, locations and geo-political entities), and relations between entities (e.g. the affiliation relation connecting a person to an organization). So far, I-CAB has been manually annotated with the first two levels, while the annotation of relations is work in progress. As we intend I-CAB to become a benchmark for various automatic Information Extraction tasks, we followed a policy of reusing already available markup languages. In particular, we adopted the annotation schemes developed for the ACE Entity Detection and Time Expressions Recognition and Normalization tasks, adapting them to the specific morpho-syntactic features of Italian and extending them to include a wider range of entities, such as conjunctions.
June 7, 11:00-12:00, Sala Grande Est
Cross Document Coreference
Octavian Popescu, FBK-irst
We present a general system for Person Cross Document Coreference that combines name frequency estimates with lists of named entities (NEs). The system has three main modules, Person Name Splitter, Local Coreference and Global Coreference, corresponding to three main steps. The first step identifies, for each person name (PN), its type and its first_name and last_name components. This information is used in the second step, where coreference among the names in the same document is established. The output of Local Coreference is a list of names, which is the input of the third module, Global Coreference. Two names from different news stories corefer if their contexts are similar (a context is treated as a bag of words, with no linguistic processing). Each cluster of globally coreferent names represents the names of a single person. These three steps are repeated until no new coreferences are made.
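The Global Coreference step described above (names from different news stories corefer when their bag-of-words contexts are similar, repeated until no new coreferences are made) can be sketched as a simple fixed-point clustering loop. In this Python illustration, the cosine similarity, the 0.5 threshold and the requirement that the two names be identical strings are all assumptions; the abstract does not specify the similarity measure or how names are matched.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def global_coreference(mentions, threshold=0.5):
    """Merge same-name clusters from different documents with similar contexts.

    mentions: list of (doc_id, name, context_words), e.g. from local coreference.
    Repeats until a full pass over all cluster pairs makes no new merge.
    """
    clusters = [{"name": name, "docs": {doc}, "context": Counter(words)}
                for doc, name, words in mentions]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            if (a["name"] == b["name"]
                    and not a["docs"] & b["docs"]
                    and cosine(a["context"], b["context"]) >= threshold):
                a["docs"] |= b["docs"]          # j is absorbed into i
                a["context"] += b["context"]
                del clusters[j]
                merged = True
                break                           # restart with updated clusters
    return clusters


mentions = [
    ("d1", "John Smith", "football match goal coach".split()),
    ("d2", "John Smith", "football coach team goal".split()),
    ("d3", "John Smith", "bank interest rate loan".split()),
]
for cluster in global_coreference(mentions):
    print(cluster["name"], sorted(cluster["docs"]))
# John Smith ['d1', 'd2']
# John Smith ['d3']
```

Merging contexts as clusters grow means a later comparison can succeed where an earlier one failed, which is why the loop runs until a whole pass produces no new coreference.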
June 14, 11:00-12:00, Sala Grande Est
Use of Word Graphs in Automatic Speech Recognition
Daniele Falavigna, FBK-irst
Word graphs are widely used in automatic speech recognition because they can embed the correct recognition hypothesis inside a reduced search space. This makes it possible to adopt complex acoustic and language models for exploring the search space, and to employ decoding techniques aimed at minimizing the word error rate instead of the sentence error rate. In the seminar, word graph generation, word graph expansion and some minimum word error rate decoding methods proposed in the literature will be introduced and discussed. Some preliminary results obtained on speech data acquired during recent European Parliament sessions will be presented.
Claudio Giuliano
FBK-irst - Centro per la Ricerca Scientifica e Tecnologica
via Sommarive, 18
I-38050 Povo (TN), ITALY
email: giuliano [at] itc [dot] it