GIST

Generating InStructional Text

(European Project LRE 062-09)

The GIST project addresses the construction of a multilingual generation system for texts describing administrative procedures (e.g., how to apply for a social security benefit, what to do to obtain a visa) starting from language-independent specifications.
We considered three languages: English, German and Italian.
The application domain of the prototype is the instructions for filling out pension forms.

Project Summary

Application forms are one of the main communication channels between citizens and the Public Administration. Whenever a citizen wants to apply for a document or a benefit, he/she is required to fill out a form, specifying various kinds of information, from personal details to income data. The requested information reflects the current state of the legislation and may be quite complex. For this reason, forms often include instructions that help the applicant fill them out.

Producing clear and effective application forms is a major and ongoing effort for large public institutions. Every time changes are introduced in the legislation on the services offered to citizens or the obligations expected of them, new application forms need to be created or old ones revised. The problem is even more complex in multilingual areas, where public documentation must appear in all the official languages. Administrative agreements between different countries (for example in the pension domain) are another source of multilingual forms, and large-scale immigration could become one in the future.

A possible solution to the problem of producing and maintaining multilingual versions of forms is the use of automatic tools, such as machine translation or multilingual generation systems. The GIST project (LRE 062-09) explored the latter solution, developing a multilingual generation system for the automatic production of texts describing administrative procedures (e.g., the instructions that a citizen has to follow to apply for pension benefits) in three languages: English, Italian and German. The GIST system aims to provide good-quality text drafts, which professional writers and/or translators can then revise and post-edit.

The project consortium included academic and industrial partners (IRST, Trento, Italy; ITRI, University of Brighton, England; ÖFAI, Vienna, Austria; Quinary, Milano, Italy; Universidad Complutense de Madrid, Spain) as well as two user groups that actively collaborated in the specification and evaluation of the system: the Italian National Social Security Institute (INPS) and the Autonomous Province of Bolzano (PAB). A third user group, the British Department of Social Security (DSS), agreed to contribute to the project definition and provided valuable input.

The final prototype of GIST integrates and adapts tools developed in numerous national and international projects, and contributes new components based on theoretical and extensive empirical research. It serves as an aid to public administrators and technical writers in defining and writing formal administrative procedures. We expect the system to significantly shorten and improve the production of this kind of text.


What is Natural Language Generation?

A Natural Language Generation system is a computational tool that automatically "builds" a text (a sequence of sentences) starting from abstract (non-linguistic) specifications.

Given the internal representation of the knowledge sources, the system decides which information is relevant to communicate, organizes it into a coherent text structure, and produces the most appropriate linguistic expressions to convey the message.
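To make these three decisions concrete, here is a minimal Python sketch (GIST itself was written in Common Lisp). It assumes a toy knowledge base of procedure steps; the `Fact` record, its `relevant` flag and the example steps are hypothetical stand-ins for a real knowledge representation. Each function corresponds to one decision: selecting content, structuring it, and realizing it as sentences.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fact:
    """A hypothetical knowledge-base entry describing one step of a procedure."""
    action: str
    obj: str
    order: int       # position of the step in the procedure
    relevant: bool   # whether the applicant needs to be told about this step


KB = [
    Fact("sign", "the form", 2, True),
    Fact("archive", "the form", 4, False),  # an internal office step, not for the applicant
    Fact("complete", "section A", 1, True),
    Fact("send", "the form to the office", 3, True),
]


def select_content(kb: List[Fact]) -> List[Fact]:
    """Content selection: keep only the information worth communicating."""
    return [f for f in kb if f.relevant]


def structure(facts: List[Fact]) -> List[Fact]:
    """Text structuring: order the selected steps into a coherent sequence."""
    return sorted(facts, key=lambda f: f.order)


def realize(fact: Fact) -> str:
    """Surface realization: express one abstract step as an English imperative."""
    return f"{fact.action.capitalize()} {fact.obj}."


def generate(kb: List[Fact]) -> str:
    return " ".join(realize(f) for f in structure(select_content(kb)))


print(generate(KB))
# -> Complete section A. Sign the form. Send the form to the office.
```

Real generators make each of these decisions with far richer knowledge; the sketch only shows how the stages feed into one another.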

What is Multilingual Generation?

A Multilingual Generation system takes as input an abstract representation of a document's contents and "builds" texts in different languages.

Multilingual generation does not involve producing an original text and then translating it: all the texts are produced in parallel.
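A minimal sketch of the idea, under simplifying assumptions: the abstract specification is a hypothetical list of concept pairs, and each language has a toy lexicon and a single sentence pattern. The point it illustrates is that all three texts are computed from the same input, with no intermediate text in any language.

```python
from typing import Dict, List, Tuple

# Hypothetical language-independent specification: (action concept, object concept) pairs.
SPEC: List[Tuple[str, str]] = [("SIGN", "FORM"), ("SEND", "FORM")]

# Hypothetical per-language lexicons mapping abstract concepts to words.
LEXICON: Dict[str, Dict[str, str]] = {
    "en": {"SIGN": "sign", "SEND": "send", "FORM": "the form"},
    "de": {"SIGN": "unterschreiben", "SEND": "senden", "FORM": "das Formular"},
    "it": {"SIGN": "firmare", "SEND": "inviare", "FORM": "il modulo"},
}


def realize_step(lang: str, action: str, obj: str) -> str:
    verb, noun = LEXICON[lang][action], LEXICON[lang][obj]
    if lang == "de":
        # German polite imperative: verb, then "Sie", then the object.
        return f"{verb.capitalize()} Sie {noun}."
    # English imperative and Italian instructional infinitive: verb, then object.
    return f"{verb.capitalize()} {noun}."


def generate_all(spec: List[Tuple[str, str]]) -> Dict[str, str]:
    # Each language is generated directly from the same spec;
    # no text is a translation of another.
    return {lang: " ".join(realize_step(lang, a, o) for a, o in spec) for lang in LEXICON}


for lang, text in generate_all(SPEC).items():
    print(lang, text)
# en Sign the form. Send the form.
# de Unterschreiben Sie das Formular. Senden Sie das Formular.
# it Firmare il modulo. Inviare il modulo.
```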

Current production cycle for administrative documents in multilingual areas

As the reference application scenario, we took the production of bilingual documents (Italian/German) required in the Italian bilingual province of Bolzano.

Currently, all the public documentation that circulates inside the province is first produced in Italian and then translated into German. For documents with national validity, such as laws or pension benefit claim forms, the original Italian version is produced in Rome and then sent to Bolzano for translation or, sometimes, for revision of the translation.

The figure sketches the current information flow in the document production process.


Multilingual document production cycle with automatic generation tools

The new scenario, envisaged with the use of GIST's results, avoids producing a complete Italian version of the document first. The administrative procedure to be communicated is described only once, in abstract terms. Three text drafts are then produced in parallel, in Italian, German and English.

Advantages of Automatic Multilingual Generation

The advantages introduced in our scenario by the automatic generation of multilingual texts include:

* The user of the GIST system is an administrative expert who knows the content of the document.
* Through a graphical interface, the user specifies the content of the document, using menus to specify actions, conditions, entities, and relations between entities.
* The graphical information is mapped into an internal representation and recorded in the Knowledge Base. The generation process then runs on this data.
* The generator extracts the relevant information from the Knowledge Base and arranges it in a structured text plan that satisfies the communicative intentions and coherence requirements. A different text plan is produced for each language.
* The text plans are converted into an internal representation containing the semantic and syntactic information needed to select the most appropriate linguistic expressions in each language.
* The sentences of the output texts are built by components specialized for each language.
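The steps above can be sketched in a few lines of Python, restricted to English output for brevity. The menu format, the `Step` record and the relation names `SEQUENCE`, `STEP` and `CONDITION` are hypothetical illustrations, not GIST's actual Knowledge Base or text-plan formats; in GIST each language gets its own plan and its own generator.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Step:
    """Hypothetical Knowledge Base record for one action of the procedure."""
    action: str
    obj: str
    condition: Optional[str] = None


def from_menus(selections: List[Dict[str, str]]) -> List[Step]:
    """Map (hypothetical) menu selections into Knowledge Base records."""
    return [Step(s["action"], s["object"], s.get("condition")) for s in selections]


def build_text_plan(kb: List[Step]) -> dict:
    """Arrange the steps into a plan: a SEQUENCE whose members may carry a CONDITION."""
    members = []
    for step in kb:
        node = {"relation": "STEP", "content": (step.action, step.obj)}
        if step.condition:
            node = {"relation": "CONDITION", "nucleus": node, "satellite": step.condition}
        members.append(node)
    return {"relation": "SEQUENCE", "members": members}


def realize_en(node: dict) -> str:
    """Toy English realizer that walks the plan recursively."""
    if node["relation"] == "SEQUENCE":
        return " ".join(realize_en(m) for m in node["members"])
    if node["relation"] == "CONDITION":
        main = realize_en(node["nucleus"])
        return f"If {node['satellite']}, {main[0].lower()}{main[1:]}"
    action, obj = node["content"]
    return f"{action.capitalize()} {obj}."


menus = [
    {"action": "fill out", "object": "section A"},
    {"action": "attach", "object": "a copy of your contract", "condition": "you are employed"},
]
plan = build_text_plan(from_menus(menus))
print(realize_en(plan))
# -> Fill out section A. If you are employed, attach a copy of your contract.
```

Keeping the condition as a separate relation in the plan, rather than as ready-made text, leaves each language free to decide how and where to express it.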

The GIST architecture

An overall sketch of the GIST architecture is shown in the figure below. The system consists of four main components: the User Interface, which allows the user to input the content of the message and some global parameters constraining text generation; the Strategic Planner, which builds Text Plans for the three languages; three distinct Tactical Generators, responsible for the linguistic realization of the text plans; and the Knowledge Base, where domain-dependent (Domain Model) and general linguistic (Upper Model) concepts are defined.
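A minimal sketch of how the four components could be wired together, assuming drastically simplified interfaces: the knowledge base is a plain list of concept pairs rather than a LOOM model, `layout` is a hypothetical example of a global parameter set in the User Interface, and each tactical generator is a toy template function. What it shows is the shape of the architecture: one planner producing one plan per language, and one independent generator per language.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # hypothetical (action concept, object concept) pair


@dataclass
class KnowledgeBase:
    """Stores the content entered through the User Interface."""
    steps: List[Step] = field(default_factory=list)


@dataclass
class TextPlan:
    language: str
    steps: List[Step]
    layout: str  # hypothetical global parameter chosen in the User Interface


def strategic_planner(kb: KnowledgeBase, language: str, layout: str) -> TextPlan:
    """Builds a separate text plan for each language (in this toy, a plain copy)."""
    return TextPlan(language, list(kb.steps), layout)


# A tactical generator realizes text plans in exactly one language.
TacticalGenerator = Callable[[TextPlan], str]


def make_generator(lexicon: Dict[str, str], template: str) -> TacticalGenerator:
    def generate(plan: TextPlan) -> str:
        sentences = [template.format(verb=lexicon[a], obj=lexicon[o]) for a, o in plan.steps]
        return ("\n" if plan.layout == "list" else " ").join(sentences)
    return generate


GENERATORS: Dict[str, TacticalGenerator] = {
    "en": make_generator({"SIGN": "Sign", "SEND": "Send", "FORM": "the form"}, "{verb} {obj}."),
    "de": make_generator({"SIGN": "Unterschreiben", "SEND": "Senden", "FORM": "das Formular"},
                         "{verb} Sie {obj}."),
    "it": make_generator({"SIGN": "Firmare", "SEND": "Inviare", "FORM": "il modulo"}, "{verb} {obj}."),
}


def run(kb: KnowledgeBase, layout: str = "prose") -> Dict[str, str]:
    """The planner runs once per language; each plan goes to that language's generator."""
    return {lang: gen(strategic_planner(kb, lang, layout)) for lang, gen in GENERATORS.items()}


kb = KnowledgeBase(steps=[("SIGN", "FORM"), ("SEND", "FORM")])
print(run(kb)["de"])
# -> Unterschreiben Sie das Formular. Senden Sie das Formular.
```

Because each generator only depends on the plan format, a generator for another language could be registered without touching the planner or the other generators.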

Some of the components were built by reusing or adapting existing tools. The Knowledge Base [Fabris et al., 1994] is implemented in the LOOM representation language [MacGregor and Bates, 1987]. A Generalized Upper Model is used, built from existing Upper Models for English/German [Henschel, 1993] and Italian (partly developed within the project [Bateman et al., 1994]). All Tactical Generators are adaptations of existing systems. For English, the project uses the KPML system, developed at IPSI-GMD as an extension of PENMAN [Penman, 1989]. The Italian generator was developed at IRST and is based on a GB-style unification grammar [Pianesi, 1993]. The German generator was developed at ÖFAI and is based on an HPSG grammar implemented in the FUF formalism [Elhadad, 1991]. The language used to describe the input text plan, ESPL, is an extended version, developed within the project, of the Sentence Plan Language (SPL) [Kasper, 1989], which has already been used successfully in a number of generation systems. ESPL includes the features and keywords needed to describe the semantic content and the structure of the text to be produced in all three languages.
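SPL expressions use a Lisp-like notation in which each term pairs a variable with a concept, followed by keyword role fillers such as `:actor` and `:actee`. The Python sketch below builds a nested term of that shape and prints it in that notation. The concept names and the `:mood` feature are hypothetical and are not taken from the ESPL definition.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Term:
    """One SPL-style term: (variable / concept :role filler ...)."""
    var: str
    concept: str
    roles: Dict[str, Union["Term", str]] = field(default_factory=dict)

    def to_spl(self) -> str:
        parts = [f"({self.var} / {self.concept}"]
        for role, filler in self.roles.items():
            # Nested terms are serialized recursively; atomic fillers are written as-is.
            value = filler.to_spl() if isinstance(filler, Term) else filler
            parts.append(f":{role} {value}")
        return " ".join(parts) + ")"


# Hypothetical plan for an instruction such as "Send the pension form."
plan = Term("s1", "send", {
    "actor": Term("a", "applicant"),
    "actee": Term("f", "pension-form"),
    "mood": "imperative",  # hypothetical feature, not from the ESPL specification
})
print(plan.to_spl())
# -> (s1 / send :actor (a / applicant) :actee (f / pension-form) :mood imperative)
```

The same term carries no English, German or Italian words, which is what allows one plan description to drive all three Tactical Generators.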

Other modules were developed specifically for the project. The graphical interface is based on the form layout and provides menu-based online help that informs the author about the alternative items of knowledge available for composing the message. The strategic planner was designed to build the communicative and rhetorical structures incrementally, integrating them with information on cohesive devices.
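The incremental side of this design can be illustrated with a small sketch. It covers only the rhetorical structure: each new step is attached to the growing plan together with its relation to the previous step and an abstract cohesive device, which is mapped to a connective in each language only afterwards. The relation names, device labels and connective table are hypothetical, and the sketch does not reproduce GIST's actual planning algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PlanNode:
    content: str                     # hypothetical label of a Knowledge Base step
    relation: Optional[str] = None   # rhetorical relation to the preceding node
    device: Optional[str] = None     # abstract cohesive device, realized per language later


# Hypothetical mapping from abstract cohesive devices to connectives in each language.
CONNECTIVES: Dict[str, Dict[str, str]] = {
    "NEXT": {"en": "then", "de": "dann", "it": "poi"},
    "ALT": {"en": "alternatively", "de": "alternativ", "it": "in alternativa"},
}
DEVICE_FOR_RELATION = {"sequence": "NEXT", "alternative": "ALT"}


@dataclass
class IncrementalPlanner:
    nodes: List[PlanNode] = field(default_factory=list)

    def add(self, content: str, relation: str = "sequence") -> PlanNode:
        """Append one step; from the second step on, record its relation and a cohesive device."""
        if not self.nodes:
            node = PlanNode(content)
        else:
            node = PlanNode(content, relation, DEVICE_FOR_RELATION[relation])
        self.nodes.append(node)
        return node

    def connectives(self, lang: str) -> List[Optional[str]]:
        """Connective (or None) that would open each node's sentence in the given language."""
        return [CONNECTIVES[n.device][lang] if n.device else None for n in self.nodes]


planner = IncrementalPlanner()
planner.add("complete-section-a")
planner.add("send-by-post")
planner.add("deliver-in-person", relation="alternative")
print(planner.connectives("de"))
# -> [None, 'dann', 'alternativ']
```

Keeping the device abstract in the plan lets each language choose its own connective at realization time.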

The system is implemented in Common Lisp and runs on Sun workstations under Lucid Common Lisp 4.2 or Allegro Common Lisp 4.2. Running the system requires the LOOM 2.0 code and the freely available GINA graphical library.



Prime contractor:

* I.R.S.T. - Istituto per la Ricerca Scientifica e Tecnologica, Istituto Trentino di Cultura, Trento, I

Other partners:

* I.T.R.I. - Information Technology Research Institute, University of Brighton, UK

* Quinary S.p.A., Milano, I

* ÖFAI - Österreichisches Forschungsinstitut für Künstliche Intelligenz, Vienna, A

* Universidad Complutense de Madrid, Madrid, E


Institutions participating in GIST as user groups:

* Autonome Provinz Bozen / Provincia Autonoma di Bolzano, I

* I.N.P.S. - Istituto Nazionale della Previdenza Sociale, I

Contact point:

Fabio Pianesi

I.R.S.T. - via Sommarive - 38050 POVO (Trento), Italia

Phone +39-461-314327

Fax +39-461-314591