The overall objective of the project is to overcome the technological barriers that currently prevent web users from interactively accessing and using large quantities of contemporary Italian language data to improve their language skills. The project is aimed particularly at second-generation emigrants from Italy, who retain Italian as a native language but use it only in a severely limited way, and at third-generation emigrants, who have Italian as a second language (L2).
The project aims to build a large, richly annotated open corpus of written Italian (more than 100 million words). Its novelty lies in building the corpus entirely from texts that are not subject to copyright. As a result, the corpus can be made freely accessible on-line through a user-friendly query interface.
The objective of the project is divided into two main sub-goals:
A) development of language resources in electronic format, such as lexical databases and richly annotated corpora.
(The University of Bologna and CNR Pisa are in charge of work package A.)
B) development of a novel web-based interface to the resulting corpora, providing on-line access to real contexts of use of contemporary Italian.
(The Institute for Specialised Communication and Multilingualism of the European Academy Bozen/Bolzano is in charge of work package B.)
For further information about the project and access to the corpus, visit www.corpusitaliano.it.