Each module is worth 3 ECTS credits. A total of 10 modules/30 ECTS can be chosen from the following categories:
- 12-15 ECTS credits in technical-scientific modules (TSM)
TSM modules teach profile-specific technical skills and complement the decentralized specialization modules.
- 9-12 ECTS credits in fundamental theoretical principles (FTP)
FTP modules mainly cover theoretical foundations such as mathematics, physics, information theory, chemistry, etc. They broaden students' scientific competence and help create the important synergy between abstract concepts and application that is essential for innovation.
- 6-9 ECTS credits in context modules (CM)
CM modules teach additional skills in areas such as technology management, business administration, communication, project management, patent law, contract law, etc.
The module description (downloadable as a PDF) lists the languages used in each module, in the following categories:
- Instruction
- Documentation
- Examination
This module enables students to understand the main theoretical concepts that are relevant to text and speech processing, and to design applications which, on the one hand, find, classify or extract information from text or speech, and on the other hand, generate text or speech to summarize or translate language data, or in response to user instructions. The module briefly reviews fundamentals of natural language processing from a data science perspective, with emphasis on methods that support recent approaches based on deep learning models. The module emphasizes the origins and rationale of foundation models, which can be fine-tuned, instructed, or given adequate prompts to achieve a wide range of tasks, thus paving the way towards generative artificial intelligence. The module also provides practical knowledge regarding multi-task models for spoken or written input, multilingual models, and interactive systems, as well as practical skills through hands-on exercises using open-source libraries and models, focusing on the rapid prototyping of solutions for a range of typical problems.
The module is divided into four parts. The first part reviews the main concepts of language analysis and then focuses on the representation of words and the uses of bags-of-words, from the vector space model to non-contextual word embeddings with neural networks; applications include document retrieval and text similarity. In the second part, deep learning models for sequences of words are discussed in depth, preceded by a review of statistical sequence models, with application, e.g., to part-of-speech tagging and named entity recognition. The module presents a paradigm based on foundation models with Transformers – encoders, decoders, or both – which can be fine-tuned to various tasks or used for zero-shot learning. The third part surveys neural models for speech processing and synthesis, along with typical tasks, data and evaluation methods. Finally, the module presents methods that enable natural interaction with generative AI systems, including instruction tuning and reinforcement learning from human feedback, along with spoken and written chatbots, concluding with a discussion of the limitations and risks of such systems.
Prerequisites
- Mathematics: basic linear algebra, probability theory (e.g., Bayes' theorem), descriptive statistics, and hypothesis testing.
- Machine learning and deep learning (e.g., classifiers, neural networks), basic notions of natural language processing and information retrieval (e.g., preprocessing and manipulating text data, tokenization, tagging, TF-IDF, query-based text retrieval).
- Programming for data science: good command of Python, ability to handle the entire data science pipeline (data acquisition and analysis, design and training of ML models, evaluation and interpretation of results).
Learning objectives
- The students are able to frame a problem in the domain of text and speech processing and generation. They can relate a new problem to a range of known problems and adapt solutions to their needs.
- The students are able to specify the characteristics of the data and features needed to train and test models, along with the suitable evaluation metrics. Given a language processing problem, they are able to design comparative experiments to identify the most promising solution.
- The students are able to select, among statistical and neural models, the most effective ones for a given task in language or speech processing and generation. Moreover, they know how to select, among existing libraries and pretrained models, the most suitable ones for a given task. The students are aware of the capabilities of foundation models and know how to adapt them to a specific task through additional layers, fine-tuning, or prompt engineering.
Module content
Part I: Words [ca. 20%]
1. Brief review of basic notions of natural language processing: properties of language, speech, and text; subword tokenization, including BPE and SentencePiece; main processing stages, tasks, evaluation metrics, and applications.
2. Text classification and sentiment analysis based on statistical learning with a bag-of-words representation; evaluation metrics for these tasks.
3. Word vectors and their uses: (a) high-dimensional vectors, the vector space model (VSM), and application to document retrieval; (b) low-dimensional vectors, non-contextual word embeddings, LSA, word2vec, FastText, and applications to text similarity (see the sketch below).
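As an illustration of item 3, the following is a minimal sketch of vector-space document retrieval with TF-IDF weights. The library choice (scikit-learn) and the toy data are assumptions made for illustration, not the module's prescribed tooling.

```python
# Minimal TF-IDF retrieval sketch (illustrative; library choice assumed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy document collection.
docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stock markets fell sharply today",
]

# Build high-dimensional TF-IDF vectors, one per document.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)

# Represent the query in the same vector space and rank by cosine similarity.
query_vector = vectorizer.transform(["a cat on a mat"])
scores = cosine_similarity(query_vector, doc_vectors)[0]
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {docs[idx]}")
```

The same vectorizer can also score document-to-document similarity, which is the basis of the text-similarity applications mentioned in item 3.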
Part II: Word Sequences [ca. 35%]
4. Statistical modeling of word sequences for word-level, span-level and sentence-level tasks; application to part-of-speech (POS) tagging, named entity recognition (NER), and parsing; evaluation methods for these tasks.
5. Language modeling, from n-grams to neural networks; sequence-to-sequence models using deep neural networks, RNNs, Transformers; application to machine translation and text summarization; evaluation methods for these tasks.
6. Foundation models: encoders, decoders, and encoder-decoder models; pre-training tasks; adaptation of models to other tasks using additional layers; fine-tuning pre-trained models; few-shot learning in large language models.
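To make item 6 concrete, here is a hedged sketch of adapting a pretrained Transformer encoder to a binary classification task with the Hugging Face transformers library; the model name, labels, and single training step are illustrative assumptions, not a prescribed setup.

```python
# Sketch: adapting a pretrained encoder to a classification task by adding
# a task-specific head and fine-tuning it (Hugging Face transformers assumed;
# the model name and toy batch are placeholders).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"  # placeholder pretrained encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Loads the pretrained encoder and adds a freshly initialized classifier head.
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One fine-tuning step on a toy batch; real training loops over a dataset.
batch = tokenizer(["great movie", "terrible plot"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])
outputs = model(**batch, labels=labels)  # cross-entropy loss from the new head
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```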
Part III: Speech [ca. 20%]
7. Representation and processing of speech with neural networks; statistical models vs. neural architectures based on RNNs and Transformers; CTC architecture; survey of existing frameworks and pretrained models (see the sketch after this list); notions of speech synthesis.
8. Speech processing tasks, benchmark data and evaluation methods; topic detection, information extraction, and speech translation; multilingual systems.
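As a minimal illustration of the pretrained speech models surveyed in item 7, the following sketch transcribes an audio file through the transformers pipeline API; the model name and file path are placeholder assumptions.

```python
# Sketch: transcribing an audio file with a pretrained speech recognition
# model (transformers assumed; model name and file path are placeholders).
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
result = asr("lecture_clip.wav")  # local audio file, placeholder path
print(result["text"])
```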
Part IV: Interaction [ca. 25%]
9. Large language models: survey and emerging capabilities; instruction tuning and reinforcement learning from human feedback (RLHF); prompt engineering (see the sketch after this list).
10. Applications of generative AI; benchmarks with multiple tasks for evaluating foundation models and LLMs; limitations and risks, alignment with human preferences.
11. Spoken and written human-computer interaction: chatbots and dialogue systems.
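To make the prompt engineering of item 9 concrete, here is a hedged sketch of zero-shot prompting of a generative language model via the transformers text-generation pipeline; the model name and prompt are illustrative assumptions, and an instruction-tuned model is assumed for reliable instruction following.

```python
# Sketch: zero-shot prompting of an instruction-tuned LM (transformers
# assumed; model name is a placeholder and requires substantial hardware).
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

prompt = (
    "Instruction: classify the sentiment of the review as positive or negative.\n"
    "Review: The plot was predictable, but the acting saved the film.\n"
    "Answer:"
)
output = generator(prompt, max_new_tokens=10, do_sample=False)
print(output[0]["generated_text"])
```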
Teaching and learning methods
Classroom teaching; programming exercises.
Bibliography
Speech and Language Processing, Daniel Jurafsky and James H. Martin, 2nd edition, Prentice-Hall, 2008 / 3rd edition draft, online, 2023.
Introduction to Information Retrieval, Christopher Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press, 2008.
Neural Network Methods for Natural Language Processing, Yoav Goldberg, Morgan & Claypool, 2017.
Supplemental material (articles) will be indicated for each lesson.
Download the full module description