Summer school

Textual Analysis and Machine Learning with Applications to Economics and Finance

Event type

5-day training course


Date

19–23 June 2023


Venue

National Bank of Slovakia, 16th floor
Imricha Karvaša 1
813 25 Bratislava


Tutors

Matthieu Picault (University of Orléans)
Thomas Renault (University Paris 1 Panthéon-Sorbonne)


Agenda

Event Program

Language

English


Contact

Martin Cesnak
martin.cesnak@nbs.sk

Pavel Gertler
pavel.gertler@nbs.sk

Course Content

The objective of this course is to study how the millions of texts published on the
Internet and social media every day can improve our understanding of economic and
financial phenomena. After an introduction to the Python programming language, we will
see how to extract online content using existing APIs or purpose-built web scraping
tools. We will build an application that collects articles from a major media site, and
we will use an API to extract tweets from a social network dedicated to finance.

Next, we will see how to analyse a text using Natural Language Processing (NLP) methods and build a full NLP pipeline (cleaning, stop-word removal, Part-of-Speech tagging, Named Entity Recognition, stemming/lemmatization) suited to a given research project. We will apply this pipeline to the press conferences held by the European Central Bank to show how unstructured data can be given structure. The next session will be dedicated to sentiment analysis and will present the main methods (dictionary approach and machine learning), with an application to a database of media articles. The fourth session will be devoted to machine learning using text as data, with an application to StockTwits data (asset pricing). In the last session, we will introduce unsupervised methods of textual analysis (topic modelling and transformers) and apply Latent Dirichlet Allocation to a large corpus of Glassdoor reviews.
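To give a flavour of the pipeline steps and the dictionary approach mentioned above, here is a minimal sketch in plain Python. It is not taken from the course material: the stop-word list, the positive and negative word lists, the function names, and the example sentence are all hypothetical and deliberately tiny. A real project would rely on an established stop-word list and a published sentiment dictionary.

```python
import re

# Hypothetical, deliberately tiny word lists (assumptions for illustration only).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "are", "and", "but", "has"}
POSITIVE = {"growth", "improve", "improved", "strong", "recovery"}
NEGATIVE = {"risk", "risks", "weak", "decline", "uncertainty"}


def tokenize(text):
    """Cleaning step: lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def sentiment_score(tokens):
    """Dictionary approach: (positive - negative) / (positive + negative).

    Returns 0.0 when no token matches either word list.
    """
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


# Hypothetical example sentence, not a real quotation.
sentence = "Growth has improved, but risks to the recovery remain."
tokens = remove_stop_words(tokenize(sentence))
score = sentiment_score(tokens)
print(tokens)
print(score)
```

The score lies between -1 (only negative words matched) and 1 (only positive words matched). Part-of-Speech tagging, Named Entity Recognition, and lemmatization are left out here because they require a dedicated NLP library.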

In each session, we will first present the related theories and methods, in a language accessible to non-mathematicians, together with their latest applications in the economic and financial literature. We will then go through the Python scripts and code needed to carry out the different tasks and share them with the participants. We will also offer participants the opportunity to present their research and/or projects and, where possible, we will assist them with both data collection and data analysis.

Registration form

The deadline for registration is 2 May 2023.

For capacity reasons, not all registrations may be accepted.