14:00 – 15:00 – Patrick Rubin-Delanchy (University of Edinburgh)
Title: What makes a good embedding?
Abstract: Embeddings are continuous vector representations of entities, such as words or nodes, perhaps most widely known for their role in modern AI systems such as large language models.
In this talk I consider a different goal, which is facilitating statistical analysis. An embedding is an instrument which allows us to observe complex, unstructured, or otherwise intractable data, in a way that we can use.
In embeddings, classical (e.g. Gaussian) statistical models are tenable; concepts like similarity, or trend, have a ‘shape’; abstract notions such as political opinion, the health of a patient, the function of a cell, can be made geometric and measurable; and we can uncover truths that could have seemed completely absent from the raw data.
I illustrate these points with new theory connecting statistical models, embeddings and the manifold hypothesis, and with motivating problems in science, security, and recent work with Southmead Hospital in Bristol.
We welcome feedback on our codebase, pyemb, a work in progress implementing these ideas: https://pyemb.github.io/pyemb/html/index.html
Refreshments available between 15:00 and 15:30, Huxley Common Room (HXLY 549)