SILO Seminar Series: Theo Rekatsinas
February 28 @ 12:30 pm - 1:30 pm
From Dirty Data to Structured Prediction
The advent of data-hungry applications has enabled computers to interpret what they see, communicate in natural language, and answer complex questions. There is a hidden catch, however: all of these state-of-the-art systems rely on high-effort tasks such as data preparation and data cleaning. It is estimated that 70% to 80% of the time devoted to analytics projects is spent checking and organizing data. The challenge is that data collection often introduces dirty data, i.e., incomplete, erroneous, duplicated, or conflicting data records.
In this talk, I discuss how to reason about dirty data and demonstrate how statistical learning is the key to effectively managing large volumes of heterogeneous, noisy data sources. I will present HoloClean, our new system that relies on statistical learning and inference to repair identified data errors and anomalies. Finally, I will conclude by drawing connections between data cleaning and structured prediction, and by showing how these connections lead to new insights and solutions for classical database problems such as data repair and consistent query answering.
SILO is a lecture series in which UW faculty, graduate students, and invited researchers discuss mathematically related topics. The seminars are organized by WID’s Optimization research group.
SILO’s purpose is to provide a forum that helps connect and recruit mathematically minded graduate students. SILO follows a lunch-and-listen format, in which speakers present interesting math topics while the audience eats lunch.