SliceLens: guided exploration of machine learning datasets

A screenshot showing SliceLens being used for the Census Income dataset. The visualization compares the distributions of a model's predictions for subsets of the dataset created by the features for age, education, and number of hours worked per week.
SliceLens on the Census Income (Adult) dataset. The left sidebar contains the controls for the visualization. The main component is a visualization of the intersections of feature bins, where each square shows the label distribution of one subset of the data. The right sidebar holds notes, which let users document and revisit their findings.
Abstract
SliceLens is a tool for exploring labeled, tabular machine learning datasets. To explore a dataset, the user selects combinations of features that they are interested in. The tool splits those features into bins and then visualizes the label distributions for the subsets of data created by the intersections of the bins. SliceLens guides the user in determining which feature combinations to explore. Guidance is based on a user-selected rating metric, which assigns a score to the subsets created by a given combination of features. The purpose of the metrics is to detect interesting patterns in the subsets, such as subsets that have high label purity or an uneven distribution of errors. SliceLens uses the metrics to guide the user towards combinations of features that create potentially interesting subsets in two ways. First, SliceLens assigns a rating to each feature based on the subsets that would be created by selecting that feature. This incremental guidance can help the user determine which feature to select next. Second, SliceLens can suggest combinations of features ranked according to the chosen metric, which the user can then cycle through.
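
The workflow described in the abstract can be illustrated with a small Python sketch. It is not the SliceLens implementation: the equal-width binning, the entropy-based purity score, and every function name and data value below are assumptions chosen for illustration, since the abstract does not specify how bins or metrics are computed.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log2


def equal_width_bin(values, n_bins=2):
    """Map each numeric value to a bin index in [0, n_bins - 1] (assumed binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant features
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def slice_label_counts(binned, labels, features):
    """Count labels in each subset formed by intersecting the bins of `features`."""
    slices = defaultdict(Counter)
    for i, label in enumerate(labels):
        key = tuple(binned[f][i] for f in features)
        slices[key][label] += 1
    return slices


def entropy(counts):
    """Shannon entropy (in bits) of one subset's label distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def purity_rating(slices):
    """Hypothetical metric: negative size-weighted mean entropy (higher is purer)."""
    n = sum(sum(c.values()) for c in slices.values())
    return -sum(sum(c.values()) / n * entropy(c) for c in slices.values())


def rate_next_features(binned, labels, selected):
    """Incremental guidance: rate each unselected feature as if it were added."""
    return {
        f: purity_rating(slice_label_counts(binned, labels, selected + [f]))
        for f in binned
        if f not in selected
    }


def suggest_combinations(binned, labels, size):
    """Rank every feature combination of the given size, best rating first."""
    scored = [
        (purity_rating(slice_label_counts(binned, labels, list(combo))), combo)
        for combo in combinations(sorted(binned), size)
    ]
    return sorted(scored, reverse=True)


# Made-up toy rows loosely modeled on the Census Income features.
raw = {
    "age": [20, 25, 30, 45, 50, 60],
    "hours": [40, 20, 40, 20, 40, 20],
}
labels = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]
binned = {name: equal_width_bin(values) for name, values in raw.items()}

ratings = rate_next_features(binned, labels, selected=[])
ranked = suggest_combinations(binned, labels, size=1)
```

In this toy data the age bins separate the two labels completely, so `age` receives the higher rating and is ranked first. Other rating metrics, such as one based on the distribution of model errors, could be swapped in for `purity_rating` without changing the rest of the sketch.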
Materials
PDF | Preprint | DOI | Code | BibTeX
Authors
Citation
SliceLens: guided exploration of machine learning datasets

Daniel Kerrigan and Enrico Bertini. Proceedings of the Workshop on Human-In-the-Loop Data Analytics. 2023. DOI: 10.1145/3597465.3605217



Khoury Vis Lab — Northeastern University
* West Village H, Room 302, 440 Huntington Ave, Boston, MA 02115, USA
* 100 Fore Street, Portland, ME 04101, USA
* Carnegie Hall, 201, 5000 MacArthur Blvd, Oakland, CA 94613, USA