Loch Prospector: Metadata visualization for lakes of open data

A screenshot from Loch Prospector: a scatterplot representing the datasets in a data lake. Points that lie closer together indicate datasets with similar features.
Loch Prospector visualizes the datasets available in Open Data lakes using four linked components. A multidimensional scaling (MDS) plot (1) shows a point for each dataset, organized spatially by similar metadata characteristics. Weights for the MDS algorithm can be tuned for particular types of metadata using the Visualization Configuration Box (2). Dynamic Filters (4) can be used to explore datasets of interest, with Summary Statistics (3) shown for the currently selected datasets.
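The caption describes a layout in which datasets with similar metadata sit close together and in which the weight of each metadata type can be tuned. As a rough illustration only, the Python sketch below assumes one plausible pipeline: min-max normalize each dataset's metadata vector, compute a weighted Euclidean distance between datasets, and embed those distances in 2D with classical (Torgerson) MDS. This is not Loch Prospector's code. The MDS variant, the distance, the attribute names (row count, column count, fraction of numeric columns), the dataset names, and the weights are all hypothetical. Eigenvectors are found by plain power iteration so the example needs only the standard library; a real tool would more likely call a library eigensolver such as `numpy.linalg.eigh`.

```python
import math
import random


def normalize(features):
    """Min-max scale each metadata attribute to [0, 1] so large-valued
    attributes (e.g. row counts) do not dominate the distance."""
    columns = list(zip(*features))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant attribute carries no information, so map it to 0.
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled)]


def weighted_distances(features, weights):
    """Pairwise weighted Euclidean distance; a larger weight makes that
    metadata attribute matter more in the layout."""
    n = len(features)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sq = sum(w * (a - b) ** 2 for w, a, b in zip(weights, features[i], features[j]))
            dist[i][j] = dist[j][i] = math.sqrt(sq)
    return dist


def top_eigenpairs(matrix, k, iters=500, seed=0):
    """k largest eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    rng = random.Random(seed)
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged vector.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        # Remove this component so the next pass finds the next-largest one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS: double-center the squared distances, then
    embed with the top eigenvectors scaled by sqrt(eigenvalue)."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals the column mean: sq is symmetric
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for k, (lam, vec) in enumerate(top_eigenpairs(b, dims)):
        scale = math.sqrt(max(lam, 0.0))  # guard against tiny negative round-off
        for i in range(n):
            coords[i][k] = vec[i] * scale
    return coords


# Hypothetical metadata per dataset: [row count, column count, fraction of numeric columns].
datasets = {
    "parking_tickets": [120000, 14, 0.50],
    "bike_trips": [98000, 12, 0.60],
    "school_budgets": [300, 40, 0.90],
    "city_budget": [450, 38, 0.85],
}
weights = [1.0, 1.0, 0.5]  # hypothetical: down-weight the column-type attribute

names = list(datasets)
features = normalize([datasets[name] for name in names])
points = classical_mds(weighted_distances(features, weights))
for name, (x, y) in zip(names, points):
    print(f"{name:16s} {x:+.3f} {y:+.3f}")
```

In this toy input the two large, narrow tables land near each other and far from the two small, wide budget tables. Raising an entry of `weights` stretches the layout along that attribute, which is roughly the kind of tuning the caption attributes to the Visualization Configuration Box.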
Abstract
Data lakes are an emerging storage paradigm that promotes data availability over integration. Repositories of Open Data, which show great promise for transparent data science, are a prime example. Because they lack proper integration, data lakes may not have a common, consistent schema, and traditional data management techniques fall short on these repositories. Much recent research has tried to address the new challenges associated with data lakes. Researchers in this area are mainly interested in the structural properties of the data for developing new algorithms, yet typical Open Data portals offer limited functionality in that respect and instead focus on data semantics. We propose Loch Prospector, a visualization to assist data management researchers in exploring and understanding the most crucial structural aspects of Open Data (in particular, metadata attributes), and the associated task abstraction for their work. Our visualization enables researchers to navigate the contents of data lakes effectively and to easily accomplish what were previously laborious tasks. A copy of this paper with all supplemental material is available at osf.io/zkxv9.
Materials
PDF | Preprint | DOI | Homepage | Supplement | Code | Video Preview | Video Presentation | BibTeX
Authors
Neha Makhija
Nikolaos Tziavelis
Laura Di Rocco
Citation
Neha Makhija, Mansi Jain, Nikolaos Tziavelis, Laura Di Rocco, Sara Di Bartolomeo, and Cody Dunne. Loch Prospector: Metadata visualization for lakes of open data. Proc. IEEE Visualization Conference (VIS), 2020. DOI: 10.1109/VIS47514.2020.00032



Khoury Vis Lab — Northeastern University
* West Village H, Room 302, 440 Huntington Ave, Boston, MA 02115, USA
* 100 Fore Street, Portland, ME 04101, USA
* Carnegie Hall, 201, 5000 MacArthur Blvd, Oakland, CA 94613, USA