Embedding visualization toolkit

Professional
Visualization
Tooling
Web-based visualization suite for high-dimensional embedding interpretation — clustering, heatmaps, histograms, archetype analysis.
Published

January 1, 2022

AstraZeneca · Tooling & Infrastructure

Problem

Unsupervised models produce high-dimensional embeddings that domain experts can rarely interpret directly. Without interactive tooling, the structure inside these embeddings stays invisible to the people who need to act on it.

Approach

A web-based visualization suite covering the interpretation paths that come up most often:

  • Clustering — HDBSCAN over UMAP-projected embeddings, with interactive parameter selection.
  • Heatmaps — across embedding dimensions, with row/column reordering by similarity.
  • Histograms — per-dimension distributions, conditioned on cluster or metadata variable.
  • Archetype analysis — surfaces “extreme” examples that anchor cluster identity.
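To make the last two views concrete, here is a minimal sketch using NumPy only. It is not the toolkit's code. It assumes cluster labels are already available from the clustering step, with -1 marking noise points as HDBSCAN does. The function names, the synthetic data, and the rule used for "extreme" examples (the points in each cluster farthest from the global mean) are illustrative assumptions; the toolkit's actual archetype method is not described here.

```python
import numpy as np

NOISE_LABEL = -1  # HDBSCAN convention: unclustered points are labelled -1


def histograms_by_cluster(embeddings, labels, dim, bins=10):
    """Histogram one embedding dimension separately for each cluster.

    All clusters share the same bin edges so their histograms are comparable.
    """
    values = embeddings[:, dim]
    edges = np.histogram_bin_edges(values, bins=bins)
    counts = {}
    for label in np.unique(labels):
        if label == NOISE_LABEL:
            continue
        hist, _ = np.histogram(values[labels == label], bins=edges)
        counts[int(label)] = hist
    return edges, counts


def cluster_archetypes(embeddings, labels, k=3):
    """Return, per cluster, the indices of the k points farthest from the global mean.

    Assumption: this distance rule is a simple stand-in for "extreme" examples.
    """
    global_mean = embeddings.mean(axis=0)
    distances = np.linalg.norm(embeddings - global_mean, axis=1)
    archetypes = {}
    for label in np.unique(labels):
        if label == NOISE_LABEL:
            continue
        members = np.flatnonzero(labels == label)
        # Sort cluster members by descending distance and keep the top k.
        order = np.argsort(distances[members])[::-1][:k]
        archetypes[int(label)] = members[order]
    return archetypes


# Synthetic stand-in data: two well-separated clusters plus a few noise points.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(50, 8))
cluster_b = rng.normal(loc=5.0, scale=1.0, size=(50, 8))
noise = rng.uniform(-10.0, 10.0, size=(5, 8))
embeddings = np.vstack([cluster_a, cluster_b, noise])
labels = np.array([0] * 50 + [1] * 50 + [NOISE_LABEL] * 5)

edges, counts = histograms_by_cluster(embeddings, labels, dim=0, bins=12)
archetypes = cluster_archetypes(embeddings, labels, k=3)
```

Sharing bin edges across clusters keeps the per-cluster histograms directly comparable when plotted side by side. Noise points are skipped in both functions because they do not belong to any cluster.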

Result

The toolkit gave domain experts a way to explore structure that would otherwise stay buried in raw embedding outputs. An R&D lab used it to interpret unsupervised model outputs, including downstream interpretation of the patch-based 3D imaging pipeline and the GNN multiplex immunofluorescence (IF) work.

Stack

Python, HDBSCAN, UMAP, and Dash for the interactive frontend.