Embedding visualization toolkit
Professional
Visualization
Tooling
Web-based visualization suite for high-dimensional embedding interpretation — clustering, heatmaps, histograms, archetype analysis.
AstraZeneca · Tooling & Infrastructure
Problem
Unsupervised models produce high-dimensional embeddings that domain experts can rarely interpret directly. Without interactive tooling, the structure inside these embeddings stays invisible to the people who need to act on it.
Approach
A web-based visualization suite covering the interpretation paths that come up most often:
- Clustering — HDBSCAN over UMAP-projected embeddings, with interactive parameter selection.
- Heatmaps — across embedding dimensions, with row/column reordering by similarity.
- Histograms — per-dimension distributions, conditioned on cluster or metadata variable.
- Archetype analysis — surfaces “extreme” examples that anchor cluster identity.
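The heatmap path reorders rows and columns by similarity. The sketch below uses SciPy's hierarchical clustering and dendrogram leaf order to do this. It assumes NumPy and SciPy, and it assumes rows are examples (or cluster means) and columns are embedding dimensions. The helper names `similarity_order` and `reorder_for_heatmap` are hypothetical. Average linkage with Euclidean distance is one common choice, not necessarily the toolkit's. Correlation distance is a frequent alternative, but it produces non-finite distances for constant rows, and `linkage` rejects those.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def similarity_order(matrix, method="average", metric="euclidean"):
    """Hypothetical helper: return a row order that places similar rows side by side.

    Builds an agglomerative dendrogram over the rows and reads off its leaf order.
    """
    return leaves_list(linkage(matrix, method=method, metric=metric))


def reorder_for_heatmap(matrix):
    """Hypothetical helper: reorder rows and columns independently by similarity.

    Returns the permuted matrix plus the row and column orders, so axis labels
    can be permuted the same way.
    """
    rows = similarity_order(matrix)
    cols = similarity_order(matrix.T)  # cluster dimensions by their profiles across rows
    return matrix[np.ix_(rows, cols)], rows, cols
```

The reordered matrix can then be passed to a Plotly heatmap (for example `plotly.express.imshow`), since Dash renders Plotly figures.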
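The histogram path shows per-dimension distributions conditioned on a cluster or metadata variable. One way to make those distributions comparable is to compute shared bin edges once and normalize each group to a density. The sketch below does that with NumPy only. The helper name `conditioned_histograms` is hypothetical, and shared edges plus density normalization are an assumed design choice rather than a documented one.

```python
import numpy as np


def conditioned_histograms(embeddings, groups, dim, bins=20):
    """Hypothetical helper: histogram one embedding dimension per group.

    All groups share the same bin edges, and each histogram is a density
    (it integrates to 1), so groups of different sizes can be overlaid.
    """
    values = embeddings[:, dim]
    edges = np.histogram_bin_edges(values, bins=bins)  # computed over all points
    hists = {}
    for g in np.unique(groups):
        density, _ = np.histogram(values[groups == g], bins=edges, density=True)
        hists[g.item()] = density  # .item() turns NumPy scalars into plain keys
    return edges, hists
```

With HDBSCAN labels as `groups`, the noise label `-1` simply becomes one more group. It can be dropped beforehand if it clutters the plot.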
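The archetype path surfaces "extreme" examples that anchor a cluster's identity. The sketch below uses a deliberately simple stand-in for this: it scores each cluster member by its projection onto the direction from the global mean to the cluster mean, then keeps the highest-scoring members. This heuristic is an assumption made for illustration. It is not formal archetypal analysis (Cutler and Breiman), which instead fits archetypes as convex combinations of data points, and the page does not say which method the toolkit uses. The helper name `cluster_extremes` is hypothetical.

```python
import numpy as np


def cluster_extremes(embeddings, labels, k=5):
    """Hypothetical heuristic: per cluster, return indices of its k most extreme members.

    "Extreme" here means furthest along the direction that separates the
    cluster mean from the global mean. HDBSCAN noise (label -1) is skipped.
    """
    global_mean = embeddings.mean(axis=0)
    extremes = {}
    for c in np.unique(labels):
        if c == -1:
            continue  # noise points do not define a cluster identity
        idx = np.flatnonzero(labels == c)
        direction = embeddings[idx].mean(axis=0) - global_mean
        norm = np.linalg.norm(direction)
        if norm == 0:
            continue  # cluster mean coincides with global mean: no defining direction
        scores = (embeddings[idx] - global_mean) @ (direction / norm)
        extremes[int(c)] = idx[np.argsort(scores)[::-1][:k]]  # highest scores first
    return extremes
```

The returned indices can link back to the underlying samples (for example image patches), so experts can inspect the examples that most strongly express each cluster.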
Result
The toolkit gave domain experts a way to explore structure that would otherwise stay buried in raw embedding outputs. An R&D lab used it to interpret unsupervised model outputs, including downstream interpretation of the patch-based 3D imaging pipeline and the GNN multiplex IF work.
Stack
Python, HDBSCAN, UMAP, and Dash for the interactive frontend.