Everyone working with LLM datasets should check out @lilac_ai's data platform.
It embeds your dataset and helps with classifying, clustering, modifying, getting insights, and a lot more. It runs locally or hosted, so even the GPU-poor can use it!
Their clustering helped determine a lot