![Common Crawl Foundation Profile](https://pbs.twimg.com/profile_images/1668691180157300736/ZojoUCoB_x96.jpg)
Common Crawl Foundation
@CommonCrawl
Followers: 8K
Following: 558
Statuses: 1K
Common Crawl is a non-profit foundation dedicated to the Open Web.
San Francisco, CA
Joined February 2010
RT @pjox13: I’ll be today at the AI Action Summit in Paris, if you’re attending and want to discuss about @CommonCrawl or about open data,…
We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of November and December 2024 and January 2025. The host-level graph consists of 277.7 million nodes and 2.7 billion edges, and the domain-level graph has 100.8 million nodes and 1.9 billion edges.
RT @MarkusKliegl: We are excited to release Nemotron-CC, our high quality Common Crawl based 6.3 trillion tokens dataset for LLM pretrainin…
RT @MLCommons: Announcing the release of AILuminate, a first-of-its kind benchmark to measure the safety of LLMs. The AILuminate v1.0 bench…
RT @occiglot: 📣Community Call Contribute to LLM pre-training resources in (your) unrepresented language! Please submit any websites in t…