Common Crawl Foundation

@CommonCrawl

Followers: 8K
Following: 558
Statuses: 1K

Common Crawl is a non-profit foundation dedicated to the Open Web.

San Francisco, CA
Joined February 2010
@CommonCrawl
Common Crawl Foundation
23 hours
RT @pjox13: I’ll be today at the AI Action Summit in Paris, if you’re attending and want to discuss about @CommonCrawl or about open data,…
0
1
0
@CommonCrawl
Common Crawl Foundation
7 days
We’re happy to share our January/February 2025 newsletter with updates and insights from the world of open data and web archiving.
0
2
4
@CommonCrawl
Common Crawl Foundation
9 days
We are pleased to announce a new release of host-level and domain-level web graphs based on the crawls of November 2024, December 2024, and January 2025. The host-level graph consists of 277.7 million nodes and 2.7 billion edges, and the domain-level graph has 100.8 million nodes and 1.9 billion edges.
0
0
12
@CommonCrawl
Common Crawl Foundation
11 days
0
4
9
@CommonCrawl
Common Crawl Foundation
20 days
We are happy to announce cc-downloader, an experimental command-line tool for downloading Common Crawl data via https:
1
3
19
@CommonCrawl
Common Crawl Foundation
21 days
0
1
5
@CommonCrawl
Common Crawl Foundation
2 months
3
18
108
@CommonCrawl
Common Crawl Foundation
2 months
0
4
18
@CommonCrawl
Common Crawl Foundation
2 months
0
0
6
@CommonCrawl
Common Crawl Foundation
2 months
1
1
12
@CommonCrawl
Common Crawl Foundation
2 months
RT @MarkusKliegl: We are excited to release Nemotron-CC, our high quality Common Crawl based 6.3 trillion tokens dataset for LLM pretrainin…
0
26
0
@CommonCrawl
Common Crawl Foundation
2 months
RT @MLCommons: Announcing the release of AILuminate, a first-of-its kind benchmark to measure the safety of LLMs. The AILuminate v1.0 bench…
0
9
0
@CommonCrawl
Common Crawl Foundation
3 months
1
1
6
@CommonCrawl
Common Crawl Foundation
3 months
0
0
5
@CommonCrawl
Common Crawl Foundation
3 months
RT @occiglot: 📣Community Call Contribute to LLM pre-training resources in (your) unrepresented language! Please submit any websites in t…
0
10
0
@CommonCrawl
Common Crawl Foundation
3 months
1
0
5
@CommonCrawl
Common Crawl Foundation
3 months
We're spinning up another account on the friendly skies. Follow us there if you like as well. 🦋
0
1
2
@CommonCrawl
Common Crawl Foundation
3 months
0
3
17
@CommonCrawl
Common Crawl Foundation
3 months
0
0
2
@CommonCrawl
Common Crawl Foundation
3 months
Reflections on Recent Talks at the Turing Institute and UCL: Thom Vaughan and Pedro Ortiz Suarez discussed the power of Common Crawl’s open web data in driving research and innovation during two notable presentations last week.
0
2
4