Common Crawl maintains a free, open repository of web crawl data that can be used by anyone.
The corpus holds over 250 billion pages collected over 17 years and has been free and open since 2007. It has been cited in over 10,000 research papers and grows by 3–5 billion new pages each month.
| Organization Type: | Non-profit / charity / foundation |
|---|---|
| Status: | Active |
| Founded: | 2007 |
| Open Source: | Yes |
| Last Modified: | 11/24/2024 |
| Added on: | 4/29/2024 |