Filecoin Slingshot Browser

Common Crawl

This phase focuses on storing Common Crawl index data, from the November 2020 crawl (CC-MAIN-2020-50) back to the March 2018 crawl (CC-MAIN-2018-13), as listed at https://index.commoncrawl.org.

For each month, the index files listed in that crawl's cc-index.paths.gz are downloaded, packed into tarballs, and stored on the Filecoin network.
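The parse-and-pack steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used for this dataset: the download prefix `BASE_URL`, the helper names `parse_index_paths` and `pack_tarball`, and the choice of an uncompressed tar archive are all assumptions made for the example. The download step itself is left out so that the sketch runs without network access.

```python
import gzip
import io
import tarfile
from typing import Dict, List

# Assumed public download prefix for Common Crawl data (not stated on this page).
BASE_URL = "https://data.commoncrawl.org/"


def parse_index_paths(paths_gz: bytes, base_url: str = BASE_URL) -> List[str]:
    """Decompress a cc-index.paths.gz payload and return one URL per listed file."""
    text = gzip.decompress(paths_gz).decode("utf-8")
    # Each non-empty line is a path relative to the download prefix.
    return [base_url + line.strip() for line in text.splitlines() if line.strip()]


def pack_tarball(files: Dict[str, bytes]) -> bytes:
    """Pack in-memory files (name -> content) into an uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # Sort by name so the member order is deterministic.
        for name, content in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()
```

In a real run, the bytes passed to `parse_index_paths` would come from fetching the month's cc-index.paths.gz, and each returned URL would be downloaded before being handed to `pack_tarball`. How the files are grouped into tarballs is not described on this page, so that choice is left to the caller.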

To retrieve a particular month's index files, click the corresponding month; the metadata needed for retrieval will then be shown.

Month of Crawl

CC-MAIN-2020-50: November 2020 Index
CC-MAIN-2020-45: October 2020 Index
CC-MAIN-2020-40: September 2020 Index
CC-MAIN-2020-34: August 2020 Index
CC-MAIN-2020-29: July 2020 Index
CC-MAIN-2020-24: May 2020 Index
CC-MAIN-2020-16: March 2020 Index
CC-MAIN-2020-10: February 2020 Index
CC-MAIN-2020-05: January 2020 Index
CC-MAIN-2019-51: December 2019 Index
CC-MAIN-2019-47: November 2019 Index
CC-MAIN-2019-43: October 2019 Index
CC-MAIN-2019-39: September 2019 Index
CC-MAIN-2019-35: August 2019 Index
CC-MAIN-2019-30: July 2019 Index
CC-MAIN-2019-26: June 2019 Index
CC-MAIN-2019-22: May 2019 Index
CC-MAIN-2019-18: April 2019 Index
CC-MAIN-2019-13: March 2019 Index
CC-MAIN-2019-09: February 2019 Index
CC-MAIN-2019-04: January 2019 Index
CC-MAIN-2018-51: December 2018 Index
CC-MAIN-2018-47: November 2018 Index
CC-MAIN-2018-43: October 2018 Index
CC-MAIN-2018-39: September 2018 Index
CC-MAIN-2018-34: August 2018 Index
CC-MAIN-2018-30: July 2018 Index
CC-MAIN-2018-26: June 2018 Index
CC-MAIN-2018-22: May 2018 Index
CC-MAIN-2018-17: April 2018 Index
CC-MAIN-2018-13: March 2018 Index

Please select a month from the list.
Deal ID | Miner ID | Payload CID | Filename | Deal size in bytes | Date | Curated dataset