Your friendly data-collecting service in the Tor network

Welcome to CollecTor, your friendly data-collecting service in the Tor network. CollecTor fetches data from various nodes and services in the public Tor network and makes it available to the world. If you're doing research on the Tor network, or if you're developing an application that uses Tor network data, this is your place to start.

What is in the data?

The Tor network data provided here currently comes from five different sources (each of which is explained in more detail on a separate page):

  1. Relays and directory authorities publish relay descriptors, so that clients can select relays for their paths through the Tor network.
  2. Bridges and the bridge authority publish bridge descriptors that are used by censored clients to connect to the Tor network.
  3. The bridge distribution service BridgeDB publishes bridge pool assignments describing which bridges it has assigned to which distribution pool.
  4. The exit list service TorDNSEL publishes exit lists containing the IP addresses of relays that it found when exiting through them.
  5. The performance measurement service Torperf publishes performance data from making simple HTTP requests over the Tor network.

Where do I get the data?

We have over 10 years of Tor network data available for download in monthly tarballs. The latest tarballs are updated every few days. So, if you want to fetch data covering an extended period of time, monthly tarballs are for you. Just be careful: these tarballs can decompress to 20 times the compressed size or even more. Monthly tarballs can be browsed and downloaded in the archive/ subdirectory.
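Because a tarball can expand to many times its compressed size, it may be worth checking how much disk space it will need before extracting it. The following is a minimal sketch using only Python's standard `tarfile` module; the function name `summarize_tarball` is made up for this illustration and is not part of any CollecTor tooling. It walks the archive as a stream and adds up the sizes recorded in the tar headers, without writing anything to disk.

```python
import tarfile


def summarize_tarball(source):
    """Return (file_count, total_uncompressed_bytes) without extracting.

    `source` is either a path to a tarball or an open binary file object.
    """
    # Mode "r|*" reads the archive as a forward-only stream and detects
    # the compression format (gzip, bzip2, xz, or none) transparently.
    if isinstance(source, (str, bytes)):
        archive = tarfile.open(name=source, mode="r|*")
    else:
        archive = tarfile.open(fileobj=source, mode="r|*")

    file_count = 0
    total_bytes = 0
    with archive:
        for member in archive:
            # Count regular files only; directories and links carry no data.
            if member.isfile():
                file_count += 1
                total_bytes += member.size
    return file_count, total_bytes
```

Calling `summarize_tarball("some-monthly-tarball.tar.xz")` (a placeholder file name) returns the number of contained files and their combined uncompressed size in bytes, which can then be compared against the free space on the target disk.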

If you're only interested in recently published data, we also have data from the last 72 hours available for you. In contrast to monthly tarballs, this data set is updated every hour. If you have already bootstrapped your application with monthly tarballs and want to stay up-to-date, or if you just want to take a peek at the latest data, this is your data set. If you're using special software to download these files, you may want to configure it to accept gzip-compressed data to save us all some bandwidth. The latest 72 hours of data are available in the recent/ subdirectory.
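As an illustration of accepting gzip-compressed data, here is a small sketch using only Python's standard library. It sends an `Accept-Encoding: gzip` request header and decompresses the response body when the server answers with `Content-Encoding: gzip`. The helper names (`build_request`, `decode_body`, `fetch`) are invented for this example, and whether a given server actually compresses its responses is an assumption you would need to verify.

```python
import gzip
import urllib.request


def build_request(url):
    """Create a request that tells the server we can handle gzip."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(body, content_encoding):
    """Decompress the body if the server says it is gzip-encoded."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        return gzip.decompress(body)
    # The server ignored our offer and sent the data uncompressed.
    return body


def fetch(url, timeout=30):
    """Download `url`, transparently handling a gzip-encoded response."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        encoding = response.headers.get("Content-Encoding")
        return decode_body(response.read(), encoding)
```

A call such as `fetch("https://collector.example/recent/")` would return the decoded bytes; the host name here is a placeholder and should be replaced with the address of the recent/ subdirectory you are actually using.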

How can I parse the data?

We developed two parsing libraries, one for Java and one for Python:

If you developed a parsing library for another language and want it to be listed here, please let us know!

What did others do with the data?

We wrote a couple of applications, and researchers wrote research papers using the Tor network data provided here. The following list is not at all exhaustive:

If you wrote an application or research paper that uses Tor network data and that is not yet listed here, please let us know! Please include a short description of what your application does or what your research was about.

How can I get support?

If you have any questions about the Tor network data provided here, we'd like to hear from you! Of course, suggestions or other feedback are welcome, too.