Alexa Internet is the organization that began donating its web crawl data to the Internet Archive in 1996. Founded that same year by Brewster Kahle and Bruce Gilliat, Alexa Internet specialized in web traffic analysis and crawling, capturing snapshots of websites across the internet. The Internet Archive incorporated Alexa's crawl datasets into its Wayback Machine to preserve digital history [1]. These contributions greatly expanded the archive's scope, supplying billions of web pages that formed the backbone of early internet preservation efforts. The donations continued until Amazon, Alexa's parent company, shut the service down on May 1, 2022, after which no new data arrived [2].
The 'DR' crawl data collection is one of the Alexa crawl series held by the Internet Archive. The Archive describes the data in this collection as currently not publicly accessible, so the underlying crawl files cannot be browsed directly or downloaded in bulk. Pages from Alexa crawls that have been added to the Wayback Machine can still be viewed there, subject to standard robots.txt compliance and occasional takedown requests [3]. Researchers who need programmatic access to those public captures can use tools like the Wayback Machine's API [4].
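As an illustration of the programmatic access mentioned above, the following is a minimal sketch that queries the Wayback Machine's public availability endpoint (`https://archive.org/wayback/available`) for the capture closest to a given date. It only consults the public Wayback index; it does not reach the DR crawl files themselves, and the response does not say which crawl produced a capture. The function names (`build_availability_url`, `closest_snapshot`, `lookup`) are hypothetical helpers chosen for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD[hhmmss] string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_body):
    """Return (snapshot_url, timestamp) from a JSON response, or None."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"], closest["timestamp"]
    return None


def lookup(page_url, timestamp=None):
    """Perform the network request (hypothetical convenience wrapper)."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

Calling `lookup("example.com", "19961231")` would ask for the capture nearest to the end of 1996. An empty `archived_snapshots` object in the response means no public capture was found, in which case the sketch returns `None`.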
Daily data flows from Alexa crawls, including 'DR'-related ones, were added to the Wayback Machine only after a six-month embargo period. This delay left time to honor website owners' preferences, such as updates to robots.txt files, and to process the data before public release [1][3]. Once the embargo ended, the data became searchable and browsable. Since Alexa's closure there have been no new daily flows, but pages already released to the Wayback Machine remain viewable under these established guidelines, supporting ongoing research into early web evolution [2][4].
## References
[1] Internet Archive. "About the Internet Archive." https://archive.org/about/
[2] Alexa Internet. "Alexa Internet Shutdown Announcement." Archived at https://web.archive.org/web/20211208232354/https://www.alexa.com/
[3] Internet Archive Blogs. "Alexa Crawls and the Wayback Machine." https://blog.archive.org/2018/10/09/alexa-internet-donated-crawl-data/
[4] Wayback Machine Help. "Frequently Asked Questions." https://help.archive.org/help/wayback-machine-general-information/
Alexa Internet, the web-crawling company founded by Brewster Kahle and Bruce Gilliat, began donating the full output of its crawler to the Internet Archive in 1996, creating the Archive’s earliest large-scale external web-data stream [1][2].
Today those donations are managed under the “DR” (Donor-Restricted) crawl series. Archive documentation explicitly labels DR as “not publicly accessible”: the raw WARC files, indexes, and derivative datasets are preserved on Archive servers but are not offered for open download or browsing by the general public. Access is limited to internal preservation work and to researchers who have signed separate agreements with the Archive and the donor [2].
Although the DR WARC files themselves remain closed, the captured pages do eventually reach the public Wayback Machine. While Alexa was operating, it delivered new DR data every day, and each day’s batch was placed under a fixed six-month (180-day) embargo. Only after that waiting period expired were the pages indexed and made viewable through the Wayback interface. Each day’s DR data therefore became publicly searchable in Wayback six months after the day on which it was crawled, giving delayed but eventual public access while respecting donor restrictions and takedown processing requirements [1][2].
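The embargo arithmetic described above can be sketched in a few lines. This example assumes the 180-day figure quoted in the text and treats it as an exact day count. The names `EMBARGO_DAYS`, `public_release_date`, and `is_publicly_viewable` are hypothetical and do not come from any Internet Archive tool.

```python
from datetime import date, timedelta

# Assumption: the "six-month (180-day)" embargo from the text, as an exact day count.
EMBARGO_DAYS = 180


def public_release_date(crawl_date, embargo_days=EMBARGO_DAYS):
    """Earliest date on which a capture crawled on crawl_date would be viewable."""
    return crawl_date + timedelta(days=embargo_days)


def is_publicly_viewable(crawl_date, today, embargo_days=EMBARGO_DAYS):
    """True once the embargo window for crawl_date has fully elapsed."""
    return today >= public_release_date(crawl_date, embargo_days)
```

Under this assumption, a page crawled on 1 January 2021 would become viewable on 30 June 2021. If the embargo were defined in calendar months rather than days, the release date could shift by a day or two.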
## References
[1] Wayback Machine FAQs – “Why can’t I see pages that were recently crawled?” web.archive.org (accessed 15 Dec 2025).
[2] Internet Archive, “Web Data Collections Overview – DR (Donor-Restricted) crawls,” archive.org (accessed 15 Dec 2025).