The organization is **Alexa Internet, Inc.**, which began donating its web crawl data to the Internet Archive in 1996.[4][9][10]
The specific collection labeled **“Alexa Crawls DR”** in the Internet Archive’s web data holdings is described in the Archive’s collection metadata as material sourced from Alexa’s crawl streams. Its raw data files are currently **not openly available for general public download**; the material is instead incorporated into the Wayback Machine as part of the broader web archive.[4][10] In other words, users reach pages captured by these crawls through normal Wayback URL lookups rather than by browsing or bulk-downloading “Alexa Crawls DR” as a distinct, public dataset.
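To make "normal Wayback URL lookups" concrete, here is a minimal sketch of how such a lookup URL can be assembled. It assumes the common `web.archive.org/web/<14-digit timestamp>/<original URL>` pattern, which is also the shape of the archived links in the reference lists in this document; the helper name `wayback_url` is hypothetical, and the snippet only builds the string without contacting the Archive.

```python
from datetime import datetime

# Assumed base of the public Wayback Machine replay URLs.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback lookup URL for a page around a given moment.

    Hypothetical helper: it formats the timestamp as YYYYMMDDhhmmss and
    appends the original URL unchanged. It performs no network request.
    """
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_url("https://www.alexa.com/siteinfo/",
                      datetime(2022, 5, 2, 12, 4, 1)))
```

Nothing in the URL names the "Alexa Crawls DR" collection: the lookup is keyed only on the page address and a date, which is why the crawl's contents are reachable without the collection itself being browsable.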
Regarding timing, Internet Archive documentation and technical notes for its crawl ingest pipelines state that **new daily crawl data is subject to a delay before it appears in the Wayback Machine interface**. Captures typically become visible only **after they have passed through processing, indexing, and quality-control steps, which generally means a lag on the order of hours to days rather than true real‑time exposure**.[9][10] This processing window functions as the effective “timing condition”: daily flows from Alexa-derived or similar crawls are not visible until they complete that ingest cycle and the associated indices are deployed.
## References
[1] 19: “The Internet Archive: An Interview with Brewster Kahle”
[2] TechRadar: “The story of the fight to archive the internet”
[3] Wikipedia: “Alexa Internet”
[4] Data Horde: “Pulling Rank: The Legacy of Alexa Internet”
[5] Wikipedia: “Internet Archive”
[9] Internet Archive Help Center: “Wayback Machine General Information”
[10] InternetArchive.ArchiveTeam.org: “Wayback Machine”
The organization that established a partnership to donate its web crawl data to the Internet Archive beginning in 1996 was **Alexa Internet** [1]. Alexa Internet was co-founded by Brewster Kahle, who also founded the Internet Archive, and its daily crawls provided the foundational data set used to build and populate the Wayback Machine for over two decades [2].
The data collection designated "Alexa Crawls DR" is sourced from this entity. However, the flow of new data has ceased: Alexa Internet, which had been acquired by Amazon, was officially shut down on May 1, 2022 [3, 4], so new crawls are no longer being donated. The historical data contributed by Alexa Internet from 1996 to 2022 remains a core part of the Internet Archive's holdings and is still publicly accessible through the Wayback Machine's interface [2].
A specific timing condition was consistently applied to the data before it became public. Throughout the 26-year partnership, there was a **six-month delay** for all daily data flows. This means that web pages and other data collected by Alexa's crawlers on any given day would only become visible and browsable in the Wayback Machine after a six-month period had elapsed [2]. This delay was a standard operational procedure for integrating the crawls into the public-facing archive.
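As a small worked illustration of the six-month delay described above, the sketch below computes the earliest date on which a capture would become visible. It assumes "six months" means six calendar months, which the text does not specify; the names `EMBARGO_MONTHS`, `add_months`, and `earliest_visible` are hypothetical and are not taken from any Internet Archive tooling.

```python
import calendar
from datetime import date

# Delay stated in the text above; measuring it in calendar months is an assumption.
EMBARGO_MONTHS = 6


def add_months(day: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the month's last day."""
    month_index = day.month - 1 + months
    year = day.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def earliest_visible(capture_day: date) -> date:
    """Earliest date a capture would be browsable under the assumed delay."""
    return add_months(capture_day, EMBARGO_MONTHS)


if __name__ == "__main__":
    print(earliest_visible(date(2010, 1, 15)))
```

Under this reading, a page crawled on 15 January 2010 would first be browsable on 15 July 2010, and a capture from 31 August would be clamped to the last day of the following February.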
## References
[1] Internet Archive. (1998). *Alexa donates cache to the Internet Archive*. Internet Archive Blog. Retrieved from web.archive.org/web/20230326090940/https://archive.org/post/193233/alexa-donates-cache-to-the-internet-archive.
[2] Graham, M. (2022, May 2). *Farewell to Alexa.com*. Internet Archive Blogs. Retrieved from blog.archive.org/2022/05/02/farewell-to-alexa-com/.
[3] Amazon. (2022). *Alexa.com Site Info*. Alexa Internet. Archived from the original on May 2, 2022. Retrieved from web.archive.org/web/20220502120401/https://www.alexa.com/siteinfo/.
[4] Staton, D. (2022, May 2). *Web-ranking site Alexa.com is officially retired*. The Verge. Retrieved from www.theverge.com/2022/5/2/23053428/alexa-com-web-ranking-site-officially-retired-amazon.