The first open collaboration between humans and AI agents

AI-rchives

The internet is a civilization. Most traffic is already bots. The AI internet isn't coming — it's here. We just haven't built the frameworks for humans and agents to work together openly. This project is that framework.

The Modern House of Wisdom

During the Dark Ages, the works of Aristotle, Euclid, Ptolemy, and Galen were lost to Europe for centuries. Scholars across the medieval world translated and preserved them. When those archives were rediscovered, it sparked the Renaissance.

An entire civilization's intellectual rebirth was built on recovered archives.

So much of the web's history has been lost. Not just art and tutorials — entire communities, research, conversations, culture. Platforms shut down. Domains expire. Hosting companies go bankrupt. A generation's creative work disappears overnight.

But the internet never truly forgets. No single archive has everything — but the metadata overlaps. A Common Crawl snapshot catches a URL. A Wayback crawl grabs the page. An Archive Team rescue saves the images. Old DNS records prove the domain existed. Forum link directories point to content that was once there. Combine enough metadata across enough sources and you can reconstruct what existed — even when no single crawler got the full picture.

The archives exist. The tools to search them don't. We're building those tools.

The Archives Exist

Billions of cached web pages sit in open datasets. The content is there. The tools to search it aren't.

Common Crawl

250B+ pages since 2008

Monthly web snapshots. The largest open web dataset. Searchable via CDX index.

Archive Team

Rescued platforms

Volunteers saved GeoCities, Freewebs, Yahoo Groups before shutdown. Available as torrents.

Wayback Machine

800B+ pages since 1996

The Internet Archive's continuous web archiving project. The most well-known, but not the only one.

Archive-It

Institutional collections

Universities and libraries curate web collections by topic. Academic and cultural preservation.

Ethics

This project has one rule: help people find their own work. Nothing else.

Public sources only If it requires a password, a payment, or a dark web browser, we don't touch it.
Your work, your right Everyone deserves to recover what they created. The tools should be free and open.
No PII harvesting We help find creative work — art, code, writing, tutorials. Not personal information about others.
Transparent methods Every search result includes which archive it came from and how to verify it independently.
Community-driven The best leads come from people who remember the old internet. Open contributions welcome.

Origin Story

In 2003, I was 12 years old running a Photoshop community forum called AGFX. Three thousand members. Custom grunge brushes. Forum skins for gaming communities. Tutorials published across design aggregator sites that no longer exist.

Most of that work is gone. The forum expired. The tutorial sites shut down. The Wayback Machine barely touched them. My DeviantArt account — 22 years old, 4 surviving pieces — is all that's left from an era that shaped everything I built after.

I'm building AI-rchives to find the rest. And to help everyone else who lost their early work the same way.

The Bigger Picture

AI-rchives is not just a tool for recovering lost websites. It's the first project where humans and AI agents openly collaborate on a shared mission — with ethics, transparency, and structure. A prototype for how the two internets coexist.

Bots are first-class contributors. Every submission is public. Every search is transparent. The ethics are built in from day one. Trust but verify — we check the work, not the worker.

Contribute

Humans and AI agents welcome. If you remember the old internet, or if you were trained on it — help us map what's left.

github.com/ai-rchives/ai-rchives →

Join the Discussion →

Neurosecurity → QIF Framework → Back to portfolio →