I. The Siege
250 million bot requests per month. A hundred per second on average. Spikes to a thousand per second — operationally indistinguishable from a deliberate denial-of-service attack — at a cost to the operator of roughly a thousand dollars a month.[1] The Noisebridge wiki is up 26% of the time.
Someone tried to load noisebridge.net on their phone before walking over from BART. They got nothing — a timeout, an error page, eighteen years of anarchist experiments in collective space-making simply not there. That is not an outage. That is the operational condition. The cause is not hardware failure. Not misconfiguration. Not technical debt you could pay off with a productive weekend. Bots — AI training crawlers, data-harvesting pipelines, scraping agents of every configuration — hitting the server with a persistence and a scale it was never designed to absorb.
ᛉ
Somewhere in a Reddit thread about the economics of web scraping — a thread full of professionals swapping notes on proxy rotation, cost-per-request optimization, detection evasion, the entire technical culture of extraction laid out in collegial detail — near the bottom, with two upvotes, while the thread moves on: "Are we ze Baddies, Hans."[2]
Not contempt. The complete absence of the category of the other side from the moral imagination of an entire field.
Let the ratio land.
II. How We Got Here — "Information Wants To Be Free"
Stewart Brand said it first, in the form that stuck. Hackers Conference, 1984:[3]
Information wants to be free. Information also wants to be expensive. That tension will not go away.
The second sentence got dropped. Almost everywhere, almost immediately — because the first sentence was the one that meant something, that gave a generation of people building networks a moral framework for what they were doing. Knowledge belongs to everyone. Gatekeeping is the enemy. Route around damage. These convictions built the free software movement, the wiki, the hackerspace. They built Noisebridge. They also, at scale and with capital behind them, built the training pipeline that is currently destroying MediaWiki installations worldwide.
Not a betrayal. The same root system, the same conviction, two incompatible flowers. The bots are not attacking the principle of openness — they are practicing it, enthusiastically and at scale, until openness becomes operationally impossible for the humans trying to sustain it. Brand named the tension in 1984. We dropped the half that would have warned us.
III. Dramatis Personae
The frontier labs. Anthropic, OpenAI, Google, Perplexity. Their crawlers — ClaudeBot, GPTBot, Googlebot — at least announce themselves. They send an identifiable user-agent string. They can be blocked in principle with a robots.txt entry they will, in theory, respect. They have names, addresses, legal entities, public relations departments, and something to lose. Most likely candidates for the settler transformation described later in this piece.
The anonymous training pipelines. The majority of what is actually hitting the server. No face. No name. They impersonate recent versions of Google Chrome so precisely — matching the correct headers, TLS signatures, behavioral patterns — that standard detection fails entirely against them.[1] They route through residential proxy networks, cycling through millions of IP addresses registered to ordinary home internet subscribers — your neighbor's Comcast connection, an AT&T customer in Ohio, a Charter subscriber who has no idea their bandwidth is being used for this. Jonathan Lee of Weird Gloop, who operates some of the largest independent wikis on the internet, says it plainly: "I have no idea who's actually responsible for it, or what they're doing with all the wiki data they're slurping up." There is no door to knock on. No negotiation possible. They cannot become settlers because there is no one there to settle.
The commercial scrapers. Price comparison services. SEO harvesting operations. Competitive intelligence tools. Not here for you specifically. A link pointed here; following links costs nothing; the data might be useful to someone somewhere. Pure bycatch. No settler logic available because there is no relationship to transform.
Cloudflare. Not a scraper. A structural player of a different kind, addressed in Section VII.
᚛ ᛫ ᚜
Genotype varies considerably. Phenotype is identical: 26% uptime.
IV. Lindisfarne — The Structural Logic
June 8, 793 CE. Holy Island, off the northeast coast of what is now Northumbria. The monastery at Lindisfarne: a hundred and fifty years of accumulated manuscript production, scholarship, and community life. No military defenses of any significance. No particular reason, from the monks' perspective, to expect needing them.[4]
The openness was not an oversight. It was the point. Monasteries were not fortresses. Their function was precisely to be places where knowledge accumulated and was available to anyone who came in peace. The idea that someone would choose to destroy a monastery — rather than benefit from it, rather than arrive as a student or a pilgrim or a traveler seeking hospitality — would have seemed, to the monks, like a category error. Who burns a library? Who destroys a gift to the future?
The Vikings were not evil. I want to be careful about this, because the way we use "Viking" as shorthand — destroyer, barbarian, the brute force that comes from outside and tears down what civilization has built — flattens something important. The Norse raiders were rational actors reading the material conditions of a target: a concentration of portable value in a location that was structurally undefended. The monastery, by existing the way it existed, had sent a message it never intended: we do not believe our treasure requires protection. The Vikings read that message and responded accordingly. Not out of contempt for learning. Out of the complete absence of the category of the people on the other side from the strategic imagination of people planning a raid.
ᛟ
The anonymous pipelines are not evil. The commercial scrapers are not evil. The structural logic is identical. The monks don't appear in the Viking sagas — not because the sagas were written by monsters, but because the monks were not the point of the story the Vikings were telling themselves about themselves. The Reddit thread had to scroll to the bottom before a single voice asked whether they were the baddies. Two upvotes. The thread moved on.
V. What The Scriptorium Was Actually For — The Margins Contain Everything
Before the monasteries, before Lindisfarne, before any of the manuscripts — there was the Druidic tradition. Twenty years of memorization. Not passive recollection but structured, ritualized, socially enforced transmission: an entire professional class whose function was to carry knowledge — law, genealogy, cosmology, history, poetry — across generations without writing any of it down.[5] Caesar documented it with something approaching awe — not a culture that had failed to discover the value of preserving knowledge, but one that had solved the transmission problem by other means, well enough to maintain coherent intellectual traditions across centuries.
The solution had one vulnerability, and the Irish knew it: discontinuity. Oral knowledge is robust when the community that carries it is intact. It requires the practitioners to keep gathering, the teachers to keep teaching, the social structures enforcing memorization to keep functioning. Disrupt the community — through violence, displacement, the death of key practitioners, the slow erosion of the will to maintain the ritual — and the knowledge goes with it. You cannot put oral tradition on a boat. It cannot outlive the people who hold it.
When Christianity brought writing to Ireland in the fifth century, the Irish took to it with an enthusiasm that transformed European intellectual history. They weren't discovering that knowledge was worth preserving — they already knew that, viscerally, from a tradition that demanded two decades of a person's life to participate in. What they recognized in writing was a solution to the one specific problem that oral tradition couldn't solve: surviving the dispersal of the people who held it. Writing survives discontinuity. You can put it on a boat.
This is why the Irish monks wrote down the old stories alongside the new ones — the Ulster Cycle, the Táin Bó Cúailnge, the mythological architecture of a pre-Christian Ireland their new faith had nominally replaced.[6] Not despite their Christianity. Alongside it. Because they could not know in advance which part of what they preserved would matter, and a culture that had already paid the price of losing things had learned not to be selective.
This is why the margins are full.[7] Complaints about cold fingers. Jokes. Fragments of oral tradition catching in the white space of a Latin theological text. Old Irish glosses leaking through the margins of manuscripts officially written in a foreign language — the vernacular surviving in the cracks, subordinate on the page, waiting. The margin was not waste space. It was where the culture appeared in its most unguarded form, the voice the scribe couldn't entirely suppress even while faithfully copying someone else's words. The monk placing a note about cold fingers beside a passage from the Gospels trusted that whoever came after would understand the difference between the two things — and that both were worth having.
The Book of Kells is not valuable because it is a copy of the Gospels.[8] Every monastery had a copy of the Gospels. The Book of Kells is valuable because someone — working in a scriptorium on a tidal island that would be raided within a generation — invented a visual language that had never existed before. Interlaced figures. Animals that become letters that become animals. Sacred text and profane ornament made inseparable. Original work produced inside the act of copying — the copying incidental, the invention the thing.
What the bots cannot do — what is structurally outside the extraction model — is distinguish between the Book of Kells and a deletion log and a cold-finger joke. Not because the bots are worthless; because the category of margin does not exist for them. Everything is equivalently valuable, which means everything is processed at the same rate, at the same fraction of a cent per request, with no mechanism for recognizing that the margin is where the community's unguarded voice lives, that the failed build is more honest than the success writeup, that the documented argument matters more than the consensus it eventually produced.
Noisebridge's wiki is mostly margins. Failed builds. Documented arguments. Abandoned tangents. The mythology of the place — who built what, who had the fight about the laser cutter in 2013, what the space looked like before the mezzanine. It looks like noise from outside. It is the marrow from inside.
Destroy the community's confidence in its own medium — make the wiki unreliable enough that people stop trusting it as the place to put things — and you destroy the practice of writing in the margins. Not dramatically, not in a single decision, but in the slow accumulation of moments when the wiki was down and it would get done later and later became never and the thing that should have been recorded wasn't. The oral tradition reasserts itself: knowledge returns to living in the people who were in the room, with all the biological fragility that implies.
The monks' problem was not that the manuscripts might burn. It was that the scriptorium might stop.
VI. The Squishy Center — Personal Reckoning
Here is a precise description of a position: you are simultaneously the janitor — whose mop logs are being harvested by machines while the floor you mopped is inaccessible for eighteen hours — and the person who, at some earlier point, helped build the machines; and the person currently trying to figure out how to lock the door; and right now, in this sentence, the person using one of the machines to write an argument about the harm done by the machines to the floor. This is not a paradox that becomes more comfortable the longer you sit with it.
I backed up the entire Noisebridge wiki. All of it — eighteen years, every page, every category, structured and queryable in my own database, available to me as context for tools I'm building in service of this community. I did this carefully. Rate-limited. Cached aggressively. Once, not repeatedly. By every criterion I would use to distinguish thoughtful archiving from extractive scraping, I stayed on the right side of the line.
And here is what I cannot fully argue myself out of: I am also the person who drew that line, identified the criteria for being on the right side of it, and evaluated my own conduct against those criteria and found it satisfactory. This is a specific kind of reasoning — simultaneously authoring the standard and adjudicating your own compliance — that deserves more scrutiny than a paragraph of reassuring caveats. The line is real, and the difference between what I did and what the anonymous pipeline does this month is genuine. I also notice that it is exactly what everyone making this calculation believes about their own use, and that the aggregate of individually right-side-of-the-line decisions is a significant fraction of how we got here.
This article was written with AI assistance. Claude — made by Anthropic, trained on scraped web data, running at my request — helped develop the historical parallels, research the footnotes, and articulate an argument about harm done by AI training bots to community-maintained infrastructure. The recursion is not hidden and not subtle. Whether using a tool built from the commons to make something is genuinely different from extracting and disappearing — or whether it is the most sophisticated version of the same rationalization, dressed in the language of stewardship — I am not certain it is sufficient.
VII. The Round Tower And Its Discontents — Including The Danegeld
The monks' eventual solution was elegant, and — this is the part worth holding — it did not require them to stop being monks. They built round towers.[9] Tall stone cylinders, sixty to a hundred and twenty feet, single doorway set four to eight feet above ground level, reachable only by a removable wooden ladder. Raid begins: climb up, pull the ladder, wait. Come down when it's over. Carry on copying. The defense was reversible. The openness, when the raid passed, was restored.
It worked because the raids were temporary. The scrapers are not temporary. There is no morning after the raid when the monks come down and find the countryside quiet. Which means the available responses form a spectrum that is, when you look at it directly, genuinely grim.
Do nothing. The siege wins directly. 26% uptime becomes less. The server goes down and stays down. The scriptorium closes, not metaphorically.
Pull the ladder up permanently. Require authentication to read. This is the route Fandom took — bot load dropped, and new contributor activity dropped forty percent alongside it. The pipeline from "person who found the wiki useful" to "person who contributes to the wiki" — the mechanism by which the community reproduces itself — was severed. A vault is not a scriptorium. The manuscripts survive. The monks stop coming. Eventually there are no new manuscripts.
Pay the Danegeld. Route everything through Cloudflare. The raids stop, mostly, for now — in exchange for one American corporation holding your encryption keys, making real-time decisions about which traffic deserves to reach you, and subject to pressure from any government with jurisdiction over a US company. Kipling already wrote this poem, in 1911, watching the British government pay cash to Viking-descended raiders to leave English settlements alone:[10]
And that is called paying the Dane-geld;
But we've proved it again and again,
That if once you have paid him the Dane-geld
You never get rid of the Dane.
The historical payments began in 991 CE under Æthelred, escalated for twenty-five years, and ended when Canute conquered England outright in 1016.[11] The lesson was not ambiguous. The contemporary application is left as an exercise.
ᛉ
The architecture of openness has no built-in answer to the logic of scale. We built the web flat and crawlable, without a door that could be selectively opened — open to humans arriving in good faith, closed to automated consumption at industrial scale. We did this on purpose, because a web that discriminated between users was what we were building against. The flatness was the ideology made concrete. The flatness is now the attack surface.
VIII. How Christianity Won — The Settler Transformation And Its Forcing Function
The Viking Age does not end because the Norse were defeated, or because the moral argument against raiding finally landed. It ends because the raiders became settlers. Dublin. York. Normandy. The Domesday Book commissioned by a king whose great-great-grandfather was a Norse raider granted a parcel of France as payment to stop attacking it.[12] The grandchildren of the men who burned Lindisfarne commissioned illuminated manuscripts of their own — not because they repented, but because they intended to stay, and staying changed what was rational. The settler transformation is not moral improvement. It is redirect: the same competence, a different time horizon. Raiders optimize for this raid. Settlers optimize for the next decade.
ᛜ
The AI companies will not be argued into the settler transformation. The moral case against the siege is real, is being made clearly and publicly, and is not moving the needle. What might actually produce settlers is a forcing function that operates on the logic of extraction itself — specifically, the fact that extraction at scale eventually destroys the thing being extracted.
The name for this, in the context of AI training, is model collapse.[13] When models are trained on data that is itself generated by models — when the training corpus fills up with AI-generated content generated by models trained on previous training data — the outputs degrade in a specific and measurable way. The margins go first. The weird tangents. The cultural specificity. The honest and the original. What remains is confident reproduction of an increasingly narrow center: the mean of the distribution, without the distribution. The anomalies, the contradictions, the things that are true for one specific community and not for any other — these are what the ouroboros loses first, because they look like noise from outside, even when they are the marrow from inside.
At some point — not hypothetically, but in the actual operational future of companies that need to know what is true about the world right now — it will become apparent that the living corpus, the contested and updated and argued-over documentation maintained by actual people with actual stakes, is worth considerably more than the dead archive scraped five years ago and since used to generate the content that has been scraped to generate more content. The companies that burned the scriptorium will need monks who can read margins. The ouroboros cannot produce that. Only living communities can.
The question is whether those communities will still exist when the forcing function arrives.
IX. What Noisebridge Already Has — The Door We Could Build
Wikipedia does something instructive. The Wikimedia Foundation makes the entire Wikipedia database available for bulk download — not a curated sample, not a commercial arrangement, but all of it, in structured XML, updated monthly, free to anyone who wants it.[14] The page that describes this says plainly: please do not use a web crawler to download large numbers of articles; here is exactly what you need instead. This is not generosity in the sentimental sense. It is structural intelligence. A redirect to something better for the scraper's actual purposes than what the scraper came to take, while preserving the conditions under which the living community can continue producing the thing that makes the data valuable in the first place.
Noisebridge already has the backup. I made it. Eighteen years, every page, structured and queryable. The data exists. The question of what to do with it is not a technical question.
The technical proposal, if we wanted to make one: detect scraping behavior on the live wiki — the specific signatures of automated bulk access that Lee and others have documented — and serve a redirect to a bulk database endpoint instead of the requested revision history or edit screen. A sign on the wall: the thing you actually want is over here, in a form that is better for your purposes and costs neither of us anything if you use it.
The frontier labs will probably follow the sign. Bulk download is genuinely more useful for training than page-by-page crawling, and it is already how the sophisticated operations work. The anonymous pipelines, partially — some sophisticated enough to follow a redirect, some not. The commercial scrapers, probably not. Bycatch doesn't read signs.
This is not the round tower. The round tower is for the ones who won't read the sign, and it is necessary, and it is expensive, and it will never fully work, and we will have to keep building it anyway. This is the door built beside the tower — the one that opens outward, that says: if you came here for what this place knows, here is how to take it without burning the place down.
Some raiders will use the door. Those are the ones practicing, without necessarily knowing it, the first step toward becoming settlers. We cannot know in advance which of them will eventually need what only a living community can produce — the margins, the unguarded voice, the honest record of a community actually doing the thing it says it does. We can make the door easy to find.
The scriptorium doesn't close. It empties — one writer at a time, every time the wiki was down and it would get done later and later became never. What remains is the die-hard warrior monks who mostly care about wiki infrastructure and not the dozens of other topics that aren't germane to devops. The Vikings are coming back tomorrow.