The Chaotic Wisdom of Wikipedia Paragraphs

What fascinates me about Wikipedia is not that it exists, per se, even though it is obviously extraordinary that it does. Nor is it the ever-roiling “talk” pages, with their meta-tail-wagging discussions about each and every subject—such as whether the “Socks” page should list notable wearers of socks. It’s not the semi-Byzantine internal policies delineating the editing and administration of pages (Please see “Reinstating a reverted action (‘wheel warring’)”); nor the frustrating news that occasionally comes out about the demographics of its administrators—almost all men, to put it simply. (The ultimate significance of Wikipedia may be that mansplaining exists as a fundamental concept and is quite possibly unstoppable.)

No. These are all excellent matters to ponder, especially given Wikipedia’s global dominance, and I do ponder them, and perhaps you do as well. But what is genuinely most fascinating, at least to me, is the strange way it lets you write encyclopedia pages—the structures that have built up since its founding in 2001. The way that Wikipedia is composed is a good example of what happens when you build something so incredibly simple that anyone can use it, and then everyone does.

The standard web page that most of us are familiar with is defined by markup and built with a series of iconic tags—<p> for a paragraph, <img> for an image, <html></html> for wrapping a whole page. A set of carefully described and defined tags is known as HTML, the Hypertext Markup Language. Much effort was and is spent, by large and august standardization bodies populated with professionals, to make HTML predictable and “parseable.” Wikipedia, however, chose another path. In order to make it as easy as possible for people to create encyclopedia pages, it let them put their content into a simple text box right on the page. No tricky markup to understand. No concerns over parsing or predicting. Anyone could edit.

The wiki-on-the-web idea goes back to 1995, and was invented by a brilliant thinker named Ward Cunningham (wiki means quick in Hawaiian, despite the fact that Cunningham, according to his Wikipedia page, is from Indiana). The wiki was a conscious effort to encapsulate decades of technology-industry thinking about community, attribution, and information-sharing all in one place. When Wikipedia launched, it raised immediate concerns about the sanctity of accreditation—could knowledge be created by amateurs? But its steady rise in utility meant that, in time, nearly everyone made their peace with it—some more happily than others.

The wiki idea was to take a couple of equal signs, a few spare brackets, and blammo: You have a full-on hypertextual network of ideas. Say you wanted to write a headline. Just put an “==” in front of and after the text, and now you have written what publishing professionals like to call “display copy.” Sub-headline? (My editor insists on calling it a “dek.”) That can be accomplished with little more than “===”; a sub-sub (no one says “dek-dek”), by “====”; and so on, all the way down to “======,” the deepest level of headline the software supports. If you wanted to make a new page, all you had to do was add double brackets around [[something]] and the wiki software would set up a new blank page, called “Something.” And then you were off and writing.
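To make that concrete, here is a minimal sketch in Python of how equal signs and double brackets might be turned into HTML. It is an illustration only, not how MediaWiki actually parses pages: the function name render_wikitext, the "/wiki/" link path, and the decision to handle just headings and plain [[links]] are all assumptions made for the example.

```python
import re


def _link(match):
    """Turn the text inside [[...]] into an HTML link (hypothetical URL scheme)."""
    name = match.group(1).strip()
    # Assumption: the page title capitalizes the first letter, as in "Something".
    title = name[:1].upper() + name[1:]
    return f'<a href="/wiki/{title.replace(" ", "_")}">{name}</a>'


def render_wikitext(text):
    """Convert a tiny subset of wiki markup to HTML (illustrative only)."""
    html_lines = []
    for line in text.splitlines():
        # A heading: the same run of two to six equal signs on both sides.
        heading = re.fullmatch(r"(={2,6})\s*(.+?)\s*\1", line.strip())
        if heading:
            level = len(heading.group(1))
            html_lines.append(f"<h{level}>{heading.group(2)}</h{level}>")
            continue
        # Anything else: replace [[something]] with a link, keep the rest as is.
        html_lines.append(re.sub(r"\[\[([^\[\]|]+)\]\]", _link, line))
    return "\n".join(html_lines)


if __name__ == "__main__":
    print(render_wikitext("== Socks ==\nSee also [[something]]."))
```

Twenty-odd lines get you headlines and links, which is the whole charm of the original idea. Everything past that (external links, templates, chessboards) is where the simplicity starts to leak away.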

Except if you wanted to link to something outside Wikipedia. That required more characters, and a little specialized knowledge. If you go to the “Help:Link” page on Wikipedia now, there are dozens of linking rules, and examples like: [{{fullurl:{{{1}}}|action=edit}} {{{1}}}]. Hmm. Parse that.

As Wikipedia grew in stature, its community began to look beyond the limits of mere words and links, to images and infoboxes. Culture keeps creeping in. People keep hacking. And they keep pulling in ideas from elsewhere. “We need chessboards,” someone thought, and then after some experimentation, there were chessboards, only now it wasn’t knowledge anyone could have: You needed to know how to code using the right template. Today, you can do just about anything a computer can do inside of Wikipedia pages, using a scripting language called Lua. Wikipedia can execute code, display calendars, show complex mathematical equations, and so forth, but there’s a learning curve.

The software that runs Wikipedia is called MediaWiki (there’s another entity called Wikimedia, for free photos and such, just to keep things interesting). The core text markup language is pretty complex, but when you get to see how they make the lists of templates and extensions, it becomes … remarkable. There are so many templates: the chess one; the ones for making lists of historical populations; others for inserting flags in places where flags are supposed to go; a variant for providing a statistical overview of a bicyclist; another for inserting (where appropriate) a guide to all of the world’s writing systems. All of this is to illustrate the ideas on a page, or make it easier to navigate between pages. Yet despite how advanced it has become, the core unit of meaning in Wikipedia remains the page, organized around a subject, composed of paragraphs organized in sections. Wikipedia is a paragraph-based entity.

It’s via these templates that Wikipedia manages itself. They knit together the great, seemingly endless mass of pages and enrich the encyclopedic nature of the site. When you see a box on a Wikipedia page filled with facts and figures, it’s because a volunteer somewhere (Bangalore, Dubuque, wherever) chose to create a framework for that specific kind of information and decided how it should look and behave. Beyond just the templates are the extensions—870 “stable” extensions to the Wiki software are currently in use by Wikipedia—which allow you to insert timelines. A site called wikiapiary.com tracks how they are used—5.7 billion edits have been made across the wikis, at least the ones it can track. There are likely other wikis unaccounted for. Like the one used by the CIA.

What started as a modest attempt to categorize and describe the world has, improbably, succeeded. As goes Wikipedia, so goes pretty much everything. Today, to learn the things needed to thrive as a Wikipedia editor, beyond writing simple paragraphs or making edits, requires a serious commitment of effort and time. Wikipedia’s most committed volunteers, unpaid though they may be, are professionals. The Wikipedia data, this vast storehouse of knowledge both large and small, free and open to all, remains its own, essentially unapproachable, universe. It’s hard to do anything with it besides read it.

Hard but not intractable. There are ongoing efforts to transform Wikipedia into something a computer can cogitate over, to truly sanctify knowledge. For example, wiki.dbpedia.org extracts controlled databases from the sprawling text of Wikipedia and makes that data available in a way that can be readily parsed and restructured and explored. So you can make a list of all the national anthems of all the world’s landlocked countries and sort them by population size (smallest: Vatican City, “Inno e Marcia Pontificale”; largest: Ethiopia, “Whedefit Gesgeshi Woud Enat Ethiopia”). This conversion of text into actionable data requires an enormous amount of code-labor, but the result is the ability to see the world as a set of interlocking, faceted entities. For some reason that is confounding to me and the rest of my fellow nerds, the greater mass of humanity seems to prefer exploring paragraphs and pictures to searching and sorting. In this particular case, narrative (stories and how they are told) has won out over elegantly structured digital facts. The paragraph turns out to be a very robust technology. Which, when you’re used to computers disrupting the hell out of everything, is kind of surprising. Paragraphs, in the end, are tough little birds. (Pilcrows.)

So paragraphs it is. Wikipedia remains focused on encyclopedia entries that, for all of their singing and dancing and templates, would seem pretty familiar to the Encyclopædia Britannica reader of the mid-1800s. And yes, it’s a mess—ask any computer scientist, and he or she will tell you: All these templates, these rules—this is not the right way to do it, despite how well it has worked. Consistent markup that is easy to parse and manipulate; careful taxonomy controls; S-expressions: That’s what you want. There are better, more manageable ways to build the online encyclopedia of everything. But the wiki way—Scotch tape, glue, plain text—has triumphed, at least when it comes to getting people to write encyclopedia pages.

The Wikipedia markup format reminds me of the process of writing: The ideas and relationships that underpin Wikipedia are a hideous mess; but as the sections take shape, and the images flow in, and the edits begin to accrue, and when presented in the familiar typography of sections with headlines and images and infoboxes, the whole thing—mysteriously, miraculously—begins, finally, to take on the appearance of legitimate knowledge. Peek under the hood, and it’s pretty horrifying. But the words are good enough.

I have in my possession a cheap facsimile edition of the very first, 1771, three-volume edition of the Encyclopædia Britannica (used copies go for between $30 and $50 online; they make a nice gift). Each volume contains roughly the same number of pages, but the first covers only A-B; the second is C-L; and the third is M-Z. There’s no more pure testament to what happens when humans try to capture knowledge than that kind of lexicographic imbalance: They thought they’d be able to get the whole world in, but by the time they got to B, they knew they were in trouble. “The Editors,” reads the preface, “though fully sensible of the propriety of adopting the present plan, were not aware of the length of time necessary for the execution, but engaged to begin publication too early. However, by the remonstrances of the Compilers, the publication was delayed for twelve months. Still time was wanted. But the subscribers pushed the Editors, and they at last persuaded the Compilers to consent to the publication.” Sounds familiar. But what a miracle those volumes are, the seeds of something great. The Encyclopædia Britannica grew into a monument to human knowledge, expanding over the decades into as complete a summary of the universe as you could buy, compiled by experts, sold on television. It was impossible to imagine what could displace it, until it was displaced.