Wikipedia as a Graph Through Human History

Landon Harris
5 min readJan 2, 2020

As a student, Wikipedia happens to be one of the most underrated freely available tools. It’s often the top result for any Google search query, and for good reason: it provides a convenient, concise summary of a selected topic as well as a comprehensive analysis. Coincidentally, the page always includes an excessive number of links, linking to any pages or concepts mentioned (not even necessarily relevant). As a result, there are numerous interesting games that can be played, but the one that fascinated me is what is known as the Wiki Game.

The Wiki Game

The motivation for this project came about while I was working on a report and a friend mentioned a game that can be played on Wikipedia. He explained the game as such: you start from Wikipedia’s Random Article Link, and by only clicking on the links within the body of the article (and subsequent articles), you can reach the Adolf Hitler page within 3 clicks. At first glance, it’s true! (If you’re deliberate and strategic about the links you select).

After being inundated with writing for the last few hours, I decided to take a break and do some research into this subject. I, interestingly, found no data on this so-called game, so I decided to study it myself (because why not!). I call it:

The Wikipedia Network Mapper

This project focuses on the idea that one can reach a particularly common Wikipedia page from any other Wikipedia page. I started this project with the idea of using recursion, but ended up implementing a hybrid between a Breadth-First Search and Depth-First Search algorithm.

Breadth-first search (left) | Depth-first search (right)

To solve this… “Hitler Problem”, the algorithm starts at Wikipedia’s “Random Article” page. It iterates through all links on this page, checking to see if they match the target page title, if not, it starts with the first link on the page and repeats the process. This program will only go to the max depth specified before failing up, causing the algorithm to start searching the page at the next link of the previous level (in the style of a Breadth-First search).

The code

Here’s a simplified snippet of the recursive loop that performs the actual search in the C++ implementation:

bool Finder::find_hitler_recursive(int n, Page *page, string path[]){    
// There are sort of 2 base cases here
if(n > MAX) return false;
if(page->name == goal_page){
path[n+1] = goal_page;
return true;
}
// Breadth search
for(size_t i = 0; i < page->links.size(); i++){
if(page->links[i].get_title() == goal_page){
path[n+1] = page->links[i].get_title();
return true;
}
}

Page* nextPage;
if(n != MAX){
for(size_t i = 0; i < page->links.size(); i++){
nextPage = page->get_sub_page(page->links[i]);
// Depth search
if(find_hitler_recursive(n+1, nextPage, path)){
path[n+1] = page->links[i].get_title();
return true;
}
}
}
return false;
}

The full set of code can be found on the github repository at https://github.com/lharri73/wikipediaProject.

Results

What I found from this was honestly quite surprising (but could be misleading and is addressed later). The raw data from my findings are in this CSV file (It’s large and you’ll have to download it to view it as Github will not be able to show you the preview). Here are some of the highlights:

  • 99.580% of all Wikipedia pages can reach Hitler in at least 3 clicks
  • 96.875% required 3 clicks
  • 2.9490% required 2 clicks
  • 0.1756% required 1 click
  • There were 2,961 unique connections to Hitler

With the raw data, you can start from the first column, go to that Wikipedia page, and find the link in the second column. Note that the CSV file contains the name of the page as it appears in its title, not the link text (the link “English” might point to the “English Language” page, so “English Language” will appear in the csv file, not “English”). Continue this until you get to the target page, in this case, Adolf Hitler.

I decided to go further. Looking at raw data in the form of a spreadsheet can only be so much fun, so the next obvious step was to create a graph. A simple binary tree was not sufficient, so I settled on a network where each node can have n connections, and the size of the node is directly proportional to n. I generated 2 graphs, one of which is a simple image to show the overall architecture of the graph in 2D, and the other is an interactive 3D graph. The interactive graph can be found at https://lharri73.github.io/wikipediaProject/.

A 2D representation of all 56,935 unique paths to the Adolf Hitler path. An interactive version can be found here.

Limitations

Because of the approach this program takes in solving the Hitler problem, there are a couple of limitations to the algorithm as well as nuances that correlate to somewhat of a bias.

  • If the target could be reached from multiple paths, only the first path will be returned (which may not be the shortest path).
  • It takes SIGNIFICANTLY longer to determine that it is not possible to reach the target page. This causes the data to be misleading. Because of this, this algorithm will, by nature, find more true paths than negative results.

Conclusion

First: No, I’m not obsessed with Hitler and this does not reflect my political views, this is merely the findings of a rather interesting set of data.

In actuality, it’s not all that surprising that Adolf Hitler is that prevalent among the Wikipedia domain. The reason it is so easy to reach this page is because of the effect he had on people which ripples throughout history. On the Wiki Game page, there is actually a variation where, rather than reaching the Adolf Hitler page in 3 clicks, the Jesus Christ page is reached in 5 clicks (which is an interesting observation in itself). Overall, I believe that one could reach any historically significant individual, event, or place in the same manner as Adolf Hitler, and maybe even with the same frequency, it would just require further study. At this point, however, I have considered this simple ‘game’ to be thoroughly explored, and enough of my time wasted.

--

--

Landon Harris

Computer Science student at The University Of Tennessee Knoxville.