Offline Wiki Redux

There’s just something incredibly alluring about the concept of carrying the sum of human knowledge with you at all times. While near-ubiquitous connectivity satisfies this to a certain extent, the momentary lapses of networking are incredibly corrosive to an information-dependent mentality. Wikipedia never ceases to amaze me and, while I’ve tried in the past to encapsulate part of its sheer awesomeness, this marks a much more significant attempt.

The differences start even before the data gets to the application. The preprocessing toolchain was entirely rewritten for a multitude of reasons. First of all, it compresses not the entirety, but rather the most popular subset of the English Wikipedia. Two dumps are distributed at time of writing: the top 1000 articles and the top 300,000, requiring approximately 10MB and 1GB, respectively. While the top 300k articles might ostensibly seem far too narrow to reach into the long tail, this meager 1/25th of articles consistently surprises me with its depth. The advantage is that at 1GB, it’s relatively easy to fit onto any system. The algorithm which strips extraneous content has also been made far more sophisticated than the original series of regular expressions. This enables greater compression and fewer accidental omissions.

On the application end, the decoder has switched from a GWT-compiled LZMA SDK to a speedy, pure JavaScript implementation. This makes page loads significantly faster and allows greater compression ratios, because individual blocks can be made larger (256KB instead of 100KB). It also now uses WebGL Typed Arrays to further speed things up, for example when sending data to and from the Web Worker thread.
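As a rough illustration of how a typed array can carry one compressed block to a worker, here is a minimal sketch. It is not the app's actual code: `sendBlockToWorker` and the message shape are hypothetical, and it assumes the modern `postMessage(message, transferList)` form, which browsers at the time of writing may not have offered. The only figure taken from the post is the 256KB block size.

```javascript
// Block size quoted in the post (256KB per compressed block).
const BLOCK_BYTES = 256 * 1024;

// Hypothetical helper: hands one compressed block to a decoder worker.
// `target` is anything with a Worker-style postMessage(message, transferList),
// `archive` is a Uint8Array holding the compressed dump.
function sendBlockToWorker(target, archive, blockIndex, blockBytes = BLOCK_BYTES) {
  const start = blockIndex * blockBytes;
  if (blockIndex < 0 || start >= archive.length) {
    throw new RangeError("block " + blockIndex + " is outside the archive");
  }
  const end = Math.min(start + blockBytes, archive.length);
  // Record the size now: a transferred buffer reads as empty on the sender.
  const byteCount = end - start;

  // slice() copies the block into its own ArrayBuffer, so handing it over
  // does not detach the buffer that backs the rest of the archive.
  const block = archive.slice(start, end);

  // Listing block.buffer in the transfer list moves it to the worker
  // instead of cloning it.
  target.postMessage({ blockIndex: blockIndex, bytes: block }, [block.buffer]);
  return byteCount;
}
```

With a real worker this would be called as `sendBlockToWorker(new Worker("decoder.js"), archive, 3)`, where `decoder.js` is a placeholder name. Copying a single block before transferring it keeps the rest of the archive usable on the main thread.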

The interface was redesigned with CSS media queries to dynamically transition between different modes in response to different viewing environments. The interface consists of two regions: the page content and a fixed-position, recessed left panel which holds the page title, a search bar, controls and the page outline. The panel collapses down to a toolbar header automatically when screen real estate is limited. It uses an Apple-esque noise texture background.

Downloads happen in little units called chunks (half a megabyte each for the dump file and about four kilobytes each for the index), so the local file can be built up out of order. While online, all storage operations first check the local store, whether that is the virtual file, IndexedDB, or a Web SQL database. If the data is not there, the app transparently issues an XMLHttpRequest to fulfill the request and caches the result to disk in the respective persistence mechanism. A bitset keeps track of which chunks are already downloaded and which still need to be downloaded.
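The chunk bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions rather than the app's real implementation: the names `ChunkBitset`, `chunksForRange` and `chunksToFetch` are hypothetical, and the only figure taken from the post is the half-megabyte chunk size for the dump file.

```javascript
// Chunk size for the dump file, from the post (half a megabyte).
const DUMP_CHUNK_BYTES = 512 * 1024;

// Hypothetical bitset: one bit per chunk, 1 = already stored locally.
class ChunkBitset {
  constructor(chunkCount) {
    this.chunkCount = chunkCount;
    this.bits = new Uint8Array(Math.ceil(chunkCount / 8));
  }
  // True if the chunk at `index` has been downloaded.
  has(index) {
    return (this.bits[index >> 3] & (1 << (index & 7))) !== 0;
  }
  // Marks the chunk at `index` as downloaded.
  add(index) {
    this.bits[index >> 3] |= 1 << (index & 7);
  }
  // Indices of chunks that still need to be downloaded, in order.
  missing() {
    const out = [];
    for (let i = 0; i < this.chunkCount; i++) {
      if (!this.has(i)) out.push(i);
    }
    return out;
  }
}

// Chunk indices that overlap the byte range [offset, offset + length).
function chunksForRange(offset, length, chunkBytes) {
  if (length <= 0) return [];
  const first = Math.floor(offset / chunkBytes);
  const last = Math.floor((offset + length - 1) / chunkBytes);
  const out = [];
  for (let i = first; i <= last; i++) out.push(i);
  return out;
}

// Chunks a read must fetch over the network before it can be served locally.
function chunksToFetch(bitset, offset, length, chunkBytes) {
  return chunksForRange(offset, length, chunkBytes).filter((i) => !bitset.has(i));
}
```

Because each chunk is tracked by a single bit, chunks can arrive in any order, and a dump of roughly 1GB split into half-megabyte chunks needs only about 2,048 bits (256 bytes) of state.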

http://offline-wiki.googlecode.com/git/app.html

Posted in Offline Wiki.

Tagged with app, chrome, firefox, ipad, javascript, lzma, offline, offline cache, offline wiki, wikipedia.

By admin – December 30, 2011

21 Responses

Paul M. Watson says

Very cool, thank you. Noticed one bug though; long lists of content don’t scroll (I’m on 1280×800), try the Solar System article (http://offline-wiki.googlecode.com/git/app.html?Solar_System) or see this screenshot; http://cl.ly/CyWn

December 30, 2011, 6:34 pm
Tomasz Finc says

Awesome Job! I’m eager to test it. What are you planning next for it?

December 30, 2011, 7:49 pm
SquareWheel says

Looks awesome. I tried to use it but find Downloading is stuck at 0.0%. Is there a “start” button, or is it automatic? Chrome Beta on Windows.

December 31, 2011, 6:40 am
admin says

Yeah I couldn’t get that CSS to work so I just left it like that

December 31, 2011, 9:53 am
Robert Pollard says

Brilliant idea, but I haven’t figured out how to resume downloading, or to start again if settings are changed. Could “Resume Download” and “Start Download” buttons be added on the Settings page? Also, where are the downloaded files stored locally and how are they accessed when one is offline?

December 31, 2011, 11:50 am
pax says

Awesome, thanks. I think I was looking for this
1. Would this 1G locally copied dump possibly affect the overall speed of the browser (when not using the offline Wikipedia)?
2. It would be awesome to also have a similar app for Wikitravel.

December 31, 2011, 12:06 pm
Robert Pollard says

It would be a valuable enhancement to allow selection of the Wikipedia language version, for areas and countries (mostly non-English speaking) where near-ubiquitous connectivity is a rarity. Btw, if one has downloaded the entire Wikipedia, can that be readily shared, e.g. on a DVD, so that access to offline Wikipedia doesn’t require downloading a gigabyte?

December 31, 2011, 12:07 pm
Confused says

Please pardon my ignorance, but, I can’t get the download to start. How can I start the download?

December 31, 2011, 12:28 pm
Amit A says

Would you be interested in having some kind of API for this, so it can be accessed with JavaScript?

I am building a search aggregator and this could come handy.

January 1, 2012, 4:33 am
Josh says

I see that it says it’s been tested on iOS 5, but for some reason I can’t seem to access any articles.

I have enough free space for the 13MB version and Safari asked me if I would expand the page’s storage to 25MB, then later 50MB, and I said yes both times, but to no avail.

This seems really cool, so thanks for putting it together, nonetheless! I’m getting the 1GB version on my Macbook in Chrome as I type this

January 17, 2012, 12:08 pm
SquareWheel says

Oh neat, it does work on my iPod Touch. Unfortunately it does not start downloading in Chrome on Windows.

January 17, 2012, 4:43 pm
Anonymous says

I’ve noticed that mine is missing some words. I’m getting

“The page was found in the index, but not found in the archive.”

But I downloaded the large version. Any ideas?

January 17, 2012, 7:14 pm
SquareWheel says

I’ve found this site finally worked for me after moving to Chrome Stable and clearing my profile.

February 4, 2012, 9:39 am
evan says

Hi, I love this offline wiki. I want to know in which folder the dump is saved. And can I copy it onto my pendrive?

March 27, 2012, 11:24 pm
Rohan says

This is amazing—our debate team uses it for tournaments. Is there a way to add specific articles, etc for the non-technologically inclined? Other than just downloading them to our hard drives I suppose :/

June 5, 2012, 11:43 pm
pd says

Can anyone tell me where the offline-wiki dump is stored on the hard drive?

September 21, 2012, 7:37 am
anthony says

How do I download this for my website? I want to have it on my site.

November 20, 2012, 6:58 pm
Chuck says

This is really handy; I use it for school projects when i don’t have internet!
Thanks!

(P.S.: Is there any way this could be portable, like for a flash drive?)

February 19, 2013, 9:43 am
Chuck says

Hey I just found where the dump is stored! It’s in “C:\Users\User_Name\AppData\Local\Google\Chrome\User Data\Default\File System02\p”. Just replace the “User_Name” with your actual user name.
(this is on Chrome on Windows 7)

February 21, 2013, 11:06 am
Chuck says

Correction: The Dump is Located in C:\Users\User_Name\AppData\Local\Google\Chrome\User Data\Default\File System02\p

February 21, 2013, 11:09 am
Chuck says

Something is wrong with the posting here, so here is the path one more time: “C:\Users\username\AppData\Local\Google\Chrome\User Data\Default\File System\002\p\00”
It does not end with a p. After “File System” it’s slash 002 slash p slash 00.

February 21, 2013, 11:48 am