
Offline Wiki Redux

There’s just something incredibly alluring about the concept of holding the sum of human knowledge with you at all times. While near-ubiquitous connectivity satisfies this to a certain extent, its momentary lapses are incredibly corrosive to an information-dependent mentality. Wikipedia never ceases to amaze me and, while I’ve tried in the past to encapsulate part of its sheer awesomeness, this marks a much more significant attempt.

The differences start even before the data gets to the application. The preprocessing toolchain was entirely rewritten for a multitude of reasons. First of all, it compresses not the entirety, but rather the most popular subset of the English Wikipedia. Two dumps are distributed at the time of writing: the top 1,000 articles and the top 300,000 articles, requiring approximately 10MB and 1GB, respectively. While the top 300k articles ostensibly form far too narrow a slice to delve deep into the long tail, the breadth of this meager 1/25th of the articles consistently surprises me. The advantage is that, at 1GB, it’s relatively easy to fit onto any system. The algorithm that strips extraneous content has also been made far more sophisticated than the original series of regular expressions, which enables greater compression and means less content is accidentally omitted.
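The post doesn't say how "most popular" is measured. Assuming a per-article view count is available (for instance from aggregated page-view statistics), choosing a dump's contents reduces to a sort and a cut. The following is a minimal sketch under that assumption; `selectTopArticles` and the sample numbers are hypothetical, not part of the actual toolchain:

```javascript
// Hypothetical sketch: choose which articles go into a dump.
// ASSUMPTION: popularity is a per-title view count; the post does not
// say which metric or data source the real toolchain uses.
function selectTopArticles(viewCounts, n) {
  // viewCounts: Map of article title -> view count
  return [...viewCounts.entries()]
    // Most viewed first; ties broken by title so the result is stable.
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n)
    .map(([title]) => title);
}

// Made-up sample data, for illustration only.
const views = new Map([
  ["Solar System", 5200],
  ["Banana", 800],
  ["Albert Einstein", 9100],
  ["Obscure Village", 3],
]);

console.log(selectTopArticles(views, 2)); // [ 'Albert Einstein', 'Solar System' ]
```

With n set to 1,000 or 300,000, the same cut would produce the two dump sizes described above.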

On the client side, the application has switched from a GWT-compiled LZMA SDK to a fast, pure JavaScript decoder. This makes page loads significantly faster and allows greater compression ratios, since a faster decoder can afford to decompress larger individual blocks (256KB instead of 100KB) on each page load. It also now uses WebGL Typed Arrays to speed things up further, for instance when sending data to and from the Web Worker thread.
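The post doesn't document the archive layout, so here is a minimal sketch under an assumed one: all article text is concatenated into a single stream, cut into fixed 256KB blocks, and each block is LZMA-compressed independently, with the index storing each article's offset and length in the uncompressed stream. `blocksForRange` and `extractArticle` are hypothetical names, and the LZMA decoding step itself is left out; the code only shows how a lookup finds the right blocks and stitches the decoded bytes together with Typed Arrays.

```javascript
// ASSUMED layout (not confirmed by the post): article text is concatenated
// into one uncompressed stream, cut into fixed 256 KB blocks, and each
// block is LZMA-compressed on its own so it can be decoded independently.
const BLOCK_SIZE = 256 * 1024;

// Which blocks must be decompressed to read `length` bytes starting at
// `offset` in the uncompressed stream, and where the bytes start in the first.
function blocksForRange(offset, length) {
  const first = Math.floor(offset / BLOCK_SIZE);
  const last = Math.floor((offset + length - 1) / BLOCK_SIZE);
  return { first, last, startInFirst: offset - first * BLOCK_SIZE };
}

// Copy one article out of already-decompressed blocks (Uint8Arrays).
// `decoded` must hold blocks first..last, in order.
function extractArticle(decoded, offset, length) {
  const { startInFirst } = blocksForRange(offset, length);
  const out = new Uint8Array(length);
  let written = 0;
  let pos = startInFirst;
  for (const block of decoded) {
    const take = Math.min(block.length - pos, length - written);
    // subarray() creates a view without copying; set() does the single copy.
    out.set(block.subarray(pos, pos + take), written);
    written += take;
    pos = 0; // every block after the first is read from its beginning
    if (written === length) break;
  }
  return out;
}
```

Larger blocks give LZMA more context in which to find repeated material, which raises the compression ratio, but every lookup may then have to decompress up to a whole block; a faster decoder is what makes that trade acceptable. Because the result is a `Uint8Array`, its underlying `ArrayBuffer` can be passed to or from a Web Worker with `postMessage`, and listing it in the transfer list moves the buffer instead of copying it.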

The interface was redesigned with CSS media queries so that it transitions dynamically between layouts in response to different viewing environments. It consists of two regions: the article itself and a fixed-position, recessed left panel that holds the page title, a search bar, controls and the page outline. The panel automatically collapses into a toolbar header when screen real estate is limited. The background uses an Apple-esque noise texture.

Downloads happen in little units called chunks (half a megabyte each for the dump file and about four kilobytes for the index), and the local file can be built up out of order. Every storage operation first checks the local copy, whether that is the virtual file, IndexedDB, or a Web SQL database. If the data isn’t there and the browser is online, the request is transparently fulfilled with an XMLHttpRequest and the result is cached to disk in the respective persistence mechanism. A bitset keeps track of which chunks have already been downloaded and which still need to be.
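The chunk bookkeeping described here can be sketched as a compact bitset plus a read-through cache. This is an illustration under stated assumptions, not the app's code: `ChunkBitset` and `readChunk` are hypothetical, and `store` and `fetchChunk` stand in for whichever persistence mechanism is in use (virtual file, IndexedDB or Web SQL) and for the XMLHttpRequest.

```javascript
// Hypothetical sketch of chunked, out-of-order downloading.
// ASSUMPTIONS: `store` ({ get(i), put(i, data) }, both async) stands in for
// the persistence layer, and `fetchChunk(i)` for the network request.
const DUMP_CHUNK_SIZE = 512 * 1024; // half a megabyte, as in the text

class ChunkBitset {
  constructor(chunkCount) {
    this.chunkCount = chunkCount;
    this.bits = new Uint8Array(Math.ceil(chunkCount / 8)); // 1 bit per chunk
  }
  has(i) {
    return (this.bits[i >> 3] & (1 << (i & 7))) !== 0;
  }
  mark(i) {
    this.bits[i >> 3] |= 1 << (i & 7);
  }
  // Indices of chunks not yet downloaded, in ascending order.
  missing() {
    const out = [];
    for (let i = 0; i < this.chunkCount; i++) if (!this.has(i)) out.push(i);
    return out;
  }
}

// Read-through cache: serve a chunk locally if the bitset says it is there;
// otherwise fetch it, persist it, then record it as downloaded.
async function readChunk(i, bitset, store, fetchChunk) {
  if (bitset.has(i)) return store.get(i);
  const data = await fetchChunk(i); // only possible while online
  await store.put(i, data);
  bitset.mark(i); // mark only after the write succeeded
  return data;
}

// About 1 GiB of dump at 512 KiB per chunk -> 2048 chunks -> 256-byte bitset.
const dumpBits = new ChunkBitset(Math.ceil(2 ** 30 / DUMP_CHUNK_SIZE));
console.log(dumpBits.bits.length); // 256
```

At one bit per chunk, the download state for the whole 1GB dump fits in a few hundred bytes, so it is cheap to persist alongside the data, and `missing()` gives a background downloader its to-do list in any order it likes.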

Posted in Offline Wiki.


21 Responses


  1. Paul M. Watson says

    Very cool, thank you. Noticed one bug though: long lists of content don’t scroll (I’m on 1280×800). Try the Solar System article (or see this screenshot).

  2. Tomasz Finc says

    Awesome Job! I’m eager to test it. What are you planning next for it?

  3. SquareWheel says

    Looks awesome. I tried to use it but find Downloading is stuck at 0.0%. Is there a “start” button, or is it automatic? Chrome Beta on Windows.

  4. admin says

    Yeah I couldn’t get that CSS to work so I just left it like that

  5. Robert Pollard says

    Brilliant idea, but I haven’t figured out how to resume downloading, or how to start again if settings are changed. Could “Resume Download” and “Start Download” buttons be added to the Settings page? Also, where are the downloaded files stored locally, and how are they accessed when one is offline?

  6. pax says

    Awesome, thanks. I think I was looking for this :)
    1. Would this locally copied 1GB dump affect the overall speed of the browser (when not using the offline Wikipedia)?
    2. It would be awesome to also have a similar app for Wikitravel.

  7. Robert Pollard says

    Would be a valuable enhancement to allow selecting the Wikipedia language version, for areas and countries – mostly non-English speaking – where near-ubiquitous connectivity is a rarity. Btw, if one has downloaded the entire Wikipedia, can that be readily shared, e.g. on a DVD, so that access to offline Wikipedia doesn’t require downloading a gigabyte?

  8. Confused says

    Please pardon my ignorance, but, I can’t get the download to start. How can I start the download?

  9. Amit A says

    Would you be interested in providing some kind of API so this can be accessed with JavaScript?

    I am building a search aggregator and this could come in handy.

  10. Josh says

    I see that it says it’s been tested on iOS 5, but for some reason I can’t seem to access any articles.

    I have enough free space for the 13MB version and Safari asked me if I would expand the page’s storage to 25MB, then later 50MB, and I said yes both times, but to no avail.

    This seems really cool, so thanks for putting it together, nonetheless! I’m getting the 1GB version on my Macbook in Chrome as I type this ;)

  11. SquareWheel says

    Oh neat, it does work on my iPod Touch. Unfortunately it does not start downloading in Chrome on Windows.

  12. Anonymous says

    I’ve noticed that mine is missing some words. I’m getting

    “The page was found in the index, but not found in the archive.”

    But I downloaded the large version. Any ideas?

  13. SquareWheel says

    I’ve found this site finally worked for me after moving to Chrome Stable and clearing my profile.

  14. evan says

    Hi, I love this offline wiki. I want to know which folder the dump is saved in. And can I copy it onto my pendrive?

  15. Rohan says

    This is amazing—our debate team uses it for tournaments. Is there a way to add specific articles, etc., for the non-technologically inclined? Other than just downloading them to our hard drives I suppose :/

  16. pd says

    Can anyone tell me where the offline-wiki dump is stored on the hard drive?

  17. anthony says

    How do I download this for my website? I want to have it on my site.

  18. Chuck says

    This is really handy; I use it for school projects when I don’t have internet!

    (P.S.: Is there any way this could be portable, like for a flash drive?)

  19. Chuck says

    Hey I just found where the dump is stored! It’s in “C:\Users\User_Name\AppData\Local\Google\Chrome\User Data\Default\File System02\p”. Just replace the “User_Name” with your actual user name.
    (this is on Chrome on Windows 7)

  20. Chuck says

    Correction: The Dump is Located in C:\Users\User_Name\AppData\Local\Google\Chrome\User Data\Default\File System02\p

  21. Chuck says

    Something is wrong with the posting here. The link one more time: “C:\Users\username\AppData\Local\Google\Chrome\User Data\Default\File System\002\p\00”
    It does not end with a p. After File System it’s slash 002 slash p slash 00.
