There’s just something incredibly alluring about the concept of holding the sum of human knowledge with you at all times. While near-ubiquitous connectivity satisfies that desire to a certain extent, the momentary lapses in connectivity are incredibly corrosive to an information-dependent mentality. Wikipedia never ceases to amaze me and, while I’ve tried in the past to encapsulate part of its sheer awesomeness, this marks a much more significant attempt.
The differences start even before the data gets to the application. The preprocessing toolchain was entirely rewritten for a multitude of reasons. First of all, it compresses not the entirety of the English Wikipedia, but rather its most popular subset. Two dumps are distributed at time of writing: the top 1000 articles and the top 300,000, requiring approximately 10MB and 1GB, respectively. While the top 300k articles may seem far too narrow to reach deep into the long tail, this meager 1/25th of all articles consistently surprises me with its depth. The advantage is that at 1GB, it’s relatively easy to fit onto any system. The algorithm which strips extraneous content has also been made far more sophisticated than the original series of regular expressions. This enables greater compression and leaves less content accidentally omitted.
On the application end, the app has switched from a GWT-compiled LZMA SDK to a speedy, pure JavaScript decoder. This makes page loads significantly faster and allows greater compression ratios, since individual blocks can be made larger (256KB instead of 100KB). It also now uses WebGL Typed Arrays to further speed things up, for example when sending data to and from the WebWorker thread.
The interface was redesigned with CSS media queries to dynamically transition between different modes in response to different viewing environments. The interface consists of two regions: the article content and a fixed-position recessed left panel, which holds the page title, a search bar, controls and the page outline. The panel collapses down to a toolbar header automatically when screen real estate is limited. It uses an Apple-esque noise texture background.
Downloads happen in little units called chunks (half a megabyte for the dump file and about four kilobytes for the index), so the local file can be built up out of order. While online, all storage operations first check the local store (the virtual file, IndexedDB, or Web SQL database). If the data isn’t there, the app transparently issues an XMLHttpRequest to fulfill the request and caches the result to disk in the respective persistence mechanism. A bitset keeps track of which chunks are already downloaded and which still need to be downloaded.
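To make the chunk bookkeeping a little more concrete, here is a minimal sketch of how a bitset could track downloaded chunks and how a byte range could be mapped onto chunk numbers. It is not the app’s actual code: ChunkBitset, chunksForRange and CHUNK_SIZE are names made up for illustration, and “half a megabyte” is assumed to mean exactly 512 KiB.

```javascript
// Sketch only: ChunkBitset, chunksForRange and CHUNK_SIZE are hypothetical
// names, not identifiers taken from the app's source.
const CHUNK_SIZE = 512 * 1024; // assumes "half a megabyte" means 512 KiB

class ChunkBitset {
  constructor(chunkCount) {
    this.chunkCount = chunkCount;
    // One bit per chunk, packed eight to a byte.
    this.bits = new Uint8Array(Math.ceil(chunkCount / 8));
  }

  // True if the chunk at this index has already been downloaded.
  has(index) {
    return (this.bits[index >> 3] & (1 << (index & 7))) !== 0;
  }

  // Record that the chunk at this index is now stored locally.
  mark(index) {
    this.bits[index >> 3] |= 1 << (index & 7);
  }

  // List the chunk indices in [first, last] that still need fetching.
  missing(first, last) {
    const result = [];
    for (let i = first; i <= last; i++) {
      if (!this.has(i)) result.push(i);
    }
    return result;
  }
}

// Map a byte range of the dump file onto the chunks that cover it.
function chunksForRange(offset, length) {
  return {
    first: Math.floor(offset / CHUNK_SIZE),
    last: Math.floor((offset + length - 1) / CHUNK_SIZE),
  };
}

// Usage: a read that straddles the boundary between chunks 0 and 1.
const downloaded = new ChunkBitset(2048); // 1 GiB / 512 KiB, an assumed size
downloaded.mark(1); // pretend chunk 1 arrived earlier, out of order
const { first, last } = chunksForRange(CHUNK_SIZE - 10, 20);
const toFetch = downloaded.missing(first, last); // [0]
```

In a scheme like this, only the chunks listed in toFetch would need a network request; everything else can be served from local storage, and the bitset itself is small enough to persist alongside the data.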
Very cool, thank you. Noticed one bug though: long lists of content don’t scroll (I’m on 1280×800). Try the Solar System article (http://offline-wiki.googlecode.com/git/app.html?Solar_System) or see this screenshot: http://cl.ly/CyWn
Awesome Job! I’m eager to test it. What are you planning next for it?
Looks awesome. I tried to use it but found that downloading is stuck at 0.0%. Is there a “start” button, or is it automatic? Chrome Beta on Windows.
Yeah I couldn’t get that CSS to work so I just left it like that
Brilliant idea, but I haven’t figured out how to resume downloading, or to start again if settings are changed. Could “Resume Download” and “Start Download” buttons be added on the Settings page? Also, where are the downloaded files stored locally and how are they accessed when one is offline?
Awesome, thanks. I think I was looking for this
1. Would this 1G locally copied dump possibly affect the overall speed of the browser (when not using the offline Wikipedia)?
2. It would be awesome to also have a similar app for Wikitravel.
It would be a valuable enhancement to allow selection of the Wikipedia language version, for areas and countries (mostly non-English speaking) where near-ubiquitous connectivity is a rarity. Btw, if one has downloaded the entire Wikipedia, can that be readily shared, e.g. on a DVD, so that access to offline Wikipedia doesn’t require downloading a gigabyte?
Please pardon my ignorance, but, I can’t get the download to start. How can I start the download?
Would you be interested in having some kind of API for this, so it can be accessed from JavaScript?
I am building a search aggregator and this could come in handy.
I see that it says it’s been tested on iOS 5, but for some reason I can’t seem to access any articles.
I have enough free space for the 13MB version and Safari asked me if I would expand the page’s storage to 25MB, then later 50MB, and I said yes both times, but to no avail.
This seems really cool, so thanks for putting it together, nonetheless! I’m getting the 1GB version on my Macbook in Chrome as I type this
Oh neat, it does work on my iPod Touch. Unfortunately it does not start downloading in Chrome on Windows.
I’ve noticed that mine is missing some words. I’m getting
“The page was found in the index, but not found in the archive.”
But I downloaded the large version. Any ideas?
I’ve found this site finally worked for me after moving to Chrome Stable and clearing my profile.
Hi, I love this offline wiki. I want to know in which folder the dump is saved, and can I copy it onto my pendrive?
This is amazing—our debate team uses it for tournaments. Is there a way to add specific articles, etc for the non-technologically inclined? Other than just downloading them to our hard drives I suppose :/
Can anyone tell me where the offline-wiki dump is stored on the hard drive?
How do I download this for my website? I want to have it on my site.
This is really handy; I use it for school projects when I don’t have internet!
Thanks!
(P.S.: Is there any way this could be portable, like for a flash drive?)
Hey, I just found where the dump is stored! It’s in “C:\Users\User_Name\AppData\Local\Google\Chrome\User Data\Default\File System02\p” . Just replace the “User_Name” with your actual user name.
(this is on Chrome on Windows 7)
Correction: The Dump is Located in C:\Users\User_Name\AppData\Local\Google\Chrome\User Data\Default\File System02\p
Something is wrong with the posting here. The link one more time: “C:\Users\username\AppData\Local\Google\Chrome\User Data\Default\File System02\p0”
It does not end with a p. After “File System” it’s slash 002 slash p slash 00.