somewhere to talk about random ideas and projects like everyone else


August 2012 Archive

Whammy: A Real Time Javascript WebM Encoder 19 August 2012

This is sort of a conceptual reversal (or not, this might just be making the description needlessly confusing) of one of my older projects, Weppy. What Weppy did was add support for WebP in browsers which didn’t support it by converting it into a single-frame video. This is instead predicated on the assumption that the browser already has support for WebP (at this point, that means it only works on Chrome, since it’s the only browser which actually supports WebP), not only decoding WebP but encoding it as well.

The cool thing about WebP which was exploited in Weppy is that it’s actually based on the same codec as WebM, On2’s VP8. That means the actual image data, once the container formats are ignored, is virtually interchangeable. With a catch: it’s intraframe only.

So it’s a video encoder in that it generates .webm files which should play in just about any program or device which supports the WebM format. But interframe compression is actually a fairly important thing which could reduce the file size by an order of magnitude or more.

But there isn’t too much you can do on the client side in the way of encoding, and whatever you do, you basically can’t do interframe compression (aside from some really rudimentary delta encoding). More or less, when your only alternatives are maintaining an array of DataURL encoded frames or encoding them (rather slowly) as a GIF, a fast but inefficient WebM encoder stops looking too bad.

This was actually Kevin Geng’s idea, and he contributed some code too, but in the end most of the code was just leftovers from Weppy.

Demo

http://antimatter15.github.com/whammy/clock.html

Basic Usage

First, let’s include the JS file. It’s self-contained and basically namespaced, which is pretty good I guess. And it’s not too big: minified, it’s only about 4KB, and gzipped, it’s under 2KB. That’s like really really tiny.

<script src="whammy.js"></script>

The API isn’t terrible either (at least, that’s what I’d like to hope)

var encoder = new Whammy.Video(15); 

That 15 over there is the frame rate. There’s a way to set the individual duration of each frame manually, but you can look in the code for that.

encoder.add(context or canvas or dataURL); 

Here, you can add a frame. This happens fairly quickly because basically all it’s doing is running .toDataURL() on the canvas (which isn’t exactly a speed demon either, but it’s acceptable enough most of the time) and plopping the result onto an array (no computation or anything). The actual encoding only happens when you call .compile().

var output = encoder.compile(); 

Here, output is set to a Blob. In order to get a nice URL which you can use to stick in a <video> element, you need to send it over to createObjectURL.

var url = (window.webkitURL || window.URL).createObjectURL(output); 

And you’re done. Awesome.

Documentation

Whammy.fromImageArray(image[], fps) this is a simple function that takes a list of DataURL encoded frames and returns a WebM video. Note that the images all have to be encoded as WebP.

new Whammy.Video(optional fps, optional quality) this is the constructor for the main API. quality only applies if you’re sending in contexts or canvas objects, and doesn’t matter if you’re sending in already-encoded frames.

.add(canvas or context or dataURL, optional duration) if fps isn’t specified in the constructor, you can stick a duration (in milliseconds) here.
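
To tie those calls together, here’s a quick sketch (based only on the API described above) that renders a trivial canvas animation and compiles it into a two-second WebM:

```javascript
// Draw 30 frames of a moving square and encode them at 15fps (2 seconds).
var canvas = document.createElement('canvas');
canvas.width = 320;
canvas.height = 240;
var ctx = canvas.getContext('2d');

var encoder = new Whammy.Video(15); // 15 frames per second

for (var i = 0; i < 30; i++) {
  ctx.fillStyle = 'white';
  ctx.fillRect(0, 0, canvas.width, canvas.height);
  ctx.fillStyle = 'red';
  ctx.fillRect(i * 10, 100, 40, 40);
  encoder.add(canvas); // just stashes a WebP dataURL, so this part is fast
}

var output = encoder.compile(); // the actual encoding happens here
var url = (window.webkitURL || window.URL).createObjectURL(output);

var video = document.createElement('video');
video.src = url;
video.controls = true;
document.body.appendChild(video);
```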

Todo

This pretty much works as well as it possibly could at this point. Maybe one day it should support WebWorkers or something, but unlike the GIF encoder, it doesn’t actually require much real computation, so doing that probably wouldn’t net any performance benefits, especially since it can already stitch together a 120-frame animation in like 20 milliseconds.

But one of the sad things about it is that it now uses Blobs instead of strings, which is great and all except that the Blobs are actually slower to produce, because each frame still has to go through the DataURL string conversion before being turned into a Blob. That’s pretty lame. Firefox supports the canvas toBlob thing, but for some reason Chrome doesn’t yet; eventually it probably will, and that might be useful to add.

Also, if someone ever makes a Javascript Vorbis encoder, it would be nice to integrate that in, since this currently only does the video part, but audio’s also a pretty big part.


Upcoming Changes 18 August 2012

This post has been hinted at by the past few blog posts, but I guess eventually it has to be written. The basic gist is that rather than making this the home of random announcements of mostly finished projects, it’ll be the home of mostly daily (or weekly, whenever significant progress is made) and probably shorter updates on the progress of certain projects. That is, the blog is transitioning back into something more like the olden days (circa 2008-ish), but without falling into the trap of using this as an alternative to having commit messages, and while acknowledging the fact that I’m now working on quite a bit more than one project at a time.

The problem is that I can’t exactly stay true to that because I actually have quite a bit of backlog in terms of stuff I have to write about, stuff which is for the most part done (so it’s not particularly viable for me to make up progress updates retroactively, and I’ll probably have to stick with writing a big blog post about it).

This should be the culmination of tons of factors and trends building up for the past year or so. I’ve always felt that the blog needed to be overhauled eventually (or end up rotting as nothing more than a backup kept in the eternal resting spot which is the Internet Archive, leaching fluids into the soil as bacteria leave the corpse punctured by holes and missing vital organs, a sure sign that I’m probably going too far into this metaphor, but in the end that’s the way many of the forums I used to visit have become). But the real spark came in the form of a migration to a new web host, something which I still, alas, have yet to blog about despite it happening over a week ago.

Those changes are hardly precipitous (however much anyone wants to unveil something in one flash of an instant in order to feign the appearance that everything happened suddenly and approached new heights of grandeur, that never actually happens, and it’s simply harder to work in that sort of manner; slow and steady doesn’t always lose the race). The first part was the change of the web host itself, which was actually not exactly planned (I was testing it out, and on a whim unexpectedly cancelled my old web host, migrated over the course of an afternoon, and left the site down for a few hours). The second front at which this evolution occurred was a slight redesign: changing the color scheme a bit, upgrading the theme, and reorganizing the categories and menus (this is meant to be chronicled in detail in some other blog post which I have yet to write). And the third and last one (which was meant to be the topic of this blog post) is a change in content.

In summary, three inevitable changes on three different fronts. Content, Frontend, and Backend. All in a not-so-grand gesture to save this blog from decaying into a moldy blob of feces on the internet’s great sidewalk.


Meta Analytics 17 August 2012

I’ve been maintaining this blog, or at least the content inside it, for about five years now. It’s been through a handful of incarnations, often paired with significant changes in web hosting. I’ve had a blog for a little bit longer, but I don’t think I have the medium figured out. The structure and style of the posts have changed over the past few years, but I can’t at this point call it evolution, a positive progression. Part of the power which lies in analyzing data is the ability to recognize patterns, often at a different scale from human observation (spans of months or years), which are equally if not more insightful.

That’s been my personal attraction to data science. I’ve had a couple of personal experiments involving collecting data about my daily activities, my old writing and code, in hopes of distilling the changes that I’m too conceited to admit without the infallible hand of statistics. For nearly two years now, I’ve logged my entire life to a precision of approximately 30 minutes in Google Calendar (or the Calendar app on iPad, which syncs to Google Calendar). Actually, that’s slightly off: I quite often dedicate large spans of time to more or less useless labels like “not productive”. But this temporal information falls apart in terms of its richness, for my schedule is dictated more by the mandatory rhythms of school life than the drifting cadence of other behavior.

But I digress. This isn’t about why I collect data so much as “I have this data, now what?”. In this case, I had a hypothesis, a rather simple albeit morbid one at that: “my blog is dying”. It’s not hard to see how I’m coming to that conclusion. I’m pretty much struggling at this point to meet my goal of one post per month (itself not a particularly difficult goal, but as time has gone on and my posts have become more infrequent, I feel more compelled to write obscenely long posts to compensate, which of course also leads to big posts sitting there unfinished for long durations, losing the sort of one post = one sitting mentality). But before I ramble for too long, I’ll cut to the chase and answer the question posed at the beginning of this paragraph: “Graphs.” (You could imagine those haunting glyphs levitating in midair, caught in the invisible grasp of Giorgio A. Tsoukalos, or better yet, I can spare your cognitive abilities by making it real.)

Here’s a pretty little graph I made in R (sorry for the mess on the horizontal axis; I just realized I have no idea how to interpret the dates, so I’m assuming they’re linear and it’s just some odd aliasing issue that makes even-numbered years repeat twice). It’s a histogram of the dates of posts that I’ve made to this blog (extracted with a simple Python script and Wordpress’s built-in Export button). You can probably tell that the blog’s demise has been quite a long time coming. Every annual peak ends up shallower than the last, and the first real gaps appeared this fateful year, 2012.

It’s actually sort of interesting that these peaks exist, but I can’t really tell which months they fall in (since these axes are labeled so terribly; it’d be nice if I knew some nice interactive graphing engine that worked with histograms, something like that cool time series viewer Google has had for Finance for like ever, but for histograms, though I guess that just shows how much of a non-scientist I am, to have no idea how to fluently articulate in a statistical or graphical language of my choice).

For more graph fun, here’s a scatter plot of post lengths (in words) as a function of year. I wasn’t dedicated enough to figure out how to get NLTK to tell me the Gunning-Fog, Flesch-Kincaid or ARI value for individual posts, and I doubt that would end up showing anything particularly insightful. But yeah, so here it is. Charts. Charts of words. Note that the thing that sticks out, clocking in at around 3724 words, is my first Music Alpha post.

Actually, I won’t mind that Wordpress isn’t yet self aware (‘ello Skynet) and still sends trackbacks and pings (whatever they are) to me when I link to myself. Seriously, you don’t actually need to have a self-aware artificial intelligence in order to learn how to not spam me with emails when I’m quite probably as in super definitely aware of its existence. But anyway, I guess I’ll stomach the lurching pain of a thousand emails (I’m using hyperbole here, in case your rudimentary artificial intelligence algorithms can’t quite distinguish them, but I’m also pretty sure your algorithms wouldn’t be able to handle n-th degrees of meta, so this excruciatingly useless parenthetical wouldn’t be much other than that: excruciatingly useless) and post the last part of the list here.

1340133957.0 , 2012-06-19 19:25:57 , 1178 [http://antimatter15.com/wp/2012/06/pinball/](http://antimatter15.com/wp/2012/06/pinball/)

1333025085.0 , 2012-03-29 12:44:45 , 1302 [http://antimatter15.com/wp/2012/03/musicalpha-v2-0/](http://antimatter15.com/wp/2012/03/musicalpha-v2-0/)

1293394934.0 , 2010-12-26 20:22:14 , 1409 [http://antimatter15.com/wp/2010/12/drag2up-v2-drag-and-drop-uploading-for-all-sites/](http://antimatter15.com/wp/2010/12/drag2up-v2-drag-and-drop-uploading-for-all-sites/)

1317686582.0 , 2011-10-04 00:03:02 , 1565 [Haven't actually published this yet, hmm]

1341591648.0 , 2012-07-06 16:20:48 , 2117 [http://antimatter15.com/wp/2012/07/cloudfall-a-text-editor/](http://antimatter15.com/wp/2012/07/cloudfall-a-text-editor/)

1307064165.0 , 2011-06-03 01:22:45 , 2180 [http://antimatter15.com/wp/2011/06/why-the-chrome-web-store-is-bad-for-the-web/](http://antimatter15.com/wp/2011/06/why-the-chrome-web-store-is-bad-for-the-web/)

1277922545.0 , 2010-06-30 18:29:05 , 2319 [http://antimatter15.com/wp/2010/06/wave-embed-api/](http://antimatter15.com/wp/2010/06/wave-embed-api/)

1294958307.0 , 2011-01-13 22:38:27 , 2762 [http://antimatter15.com/wp/2011/01/the-ambiguity-of-open-and-vp8-vs-h-264/](http://antimatter15.com/wp/2011/01/the-ambiguity-of-open-and-vp8-vs-h-264/)

1308832860.0 , 2011-06-23 12:41:00 , 2872 [http://antimatter15.com/wp/2011/06/samsung-series-5-chromebook/](http://antimatter15.com/wp/2011/06/samsung-series-5-chromebook/)

1305426252.0 , 2011-05-15 02:24:12 , 3724 [http://antimatter15.com/wp/2011/05/uploading-mp3s-to-google-music-beta-from-linux-chrome-os-win-and-mac/](http://antimatter15.com/wp/2011/05/uploading-mp3s-to-google-music-beta-from-linux-chrome-os-win-and-mac/)

That list was compiled by the command cat blogtimes.csv | sort -t',' -k3n | tail, and that’s quite an accomplishment because I had to look up the arguments for the sort command in order to figure that out. Of course, blogtimes.csv is the output of my magical six line python script (which uses BeautifulSoup to extract all the wp:post_dates).

So, of the 10 blog posts in that list, 8 of them were published in 2011 or later and 3 of them in 2012. Considering that there were 10 things published in 2012 (according to my dataset) and 21 in 2011, that’s a rather significant fraction of the recently written stuff that has ended up insanely long.

Wordpress tells me this post is now at 948 words, so I guess I’ll add a bit of concluding at the end to push it over the magical power-of-ten barrier, so presumably you should brace for the terrible boom which occurs at this point (oh, what’s that? I think that’s my imaginary telephone operator who informs me when I make a factual error, apparently those kinds of booms only happen with waves, and apparently words flowing through word count orders of magnitude don’t count).

The original title of this post was “Meta Analytics & Upcoming Changes”, but in the spirit of the upcoming changes, I’ve moved the “Upcoming Changes” part into its own post (tentatively titled “Upcoming Changes”). You can probably at this point guess that “Upcoming Changes” involves something to tackle the excessive verbosity and to mitigate the absurdly infrequent posts. This probably doesn’t sound nearly as heroic to you as it does to me, because I’m listening to The Avengers soundtrack right now, and “A Promise” is pretty dramatic.


Swipe Gesture 2 Development 16 August 2012

So I’m trying something new: returning to quasi-daily, somewhat short updates about the development of whatever I’m working on, rather than withholding everything until something of acceptable release quality is achieved. I have a blog post about that transition, but I’m still working on it (as in, writing it is somewhat boring). It’s probably better given that my development cycle is quite nonlinear: usually I get something somewhat promising made in the first few days or so and then pause for long and possibly indefinite durations while doing other stuff. Probably, writing short blog posts about what I have yet to finish will remind me to, well, finish. Just maybe. But I’m probably going to have to preface every post I write with this kind of disclaimer until I actually get that post finished and published, so I have something to reference rather than pointing crazily into the air and saying “oh yeah, it’s coming, now, someday, maybe.”

Starting yesterday, I began working on the successor to Swipe Gesture. The new version tries to mimic the actual behavior of Chrome on Lion, which I think is really quite cool. Here’s a video I found on YouTube which shows what it basically looks like, if you aren’t familiar with it. The first thing to notice is that it’s substantially less trivial, code-wise. No more is it a 30-line software lightweight, but it’s not too complex and arcane to forbid any kind of comprehension: the simple prototype of its functionality is already nearing 300 lines of code.

Another big difference is now it’s no longer designed strictly for Chromebooks. In fact, one of the reasons for starting this was that I was informed that the kind of functionality might be useful on Macbooks running Windows via Bootcamp. In fact, it’s meant to be as general as possible, to work on pretty much any kind of platform. And it’s not even bound strictly to the horizontal axis: the code is meant to work with linear swipes in any direction including diagonally (although some experimentation on my chromebook seems to indicate that swiping at angles isn’t terribly useful).

The most significant conceptual change is the transition from a speed/acceleration metric to a distance metric. That is, in the old version, an action was triggered when there was a swipe in one direction vigorous enough to be considered. This was a fairly simple way to avoid the problem of distinguishing between a horizontal scroll action and a swipe: by not making a distinction. In a sense, cheating. The new version instead does things “the right way™” by observing events carefully to determine whether a swiping action actually results in scrolling. If that’s your kind of thing, the technical nitty gritty details have their own dedicated blog post, so feel free to click through if you’re interested.

Once it’s determined that that scroll thing is actually probably a swipe gesture, it renders a nice little arrow in canvas. I considered using a unicode arrow and setting the font to huge, but that didn’t turn out quite as well as I expected (plus, it makes rotations and interactions with the embedding page CSS a little less predictable).

Also, it turns out that it’s a bad idea to set a CSS transition on something which is meant to hook into mouse or scroll movements, because while this ends up smoothing things out (which is good for mouse wheels, because they click to the nearest 120 magical click units), it ends up producing a significant amount of lag and just feels awkward.

Another thing (since this post is written over the course of several days, and the actual update has already been published at time of writing) is the cool redesign of the Settings page. The first thing to notice is that the settings page for once actually has settings, which is quite an accomplishment by itself. Also, it has a visual refresh that makes it look somewhat bootstrap-esque. That’s because ever since using Bootstrap in the making of Protobowl (a rather big project that I have yet to blog about), I’ve pretty much fallen in love with the color whiteSmoke. Partly because it has a name, which means I don’t have to google it or tattoo it on my arm for a mnemonic’s sake, and also because it’s a pretty nice color.


Raspberry Pi 14 August 2012

As part of the shift between long multi-kiloword blog posts which are somewhat more like press releases back into a sort of more personal (i.e. blog-esque) format, I guess I’ll talk about my newly-arrived Raspberry Pi. Right now, there isn’t terribly much to talk about since I’ve only had it for about two weeks.

I’ve been planning on getting a Raspberry Pi for a pretty long time, and I was actually pretty excited about it. For the weeks preceding the official announcement, I built a tiny script which ran in a cron job every ten minutes and would basically download the purported Raspberry Pi store page (raspberrypi.com, note the dot-com rather than the dot-org where their homepage is situated), compare its hash against the last copy, and notify me via Ubuntu’s built-in notification system if it changed.

On that sleepless night when the actual pre-sale announcement was being made, I was incessantly checking raspberrypi.com, which had suddenly morphed into a server maintenance message (which remains to this very day). The anticipation was intense, and some twenty minutes after it was supposed to happen was when I realized that the whole time I had been checking the wrong page. The announcement came instead on raspberrypi.org, their main blog, and by that time, it was certain that all the distributors’ sites were already collapsing under the crushing load of a million souls crying out for a taste of berry-scented silicon pastry.

The next day, I checked the sites again, and all the order pages were already closed. Either way, it wasn’t terribly useful for me because most of them didn’t support Paypal. Fast forward a veritable eternity: on June 16th, I was notified via email by RS Components that I would be allowed to order the device some time in the near future. Sure enough, on the 22nd, another email gave me a link to the order form, which I promptly filled out, and I began the process of waiting. Not really, since I had other stuff to do and most of my interest had already vaporized at the daunting 7 weeks it was supposed to take.

Another eternity later, it arrived in some rather nice packaging. It actually came as a bit of a surprise, because I had become so accustomed to waiting that I had never really expected it to materialize so suddenly. But when it did, it was everything I imagined and more. It came in this rather nice cardboard box, which I eventually cut in half with an X-Acto knife (which nowadays I use for all my paper splicing needs) to build a makeshift case. I fumbled around in a closet and found a neglected 16GB SD card (probably back from the era when point-and-shoots were actually preferable to mobile phones) and installed that weird Debian distro (after having a little internal debate on what to install). But the first thing I had done was plug it into a monitor through an HDMI-to-DVI converter. I took the charger from my Galaxy Nexus (I wasn’t using it for anything since I charge the phone in my room from my HP Touchpad charger, with the Touchpad idly draining power from a cool inductive stand; the standardization of chargers is really pretty awesome), and used that as my Pi’s permanent power supply.

I also had a 2000mAh LiPo battery (which I was going to use with my Arduino LilyPad for some cool foot-operated telegraph) that I wanted to use as essentially a UPS for the Pi, but a bit of googling reveals that that might possibly entail actual electronics work, so maybe that’s something for later.

I turned it on, and lo and behold it didn’t work. I actually never quite figured out why. Then, I tried plugging it into a really old 13 megaton CRT TV, which makes me realize how it’s sort of weird that the unit of megatons is hardly ever used for things other than atomic weapons, and now it feels oddly inappropriate for a hyperbole for the mass of a TV, but maybe it’s actually sort of appropriate because CRTs are terrifying. So analog seemed to work, except for this problem where my keyboard would keep repeating letters and not working well. That wasn’t a good start.

But after a little googling from my Chromebook, it turns out the keyboard issues came from the fact that I had plugged in my only spare USB keyboard which happened to be a Logitech Mouse+Keyboard+Speaker thing and my teensy Galaxy Nexus charger couldn’t eke out enough watts to power it. And the issue with the HDMI-to-DVI thing was just because I needed to restart with the cable plugged in. But neither of them posed a real material issue because I had been intending to use it as a headless rig from the start.

The first thing I really noticed was how surprisingly easy it was to install things. I had expected the ARM repositories to basically lack everything which might be useful, but it turns out that almost everything I wanted was actually available. I didn’t dare compile anything, but Node (albeit a somewhat old version) was available from the repos, so I never really needed to. I had to manually update to a new version of npm, but that wasn’t that bad. I set up forever to run a few apps, but not much.

One of the main reasons I could justify getting the Raspberry Pi however, was to run my Facebook logging script on something other than my main computer, and aside from getting confused trying to use sendxmpp, it was fairly straightforward.


Etch-A-Sketch on a Trackpad 14 August 2012

So I was testing the responsiveness of the Samsung Series 5 Chromebook touchpad and made a short little script which basically continuously accumulates the mousewheel deltas and draws the resulting path using canvas. But it appears that the trackpad does some kind of fitting to make strokes tend toward right angles, though it’s not hard to get something diagonal or even somewhat circular.

http://antimatter15.com/misc/experiments/swipe-gesture/responsive.html
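
The gist of it looks something like this (a reconstruction rather than the original script; the scaling factor and the sign conventions are guesses):

```javascript
// Accumulate mousewheel deltas into a cursor position and draw the path.
var canvas = document.createElement('canvas');
canvas.width = 800;
canvas.height = 600;
document.body.appendChild(canvas);

var ctx = canvas.getContext('2d');
var x = canvas.width / 2;
var y = canvas.height / 2;

window.addEventListener('mousewheel', function (e) {
  ctx.beginPath();
  ctx.moveTo(x, y);
  x += e.wheelDeltaX / 10; // scale the deltas down a bit
  y -= e.wheelDeltaY / 10;
  ctx.lineTo(x, y);
  ctx.stroke();
  // (you'd probably also want to keep the page itself from scrolling)
});
```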


Determining if a Mousewheel Event Results in Scroll 14 August 2012

So here’s a somewhat technical post; actually, it’s pretty technical. But either way, the premise is sort of simple to understand, and probably so is the context. I’m working on Swipe Gesture 2.0, which basically tries to take Chrome and Safari on OS X Lion’s awesome back-forward transitions and make them work on other operating systems. See, the thing is that multitouch isn’t strictly a requirement for it to work; a lot of computers just have the little scroll bars on the bottom and right of the trackpad (often with a somewhat abrasive textured surface so you don’t accidentally tread upon them). Regardless, the title is a bit of a misnomer, because even though the event is called the “mousewheel”, it’s hardly meant to be observed from an actual mouse (or a wheel); instead it means the scrolling gesture on some kind of trackpad, multitouch or not.

Well, first, I guess I’ll talk about the difference between how Lion and Leopard do it. The way Leopard did it was pretty cool but not particularly applicable to other platforms, since it relied on the existence of a three-finger gesture. As in, you needed some kind of touchpad which was cool enough to support three-finger multitouch, reliably. It also behaved completely independently of the current zoom or scroll position, which makes implementation in software entirely trivial given access to some drivers which can recognize three fingers on a touchpad.

Lion did it a completely different way. Instead of creating an entirely new gesture which was entirely dedicated to the singular task of navigating through history, it conflated the notions of scrolling with navigating, which sort of makes sense. Apple’s quite dedicated to skeuomorphic metaphors, and they want to treat the web more like literal pages. A user can move it around to better keep certain things in view, and the physical movement to slide a sheet out of view is just an extension of that panning gesture.

However, technically this poses a completely different challenge, because it requires you to distinguish between scrolls and navigation requests. Scrolling is always the default behavior, and the navigation swipe gesture happens when scrolling wouldn’t actually result in anything. However, many implementations of scrolling are at least somewhat kinetic, whether it’s emphasized in software (in the form of smooth-scrolling) or hardware (scroll wheels that don’t click but instead move basically freely), or simply because your arm has to obey the laws of kinematics (unless it doesn’t, in which case that’s certainly fascinating). So not only does the software have to determine when a mouse wheel action manifests as a scroll, it has to see whether it was the user’s intent to do the extraneous scrolling.

This is done by clustering the mouse wheel movements together temporally. Scroll events flow in in discrete chunks, and you can split events off into little buckets (in a sense), where if there isn’t any event sent within some arbitrary threshold (say, 500ms, or half a second), you stick stuff in a new bucket. This way, let’s say you scroll from the top of the page to the bottom and you’re sort of excited, so you spin the wheel as fast as possible; you hit the bottom of the page, but it’s not some instant stop. You continue scrolling (because you’re just that excited, and just can’t stop) for a little bit more. Ignoring the fact that you probably won’t have a vertical/horizontal event handler (though there are some sort of intriguing possibilities for this; one idea is to have the upper threshold trigger full screen), without segmenting events into buckets the software doesn’t recognize that the time when you’re ramming into the bottom of the page is part of the same general gesture as when you were scrolling, and it may interpret that as an intentional gesture. So that’s one part which makes it a bit more complex.
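
In code, the bucketing looks roughly like this (a sketch, using the 500ms threshold from above):

```javascript
// Wheel events that arrive within 500ms of each other get lumped into the
// same gesture-chunk; a long enough pause starts a new bucket.
var GAP = 500; // ms of silence before a new bucket begins
var buckets = [];
var lastEventTime = 0;

window.addEventListener('mousewheel', function (e) {
  var now = Date.now();
  if (now - lastEventTime > GAP) {
    buckets.push([]); // enough silence has passed, start a fresh chunk
  }
  lastEventTime = now;
  buckets[buckets.length - 1].push(e);
});
```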

So now, you have these series of mousewheel events conveniently delimited into little gesture-chunks. The next part is determining whether or not the gesture-chunks are part of a scroll action or not.

Thankfully that’s a really simple thing to do. Just look at the document’s scrollTop and check if it’s zero (or scrollLeft for horizontal stuff), or whether it’s already at the maximum scrollable extent of the element. If it can’t scroll any more, then you have a winner and you can start the falling balloons and confetti.
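
For the document-level case, the naive check is just a few lines (a sketch; the wheelDelta sign conventions and the document.documentElement versus document.body quirk vary a bit across older browsers):

```javascript
// Has the document already hit its edge in the direction of the gesture?
// (Uses WebKit's wheelDelta convention: positive means scrolling up/left.)
function atScrollLimit(wheelDeltaX, wheelDeltaY) {
  var el = document.documentElement;
  if (wheelDeltaY > 0) return el.scrollTop === 0;                                // at the top
  if (wheelDeltaY < 0) return el.scrollTop + el.clientHeight >= el.scrollHeight; // at the bottom
  if (wheelDeltaX > 0) return el.scrollLeft === 0;                               // at the left edge
  if (wheelDeltaX < 0) return el.scrollLeft + el.clientWidth >= el.scrollWidth;  // at the right edge
  return false;
}
```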

Except it’s not that easy, because the document isn’t the only thing which can scroll. Thanks to the glory of overflow:scroll, there are lots of things which can scroll. Things which aren’t necessarily documents may be in arbitrary scroll positions to wreak havoc on your well-meaning heuristics.

So back to the drawing board, I guess. Actually, to think of it, maybe it’s simpler to listen for the scroll event, which fires when a scroll happens, and quite intuitively doesn’t fire when a scroll doesn’t happen. And mouse wheel actions always precede scroll ones (because the wheel events bubble and are cancelable, so you can prevent a scroll from happening). The only problem is that scroll events don’t bubble. As in, when a scroll event happens on some element, it’s not going to show up on the document, it’s only going to show up if you’re listening on that specific element at the right point in time.

The naive approach to this dilemma is just to attach a scroll listener to every single element in the document, and to reattach to other elements whenever the DOM tree is modified in some way. This means the overhead grows rather significantly as pages get larger, in a way which could be likened to O(n) time, where n represents the number of nodes in the document. If you wanted, you could do it lazily by attaching the scroll listeners only once the wheel event has fired, but that would cause a significant delay when attempting to legitimately scroll.

Another thing you could do, is to make another assumption: that the element which gets scrolled has to be some parent of the element which the mouse is currently over. Making that assumption, we can add a mousewheel listener to the root of the document, as those kinds of events actually do bubble. And since they’re mouse events, once you capture it, you can get a clientX and clientY, comprising the current coordinates of the mouse. And with that, you can get the element immediately below the cursor with document.elementFromPoint. And since the scroll might fire on any one of the elements which are parents of the current element, you ascend up the tree and add a listener on all of those (until, of course you hit the document element, at which point you can’t go any further up). This yields performance which could essentially be modeled with O(log n), quite a bit better than O(n).

So now the finished process is fairly simple: you listen for a mousewheel event, and when it happens, you determine the element under the cursor and ascend the tree, yada yada. The scroll listener, when fired, sets a global variable lastDetectedScroll to the current timestamp. We save the previous value of lastDetectedScroll in a little temporary variable and then set a little timer for 150 milliseconds. It usually only takes like four milliseconds to see if a scroll thingy happened, but let’s be safe by having an order-of-magnitude threshold. The cuckoo clock rings, and we check whether lastDetectedScroll is still the same value; if it is, it’s a swipe, and otherwise, it’s a scroll.
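
Put together, a simplified sketch of the whole detection process might look like this (not the extension’s exact code; listener cleanup and the actual navigation are omitted):

```javascript
var lastDetectedScroll = 0;

function markScroll() { lastDetectedScroll = Date.now(); }

window.addEventListener('mousewheel', function (e) {
  // scroll doesn't bubble, so listen on every ancestor of whatever is under
  // the cursor; one of them is the element that would actually scroll
  var el = document.elementFromPoint(e.clientX, e.clientY);
  while (el && el !== document.documentElement) {
    el.addEventListener('scroll', markScroll);
    el = el.parentElement;
  }
  window.addEventListener('scroll', markScroll);

  var before = lastDetectedScroll;
  setTimeout(function () {
    if (lastDetectedScroll === before) {
      console.log('no scroll happened: treat this gesture as a swipe');
    } else {
      console.log('something scrolled: it was just a scroll');
    }
  }, 150);
});
```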

Here’s a little demo: http://antimatter15.com/misc/experiments/swipe-gesture/minimal.html


Swipe Gesture for Chrome 13 August 2012

Here’s an extension which I actually released some time back, but never got around to writing a blog post for. Part of the reason was that the early reviews didn’t quite pan out, in large part due to it not working. But I was using my Chromebook and I somehow felt a vague longing for some kind of multitouch gesture, and remembered that I had made this little extension (which I had disabled for some reason). Anyway, this is as appropriate a time as any to formally announce it to my probably remarkably small blog readership.

There is, however a tad bit of difficulty representing the function of it in pictures because really, it doesn’t have a big UI. It makes hardware more useful, and in its idealized form, should have no interface. But of course, we don’t live in a place where apps are perfectly idealized and either way, Apple has plenty of nice pretty pictures of people swiping fingers to the right.

I really fell in love with the Macbook multitouch gestures, almost at first sight. They just seemed so natural and so beautiful that I sort of felt that that was like the epitome of design or HCI perfection. And from that point, any time I used a laptop which wasn’t made by Apple (or even the ones which were made by Apple but were stuck in the barbaric ages preceding the inclusion of the glass multitouch pad, where its invention might have produced a scene like this), I felt thoroughly disgusted.

Flipping through the Chromium OS design papers, there is one page dedicated specifically to cool multitouch gestures which could be used. And as far as I’m aware the Samsung Series 5 550 (the new chromebook) is the only device which supports these gestures (thus far), and even then it’s only pinch to zoom and forward/back (three finger). All the other Chromebook users have been left out.

Another cool thing about the implementation is that it uses a certain webkitDirectionInvertedFromDevice property of the mousewheel events, which gives you a boolean value indicating whether or not the platform you’re on has some magical direction inversion, like on OS X Lion or if you’ve enabled “simple scrolling” on Chrome OS. But this might not have been a good idea, since swipe directions are also naturally inverted on those platforms, so it might be better to not compensate for it.
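
Reading it off the event is about as simple as it sounds (a sketch):

```javascript
window.addEventListener('mousewheel', function (e) {
  if (e.webkitDirectionInvertedFromDevice) {
    // "natural"/inverted scrolling is enabled on this platform; decide here
    // whether to flip the swipe direction (or, as noted above, leave it alone)
  }
});
```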

Anyway, the implementation is actually quite simple. The current version doesn’t even break the 40-line mark, because all it does is listen for mousewheel events on every page (via a content script) and calculate the current acceleration. If that acceleration ever passes a certain threshold, it triggers a forward or back action. Right now, the threshold is preconfigured based on my own testing on a Samsung Series 5 (note: not 550) Chromebook. But for people with other devices, I’m working on a second version which will be slightly more Apple-esque in its implementation.
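
The gist of version 1 looks something like this (a sketch of the idea rather than the extension’s actual source; the decay factor and threshold are made up, and the sign conventions vary between platforms):

```javascript
// Accumulate horizontal wheel movement into a rough "velocity" and fire a
// navigation once it looks vigorous enough.
(function () {
  var velocity = 0;
  var THRESHOLD = 1000; // arbitrary; the real extension tunes this per device

  window.addEventListener('mousewheel', function (e) {
    velocity = velocity * 0.8 + e.wheelDeltaX; // decay old movement, add new
    if (velocity > THRESHOLD) {
      velocity = 0;
      history.back();      // vigorous swipe one way: go back
    } else if (velocity < -THRESHOLD) {
      velocity = 0;
      history.forward();   // vigorous swipe the other way: go forward
    }
  });
})();
```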


Distributed Pi Revisited 11 August 2012

So I’ve been interested in distributed computing for some time, since 2007, basically around the time I started doing web development. I’ve always sort of romanticized the notion of distributed computing because of its vast theoretical potential. Projects like BOINC, SETI@Home and Folding@Home have always given me some kind of idealized notion of “computing for good”, inspiring some kind of useful social, scientific or otherwise beneficial change to humanity. But those projects never got the kind of adoption which could truly change the world; together they form networks which are specialized but can, in the end, only eke out relatively minute performance.

Part of the problem lies in their intrinsically forbidding, voluntary nature. It takes too much effort to install, and it amplifies the problems intrinsic to any “democratic” system: why bother? Voter apathy often stems from a feeling that an individual contribution regresses toward nothing, which is statistically certainly true, but never helpful when it discourages participation in one of these collective entities.

In a broader sense, I’ve always suspected that part of what is necessary for technological progress is the loss of control. That certainly appears true from the human experience: the vast arrays of neurons bound by a cranium are uninviting and wholly unwilling to expose their raw computational power to the emergent conscience within them. No doubt the human brain is good at computing, just not (as it appears) at math; intuition (especially of the physical kind) can only arise when a system internalizes some very complex math. We just can’t access it, because evolution or some other process has determined that such access was something that had to be done away with on the climb to higher cognitive powers.

Likewise, I think computers are convergent in that sort of way. There’s a great ladder of abstraction which is growing taller and taller, a tower of babel of sorts, and at the end, perhaps we’ll find a similar goal, maybe not of god per se, but an equally sought target of consciousness or some other high level intellectual faculties contemporaneously denied to computers.

Part of this is losing control, which is inevitable and politically dangerous. Computing terminals are powerful and capable of much more than they’re being used for. In fact, most computers spend most time idling, the processors get ever faster not for the handling of the idle time but in order to smooth out the few bursts that actually require fast computation.

It’s impractical for operating systems to build into them some kind of distributed computing platform, however ideal that would be. It’s too contentious to ever get adopted, conceptually a short step away from the ability for a mega-corp to pilfer your precious information as well.

But the browser, specifically the web, provides us with an interesting opportunity. Here, we have significantly less fear of personal privacy, since it comes with an expectation of sorts for information sharing. The existence of client side scripting and its prevalence gives an implicit permission to exploit system resources during the site’s tenure.

Now, with these granted permissions, we have the freedom to exploit them in order to create something truly remarkable. Because it’s not so much voluntary on the part of the user as it is voluntary on the part of the site maintainer, who now has the responsibility of allocating and managing the system resources of his visitors for their brief but additive virtual encounters. The users lose control, and that’s a good thing.

Old Stuff

When I first wrote that introduction, I mentioned that I had been interested in distributed computing for quite a while, and that’s true. 2007 was, at the time of writing, five years ago. A bit less than a third of my life, which is fairly significant. It doesn’t even quite belong to recent memory, and perhaps has escaped living memory into something quasi-zomboid. Part of the reason why this section is relatively short compared to the other ones is that I really don’t have a very good memory of what that was like, and it’s rather sad that my previous attempts never had long writeups regarding their potential and process (however, there do tend to be more little updates about the process).

Anyway, as it’s been such a long time, I’m trying to dig through old stuff to pick out exactly what I did back then. The first things I can find are somewhat easier and simpler problems, finding palindromes, or at least words which spell different words when reversed (I believe this was based on a program I had written a few years prior in Visual Basic). Another was for cracking hashes in a distributed manner.

Back then, I think those were some of the first explorations into the concept of distributed computing in the browser. It was the days before WebWorkers, and Cross-Origin Resource Sharing was in its infancy. The pool of possible computation was still large, but most of it was forbidden. Take note, however, that distributed and parallel computing have existed for far longer, perhaps even longer than the idea of a computer.

Computers get faster each year thanks to Moore’s Law, and more and more people get connected to the Internet each year. Perhaps in addition to Metcalfe’s law with regard to the value of a telecommunications network, there is the ability to harness idle computation power in order to increase the value of its participants linearly as the network grows. There are almost a billion people connected to the Internet, and however insignificant a contribution each of those terminals brings to the network, it would add up into something truly remarkable.

However, hash cracking and palindrome finding are fairly trivial. They have little real world practicality and don’t seem in any way representative of the greater problems facing humanity. While it’s a little impractical to aim for some kind of project which has immediate applications to the value of humanity, it’s certainly a worthy target which is worth approximating. They’re isolated examples which are easy to parallelize and hardly count as real ventures into the field.

Calculating digits of pi is marginally less useless and represents something which is significantly harder and tasks which may be closer to the kinds of computations which are performed in the real world. There have been other projects with similar goals, most notably, PiHex, which acts as a sort of vindication of this as a possible legitimate attempt. At this point, I have no idea if the algorithm works with digits in the order of trillions, and that’s one of the big reasons I’m not actually trying to succeed PiHex.

Revisiting

Okay, so I decided to dig this up again.

Why? Because I really want to play around with server-side JS a little more (just in case getting a node vps has anything to do with merit). The sort of funny thing is that’s exactly what it used to be, back in the ancient past, before NodeJS existed (it was before Google Chrome was released, and V8 wasn’t open source). I used to have an application which would schedule jobs for computation on the client and provide them in a more useful format.

However, I never bothered saving the files outside of the AppJet web IDE and the code was lost when the service was discontinued. I tried porting to Google App Engine, but that version was plagued with strange bugs which ended up printing out the wrong digits after a few thousand were computed.

Right now, revisiting the idea, I’ve added a few things which are somewhat more characteristic of the change in web browsers since my first attempt. The old version tried to mock threading through the liberal application of setTimeout, which meant that most interface interaction wouldn’t be terribly affected; however, it did incur a noticeable slowdown. Now it uses WebWorkers, which opens up numerous interesting possibilities. First, and perhaps most important, is context isolation and lightweight, asynchronous embedding in multiple domains (origins). Since WebWorkers can work (see what I did there?) across multiple origins and still have access to XMLHttpRequest, and with the advent of Cross-Origin Resource Sharing (CORS), it’s easy to embed this in a page. The low embedding overhead and the true multi-threading abilities go a long way toward making this more than an intriguing concept and into something much more practical.
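
To illustrate the embedding story, here’s a hypothetical sketch (worker.js, the message format, and the /sync endpoint are all made-up names, not the project’s actual interface): the page spawns a worker, the worker grinds away at its assigned range, and progress reports flow back without blocking the UI.

```javascript
// On the embedding page: spawn the worker and relay its progress reports.
var worker = new Worker('worker.js');

worker.onmessage = function (e) {
  if (e.data.type === 'progress') {
    // check in with the server every so often with how far we've gotten
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/sync?digit=' + e.data.digit + '&prime=' + e.data.prime, true);
    xhr.send();
  }
};

// Job handed out by the server: which digit to work on and where to resume.
worker.postMessage({ digit: 1000, prime: 3 });
```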

The code, which is now in the Github repository in a new folder named node, has a very basic interface. Rather than manipulating the widths of divs for progress indicators, it now uses <progress> (it’s actually really cool to see how far the web platform’s state of affairs has come over the past few years). One cool thing about the progress bar is that it estimates progress by dividing the current prime number by the end point, which has the interesting effect of making the progress bar speed up toward the end (this happens because the density of prime numbers asymptotically declines). While I could probably invert the effect fairly trivially in order to create a more linear progression, behavioral studies indicate that people who stare at progress bars all day feel less irritated (i.e. more satisfied) when the bar speeds up at the beginning and end (then again, the progress indicator isn’t really meant to be shown to people, and if it is, the user experience is hardly something being optimized for; also, this does nothing in the way of speeding up the beginning, in fact it’s quite the opposite by slowing it down, so perhaps it should first linearize the progress and then map it to some function with accelerated starts and ends, perhaps some trigonometric function).

Part of what is cool about the project is the underlying algorithm, a port of Fabrice Bellard’s optimized version of the Bailey–Borwein–Plouffe algorithm. What’s unique about the algorithm is that it uses comparably little memory, and as such it’s uniquely suited to distributed computing, especially in the browser environment, which demands relatively thin clients (browser caching also means the download would only need to be completed once, and embedding the computation in an iframe with certain storage permissions and appCache could allow persistence). It’s a relatively short algorithm and fairly easy to port.
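
To give a flavor of the algorithm, here’s a minimal sketch of the plain (unoptimized) BBP hex-digit extraction, not the Bellard variant the project actually ports; it leans on double-precision floats, so rounding error limits how far out it can reach, but it shows why the approach parallelizes so nicely: every digit position is independent.

```javascript
// (16^p) mod m with doubles; fine as long as m*m stays below 2^53.
function modPow16(p, m) {
  var result = 1 % m;
  var base = 16 % m;
  while (p > 0) {
    if (p & 1) result = (result * base) % m;
    base = (base * base) % m;
    p >>>= 1;
  }
  return result;
}

// Fractional part of the sum over k of 16^(n-k) / (8k + j).
function series(j, n) {
  var sum = 0;
  for (var k = 0; k <= n; k++) {       // left part: reduce mod (8k + j)
    var denom = 8 * k + j;
    sum = (sum + modPow16(n - k, denom) / denom) % 1;
  }
  for (k = n + 1; k <= n + 8; k++) {   // tail: terms shrink geometrically
    sum = (sum + Math.pow(16, n - k) / (8 * k + j)) % 1;
  }
  return sum;
}

// Hex digit of pi at position n after the radix point (n = 0 gives 2,
// since pi = 3.243F6A88... in hexadecimal).
function piHexDigit(n) {
  var x = (4 * series(1, n) - 2 * series(4, n) - series(5, n) - series(6, n)) % 1;
  if (x < 0) x += 1;
  return Math.floor(16 * x).toString(16).toUpperCase();
}

console.log([0, 1, 2, 3, 4, 5].map(piHexDigit).join('')); // "243F6A"
```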

However, as it’s a digit extraction algorithm, it’s not very efficient for calculating digits sequentially. That is actually something of a feature which I neglected in the so-called modern incarnation. Older versions had the prime number (sub-job) allocation done on the server, which meant using up a lot of memory keeping track of the jobs. The current one has a server which is entirely unaware of the actual computational process, which makes it lighter persistence-wise but comes at the obvious cost of losing granularity in scheduling.

One optimization about this strategy is that a client doesn’t have to wait for a server to respond with another job request in order to continue. Instead, it operates in an almost completely asynchronous manner. On the client’s first request, the server gives it a job which is given by a starting point (a prime number to continue from) and a digit number. From that point, the client begins computing until that digit is complete, sending its progress back to the server once in every specified interval (a couple of seconds).

If a client disconnects in the middle of a computation (which can take quite some time, especially for later digits, so it’s more than certain to happen), the server will be able to resume the calculation by sending the request to another client after a certain expiration date has passed. Future refinements to the server (and possibly the client) could make it more efficient with processing digits of lesser significance by allocating jobs around certain checkpoints. For instance, all the prime numbers between 3 and some integer N have to be iterated through (where N is approximately 3 times the number of digits from the radix point). Rather than maintaining a “single-threaded” system (as it currently does; the parallel processing power comes from sending out multiple digit segments to be processed), it could instead send out simultaneous requests for different segments. The client may have to be modified to stop at certain designated checkpoints to prevent clients from overlapping.
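
A simplified, in-memory sketch of that allocation-and-expiration scheme (illustrative only, not the actual server code):

```javascript
// Each job is a digit index plus the prime to resume from; a job whose
// client hasn't checked in recently gets handed out again.
var EXPIRATION = 60 * 1000; // reassign if nothing has been heard for a minute
var jobs = [
  { digit: 1000, prime: 3, lastSeen: 0 },
  { digit: 2000, prime: 3, lastSeen: 0 }
];

function allocateJob() {
  var now = Date.now();
  for (var i = 0; i < jobs.length; i++) {
    if (now - jobs[i].lastSeen > EXPIRATION) {
      jobs[i].lastSeen = now;
      return jobs[i];
    }
  }
  return null; // everything is currently being worked on
}

function recordProgress(digit, prime) {
  // a client checked in: remember how far it got so another can resume later
  for (var i = 0; i < jobs.length; i++) {
    if (jobs[i].digit === digit) {
      jobs[i].prime = prime;
      jobs[i].lastSeen = Date.now();
    }
  }
}
```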

This sort of scheme would be more efficient than storing jobs used in previous versions, first of all by retaining the sort of “dumb” server which doesn’t do any real computation that contributes to the process. Instead it only acts as a mediator and persistence layer (as perhaps is the best role to give to any server). Rather than keeping track of every single prime number as a different row in some database, the server would only need to keep track of as many sections as clients exist.

The new version also has some interesting changes. First is the switch to NodeJS, which was hinted at before. Since it acts as a persistent server rather than something which is called and disposed of like a CGI process, the queuing system is now entirely in-memory. However, as soon as possible, it writes the completed digits out to a file (pi.txt), and if the server is ever interrupted, the computation resumes from the last saved version of the digits (only the jobs which were being processed during the shutdown are lost). However, this would need to change if it were to make more ideal use of the digit extraction method, since in that situation the vast majority of jobs would be for a single span of digits (and it would really suck to lose all of those).

So in a sense, this update constitutes a regression, a step in the wrong direction in that it’s using the algorithm wrong. However, in another sense, it paves the way to a set up which has a few more ideal properties, a lighter weight server end and a more efficient use of computation time on the part of the clients. In essence, the shift from fixed sized discrete computational tasks (or jobs) to mere ranges. The obvious benefit to the latter is the reduced memory demand and the ability to operate continuously, i.e. without pausing for acquiring tasks and behaving truly asynchronously (in a very Node.JS style). Perhaps taking it further along this direction (since WebWorkers are allowed to access applicationCache), clients could be given a starting point and are just let to rip through a few hundred cycles, synchronizing less frequently. Some specialized data structures might be employed to keep track of individual contributions so that the ranges are doled out optimally. Maybe this should instead evolve into some kind of successor to the PiHex project rather than some incredibly slow and inefficient sequential pi calculation platform.

One thing which I didn’t explore was the concept of having a persistent socket to the server. For this set of circumstances, the benefits of maintaining a persistent bidirectional socket weren’t large enough to warrant that kind of development. Right now, it’s communicating through a few small GET requests to a server. Part of the reason GET was used rather than POST was that it was designed with the idea that it could theoretically run in a cross-domain manner, and sending a POST request requires preflighting with CORS headers. Since it’s only ever really sending 30 or so bytes at a time, any additional overhead should be avoided. But this itself is also perhaps an argument for using WebSockets (though certainly not long-polling, since that incurs even more overhead). Quite often the synchronization requests don’t require the server to give anything back, but HTTP-wise, the server is always going to have to send about a hundred bytes of headers to fulfill the HTTP requirements. Also, whenever the client sends a request, it wastes about a third of a kilobyte on header information like the User-Agent. Network-wise it might in the end be more efficient just to have a low-overhead persistent socket connection.

Having a persistent socket would make synchronization interesting as well. While the changes wouldn’t be terribly drastic, since there’s still the goal of minimizing transmission overhead, a few changes to the scheduling could be conveyed without necessarily waiting for the next checkpoint. And of course, there’s the trade-off which comes with the question of how widely the checkpoints should be spaced. Right now, it’s something fairly low, in the range of 10 seconds, since the duration of a single page visit usually isn’t much more than a few seconds. However, if a client-side persistence layer were added, the checkpoints could be spaced out much wider. Perhaps instead of every few seconds, one could be made every minute, hour, day or even week, where a single computational task allocation could span multiple domains, multiple days, and time spent both online and offline. However, having wider-spaced checkpoints does incur a cost in terms of synchronization. It reduces the operator’s sense of intimate awareness of what’s actually happening and increases the chance of overlaps.

Another possible step is some kind of framework which handles all of these problems automatically. Instead of having to manually manage variables and the synchronization of tasks between clients and servers, one could construct a domain specific language for distributed computing (a superset or subset of javascript which automatically manages the state of variables in loops and such to compile into some kind of client for generic parallelized algorithms). Maybe it’ll do something cool and look at which loop is the optimal one to split up based on the amount of data which needs to be sent across so that it could be minimized.

One of the things I played with while revisiting this project was LLJS, another kind of specialized language which builds upon Javascript. As it’s marketed, it’s the “bastard child of Javascript and C”, which manually manages memory. I was hoping that using a typed language might bring some speed improvements (it didn’t really, in this situation). However, LLJS might be a good basis for this auto-magical compiler for turning synchronous routines into largely parallelized code. Then again, maybe there are in fact limits to parallel computing, and it’s better to search for specific algorithms which have the right properties. And maybe in the end, the problem of porting it to Javascript and managing the client-server communication fades into relative triviality.