
October Progress Update 31 October 2013

It’s Halloween, and I still haven’t posted a monthly blog post, and I don’t quite feel like retroactively posting something next month. I’m understandably quite starved for free time with my attempts to reconcile sleep, college, and social interaction, and from the looks of it, I probably won’t be able to publish the blog post that I’ve been working on for the better half of this month before the month ends.

For the past five days, I’ve been using an actual laptop: a late-2013 MacBook Pro Retina 15” (yes, I made it through the better part of two months of college without a laptop more sophisticated than a 2009-era Chromebook). Aside from the obligatory setup process, acclimation to the new operating system, and a mild bout of screen-size anorexia (which, with proper counseling, I’ve more or less recovered from; the 13” is actually somewhat small, but I still can’t quite shake the feeling that 15” is a smidgen too big), the process has been quite painless.

Getting a laptop slightly more capable than the Series 5 Chromebook (not the beefier Celeron model, the original Atom) is a long-overdue change. I participated in my first hackathon (incidentally also the first time I’ve really written code since the start of the school year) at the beginning of the month. By the end of that 32-hour stretch, I did yearn for a functional trackpad, a larger screen, and a more performant setup, but my shining knight in Unibody Aluminium armor would not arrive until three weeks later. I don’t think the productivity gains would have affected things too much, though: even with this dinky setup, the prototype scored the second-place trophy.

The exact subject of the project was actually discussed briefly in the last progress update, on that long list of projects which I’d yet to start. That night, I took the initiative to do a proper port of a Matlab implementation of the Stroke Width Transform. I hooked it up as a content script which would listen for mouse events over image elements, search for textual regions, and draw semitransparent blue boxes where appropriate, and connected it to a node backend which would run tesseract to recognize characters. By the end of it all, I had enough for a pretty impressive demo.
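For the curious, the general shape of that wiring looks something like the sketch below. This is only an illustration: detectTextRegions stands in for the ported Stroke Width Transform, and the /ocr endpoint on a local node server is an assumption, not the project’s actual API.

```js
// Content script sketch: highlight detected text regions on hover, then OCR them.
document.addEventListener('mouseover', async function (e) {
  if (e.target.tagName !== 'IMG') return;
  var img = e.target;
  var regions = detectTextRegions(img); // hypothetical SWT port, returns [{x, y, w, h}, ...]
  regions.forEach(function (r) {
    // Draw a semitransparent blue box over each detected textual region.
    var box = document.createElement('div');
    box.style.cssText = 'position:absolute; background:rgba(0,0,255,0.3); pointer-events:none;';
    box.style.left = (img.offsetLeft + r.x) + 'px';
    box.style.top = (img.offsetTop + r.y) + 'px';
    box.style.width = r.w + 'px';
    box.style.height = r.h + 'px';
    document.body.appendChild(box);
  });
  // Hand the image off to the backend, which shells out to tesseract (assumed endpoint).
  var res = await fetch('http://localhost:3000/ocr', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ src: img.src, regions: regions })
  });
  console.log(await res.json());
});
```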

Intermittently, for more or less the entire month, I’ve been trying to improve the project, replacing some of the hackier bits with more reliable implementations. I’ve read about some more algorithms and experimented with different approaches to improve the method. I’m trying to add more stages to the text detection pipeline, such as the ability to segment an image into lines, and to improve the process of splitting lines into characters beyond mere connected components. But the process is rather tedious, and with my limited free time, the project remains quite far from public availability.


hqx.js - pixel art scaling in the browser 31 March 2013


Every once in a while, some gadget has the misfortune of epitomizing the next first-world problem. I guess right now, that’s owning a Retina (or equivalent) laptop or tablet (arguably phone, but most web pages are scaled out so it’s not that big of a problem) and being irked at the prevalence of badly scaled graphics. So there’s a new buzzword, “Retina Ready,” for websites, layouts, and designs which support higher-resolution graphics on devices that can display them, often meaning lots of new files and new CSS rules. It’s this trend of high-pixel-density devices (the iPad 3, Retina MacBook Pro, Nexus 10, and Chromebook Pixel, though I for one don’t currently have any of them, just this old glitchy-albeit-functional first-generation Chromebook) that is driving people to vector icon fonts.

But the problem of radical increases in resolution isn’t a new one. Old arcade games rarely exceeded 260x315, and the Game Boy Color had a paltry 160x144. While a few people still lug around game cabinets and dig out their dust-covered childhood handheld consoles for nostalgic sneezing fits, most of the old games are now played in emulators running on systems several orders of magnitude more sophisticated in every imaginable respect. So that arcade monitor that once could engross a childhood (and maybe early manhood) now appears as nothing more than a two-inch square on a twenty-inch monitor. But luckily there is a surprisingly good solution to all of this, in the form of algorithms designed specifically for scaling pixel art.

The most basic form of image scaling that exists is called nearest-neighbor interpolation, which is extra simple for retina devices because it means simply growing each pixel by a factor of two along each axis. That leads to things which are blocky and, unless you’re part of an 8-bit retro-art project with a chiptune soundtrack, ugly.
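For reference, nearest-neighbor is also trivially easy to reproduce in a browser. Here’s a generic canvas sketch (nothing here is specific to any particular library): disable smoothing and draw the image bigger.

```js
// Nearest-neighbor upscale with canvas: turn off smoothing and draw at a larger size.
function upscaleNearestNeighbor(img, factor) {
  var canvas = document.createElement('canvas');
  canvas.width = img.naturalWidth * factor;
  canvas.height = img.naturalHeight * factor;
  var ctx = canvas.getContext('2d');
  ctx.imageSmoothingEnabled = false; // keep the blocky pixels instead of blurring them
  ctx.drawImage(img, 0, 0, canvas.width, canvas.height);
  return canvas;
}
```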

The most common forms of image scaling borrow a lot from the math and signal processing fields, with names like bilinear, bicubic, and Lanczos. Essentially, they treat an image as some kind of composition of sinusoidal parts and try to extrapolate and interpolate such that visible artifacts are minimized. It’s all very mathy, but the result is kind of the opposite of nearest-neighbor, because it has the tendency to make things blurry and fuzzy.

The thing is that the latter tries to reach some kind of mathematical ideal, because images taken by your friendly neighborhood DSLR-toting amateur (spider-powers optional) are actually samples of real-world points of data, so this mathematical pursuit of purity works out very well. There’s still the factor-of-four information-theoretic gap that needs to be filled in with best-guesstimates, but there isn’t really any way to improve the way a photograph is scaled without using a higher-resolution version of said photograph. And most photographs taken nowadays are already sixteen-megapixel monsters, and they usually still look acceptable when upscaled.

The problem arises with pixel art: little icons or buttons which someone painstakingly drew in Photoshop one lazy summer afternoon in the late 90s. They’re everywhere, and each pixel wasn’t captured and encoded by a sampling algorithm from some analog natural phenomenon; each pixel was lovingly crafted and planted by some meticulous artist. There is no underlying analog signal to interpret. It’s a direct perceptual hookup to the mind of the creator, and that’s why bicubic sampling looks especially bad here.

Video games, before 3D graphics engines and math-aware anti-aliasing concerned with murdering jaggies, in the old civilized age of bit-blitting, were mostly constructed out of pixel art. Each color in that limited palette was placed there for a reason and can be exploited by specialized algorithms to construct higher-quality upscaled versions which remain sharp. These go by names like EPX, Scale2x, AdvMAME2x, Eagle, 2×SaI, Super 2×SaI, hqx, and most recently, Kopf-Lischinski. These algorithms can be applied in real time to emulator windows to acceptably scale a game to new sizes while eschewing jagged corners and blurry edges.
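To give a flavor of how simple the simplest of these are, here’s a rough sketch of the EPX/Scale2x rule applied to an ImageData buffer. It’s a transcription of the published algorithm’s corner rules, not the hqx.js code.

```js
// Scale2x / EPX sketch: each source pixel expands into a 2x2 block whose corners
// copy a neighbor when the surrounding pixels agree in a particular pattern.
function scale2x(src) { // src is an ImageData
  var w = src.width, h = src.height;
  var inPx = new Uint32Array(src.data.buffer);
  var out = new ImageData(w * 2, h * 2);
  var outPx = new Uint32Array(out.data.buffer);
  function at(x, y) { // clamp coordinates at the edges
    x = Math.max(0, Math.min(w - 1, x));
    y = Math.max(0, Math.min(h - 1, y));
    return inPx[y * w + x];
  }
  for (var y = 0; y < h; y++) {
    for (var x = 0; x < w; x++) {
      var P = at(x, y), A = at(x, y - 1), B = at(x + 1, y), C = at(x - 1, y), D = at(x, y + 1);
      var e0 = P, e1 = P, e2 = P, e3 = P;
      if (C === A && C !== D && A !== B) e0 = A;
      if (A === B && A !== C && B !== D) e1 = B;
      if (D === C && D !== B && C !== A) e2 = C;
      if (B === D && B !== A && D !== C) e3 = D;
      var o = (y * 2) * (w * 2) + x * 2;
      outPx[o] = e0; outPx[o + 1] = e1;
      outPx[o + w * 2] = e2; outPx[o + w * 2 + 1] = e3;
    }
  }
  return out;
}
```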

Anyway, the cool thing is that you can probably apply these algorithms in lieu of the nearest-neighbor or bilinear scaling algorithms used by browsers on retina platforms to effortlessly upgrade old sites to shiny and smooth. With a few rough heuristics (detecting whether an image appears to be a sprite by testing for a limited palette, checking whether the image is small or a perfect square, detecting whether it has transparent pixels), this could be packed into a simple script include that website makers could easily inject into their pages to automagically upconvert old graphics to shiny new high-resolution ones without having to go through the actual effort of drawing new high-resolution graphics and uploading them online. And this could also be packaged as a browser extension so that, once and forever after, this first-world nuisance shall be no more.
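The heuristics could be as crude as counting distinct colors and checking dimensions. Here’s a rough sketch of what such a test might look like; the size and palette cutoffs are arbitrary guesses, not anything settled.

```js
// Rough guess at whether an <img> is pixel art worth upscaling with a
// Scale2x/hqx-style algorithm: small dimensions and a limited palette.
function looksLikePixelArt(img) {
  var w = img.naturalWidth, h = img.naturalHeight;
  if (w === 0 || h === 0 || w > 128 || h > 128) return false; // arbitrary size cutoff
  var canvas = document.createElement('canvas');
  canvas.width = w; canvas.height = h;
  var ctx = canvas.getContext('2d');
  ctx.drawImage(img, 0, 0);
  // Note: cross-origin images would taint the canvas and make getImageData throw.
  var px = new Uint32Array(ctx.getImageData(0, 0, w, h).data.buffer);
  var colors = new Set();
  for (var i = 0; i < px.length; i++) {
    colors.add(px[i]);
    if (colors.size > 64) return false; // too many colors to be a hand-drawn sprite
  }
  return true;
}
```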

Before setting out to port hqx-java to JavaScript, I did some cursory googling to see if it had been done before. Midway through writing this post, I found out that it had indeed been done before, and in a better way, so I won’t even bother linking to my inferior version. But either way, the actual goal of this project was the part detailed in the last paragraph: an embeddable script or browser extension which could heuristically apply pixel-scaling algorithms, something I probably won’t bother trying to do until at least after I get my college laptop (which I anticipate will be a Retina MacBook Pro 15”). Nonetheless, I haven’t written an actual blog post in almost three months, it’s the last day of this month, and I guess this is better than having you all (though nobody’s probably going to read this now that Google Reader has died) assume that I’ve died. Anyway, now I’m probably going to retroactively publish old blog posts in previous months to fake continuity.


Swipe Gesture 2 Development 16 August 2012

So I’m trying something new: returning to quasi-daily, somewhat short updates about the development of whatever I’m working on, rather than withholding everything until something of acceptable release quality is achieved. I have a blog post about that transition, but I’m still working on it (as in, writing it is somewhat boring). It’s probably better given that my development cycle is quite nonlinear: usually I get something somewhat promising made in the first few days or so, then pause for long and possibly indefinite durations, doing other stuff in the process. Probably, writing short blog posts about what I have yet to finish will remind me to, well, finish them. Just maybe. But I’m probably going to have to preface every post that I write with this kind of disclaimer until I actually get that post finished and published, so I have something to reference rather than pointing crazily into the air and saying “oh yeah, it’s coming, now, someday, maybe.”

Starting about yesterday, I began working on the successor to Swipe Gesture. The new version tries to mimic the actual behavior of Chrome on Lion, which I think is really quite cool. Here’s a video I found on YouTube which shows what it basically looks like, if you aren’t familiar with it. The first thing to notice is that it’s substantially less trivial, code-wise. No longer is it a 30-line software lightweight, but it’s not _too_ complex and arcane to forbid any kind of comprehension. The simple prototype of its functionality is already nearing 300 lines of code.

Another big difference is that it’s no longer designed strictly for Chromebooks. In fact, one of the reasons for starting this was that I was informed that this kind of functionality might be useful on MacBooks running Windows via Boot Camp. It’s meant to be as general as possible, to work on pretty much any kind of platform. And it’s not even bound strictly to the horizontal axis: the code is meant to work with linear swipes in any direction, including diagonally (although some experimentation on my Chromebook seems to indicate that swiping at angles isn’t terribly useful).

The most significant conceptual change is the transition from a speed/acceleration metric to a distance metric. That is, in the old version, an action was triggered when there was a swipe in one direction vigorous enough to count. This was a fairly simple way to avoid the problem of distinguishing between a horizontal scroll action and a swipe: by not making a distinction. In a sense, cheating. The new version instead does things “the right way™” by observing events carefully to determine whether a swiping action actually results in scrolling. If that’s your kind of thing, the technical nitty-gritty details have their own dedicated blog post, so feel free to click through if you’re interested.

Once it’s determined that the scroll is actually probably a swipe gesture, it renders a nice little arrow in canvas. I considered using a Unicode arrow and setting the font size to huge, but that didn’t turn out quite as well as I expected (plus, it makes rotations and interactions with the embedding page’s CSS a little less predictable).

Another thing is that it turns out to be a bad idea to set a CSS transition on something which is meant to track mouse or scroll movements, because while that ends up smoothing things out (which is good for mouse wheels, since they click to the nearest 120 magical click units), it also produces a significant amount of lag and just feels awkward.

Another thing (since this post is written over the course of several days, and the actual update has already been published at the time of writing) is the cool redesign of the settings page. The first thing to notice is that the settings page for once actually has settings, which is quite an accomplishment by itself. Also, it has a visual refresh that makes it look somewhat Bootstrap-esque. That’s because ever since using Bootstrap in the making of Protobowl (a rather big project that I have yet to blog about), I’ve pretty much fallen in love with the color whiteSmoke. Partly because it has a name, which means I don’t have to google it or tattoo it on my arm for a mnemonic’s sake, and also because it’s a pretty nice color.


Determining if a Mousewheel Event Results in Scroll 14 August 2012

So here’s a somewhat technical post; actually, it’s pretty technical. But either way, the premise is sort of simple to understand, and probably so is the context. I’m working on Swipe Gesture 2.0, which basically tries to take Chrome and Safari on OS X Lion’s awesome back/forward transitions and make them work on other operating systems. See, the thing is that multitouch isn’t _strictly_ a requirement for it to work; a lot of computers just have the little scroll bars on the bottom and right of the trackpad (often with a somewhat abrasive textured surface so you don’t accidentally tread upon them). Regardless, the title is a bit of a misnomer, because even though the event is called “mousewheel”, it’s hardly meant to be observed from an actual mouse (or a wheel); instead, it means the scrolling gesture on some kind of trackpad, multitouch or not.

Well, first, I guess I’ll talk about the difference between how Lion and Leopard do it. The way Leopard did it was pretty cool but not particularly applicable to other platforms, since it relied on the existence of a three-finger gesture. As in, you needed some kind of touchpad which was cool enough to support three-finger multitouch, reliably. It also behaved completely independently of the current zoom or scroll position, which makes implementation in software entirely trivial, given access to some drivers which can recognize three fingers on a touchpad.

Lion did it a completely different way. Instead of creating an entirely new gesture dedicated to the singular task of navigating through history, it conflated the notion of scrolling with navigating, which sort of makes sense. Apple is quite dedicated to skeuomorphic metaphors, and they want to treat web pages more like literal pages. A user can move a page around to better keep certain things in view, and the physical movement to slide a sheet out of view is just an extension of that panning gesture.

However, this poses a completely different technical challenge, because it requires you to distinguish between scrolls and navigation requests. Scrolling is always the default behavior, but the navigation swipe gesture happens when scrolling wouldn’t actually result in anything. However, many implementations of scrolling are at least somewhat kinetic, whether it’s emphasized in software (in the form of smooth scrolling), in hardware (scroll wheels that don’t click but instead spin basically freely), or because your arm has to obey the laws of kinematics (unless it doesn’t, in which case that’s certainly fascinating). So not only does the software have to determine when a mouse wheel action manifests as a scroll, it has to figure out whether the extraneous scrolling was actually the user’s intent.

This is done by clustering the mouse wheel movements together temporally. Scroll events arrive in discrete chunks, and you can split events off into little buckets (in a sense), where if there isn’t any event sent within some arbitrary threshold (say, 500 msecs, or half a second), you stick stuff in a new bucket. This way, let’s say you scroll from the top of the page to the bottom, and you’re sort of excited and spin the wheel as fast as possible; you hit the bottom of the page, but it’s not some instant stop. You continue scrolling (because you’re just that excited, and just can’t stop) for a little bit more. (Ignore the fact that you probably won’t have a vertical/horizontal gesture handler, though there are some intriguing possibilities for this; one idea is to have the upper threshold trigger full screen.) Without segmenting events into buckets, the software doesn’t recognize that the time when you’re ramming into the bottom of the page is part of the same general gesture as when you were scrolling, and it may interpret that as an intentional gesture. So that’s one part which makes it a bit more complex.
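A stripped-down sketch of that bucketing, assuming the 500ms gap threshold mentioned above (the variable names are illustrative, not the extension’s actual code):

```js
// Group wheel events into "gesture buckets": a gap of more than 500ms between
// events starts a new bucket, so trailing momentum stays with its gesture.
var GAP_THRESHOLD = 500; // ms
var buckets = [];
var lastEventTime = -Infinity;

window.addEventListener('mousewheel', function (e) {
  var now = Date.now();
  if (now - lastEventTime > GAP_THRESHOLD) {
    buckets.push([]); // a quiet period elapsed, so start a new gesture bucket
  }
  lastEventTime = now;
  buckets[buckets.length - 1].push({ time: now, dx: e.wheelDeltaX, dy: e.wheelDeltaY });
});
```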

So now, you have this series of mousewheel events conveniently delimited into little gesture-chunks. The next part is determining whether the gesture-chunks are part of a scroll action or not.

Thankfully, that’s a really simple thing to do. Just look at the document’s scrollTop and check whether it’s zero (or scrollLeft for horizontal stuff), or whether it’s already at the maximum scrollable extent of the element. If it can’t scroll any more, then you have a winner, and you can start the falling balloons and confetti.
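In code, that naive check might look something like this; it’s a sketch of the simple version, before the complication described next.

```js
// Naive check: has the document already hit its horizontal edge in the
// direction of the swipe? (This ignores inner scrollable elements.)
function atHorizontalEdge(direction) { // direction: 'left' or 'right'
  var el = document.documentElement;
  if (direction === 'left') {
    return el.scrollLeft === 0;
  }
  return el.scrollLeft + el.clientWidth >= el.scrollWidth;
}
```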

Except it’s not that easy, because the document isn’t the only thing which can scroll. Thanks to the glory of overflow:scroll, there are lots of things which can scroll. Things which aren’t necessarily documents may be in arbitrary scroll positions to wreak havoc on your well-meaning heuristics.

So back to the drawing board, I guess. Actually, come to think of it, maybe it’s simpler to listen for the scroll event, which fires when a scroll happens and, quite intuitively, doesn’t fire when a scroll doesn’t happen. And mouse wheel events always precede scroll ones (because the wheel events bubble and are cancelable, so you can prevent a scroll from happening). The only problem is that scroll events don’t bubble. As in, when a scroll event happens on some element, it’s not going to show up on the document; it’s only going to show up if you’re listening on that specific element at the right point in time.

The naive approach to this dilemma is just to attach a scroll listener to every single element in the document, and to reattach to new elements whenever the DOM tree is modified in some way. This means the overhead grows rather significantly on larger pages, in a way which could be likened to O(n), where n represents the number of nodes in the document. If you wanted, you could do it lazily by attaching the scroll listeners only once the wheel event has fired, but that would cause a significant delay when attempting to legitimately scroll.

Another thing you could do is make another assumption: that the element which gets scrolled has to be some ancestor of the element the mouse is currently over. Making that assumption, we can add a mousewheel listener to the root of the document, as those kinds of events actually do bubble. And since they’re mouse events, once you capture one, you can get a clientX and clientY, the current coordinates of the mouse. And with that, you can get the element immediately below the cursor with document.elementFromPoint. And since the scroll might fire on any one of the ancestors of that element, you ascend up the tree and add a listener to each of them (until, of course, you hit the document element, at which point you can’t go any further up). This yields performance which could essentially be modeled as O(log n), quite a bit better than O(n).
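A sketch of that ancestor walk, with everything hanging off a single mousewheel listener on the document (the property used to avoid duplicate listeners and the handler names are mine, not the extension’s):

```js
// On each wheel event, find the element under the cursor and attach a scroll
// listener to it and each of its ancestors, since scroll events don't bubble.
var lastDetectedScroll = 0;

document.addEventListener('mousewheel', function (e) {
  var el = document.elementFromPoint(e.clientX, e.clientY);
  while (el && el !== document.documentElement) {
    if (!el._swipeScrollHooked) { // avoid stacking duplicate listeners
      el.addEventListener('scroll', onAnyScroll);
      el._swipeScrollHooked = true;
    }
    el = el.parentElement;
  }
  window.addEventListener('scroll', onAnyScroll); // catch document-level scrolls too
});

function onAnyScroll() {
  lastDetectedScroll = Date.now(); // consumed by the decision logic below
}
```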

So the finished process is fairly simple: listen for a mousewheel event, and when it happens, determine the element under the cursor, ascend the tree, yada yada. The scroll listener, when fired, sets a global variable lastDetectedScroll to the current timestamp. We stash the previous value in a little temporary variable and then set a little timer for 150 milliseconds. It usually only takes something like four milliseconds to see if a scroll thingy happened, but let’s be safe with a threshold well over an order of magnitude larger. The cuckoo clock rings, and we check whether lastDetectedScroll is still the same value; if it is, it’s a swipe, and otherwise, it’s a scroll.
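Continuing that sketch, the swipe-versus-scroll decision could be wired up roughly like this; the 150 millisecond figure comes from the paragraph above, and handleSwipe is a hypothetical handler.

```js
// After a wheel event, wait briefly: if no scroll listener fired in the
// meantime, treat the gesture as a navigation swipe rather than a scroll.
var lastDetectedScroll = 0; // updated by the scroll listeners attached in the previous sketch

document.addEventListener('mousewheel', function (e) {
  var before = lastDetectedScroll;
  setTimeout(function () {
    if (lastDetectedScroll === before) {
      handleSwipe(e); // hypothetical handler: nothing scrolled, so it's a swipe
    }
    // otherwise an actual scroll happened, so do nothing
  }, 150);
});
```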

Here’s a little demo: http://antimatter15.com/misc/experiments/swipe-gesture/minimal.html