somewhere to talk about random ideas and projects like everyone else


#interface

A New Approach to Video Lectures 21 November 2012

At the time of writing, a video is being processed by my v2.py script; it’s only eight lines of code thanks to the beautifully terse nature of Python and SimpleCV. And since it’s clearly not operating at the breakneck speed of one frame per second, I have some time on my hands, which means I’m writing this README. Since I haven’t actually put a description of this project out in writing before, I think it’s important to start off with an introduction.

It’s been over a year since I first wrote code for this project; it really dates back to late April 2011. Certainly it wouldn’t have been possible to write the processor in eight painless lines of Python back then, when SimpleCV was considerably more in its infancy. I’m pretty sure that puts the pre-production stage of this project in about the range of a usual Hollywood movie production. That’s really quite unusual for me, because I don’t often wait this long to get started on projects. Or at least, I usually publish something in a somewhat workable state before abandoning it for a year.

However, the fact is that this project has been dormant for more than an entire year. Not necessarily because I lost interest in it, but because it always seemed like a harder problem than I was comfortable tackling at any given moment. There’s a sort of paradox that afflicts me, and probably other students (documented by that awesome Calvin and Hobbes comic), where at some point you find a problem hard enough that it gets perpetually delayed until, of course, the deadline comes up and you end up rushing to finish it in some manner that bears only a vague semblance to the original intent.

The basic premise is somewhat simple: videos aren’t usually the answer. That’s not to say video isn’t awesome, because it certainly is. YouTube, Vimeo and others provide an absolutely brilliant service, but those platforms are used for things that they aren’t particularly well suited for. Video hosting services have to be so absurdly general because there is this need to encompass every single use case in a content-neutral manner.

One particular example is music, which often gets thrown on YouTube in the absence of somewhere else to stick it. A video hosting site is pretty inadequate for it, in part because it optimizes for the wrong kinds of interactions. A big player window is useless, and an auto-hiding progress slider and mediocre playback, playlist and looping interfaces are all signs that an interface is being used for the wrong kind of content. Contrast this with a service like SoundCloud, which is entirely devoted to interacting with music.

The purpose of this project is somewhat similar: to experiment with creating an interface for video lectures that goes beyond what a simple video can do, in terms of interactivity and usability (perhaps even accessibility).

So yeah, that’s the concept I came up with a year ago. I’m pretty sure it sounds like a nice premise, but at this point the old adage of “execution is everything” starts to come into play. How exactly is this going to be better than video?

One thing that’s constantly annoyed me about anything video-related is the little progress slider. Even for a short video, I always end up on the wrong spot. YouTube has the coverflow-esque window that gives little snapshots to help, and Apple has its drag-down precision adjustment, but in the end the experience is far from optimal. This is especially unsuitable because, more so in lectures than in perhaps any other type of content, you really want to be able to step back and go over some little thing again. Having to risk cognitive derailment just to go over something you don’t quite get can’t possibly be good. (Actually, for long videos in general, it would be a good idea to snap the slider to the nearest camera or scene change, which wouldn’t be hard to find with basic computer vision, since that’s generally where I want to go.) But for this specific application, the canvas itself makes perhaps the greatest navigational tool of all. The format is just a perpetually amended canvas where redactions are rare, and the most sensible way to navigate is by literally clicking on the region that needs explanation.

But having a linear representation of time is still useful for pacing, and for keeping track of things when there isn’t always a clear relationship between the position of the pen and time. A more useful system would have to be something more than just a solid gradient bar crawling across the bottom edge of the screen, because it would also convey where in the content the current step belongs. This is analogous to the way YouTube shows a strip of snapshots when thumbing through the slider, but in a video-lecture setting we have the ability to automatically and intelligently populate the strip with specific and useful information.

From this foundation we can imagine looking at the entire lecture in its final state, except with the handwriting grayed out. The user can simply circle or brush over the regions that seem less than trivial, and the interface could automatically stitch together a customized lecture at just the right pacing, playing back the work correlated with its audio annotations. On top of that, the user can interact with the lecture by contributing his or her own annotations or commentary, so that learning isn’t confined to the course syllabus.

Now, this project, or at least its goals, evolved from an idea to vectorize Khan Academy. None of this truly requires a vector input source; in fact, many of the ideas would be more useful implemented with raster processing and filters, by virtue of having some possibility of broader application. It may actually be easier to do it the raster way, but I think, if this is possible at all, it’d be cooler to do it with a vector medium. Even if having a vector source were a prerequisite, it’d probably be easier to patch together a little scratchpad-esque app to record mouse coordinates and re-create lectures than to fiddle with SimpleCV to form some semblance of a faithful reproduction of the source.

I’ve had quite a bit to do in the past few months, and that’s been reflected in the kind of work I’m doing. I guess there’s a sort of prioritization of projects going on now, and this is one of those which has perennially sat at the top of the list, unperturbed. I’ve been busy, and that’s led to this wretched mentality of avoiding anything that would take large amounts of time; I’ve been squandering my time on small and largely trivial problems (pun not intended).

At this point, the processing is almost done, I’d say about 90%, so I don’t have much time to say anything else. I really want this to work out, but of course, it might not. Whatever happens, it’s going to be something.


Determining if a Mousewheel Event Results in Scroll 14 August 2012

So here’s a somewhat technical post; actually, it’s pretty technical. Either way, the premise is sort of simple to understand, and probably so is the context. I’m working on Swipe Gesture 2.0, which basically tries to take Chrome and Safari on OS X Lion’s awesome back-forward transitions and make them work on other operating systems. See, the thing is that multitouch isn’t strictly a requirement for it to work; a lot of computers just have the little bars on the bottom and right of the trackpad (often with a somewhat abrasive textured surface so you don’t accidentally tread upon them). Regardless, the title is a bit of a misnomer, because even though the event is called the “mousewheel”, it’s hardly meant to be observed from an actual mouse (or a wheel); instead it means the scrolling gesture on some kind of trackpad, multitouch or not.

Well, first, I guess I’ll talk about the difference between how Lion and Leopard do it. The way Leopard did it was pretty cool, but not particularly applicable to other platforms since it relied on the existence of a three-finger gesture. As in, you needed some kind of touchpad cool enough to support three-finger multitouch, reliably. It also behaved completely independently of the current zoom or scroll position, which makes implementation in software entirely trivial given access to some drivers which can recognize three fingers on a touchpad.

Lion did it a completely different way. Instead of creating an entirely new gesture dedicated to the singular task of navigating through history, it conflated the notion of scrolling with the notion of navigating, which sort of makes sense. Apple’s quite dedicated to skeuomorphic metaphors, and they want to treat the web more like literal pages. A user can move a page around to better keep certain things in view, and the physical movement of sliding a sheet out of view is just an extension of that panning gesture.

However, this poses a completely different technical challenge, because it requires you to distinguish between scrolls and navigation requests. Scrolling is always the default behavior, and the navigation swipe gesture happens when scrolling wouldn’t actually result in anything. However, many implementations of scrolling are at least somewhat kinetic, whether that’s emphasized in software (in the form of smooth scrolling), in hardware (scroll wheels that don’t click but instead spin basically freely), or because your arm has to obey the laws of kinematics (unless it doesn’t, in which case that’s certainly fascinating). So not only does the software have to determine when a mouse wheel action manifests as a scroll, it has to see whether it was the user’s intent to do the extraneous scrolling.

This is done by clustering the mouse wheel movements together temporally. Scroll events flow in as discrete chunks, and you can split them off into little buckets (in a sense): if no event arrives within some arbitrary threshold (say, 500 milliseconds, or half a second), you start sticking events in a new bucket. This way, let’s say you scroll from the top of the page to the bottom and you’re sort of excited, so you spin the wheel as fast as possible; you hit the bottom of the page, but it’s not some instant stop. You continue scrolling (because you’re just that excited and just can’t stop) for a little bit more. (Ignore the fact that you probably won’t have a vertical/horizontal event handler hooked up for this, though there are some intriguing possibilities; one idea is to have hitting the upper threshold trigger full screen.) Without segmenting events into buckets, the software doesn’t recognize that the time when you’re ramming into the bottom of the page is part of the same general gesture as when you were scrolling, and it may interpret that as an intentional gesture. So that’s one part which makes it a bit more complex.
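A minimal sketch of that bucketing step, just to make the idea concrete (the names and the half-second gap here are illustrative, not what the extension actually ships with):

var GESTURE_GAP = 500; // ms of silence that separates one gesture from the next
var lastWheelTime = 0;
var currentBucket = [];

document.addEventListener('mousewheel', function (e) {
    var now = Date.now();
    if (now - lastWheelTime > GESTURE_GAP) {
        // Too long since the last wheel event: start a new gesture bucket.
        currentBucket = [];
    }
    currentBucket.push({ time: now, delta: e.wheelDelta });
    lastWheelTime = now;
}, false);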

So now you have this series of mousewheel events conveniently delimited into little gesture-chunks. The next part is determining whether or not each gesture-chunk is part of a scroll action.

Thankfully, that’s a really simple thing to do. Just look at the document’s scrollTop and check whether it’s zero (or scrollLeft, for horizontal stuff) or whatever value corresponds to the end of the scrollable area. If it can’t scroll any more, then you have a winner and you can start the falling balloons and confetti.
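For the vertical case, that check might look something like this (a sketch; the helper name is made up and the real code has to handle the horizontal axis too):

function atScrollLimit(el, wheelDelta) {
    if (wheelDelta > 0) {
        // Scrolling up: are we already pinned to the top?
        return el.scrollTop === 0;
    }
    // Scrolling down: are we already pinned to the bottom?
    return el.scrollTop + el.clientHeight >= el.scrollHeight;
}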

Except it’s not that easy, because the document isn’t the only thing which can scroll. Thanks to the glory of overflow:scroll, there are lots of things which can scroll. Things which aren’t necessarily documents may be in arbitrary scroll positions to wreak havoc on your well-meaning heuristics.

So back to the drawing board, I guess. Actually, come to think of it, maybe it’s simpler to listen for the scroll event, which fires when a scroll happens and, quite intuitively, doesn’t fire when a scroll doesn’t happen. Mouse wheel events always precede scroll events (the wheel events bubble and are cancelable, so you can prevent a scroll from happening). The only problem is that scroll events don’t bubble. As in, when a scroll event happens on some element, it’s not going to show up on the document; it’s only going to show up if you’re listening on that specific element at the right point in time.

The naive approach to this dilemma is just to attach a scroll listener to every single element in the document, and to reattach to new elements whenever the DOM tree is modified in some way. This means the overhead grows rather significantly on larger pages, in a way that could be likened to O(n), where n is the number of nodes in the document. If you wanted, you could do it lazily by attaching the scroll listeners only once the wheel event has fired, but that would cause a significant delay when attempting to legitimately scroll.

Another thing you could do is make another assumption: that the element which gets scrolled has to be some parent of the element the mouse is currently over. Making that assumption, we can add a mousewheel listener to the root of the document, since those events actually do bubble. And since they’re mouse events, once you capture one you can get clientX and clientY, the current coordinates of the mouse. With those, you can get the element immediately below the cursor with document.elementFromPoint. And since the scroll might fire on any one of that element’s parents, you ascend the tree and add a listener on all of them (until, of course, you hit the document element, at which point you can’t go any further up). This yields performance which could essentially be modeled as O(log n), quite a bit better than O(n).
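Roughly, and with invented names (this is a sketch under the assumptions above, not the extension’s actual source):

var lastDetectedScroll = 0;

function onAnyScroll() {
    lastDetectedScroll = Date.now();
}

document.addEventListener('mousewheel', function (e) {
    // Walk up from the element under the cursor, listening for scroll on it
    // and every ancestor, because scroll events don't bubble.
    var el = document.elementFromPoint(e.clientX, e.clientY);
    while (el && el !== document.documentElement) {
        el.addEventListener('scroll', onAnyScroll, false);
        el = el.parentNode;
    }
    // Scrolling the page itself is caught on the window.
    window.addEventListener('scroll', onAnyScroll, false);
}, false);

Re-adding the same listener on later wheel events is harmless, since addEventListener ignores exact duplicates.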

So now the finished process is fairly simple: you listen for a mousewheel event, and when it happens you determine the element, ascend the tree, yada yada. That scroll listener, when fired, sets a global variable lastDetectedScroll to the current timestamp. We save the old value in a little temporary variable and then set a little timer for 150 milliseconds. It usually only takes something like four milliseconds to see whether a scroll happened, but let’s be safe with an order-of-magnitude threshold. The cuckoo clock rings, and we check whether lastDetectedScroll is still the same; if it is, it’s a swipe, and otherwise, it’s a scroll.
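Continuing the sketch from above (the function and callback names here are mine):

function decideSwipeOrScroll(onSwipe, onScroll) {
    var before = lastDetectedScroll; // timestamp written by the scroll listeners
    setTimeout(function () {
        if (lastDetectedScroll === before) {
            onSwipe();  // nothing scrolled in the meantime: navigation gesture
        } else {
            onScroll(); // something scrolled: it was an ordinary scroll
        }
    }, 150); // a scroll usually shows up within a few ms; 150 is a generous margin
}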

Here’s a little demo: http://antimatter15.com/misc/experiments/swipe-gesture/minimal.html


Swipe Gesture for Chrome 13 August 2012

Here’s an extension which I actually released some time back but never got around to writing a blog post for. Part of the reason was that the early reviews didn’t quite pan out, in large part due to it not working. But I was using my Chromebook and somehow felt a vague longing for some kind of multitouch gesture, and remembered that I had made this little extension (which I had disabled for some reason). Anyway, this is as appropriate a time as any to formally announce it to my probably remarkably small blog readership.

There is, however, a tad bit of difficulty in representing its function in pictures, because really, it doesn’t have a big UI. It makes hardware more useful, and in its idealized form it should have no interface at all. But of course, we don’t live in a world where apps are perfectly idealized, and either way, Apple has plenty of nice pretty pictures of people swiping fingers to the right.

I really fell in love with the Macbook multitouch gestures, almost at first sight. They just seemed so natural and so beautiful that I sort of felt that that was like the epitome of design or HCI perfection. And from that point, any time I used a laptop which wasn’t made by Apple (or even the ones which were made by Apple but were stuck in the barbaric ages preceding the inclusion of the glass multitouch pad, where its invention might have produced a scene like this), I felt thoroughly disgusted.

Flipping through the Chromium OS design papers, there is one page dedicated specifically to cool multitouch gestures which could be used. As far as I’m aware, the Samsung Series 5 550 (the new Chromebook) is the only device which supports these gestures thus far, and even then it’s only pinch to zoom and three-finger forward/back. All the other Chromebook users have been left out.

Another cool thing about the implementation is that it uses the webkitDirectionInvertedFromDevice property of mousewheel events, which gives you a boolean indicating whether the platform you’re on has some magical direction inversion, like on OS X Lion or if you’ve enabled “simple scrolling” on Chrome OS. But this might not have been a good idea, since swipe directions are sort of naturally inverted on those platforms as well, so it might be better to not compensate for it.
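Reading it is about as simple as it sounds (a sketch, not the extension’s exact code):

document.addEventListener('mousewheel', function (e) {
    // true when the OS reports inverted ("natural") scrolling
    var inverted = e.webkitDirectionInvertedFromDevice;
    // ...optionally flip the sign of e.wheelDeltaX based on this,
    // though as noted above it may be better not to compensate at all
}, false);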

Anyway, the implementation is actually quite simple. The current version doesn’t even break the 40-line mark, because all it does is listen for mousewheel events on every page (via a content script) and calculate the current acceleration. If that acceleration ever passes a certain threshold, it triggers a forward or back action. Right now, the threshold is preconfigured based on my own testing on a Samsung Series 5 (note, not 550) Chromebook. But for people with other devices, I’m working on a second version which will be slightly more Apple-esque in its implementation.
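In spirit, the content script does something like the following; the names, the gesture gap, the threshold value, and the use of a plain accumulated delta as a stand-in for the acceleration calculation are all mine, not the extension’s actual code:

var THRESHOLD = 1000;  // placeholder; the real value was tuned by hand
var GESTURE_GAP = 500; // ms of silence that ends a gesture
var accumulated = 0;
var lastEventTime = 0;

window.addEventListener('mousewheel', function (e) {
    var now = Date.now();
    if (now - lastEventTime > GESTURE_GAP) accumulated = 0; // new gesture
    lastEventTime = now;

    accumulated += e.wheelDeltaX; // horizontal two-finger movement
    if (accumulated > THRESHOLD) {
        accumulated = 0;
        history.back();     // swipe one way: go back
    } else if (accumulated < -THRESHOLD) {
        accumulated = 0;
        history.forward();  // swipe the other way: go forward
    }
}, false);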


HTML5/CSS3 Zooming User Interface 29 August 2010

I’m taking liberties with the concept of zooming user interfaces, but this is an example of something that lets you browse Wikipedia by zooming in toward a link, and zooming out to go back. So I guess it’s more of a z-axis spatial visualization of history, somewhat similar to what I imagine Apple’s Time Machine program is like (though I have never tried it).

It uses HTML5’s pushState and popstate to get the URL to change without reloading the page (which, by the way, people should use instead of the weird /#/ URLs). It uses WebKit transformations, which probably aren’t part of CSS3 proper since they’re vendor specific, but I haven’t had time to hack it to work on Firefox; feel free to fork.

javascript:(function(){var scale=1,tx=innerWidth/2,ty=innerHeight/2,sx=innerWidth/2,sy=innerHeight/2;document.onmousewheel=function(a){if(a.wheelDelta){if(a.wheelDelta>0)scale*=a.wheelDelta/1000;if(a.wheelDelta<0)scale/=-a.wheelDelta/1000}scale<0.05&&history.go(-1);showTransform();a.preventDefault()};window.onpopstate=function(a){loadPage(a.state.url)}; function showTransform(){document.body.style.webkitTransform="scale("+scale+") translate("+(innerWidth/2-tx)+"px,"+(innerHeight/2-ty)+"px)";var a=document.elementFromPoint(innerWidth/2,innerHeight/2);if(a.nodeName=="A"&&a.offsetWidth*scale>0.3*innerWidth&&a.offsetHeight*scale>0.3*innerHeight){console.log("clicky");loadPage(a.href);history.pushState({url:a.href},a.href,a.href)}} function loadPage(a){var b=new XMLHttpRequest;b.open("get",a,true);b.onload=function(){document.body.innerHTML=b.responseText;showTransform()};b.send(null);scale=1;tx=innerWidth/2;ty=innerHeight/2;sx=innerWidth/2;sy=innerHeight/2;showTransform()}var dragging=false;document.onmousemove=function(a){if(dragging){tx+=(sx-a.pageX)/scale;ty+=(sy-a.pageY)/scale;sx=a.pageX;sy=a.pageY;showTransform();a.preventDefault();a.stopPropagation()}}; document.onmousedown=function(a){dragging=true;sx=a.pageX;sy=a.pageY;a.preventDefault()};document.onclick=function(a){a.preventDefault()};document.onmouseup=function(a){dragging=false;a.preventDefault()};})()

So that’s a bookmarklet. Feel free to click it on this site, though it’ll get rid of the infinite scrolling and, for some reason, it doesn’t work well here. Try it on Wikipedia. Here’s a more readable version of the code:

(function(){
    var scale=1,
        tx=innerWidth/2,
        ty=innerHeight/2,
        sx=innerWidth/2,
        sy=innerHeight/2,
        mx=innerWidth/2,
        my=innerHeight/2;

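    // Zoom with the mouse wheel; zooming out far enough goes back in history.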
    document.onmousewheel=function(a){
        console.log(a.wheelDelta, scale)
        scale = Math.exp(Math.log(scale) + a.wheelDelta / 200)
        if(scale<0.05) history.go(-1);
        showTransform();
        a.preventDefault()
    };
    window.onpopstate=function(a){
        loadPage(a.state.url)
    };
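    // Apply the current scale/translation; when a link at the center of the
    // viewport gets big enough, load it and push a history entry.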
    function showTransform(){
        document.body.style.webkitTransform="scale("+scale+") translate("+(mx-tx)+"px,"+(my-ty)+"px)";
        var a=document.elementFromPoint(innerWidth/2,innerHeight/2);
        if(a.nodeName=="A"&&a.offsetWidth*scale>0.3*innerWidth&&a.offsetHeight*scale>0.3*innerHeight){
            console.log("clicky");
            loadPage(a.href);
            history.pushState({url:a.href},a.href,a.href)
        }
    } 
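    // Fetch the target page with XHR, swap its markup into the body, and reset the view.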
    function loadPage(a){
        var b=new XMLHttpRequest;
        b.open("get",a,true);
        b.onload=function(){
            document.body.innerHTML=b.responseText;
            showTransform()
        };
        b.send(null);
        scale=1;
        tx=innerWidth/2;
        ty=innerHeight/2;
        sx=innerWidth/2;
        sy=innerHeight/2;
        showTransform()
    }
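    // Basic click-and-drag panning of the zoomed page.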
    var dragging=false;
    document.onmousemove=function(a){

        if(dragging){
            tx+=(sx-a.pageX)/scale;
            ty+=(sy-a.pageY)/scale;
            sx=a.pageX;
            sy=a.pageY;
            showTransform();
            a.preventDefault();
            a.stopPropagation()
        }
    }; 
    document.onmousedown=function(a){
        dragging=true;
        sx=a.pageX;
        sy=a.pageY;
        a.preventDefault()
    };
    document.onclick=function(a){
        a.preventDefault()
    };
    document.onmouseup=function(a){
        dragging=false;
        a.preventDefault()
    };
})()