
A New Approach to Video Lectures 21 November 2012

At the time of writing, a video is being processed by my v2.py script; it’s only eight lines of code thanks to the beautifully terse nature of Python and SimpleCV. And since it’s clearly not operating at the breakneck speed of one frame per second, I have some time to kill, which means I’m writing this README. But since I haven’t actually put a description of this project out in writing before, I think it’s important to start off with an introduction.

It’s been over a year since I first wrote code for this project; it really dates back to late April 2011. Certainly it wouldn’t have been possible to write the processor in eight painless lines of Python back then, when SimpleCV was considerably more in its infancy. I’m pretty sure that puts the pre-production stage of this project in about the range of a usual Hollywood movie. That’s really quite unusual for me, though, because I don’t often wait this long to get started on projects. Or at least, I usually publish something in a somewhat workable state before abandoning it for a year.

However, the fact is that this project has been dormant for more than an entire year. Not necessarily because I lost interest in it, but because it always seemed like a problem harder than I was comfortable tackling at any given moment. There’s a sort of paradox that afflicts me, and probably other students (documented by that awesome Calvin and Hobbes comic), where you find a problem hard enough that it gets perpetually delayed until, of course, the deadline comes up and you end up rushing to finish it in some manner that only bears a vague resemblance to the original intent.

The basic premise is somewhat simple: videos aren’t usually the answer. That’s not to say video isn’t awesome, because it certainly is. YouTube, Vimeo and others provide an absolutely brilliant service, but those platforms get used for things that they aren’t particularly well suited for. Video hosting services have to be absurdly general because of this need to encompass every single use case in a content-neutral manner.

One particular example is music, which often gets thrown on YouTube in the absence of somewhere else to stick it. A video hosting site is pretty inadequate for it, in part because it optimizes for the wrong kinds of interactions. A big player window is useless, and an auto-hiding progress slider and mediocre playback, playlist and looping interfaces are all signs that an interface is being used for the wrong kind of content. Contrast this with a service like SoundCloud, which is entirely devoted to interacting with music.

The purpose of this project is somewhat similar: to experiment with creating an interface for video lectures that goes beyond what a simple video can do, in terms of interactivity and usability (perhaps even accessibility).

So yeah, that’s the concept I came up with a year ago. It sounds like a pretty nice premise, but at this point the old adage of “execution is everything” starts to come into play. How exactly is this going to be better than video?

One thing that’s constantly annoyed me about anything video-related is the little progress slider. Even for a short video, I always end up in the wrong spot. YouTube has the little coverflow-esque window which gives small snapshots to help, and Apple has its drag-down precision adjustment, but in the end the experience is far from optimal. This is especially unsuitable for lectures because, more so than in perhaps any other type of content, you really want to be able to step back and go over some little thing again. Having to risk cognitive derailment just to go over something you don’t quite get can’t possibly be good. (Actually, for long videos in general, it would be a good idea to snap the slider to the nearest camera or scene change, since that’s generally where I want to go anyway, and it wouldn’t be hard to find with basic computer vision, as sketched below.) But for this specific application, the canvas itself makes perhaps the greatest navigational tool of all. The format is just a perpetually amended canvas where redactions are rare, and the most sensible way to navigate is by literally clicking on the region that needs explanation.
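
Something like the following would be enough for the scene-change part, at least in a browser. This is only a rough sketch: it assumes the video’s metadata is already loaded, and the half-second sampling step, the downscaled comparison canvas, and the difference threshold are all numbers I made up for illustration. A real implementation would want something smarter than comparing the red channel of each pixel.

function findSceneChanges(video, callback) {
  var canvas = document.createElement('canvas');
  var ctx = canvas.getContext('2d');
  canvas.width = 160; // downscaled, rough differences are good enough
  canvas.height = 90;
  var changes = [], prev = null, step = 0.5, threshold = 30, t = 0;

  video.addEventListener('seeked', function onSeeked() {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    var cur = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
    if (prev) {
      var diff = 0;
      for (var i = 0; i < cur.length; i += 4) { // just the red channel of each pixel
        diff += Math.abs(cur[i] - prev[i]);
      }
      if (diff / (cur.length / 4) > threshold) changes.push(t); // probable cut
    }
    prev = cur;
    t += step;
    if (t < video.duration) {
      video.currentTime = t; // fires 'seeked' again
    } else {
      video.removeEventListener('seeked', onSeeked);
      callback(changes); // timestamps a smarter slider could snap to
    }
  });

  video.currentTime = t; // kick off the scan from the beginning
}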

But having a linear representation of time is useful for pacing, and to keep track of things when there isn’t always a clear relationship between the position of the pen and time. A more useful system would be something more than just a solid gradient bar crawling across the bottom edge of the screen; it would also convey where in the content the current step belongs. This is analogous to the way YouTube shows a strip of snapshots when thumbing through the slider bar, but in a video-lecture setting we have the ability to automatically and intelligently populate the strip with specific and useful information.

From this foundation we can imagine looking at the entire lecture in its final state, except with the handwriting grayed out. The user can simply circle or brush over the regions which seem less than trivial, and the interface could automatically stitch together a customized lecture at just the right pacing, playing back the work correlated with the audio annotations. On top of that, the user can interact with the lecture by contributing his or her own annotations or commentary, so that learning isn’t confined to the course syllabus.

Now, this project, or at least its goals, evolved from an idea to vectorize Khan Academy. None of these ideas truly requires a vector input source; in fact, many of them would be more useful implemented with raster processing and filters, by virtue of having some possibility of broader application. It may actually be easier to do it with the raster method, but if this is possible at all, I think it’d be cooler to do it with a vector medium. And even if having a vector source were a prerequisite, it’d probably be easier to patch together a little scratchpad-esque app that records mouse coordinates and re-creates lectures than to fiddle with SimpleCV to form some semblance of a faithful reproduction of the source.
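
For what it’s worth, the scratchpad route really would only be a handful of lines. Here’s a rough sketch of the recording half, assuming nothing more than a bare canvas element with the (made-up) id "scratchpad", and ignoring audio entirely:

var canvas = document.getElementById('scratchpad');
var ctx = canvas.getContext('2d');
var strokes = [], current = null, start = Date.now();

canvas.addEventListener('mousedown', function () {
  current = []; // begin a new stroke
  strokes.push(current);
});

canvas.addEventListener('mousemove', function (e) {
  if (!current) return;
  var point = { x: e.offsetX, y: e.offsetY, t: Date.now() - start };
  current.push(point);
  var prev = current[current.length - 2];
  if (prev) { // draw the newest segment so the scratchpad feels live
    ctx.beginPath();
    ctx.moveTo(prev.x, prev.y);
    ctx.lineTo(point.x, point.y);
    ctx.stroke();
  }
});

canvas.addEventListener('mouseup', function () {
  current = null; // stroke finished
});

// strokes is now a vector recording of the lecture: an array of strokes,
// each an array of {x, y, t} points, which can be replayed at any pacing.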

I’ve had quite a bit to do in the past few months, and that’s been reflected in the kind of work I’m doing. I guess there’s a sort of prioritization of projects going on now, and this is one of those projects which has perennially sat at the top of the list, unperturbed. I’ve been busy, and that’s led to this wretched mentality of avoiding anything that would take a large amount of time; I’ve been squandering my time on small and largely trivial problems (pun not intended).

At this point, the processing is almost done (I’d say about 90%), so I don’t have much time to say anything else. I really want this to work out, but of course, it might not. Whatever happens, it’s going to be something.


Introduction to the Pedant 07 October 2012

Recently, I’ve been stocking up on hobbyist electronics components from SparkFun. Actually, this has been going on for quite a while, and most of that spending was justified by this project, which currently has the working name of the “pedant” (which is at least a three-layered pun). I won’t say it’s my very first foray into building an actual piece of hardware, but it’s probably the biggest and most original hardware project I’ve ever attempted.

I probably won’t be able to sell you on what it is, because it’s actually quite simple and uninteresting in principle. So instead of selling you on the cynical summary of its functionality, I’ll gild the concept with buzzwords and try my very best to instill in you the same kind of enthusiasm I have for this project (which might just be because I haven’t done anything before with e-textiles or other electronics stuff).

The Pedant is my foray into augmented reality; hopefully that means it’s, at least in some ways, original. It’s cheap, though in retrospect not nearly as cheap as it should have been. And probably the most interesting aspect is that it skips the whole perceived evolution of augmented reality from some bulky extremity into something sleek and unobtrusive. That’s not technically true, because the actual device will be fairly bulky, but it would live inside an already considerably bulky object (a shoe), so the net effect is that it’s sleek and unobtrusive.

I can’t say I was into that whole augmented reality thing before it was hip and cool. I only got interested in it fairly recently, likely due to the somewhat high-profile forays by Google and others. In mid-to-late 2008, I had just gotten my iPhone and I was deeply attached to it. At one point, I was on vacation and some arbitrary fact came into question, so I pulled out my glorious first-generation iPhone with its pristine anodized aluminium backing and loaded an app which searched an offline copy of all the textual content of the English Wikipedia (a concept I became so attached to that I ended up making Offline Wiki for the same reasons). And as the question was settled, the new subject of conversation was how incredible it is to keep all the world’s knowledge in a palm-sized device.

But that’s not just an anecdote about the marvels of technology; it’s also a sad tale about how distracting it was. Somehow, having access to that information allowed whatever pedantic instincts we had to prevail, shifting the conversation from a meaningful discussion into an artless digital query. And even forgiving that, it was slow and distracting, destroying the asynchronous exchange of ideas by creating this handheld bottleneck. Yes, we got an answer, but at what cost?

And I think that is a beautiful way to frame the argument for augmented reality. That whatever reality we have now is already being corrupted by the influence of the virtual world, and that only by willfully acknowledging that they both share the same space, can we start in the right direction of fixing it. That’s the direction Google’s Project Glass is headed, and I think that’s the right way.

The approaches taken by the SixthSense project and by Google Glass mainly interact with the user in a visual manner. And for the latter, there isn’t really any “good” and unobtrusive way to interact with that information. Both projects have extremely high output bandwidth (conveying information by projecting it into the user’s eye in one way or another), but limited input bandwidth that is still fairly indiscreet (waving hands around to form shapes and sliding a bar on the frame, respectively). The Pedant takes a different approach by focusing on tactile input and output. This places the project more in the league of people who implant magnets under their skin: it hijacks the sense of touch to convey information about the surroundings.

It’s going to be a tiny device which fits within the dimensions of a shoe insole, including an Arduino Pro, a Bluetooth Mate, an accelerometer, a 2000mAh LiPo battery, and three or more vibration motors. By tapping the foot (or by orienting it in slightly different ways), the user can input data in a manner similar to a telegraph. However, nothing necessarily restricts the input to a single “stream”, so it could end up more like a chorded telegraph (a la a chorded keyboard). The great thing is that with chording, it becomes much more practical to receive information at reasonable rates.
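
Just to make the telegraph analogy concrete, here’s a toy version of what the decoding might look like, sketched in JavaScript for brevity even though the real logic would live on the Arduino or the paired phone. The 200ms dot/dash boundary and the tiny lookup table are completely made up:

function decodeTaps(durations) {
  var symbols = durations.map(function (ms) {
    return ms < 200 ? '.' : '-'; // short tap is a dot, long tap is a dash
  }).join('');
  var table = { '.-': 'a', '-...': 'b', '-.-.': 'c' }; // and so on
  return table[symbols] || '?';
}

decodeTaps([120, 350]); // ".-", which the toy table reads as "a"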

Just as a cell phone can vibrate to signal that the user has a message waiting, the Pedant would be used primarily to handle notifications; but rather than indistinguishable, generic sensations on the thigh, it’ll convey the type of the notification as well as its content, and the user even has the ability to respond without disrupting the environment around them.

Even without weird abstract tactile character sets, the Pedant could be interesting just as a sort of social network where users can feel the presence of other users in their general vicinity. Since it’s connected to a smartphone’s GPS and cellular data connection, it could monitor the footsteps of all nearby Pedant wearers and trigger specific vibration motors to evoke an awareness of how fast they’re walking and what general direction they’re in. In a sense, a social network of pedometers.


New Host Bitcable 30 September 2012

I haven’t exactly been raving about Hostmonster for the nearly two years I’ve been a customer of theirs, and it’s time for a change. In the past week, I moved to a new web host. As of this moment, I’m using Bitcable, specifically the cheapest shared plan plus another discount. I’ve had my share of gripes about the service, but speed and reliability so far have not been among them. I discovered Bitcable through a friend who knew a friend who operated a web host, well, primarily a VPS service. After a fair amount of begging for a VPS discount, I decided to try out their (I always find it awkward to refer to a company or service in the plural despite knowing that it’s pretty much a one-man show, but that makes it all the more impressive) shared plan, since that was analogous to what I was paying for with Hostmonster. And anyway, for $2/mo, how bad can it get?

That question was a tad misleading; so far, Bitcable’s pretty awesome. Part of the thing about using a service from a friend-of-a-friend (FOAF, in case I ever need the acronym later in this post, but I’ll keep it here just because it’s a fun thing to say) is that you can get some pretty good support over some rather random communication channels. It’s small enough that he doesn’t oversell, and the performance really shows through.

I did, however, have some issues with the configuration of the server. The first is that by default, shared customers don’t get SFTP access. That’s pretty annoying because I’ve recently fallen in love with passwordless login using public keys. I sent an angry support ticket and it was enabled soon enough. A much more pressing issue was that soon after my migration, there was a long server outage due to a power supply failure (which thankfully hasn’t happened again; the fact that it took over a month to write this post says more about the bad state my blog is in than about the host, and I haven’t noticed a minute of downtime since then, with uptime monitors set up to make sure of that).

So yeah, I’ll be on this host for the foreseeable future.


Whammy: A Real Time Javascript WebM Encoder 19 August 2012

This is sort of a conceptual reversal (or not, this might just be making the description needlessly confusing) of one of my older projects, Weppy. Weppy added support for WebP in browsers which didn’t support it by converting the image into a single-frame video. Whammy is instead predicated on the assumption that the browser already supports WebP (at this point, that means it only works in Chrome, since it’s the only browser which actually supports WebP), not only decoding it but encoding it as well.

The cool thing about WebP, which was exploited in Weppy, is that it’s actually based on the same codec as WebM, On2’s VP8. That means the actual image data, once the container formats are stripped away, is virtually interchangeable. With a catch: it’s intraframe only.

So it’s a video encoder in the sense that it generates .webm files which should play in just about any program or device that supports the WebM format. But interframe compression is actually a fairly important thing, and it could reduce the file size by an order of magnitude or more.

But there isn’t too much you can do on the client side in the way of encoding. And whatever you do, you basically can’t do interframe compression (aside from some really rudimentary delta encoding). More or less, when your only alternatives are maintaining an array of DataURL-encoded frames or encoding the whole thing (rather slowly) as a GIF, a fast but inefficient WebM encoder stops looking too bad.

This was actually Kevin Geng’s idea, and he contributed some code too, but in the end most of the code was just leftovers from Weppy.

Demo

http://antimatter15.github.com/whammy/clock.html

Basic Usage

First, let’s include the JS file. It’s self-contained and basically namespaced, which is pretty good, I guess. And it’s not too big: minified, it’s only about 4KB, and gzipped, it’s under 2KB. That’s like really really tiny.

<script src="whammy.js"></script>

The API isn’t terrible either (at least, that’s what I’d like to hope).

var encoder = new Whammy.Video(15); 

That 15 over there is the frame rate. There’s a way to set the individual duration of each frame manually, but you can look in the code for that.

encoder.add(context or canvas or dataURL); 

Here, you can add a frame. This happens fairly quickly, because basically all it’s doing is running .toDataURL() on the canvas (which isn’t exactly a speed demon either, but it’s acceptable enough most of the time) and plopping the result onto an array (no computation or anything). The actual encoding only happens when you call .compile().

var output = encoder.compile(); 

Here, output is set to a Blob. In order to get a nice URL which you can stick in a <video> element, you need to send it over to createObjectURL.

var url = (window.webkitURL || window.URL).createObjectURL(output); 

And you’re done. Awesome.
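
Putting the pieces together, a complete (if pointless) use looks roughly like this. The canvas drawing is made-up filler, and it assumes there’s already a <video> element on the page to stick the result into:

var canvas = document.createElement('canvas');
canvas.width = 320;
canvas.height = 240;
var ctx = canvas.getContext('2d');

var encoder = new Whammy.Video(15); // 15 frames per second

for (var i = 0; i < 60; i++) { // four seconds worth of frames
  ctx.fillStyle = '#fff';
  ctx.fillRect(0, 0, canvas.width, canvas.height);
  ctx.fillStyle = '#000';
  ctx.fillText('frame ' + i, 10 + i, 120); // some trivial animation
  encoder.add(ctx); // or encoder.add(canvas), or a WebP dataURL
}

var output = encoder.compile(); // a WebM Blob
var url = (window.webkitURL || window.URL).createObjectURL(output);
document.querySelector('video').src = url;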

Documentation

Whammy.fromImageArray(image[], fps) is a simple function that takes a list of DataURL-encoded frames and returns a WebM video. Note that all of the images have to be encoded as WebP.

new Whammy.Video(optional fps, optional quality) is the constructor for the main API. quality only applies if you’re sending in contexts or canvas objects; it doesn’t matter if you’re sending in already-encoded frames.

.add(canvas or context or dataURL, optional duration) adds a frame; if fps isn’t specified in the constructor, you can stick a duration (in milliseconds) here.
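
So a duration-based session might look something like this. The canvas and frames variables are assumed to already exist, and the numbers are arbitrary:

var encoder = new Whammy.Video(); // no fps, so each frame carries its own duration
encoder.add(canvas, 100); // this frame lasts 100 milliseconds
encoder.add(canvas, 500); // this one lingers for half a second
var output = encoder.compile();

// or, if you already have an array of WebP-encoded dataURLs:
var video = Whammy.fromImageArray(frames, 15); // frames[] played back at 15fps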

Todo

This pretty much works as well as it possibly could at this point. Maybe one day it should support WebWorkers or something, but unlike the GIF Encoder, it doesn’t actually require much real computation. So doing that probably wouldn’t net any performance benefits, especially since it can stitch together a 120-frame animation in like 20 milliseconds already.

But one of the sad things is that it now uses Blobs instead of strings, which is great and all, except that the Blobs are actually slower than strings here because it still has to convert the DataURL strings into a Blob. That’s pretty lame. Firefox supports canvas.toBlob, but for some reason Chrome doesn’t; eventually it probably will, and that might be useful to add.
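
If and when that happens, the feature detection would presumably look something like the following. This isn’t part of Whammy; in particular, the Blob-accepting add() is hypothetical, since today it only takes canvases, contexts and dataURLs:

if (HTMLCanvasElement.prototype.toBlob) {
  canvas.toBlob(function (blob) {
    // already a WebP Blob, no string round trip needed
    encoder.add(blob); // hypothetical: Whammy doesn't take Blobs today
  }, 'image/webp');
} else {
  encoder.add(canvas.toDataURL('image/webp')); // what happens today
}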

Also, if someone ever makes a Javascript Vorbis encoder, it would be nice to integrate that in, since this currently only does the video part, but audio’s also a pretty big part.


Upcoming Changes 18 August 2012

This post has been hinted at by the past few blog posts, but I guess eventually it has to be written. The basic gist is that rather than making this the home of random announcements of mostly finished projects, it’ll be the home of mostly daily (or weekly, or whenever significant progress is made) and probably shorter updates on the progress of certain projects. That is, the blog is transitioning back into something more like the olden days (circa 2008-ish), but without falling into the trap of using it as an alternative to writing commit messages, and while still accommodating the fact that I’m now working on quite a bit more than one project at a time.

The problem is that I can’t exactly stay true to that, because I actually have quite a backlog of stuff I have to write about, stuff which is for the most part done (so it’s not particularly viable for me to make up progress updates retroactively, and I’ll probably have to stick with writing a big blog post about it).

This should be the culmination of tons of factors and trends building up for the past year or so. I’ve always felt that the blog needed to be overhauled eventually (or end up rotting as nothing more than a backup kept in that eternal resting spot which is the Internet Archive, leaching fluids into the soil as bacteria leave the corpse punctured with holes and missing vital organs, which is a sure sign that I’m going too far with this metaphor, but in the end that’s what many of the forums I used to visit have become). But the real spark came in the form of a migration to a new web host, something which, alas, I have yet to blog about despite it happening over a week ago.

Those changes are hardly precipitous (however much anyone wants to unveil something in a flash of an instant to feign the appearance that everything happened suddenly and reached new heights of grandeur, that never actually happens, and it’s simply harder to work in that sort of manner: slow and steady doesn’t always lose the race). The first part was the change of web host itself, which was actually not exactly planned (I was testing it out, cancelled my old web host unexpectedly and on a whim, migrated over the course of an afternoon, and left the site down for a few hours). The second front on which this evolution occurred was a slight redesign: changing the color scheme a bit, upgrading the theme, and reorganizing the categories and menus (this is meant to be chronicled in detail in some other blog post which I have yet to write). And the third and last one (which was meant to be the topic of this blog post) is a change in content.

In summary, three inevitable changes on three different fronts: content, frontend, and backend. All in a not-so-grand gesture to save this blog from decaying into a moldy blob of feces on the internet’s great sidewalk.