somewhere to talk about random ideas and projects like everyone else

stuff

#image

Cooley-Tukey FFT + DCT + IDCT in under 1K of Javascript 30 May 2015

Technically this is a Hough transform and isn't at all related to the FFT*, but it looks a lot cooler than any of the actual FFT/DCT pictures I have, so I might as well stick it here

Almost a year ago, I wrote a simple script for computing perceptual hashes in Javascript (it’s going to be part of a forthcoming Naptha feature which allows it recognize source images for memes and do perfect inpainting). One step of the process is computing the 2D Discrete Cosine Transform of a shrunken version of the image.

The problem is that step was kind of ridiculously slow, and it’s not really hard to see why: there’s a series of four nested for loops— because that’s the obvious naive way for computing the DCT.

In practice, it’s actually not all that bad because you only have to do it once per image, and the dimensions of the source image are a fixed 32x32. But it still felt pretty bad, so when I was looking over the code again earlier this week, I tried doing some research to figure out a faster way to do the DCT.

Surely you’d imagine that there have to be extraordinarily efficient implements of the DCT out there, given that it’s the very foundation of JPEG out there. And you’d be right, but they’ve generally unrolled the butterfly into a series of bit-shifts which are quite unlikely to generalize beyond the 8x8 block size.

I’m reasonably confident that FFTW3 has a fast DCT implementation which also is capable of generalization, but it seems buried within quite a lot of code.

MiniFFT

This isn’t a super performant implementation, but it’s not horrendously slow and it doesn’t use recursion. It’s probably pretty trivial to make it quite a bit faster by precomputing sine and cosine tables. This uses the standard Cooley-Tukey radix-2 decimation-in-time algorithm, which means it only works for one dimensional arrays with a length which is a power of two.

It doesn’t check that the length is a power of two, it’ll just give you the wrong answer. You should definitely check that it’s true beforehand.

function miniFFT(re, im) {
    var N = re.length;
    for (var i = 0; i < N; i++) {
        for(var j = 0, h = i, k = N; k >>= 1; h >>= 1)
            j = (j << 1) | (h & 1);
        if (j > i) {
            re[j] = [re[i], re[i] = re[j]][0]
            im[j] = [im[i], im[i] = im[j]][0]
        }
    }
    for(var hN = 1; hN * 2 <= N; hN *= 2)
        for (var i = 0; i < N; i += hN * 2)
            for (var j = i; j < i + hN; j++) {
                var cos = Math.cos(Math.PI * (j - i) / hN),
                    sin = Math.sin(Math.PI * (j - i) / hN)
                var tre =  re[j+hN] * cos + im[j+hN] * sin,
                    tim = -re[j+hN] * sin + im[j+hN] * cos;
                re[j + hN] = re[j] - tre; im[j + hN] = im[j] - tim;
                re[j] += tre; im[j] += tim;
            }
}

The transformation happens in place, so it’ll modify whatever arrays you pass in.

var re = [1, 2, 3, 4],
    im = [0, 0, 0, 0];
miniFFT(re, im);
console.log(re, im); // [ 10, -2, -2, -2 ] [ 0, 2, 0, -2 ]

Minified, it’s only 359 bytes. Unlike this other contender for the dubious prize of smallest javascript FFT implementation, it doesn’t require a special complex number library three times its size (though this doesn’t fit in a tweet— I’ll be really impressed if someone manages to do that).

I haven’t really golfed the code to the smallest size possible, but I think it’s at a reasonable balance of brevity while still being somewhat comprehensible.

It’s based on an in-place FFT from Princeton’s intro CS class Project Nayuki’s FFT.

MiniIFFT

Turns out that you can invert the FFT modulo a scaling factor just by swapping the real and imaginary arguments.

function miniIFFT(re, im){
    miniFFT(im, re);
    for(var i = 0, N = re.length; i < N; i++){
        im[i] /= N;
        re[i] /= N;
    }
}

MiniDCT

This is an implementation of a Type-II DCT using MiniFFT. For details about Makhoul’s algorithm for transforming a DCT into an FFT, see this post

function miniDCT(s){
    var N = s.length,
        K = -Math.PI / (2 * N),
        re = new Float64Array(N),
        im = new Float64Array(N);
    for(var i = 0, j = N; j > i; i++){
        re[i] = s[i * 2]
        re[--j] = s[i * 2 + 1]
    }
    miniFFT(re, im)
    for(var i = 0; i < N; i++)
        s[i] = 2*re[i]*Math.cos(K*i)-2*im[i]*Math.sin(K*i);
}

Like MiniFFT, it operates in-place (well, not really, but it writes its output to the source array).

var arr = [3, 4, 1, 7];
miniDCT(arr);
console.log(arr); // [ 30, -5.09493566, 7.0710678118, -8.604744653 ]

It produces the same output as scipy’s fftpack, but Matlab’s DCT uses orthogonal normalization and produces a different result. However, it’s pretty simple to do it the Matlab way— just scale everything by 1/sqrt(2 * N) and divide the first element by sqrt(2).

function matlabDCT(arr){
    miniDCT(arr)
    arr[0] /= Math.sqrt(2);   
    for(var i = 0, N = arr.length; i < N; i++) arr[i] /= Math.sqrt(2 * N);
}

MiniIDCT

I really considered renaming this series of functions into TinyFFT and friends because of the predicament of MiniIDCT— yeah, exactly— the horror— repeating I’s.

This is an implementation of a Type-III DCT, also known as the IDCT because it happens to be the inverse of a Type II DCT (frequently refered to as “The DCT”).

function miniIDCT(s){
    var N = s.length,
        K = Math.PI / (2 * N),
        im = new Float64Array(N),
        re = new Float64Array(N);
    re[0] = s[0] / N / 2;
    for(var i = 1; i < N; i++){
        var im2 = Math.sin(i*K), re2 = Math.cos(i*K);
        re[i] = (s[N - i] * im2 + s[i] * re2) / N / 2;
        im[i] = (im2 * s[i] - s[N - i] * re2) / N / 2;
    }
    miniFFT(im, re)
    for(var i = 0; i < N / 2; i++){
        s[2 * i] = re[i]
        s[2 * i + 1] = re[N - i - 1]
    }
}

This is based on Tom Grydeland’s IDL implementation of the IDCT.

Let’s start with a 8 element array of real numbers (DCT values).

[A, B, C, D, E, F, G, H]

From that it generates the corresponding complex values. Note that the real part is intact, and the complex part is the original array with its head chopped off and the rest of it flipped around and attached to a water balloon.

[A, B, C, D, E, F, G, H] - j[0, H, G, F, E, D, C, B]

Then the complex values are rotated by exp(i * pi / (2 * N)) to get spectral components. After an IFFT that gets transformed into:

[a, h, c, f, e, d, g, b]

It took me a while to see what was going on there, but the even indices are map to the first half of the results, and the odd ones map to the second half in reverse

Scrambled      a h c f e d g b
Even Indexed   a   c   e   g
Odd Indexed      h   f   d   b
Odd Reversed     b   d   f   h
Unscrabmled    a b c d e f g h

February Progress Report 28 February 2014

Work expands so as to fill the time available for its completion.

— Parkinson’s Law

For the past four or so months, I’ve been working on just one major project. It’s rather depressing to think that I built a reasonably impressive initial prototype over the course of about a dozen sleep-deprived hours, and that all I’ve accomplished since then is minor polish. Technically, there have been at least two rewrites, completely new capabilities and substantially improved technology, but none of that really matters when it comes to describing the project. A project is what it is, at 80% completion, at 95% and 99%.

Second semester at school has started, and that means that Pass/No Record isn’t a thing anymore, and everyone else is adjusting their behavior appropriately. Problem is, I haven’t changed. It turns out that an entire semester of racing toward the bottom is unsustainable when suddenly the floor has been adjusted to above your current position. But the more important point is that it’s leaving less time, and in particular, less contiguous time to work on anything.

Last month, I was working on a port of the Telea inpainting algorithm. Inpainting refers to any kind of image processing operation which attempts to intelligently fill in certain regions of an image with other content. Perhaps the most famous implementation of this is Photoshop’s Content Aware Fill feature, which uses a texture-synthesis and patch-based system, which enables them to copy over things like patterns, and textures, filling in vast contiguous regions of missing content. The problem is patch-based inpainting is almost always quite slow, in spite of its high quality results. There are simpler algorithms based on the Fast Marching Method like Telea or Navier-Stokes which use diffusion models and the ilk in order to propagate the solid colors of the bordering region of some inpainting mask inwards. I’ll write an actual blog post about this once I package it up and build a nifty demo for it.

Last year, Ben Kamens of Khan Academy posted about reverse engineering the flyout menu in Amazon, which suffered from the rather odd behavior of acting exactly how it should. You totally should be able to navigate to a submenu without worrying about accidentally mousing over something else. I looked around for a context menu library which could actually do this, but for some reason I couldn’t find one, so I decided to make my own.


August Progress Report 31 August 2013

It must be infuriating to be in a situation where there’s a clear problem, and all the obvious remedies continually fail to produce any legitimate result. That’s what this blog is like: every month rolls by with some half baked ideas partially implemented and barely documented. And in the last day of the month, there’s a kind of panicked scramble to fulfill an entirely arbitrary self-enforced quota just to convince myself that I’m doing stuff.

Anyway, it’s pretty rare for me to be doing absolutely nothing, but mustering the effort to actually complete a project to an appreciable extent is pretty hard. This might be in some way indicative of a kind of shift in the type of projects that I try to work on- they’re generally somewhat larger in scope or otherwise more experimental. And while I may have been notorious before for not leaving projects at a well documented and completed state, these new ideas often languish much earlier in the development process.

There’s always pressure to present things only when they are complete and presentable, because after all timing is key and nothing can be more detrimental to an idea than ill-timed and poor execution. But at the same time, I think the point of this blog is to create a kind of virtual paper trail of an idea and how it evolves, regardless of whether or not it falters in its birth.

With that said, I’m going to create a somewhat brief list of projects and ideas that I’m currently experimenting with or simply have yet to publish a blog post for.

  • One of the earlier entries of the backlog is a HTML5 scramble with friends clone, acting highly performant on mobile with touch events and CSS animations while supporting keyboard based interaction on desktop. I’ve always been intending to build some kind of multiplayer real time server component so that people could compete in real time. Maybe at some point it’ll be included as a kind of mini game within protobowl.
  • The largest project is definitely Protobowl, which has just recently passed its one year anniversary of sorts. It’s rather odd that it hasn’t formally been given a post on this blog yet, but c’est la vie. Protobowl is hopefully on the verge of a rather large update which will add several oft-requested features, and maybe by then I’ll have completed something which is publishable as a blog post.
  • Font Interpolation/Typeface Gradients. I actually have no idea how long this has been on my todo list (years, no doubt), but the concept is actually rather nifty. With attributes like object size or color, things can be smoothly interpolated in the form of something like a gradient. The analogue for this kind of transition when applied to text would be the ability to type a word whose first letter is in one font, and the last letter being another font, with all the intermediate letters some kind of hybrid. I never did get quite far in successfully implementing it, so it may be a while until this sees the light of day.
  • I’ve always wanted to build a chrome extension with some amount of OCR or text detection capabilities so that people could select text which was embedded within an image as if it weren’t just an image. At one point I narrowed down the scope of this project so that the OCR part wasn’t even that important (the goal was then just some web worker threaded implementation of the stroke width transform algorithm and cleverly drawing some rotated boxes along with mouse movements). I haven’t had too much time to work on this so it hasn’t gone too far, but I do have a somewhat working prototype somewhere. This one too is several years old.
  • In the next few days, I plan on publishing a blog post I’ve been working on which is something of a humorous satire on some of the more controversial issues which have arisen this summer.
  • And there are several projects which have actually gotten blog posts which weren’t themselves formal announcements so much as a written version of my thinking process behind them. I haven’t actually finished the Pedant, in spite of the fact that the hardware is in theoretically something of a functional state (I remember that I built it with the intent that it could be cool to wear around during MIT’s CPW back in April, but classes start in a few days and it hasn’t progressed at all since). Probably one of the most promising ideas is the kind of improved, vectorized and modern approach to video lectures- but that too needs work.
  • I’m building the website for a local charity organization which was started by a childhood friend, and maybe I’ll publish a blog post if that ever gets deployed or something.

Controlling A Native App from a Javascript Desktop 29 August 2008

Well, I worked on a sort of VNC-like solution to controlling native desktop applications from a remote PC. It’s an interesting concept, I settled with something less than easy to use, and less than really feasable. It’s a Proof of concept, and it’s likely I won’t work on it again (Like ForkSwif).

It is an application (module) on a hacked Ext 2.0 Desktop example that uses Ajax (Polling) to query a local PHP proxy to query a remote desktop. The remote desktop is running some software (powered by .NET sadly…) that captures the window’s contents, does a diff to see if there are modifications and where (only sends updates to changed parts of the screen, sorta like VNC). It base64 encodes it and sends it along HTTP to the proxy, which sends it to the javascript client.

The client can send events to be repeated on the remote desktop, currently only left mousedown and mouseup (so, basically only clicking), but using the Keyboard should be easy enough.

I imagine that a more feasable option is to create a javascript X11 client, taking proxied connections to a X session under SSH so it is better with window-specificness, and an overall more stable platform, so you could also run multiple applications simultaneously on the desktop.

here’s an early screenshot:

Controlling A native desktop app from a javascript desktop
Controlling A native desktop app from a javascript desktop
Controlling A native desktop app from a javascript desktop (early build)

Files here:

http://antimatter15.110mb.com/misc/coolstuff.zip

Extract desktop2 to your Apache server, start Windows Calculator (or another app of your choosing). Run Screen.exe, type the app title, press “Get Handle” and check the “Run Server” box. Navigate to the desktop.html (on your PHP-enabled apache), and start the “NV Window” app.

To be able to control the app, you have to set it to your network ip (not localhost) on a VM or a different computer. and configure sample.js at line 191.

proxy: "gdat.php?url=",
updateurl : "[http://localhost:12345](http://localhost:12345/)",
baseurl : "[http://localhost:12345/base](http://localhost:12345/base)",
showimg : false,
uinterval: 1000,
updater: null,
xoffset: -8,
yoffset: -28,

change localhost to the server’s IP. And tick “Control Desktop”

Note that you need firebug.

If i can think of a name for the app, i’ll be sure to creakte a google code project for it. so if anyone want’s the source, think of a name for it.