
Methods and Functions 16 December 2015

I watched a recording of Rob Pike’s talk Simplicity is Complicated a few weeks ago, and I was struck by his sentiment that many popular languages seem to be converging by adding popular features from other languages.

In particular, it seems that many popular languages combine aspects of object-oriented and functional programming: support for classes, higher-order functions, and reduction. Sometimes this leads to situations where the distinction between methods and functions is confusing or inconsistent.

In JavaScript, you find the length of an array or string with a property access, "blah".length, but in Python[^2] it's done through a function invocation, len("blah").

This dichotomy is more evident when operations get chained in sequence to process some stream of data. Here’s an example in Java 8 of the object-oriented approach where methods are chained[^1]:

double average = roster.stream()
    .filter(p -> p.getGender() == Person.Sex.MALE)
    .mapToInt(Person::getAge)
    .average()
    .getAsDouble();

An alternative way to write the code is by thinking about it as function composition— as you can do in Haskell:

(sum (map sqrt (filter (>3) [1,2,3,4,5,6,7])))

An interesting exercise is to pay attention to how your eyes move as you scan through snippets of code to figure out what they do.

To do that, let’s abstract away the slightly different syntax of these languages and imagine a hypothetical language where x.y.z is syntactic sugar for z(y(x)). That is, when a function name is suffixed after a . following some expression, it is equivalent to wrapping the entire preceding expression and passing it as the first argument.

More concretely, 32.double is equivalent to double(32), and 49.divide(7) is equivalent to divide(49, 7).
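
Python doesn’t have this sugar, but a small helper approximates the same left-to-right reading. To be clear, pipe, double, and divide below are names invented for this sketch, not standard library functions:

from functools import reduce

def pipe(value, *fns):
    # pipe(x, f, g) == g(f(x)), mirroring the hypothetical x.f.g sugar
    return reduce(lambda acc, fn: fn(acc), fns, value)

def double(x):
    return x * 2

def divide(x, y):
    return x / y

print(pipe(32, double))                  # double(32)    -> 64
print(pipe(49, lambda x: divide(x, 7)))  # divide(49, 7) -> 7.0

Note that the extra argument to divide has to be tucked into a lambda, since pipe only threads a single value through.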

For a more complex example:

range(0, 20).map(x: sqrt(x)).filter(x: x > 3).sort.sum

sum(sort(filter(map(range(0, 20), x: sqrt(x)), x: x > 3)))

With the first code snippet, the process can be read left-to-right, and you can imagine it as the story of a little chunk of data flowing through the system.

With the second code snippet, you instead read the function from the inside-out. To a certain extent, it’s right-to-left, but you have to check the other side to make sure you aren’t forgetting any extra arguments.

Let’s try to visualize this:

sum(sort(filter(map(range(0, 20), x: sqrt(x)), x: x > 3)))
                    ------------
                ----            ------------
         -------                             ----------
    ----                                               -
---                                                      -

You can see here that interpreting the behavior with the functional style[^3] begins in the middle and proceeds right-to-left toward the beginning. But whenever you reach a new function name, you have to zig-zag to the opposite side to visually check whether the function has any additional arguments.

range(0, 20).map(x: sqrt(x)).filter(x: x > 3).sort.sum
------------
            ----------------
                            -----------------
                                             -----
                                                  ----

The method chaining approach (read: OOP style) is nice because it fits the conceptual model of data flowing through a sequence of transformations, and it doesn’t disrupt the typical western left-to-right reading order.

To me, this seems much easier both to read and to write (for I am not so blessed as to use Paredit and must carefully tread a vast sea of syntax to manually close my parens). It’s part of why I’ve always been mildly infuriated trying to learn Haskell[^4]: function composition feels much less natural[^5].
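
To make the comparison concrete, here’s the hypothetical pipeline from above written both ways in real Python. Stream is a made-up wrapper class for this sketch, not a real library; the nested version just uses the built-in map, filter, sorted, and sum.

import math

class Stream:
    # A tiny made-up fluent wrapper, just to mimic the chained reading order.
    def __init__(self, items):
        self.items = list(items)
    def map(self, fn):
        return Stream(fn(x) for x in self.items)
    def filter(self, pred):
        return Stream(x for x in self.items if pred(x))
    def sort(self):
        return Stream(sorted(self.items))
    def sum(self):
        return sum(self.items)

# Left-to-right, like the method-chained snippet:
chained = Stream(range(0, 20)).map(math.sqrt).filter(lambda x: x > 3).sort().sum()

# Inside-out, like the nested snippet:
nested = sum(sorted(filter(lambda x: x > 3, map(math.sqrt, range(0, 20)))))

assert chained == nested

Same result; very different eye movement.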

One of the stated reasons why Python uses len is that a prefix function lets you glance at the type of an expression by looking at its head. When you’re scanning through a line and you see that the outermost function is “len”, you can immediately tell that the output is a number. When length is determined in postfix, you can’t determine the final type of an expression until you look at its very end.

Another problem is that this linearized sequential syntax only helps when the structure of an expression is relatively linear. Imagine instead that you have an expression like

sum(range(floor(sqrt(multiply(3, 4))), divide(square(5), 2)))
                     --------------           
                -----              -          ---------
          ------                    -  -------         ----
    ------                           --                    -
----                                                        -

In this case, both multiply(3, 4) and square(5) sit at relatively similar depths, which gives the diagram its humps and makes it look like a snake eating an elephant.

Flattening it with the . syntax then forces range into a little invisible spot in the middle and that makes it a lot harder to interpret the behavior of the program.

3.multiply(4).sqrt.floor.range(5.square.divide(2)).sum

In fact, this dot stuff is beginning to look a lot like the postfix notation in stack-based programming languages like FORTH (read: similarly unreadable).

That actually leads to the nifty realization that it’s kind of like tail call optimization for the human mind.

When reading code composed of nested function applications, you end up having to keep track of how many levels deep in the expression you are. There’s an idea in psychology that human working memory can hold around 7 ± 2 items. So reading a deeply nested set of expressions is naturally difficult.

But if the expression tree naturally represents a sequence of actions transforming a piece of data, you can reorganize it so that the depth of your mental stack doesn’t keep growing and each intermediate value simply gets substituted in place.
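
Concretely, here’s the snake-eating-an-elephant expression from above unrolled into plain Python (integer division is assumed here, since range needs whole numbers). Each assignment consumes the previous value and replaces it, so the mental stack stays one item deep; only the expression’s second branch needs a name of its own.

import math

# Nested: every pending call sits on your mental stack until its arguments resolve.
total = sum(range(math.floor(math.sqrt(3 * 4)), (5 ** 2) // 2))

# Unrolled: each step substitutes its result in place of the previous value.
value = 3 * 4                     # multiply(3, 4)
value = math.sqrt(value)          # sqrt
value = math.floor(value)         # floor
upper = (5 ** 2) // 2             # square(5), then divide by 2 (the other branch)
value = sum(range(value, upper))  # range, then sum

assert value == total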


Passing objects as the first argument isn’t really all that strange. The way classes are implemented in Python, methods are just functions with an implicit self argument.

class Dog:
    def talk(self, text):
        print("woof!", text, "bark!")

shibe = Dog()
shibe.talk("wow! such speak!")

So with the code above, shibe.talk("wow! such speak!") is really just shorthand for Dog.talk(shibe, "wow! such speak!").

Neither is it particularly strange to access an object’s properties through function calls. This is exactly how you access the fields of records in Haskell.

data Car = Car {
    company :: String, 
    model :: String, 
    year :: Int
}
let wusTangClan = Car {company="Ford", model="Wustang", year=1967}
model wusTangClan -- "Wustang"

[^3]: Note that while I’m characterizing this as the functional style, many functional languages such as F# have a “pipe-forward” operator “|>” which has a similar effect. Haskell seems to have a convention where the data is passed as the last argument rather than the first argument, which makes implementing this particular strategy a bit clumsy.

[^2]: In the Python FAQ, Guido describes the rationale behind the “len” function as opposed to using a method on strings and lists. Interestingly, more recent revisions of the FAQ make it sound as if it were some mistake rendered unfixable by history.

[^1]: Something analogous to this sort of chained-method style is also used in popular frameworks like d3 and jQuery.

[^4]: Well, there’s also that incident where I tried installing some Hello World example with Cabal and ended up, through a far more infuriating series of events, rendering my Macbook unbootable.

[^5]: In a similar vein, I’d like there to be a “\@” operator in Mathematica so that I can write data \@ transformation rather than the other way around with the existing “/@” operator.


Brass Doge 10 April 2015

“We are stuck with technology when what we really want is just stuff that works.” — Douglas Adams

At MIT we get a class ring called the “brass rat”. It’s rather amusing that the ring is neither made of brass nor features a rat. It’s actually gold, and features our mascot: the beaver.



HeapQueue.JS 28 January 2014


I have a habit of writing blog posts in Sublime Text in lieu of legitimate READMEs. This project was simple and short enough that I think I did manage to write something with a passable semblance to a functional README before initiating this quest to craft a blog post.

This is hardly a project that I’d expect to go down in the annals of my blog as particularly interesting or notable. Instead, it’s a rather boring, low-level, computer-sciencey thingy that would feel perfectly comfortable amidst homework assignments.

In fact, it’s virtually assured that documenting this endeavor– the combination of writing the words which comprise this post and the documentation already written on how to use this rather simple library– will inevitably consume more of my time than actual coding (people who are more familiar with highly intricate professional software tapestries may claim that such is merely the ideal and typical experience of a programmer).

Notice that I’ve managed to spend three paragraphs (well, my paragraphs are on the order of two sentences, so they don’t even qualify under the fourth-grade definition of what a paragraph is) talking about the uninteresting character of the project, its relatively diminutive size, and the process of writing about the subject, without actually describing what exactly the project is. This, more than perhaps any other project of mine, deserves such a diversion, for it is unlikely to yield “much amaze” or even a modest “such wow”.

It is a binary heap priority queue, a teensy bit shy of 50 lines, a mere kilobyte minified. It’s very loosely based off of Adam Hooper’s js-priority-queue but considerably smaller.
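
To give a sense of scale, here’s a sketch of the same idea in Python. It is not the actual HeapQueue.js code or API (that lives in the repo), just a minimal binary min-heap with a pluggable comparator.

class MinHeap:
    # A minimal binary min-heap; an illustration of the underlying idea,
    # not the HeapQueue.js interface.
    def __init__(self, cmp=lambda a, b: a - b):
        self.cmp = cmp
        self.items = []

    def push(self, value):
        # Append at the end, then bubble up until the parent is no larger.
        self.items.append(value)
        i = len(self.items) - 1
        while i > 0:
            parent = (i - 1) // 2
            if self.cmp(self.items[i], self.items[parent]) >= 0:
                break
            self.items[i], self.items[parent] = self.items[parent], self.items[i]
            i = parent

    def pop(self):
        # Take the root, move the last element to the top, then sift it down.
        top = self.items[0]
        last = self.items.pop()
        if self.items:
            self.items[0] = last
            i, n = 0, len(self.items)
            while True:
                left, right, smallest = 2 * i + 1, 2 * i + 2, i
                if left < n and self.cmp(self.items[left], self.items[smallest]) < 0:
                    smallest = left
                if right < n and self.cmp(self.items[right], self.items[smallest]) < 0:
                    smallest = right
                if smallest == i:
                    break
                self.items[i], self.items[smallest] = self.items[smallest], self.items[i]
                i = smallest
        return top

    def __len__(self):
        return len(self.items)

q = MinHeap()
for x in [5, 1, 4, 2, 3]:
    q.push(x)
print([q.pop() for _ in range(len(q))])  # [1, 2, 3, 4, 5]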

Now that I’ve gotten the actual informative and useful part of the blog post off my shoulders, I can take you along for the unproductive plunge into the rather lame backstory for the project. I was working on a port of the Telea inpainting algorithm and needed a priority queue, because the little incremental insertion sort that I hacked together in two lines was taking a few orders of magnitude longer than was acceptable.

With my considerable Google-Fu, I searched “javascript heap queue” and found js-priority-queue as well as PriorityQueue.js. They both happened to be implemented in CoffeeScript, which I love, except not really. I like the syntax, but creating a project using it usually involves setting up some compilation pipeline in order to get it to do the things that I want it to do. Overall, it’s a bit too much of a hassle for little projects.

I couldn’t actually find the source code for PriorityQueue.js so I settled for js-priority-queue. It was a bit annoying in that it required require.js to function and included several storage strategies that were totally useless. So I copied and pasted a few files together and stripped it of its AMD dependencies and created a somewhat dieted version.

But as 9th grade World History teaches us, appeasement never works, and this purely pragmatic minification led inexorably to irrational code golfing. At that point, I was overcome by an undeniable urge to cleanse it further, perhaps blinded by not-invented-here syndrome and software anorexia. Minutes later, the file had been laid waste, and in its place there existed only its bare skeletal remains.

Now all I need to do is push it to GitHub and write a blog post…