
Methods and Functions 16 December 2015

I watched a recording of Rob Pike’s talk Simplicity is Complicated a few weeks ago, and I was struck by his sentiment that many popular languages seem to be converging by adding popular features from other languages.

In particular, it seems that many popular languages combine aspects of object-oriented and functional programming: support for classes, higher-order functions, and reduction. Sometimes this leads to situations where the distinction between methods and functions is confusing or inconsistent.

In JavaScript, you find the length of an array or string through a property access, "blah".length, but in Python[^2], it’s done through a function call: len("blah").

This dichotomy is more evident when operations get chained in sequence to process some stream of data. Here’s an example in Java 8 of the object-oriented approach where methods are chained[^1]:

double average = roster.stream()
    .filter(p -> p.getGender() == Person.Sex.MALE)
    .mapToInt(Person::getAge)
    .average()
    .getAsDouble();

An alternative way to write the code is by thinking about it as function composition— as you can do in Haskell:

(sum (map sqrt (filter (>3) [1,2,3,4,5,6,7])))

An interesting exercise is to pay attention to how your eyes move as you scan through a snippet of code to figure out what it does.

To do that, let’s abstract away the slightly different syntax for these different languages and imagine a hypothetical language where x.y.z is syntactic sugar for z(y(x)). That is, when a function name is suffixed after a . behind some expression, it is equivalent to wrapping the entire expression and using it as the first argument.

More concretely, 32.double is equivalent to double(32), and 49.divide(7) is equivalent to divide(49, 7).
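This sugar is easy to fake in an existing language with first-class functions. Here’s a toy Python wrapper — the names Chain and then are made up for illustration, not real syntax — that feeds the wrapped value in as the first argument, just like the hypothetical dot:

```python
class Chain:
    """Toy pipeline wrapper: Chain(x).then(f, *args) means f(x, *args)."""

    def __init__(self, value):
        self.value = value

    def then(self, fn, *args):
        # Pass the wrapped value as the first argument, like x.f(...)
        return Chain(fn(self.value, *args))


def double(x):
    return x * 2


def divide(x, y):
    return x / y


# 32.double      ~  double(32)
assert Chain(32).then(double).value == 64
# 49.divide(7)   ~  divide(49, 7)
assert Chain(49).then(divide, 7).value == 7.0
```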

For a more complex example:

range(0, 20).map(x: sqrt(x)).filter(x: x > 3).sort.sum

sum(sort(filter(map(range(0, 20), x: sqrt(x)), x: x > 3)))

With the first code snippet, the process can be read left-to-right, and you can imagine it as the story of a little chunk of data flowing through the system.

With the second code snippet, you instead read the function from the inside-out. To a certain extent, it’s right-to-left, but you have to check the other side to make sure you aren’t forgetting any extra arguments.

Let’s try to visualize this:

sum(sort(filter(map(range(0, 20), x: sqrt(x)), x: x > 3)))
                    ------------
                ----            ------------
         -------                             ----------
    ----                                               -
---                                                      -

You can see here that interpreting the behavior with the functional style[^3] begins in the middle and approaches the beginning right-to-left. But when you reach some new function name, you have to zig-zag to the opposite side to visually inspect whether or not the method has any additional arguments.

range(0, 20).map(x: sqrt(x)).filter(x: x > 3).sort.sum
------------
            ----------------
                            -----------------
                                             -----
                                                  ----

The method chaining approach (read: OOP style) is nice because it fits the conceptual model of data flowing through a sequence of transformations, and it doesn’t disrupt the typical western left-to-right reading order.

To me, this seems much easier both to read and to write (for I am not so blessed as to use Paredit and must carefully tread a vast sea of syntax to manually close my parens) and I’ve always been mildly infuriated trying to learn Haskell[^4] because function composition feels much less natural[^5].
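The contrast above is easy to reproduce in plain Python, where map and filter are free functions (sort is spelled sorted here) — the nested form reads inside-out, while unrolling the same pipeline one stage per line reads like the chained version:

```python
import math

# Inside-out: you start reading from range(0, 20) in the middle.
nested = sum(sorted(filter(lambda x: x > 3, map(math.sqrt, range(0, 20)))))

# Step-by-step: the same pipeline, read top to bottom.
xs = map(math.sqrt, range(0, 20))
xs = filter(lambda x: x > 3, xs)
total = sum(sorted(xs))

assert nested == total
```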

One of the stated reasons why Python uses len is that you can glance at the type of an expression by looking at its head. When you’re scanning through a line and you see that the outermost function is len, you can immediately tell that the output is a number. When length is determined in postfix, you can’t determine the final type of an expression until you reach its end.
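(For what it’s worth, Python’s len is only a thin layer of syntax anyway: under the hood it dispatches to the object’s __len__ method, so the function-versus-method distinction here is mostly about reading order.)

```python
# len() is a free function, but it just delegates to the __len__ method,
# so the function/method split is largely surface syntax.
assert len("blah") == "blah".__len__() == 4
assert len([1, 2, 3]) == [1, 2, 3].__len__() == 3
```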

Another problem is that this linearized, sequential syntax only helps when the structure of an expression is relatively linear. Imagine instead that you have an expression like

sum(range(floor(sqrt(multiply(3, 4))), divide(square(5), 2)))
                     --------------           
                -----              -          ---------
          ------                    -  -------         ----
    ------                           --                    -
----                                                        -

In this case, both multiply(3, 4) and square(5) sit at similar depths, which gives the diagram its humps and makes it look like a snake eating an elephant.

Flattening it with the . syntax then forces range into a little invisible spot in the middle and that makes it a lot harder to interpret the behavior of the program.

3.multiply(4).sqrt.floor.range(5.square.divide(2)).sum

In fact, this dot syntax is beginning to look a lot like the postfix notation of stack-based programming languages like Forth (read: similarly unreadable).

That actually leads to the nifty realization that it’s kind of like tail call optimization for the human mind.

When reading code composed of nested function applications, you end up having to keep track of how many levels deep you are. There’s an idea in psychology that human working memory can hold only about 7 ± 2 items, so reading a deeply nested set of expressions is naturally difficult.

But if the expression tree naturally represents a sequence of actions transforming a piece of data, you can reorganize it so that the depth of your mental stack doesn’t keep growing: each intermediate object simply gets substituted in place.


Passing objects as the first argument isn’t really all that strange. The way classes are implemented in Python, methods are just functions with an implicit self argument.

class Dog:
    def talk(self, text):
        print("woof!", text, "bark!")

shibe = Dog()
shibe.talk("wow! such speak!")

So with the code above, shibe.talk("wow! such speak!") is really just shorthand for Dog.talk(shibe, "wow! such speak!").

Neither is it particularly strange to access an object’s fields through function calls. This is exactly how record types work in Haskell, where each field name becomes an accessor function.

data Car = Car {
    company :: String,
    model :: String,
    year :: Int
}

let wusTangClan = Car {company = "Ford", model = "Wustang", year = 1967}
model wusTangClan -- "Wustang"

[^3]: Note that while I’m characterizing this as the functional style, many functional languages such as F# have a “pipe-forward” operator “|>” which has a similar effect. Haskell seems to have a convention where the data is passed as the last argument rather than the first argument, which makes implementing this particular strategy a bit clumsy.

[^2]: In the Python FAQ Guido describes the rationale behind the “len” function as opposed to using a method on strings and lists. Interestingly more recent revisions of the FAQ make it sound as if it were some mistake rendered unfixable by history.

[^1]: Something analogous to this sort of chained-method style is also used in popular frameworks like d3 and jQuery.

[^4]: Well, there’s also that incident where I tried installing some Hello World example with Cabal and ended up, through a far more infuriating series of events, rendering my Macbook unbootable.

[^5]: In a similar vein, I’d like there to be a “\@” operator in Mathematica so that I can write data \@ transformation rather than the other way around with the existing “/@” (Map) operator.


Multicore Javascript Pi Computing with WebWorkers 14 July 2010

http://antimatter15.github.com/pi/partial.html

It’s not using the fastest algorithm, but the nice thing about this one is that it’s capable of digit extraction (calculating one section of digits without knowing the digits before it), which made it nice for distributed computing. That also means it fits the MapReduce paradigm pretty well: to calculate a block, compute the primes from 3 to 2*N, map each one through the magical pi algorithm, then add everything up and truncate the fractional part. (It doesn’t really use MapReduce, though, because there aren’t enough machines/threads to make distributing the reduce step worthwhile.)

So on the time-memory tradeoff scale, this algorithm uses little memory and is slower, which makes it a good fit for WebWorkers: I can’t imagine it would be good to have the same data duplicated once per thread, with lots of data passed along in every postMessage.
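For context, the best-known example of digit extraction is the BBP formula, which yields hexadecimal digits of pi at an arbitrary position without computing the earlier ones. This is not the prime-based algorithm the demo uses — just a minimal Python sketch of the general idea:

```python
def pi_hex_digit(n):
    """Return the (n+1)-th hexadecimal digit of pi after the point,
    via the Bailey-Borwein-Plouffe formula. Float precision limits
    this simple version to the first few thousand digits."""

    def series(j):
        # Fractional part of sum_k 16^(n-k) / (8k + j), using modular
        # exponentiation so intermediate numbers stay small.
        s = 0.0
        for k in range(n + 1):
            denom = 8 * k + j
            s = (s + pow(16, n - k, denom) / denom) % 1.0
        # A few tail terms (k > n) suffice; they shrink like 16^-k.
        for k in range(n + 1, n + 15):
            s += 16.0 ** (n - k) / (8 * k + j)
        return s

    x = (4 * series(1) - 2 * series(4) - series(5) - series(6)) % 1.0
    return int(16 * x)


# pi = 3.243F6A88... in hexadecimal
assert [pi_hex_digit(i) for i in range(8)] == [2, 4, 3, 15, 6, 10, 8, 8]
```

Because each digit only depends on its own position, blocks of digits can be farmed out to independent workers and concatenated afterward, which is the property that makes this family of algorithms map so naturally onto WebWorkers.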