somewhere to talk about random ideas and projects like everyone else

stuff

X-No-Wiretap 07 September 2013

Recently, the American public has been forcefully made aware of the existence of various programs by the NSA- including massive infrastructure for intercepting all domestically routed communications to better protect us from imminent foreign threats. With legions of patriotic analysts, the NSA methodically ranks communications on the basis of their “foreignness“ factor to determine candidacy for prolonged retention. Although it was developed with the best interests of the American people at heart, this program unwittingly ensnares communications of purely domestic nature on the order of tens of thousands of incidents per day. These innocent mistakes are putting the agency at a great risk because the 4th Amendment of the Constitution expressly prohibits such affronts to American privacy. Making determinations of foreignness is hard, but to prevent further inconvenience to the American way of life, we should take these leaks as an opportunity for us on the civilian front to aid the NSA by voluntarily indicating citizenship on all our networked communications.

Here, we define the syntax and semantics of X-No-Wiretap, a HTTP header-based mechanism for indicating and proving citizenship to well-intentioned man-in-the-middle parties. It is inspired by the enormously successful RFC 3514 IPv4 Security Flag and HTTP DNT header.

Syntax

Screenshot from 2013-08-22 21:41:06

The HTTP header, “X-No-Wiretap” takes the value of the current user’s given name under penalty of perjury. The full name must be immediately followed by identity verification in the form of a standard U.S. Social Security Number, formatted with a hyphen “-“ after every third and fifth digit.

Future revisions of the protocol may introduce additional forms of verification, as while the presence of an SSN should be able to lower the foreignness coefficient of the vast majority of domestic communications to well below 51%- initial research seems to indicate that the combination of full first name and SSN is able to reduce an associated message’s foreignness factor by over 76.8% for 99.997% of Americans. However, there is a chance that certain instances may additional require Passport, Driver’s License, Address, Birthdate, Mother’s Maiden Name, and Childhood Best Friend’s Name to further lower the foreignness factor. This capability will be addressed in future versions of the protocol.

What about SSL/TLS?

Of course adding encryption makes it substantially more difficult for the NSA to interpret the content of what a user is sending, and increases the chance that they may unwittingly collect and retain your communications. In order to address these concerns, this proposal necessarily deprecates all the SSL/TLS ciphers in favor of Double CAESAR’13, a thoroughly studied and well-known military-grade solution which offers excellent modes for graceful redegradation.

Isn’t it dangerous to send your social security number in plaintext along with every request?

Conventional security warns of the possibility of man-in-the-middle attacks, but these new intelligence revelations require entirely new types of cryptographic thinking. Here, the trusted entity is not the server acting at one end, it’s not even the user issuing the requests- but rather, it’s the bureaucracy sitting in the middle politely intercepting all traffic for benevolent analysis- protecting your way of life.

One may be tempted to characterize this as a sacrifice of privacy in order to optimize security, but this position is simply naive. Every new progressive initiative of the government advances both fronts- both security and liberty, never at the expense of either. If you take a holistic long term perspective on the impact on a global scale with a vast array of (classified) information sources, there is very little question that you too would arrive at the same conclusions on the genuine merits of this surveillance system.

In this case, the removal of encryption ensures that the government is able to parse the content of messages to identify terrorists. At the same time, the inclusion of the citizenship identification information should give citizens the safety of mind, knowing that their messages will not be stored indefinitely in a NSA datacenter.

What about Identity Theft?

What if you set up a server to transparently capture the browser headers? Any malicious entity could then collect all the social security numbers and real identities of everyone who happened to stumble onto their websites and use the information to sign up for credit cards, create hazardous investments, threaten or blackmail loved ones, and masquerade as a citizen while doing terrorist activities! There isn’t any real evidence that such sweeping surveillance will even substantially reduce the chances of events that are intrinsically outliers anyway. On the other hand, identity theft is a real world issue which affects millions of Americans on a daily basis- and these changes will only make our real problems worse. — Short-Sighted Critic

Our government has to reconcile with the fact that the flow of information has radically shifted in the past few decades- all the previous paradigms of privacy, security and adversaries have been obsoleted. Understandably, they need to create infrastructure to tackle this next generation of attacks. This could mean highly orchestrated attacks being planned online, and the government is justified in trying to exercise every available option to avert the next cyber-9/11. Our adversaries may have no limits to their capabilities, and so waiting for definitive evidence on the efficacy of counter-intelligence approaches is giving them an opportunity to plan their next attack.

When what’s at stake is the American way of life, it’s easy to put aside things that don’t really matter.

If the terrorists do find a way to cheat the foreignness heuristic, that’s not a problem, because this proposal is backwards compatible with the existing catch-all NSA policy. They can always, in the end, ignore the X-No-Wiretap header, but we wouldn’t know so it’d be okay.

When can I use this?

It’s expected that this proposal will breeze through the standardization process- because we as Americans can always get together and do that which must be done in these times which try men’s souls. Browsers should implement the feature as soon as possible, so that people can make use of the increased sense of security and privacy it affords.

If you’re truly eager to try it out, you can contribute to the prototype chrome extension which supports the header injection (the reversal of HTTPS Everywhere, a feature called HTTPS Nowhere hasn’t been implemented yet, but we’re accepting pull requests!). Since this extension is still experimental, inserting your personal identifiable information must be done by editing the source code, but you should expect a more user friendly interface in the next revision. Since it isn’t thoroughly tested, there may be a chance that it fails to leak the user’s personally identifiable information with every networked request, but rest assured this will be fixed as soon as the bugs are made aware to us.

We should all rally behind this proposal for a simple technical solution which will go a great length to simultaneously enhancing both privacy and security, while overall preserving the only thing which matters, our American way of life.


August Progress Report 31 August 2013

It must be infuriating to be in a situation where there’s a clear problem, and all the obvious remedies continually fail to produce any legitimate result. That’s what this blog is like: every month rolls by with some half baked ideas partially implemented and barely documented. And in the last day of the month, there’s a kind of panicked scramble to fulfill an entirely arbitrary self-enforced quota just to convince myself that I’m doing stuff.

Anyway, it’s pretty rare for me to be doing absolutely nothing, but mustering the effort to actually complete a project to an appreciable extent is pretty hard. This might be in some way indicative of a kind of shift in the type of projects that I try to work on- they’re generally somewhat larger in scope or otherwise more experimental. And while I may have been notorious before for not leaving projects at a well documented and completed state, these new ideas often languish much earlier in the development process.

There’s always pressure to present things only when they are complete and presentable, because after all timing is key and nothing can be more detrimental to an idea than ill-timed and poor execution. But at the same time, I think the point of this blog is to create a kind of virtual paper trail of an idea and how it evolves, regardless of whether or not it falters in its birth.

With that said, I’m going to create a somewhat brief list of projects and ideas that I’m currently experimenting with or simply have yet to publish a blog post for.

  • One of the earlier entries of the backlog is a HTML5 scramble with friends clone, acting highly performant on mobile with touch events and CSS animations while supporting keyboard based interaction on desktop. I’ve always been intending to build some kind of multiplayer real time server component so that people could compete in real time. Maybe at some point it’ll be included as a kind of mini game within protobowl.
  • The largest project is definitely Protobowl, which has just recently passed its one year anniversary of sorts. It’s rather odd that it hasn’t formally been given a post on this blog yet, but c’est la vie. Protobowl is hopefully on the verge of a rather large update which will add several oft-requested features, and maybe by then I’ll have completed something which is publishable as a blog post.
  • Font Interpolation/Typeface Gradients. I actually have no idea how long this has been on my todo list (years, no doubt), but the concept is actually rather nifty. With attributes like object size or color, things can be smoothly interpolated in the form of something like a gradient. The analogue for this kind of transition when applied to text would be the ability to type a word whose first letter is in one font, and the last letter being another font, with all the intermediate letters some kind of hybrid. I never did get quite far in successfully implementing it, so it may be a while until this sees the light of day.
  • I’ve always wanted to build a chrome extension with some amount of OCR or text detection capabilities so that people could select text which was embedded within an image as if it weren’t just an image. At one point I narrowed down the scope of this project so that the OCR part wasn’t even that important (the goal was then just some web worker threaded implementation of the stroke width transform algorithm and cleverly drawing some rotated boxes along with mouse movements). I haven’t had too much time to work on this so it hasn’t gone too far, but I do have a somewhat working prototype somewhere. This one too is several years old.
  • In the next few days, I plan on publishing a blog post I’ve been working on which is something of a humorous satire on some of the more controversial issues which have arisen this summer.
  • And there are several projects which have actually gotten blog posts which weren’t themselves formal announcements so much as a written version of my thinking process behind them. I haven’t actually finished the Pedant, in spite of the fact that the hardware is in theoretically something of a functional state (I remember that I built it with the intent that it could be cool to wear around during MIT’s CPW back in April, but classes start in a few days and it hasn’t progressed at all since). Probably one of the most promising ideas is the kind of improved, vectorized and modern approach to video lectures- but that too needs work.
  • I’m building the website for a local charity organization which was started by a childhood friend, and maybe I’ll publish a blog post if that ever gets deployed or something.

Adaptive Passphrase Length 31 July 2013

Longer passwords are more secure. But if you’ve ever been on a website or computer network which mandated that you change your password every 30 days to a string which you haven’t used in the past 180 which could not contain any part of your username or an English language word without being liberally adorned with num3r1cal and sy#b@!ic substitutions and aRbiTRarY capitalization, you should be able to understand the pain of long and arcane passwords.

At time of writing, I’m a high school senior with something on the order of three weeks left in school. I can’t help but feel a tad wistful of the four years I’ve spent roaming these red-tinted halls of learning. Many of the particularly bittersweet memories include PE, a mandatory elective for my freshman and sophomore years. A bizarre policy forbid the possession of backpacks in the locker rooms, leading to the acquisition of the skill to open rotary combination padlocks, a skill which I just as quickly dispossessed myself of, having absolutely no need for a locker after that.

During that time, I had read about a pretty nifty hack involving Master Locks at the time. If you entered the combination and opened the lock and closed it again and left it on the last digit, you could open it again by rotating clockwise by fifteen and rotating back to the final digit (to fight the possibility of attackers who are aware of the exploit, you could turn the dial clockwise some number less than 15 after locking it, so that you would only have to remember the final digit). It was this trick that I used to open my locker just about every other school day for those two years- a reduction in security no doubt, but a great convenience. I’ve been thinking over the design of the lock recently, and I’m not sure if it’s better classified as a bug or a feature. Conventional wisdom dictates that no doubt this substantial reduction in the key space is a rather severe design flaw. The fact that many newer locks were immune (a lot of locks will snap the dial forward when closed so that it is physically impossible to land on the final digit) to this trick seems to support the notion that this was somehow an undesired consequence of locksmithery rather than some kind of delightful accident.

But the more I think about it, the more this bug seems like a feature. Perhaps the loss of security isn’t nearly as egregious as it appears, and this little shortcut is actually a delightful little accident. Nearly every school day, I’d only have to take four seconds to open my locker because of this handy trick, except on those occasions some kid before me decided to walk around the lockers to press their heads against the metallic locks to spin them and pretend they can lock-pick, in which case, I’d have to enter the whole combination to open __sesame. That is, you only need to know one number to open it unless someone earlier guessed wrong, in which case, you’d have to enter three numbers.

The next story is about an iPhone.

I playfully grab at a friend’s phone for the second time in a week. A friend takes a selfie using the newly emancipated phone, with a rather contorted grin. I dutifully replace her lock screen photo, which had previously been an aerial shot of her grassy picturesque college campus with my friend’s contorted close-up selfie. She learns to be a bit more protective about her phone, quickly issuing a light tap on the power button to reengage the lock screen, a four headed Cerberus with an exponential backoff growl. But this doesn’t mean anything to the enterprising background changer, because the unlocking gesture happens all the time and it’s fairly easy to figure out what numbers someone’s typing.

This is a fairly innocent class of authentication breach, but it should be fairly unnerving given the extent to which our lives and personal information are accessible. Anyway, these two stories led me to think about ways to both simplify the process of entering passcodes and passphrases while simultaneously hardening against some general attacks, in particular replay attacks (and to a lesser extent, man in the middle).

The premise is simple enough: As you’re typing a password, letter by letter and stroke by stroke, each character is processed (either locally, or sent to a server). As soon as the authorization engine is comfortable with the amount of data provided, it sends a signal so that no more data is transmitted, and that authorization is complete.

In the same manner as the old MasterLock, maybe in a particularly auspicious occasion, one might only need one or two symbols in order to prove their identity. However, if the wrong digit is entered on the first try, the subsequent attempt must then obviously come with increased difficulty. Likewise, it may be advantageous to have a low initial difficulty if the login scenario seems very ordinary, which can be determined by any number of variables, including but not limited to the current IP Address, the location as determined by GPS or other means, stylometrics such as the rate and delay between typing individual letters, the number of simultaneous concurrent sessions, how the last session ended (by timeout or by an active logout). But an important part of the system is that the difficulty has to be non-deterministic, because if we assume the attacker knows the system and can manipulate the variables at play, there is still a last vestige of hope that the attacker’s first guess is wrong (after that the difficulty climbs and the attacker is no longer really a problem).

Note that the server would not specifically reveal that a password is incorrect (until it’s manually submitted, like in existing systems). It would only indicate when the received content is sufficient.

In a keyboard input scenario, for a shell prompt or for generic password entry, it’s probably important to keep the text box open until the user acknowledges that he has been authorized, however in case the connection has been compromised by a non-interactive eavesdropping man in the middle with the intent to replay the password, the client should stop sending additional letters as soon as the server signals that it’s ok. This gives the server the option of never revealing to the user whether a response is sufficient, essentially forcing the user to enter the entire password and pressing submit.

Like how some safes or button pads have duress codes that silently call the police when triggered, there system could ignore invalid characters that prepend the actual password for entry, while subsequently blacklisting that given prefix and raising subsequent difficulty.

Of course this isn’t a complete solution. There needs to be some segmentation of access based on current authorization levels. For instance, checking whatever new mail or facebook statuses exist probably does not warrant much authentication friction, but combing through old personal documents and emails, installing root software or changing system settings, should probably require higher levels. This can probably help in that by keeping a unified password of sorts for all levels, but isn’t itself a complete solution. Two factor authentication is also important, in case a single device is compromised or some part of the scheme is untrusted.

I think it’d actually be pretty cool to see something like this implemented, because I think it might serve as a genuinely useful feature that enables a continuously variable spectrum that varies between when security is a situational necessity and when excessive user friction is detrimental.


Array Addition and Other Fun Javascript Hacks 30 June 2013

fun hacks

It’s times like these that I’m reminded of my favorite childhood TV show (actually it’s kind of a close contest between this and Spongebob), Mythbusters, and that perennial warning: “don’t try this at home”. But this isn’t actually dangerous in any physically life-damaging way, it’s just dangerous in that it’s a very bad idea. The whole thing started with a friend who was irked by a few Javascript oddities, one of them being the inability for arrays or tuples of numbers to add.

I actually came across an idea some time ago while reading about cross-site JSON exploits. But the premise for adding arrays is rather simple: since the language only supports subtracting and adding numbers, the javascript engine will try to convert an array to a number by calling its valueOf method. There really isn’t a numeric representation of an array so valueOf usually just returns NaN, which isn’t tremendously useful. However, we can override the valueOf function to return any number we want.

We can create a valueOf function which assigns numbers to arrays according to a specific pattern, such that the resultant number can reveal (without much ambiguity) which arrays were involved, and whether they were added or subtracted. There are probably a myriad of ways to construct numbers like that, but one of the simplest (or at least the first one that I could come up with) is to raise some given radix (at least 3) to a unique number that increments every time valueOf is called. You can understand it pretty simply in base 10.

Lets say our first array is [1, 2, 3] and our second array is [5, 4, 3]. Every time the instance’s valueOf is called, we plop it onto a temporary global array and record the index. In this case, lets assign the first array index 1, and the second array index 2. If we raise the radix 10 to that index, we can get the respective unique numbers: 10 and 100. Now these numbers can be added or subtracted, leading to 110 or 90 or -90 (or unaltered, leaving 10 and 100). To find out what operations and what numbers are involved in the operation, we can first add 1000 which is 10 raised to the size of the global array, which has the useful effect of making everything positive. Now we’re left with 1110, 1090, 0910, 1010, and 1100. We can ignore the first and last digits, and each digit can be either a 1, a 0 or a 9 (if it’s anything else, this is just a number, not the result of a magical array addition). A 1 tells us the number was added, a 0 tells us the number isn’t used, and a 9 tells us that the number was subtracted.

This leaves a weird problem though, which is how you’re going to convert a number back into that array transparently. And this segues into the more atrocious segment of this hack. If and when Skynet and the other assorted malevolent (uh… I mean, misunderstood) artificial intelligences develop their courts, the fembots and gentledroids of the jury will no doubt consider me guilty of whatever felonious usercrimes may exist. They’ll consider me an equal of Norman Bytes, butchering idiomatic javascript in the shower.

What exactly makes it such a heinous offence, you may find yourself asking. The answer is simple: the unholy matrimony of with and Object.defineProperty (also, keep in mind that it isn’t DOMA’s fault).

The great thing about Object.defineProperty is that it lets you get in the middle of the whole variable assignment process. The problem is that this only works with object properties, not top level variables. But Javascript has a nice (read: wretched) with statement which lets you treat variables as if they were object properties. This still leaves a slight problem because there’s no way to define a catch-all property. And if nothing so far has wrought utter terror to your soul, this last critical part may very well do exactly that. Since variables that are called must exist there in name, we can use the Function toString method to decompile the source and use a simple regular expression (any symbol which starts with A-z $ or _ followed by any of that or a number) to extract candidate variable tokens, and for each one, we can define a getter and setter on a new object which is subsequently passed as the first argument to the function whose first enclosed statement is a with.

The power of intercepting all function calls, variable declarations and retrievals then comes by recursively creating another fake object filled with getters and setters whenever a property is accessed. For method calls to primitive types like strings and numbers, we do the same type of sorcery but directly on their respective prototype properties. Whenever a function is passed a number which happens to match the pattern for an array addition or subtraction, we can passively intercept and substitute its value. Any string which matches a certain pattern of CSS selectors can be then transparently substituted with the NodeList which results from a document.querySelectorAll. And we can change all the variable declarations for a for..in loop such that array values are used instead of keys.

And now, four minutes before the end of this month, I’ve successfully yet again managed to eke out a blog post to fulfill my quota. And I guess I don’t have an Humane Society to prove that no humans were harmed in the making of this blog post, but how bad could it possibly be— it’s only 150 lines.


Automatically Scanning on Lid Close with Arduino 30 May 2013

Arduino Scanner

Before the 29th of January, I had a rather irritatingly Rube Goldberigan scheme for using my CanoScan LiDE 80. It didn’t have native Linux support, so the only way I could scan documents was through a Windows XP (Technically MicroXP) installation inside a VMWare virtual machine extant for the sole purpose of indulging my blue moon scanning requirements. But after that fateful day, I had a new scanner which- in its matte black glory could reasonably produce bitmap reproductions from pieces of paper on its surface, notably sans the convoluted meta-computing.

Ubuntu ships with a nice Simple Scan app, which as the name might suggest, allows you to scan things rather simply. But for some reason I thought that getting a scanner was a good reason to start scanning every one of the numerous pieces of paper I get from school. Naturally, there’s a problem of course which arises from the fact that I have to open up an app (usually with the absurdly slow unity launcher) and then use my mouse to click buttons, all of which happens on top of the already undue burden of opening the scanner lid to put papers inside.

Obviously this system would be untenable for a mass scanning addiction. Maybe a reasonable person would at that point take a deep introspective about the end goal of such an endeavor. Maybe wasting the 5 Gigabytes a year needed to archive some 8,000 school papers (assuming that the files are stored at a fairly reasonable 200dpi as jpegs averaging around 5MB per)- most of which illegible chickenscratch anyway, plus the ten minutes or so per day at minimum to shuffle papers around. Well, a _reasonable _person would take into account the cost and relative benefit of such a scheme and decide that this really isn’t worth it.

But of course, I couldn’t be so overtly dismissive if I had that same reasonable sense. Obviously this setback was not a serendipitous invitation from providence to investigate the social, ethical and utilitarian merits of the goal (I say this of course with a great deal of sarcasm, but there are many less trivial opportunities that technologists ignore). Instead, ignoring whether or not it was a good idea in the first place, I decided to fix it.

The first step was trying to get the buttons to work. Canon scanners, and presumably all other scanners, have little buttons on the device that you can push to accomplish certain easy tasks. On Windows, it’s pretty cool because it’ll automatically launch whatever app it is that scans things and start it automatically. There’s a set of four: PDF, Auto, Copy and Email, pretty self explanatory at that (and if that fails to suffice, they’re coupled with cute monochrome icons).

The problem is that it doesn’t work automagically, at least on Linux. I found little project called scanbuttond, or something to that effect, and tried patching it in some way that would fix it, but after investing an hour or so- I pretty much gave up.

The thing is that having button isn’t even the most efficient system for document scanning. So what physical cues are there to indicate intent-to-scan? The first and most obvious one is the presence of a document in the bay. It’s a bit difficult to divine when a document is placed on the surface, though it’s probably possible with some fancy arrangement of lasers or light sensors exploiting ftir or precisely measuring the weight of the entire scanner or monitoring the butterflies in proximity for fft’d wing flaps. Plus, chances are that immediately after a document’s been placed, you probably want some time to adjust the paper’s alignment and get your fingers off the paper (before the intense searing beam of panchromatic LED light reduces your fingertips to burnt crisps).

For flat documents, most of the time when that happens, you close the lid so that the foam white lid insert will press down on the document (flattening out the wrinkles endemic to a haphazard backpack-paper-shoving technique) after making adjustments to ensure proper alignment. The only problem with this scheme is if you’re scanning some thicker document (a book- whatever that is) or don’t feel like lowering the lid (now, why would you want to do that?).

So with a piece of folded up aluminum foil and foam and some judicious application of tape (in a setup eerily similar to that door alarm that I built from a kit from the Discovery Channel store when I was 7) I crafted a little switch mechanism that activated whenever the lid was closed. Then the two prongs of the foam switch were attached to a voltage divider on and Arduino Leonardo board so that it could measure when the thing closed or opened. A tiny piece of code ran on the Arduino which would do exponential smoothing on the voltage between the pins and send a signal over serial.

On the computer-side of things, there’s a little python script which listens on the serial device for whenever the lid has closed. It uses the pySANE bindings to scan the document and then using PIL compares it to a blank scan in order to figure out if what we scanned was actually empty. If it isn’t empty, then it saves it to the disk as a file named after the current date.

Twenty lines of Arduino C, and a hundred of Python later, I have a scanner which makes it exceedingly easy to digitize my paper trail. Too bad I haven’t used it in four months. I reinstalled Ubuntu at some point and never quite bothered setting it up again (even though it would only entail like two lines to attach it to the startup applications list). Even when it was working, it took some amount of work and offered almost no immediate benefit- it didn’t integrate any form of OCR or fancy visualization which could potentially make this interesting from a scientific perspective. But school is winding down, I mean literally graduation is a smidgen more than a week away, so I’m not exactly going to have much more paper to digitize. So I guess this marks the end of a failed experiment, the pre and post mortem- interesting in that it succeeds in just about everything I set out to do, but I never quite got it into using it.

Maybe you’ll find it more useful than I did.