JavaScript and Maps (in that order)

Block Scoping JavaScript: no Automatic Curly Bracket Insertion

Riddle me this, JavaScript user: when you write

if(something) doStuff();

what is that equivalent to? If you’re like me, you guess that, much like how semicolons are inserted by the parser, curly brackets are inserted as well, meaning that the above statement would be equivalent to

if(something){doStuff();}

But no: from this recent thread on es-discuss, it would seem that the above statement is equivalent to

void(something && doStuff());

Currently these two ways of parsing the statement have identical results (for the purposes of our example), but with ES6 and block-scoped variables there will be a difference:

if(something){let x = 5;}

is not the same as

void(something && let x = 5);//not valid

If for no other reason than that the lower statement isn’t valid (let x = 5; is a declaration, not an expression), but hopefully you get my point that

if(something) let x = 5;

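would declare x in the outer scope rather than doing nothing, as one might expect. (As it turned out, the final ES2015 spec sidesteps this case entirely: a let declaration can’t be the unbraced body of an if, so that line is a SyntaxError.)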

While this might be a bit confusing at first, what it really means is that, with a few exceptions*, block scope and curly braces are synonymous and will always do exactly what they look like they are doing. HOORAY!!!

*The exception is the head of a for loop (but not a while loop, as you can’t declare anything in its head): in for(let x in y){}, x belongs to the loop’s inner scope.
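A small sketch of those rules that runs in any ES2015 engine (the function names are made up for illustration). The last function checks the unbraced case with new Function, which parses its argument as a function body.

```javascript
// let inside braces lives and dies with the braces.
function bracedLet(flag) {
  if (flag) {
    let x = 5;
  }
  return typeof x; // "undefined": x never escaped the block
}

// let in a for/for-in head belongs to the loop, not the enclosing function.
function forInHead(obj) {
  const seen = [];
  for (let key in obj) {
    seen.push(key);
  }
  return [seen, typeof key]; // key is not visible after the loop
}

// An unbraced let as the body of an if is rejected by the parser in ES2015+.
function unbracedLetIsRejected() {
  try {
    new Function("if (true) let x = 5;");
    return false;
  } catch (err) {
    return err instanceof SyntaxError;
  }
}

console.log(bracedLet(true));          // undefined
console.log(forInHead({ a: 1, b: 2 }));
console.log(unbracedLetIsRejected());  // true
```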

Exporting a function in ES6

Using generators via co to put Massachusetts case law into CouchDB

I’d been hearing good things about generators for a while and was looking for a project to use them on; then someone posted a huge amount of case law, and I had one. Maybe six months ago I scraped the Massachusetts General Court’s laws (think state legislature, but we’re a commonwealth so the name has to be ridiculous; we don’t have a DMV either, but I digress), so adding case law seemed the next logical step. I had used Python last time, but that ended up being a poor choice, so this time I was going to use straight-up JavaScript (node.js) to get all the cases into a database (CouchDB).

The cases were in individual XML files, organized into folders by type, leaving four folders of court cases (department of industrial accidents, appellate court opinions, district court appellate division opinions, superior court opinions) with a total of 101,057 files (with apparently random filenames).

The steps I needed to take, for each of those four types, were:

  1. get a listing of all the files in the folder, and for each file
  2. read its contents
  3. convert the XML to JSON
  4. get it into CouchDB

You will also notice that each of these is an asynchronous operation, which is where generators come in. I used co for flow control; it lets you use generators to write asynchronous code in a synchronous style, so the main loop of my code (more or less) just walked through those steps in order, yielding at each asynchronous call.

If you’re curious, here is the full code.
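The original loop isn’t reproduced here, but a minimal sketch of the same shape might look like the following. To keep it self-contained, run is a tiny hand-written stand-in for co (the real co also accepts thunks, arrays, and more), and readdir, readFile, xmlToJson, and db are hypothetical in-memory stubs that return promises.

```javascript
// A stripped-down, co-style runner (a stand-in, not co itself): drive a
// generator, resuming it with the settled value of each yielded promise.
function run(genFn, ...args) {
  return new Promise((resolve, reject) => {
    const gen = genFn(...args);
    function step(method, arg) {
      let result;
      try {
        result = gen[method](arg);
      } catch (err) {
        reject(err);
        return;
      }
      if (result.done) {
        resolve(result.value);
        return;
      }
      Promise.resolve(result.value).then(
        value => step("next", value),
        err => step("throw", err)
      );
    }
    step("next");
  });
}

// Hypothetical stubs for fs.readdir, fs.readFile, an XML-to-JSON converter,
// and a PouchDB database. Each returns a promise.
const folders = {
  opinions: { "q7x.xml": "<case id='1'/>", "b2k.xml": "<case id='2'/>" },
};
const readdir = dir => Promise.resolve(Object.keys(folders[dir]));
const readFile = (dir, name) => Promise.resolve(folders[dir][name]);
const xmlToJson = xml => Promise.resolve({ _id: /id='(\d+)'/.exec(xml)[1], xml });
const saved = [];
const db = {
  put: doc => {
    saved.push(doc);
    return Promise.resolve({ ok: true, id: doc._id });
  },
};

// The main loop reads top to bottom, yet every yield waits without blocking.
function* importFolder(dir) {
  const files = yield readdir(dir);
  for (const name of files) {
    const xml = yield readFile(dir, name);
    const doc = yield xmlToJson(xml);
    yield db.put(doc);
  }
  return files.length;
}

run(importFolder, "opinions").then(count => console.log(`imported ${count} files`));
```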

Your first reaction might be: well, so what? You’ve successfully made an async program synchronous. The key is that while it may read synchronously, it is still non-blocking, which means that if this were part of a bigger program (like a server), the program wouldn’t sit idle while this part was waiting around.

That is also the case in this example. You’ll notice that I am using PouchDB to put the documents into CouchDB; PouchDB can act as a client to a remote CouchDB instance, but it can also create a local instance (on node, using LevelDB). Local PouchDB instances are a lot faster than remote ones, so I put the documents into a local one and had it replicate in the background to the remote one. Since replication is also asynchronous, but slower than the main task of putting documents into the local database, whenever we are waiting around for files to be read or documents to be put into LevelDB, PouchDB is moving more local documents to the remote database.
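To see why the background replication costs so little, here is a toy model of it, written with async/await instead of co for brevity. The local and remote arrays are hypothetical stand-ins for the LevelDB-backed PouchDB and the remote CouchDB, and the timers only simulate I/O. With the real library, the background part would be a live replication; current PouchDB versions spell that roughly as local.replicate.to(remote, { live: true }).

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Background replicator (toy model): copies new docs from local to remote
// one at a time. Remote writes are simulated as slower than local ones.
async function replicate(local, remote, isFinished) {
  let seq = 0; // how far into `local` we have copied
  while (!isFinished() || seq < local.length) {
    if (seq < local.length) {
      await sleep(2);              // simulated slow remote write
      remote.push(local[seq]);
      seq += 1;
    } else {
      await sleep(1);              // caught up; wait for more local writes
    }
  }
}

// Main task: wait on a (simulated) file read, then write locally.
async function importAll(local, count) {
  for (let i = 0; i < count; i++) {
    await sleep(1);                // simulated file read
    local.push({ _id: String(i) });
  }
}

async function main(count) {
  const local = [];
  const remote = [];
  let finished = false;
  const replication = replicate(local, remote, () => finished);
  await importAll(local, count);   // replication progresses during these waits
  finished = true;
  await replication;               // drain whatever is still queued
  return { local: local.length, remote: remote.length };
}

main(5).then(counts => console.log(counts)); // { local: 5, remote: 5 }
```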

(Yes, I could also have removed the outer co and had it do all four folders at the same time, but that would have used a lot more memory, and I was having this run in the background while I was doing other stuff.)

I haven’t made a site with the case law yet, but if you want it, it’s in a publicly accessible CouchDB instance at caselaw.calvinmetcalf.com (or if you just want to see it all, here is the link to all of the documents).

Best puzzle ever

You probably don’t want to use the module pattern.

Every so often I see code (and this includes my old code) which uses the ‘module pattern’. This is when, instead of using a constructor to make an object with a prototype, you just create an object, add methods to it, and return it.
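For instance, a module-pattern object might look like this (a made-up counter, not code from the original post). Every call creates fresh copies of each method, which is where the memory cost comes from.

```javascript
// Module pattern: make a plain object, attach methods, return it.
// `counter` is a hypothetical example.
function counter(start) {
  var self = {};
  var count = start || 0;   // private: only the closures below can see it

  self.increment = function () {
    count += 1;
    return count;
  };
  self.value = function () {
    return count;
  };

  return self;
}

var a = counter(5);
var b = counter(5);
a.increment();
console.log(a.value(), b.value());        // 6 5
console.log(a.increment === b.increment); // false: each object has its own copy
console.log(a instanceof counter);        // false: it is just a plain Object
```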

You probably don’t want to do this.  Why? Because the benefits don’t outweigh the costs. 

Benefits to the module pattern:

  • Conceptually simpler, especially if you aren’t familiar with how JavaScript inheritance works.

Benefits to constructor pattern:

  • Faster: JS engines have optimized it.
  • Uses less memory.
  • The instanceof operator works.
  • Easy to inherit from.

Two other differences, which I’m calling a wash, are

  • monkey patching, i.e. modifying a prototype method to change the behavior of existing objects; some people like it, but I believe it is usually an awful idea, and it can be done with constructors
  • data hiding, i.e. having truly ‘private’ data; this can sort of be done with constructors, but it breaks inheritability. I usually consider this bad, but some people like it.

Lest anyone who uses the module pattern get their feelings hurt, I will point out that I myself have used it extensively in libraries. The good news is that it is easy to switch. Step one: have your module object be ‘this’ instead of an empty object, and delete the return value.

There is no step 2. Well, that’s a lie: step 2 would be to go find all the places in your code that call it and add the new keyword. There are two other options: capitalize the new constructor and then make a lowercase factory function, or just use the instanceof trick.

I usually use the latter, as I find the factory pattern, with its multiple functions, more complex.
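Putting those steps together, a converted object might look something like this (a hypothetical Counter, not the original code). The factory alternative would just be function counter(start) { return new Counter(start); }.

```javascript
// Hypothetical example of a converted module-pattern object.
function Counter(start) {
  // The instanceof trick: when called without `new`, `this` is not a
  // Counter, so call ourselves again with `new` (at the cost of an extra call).
  if (!(this instanceof Counter)) {
    return new Counter(start);
  }
  // Step one of the conversion: methods go on `this`, and there is no return.
  this.count = start || 0;
  this.increment = function () {
    this.count += 1;
    return this.count;
  };
}

var withNew = new Counter(1);
var withoutNew = Counter(1);
withoutNew.increment();
console.log(withNew.count, withoutNew.count); // 1 2
console.log(withoutNew instanceof Counter);   // true
```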

Bear in mind that calling it without new adds another frame to the stack, so it should be avoided except when necessary.

If you want to ‘hide data’, you can do it inside your constructor, but bear in mind that we haven’t yet realized most of the benefits that come from constructors.
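A sketch of that, again with a hypothetical Counter: the count lives in a local variable of the constructor, so only methods defined inside the constructor can reach it, and every instance still gets its own copies of those methods.

```javascript
// Hypothetical example: data hidden in the constructor's closure.
function Counter(start) {
  if (!(this instanceof Counter)) {
    return new Counter(start);
  }
  var count = start || 0;  // hidden: not a property, only reachable via closures

  this.increment = function () {
    count += 1;
    return count;
  };
  this.value = function () {
    return count;
  };
}

var c = new Counter(10);
c.increment();
console.log(c.value()); // 11
console.log(c.count);   // undefined: the data is hidden
```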

To get the most benefit out of prototypes, you need to actually use them, by moving your method definitions outside of the body of the constructor.

Note that you can’t hide data this way, and if you need to capture ‘this’ so it can be used in callbacks, you need to do it in each method; you can’t do it once. But all the functions only get defined once, so it is significantly faster.
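Here is a hypothetical Counter with its methods moved onto the prototype. The count has to be a public property now, and incrementLater shows the per-method var self = this needed for callbacks.

```javascript
// Hypothetical example: shared prototype methods.
function Counter(start) {
  if (!(this instanceof Counter)) {
    return new Counter(start);
  }
  this.count = start || 0;  // must be a property: prototype methods can't see locals
}

// Defined once, shared by every instance.
Counter.prototype.increment = function () {
  this.count += 1;
  return this.count;
};

// Each method that uses a callback captures `this` on its own.
Counter.prototype.incrementLater = function (callback) {
  var self = this;
  setTimeout(function () {
    callback(self.increment());
  }, 0);
};

var a = new Counter(1);
var b = Counter(1);
console.log(a.increment === b.increment); // true: one shared function
a.incrementLater(function (value) {
  console.log(value);                     // 2
});
```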

Note: for the love of God, we are not talking about AMD and CommonJS modules. We are talking about an object construction pattern, which happens to have the same word in its name, that exists in the wild, sometimes wrapped in an IIFE.

regex + eval = crazy delicious

A while back, someone (who shall remain anonymous, because I have made this type of category mistake before) opened an issue on an ajax library I have, complaining about the security risks of using eval as a fallback to parse JSON in browsers that don’t support JSON.parse. What was notable was that the library also supports JSONP, which the issue made no note of.

So why exactly does eval (and friends like Function and setTimeout with a string) get so much FUD? It’s not just security: preventing eval doesn’t do much good when any script can add arbitrary script tags. I mean, you’re closing the barn doors after the cows have left if you’re preventing the Function constructor on a page that has jQuery (with its JSONP support).

This is not to say that there are no security issues; there are, but you can’t deal with them by just avoiding eval. Content Security Policy (CSP) is the way to do that; it can prevent pretty much everything I talk about here, except maybe the blob worker, though I’m not sure. But CSP is beyond the scope of this article.

One reason has to do with the historical misuse of eval. Take a look at this snippet, which is honest-to-god code from the wild.

This file (it’s downloaded along with a bunch of files whose names start with ‘dnn’; someone is going to figure it out, so: it’s this script, from this page, part of a Dot Net Nuke CMS site) uses eval nearly 340 times for string concatenation. 340 times they fire up the engine’s parser and compiler in order … in order to do nothing: the script would act exactly the same without the eval (except faster).

Brendan Eich mentioned how, back in the day, people wouldn’t realize you could do obj[key] and would instead do eval("obj." + key).
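A small sketch of the difference (the object, keys, and function names are made up). Besides being slower, the eval version executes whatever the key happens to contain.

```javascript
var obj = { color: "red", size: 3 };

// The old anti-pattern: build source code just to read a property.
function getWithEval(key) {
  return eval("obj." + key);
}

// What it should have been: bracket notation with a string key.
function getWithBrackets(key) {
  return obj[key];
}

console.log(getWithEval("color"), getWithBrackets("color")); // red red

// Arbitrary code rides along with the "key" in the eval version.
console.log(getWithEval("color; 1 + 1"));     // 2
console.log(getWithBrackets("color; 1 + 1")); // undefined
```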

This is the reason eval is ‘evil’: unless you actually know what you are doing, you’re more likely to shoot yourself in the foot than anything else, so for all intents and purposes you should consider it evil and move along.

eval and friends

While eval is well known, when we talk about eval we are actually talking about a couple of things. First, you have the Function constructor: new Function(arg1, arg2, string) creates a new function with the last argument as the body and the other arguments as the function’s parameters.
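A sketch of the equivalence (not the snippet from the original post; the names are made up). One detail worth knowing: a function built with new Function only sees the global scope, like an indirect eval, whereas a direct eval call can see the caller’s local variables.

```javascript
// new Function(arg1, arg2, body): the last argument is the body,
// the earlier ones are parameter names.
var add = new Function("a", "b", "return a + b;");

// The same function produced by evaluating source text with indirect eval.
var addViaEval = (0, eval)("(function (a, b) { return a + b; })");

console.log(add(2, 3), addViaEval(2, 3)); // 5 5

function scopeDemo() {
  var secret = 42;
  var viaFunction = new Function("return typeof secret;"); // global scope only
  var viaDirectEval = eval("typeof secret");               // sees `secret`
  return [viaFunction(), viaDirectEval];
}

console.log(scopeDemo()); // [ 'undefined', 'number' ]
```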

So you’re running eval on the body of it. The other ‘classic’ eval-esque functions are setInterval and setTimeout when you give them a string as the first argument instead of a function. Unlike the Function constructor, which occasionally has some legit uses, passing a string to setTimeout or setInterval never does.

There are several other JavaScript features which, for security purposes, may as well be the same as eval. The first is executing a script via the DOM: inserting a script tag into the DOM and setting its src, loading an iframe which has a script tag in it, setting the onload property of an image, and myriad other ways. The next is loading a web worker, either from a URL or from a string via a blob URL. The last is the web worker’s importScripts function, which loads and executes a script.

Using the DOM isn’t as bad performance-wise as the previous methods, and when creating a worker, spinning up a separate JavaScript environment is the whole point, so it’s a legit trade-off. Workers are somewhat less of a risk, as they have no access to the DOM and can only message the main thread, not execute anything in it; on the other hand, workers can make ajax requests, load scripts via the importScripts function, and run eval, so security-wise there are trade-offs.

When eval isn’t evil

Until the ES6 Proxy object, eval was the only way to metaprogram. For instance, in CouchDB you write a function in the database which calls a function called ‘emit’ with the result.

The only way to implement this in JavaScript for PouchDB, and have the emit inside that function refer to the emit we want rather than the emit its closure would usually refer to, is to use eval.
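A minimal sketch of the trick, assuming a hypothetical compileMap helper (PouchDB’s real implementation differs in its details): the map function’s source is re-evaluated with a direct eval inside a scope that defines emit, so the free emit in the user’s code binds to our collector.

```javascript
// A CouchDB-style map function as a user would write it. `emit` is a free
// variable here: calling userMap directly on a case would throw a ReferenceError.
function userMap(doc) {
  if (doc.type === "case") {
    emit(doc.court, 1);
  }
}

// Hypothetical helper: rebuild the function from its source text in a scope
// where `emit` is defined, so the rebuilt function's `emit` refers to ours.
function compileMap(fn, results) {
  function emit(key, value) {
    results.push({ key: key, value: value });
  }
  return eval("(" + fn.toString() + ")");
}

var results = [];
var map = compileMap(userMap, results);
[{ type: "case", court: "appeals" }, { type: "law" }].forEach(function (doc) {
  map(doc);
});
console.log(results); // [ { key: 'appeals', value: 1 } ]
```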

Eval and its friends, mostly blob workers, are used extensively by Catiline, a library which creates a new API on top of web workers that allows them to be used more easily in libraries, with fallbacks for browsers that don’t support workers. The way it lets you write a function in the current scope but run it inside a web worker is by using the toString method of functions and then running regexes on the result before opening it in the worker (or in an iframe as a fallback).
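Catiline’s own regexes aren’t shown here; as a rough illustration of the toString-and-regex idea, this hypothetical rebuild helper pulls the parameters and body out of a simple function’s source and recreates it with new Function. Recreating a function this way drops its closure, just as shipping its source into a worker would, which is why such functions need to be self-contained.

```javascript
// Hypothetical helper: split a simple function's source into parameter
// names and body with a regex, then rebuild it with new Function.
// Handles plain `function name(a, b) { ... }` sources only
// (no default parameters or destructuring).
function rebuild(fn) {
  var source = fn.toString();
  var params = /^[^(]*\(([^)]*)\)/.exec(source)[1]
    .split(",")
    .map(function (p) { return p.trim(); })
    .filter(Boolean);
  var body = source.slice(source.indexOf("{") + 1, source.lastIndexOf("}"));
  return new Function(...params, body);
}

function add(a, b) {
  return a + b;
}

function makeReader() {
  var secret = "from the closure";
  return function readSecret() {
    return typeof secret;
  };
}

console.log(rebuild(add)(2, 3));  // 5
var reader = makeReader();
console.log(reader());            // string
console.log(rebuild(reader)());   // undefined: the closure is gone
```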

The Function constructor can be used to make very fast templates: it creates a function which concatenates strings and does nothing else.
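A minimal sketch of such a template compiler, assuming a made-up {{name}} placeholder syntax: the template is compiled once into a function whose body is nothing but string concatenation, and that function can then be called repeatedly. It does no HTML escaping, so it is only suitable for trusted data.

```javascript
// Hypothetical compiler: turn "{{key}}" placeholders into a function
// that only concatenates strings.
function compile(template) {
  // A capture group in split() keeps the keys: literal, key, literal, key, ...
  var parts = template.split(/\{\{\s*(\w+)\s*\}\}/);
  var pieces = parts.map(function (part, i) {
    return i % 2 === 0
      ? JSON.stringify(part)                          // literal text, quoted
      : "String(data[" + JSON.stringify(part) + "])"; // placeholder lookup
  });
  return new Function("data", "return " + pieces.join(" + ") + ";");
}

var render = compile("Hello, {{name}}! You have {{count}} new cases.");
console.log(render({ name: "reader", count: 3 }));
// Hello, reader! You have 3 new cases.
```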

Axel Rauschmayer points out that eval can be very effective in carefully controlled offline situations to make config files much more expressive.

Thomas Gratier pointed me to an article by Nick Zakas on using eval in a CSS parser.

Regex + eval as a combination can allow metaprogramming as powerful as macros in other languages. Of course, these macros are unhygienic as hell (but that never stopped certain LISPs) and carry some major performance penalties, so while you can do amazing things with regex + eval, you can also shoot yourself in the foot.
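A toy illustration of both halves of that claim, using a made-up swap(a, b); macro that a regex expands before the code is handed to new Function. The second call shows the missing hygiene: the macro’s temporary variable collides with a user variable of the same name.

```javascript
// Hypothetical regex "macro": expand swap(x, y); into plain statements.
function expandSwap(source) {
  return source.replace(/swap\((\w+),\s*(\w+)\);/g,
    "var tmp = $1; $1 = $2; $2 = tmp;");
}

function runWithMacros(source) {
  return new Function(expandSwap(source))();
}

console.log(runWithMacros("var a = 1, b = 2; swap(a, b); return [a, b];"));
// [ 2, 1 ]

// Unhygienic: the user's own `tmp` is clobbered by the macro's `tmp`.
console.log(runWithMacros("var tmp = 1, b = 2; swap(tmp, b); return [tmp, b];"));
// [ 2, 2 ] instead of [ 2, 1 ]
```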

In other words, if you don’t know what you’re doing, you should just go ahead and consider eval evil, but if you do, and you’re careful, you can do some amazing things.

Immediately Invoked Function Expressions

Note: if you aren’t familiar with what an IIFE is, skip to the bottom; it should be obvious where.

For some reason I have always written my IIFEs as ‘(function(){})();’. I noticed that others use ‘(function(){}());’ and decided to investigate. I posted a query on Twitter, which @rauschma helpfully retweeted, and here is what I learned.

(fn)() and (fn()) are for all intents and purposes the same; the few differences (hat tip Ben Menoza) are extremely superficial.

You can also use !function(){}(), void function(){}(), or ~function(){}() for the same purpose. ~function(){}() is just silly; void function(){}() requires using void, which I can think of no good reason to ever use; and !function(){}() might be a good idea in code not intended to be human-readable (e.g. code produced by a minifier), but it is not very clear in human-readable code, for the same reason ~array.indexOf() is bad.
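A quick sketch showing that each of these spellings runs the function immediately; they differ only in what the surrounding expression does with the result (! negates it, void discards it, ~ bit-flips it). The calls array is just for illustration.

```javascript
var calls = [];

(function () { calls.push("(fn)()"); })();
(function () { calls.push("(fn())"); }());
!function () { calls.push("!fn()"); }();
void function () { calls.push("void fn()"); }();
~function () { calls.push("~fn()"); }();

// Where an expression is already expected, no wrapper is needed at all.
var result = function () { return 42; }();

console.log(calls.length, result); // 5 42
```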

It’s probably a better idea to use (fn()) over (fn)() because 

  1. (fn)() looks like dog balls, which in context means that there is stuff related to it hanging outside of it (this is not a problem specific to canine testicles)
  2. If you are in a situation where function(){}() works (like assigning to a variable or passing it as an argument), then you can just remove the outer parentheses instead of removing inner ones.

Edit:

It was pointed out to me that there is a second type of IIFE: you can use var foo = new function(){}; to create a new object (a singleton, in the jargon). You don’t actually have to assign it to a variable, but if you don’t, it’s no different than a regular IIFE. This works as written because the new operator lets you omit the parentheses when there are no arguments (i.e. new Constructor(); is the same as new Constructor;), so new function(){}; is the same as new function(){}();. I didn’t cover this because:

  1. I’d never heard of this pattern
  2. It serves a different purpose. 
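A sketch of that pattern (the config object is made up): the anonymous function runs once as a constructor, and the object it builds is the only instance.

```javascript
// `new function () {...}` invokes the anonymous constructor immediately;
// the empty argument list is optional after `new`.
var config = new function () {
  var defaults = { retries: 3 };  // private to this one object
  this.get = function (key) {
    return defaults[key];
  };
};

console.log(config.get("retries")); // 3
```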

IIFE?

An Immediately Invoked Function Expression (IIFE) is a term in JavaScript for when you want to define a function and then immediately invoke it. Due to parsing rules, if you try to write function(){}() you will get an error: if function is the first thing the parser sees, it assumes you’re writing function myFunctionName(){} and is then disappointed when it sees a ‘(’. For the same reason you can’t type function(){} into the console; you’ll get the same error. If you’re curious, the technical explanation for the parse error is that the parser expects a function declaration (which needs a name) but is getting an anonymous function expression.
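The parsing rule can be checked with new Function, which parses its argument as a sequence of statements (the parses helper is made up for this sketch):

```javascript
// Hypothetical helper: true if `source` parses as a function body,
// false if the parser rejects it with a SyntaxError.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(parses("function(){}()"));          // false: read as a nameless declaration
console.log(parses("(function(){})()"));        // true
console.log(parses("var f = function(){}();")); // true: already in expression position
```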

Browserify/NPM: Package Manager Odyssey Part IV

Part 4 in a series that will be however long I end up making it; see the intro, or the previous posts on Jam and Component.

Today we are talking about a very different type of client-side package manager. The reason it’s different from the previous two is that it isn’t a client-side package manager at all; it’s a server-side one, NPM, which you have probably heard of. The only reason we are talking about it is a program substack wrote called browserify.

Browserify can end up being used a couple of different ways. You could use it as a drop-in replacement for component to build a library; the API is almost identical. To create a standalone file ‘foo.js’ in directory ‘dist’, with an entry point of ‘bar.js’, as a standalone UMD bundle called ‘baz’, in component it’s

component build -o dist -n foo -s baz (you specify bar.js in the component.json)

in browserify you do:

browserify -o dist/foo.js -s baz bar.js

The main difference is that browserify doesn’t double as a package manager, so all your dependencies are handled by npm. This is a big plus, as npm has a significant percentage of all JavaScript libraries, even ones that are only client side, which means you can do npm install backbone --save and then just require it in your app.

For trickier libraries like jQuery, where you can’t just npm install them (the jquery in npm is not the real one), you can do bower install jquery and then use the require option when you build: ‘--require ./bower_components/jquery/jquery.js:jquery’ allows you to do ‘var $ = require("jquery");’.

The other thing browserify does is add shims for the node standard library, meaning you can use, say, path to manipulate path strings, crypto to hash things, or zlib to unzip stuff. As long as a module doesn’t try to do anything the browser doesn’t allow (e.g. use the fs library), you can require it, meaning you can require modules even if their author didn’t opt in.
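For example, a module like this one (the caseId helper is made up) only touches Node’s path and crypto. Under Node it uses the built-ins; when bundled, browserify substitutes its browser shims for them, so it should work as long as the shims implement the methods used.

```javascript
// A module written only against Node's standard library.
var path = require("path");
var crypto = require("crypto");

// Hypothetical helper: derive a short, stable ID from a case file's name.
function caseId(filePath) {
  var name = path.basename(filePath, path.extname(filePath));
  var hash = crypto.createHash("sha1").update(name).digest("hex");
  return name + "-" + hash.slice(0, 8);
}

module.exports = caseId;
console.log(caseId("cases/superior/abc123.xml"));
```

Bundling it would then follow the same shape as the command above, e.g. browserify -o dist/caseid.js -s caseId index.js (file names hypothetical).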

One of the reasons this series has taken so long is that I have been building real, honest-to-goodness libraries and apps with the different things I’m covering. With browserify I built a library for parsing ESRI file geodatabases, and was nicely able to make a node library, a browser library, and an app.

I’m also in the process of converting proj4js to browserify, and I converted the dev branch of my Massachusetts law viewer as well (which I’ll be speaking about at CouchDB Conf, by the way; if you want to come, you can save $25 with this link). I feel like I say this in each blog post, but I’m going to be making my apps with this going forward, for the following reasons:

  • Being able to use almost all of the modules in npm without the author having to explicitly design them for this package manager means that you have access to an order of magnitude more modules. Not needing modules to opt in is really the killer feature in many ways.
  • The ability of libraries like curl to asynchronously load CommonJS modules really is the final nail in the coffin for AMD. CommonJS is the clear winner (unless TC39 comes to its senses and brings something besides Python’s module system to the browser).
  • Browserify also allows complex transforms for those who use CoffeeScript or other such things. 

Next time (whenever that is) I’ll probably be looking into bower by way of yeoman, a package manager setup that doesn’t actually compete with npm/browserify.

Proxies in ES6 something.*()

I had an interesting problem today involving catiline, my library for web workers: how do I fake console, in all its various forms, in the worker? This is easy for log, warn, etc.: it just sends back the method names, but there are quite a few of them and they aren’t universally supported across browsers. In the end I just hard-coded a list of the ones I’d go with, but what I really wanted was to be able to create something that allows me to do console.* = function…

As it turns out, you can do this with proxies in ES6, and that’s just the beginning.
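A minimal sketch of the console idea, assuming a hypothetical send function standing in for the worker’s postMessage: the get trap receives whatever method name was asked for and builds a function for it on the fly, so any method works without a hard-coded list.

```javascript
// Messages that a real worker would postMessage back to the main thread.
var sent = [];
function send(message) {   // hypothetical stand-in for postMessage
  sent.push(message);
}

var fakeConsole = new Proxy({}, {
  // Called for every property lookup: fakeConsole.log, .warn, .table, ...
  get: function (target, method) {
    return function () {
      send({ method: method, args: Array.prototype.slice.call(arguments) });
    };
  },
});

fakeConsole.log("hello", 1);
fakeConsole.someMethodNoBrowserHas("still works");
console.log(sent.map(function (m) { return m.method; }));
// [ 'log', 'someMethodNoBrowserHas' ]
```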

A couple of notes: at the time of writing, this will only work in Firefox, as V8 has an older version of Proxy (edit: this shims the new Proxy via the old one). Also, I seemed to remember that you don’t need the new keyword for proxies, but Firefox throws an error if you omit it, and the final ES2015 spec does require new.

The way it works is that we intercept the property lookup and build a function for that key; there is no reason we couldn’t return a value in certain cases instead, for example with a version which makes the method name lowercase and tries to return a value if the name is all caps. All of this only uses the ‘get’ trap; the full list of traps is here.
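A sketch of that variant, with the details made up for illustration: the get trap lowercases the requested name and, if the name was all caps, returns a value instead of a function. The levels table and logged array are hypothetical.

```javascript
var logged = [];
var levels = { debug: 10, info: 20, warn: 30 }; // hypothetical values

var logger = new Proxy({}, {
  get: function (target, name) {
    if (typeof name !== "string") return undefined; // ignore symbol lookups
    var lower = name.toLowerCase();
    // All caps (e.g. logger.WARN) asks for a value, not a function.
    if (name === name.toUpperCase()) {
      return levels[lower];
    }
    // Anything else (logger.Warn, logger.warn) becomes a lowercase method.
    return function () {
      logged.push(lower + ": " + Array.prototype.join.call(arguments, " "));
    };
  },
});

logger.Warn("disk", "almost", "full");
console.log(logger.WARN, logged); // 30 [ 'warn: disk almost full' ]
```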

If you’re into Aspect-Oriented Programming, this is going to be your dream come true, as you can intercept function calls before they reach the function, for example by wrapping each call in a try statement to make sure it runs.
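A sketch of that kind of interception, with a made-up safely helper: the get trap hands back a wrapped version of each method, so every call runs inside a try and a thrown error is reported to a callback instead of propagating.

```javascript
// Hypothetical helper: wrap every method of `target` in a try/catch.
function safely(target, onError) {
  return new Proxy(target, {
    get: function (obj, name, receiver) {
      var value = Reflect.get(obj, name, receiver);
      if (typeof value !== "function") return value;
      // Intercept the call before it reaches the real method.
      return function () {
        try {
          return value.apply(obj, arguments);
        } catch (err) {
          onError(name, err);
          return undefined;
        }
      };
    },
  });
}

var errors = [];
var parser = safely({
  parse: function (text) { return JSON.parse(text); },
}, function (name, err) {
  errors.push(name + ": " + err.name);
});

console.log(parser.parse('{"ok":true}').ok); // true
console.log(parser.parse("not json"));       // undefined
console.log(errors);                         // [ 'parse: SyntaxError' ]
```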

Some more links: