Directory-Backed Resources in Node.js

After writing my last post about the flow control techniques we are using in cast, I realized that the process of writing it was probably as valuable to me as the post will be to anyone else (especially since it was on such a well-covered topic). It forced me to explicitly state and justify my reasoning, and caused me to rethink a few things in the process. Since then, whenever I’ve found myself contemplating a design issue I’ve tried to think of how I would write it up. This post is the logical extension of that plan: an actual writeup of just such an issue.

The Problem

In cast, we maintain very little state in memory between requests. Most state is stored on the filesystem in the form of the directory structures that we use to manage applications, services, etc. As such, we have a lot of code dedicated to managing these resources, including the ‘Instance’ class I mentioned last time (pedants: I know, JavaScript technically has no classes, but that’s the pattern we’re emulating, so that’s what I’m going to call them). For a lot of reasons, most of which are outside the scope of this discussion, we’ve found an Object-Oriented model to be the best way to represent these resources.

There are three main ways that instances of such a class tend to be created:

  1. We create a new resource on the filesystem and return an instance of the class for further operations.
  2. We create a single instance of the class to represent a resource that already exists.
  3. We create a list of class instances to represent the set of all existing resources of this sort on the filesystem.

The problem is that creating a new resource on the filesystem (obviously) requires filesystem operations, and when retrieving one or more instances to represent pre-existing resources we almost invariably want to make sure that the resources in question actually exist, which again requires filesystem operations. This means that each of these operations must be done asynchronously. In short,

var foo = new Instance('foo');

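is not sufficient to either create a new instance or retrieve one with any assurance that it exists, because a constructor has to return immediately, before any asynchronous filesystem operation could complete.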

The Solution

Once again, the solution itself isn’t all that revolutionary: we need factories. While many seem to associate the term with unpleasant memories of Java, factories are pretty standard practice in Node, and as a general matter are a very good thing.

There are two general patterns with factories in Node. The first, and the one that is most common in Node’s public APIs, is to return objects that implement the EventEmitter interface (a sort of general-purpose promise, but don’t tell anyone I said that), then let the user register callbacks for various events on that object. For example, to create a TCP connection:

var net = require('net');

// Create a connection to google.com on port 80
var goog = net.createConnection(80, 'google.com');

// Wait for the connection to actually open before using it
goog.on('connect', function() {
  goog.setEncoding('utf8');
  goog.on('data', function(chunk) {
    console.log(chunk);
  });
  // A minimal HTTP/1.0 request; the server closes the connection after responding
  goog.write('GET / HTTP/1.0\r\nHost: google.com\r\n\r\n');
});

The other, which is less common in Node itself, is to pass a callback to the factory function, which is called with the instance once it is ready. One place where this pattern still appears in the Node API is fs.stat():

var fs = require('fs');

fs.stat('/tmp', function(err, stats) {
  // Always check err first: stats is undefined if the stat failed
  if (err) throw err;
  console.log(stats.isDirectory());
});

Note: the pattern of passing a callback directly to a function is quite common in Node, but fs.stat() is the only case I know of where the second argument (i.e., the one that isn’t ‘err’) is something other than a primitive or a list of primitives. On reflection, that’s probably a bad way to distinguish a ‘factory’, so maybe the above is a bad example, but you get the idea.

This second pattern is more appropriate for our purposes, as the instances will tend to be short lived and would be unlikely to ever emit any other event.
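To make the trade-off concrete, here is a minimal sketch of the same directory lookup written both ways. `Resource`, `openResource` and `getResource` are hypothetical names invented for illustration, not part of cast or of Node’s API; the only real calls are fs.stat() and EventEmitter.

```javascript
var fs = require('fs');
var EventEmitter = require('events').EventEmitter;

// Hypothetical resource object: it just remembers the directory that backs it.
function Resource(dir) {
  this.dir = dir;
}

// Pattern 1 (hypothetical): return an EventEmitter that later emits exactly
// one event, either 'ready' with the instance or 'error'.
function openResource(dir) {
  var emitter = new EventEmitter();
  fs.stat(dir, function(err, stats) {
    if (err) {
      emitter.emit('error', err);
    } else if (!stats.isDirectory()) {
      emitter.emit('error', new Error(dir + ' is not a directory'));
    } else {
      emitter.emit('ready', new Resource(dir));
    }
  });
  // Listeners can still be attached because fs.stat() calls back asynchronously.
  return emitter;
}

// Pattern 2 (hypothetical): take a callback and call it with (err, instance).
function getResource(dir, callback) {
  fs.stat(dir, function(err, stats) {
    if (err) {
      return callback(err);
    }
    if (!stats.isDirectory()) {
      return callback(new Error(dir + ' is not a directory'));
    }
    callback(null, new Resource(dir));
  });
}
```

The emitter version fires a single event and is dead weight afterwards, and since Node throws when an 'error' event has no listener, every caller has to remember to attach one. The callback version puts the error in the first argument, where it is hard to overlook.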

Sticking to this pattern, we ended up with an instance API in cast that looks something like this:

// Create instance 'foo' using fooapp version 1.0
deployment.create_instance('foo', 'fooapp', '1.0', function(err, instance) {
  ...
});

// Retrieve instance 'foo'
deployment.get_instance('foo', function(err, instance) {
  ...
}); 

// Retrieve a list of all instances
deployment.list_instances(function(err, instances) {
  ...
});
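Under the hood, factories like these can be fairly thin wrappers around fs.stat() and fs.readdir(). The following is a minimal sketch of the retrieval and listing factories, not cast’s actual code: it assumes a hypothetical layout in which `instancesDir` holds one subdirectory per instance, and `makeDeployment` is an invented stand-in for the deployment module. Creation is left out because it depends on application and version details this post doesn’t describe.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical Instance "class": a thin wrapper around the directory that backs it.
function Instance(instancesDir, name) {
  this.name = name;
  this.dir = path.join(instancesDir, name);
}

// Hypothetical stand-in for the deployment module. Assumes instancesDir
// contains one subdirectory per instance.
function makeDeployment(instancesDir) {
  // Factory for an existing instance: only hands back an Instance once
  // fs.stat() confirms that its directory exists.
  function get_instance(name, callback) {
    fs.stat(path.join(instancesDir, name), function(err, stats) {
      if (err) {
        return callback(err);
      }
      if (!stats.isDirectory()) {
        return callback(new Error('Not an instance: ' + name));
      }
      callback(null, new Instance(instancesDir, name));
    });
  }

  // Factory for the whole set: wraps every entry of instancesDir in an
  // Instance (assumes every entry is an instance directory), sorted by name.
  function list_instances(callback) {
    fs.readdir(instancesDir, function(err, names) {
      if (err) {
        return callback(err);
      }
      callback(null, names.sort().map(function(name) {
        return new Instance(instancesDir, name);
      }));
    });
  }

  return { get_instance: get_instance, list_instances: list_instances };
}
```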

Factory Factories

I’ve spent the last week or so reworking our test framework in cast and updating everything to work with Node 0.4.0, but when I get the chance one thing I may try is factoring out all of the common code into a single module. This would include a single class that all such resources could inherit from, which would likely include an ‘exists’ method to test for existence and (yeah, go ahead and hate me, I’m going to say it anyway) “factory factories” to generate factory methods for retrieving and listing them.
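A minimal sketch of such a base class, assuming the only state a resource needs is the path of its directory. `DirResource`, its `exists()` method and `FooResource` are hypothetical names; the inheritance uses Node’s util.inherits(), which fits the pre-ES6 ‘class’ pattern used throughout this post.

```javascript
var fs = require('fs');
var util = require('util');

// Hypothetical shared base "class" for anything backed by a directory.
function DirResource(dir) {
  this.dir = dir;
}

// Calls back with (null, true) only if this.dir exists and is a directory.
DirResource.prototype.exists = function(callback) {
  fs.stat(this.dir, function(err, stats) {
    if (err) {
      // A missing directory is an answer, not an error; other errors are passed on.
      return err.code === 'ENOENT' ? callback(null, false) : callback(err);
    }
    callback(null, stats.isDirectory());
  });
};

// Hypothetical concrete resource that inherits exists() from DirResource.
function FooResource(dir) {
  DirResource.call(this, dir);
}
util.inherits(FooResource, DirResource);
```

Treating ENOENT as `false` rather than as an error keeps the common “is it there?” question out of the error path, while permission problems and the like still surface as errors.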

While “factory factories” are commonly cited as a great reason to hate Java, in my opinion JavaScript’s ‘functions as first-class objects’ model makes them quite pretty. In this case they could make generating the factory methods for a directory-backed resource FooResource look something like this:

...
exports.get_fooresource = dir_resources.get(FooResource);
exports.list_fooresources = dir_resources.get_list(FooResource);
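Here is one way the two factory factories might be written, as a sketch under the assumption that each resource’s constructor takes the path of its directory and that a listing is simply every entry of a parent directory. `dir_resources`, `get` and `get_list` follow the hypothetical names used above; `FooResource`, `get_fooresource` and `list_fooresources` are placeholders. In a real module the generated functions would be assigned to `exports` rather than to local variables.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical directory-backed resource: the constructor only records the path.
function FooResource(dir) {
  this.dir = dir;
}

var dir_resources = {
  // Factory factory: returns a get_* factory for Ctor. The generated factory
  // calls back with a new Ctor(dir) only if dir exists and is a directory.
  get: function(Ctor) {
    return function(dir, callback) {
      fs.stat(dir, function(err, stats) {
        if (err) {
          return callback(err);
        }
        if (!stats.isDirectory()) {
          return callback(new Error(dir + ' is not a directory'));
        }
        callback(null, new Ctor(dir));
      });
    };
  },

  // Factory factory: returns a list_* factory that wraps every entry of
  // parent in a Ctor (assumes each entry is a resource directory), sorted by name.
  get_list: function(Ctor) {
    return function(parent, callback) {
      fs.readdir(parent, function(err, names) {
        if (err) {
          return callback(err);
        }
        callback(null, names.sort().map(function(name) {
          return new Ctor(path.join(parent, name));
        }));
      });
    };
  }
};

// The generated factories for FooResource.
var get_fooresource = dir_resources.get(FooResource);
var list_fooresources = dir_resources.get_list(FooResource);
```

Each new resource type then costs two lines of wiring instead of two hand-written factories.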