Examining the Scriptaculous Unit Testing Implementation

5: toQueryParams: an Example of Prototype Extension

In the previous page, I remarked on the curious string of symbols in the line of code that appears in both of Test.Unit.Runner's "query" functions: window.location.search.parseQuery(). Now I'm going to make sense of the code, which will lead me deep into the workings and even the philosophy of prototype.js.

I know the global object window has a property called location: it corresponds to the address shown in the browser's location bar. When I open string_test.html in my browser, the location should be something like file:///C:/here/be/dragons/scriptaculous-js-1.7.0/test/unit/string_test.html

Now, the next symbol is .search. Does the location have a search member? Yes, because window.location is not a simple string, but an object. The search member holds the query portion of the URL: the '?' and everything after it, up to any '#' fragment. In my case, there is no query string at all, so search is an empty string. But it looks as though we could control the values returned by the two functions by specifying certain URL parameters.
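To see what search holds, here is a small sketch that uses the standard URL constructor as a stand-in for window.location, so that it also runs outside a browser. The address and the parameter names are made up for illustration.

```javascript
// Hypothetical addresses; a URL object exposes a search property
// just as window.location does.
var withQuery = new URL('http://example.com/test/unit/string_test.html?foo=bar&baz=1');
var withoutQuery = new URL('http://example.com/test/unit/string_test.html');

console.log(withQuery.search);    // "?foo=bar&baz=1" (the leading "?" is included)
console.log(withoutQuery.search); // "" (empty string when there is no query)
```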

window.location.search returns a string. So what is parseQuery -- a string method, or maybe something inherited from object? No: parseQuery is an addition to the String prototype that's made in prototype.js. In fact, it's an alias for an addition. Here's the code:

  Object.extend(String.prototype, {
    ...
    toQueryParams: function(separator) {
      var match = this.strip().match(/([^?#]*)(#.*)?$/);
      if (!match) return {};
  
      return match[1].split(separator || '&').inject({}, function(hash, pair) {
        if ((pair = pair.split('='))[0]) {
          var name = decodeURIComponent(pair[0]);
          var value = pair[1] ? decodeURIComponent(pair[1]) : undefined;
  
          if (hash[name] !== undefined) {
            if (hash[name].constructor != Array)
              hash[name] = [hash[name]];
            if (value) hash[name].push(value);
          }
          else hash[name] = value;
        }
        return hash;
      });
    },
    ...
  });
  ...
  String.prototype.parseQuery = String.prototype.toQueryParams;

The first two lines of the toQueryParams function check to see whether the string matches a pattern for URL query information. In line one, the keyword this refers to the string object to which the toQueryParams function now "belongs" through inclusion in String's prototype. Unless the string matches the pattern, this function will just return an empty hash/object.
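The general technique can be sketched without prototype.js at all. What follows is a minimal illustration, not the library's code: extendSketch, shoutSketch, and yellSketch are hypothetical names chosen for this example.

```javascript
// A minimal extend helper written for illustration: copy every
// property of source onto destination.
function extendSketch(destination, source) {
  for (var property in source) {
    destination[property] = source[property];
  }
  return destination;
}

// Add a hypothetical method to every string.
extendSketch(String.prototype, {
  shoutSketch: function () {
    // Inside the method, "this" is the string the method was called on.
    return this.toUpperCase() + '!';
  }
});

// An alias is just a second property pointing at the same function.
String.prototype.yellSketch = String.prototype.shoutSketch;

console.log('harvey'.shoutSketch()); // "HARVEY!"
console.log('harvey'.yellSketch());  // "HARVEY!"
```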

Assuming we get past the regex test, we next come right upon something we were not expecting: a return statement. There are a dozen lines left in this function, but we're already returning a value? This is confusing to read. It's an example of the chaining of function calls and inline-function definitions that are becoming more common in contemporary JavaScript code. I've even started writing this way at times too, because it seems the most expressive and least awkward way to do some things. But I won't deny that it is hard to read. The density of those powerful dots, parens, and brackets is off-putting, and it's still jarring to see "function" near the right end of a long line instead of at the left end where it's always belonged before.

Here's what's going on. In the first lines, we applied a regular expression against the value of the String object. That call to match returned an array, which we stored in the variable match: match[0] holds the whole matched text, match[1] holds the first captured group (the query text, without any leading '?' or trailing '#' fragment), and match[2] holds the fragment, if there is one. The rest of the toQueryParams function concerns itself with the second element, match[1]. Specifically, it splits this string, either by the argument separator, if it has been supplied, or by an ampersand. That 'either or' functionality is effected by the || operator. As described before, if the first operand of an or expression is truthy, that is the value of the whole expression. Otherwise, the value of the second operand is returned. Here the second operand is the string '&', which is what we get whenever separator is omitted.
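A quick sketch makes the pieces concrete. The sample query string and the splitQuery helper are invented for illustration; only the regular expression and the separator || '&' idiom come from the code above.

```javascript
var pattern = /([^?#]*)(#.*)?$/;
var match = '?a=1&b=2#top'.match(pattern);

console.log(match[0]); // "a=1&b=2#top" (the whole matched text)
console.log(match[1]); // "a=1&b=2"     (first group: the query text)
console.log(match[2]); // "#top"        (second group: the fragment)

// splitQuery is a hypothetical helper that isolates the "separator || '&'" idiom.
function splitQuery(text, separator) {
  return text.split(separator || '&');
}

console.log(splitQuery(match[1]));       // ["a=1", "b=2"]
console.log(splitQuery('a=1;b=2', ';')); // ["a=1", "b=2"]
```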

The split function returns an array. Chained to the call to split is a call to the function inject. By chained, I mean that the dot operator between the split call and the inject call indicates that inject will be interpreted as a method of the object returned by split. What is inject? It is a new function for arrays and hashes, provided by prototype.js, and it's quite sophisticated. Here's the call to inject, in a nutshell.

inject({}, function(hash, pair) {
  ...
})

You can find some documentation on inject, and the rest of the prototype.js API, at http://www.gotapi.com/prototypejs. I also recommend Scott Raymond's Prototype Quick Reference.

The first thing you should know about the inject function is that it is going to act over the entire array on which it is called, element by element. In other words, if my_array has ten members, inject is going to act ten times. prototype.js includes several of these iterative functions for arrays and hashes.

How does inject behave? The first thing it does, before any iteration takes place, is set up an initial value. You pass it that value in the first argument you pass to inject. In this case, it's an empty hash. The second argument you pass is a function. This function is going to execute once for each member of the array. Here is the function passed in our example:

function(hash, pair) {
  if ((pair = pair.split('='))[0]) {
    var name = decodeURIComponent(pair[0]);
    var value = pair[1] ? decodeURIComponent(pair[1]) : undefined;

    if (hash[name] !== undefined) {
      if (hash[name].constructor != Array)
        hash[name] = [hash[name]];
      if (value) hash[name].push(value);
    }
    else hash[name] = value;
  }
  return hash;
}

This function is by convention called an iterator. inject expects the iterator function to have a certain signature: it has to accept two or three arguments. The first is for an object called an accumulator. Each time the iterator function is called, inject passes the accumulator as the first argument. The first time the iterator is called, the initial value is passed as the accumulator. It is up to you, the coder, to make sure that each iteration has the potential to alter the accumulator object as appropriate and that the accumulator is always returned at the end of the iteration. The inject function, at the end of each execution of the iterator, needs to receive the returned object so that it can pass it to the next execution of the iterator. In this way, you have what feels a bit like a static variable across each iteration. Actually, though, what's going on is more like recursion with a collector function that maintains the value(s) you are building.
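The hand-off of the accumulator is easier to see in a stripped-down version. This is a sketch of the idea only: injectSketch is a hypothetical standalone function written for this page, not the code in prototype.js.

```javascript
// injectSketch is a hypothetical stand-in that mimics the behavior described above.
function injectSketch(array, accumulator, iterator) {
  for (var index = 0; index < array.length; index++) {
    // Whatever the iterator returns becomes the accumulator for the next pass.
    accumulator = iterator(accumulator, array[index], index);
  }
  return accumulator;
}

var total = injectSketch([1, 2, 3, 4], 0, function (sum, n) {
  return sum + n;
});
console.log(total); // 10

// The built-in Array.prototype.reduce follows the same idea, with the
// arguments in a different order: iterator first, initial value second.
var reduced = [1, 2, 3, 4].reduce(function (sum, n) { return sum + n; }, 0);
console.log(reduced); // 10
```

If the iterator forgets its return statement, the accumulator becomes undefined on the next pass, which is why the final return hash in the real iterator matters so much.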

The second argument passed to the iterator function is called the value. It is going to receive, from inject, a representation of the element of the array on which the iterator is to act. Remember, in toQueryParams, the inject function is called on an array of strings. So the value of each element of the array is a string. The iterator receives that string in its parameter pair.

Now, what's going on inside the iterator? First, we attempt to split the pair string by an equal sign. There's some very C-like code in that test statement: (pair = pair.split('='))[0]. The pair variable is now assigned to the array that results from splitting pair the string at an equal sign. And then the [0] notation returns the first element of the resulting new array. If that element is truthy, meaning it is a non-empty string, then we continue. Otherwise, we don't do anything except return the hash variable, which is our accumulator, back to inject without any modification.
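Here is that test expression in isolation, as a small sketch. The firstPiece wrapper and the sample strings are hypothetical, added only so the expression can be run on its own.

```javascript
// firstPiece is a hypothetical helper wrapping the expression from the iterator.
function firstPiece(pair) {
  // Assign the split result back to pair, then read element 0 of that array.
  return (pair = pair.split('='))[0];
}

console.log(firstPiece('bunny=Harvey')); // "bunny" (truthy: processing continues)
console.log(firstPiece('=Harvey'));      // ""      (falsy: no name, so the pair is skipped)
console.log(firstPiece(''));             // ""      (falsy: an empty segment, as in "a=1&&b=2")
```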

Assuming pair the string split successfully, pair the array is ready for use. We assign some well-named variables, name and value, with the name and value segments of the pair array. Well, we do some stuff to the values first--we attempt to decode them from URI encoding--but let's skip over that.
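For the curious, the decoding step is quick to try out. decodeURIComponent is a standard built-in function; the sample strings here are made up.

```javascript
console.log(decodeURIComponent('Peter%20Rabbit')); // "Peter Rabbit"
console.log(decodeURIComponent('a%26b'));          // "a&b"
// A "+" is left alone; decodeURIComponent does not turn it into a space.
console.log(decodeURIComponent('Peter+Rabbit'));   // "Peter+Rabbit"

// With no "=value" part, pair[1] is undefined, and so is the value.
var pair = 'bunny'.split('=');
var value = pair[1] ? decodeURIComponent(pair[1]) : undefined;
console.log(value); // undefined
```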

Here's the next section of code:

if (hash[name] !== undefined) {
  if (hash[name].constructor != Array)
    hash[name] = [hash[name]];
  if (value) hash[name].push(value);
}
else hash[name] = value;

So the first line checks to see whether there is a record in the hash variable, the object that is serving as our accumulator, with a key that matches the value of the string variable name. If there isn't such a record, hash[name] returns undefined. The undefined value is an odd beast. It isn't a boolean false, and it isn't the null value, although it can behave like either in certain contexts. The first line tests for it with the strict operator !==, not !=. If you use !=, then undefined and null are treated as equal, so a record holding null would look just like a missing one. (Yikes, I am going to need to review some of my code now.)
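The difference between the two operators shows up as soon as a null is involved. A minimal sketch, with invented keys:

```javascript
var hash = { empty: null };

console.log(hash.missing === undefined); // true: there is no such key
console.log(hash.empty != undefined);    // false: != treats null and undefined as equal
console.log(hash.empty !== undefined);   // true: !== tells them apart
```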

If there is a record for name in hash, the second line of code above checks to see whether the value hash[name] is an array. Note how it does this--by checking whether the constructor for hash[name] is Array. What is Array? It is not a class type. It is a function, the constructor function used to make instances of Array. What would we have gotten using the typeof operator on an array? That would have returned 'object'. Checking the constructor is a better way to determine whether an object is of a specific pseudoclass.
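The checks described above can be compared side by side. This is a small sketch with made-up values; Array.isArray is included for reference only, since it is a later addition to the language (ECMAScript 5) and does not appear in the code under discussion.

```javascript
var list = ['Harvey'];
var single = 'Harvey';

console.log(typeof list);                  // "object"
console.log(list.constructor === Array);   // true
console.log(single.constructor === Array); // false (its constructor is String)

// Later addition to the language, shown for comparison.
console.log(Array.isArray(list));          // true
```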

What does line three do? Executing when hash[name] is not an array, it reassigns hash[name] to an array with one value, the previous value of hash[name]. So, if hash['bunny'] were equal to 'Harvey', hash['bunny'] is now equal to ['Harvey'].

Line four pushes the new value onto the array hash[name], provided the value is truthy (an empty or missing value is skipped). So if value were 'Peter' at this point, hash['bunny'] now equals ['Harvey', 'Peter'].

How about that push method for our array? If you have experience programming JavaScript, you may remember this: my_array[my_array.length] = "foo". That was always kind of awkward. There's something much more satisfying about my_array.push("foo").

Finally, if there were no matching record already in hash, we add such a record. Note that the value of the record is only a string, not an array. If there proves to be another value with this key to add later, then we'll convert this value into an array.

This brings us to the final line of the iterator function: return hash;

This returns the hash variable, our accumulator object, back to the inject function. This closes the circle, if you like, handing the accumulator object over to be used again in the next iteration. What if there is no further iteration? The inject function returns the accumulator object, and that in turn is returned, via that return statement that perplexed us earlier, as the value of toQueryParams().
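To watch the whole thing run from start to finish, here is a standalone sketch of the same logic. It is an approximation under stated assumptions, not the library's code: parseQuerySketch is a hypothetical name, the built-in reduce stands in for inject, and the built-in trim stands in for strip.

```javascript
// parseQuerySketch is a hypothetical standalone version of the logic walked through above.
function parseQuerySketch(query, separator) {
  var match = query.trim().match(/([^?#]*)(#.*)?$/);
  if (!match) return {};

  // reduce takes the iterator first and the initial accumulator second.
  return match[1].split(separator || '&').reduce(function (hash, pair) {
    if ((pair = pair.split('='))[0]) {
      var name = decodeURIComponent(pair[0]);
      var value = pair[1] ? decodeURIComponent(pair[1]) : undefined;

      if (hash[name] !== undefined) {
        // Second sighting of this key: promote the stored string to an array.
        if (hash[name].constructor != Array) hash[name] = [hash[name]];
        if (value) hash[name].push(value);
      } else {
        hash[name] = value;
      }
    }
    return hash; // hand the accumulator back for the next pass
  }, {});
}

var result = parseQuerySketch('?bunny=Harvey&bunny=Peter&color=white#top');
console.log(result.bunny); // ["Harvey", "Peter"]
console.log(result.color); // "white"
```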

Whew.

BTW, the third, optional, argument of the iterator function, if it were present, would receive the index of the member. In this case, in toQueryParams, the index didn't matter, so it was omitted.
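As an illustration of an iterator that does use the index, here is a sketch with the built-in reduce, which also hands its callback the index as a third argument. The sample array and the labeled variable are invented for the example.

```javascript
// The third parameter receives the position of the current element.
var labeled = ['a', 'b', 'c'].reduce(function (acc, value, index) {
  acc.push(index + ':' + value);
  return acc;
}, []);

console.log(labeled); // ["0:a", "1:b", "2:c"]
```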

Why was I looking so closely at the extended String method toQueryParams? So that I would know what the methods parseResultsURLQueryParameter and parseTestsQueryParameter of Test.Unit.Runner do.

So let's get back to those functions.