JavaScript Objects and Arrays are Broken

JavaScript and TypeScript share a secret — their native Object and Array data structures, used the world over to represent key/value data, don’t work exactly as you’d expect. To understand why, and what sorts of subtle, maddening bugs you risk inflicting on your future self when using them dangerously, behold this simple example (even better — try running it yourself in the devtools of this browser tab):

const foo = { bar: "baz" };
OUTPUT >>> undefined

if (foo["buzz"]) {
  console.log("This shoudn't print; `foo` has no value under `buzz`.");
}
// no output, so far so good...

const userProvidedKey = "constructor";
OUTPUT >>> undefined

if (foo[userProvidedKey]) {
  console.log("There's more to `foo` than meets the eye...");
}
OUTPUT >>> "There's more to `foo` than meets the eye..."
// output? why does `foo` have something other than `baz`?

if (foo["toString"]) {
  console.log("Objects, prototypes, constructors, oh my!");
}
OUTPUT >>> "Objects, prototypes, constructors, oh my!"
// oh no, it seems to happen for some indeterminate set of keys...

Why did our if statements think that foo contained these extra keys, when it was defined as simply { bar: "baz" }? The answer lies in JavaScript’s "prototypical" inheritance system (not to say that JavaScript inheritance is in any way normal).

You may have heard: everything in JavaScript is an object. This sounds beautifully simple! It means everything you’re working with shares fundamental similarities — for instance, everything has a constructor method that’s called when it’s initialized; everything has a toString method that allows it to be easily converted to some string representation; and notably, everything has a prototype, where all methods such as these are stored. Really, these prototypes form chains, shaping hierarchies that give form to "inheritance", more or less. This is where JavaScript objects get their toString method along with everything else — it’s defined near the very top of their prototype chain, in a grab-bag of methods shared by just about everything under the sun.
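
You can watch this chain at work for yourself in the devtools, reusing the `foo` object from the example above:

Object.getPrototypeOf(foo) === Object.prototype;
OUTPUT >>> true

Object.prototype.hasOwnProperty("toString");
OUTPUT >>> true    // `toString` lives on the prototype, not on `foo` itself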

What’s important here is that everything has a "toString" method. That method is accessible in two different ways:

foo.toString
OUTPUT >>> ƒ toString() { [native code] }

foo["toString"]
OUTPUT >>> ƒ toString() { [native code] }

foo.toString()
OUTPUT >>> "[object Object]"

foo["toString"]()
OUTPUT >>> "[object Object]"

The former seems perfectly ordinary; the latter, not so much. JavaScript was apocryphally written in 10 days, and on such a short timescale, axioms like "everything is an object" and "every attribute of an object is accessible via both dot notation and square-bracket notation" start to sound very appealing: there’s an elegant or extreme form of simplicity there, depending on your predilections. Either way, the design came with a bitter side effect when it comes to sanely using JavaScript objects as key/value maps.

const fooArr = [1, 2, 3];
const userProvidedKey = "constructor";
const hasIndex = Boolean(fooArr[userProvidedKey]);
console.log(hasIndex);
OUTPUT >>> true

// yes, arrays have the same quirkiness

When might this become a problem? “I’d never use toString or constructor as an object key when writing a real program,” you might say to yourself.

It’s true, you might not. But your users have other ideas. In short, if you ever use user-provided data as keys on JavaScript (or TypeScript) objects or arrays, you’re exposing yourself to future bugs of this sort. The practice isn’t all that common, but it’s often hard to spot in the midst of a complex application; it’s not always clear where the data behind a given variable came from. Avoiding it takes developer discipline, and as teams grow and developers get busy, discipline becomes more lofty ambition than general practice.
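
To make this concrete, consider a hypothetical word counter of the kind that appears in countless real codebases, tallying words that arrive straight from user input:

const counts = {};
const userWords = ["hello", "constructor"]; // imagine these came from a user
for (const word of userWords) {
  counts[word] = (counts[word] || 0) + 1;
}
console.log(counts["constructor"]);
OUTPUT >>> "function Object() { [native code] }1"

// the inherited `constructor` function was truthy, so instead of starting
// a fresh count at 1, we coerced a function to a string and appended "1"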

So what can we do? There are a few interesting options, each with its own trade-offs and quirks. Let’s briefly compare them.

The Map Approach

One school of thought tells us that when trying to safely store and access key/value data, we should just use the tool designed for the job. This would be JavaScript’s built-in Map object, available in all major browsers since ES6 (aka ES2015).

const fooMap = new Map(Object.entries({ bar: "baz" }));
fooMap.get("bar");
OUTPUT >>> "baz"

fooMap.get("toString");
OUTPUT >>> undefined    // hooray! actually the behavior we want

This works perfectly — code that safely accesses bar could never unexpectedly access toString by mistake, thanks to Map’s get-based key access.

There are some glaring downsides though. For one, it’s awkward as a giraffe in a car wash. The Object.entries boilerplate is unpleasant, and the Map interface generally doesn’t lend itself well to declarative or immutable code patterns. How would one merge two Maps, for instance?

const fooMap = new Map(Object.entries({ bar: "baz" }));
const barMap = new Map(Object.entries({ bizz: "buzz" }));
const mondoMap = new Map([...fooMap.entries(), ...barMap.entries()]);

Ouch. Not exactly elegant, especially sitting next to ES2018’s object spread syntax:

const foo = { bar: "baz" };
const bar = { bizz: "buzz" };
const mondo = { ...foo, ...bar };

"Plain Old JavaScript Objects" (POJOs, if you like weird acronyms) have especially come into favor lately as "reactive" UI frameworks (Vue, React, et. al.) have become the ligua franca of modern web development, completely eclipsing their MVC bretheren of yore. These frameworks rely on efficient tracking of state changes to determine exactly what parts of UI need to be re-rendered, exactly when.

Unfortunately, Map does not lend itself well to this. The fundamental issue is that changes to Maps happen mutably: you must "mutate" the data structure in order to change it, quietly altering a value deep within it. The alternative would be to construct a completely new version of your data structure with the desired key/value set, and swap the old map out for the new one. This sounds roundabout, but bear with me for a moment and consider this example:

const myMap = new Map(Object.entries({ a: "b", c: "d" }));
// ... some code later ...
myMap.set("c", "z"); // "mutate" myMap
console.log(myMap.get("c"));
OUTPUT >>> "z"

let myObj = { a: "b", c: "d" };
// ... some code later ...
myObj = { ...myObj, c: "z" }; // update myObj "immutably"
console.log(myObj.c);
OUTPUT >>> "z"

This may not persuade on its own, but at least the code styles are roughly similar. You might think that the second example, with its superfluous copying of data, would be horribly inefficient compared to the first; in reality, running on modern JavaScript engines like V8, the performance differences will be negligible until your data becomes massive. There are many tools built by incredibly clever people that make "immutable" data management efficient even in those cases; I’ll leave that exploration to the more-brilliant minds that have produced plenty of quality material on the subject.

When it comes to "reactive" web UI development, the bottleneck is not object copying; it’s DOM manipulation. What becomes important is identifying exactly what has changed, not reducing the number of operations required to make those changes. V8 eats objects and Maps for breakfast.

The problem, then, is that Maps provide no way of efficiently indicating that they have changed, in order to trigger a fine-grained re-render of the relevant DOM. This is especially true when dealing with complex state objects built of many layers. Consider this sort of check for whether a piece of state has changed:

if (newState.pethaus.dogs !== app.state.pethaus.dogs) {
  app.render(newState);
}

The !== comparison operator compares object references in JavaScript and TypeScript, meaning it evaluates to false only when both sides refer to the very same object instance, and true for any two distinct instances. It’s an extremely fast check, operating with O(1) complexity: constant time, even as the compared objects grow in size. In our "immutable" object update example above, this sort of efficient reference check works great. This is exactly why so many libraries in the vein of Redux have cropped up in recent years, libraries that actively discourage state mutation and rely on a set of immutable operations ("reducers", in Redux nomenclature). Immutable state management lends itself extremely well to the problem of keeping UIs up-to-date alongside complex state, and standard JavaScript objects paired with ES2018’s spread syntax lend themselves extremely well to the task of manipulating state in immutable fashion.

In contrast, Maps do not work well for this task. If app.state.pethaus.dogs were a Map, how would we tell whether it had changed? Since Map.set mutates the Map’s internals in a way that is invisible from the outside, newState.pethaus.dogs !== app.state.pethaus.dogs would likely compare the same Map instance to itself, uselessly. Or worse, we may hold a copy of our original map that may or may not have changed, and the only way to tell would be an expensive enumeration and comparison of internal key/value pairs. If you’re dealing with nested state comprised of many layers of data, these would need to be recursively crawled and compared as well; you can only be certain that two maps are actually identical once every piece of data within them has been examined. (You can exit early once you find a differing datum, but on average you will have traversed half the map by that point, which might as well be all of it when talking about efficiency relative to the size of your state data.)
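
A small sketch makes the contrast plain:

let state = { dogs: ["rex"] };
const prev = state;
state = { ...state, dogs: [...state.dogs, "fido"] }; // update "immutably"
state !== prev;
OUTPUT >>> true    // changed, and we found out in O(1)

const dogMap = new Map([["dogs", ["rex"]]]);
const prevMap = dogMap;
dogMap.set("dogs", ["rex", "fido"]); // mutate in place
dogMap !== prevMap;
OUTPUT >>> false   // same reference; the change is invisible to this check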

Overall, this is an extremely common set of problems in modern UI development of all kinds, since even outside the web arena most frameworks seem to be evolving toward the "declarative" approach popularized by React (the value of which has become extremely clear in recent years; that’s a blog post for the masochistic historians of JS framework yore). Given their reliance on immutable state management, declarative apps will never be able to effectively embrace a mutation-centric data structure like Map. For this reason alone, the idea of consistently using Maps for all dynamic-key data storage seems like a non-starter in most modern JavaScript and TypeScript contexts.

The Object.create(null) Approach

It’s been suggested from the highest echelons of Stack Overflow that Object.create(null) might be a robust solution to our problem. Let’s give it a try:

const foo = Object.create(null);
foo.bar = "baz";
const userProvidedKey = "constructor";
if (foo[userProvidedKey]) {
  console.log("There's more to `foo` than meets the eye...");
}
OUTPUT >>> undefined

Lovely. Object.create(null) effectively creates a "prototypeless" object, meaning that none of the methods (toString, constructor) that come along with the standard JavaScript object prototype are present on our new object. The prototype chain itself is simply gone, solving both of the original example cases in which our object incorrectly reported a key/value pair as present. Blind trust in strange programmers online should tell us that any other problematic cases are resolved as well.
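
You can verify the missing prototype directly:

Object.getPrototypeOf(foo);
OUTPUT >>> null

foo.toString;
OUTPUT >>> undefined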

Unfortunately, this seems to me a worse approach than Map, with all the same downsides and even less clarity. You may have noticed that foo.bar = "baz"; is the simplest form of mutation you can possibly make; this in itself disqualifies Object.create(null) for use as state data in modern UI applications. Worse, errant developers may forget to use Object.create(null) in dark corners of your codebase, and there will be absolutely no way to tell within the other code contexts where those objects are accessed. Contrast this with Map, where thing1.get(dynamicKey); thing2["constKey"]; clearly demarcates thing1 as a Map and thing2 as a simple object. You have no such luxury with Object.create(null); even TypeScript can’t help, as it cannot distinguish between "prototypeless" objects and normal (broken) ones.
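
To see just how invisible the difference is at the point of access:

const safe = Object.create(null);
safe.bar = "baz";
const unsafe = { bar: "baz" };

safe["bar"];
OUTPUT >>> "baz"
unsafe["bar"];
OUTPUT >>> "baz"    // identical at every access site...

unsafe["constructor"];
OUTPUT >>> ƒ Object() { [native code] }    // ...yet only one of them is safe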

This is all a recipe for codebases that develop extremely subtle bugs over time, as developers are asked to adhere to code conventions whose intentions become murkier with each passing commit; with each addition of a new team member and departure of a more experienced one. As a general rule, safety should never be enforced through convention — there must be a better way.

The Safekey Approach

What we need is static analysis. We need a mechanical rather than social method of ensuring that no unsafe object-access code makes it into our codebase. As it turns out, this is quite possible using a custom rule in ESLint, the standard code-quality analyzer for JavaScript (and TypeScript). There is a trade-off: some brevity of code is lost when accessing objects with dynamic keys. But apart from this minor sacrifice of convenience, we can have our cake and eat it too: we get to use standard JavaScript objects with all the spread syntax we want, and we get to know concretely that there will never be instances of unsafe access covertly appearing in our codebase.
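
Wiring such a rule into a project might look something like this (note: the plugin and rule names below are hypothetical stand-ins; consult the safekey documentation for the real ones):

// .eslintrc.js: a hypothetical configuration; the actual plugin
// and rule names may differ
module.exports = {
  plugins: ["safekey"],
  rules: {
    "safekey/no-unsafe-dynamic-key-access": "error",
  },
};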

Instead, in dynamic-key-access cases we’ll use a "known-safe" form of getter that incorporates JavaScript’s hasOwnProperty to ensure no false-positive keys are retrieved. The end result would look something like this, in practice:

const foo = { bar: "baz" };
OUTPUT >>> undefined

foo["bar"]; // this is OK per our eslint rule; key is constant
OUTPUT >>> "baz"

const userProvidedKey = "constructor";
foo[userProvidedKey]; // this is NOT OK per our eslint rule; key is "dynamic"
OUTPUT >>> ƒ Object() { [native code] }

// so instead, we'll use this equivalent-but-safe form:

import { get } from "safekey";
get(foo, "bar");
OUTPUT >>> "baz"
get(foo, userProvidedKey);
OUTPUT >>> undefined  // hooray, this is all we wanted in the first place
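
For the curious, the heart of such a getter fits in a few lines. Here’s a minimal sketch (the real safekey implementation may differ in its details):

function get(obj, key) {
  // consult only the object's own keys, never its prototype chain;
  // going through Object.prototype.hasOwnProperty also keeps this working
  // for prototypeless objects created with Object.create(null)
  return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
}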

The folks at e2 have codified this approach into a tiny library called safekey; if you’re interested in these issues with JavaScript and exploring mitigation techniques, give it a look. I’ll leave the details to the source — nothing speaks more clearly than a good README over a cup of ☕