High-performance array transformations

This simple technique lets you compose array `map`s and `filter`s in a nice and performant way.

Hajime Yamasaki Vukelic
codeburst


Photo by Toa Heftiba on Unsplash

NOTE: After I initially published this article, some readers spotted obvious and blatant errors in the code, both in the article and in the GitHub repo linked at the end. I've since corrected the bugs and bad assumptions in my original post.

I apologize to all the readers who clapped and applauded this post, as well as to the Codeburst editors, for providing completely wrong information. Also, big thanks to the commenters who spotted the errors and/or suggested improvements.

The cover picture is fitting, as we are going to talk about a healthy diet of array transformation methods, and how to best digest large arrays. My original intent was to show how this can be done without using any imperative code, but I have been proven wrong. In this article, you will see all the mistakes I made when publishing the original article, the fixes and improvements, and the lessons learned.

As you get into the more advanced aspects of working with JavaScript arrays, you inevitably meet the map(), filter(), and reduce() trio, the bread and butter of array transformations. Chaining method calls is one of the most idiomatic patterns in JavaScript, and these three array methods probably drive that point home.

But you’ve got to be careful, because there are (not so) hidden costs.

The setup

First we will create a small counter thingy. This is what we’ll use to fill a really big array.

const counter = {
  next: 0,
  step() {
    return ++this.next;
  },
  reset() {
    this.next = 0;
  }
};

Every time step() is called, the counter increments its value and returns it, starting from 1.
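To make the behavior concrete, here is what a few calls look like:

counter.step();  // => 1
counter.step();  // => 2
counter.reset();
counter.step();  // => 1 again after a reset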

Let’s start with a really big array. Say 100K items.

const bigArray = new Array(100000).fill(0);

We want to fill this array with numbers from the counter. We then want to halve each number in the array, and then filter only integer results.

Throughout this article, we’ll use three functions that look like this:

const getNum = () => counter.step();
const halve = x => x / 2;
const isInt = x => Math.floor(x) === x;

The naive way

The naive way to do this is like so:

bigArray
  .map(getNum)
  .map(halve)
  .filter(isInt);

First of all, this ‘naive’ approach comes about because we like to think in terms of the problem.

  • First, we fill each slot with a number from the counter.
  • Second, we halve each number.
  • Third, we filter out the results that are not whole numbers.

Unfortunately, if we literally translate this into our code, we have to iterate over the array exactly 3 times, which is very expensive on an array of this size.
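If you want to measure this yourself, a quick-and-dirty sketch with console.time() is enough. The exact numbers will vary wildly between engines and machines, so treat them as relative, not absolute:

// Rough timing sketch; reset the counter between runs so every
// variant processes the same sequence of numbers.
counter.reset();
console.time('naive');
bigArray
  .map(getNum)
  .map(halve)
  .filter(isInt);
console.timeEnd('naive');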

The fix

To fix this, we can use reduce() to iterate the array just once and perform all three operations in one go.

bigArray.reduce(newArr => {
  const x = getNum();
  const hlv = halve(x);
  if (isInt(hlv)) {
    newArr.push(hlv);
  }
  return newArr;
}, []);

This solution is about 10x faster than the naive approach.

It’s starting to look a bit messy, though. It merges the three separate well-defined steps into an Italian dish we all love as long as it isn’t used to describe our code — and it’s not lasagna. It may not look so bad in this particular example, but imagine a more complex scenario with more transformations or more involved ones.

Can we keep the nice structure from the first example, but still do it in one go?

A better looking fix?

The first fix is actually technically correct. reduce() is the way to reduce the number of iterations. That’s not why it’s called ‘reduce’, but you can think of it that way if it helps.

Although in this simple example, the imperative code inside the reduce callback may not be a big deal, imagine a much more complex scenario. We may want to clean things up a bit.

We could try to reuse the code from the naive example inside the reduce() callback.

bigArray.reduce((newArr, member) => {
  return newArr.concat(
    [member]
      .map(getNum)
      .map(halve)
      .filter(isInt)
  );
}, []);

If you remember how reduce() works, its callback function receives the accumulator object newArr, and the array member we are currently operating on, and it’s expected to return a new accumulator object.

Since member is not an array (well, it could be, but here it’s just a member of the array we are interested in), we can’t map() or filter() over it… unless we wrap it in an array. Since [member] is just a one-element array, our method calls work fine.

The real magic happens in concat(): when filter() rejects a value, it returns an empty array, and concatenating an empty array adds nothing, so the value is effectively ‘filtered out’. We don’t really need to test whether the return value from filter() is empty or not.
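In other words (just an illustration, not part of the transformation itself):

[1, 2].concat([3]);  // => [1, 2, 3] -- the value survived filter(), so it's kept
[1, 2].concat([]);   // => [1, 2]    -- filter() rejected it, nothing is added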

The result is still not as nice as it could be, though, so let’s clean it up a bit:

const transform = (arr, transformations) =>
  arr.reduce(
    (newArr, member) => newArr.concat(transformations([member])),
    []
  );

transform(bigArray, arr =>
  arr
    .map(getNum)
    .map(halve)
    .filter(isInt)
);

Now it’s as clean as it was originally, with three well-defined steps. But is it faster?

Turns out it’s actually quite a bit slower. Some 200x slower in fact.

Even though this solution performs a single iteration, it does a few things that are very expensive. The main offender, though, is that it creates a new array on every iteration (because .concat() creates a new array).

The reason for the slowdown is the garbage this creates. Once the disposable arrays become very large, the garbage collector has to work a lot harder. The arrays that we no longer need have to be garbage-collected at some point, which can add a lot of time.

This can be confirmed by using a profiler.

Profile of the transformation using .concat()
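If you’d rather not fire up the DevTools profiler, a very rough way to eyeball the memory pressure in Node is to compare process.memoryUsage() before and after a run. This is only a sketch: it shows what happens to be retained on the heap at that moment, not the full allocation history, which is exactly what a real profiler gives you.

// Very rough heap check in Node; a real profiler is far more reliable.
const before = process.memoryUsage().heapUsed;

counter.reset();
transform(bigArray, arr =>
  arr
    .map(getNum)
    .map(halve)
    .filter(isInt)
);

const after = process.memoryUsage().heapUsed;
console.log(`heap grew by ~${((after - before) / 1024 / 1024).toFixed(1)} MB`);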

Improving transform

To improve the transform() function, we can get rid of concat(). Using push() instead, we keep mutating the same accumulator array rather than throwing it away on every iteration, which prevents large amounts of garbage from accumulating.

const transform = (arr, transformer) =>
  arr.reduce((newArr, member) => {
    const t = transformer([member]);
    if (t.length) {
      newArr.push(t[0]);
    }
    return newArr;
  }, []);

transform(bigArray, arr =>
  arr
    .map(getNum)
    .map(halve)
    .filter(isInt)
);

This time the results look good. It’s about 2x faster than the naive version, and there is very little garbage along the way. Still, this is 5x slower than the fastest fix thus far, which makes you question the point of insisting on nice-looking code.

As it turns out, the small disposable [member] array does not pose a major ecological problem.

What about generators?

Generators can definitely be used for single-iteration processing over a huge array. You would need to write your own generator-friendly map() and filter(), which is very easy.

Just for kicks, let’s take a look at what generators look like and how they compare to the previously shown code:

const map = function* (arr, fn) {
  for (let member of arr) {
    yield fn(member);
  }
};

const filter = function* (arr, fn) {
  for (let member of arr) {
    if (fn(member)) {
      yield member;
    }
  }
};

const nums = map(bigArray, getNum);
const hlv = map(nums, halve);
[...filter(hlv, isInt)];

If you’re new to generators, here’s how they work. Both the map() and filter() functions will return an iterator object. These don’t do anything until you start requesting values from them. When you request a value (e.g., using for..of, spread syntax, or Array.from()), they only work up until the yield statement, and then they go back to sleep. When the next value is requested, they wake up and do the work until the next yield. Rinse and repeat.
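If that sounds abstract, here’s a tiny stand-alone demonstration of the laziness (just an illustration, not part of the transformation code):

const logged = function* (arr) {
  for (let member of arr) {
    console.log('yielding', member);
    yield member;
  }
};

const it = logged([1, 2, 3]); // nothing is logged yet
it.next();                    // logs "yielding 1", returns { value: 1, done: false }
it.next();                    // logs "yielding 2", returns { value: 2, done: false }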

In theory, if you compose the generators like we did in the example, it should result in a single iteration. The final spread into an array drives the filter iterator. This causes filter to request values from the hlv iterator, which in turn requests the next value from the nums iterator. In other words, none of the three iterators has to wait for the entire array to be iterated; they work through it one member at a time.

Note that using for..of or spread is faster than using Array.from(), but there isn’t much of a difference between for..of and spread. Spread just looks nicer. Also note that this only applies to native implementations. Simulating these things with a polyfill on an unsupported browser may be a lot slower.
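For reference, all three consumption styles drain the same kind of iterator chain. Just remember that an iterator can only be consumed once, so each variant below rebuilds the chain (and resets the counter so the numbers match):

// Each variant rebuilds the iterator chain, because an iterator can
// only be drained once.
counter.reset();
const viaSpread = [...filter(map(map(bigArray, getNum), halve), isInt)];

counter.reset();
const viaForOf = [];
for (const value of filter(map(map(bigArray, getNum), halve), isInt)) {
  viaForOf.push(value);
}

counter.reset();
const viaFrom = Array.from(filter(map(map(bigArray, getNum), halve), isInt));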

The generator-based implementation gives us a 30% improvement over the naive approach, but it is still slower than the previous solutions.

Haven’t you heard of transducers?

Ramda offers a function called transduce() which does something very similar in effect to what we’ve done here. There’s a great article by Jeremy Daly that explains how they work. If you love FP, you should definitely check it out.

As suggested by the transducers article, the code would look like this:

const transform = R.transduce(
  R.compose(
    R.map(getNum),
    R.map(halve),
    R.filter(isInt)
  )
);

transform(R.flip(R.append), [], bigArray);

This solution is some 280x slower than the naive version. Completely unusable. A thing to note here is the usage of R.flip(R.append), which is Ramda’s equivalent of concat().

Based on what we already know about concat() and garbage, we assume that the usage of R.append() is the likely problem area. We can use push() here to try and fix it.

transform((xs, x) => (xs.push(x), xs), [], bigArray);

And sure enough, the push() fix is about 3x faster than the naive approach, and even faster than the solution using reduce() with push().

Or… just use a frigging for loop?

The 2023 update.

After all that’s said and done, ultimately the for loop (and the newer for...of) are still the fastest ways to work with sequences of data. To be honest, after all these years of messing with alternative solutions, I’m starting to realize how silly it is to talk loudly about performance and then not use the fastest methods available. I feel really stupid.

“But but readability!” I hear ya. I used to say the same thing. Thing is, once you’ve used the for loop for a while (see what I did there?) it grows on you, and you can read it blindfolded. It just works, and it works so well, and it’s been around so long that browser vendors have optimized the heck out of it, to the point where they probably couldn’t optimize it any more.

See, the ability to compromise is also the ability to correctly gauge your priorities. Are you after performance, or after some arbitrary aesthetic criteria of yours? What’s more important? If you care about speed, just use the for loop and be happy.
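For reference, here is roughly what the single-pass loop looks like for our example. This is a sketch built on the same getNum/halve/isInt helpers; the version in the repo may differ in details:

// One pass, no intermediate arrays, no callbacks.
counter.reset();
const result = [];
for (let i = 0; i < bigArray.length; i++) {
  const hlv = halve(getNum());
  if (isInt(hlv)) {
    result.push(hlv);
  }
}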

Code is on GitHub

The code shown in the article can be found on GitHub. There’s also a fiddle which you can use to check the behavior in the browsers.
