Working on p-j-s we found some interesting things related to the Function constructor and function inlining in V8.
Context
While comparing the performance of the parallel and non-parallel implementations of map (docs here), we found that our parallel implementation performed much worse than we expected given the size of the TypedArray and the mapper function.
The scenario
An image transformation that applies a sepia tone effect to pictures was used for the comparison (based on this post). The core linear implementation was:
The serial implementation works with a Uint8ClampedArray but accesses three elements at the same time. The parallel implementation uses a Uint32Array and works on one pixel (A,R,G,B) at a time.
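To make "one pixel at a time" concrete, here is a minimal sketch of packing four 8-bit channels into a single Uint32Array element and reading them back. The helper names packPixel and unpackPixel are hypothetical, and the bit layout simply follows the (A,R,G,B) order mentioned above; when a Uint32Array is a view over canvas image data, the actual channel order depends on the platform's endianness.

```javascript
// Hypothetical helper: pack four 8-bit channels into one unsigned 32-bit value.
// Assumed layout: A in the highest byte, then R, G, B.
function packPixel(a, r, g, b) {
  // ">>> 0" converts the signed 32-bit result of the bitwise ops to unsigned.
  return ((a << 24) | (r << 16) | (g << 8) | b) >>> 0;
}

// Hypothetical helper: split a packed pixel back into its channels.
function unpackPixel(pixel) {
  return {
    a: (pixel >>> 24) & 0xff,
    r: (pixel >>> 16) & 0xff,
    g: (pixel >>> 8) & 0xff,
    b: pixel & 0xff,
  };
}

// One array element holds a whole pixel, so a mapper sees a single number.
const pixels = new Uint32Array(1);
pixels[0] = packPixel(255, 112, 66, 20);
const unpacked = unpackPixel(pixels[0]);
```

With a Uint8ClampedArray the same pixel would occupy four consecutive elements, which is why the serial version has to touch several elements per step.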
The parallel case was implemented as shown in the following snippet:
The problem
Based on our experience with previous benchmarks, we expected the two implementations to perform similarly with a Uint32Array of about 8*10^7 (eight million) elements, possibly even favoring the parallel one. Contrary to our expectations, the initial measurements showed that the linear implementation fared approximately four times better than the parallel one.
The hypothesis
As the two code samples were similar, our hypothesis was that for some reason some V8 optimizations were not taking place. This was a definite possibility considering that the parallel implementation depends on the Function constructor to create Function objects in the Web Workers. Specifically, we believed that the internal functions noise, clamp and colorDistance were not being inlined. Inlining can bring both benefits and drawbacks, but we believed that performance was suffering from the lack of it.
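For readers unfamiliar with the technique, the following is a minimal sketch of how a function can be turned into source text and rebuilt with the Function constructor, which is the kind of step needed to get a mapper into a Web Worker. The mapper and its internal clamp helper are hypothetical, and this is only an assumption about the general approach, not the actual p-j-s code.

```javascript
// Hypothetical mapper with an internal helper, mirroring the structure
// described above (helpers declared inside the mapper function).
function mapper(value) {
  function clamp(v, min, max) {
    return Math.min(Math.max(v, min), max);
  }
  return clamp(value * 2, 0, 255);
}

// Main thread: functions cannot be posted to a worker directly,
// so the function is sent as source text instead.
const source = mapper.toString();

// Worker side: rebuild a callable function from that text.
// Assumption: wrapping the source in "return (...)" is one common way to do it.
const rebuilt = new Function('return (' + source + ');')();

console.log(rebuilt(100)); // 200
console.log(rebuilt(200)); // 255
```

The rebuilt function behaves the same as the original; what differed in our case was how the engine optimized it.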
Using IRHydra it was easy to verify the hypothesis, as the following figures show.
The blue chevron indicates that a function has been inlined.
Serial implementation generated code
The figures in this section correspond to the serial implementation. As they show, all functions are being inlined.
Parallel implementation generated code
The figures in this section correspond to the parallel implementation. As they show, only Math.random is being inlined.
The solution
The way to solve the problem was to manually inline the functions. The resulting parallel implementation was:
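As a minimal sketch of what manual inlining means in general, the example below shows a function that calls a helper next to an equivalent version with the helper's body written out in place. The clamp helper and the brighten functions are hypothetical illustrations, not the sepia code itself.

```javascript
// Hypothetical helper, named after one of the internal functions in the post.
function clamp(value, min, max) {
  return Math.min(Math.max(value, min), max);
}

// Version with a helper call: relies on the engine to inline clamp().
function brightenWithHelper(channel) {
  return clamp(channel + 40, 0, 255);
}

// Manually inlined version: the logic of clamp() is written out in place,
// so no function call remains for the engine to optimize away.
function brightenInlined(channel) {
  const value = channel + 40;
  return value < 0 ? 0 : value > 255 ? 255 : value;
}

console.log(brightenWithHelper(230), brightenInlined(230)); // 255 255
```

The trade-off is readability: the inlined version repeats logic that the helper kept in one place.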
We created a benchmark to understand the difference between the initial version (in which the functions were not inlined) and the final one (in which the developer inlines them manually).
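A comparison of this kind can be sketched with a small timing harness like the one below. The timeIt helper, the two mapper variants and the array size are all hypothetical choices for illustration; this is not the benchmark we actually ran, and timings from such a simple loop are only a rough indication.

```javascript
// Hypothetical harness: run fn several times and return the median time in ms.
function timeIt(fn, runs) {
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn();
    samples.push(Date.now() - start);
  }
  samples.sort((x, y) => x - y);
  return samples[Math.floor(samples.length / 2)];
}

// Two hypothetical variants of the same per-element operation.
function clamp(v, min, max) {
  return Math.min(Math.max(v, min), max);
}
function withHelper(v) {
  return clamp(v + 40, 0, 255);
}
function inlined(v) {
  const x = v + 40;
  return x < 0 ? 0 : x > 255 ? 255 : x;
}

// Apply a mapper to every element of a typed array, writing into a new one.
function mapAll(input, mapper) {
  const output = new Uint32Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = mapper(input[i]);
  }
  return output;
}

// Small input so the sketch runs quickly; a real benchmark would use more data.
const data = new Uint32Array(100000);
for (let i = 0; i < data.length; i++) {
  data[i] = i % 256;
}

const helperMs = timeIt(() => mapAll(data, withHelper), 5);
const inlinedMs = timeIt(() => mapAll(data, inlined), 5);
console.log('helper: ' + helperMs + ' ms, inlined: ' + inlinedMs + ' ms');
```

Taking the median of several runs reduces the effect of warm-up and of one-off pauses, although it does not remove them.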
The benchmark results are displayed in the following figure: