
Make everything 4% faster by skipping empty tasks [NFC]#8571

Merged
kripken merged 8 commits into WebAssembly:main from kripken:fasten
Apr 3, 2026


kripken commented Apr 2, 2026

When a visitor is the original

void visitFoo(Foo* curr) {}

(that is, empty), and the doVisit is also unchanged,

static void doVisitFoo(Self* self, Foo* curr) { self->visitFoo(curr); }

(that is, it just calls the visitor), then we do not need to queue such
tasks for execution at all.
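To illustrate the idea, here is a minimal sketch, not the code from this PR. It assumes a simplified CRTP walker; `Walker`, `Task`, `visitFooIsNoOp`, `maybePushVisitFoo`, and the three example passes are hypothetical names. It compares the function pointers at runtime for portability, whereas the actual patch may do the check at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an IR node type.
struct Foo {};

// Simplified CRTP walker; only visitFoo/doVisitFoo come from the description.
template<typename Self>
struct Walker {
  // The original (empty) visitor and the original task that just calls it.
  void visitFoo(Foo* curr) {}
  static void doVisitFoo(Self* self, Foo* curr) { self->visitFoo(curr); }

  using TaskFn = void (*)(Self*, Foo*);
  struct Task {
    TaskFn func;
    Foo* curr;
  };
  std::vector<Task> stack;

  // The double check: true only if Self replaced neither function.
  static bool visitFooIsNoOp() {
    using VisitFn = void (Self::*)(Foo*);
    VisitFn visit = &Self::visitFoo;
    VisitFn originalVisit = &Walker<Self>::visitFoo;
    TaskFn task = &Self::doVisitFoo;
    TaskFn originalTask = &Walker<Self>::doVisitFoo;
    return visit == originalVisit && task == originalTask;
  }

  // Queue the visit task unless it is known to do nothing.
  void maybePushVisitFoo(Foo* curr) {
    if (visitFooIsNoOp()) {
      return;
    }
    stack.push_back(Task{&Self::doVisitFoo, curr});
  }
};

// Overrides nothing, so its visit task can be skipped.
struct EmptyPass : Walker<EmptyPass> {};

// Overrides the visitor, so the task must be queued.
struct CountingPass : Walker<CountingPass> {
  int count = 0;
  void visitFoo(Foo* curr) { count++; }
};

// Overrides only the task function, so the task must still be queued.
struct CustomTaskPass : Walker<CustomTaskPass> {
  int custom = 0;
  static void doVisitFoo(CustomTaskPass* self, Foo* curr) { self->custom++; }
};

// Helper: how many tasks does a pass queue for a single Foo?
template<typename Pass>
std::size_t tasksQueuedForOneFoo() {
  Pass pass;
  Foo foo;
  pass.maybePushVisitFoo(&foo);
  return pass.stack.size();
}

inline void checkSkipping() {
  assert(tasksQueuedForOneFoo<EmptyPass>() == 0);
  assert(tasksQueuedForOneFoo<CountingPass>() == 1);
  assert(tasksQueuedForOneFoo<CustomTaskPass>() == 1);
}
```

Checking both functions matters: `CustomTaskPass` keeps the empty visitor but replaces the task, so skipping based on the visitor alone would drop real work.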

I had wanted to do this a while ago but just now figured out how to
make the double check work.

Measurements on -O3 on a large Kotlin testcase:

Before:

   166,045,403,096      cpu_core/instructions/u          #    1.42  insn per cycle              ( +-  0.01% )
    36,389,460,009      cpu_core/branches/u              #  512.421 M/sec                       ( +-  0.01% )
           26.0642 +- 0.0926 seconds time elapsed  ( +-  0.36% )

After:

   158,567,768,086      cpu_core/instructions/u          #    1.44  insn per cycle              ( +-  0.00% )
    34,644,951,424      cpu_core/branches/u              #  521.618 M/sec                       ( +-  0.01% )
            24.744 +- 0.163 seconds time elapsed  ( +-  0.66% )

Instructions and branches decrease by 5% with almost no noise. Wall time also
shrinks by about 5%, but the noise is around 1% there.

@kripken kripken requested a review from a team as a code owner April 2, 2026 23:49
@kripken kripken requested review from tlively and removed request for a team April 2, 2026 23:49

@tlively tlively left a comment


Nice!


kripken commented Apr 3, 2026

Unfortunately gcc-11 errors on those function pointers not being constexpr. I tried to find a refactoring workaround among the users of the code, but couldn't. Instead I pushed a hack to avoid constexpr in gcc 11 and earlier.
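One way such a workaround could look is sketched below. This is an assumption for illustration, not the code that was pushed: the macro name `SKETCH_CONSTEXPR`, the version test, and the `Walker`/`Plain`/`Custom` types are all hypothetical. The comparison is evaluated with `if constexpr` where the compiler is assumed to accept it, and with a plain `if` on gcc 11 and earlier.

```cpp
#include <cassert>

// Hypothetical macro: expands to nothing on gcc <= 11, and to `constexpr`
// elsewhere. The PR does not show the name or exact condition it uses.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ <= 11
#define SKETCH_CONSTEXPR
#else
#define SKETCH_CONSTEXPR constexpr
#endif

// Hypothetical stand-in for an IR node type.
struct Foo {};

template<typename Self>
struct Walker {
  static void doVisitFoo(Self* self, Foo* curr) {}

  // True if Self did not replace doVisitFoo. With the macro empty this is an
  // ordinary runtime comparison that an optimizer can still fold away.
  static bool usesOriginalTask() {
    if SKETCH_CONSTEXPR (&Self::doVisitFoo == &Walker<Self>::doVisitFoo) {
      return true;
    } else {
      return false;
    }
  }
};

struct Plain : Walker<Plain> {};

struct Custom : Walker<Custom> {
  int hits = 0;
  static void doVisitFoo(Custom* self, Foo* curr) { self->hits++; }
};

inline void checkTaskDetection() {
  assert(Plain::usesOriginalTask());
  assert(!Custom::usesOriginalTask());
}
```

Either expansion gives the same answer; the difference is only whether the untaken branch is discarded at compile time or left to the optimizer.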

Numbers from another machine, where I could test multiple compilers:

Before:

gcc-11
            9.3755 +- 0.0445 seconds time elapsed  ( +-  0.47% )
gcc-15
            8.7343 +- 0.0397 seconds time elapsed  ( +-  0.45% )
clang-19
            8.8103 +- 0.0211 seconds time elapsed  ( +-  0.24% )

After, with the opt:

gcc-11 (no constexpr)
            9.1343 +- 0.0263 seconds time elapsed  ( +-  0.29% )
gcc-15 (constexpr)
            8.5140 +- 0.0369 seconds time elapsed  ( +-  0.43% )
clang-19 (constexpr)
            8.5180 +- 0.0137 seconds time elapsed  ( +-  0.16% )

So with and without constexpr, there is a speedup of roughly 2.5% with this patch on gcc, and about 3.3% with clang.
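The percentages can be recomputed from the wall times listed above. The following is a small sketch of that arithmetic; `percentFaster` and `printSpeedups` are hypothetical helper names, and the inputs are the mean times from this comment without their error bars.

```cpp
#include <cassert>
#include <cstdio>

// Relative reduction in wall time, in percent: (before - after) / before.
double percentFaster(double beforeSeconds, double afterSeconds) {
  assert(beforeSeconds > 0.0);
  return (beforeSeconds - afterSeconds) / beforeSeconds * 100.0;
}

// Prints the speedup for each compiler measured above.
void printSpeedups() {
  std::printf("gcc-11:   %.2f%%\n", percentFaster(9.3755, 9.1343));
  std::printf("gcc-15:   %.2f%%\n", percentFaster(8.7343, 8.5140));
  std::printf("clang-19: %.2f%%\n", percentFaster(8.8103, 8.5180));
}
```

Calling `printSpeedups()` shows both gcc builds landing just above 2.5% and the clang build a little above 3.3%.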

@kripken kripken changed the title Make everything 5% faster by skipping empty tasks [NFC] Make everything 4% faster by skipping empty tasks [NFC] Apr 3, 2026

kripken commented Apr 3, 2026

Testing some C++, Dart, and Java, the average speedup is around 4%.

@kripken kripken merged commit 6780d4b into WebAssembly:main Apr 3, 2026
16 checks passed
@kripken kripken deleted the fasten branch April 3, 2026 19:39