Conversation

@tobil4sk (Member) commented Feb 9, 2026

Hopefully this counteracts the performance regression seen after these encoding routines started being used for haxe.io.Bytes, see: #1294 (comment)

This is lighter than #1302 since it doesn't introduce an external library. It also addresses the problem from a different angle, by avoiding inefficiencies in the surrounding code, rather than making the encoding/decoding itself as fast as possible. Even if we decide to use simdutf I think it would be worth making some of these changes anyway, as some of them would also benefit the simdutf implementation.

These are the changes:

  • Merge encodings into a single file - having the encodings in separate files made it impossible for the compiler to inline simple function calls without link-time optimisation (which is off by default).
  • Avoid duplicate string iteration for ascii checks - we can perform a single iteration that both counts the length and checks for ascii.
  • Add a utf8 encode function that allocates its own output - currently Haxe has to iterate to get the utf8 length, and then we iterate again to verify that the length Haxe gave is correct, which is wasteful.
  • Pass char32_t by value instead of by reference - not sure if this makes a big difference, but it is better practice anyway.
  • Estimate utf8 length instead of iterating - this removes the need to iterate to find the length. Instead, we overallocate and truncate when done.
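The duplicate-iteration fix in the second bullet can be sketched roughly like this (a minimal illustrative sketch, not the actual hxcpp code; the function name is hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: one pass over a null-terminated string that
// both counts the length and records whether every byte is ASCII,
// instead of iterating once for the length and again for the check.
static std::size_t scanLengthAndAscii(const char* s, bool& isAscii) {
    isAscii = true;
    std::size_t len = 0;
    for (; s[len] != '\0'; ++len) {
        if ((unsigned char)s[len] >= 0x80)
            isAscii = false;
    }
    return len;
}
```

If the scan reports pure ASCII, the conversion can take a fast path (a straight byte copy) without re-walking the string.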

With this I see an improvement locally in the dox benchmark compared to master (1.3% vs 5.5% time spent in Bytes.ofString).

Flamegraphs (download as SVG and use the search bar to search for "Bytes_obj::ofString"):

[flamegraph: unpatched]
[flamegraph: patched]

The decode methods use methods from other encodings. For example,
Utf8::decode calls Utf16::getCharCount and Utf16::encode in a loop.
Placing them in the same file makes it easier for them to be inlined
which improves performance.
The existing Utf8::encode function takes in a buffer, but we don't know
what size is required, so we have to iterate through the string before
writing to make sure the buffer is big enough.

If the caller already ran getByteCount, this means we have duplicated
their work just to cover the case where they did not do it properly.

This new method allocates its own buffer that is guaranteed to be the
right size, which avoids the need for these unnecessary checks.
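The allocate-then-truncate approach could be sketched like this (an illustrative UTF-16 to UTF-8 encoder using std::u16string/std::string in place of hxcpp's own string types; this is a sketch of the idea, not the actual implementation):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Worst case: every UTF-16 code unit expands to 3 UTF-8 bytes.
// (A surrogate pair is 2 units and encodes to 4 bytes, so 3 bytes
// per unit is an upper bound for all inputs.)
static std::size_t utf8LengthEstimate(std::size_t utf16Units) {
    return utf16Units * 3;
}

// Overallocate using the estimate, encode, then shrink to the real
// size, instead of pre-scanning the string for the exact length.
static std::string utf8Encode(const std::u16string& in) {
    std::string out;
    out.resize(utf8LengthEstimate(in.size()));
    std::size_t o = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        uint32_t c = in[i];
        // combine a surrogate pair into one code point
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < in.size()) {
            uint32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        if (c < 0x80) {
            out[o++] = (char)c;
        } else if (c < 0x800) {
            out[o++] = (char)(0xC0 | (c >> 6));
            out[o++] = (char)(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out[o++] = (char)(0xE0 | (c >> 12));
            out[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (c & 0x3F));
        } else {
            out[o++] = (char)(0xF0 | (c >> 18));
            out[o++] = (char)(0x80 | ((c >> 12) & 0x3F));
            out[o++] = (char)(0x80 | ((c >> 6) & 0x3F));
            out[o++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out.resize(o); // truncate to the actual encoded length
    return out;
}
```

The single final resize trades a possible overallocation for never having to walk the input twice and never reallocating mid-encode.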
@Aidan63 (Contributor) commented Feb 9, 2026

Just did a quick bit of benchmarking with a "lorem ipsum"-like file full of Chinese characters. With these changes, Bytes.toString is now faster than in Haxe 4.3.7.

10k iterations:
latest: 0.09s
these: 0.07s

Bytes.ofString is still a fair bit slower than 4.3.7 though.

10k iterations:
latest: 0.05s
these: 0.12s

This was my benchmark code for reference.

import haxe.Timer;
import haxe.io.Bytes;
import sys.io.File;

class Main {
    static function main() {
        // final bytes = File.getBytes("C:\\Users\\AidanLee\\Desktop\\test\\lorem.txt");
        final string = File.getContent("C:\\Users\\AidanLee\\Desktop\\test\\lorem.txt");

        Timer.measure(() -> {
            for (_ in 0...10000) {
                final _ = Bytes.ofString(string);
                // final _ = bytes.toString();
            }
        });
    }
}

(attachment: lorem.txt)

@Aidan63 (Contributor) commented Feb 9, 2026

On length estimation, the one downside of this would be strings whose actual size is just under the "large object" size threshold of 2k bytes but whose estimate is higher. In this case those strings would go through the large object allocator rather than the thread-local one.

@tobil4sk (Member, Author) commented Feb 9, 2026

> On length estimation, the one downside of this would be strings which are actually just under the "large object" size threshold of 2k bytes but are estimated to be higher. In this case those strings would go through the large object allocator rather than the thread local one.

Yeah, this is where the tradeoff lies. __hxcpp_bytes_of_string loops and pushes to an array, so it does constant reallocations. This patch avoids reallocations at the expense of overallocating.
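To put rough numbers on that tradeoff, here is a small sketch. The 2048-byte threshold and the 3-bytes-per-UTF-16-unit estimate are assumptions taken from this discussion, not hxcpp constants verified here:

```cpp
#include <cassert>
#include <cstddef>

// Assumed values for this sketch: a 2048-byte "large object"
// threshold and a worst-case estimate of 3 UTF-8 bytes per
// UTF-16 code unit. The real hxcpp constants may differ.
const std::size_t kLargeObjectThreshold = 2048;

std::size_t utf8Estimate(std::size_t utf16Units) {
    return utf16Units * 3;
}
```

For example, a 700-character pure-ASCII string actually needs 700 UTF-8 bytes (well under the threshold), but its estimate of 2100 bytes would route the allocation through the large object allocator instead of the thread-local one.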

> Bytes.ofString is still a fair bit slower than 4.3.7 though.

For some reason, it's the other way around for me:

ofString:

                        10,000      1,000,000
patched haxe/hxcpp      0.06568s    5.673s
haxe 4.3.7              0.07428s    7.339s

toString:

                        10,000      1,000,000
patched haxe/hxcpp      0.0674s     6.107s
haxe 4.3.7              0.05319s    4.535s

@skial mentioned this pull request Feb 10, 2026