Signed char sign-extension bug in iso_8859_1.hpp #49

@captainurist

Description

If char is signed, the ISO-8859-1 decoder produces incorrect output for bytes 0x80-0xFF, because the byte is sign-extended both when it is cast to unsigned int and when it is written to the output code point.
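
For illustration, the mechanism can be shown without ztd.text. The sketch below is not the library's code: `widen_naive` and `widen_fixed` are hypothetical names, and `signed char` is used explicitly so the behavior does not depend on the platform's char signedness. It assumes the usual fix of going through unsigned char before widening.

```cpp
#include <cstdint>

// Hypothetical helper, not part of ztd.text.
// Widens a signed byte directly to 32 bits. A negative value such as -28
// (bit pattern 0xE4) is converted modulo 2^32, giving 0xFFFFFFE4.
constexpr std::uint32_t widen_naive(signed char c) {
    return static_cast<std::uint32_t>(c);
}

// Hypothetical helper, not part of ztd.text.
// Converts to unsigned char first, so the value lands in 0x00-0xFF
// before it is widened. For -28 this gives 0xE4 (U+00E4, 'ä').
constexpr std::uint32_t widen_fixed(signed char c) {
    return static_cast<std::uint32_t>(static_cast<unsigned char>(c));
}

// -28 is the value a signed 8-bit char holds for the byte 0xE4.
constexpr signed char byte_e4 = -28;
```

Here `widen_naive(byte_e4)` is 0xFFFFFFE4, which is above U+10FFFF and therefore not a valid code point; that would be consistent with the replacement characters seen in the repro below. `widen_fixed(byte_e4)` is 0xE4, the expected code point.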

Repro:

#include <string>
#include <span>
#include <iostream>
#include <ztd/text.hpp>

int main() {
    // ISO-8859-1: 0xE4 = 'ä', 0xF6 = 'ö', 0xFC = 'ü'
    std::string iso = "\xe4\xf6\xfc";
    std::span<const char> input(iso.data(), iso.size());

    auto result = ztd::text::transcode(input, ztd::text::iso_8859_1, ztd::text::compat_utf8,
                                       ztd::text::replacement_handler);

    // Expected: "äöü" (UTF-8: \xc3\xa4\xc3\xb6\xc3\xbc)
    // Actual: "���" (three replacement characters, UTF-8: \xef\xbf\xbd\xef\xbf\xbd\xef\xbf\xbd)
    std::cout << "Result: " << result << std::endl;
    return 0;
}

My fix is in commit 03422de.

Labels

bug (Something isn't working)
