Simplify string function classification and move locale discussion to a separate paragraph #5433

@masakielastic

Problem

Page:
https://www.php.net/manual/en/language.types.string.php

Section: "Details of the String Type"

The current section classifies string functions into several categories, including byte-oriented, encoding-aware, locale-dependent, and UTF-8-assuming functions.

However, this mixes different concerns:

  • how strings are interpreted (bytes, encodings, Unicode)
  • how behavior may vary (locale)

This makes it harder to understand how to correctly handle UTF-8 strings.

In addition, the documentation does not clearly state that the mbstring extension supports UTF-8, and the distinction between mbstring and intl is unclear.

Proposal

Simplify the classification to focus on how strings are interpreted, and move the locale discussion into a separate paragraph.

For example:

String functions in PHP can be broadly categorized based on how they interpret string data:

  1. Byte-oriented functions operate on strings as raw sequences of bytes.
  2. Encoding-aware functions interpret strings according to a specified encoding, such as UTF-8. The mbstring extension provides such functions and supports UTF-8 and other multibyte encodings.
  3. Unicode-aware functions assume UTF-8 input and operate on Unicode concepts such as grapheme clusters, normalization, and collation. These are primarily provided by the intl extension.
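
To illustrate the three categories, here is a minimal sketch that measures the same string with one function from each group. The sample string and variable names are my own choices for illustration, and the mbstring and intl calls are guarded because those extensions may not be installed.

```php
<?php
// "é" in decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT.
$s = "e\u{0301}";

// 1. Byte-oriented: strlen() counts raw bytes
//    ("e" is 1 byte, U+0301 is 2 bytes in UTF-8).
$bytes = strlen($s);
echo "bytes: $bytes\n"; // bytes: 3

// 2. Encoding-aware: mb_strlen() counts code points under the given encoding.
if (extension_loaded('mbstring')) {
    $codePoints = mb_strlen($s, 'UTF-8');
    echo "code points: $codePoints\n"; // code points: 2
}

// 3. Unicode-aware: grapheme_strlen() counts grapheme clusters,
//    i.e. what a user would perceive as one character.
if (extension_loaded('intl')) {
    $graphemes = grapheme_strlen($s);
    echo "graphemes: $graphemes\n"; // graphemes: 1
}
```

The same input yields three different lengths, which is the distinction the proposed wording tries to make explicit.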

Then describe locale separately:

Some operations are locale-dependent.
Historically, functions such as strtolower() and strtoupper() relied on the system locale set with setlocale(); as of PHP 8.2 they perform ASCII-only case conversion regardless of locale.
For locale-aware operations based on Unicode, the intl extension provides explicit support through ICU.

Reference: https://wiki.php.net/rfc/strtolower-ascii
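
A short sketch of how the locale paragraph could be illustrated, assuming PHP 8.2 or later for the strtolower() behavior. The sample strings, the variable names, and the choice of the de_DE and sv_SE locales are mine, and the intl result depends on the ICU data available on the system.

```php
<?php
$s = "\u{00C0}BC"; // "ÀBC"

// strtolower() converts only ASCII letters, so the bytes of "À" are untouched.
$ascii = strtolower($s);
echo $ascii, "\n"; // Àbc

// mb_strtolower() applies Unicode case mapping under the stated encoding.
if (extension_loaded('mbstring')) {
    $unicode = mb_strtolower($s, 'UTF-8');
    echo $unicode, "\n"; // àbc
}

// intl takes the locale as an explicit argument instead of reading setlocale().
if (extension_loaded('intl')) {
    $german  = new Collator('de_DE');
    $swedish = new Collator('sv_SE');

    // German sorts "ä" near "a"; Swedish sorts "ä" after "z".
    $deBefore = $german->compare("\u{00E4}", "z") < 0;
    $svAfter  = $swedish->compare("\u{00E4}", "z") > 0;
    var_dump($deBefore, $svAfter);
}
```

Passing the locale to Collator explicitly keeps the behavior independent of process-wide state, which is the contrast with setlocale() that the separate paragraph is meant to draw.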

Rationale

  • separates string interpretation from locale concerns
  • makes UTF-8 support in mbstring explicit
  • clarifies the role of mbstring vs intl
  • improves readability and reduces confusion for users working with UTF-8
