support non UTF-8 mingw filesystem#1758

Open
magic-cucumber wants to merge 6 commits into square:master from magic-cucumber:feature-non-utf8-support

Conversation

@magic-cucumber magic-cucumber commented Dec 25, 2025

@JakeWharton close #1757.

  • Refactored FileSystem.list(): Extracted the core logic into a variantList function.
    • Moved the original implementation to unixMain.
    • Replaced the implementation in mingwX64Main with wide-character Win32 APIs.
  • Enhanced Error Reporting: Replaced FormatMessageA with FormatMessageW in lastErrorString to correctly handle wide-character error messages.
  • Updated WindowsPosixVariant: Replaced all narrow-character function calls with their corresponding wide-character versions to ensure full compatibility with the Windows filesystem.

@magic-cucumber

Author

To test this PR, we need a Windows system with a non-UTF-8 system code page (such as GBK) and a test directory containing files whose names include characters outside the ANSI code page's range.

@swankjesse

Collaborator

This is neat!

@swankjesse swankjesse self-requested a review December 26, 2025 23:42
@magic-cucumber

magic-cucumber commented Jan 6, 2026

Author

I forgot to run spotlessApply, but it's all OK now :D @swankjesse

@magic-cucumber

Author

@oldergod
Something seems to have gone wrong while downloading yarn. Please restart this workflow.
[Screenshot: PixPin_2026-03-10_19-49-58]

@gfreitash

I applied this PR's patch successfully, so I wanted to share real-world data since the PR appears stale.

We have a Kotlin/Native CLI at work for managing data pipelines and Power BI integrations. Power BI workflows are Windows-native, so we can't avoid the platform, and we would also like to avoid having to fall back to the JVM.
One command swaps database connections in PBIP project files, and it hit the encoding issue caused by the narrow-char Windows APIs in Okio's FileSystem.

Before and after on the same machine (Windows 11, Portuguese locale, windows-1252 code page):

BEFORE — stock Okio (narrow-char APIs garble UTF-8 paths)

$ lago-cli local dag set-pbip-dw --host db-host.internal.example.com --name my-database
Project root: C:\Users\my.user\Documents\projects\my-orchestrator
Saved current DW connection to ...\.pbi-dw-config.json
.pbi-dw-config.json already in .gitignore
Warning: Failed to patch PBIP 'BI - MY-APP':
  ConnectionSubstitutionFailed(
    pbipName=,
    cause=Failed to read file
      ...\BI - MY-APP.SemanticModel\definition\tables\Agrega��es.tmdl:
      No such file or directory
  )
Replaced 0 file(s) across 1 PBIP(s) with host='db-host.internal.example.com', database='my-database'

AFTER — patched Okio (wide-char APIs handle Unicode)

$ lago-cli local dag set-pbip-dw --host db-host.internal.example.com --name my-database
Project root: C:\Users\my.user\Documents\projects\my-orchestrator
Saved current DW connection to ...\.pbi-dw-config.json
.pbi-dw-config.json already in .gitignore
Replaced 19 file(s) across 1 PBIP(s) with host='db-host.internal.example.com', database='my-database'
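
One plausible reading of the garbled name in the BEFORE output (an assumption, not something the log confirms) is that a narrow-char API returned the name in the ANSI code page and those bytes were then decoded as UTF-8. The Kotlin sketch below reproduces that effect with a hypothetical file name. `encodeAnsiLatin1` is an illustrative helper, not part of Okio, and it only handles ASCII plus U+00A0 to U+00FF, the range where windows-1252 and Latin-1 agree.

```kotlin
// Illustrative helper (not an Okio API): encode a name the way a
// windows-1252 style ANSI code page would, one byte per character.
// Assumption: input is ASCII or U+00A0..U+00FF, where windows-1252
// and Latin-1 map characters to the same byte values.
fun encodeAnsiLatin1(name: String): ByteArray =
    ByteArray(name.length) { i ->
        val code = name[i].code
        require(code < 0x80 || code in 0xA0..0xFF) {
            "character not covered by this sketch: ${name[i]}"
        }
        code.toByte()
    }

fun main() {
    val original = "Preços.tmdl" // hypothetical file name

    // What a narrow-char listing call could hand back: 'ç' becomes the single byte 0xE7.
    val ansiBytes = encodeAnsiLatin1(original)

    // Decoding those bytes as UTF-8: 0xE7 starts a multi-byte sequence that
    // never completes, so the decoder substitutes U+FFFD.
    val misdecoded = ansiBytes.decodeToString()
    println(misdecoded)
    println(misdecoded == original) // false: this path no longer names the real file

    // Encoding and decoding with the same charset is lossless.
    val roundTrip = original.encodeToByteArray().decodeToString()
    println(roundTrip == original) // true
}
```

A path rebuilt from the misdecoded string no longer matches any file on disk, which is consistent with the "No such file or directory" error shown in the BEFORE output.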

@swankjesse

Collaborator

Weird lint error:

> There were 1 lint error(s), they must be fixed or suppressed.
  src/unixMain/kotlin/okio/UnixPosixVariant.kt:LINE_UNDEFINED ktlint(java.lang.NoClassDefFoundError) Could not initialize class org.jetbrains.kotlin.com.intellij.openapi.util.objectTree.ThrowableInterner (...)

Comment thread okio/src/mingwX64Main/kotlin/okio/WindowsPosixVariant.kt Outdated
@swankjesse

Collaborator

Any idea how we’d create such a file system in GitHub Actions? Hmmm

@magic-cucumber

magic-cucumber commented Jun 18, 2026

Author

@swankjesse Windows internally uses UTF-16.

On Windows, the system code page does not matter as long as paths are handled as Unicode and passed through the wide-character APIs.

In practice, we only need to verify that all Okio filesystem operations consistently rely on wide-character APIs throughout the call chain.

This is largely a historical compatibility issue. GitHub Actions does not provide container environments configured in such an unusual way.
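
The point about UTF-16 can be illustrated without any Win32 call. A Kotlin `String` is already a sequence of UTF-16 code units, so handing those units to a wide-character API involves no code-page conversion. The sketch below is a minimal illustration under that assumption; `fitsInSingleByteLatin1`, `toWideUnits`, and `fromWideUnits` are hypothetical helpers, not functions from Okio or this PR, and the file name is made up.

```kotlin
// Rough stand-in for "fits in a single-byte ANSI code page":
// every character is at most U+00FF. Real code pages differ in detail.
fun fitsInSingleByteLatin1(name: String): Boolean = name.all { it.code <= 0xFF }

// The UTF-16 code units a wide-character API would receive, NUL-terminated.
fun toWideUnits(name: String): CharArray = (name + '\u0000').toCharArray()

// Rebuild a String from a NUL-terminated buffer of UTF-16 code units.
fun fromWideUnits(units: CharArray): String {
    val nul = units.indexOf('\u0000')
    val end = if (nul < 0) units.size else nul
    return units.concatToString(0, end)
}

fun main() {
    val name = "数据_Preços.txt" // hypothetical file name

    // The name cannot be expressed in a single-byte code page...
    println(fitsInSingleByteLatin1(name)) // prints false

    // ...but it survives a round trip through UTF-16 code units unchanged.
    println(fromWideUnits(toWideUnits(name)) == name) // prints true
}
```

Because the round trip never passes through a code page, the result is the same whatever the machine's system locale is, which is why checking that every call in the chain uses the wide-character variant can stand in for a specially configured test machine.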


Development

Successfully merging this pull request may close these issues.

metadata() fails on Windows with non-UTF-8 system locales (e.g., GBK) in MinGWX64

3 participants