What does "Beta: Use Unicode UTF-8 for worldwide language support" actually do?
In some Windows 10 builds (Insider builds starting in April 2018, and also the regular 1903 release) there is a new option called "Beta: Use Unicode UTF-8 for worldwide language support".
You can see this option by going to Settings and then: All Settings -> Time & Language -> Language -> "Administrative Language Settings"
This is what it looks like:
When this checkbox is checked I observe some irregularities (below) and I would like to know what exactly this checkbox does and why the below happens.
Create a brand new Windows Forms application in Visual Studio 2019. On the main form, specify the Paint event handler as follows:
private void Form1_Paint(object sender, PaintEventArgs e)
{
    Font buttonFont = new Font("Webdings", 9.25f);
    TextRenderer.DrawText(e.Graphics, "0r", buttonFont, new Point(), Color.Black);
}
Run the program, here is what you will see if the checkbox is NOT checked:
However, if you check the checkbox (and reboot as asked) this changes to:
You can look up the Webdings font on Wikipedia. According to the character table given there, the codes for these two characters are "\U0001F5D5\U0001F5D9". If I use them instead of "0r", it works with the checkbox checked, but with the checkbox unchecked it now looks like this:
I would like to find a solution that always works, regardless of whether the box is checked or unchecked.
Can this be done?
Solution 1:[1]
You can see it in ProcMon.
It seems to set the REG_SZ values ACP, MACCP, and OEMCP in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage to 65001.
I'm not entirely sure, but it might be related to the variable gAnsiCodePage in KernelBase.dll, which GetACP reads. If you really want to, you might be able to change it for your program regardless of the system setting by disassembling GetACP at run time to find the instruction sequence that reads gAnsiCodePage, obtaining a pointer to it, and then updating the variable directly.
(Actually, I see references to an undocumented function named SetCPGlobal that would've done the job, but I can't find that function on my system. Not sure if it still exists.)
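A less invasive way to check what the checkbox changed is to read the registry values named at the start of this answer. The following is a minimal sketch, assuming a Windows machine and a .NET target where Microsoft.Win32.Registry is available; the class and method names (CodePageSettings, ReadValue) are hypothetical and only for illustration.

```csharp
using System;
using Microsoft.Win32;

static class CodePageSettings
{
    // Registry path quoted in the answer above.
    private const string KeyPath = @"SYSTEM\CurrentControlSet\Control\Nls\CodePage";

    // Returns the stored string for one value name,
    // or null if the key or the value is missing.
    public static string ReadValue(string name)
    {
        using (RegistryKey key = Registry.LocalMachine.OpenSubKey(KeyPath))
        {
            return key?.GetValue(name) as string;
        }
    }

    public static void Main()
    {
        foreach (string name in new[] { "ACP", "MACCP", "OEMCP" })
        {
            Console.WriteLine("{0} = {1}", name, ReadValue(name) ?? "(not found)");
        }
    }
}
```

This only reports what is stored in the registry. Since the question mentions that a reboot is requested after toggling the checkbox, the stored value may differ from what a running process sees until the machine has been restarted.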
Solution 2:[2]
Most Windows C APIs come in two different variants:
- "A" variant that uses 8-bit strings in whatever the system's configured encoding is. This varies depending on the configured country/language. (Microsoft calls the configured encoding the "ANSI Code Page", but it doesn't really have anything to do with ANSI.)
- "W" variant that uses 16-bit strings in a fixed almost-UTF-16 encoding. (The "almost" is because "unpaired surrogates" are allowed; if you don't know what those are then don't worry about them).
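For readers who do want to see what a surrogate is: C# strings hold 16-bit code units, so the two Webdings code points from the question each occupy two chars (a surrogate pair). The sketch below is only an illustration; the SurrogateDemo class and its methods are hypothetical names, not part of any API.

```csharp
using System;

static class SurrogateDemo
{
    // Number of UTF-16 code units (C# chars) in a string.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // True when the string contains a surrogate without its partner.
    public static bool HasUnpairedSurrogate(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]))
            {
                // A high surrogate must be followed directly by a low surrogate.
                if (i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                {
                    i++; // skip the low half of a well-formed pair
                    continue;
                }
                return true;
            }
            if (char.IsLowSurrogate(s[i]))
            {
                return true; // low surrogate with no preceding high surrogate
            }
        }
        return false;
    }
}
```

With these helpers, CodeUnits("\U0001F5D5") is 2, HasUnpairedSurrogate("\U0001F5D5\U0001F5D9") is false, and HasUnpairedSurrogate("\uD83D") is true, because "\uD83D" is only the first half of a pair.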
The official Microsoft advice is not to use the "A" versions, but to ensure your code always uses the "W" variants. That way you're supposed to get consistent behaviour no matter what the user's country/language is configured as.
However, it looks like that checkbox is doing more than one thing. It's clear it's supposed to change the "ANSI Code Page" to 65001, which means UTF-8. It looks like it's also changing font rendering to be more Unicody.
I suggest you detect if GetACP() == 65001, then draw the Unicode version of your strings, otherwise draw the old "0r" version. I'm not sure how you do that from .NET...
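From .NET, the check suggested above could be done by calling GetACP through P/Invoke. This is a minimal sketch, assuming Windows and assuming that the code page alone decides which string renders correctly (which the answer above only guesses at). The WebdingsCaption class and its method names are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

static class WebdingsCaption
{
    private const uint Utf8CodePage = 65001;

    // GetACP is exported by kernel32.dll and returns the active ANSI code page.
    [DllImport("kernel32.dll")]
    private static extern uint GetACP();

    // Pure helper: picks the string to draw for a given code page.
    public static string ForCodePage(uint codePage)
    {
        return codePage == Utf8CodePage
            ? "\U0001F5D5\U0001F5D9" // Unicode code points quoted in the question
            : "0r";                  // legacy Webdings character codes
    }

    // Windows only: asks the OS for the active code page, then picks the string.
    public static string ForCurrentSystem()
    {
        return ForCodePage(GetACP());
    }
}
```

In the Paint handler from the question, WebdingsCaption.ForCurrentSystem() would be passed to TextRenderer.DrawText in place of the literal "0r". Whether this covers every Windows build has not been verified here.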
Solution 3:[3]
Please look at this question to see what it solves when it is enabled: How to save to file non-ascii output of program in Powershell?
I also found the explanation written by Ghisler helpful (source):
If you check this option, Windows will use codepage 65001 (Unicode UTF-8) instead of the local codepage like 1252 (Western Latin1) for all plain text files. The advantage is that text files created in e.g. Russian locale can also be read in other locale like Western or Central Europe. The downside is that ANSI-Only programs (most older programs) will show garbage instead of accented characters.
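The "garbage instead of accented characters" effect comes from bytes written in one encoding being decoded in another. A minimal sketch of that mismatch follows. It uses ISO-8859-1 as a stand-in for code page 1252, because ISO-8859-1 is available in .NET without registering extra encoding providers and the two agree on the bytes used here. MojibakeDemo and ReadUtf8AsLatin1 are hypothetical names.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Encodes text as UTF-8, then decodes the same bytes as a
    // single-byte Western code page (ISO-8859-1, standing in for 1252).
    public static string ReadUtf8AsLatin1(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "\u00E9" is the letter e with an acute accent.
        Console.WriteLine(ReadUtf8AsLatin1("\u00E9"));
    }
}
```

The single character U+00E9 becomes the two bytes C3 A9 in UTF-8, and decoding those bytes one at a time yields the two characters U+00C3 and U+00A9. Plain ASCII text survives unchanged, which is why the problem tends to show up only with accented or non-Latin characters.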
Here are two ways to enable it, which I think will be helpful for many users:
- Win+R -> intl.cpl
- Open the Administrative tab
- Click the Change system locale button
- Enable Beta: Use Unicode UTF-8 for worldwide language support
- Reboot
or alternatively via a reg file:
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage]
"ACP"="65001"
"OEMCP"="65001"
"MACCP"="65001"
Solution 4:[4]
On my Windows machine, when I checked Beta: Use Unicode UTF-8 for worldwide language support, the following registry values in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage changed:
ACP: 936 -> 65001
MACCP: 10008 -> 65001
OEMCP: 936 -> 65001
If I leave it unchecked, the Visual Studio compilation fails with Exception: Bad UTF-8 encoding (U+FFFD; REPLACEMENT CHARACTER) found while decoding string: ...
If I check it, the compilation succeeds, but the OS is full of unreadable text.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | user9876 |
Solution 3 | |
Solution 4 | Donghua Liu |