How to make browsers load Japanese fonts for CJK text, instead of Chinese fonts

I have an XHTML 1.1 document with a mix of English and Japanese text, with language indicators lang="jp" and xml:lang="jp" in the opening tag for the <html> element. The actual content is encoded in UTF-8, and this is stated in the content-type as well:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="jp" lang="jp">
<head>
    <title>Test page</title>
    <meta http-equiv="Content-Type"
      content="application/xhtml+xml; charset=utf-8"/>
</head>
<body><div>今</div><div>込</div></body></html> 

The XML/HTML specs say that the "lang" attribute is inherited, so the content should end up being rendered with a font that supports Japanese, but instead I'm seeing it use fonts that are intended for Chinese. (Japanese "kanji" are actually subtly different in many cases from the equivalent Chinese "Hanzi", and wildly different for a few common characters.)

For instance, in the above code the top part of the first character should be ˄ with a - under it. If a Chinese font is used instead, this character will invariably look like a ˄ with ` underneath. Also, the second character should have a shape that looks like 7\, but when a Chinese font is used it will more often look like a lambda, λ. Neither of these is a correct print/screen form in Japanese.

The question: is there a way to force browsers to pick Japanese fonts for CJK text without writing a CSS rule that just contains a hundred and one font names in the hopes that at least one of them will match what the user has installed?

(Since even minimal CJK fonts are upwards of 4 MB, with complete ones closer to 15 to 20 MB, relying on an @font-face declaration to ensure the right font gets loaded would be slow.)

I'd like a solution that works in all major browsers.



Solution 1:[1]

At least in modern browsers (in 2022, over a decade after you asked), the lang attribute will work the way you wanted it to, but the lang code you need is ja, not jp as you used in the question. (jp is the ISO 3166 code that identifies Japan, the country, while ja is the ISO 639-1 language code, as used in BCP 47 language tags, that identifies Japanese, the language.)

We can demonstrate the behaviour in a modern browser with this simple test document:

<!DOCTYPE html>
<html xml:lang="ja" lang="ja">
  <title>Test page</title>
  <div>今</div>
  <div>込</div>
</html>

As shown in the screenshot below (taken in Chrome on Ubuntu), the Japanese forms (kanji) of 今 and 込 get used when we open the document above:

screenshot

... and if we change ja to zh in our HTML document, the Chinese forms (Hanzi) get used instead:

screenshot
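
To compare the two behaviours without editing and reloading the page, a minimal sketch along the following lines puts the same two characters in a ja block and a zh block side by side. The page-level lang="en" and the use of <p> elements are assumptions made for illustration; which glyphs actually appear still depends on the fonts installed on the viewer's system.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>lang comparison</title>
</head>
<body>
  <!-- Same code points in both paragraphs; only the lang attribute differs. -->
  <p lang="ja">今 込</p> <!-- should use Japanese glyph forms -->
  <p lang="zh">今 込</p> <!-- should use Chinese glyph forms -->
</body>
</html>
```

Because lang is inherited, setting it on a containing element is enough; it only needs to be repeated where the language changes.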

It similarly works if we use the XHTML from the question (after changing jp to ja), or if we use Firefox, or if we use a Windows or macOS browser.
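
For reference, this is the XHTML from the question with only the language code changed from jp to ja; everything else is left as the asker wrote it:

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
    "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<!-- "ja" is the language code for Japanese; "jp" is the country code for Japan -->
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" lang="ja">
<head>
    <title>Test page</title>
    <meta http-equiv="Content-Type"
      content="application/xhtml+xml; charset=utf-8"/>
</head>
<body><div>今</div><div>込</div></body></html>
```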

Note that depending upon the browser and the OS, the use of Japanese glyphs may be implemented by way of language-specific glyphs within a single font (as is the case with the Noto Sans CJK JP font used by default for CJK characters in Chrome on Ubuntu) or by selecting a different default font. For instance, Chrome for Windows will use Microsoft YaHei for lang="zh" text but Yu Gothic for lang="ja" text.

Solution 2:[2]

Well, if all else fails, I would explicitly specify common Japanese fonts in the CSS. Look up which fonts are available on which platforms, and create a font stack.

Basically, just select fonts the old-fashioned way, and see if that fixes the problem.
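
As a rough sketch of what such a font stack might look like, the example below scopes it to Japanese content with the :lang(ja) selector. The particular fonts listed are an assumption about what is commonly preinstalled (Hiragino Kaku Gothic ProN on macOS, Yu Gothic and Meiryo on Windows, Noto Sans CJK JP on many Linux distributions); check them against the platforms you actually need to support.

```html
<!DOCTYPE html>
<html lang="ja">
<head>
  <meta charset="utf-8">
  <title>Test page</title>
  <style>
    /* Applies only to content whose language is Japanese.
       The font list is illustrative, not exhaustive; the browser uses the
       first installed font and falls back to the generic sans-serif. */
    :lang(ja) {
      font-family: "Hiragino Kaku Gothic ProN", "Yu Gothic", Meiryo,
                   "Noto Sans CJK JP", sans-serif;
    }
  </style>
</head>
<body>
  <div>今</div>
  <div>込</div>
</body>
</html>
```

Scoping the rule with :lang(ja) rather than applying it to body keeps the stack from affecting any non-Japanese passages that are marked with their own lang attribute.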

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mark Amery
Solution 2 timw4mail