Is there any way to know whether a String is English or not?


#1

Hi,

Is there any way to know whether a String is English or not? Here is my situation: I basically want to show the Helvetica font in a popup menu, but change the font when a menu item contains another language, in order to prevent garbage characters. Ideally, I want to create a menu item in which multiple fonts are mixed (e.g. English is drawn with Helvetica, and another font is selected for Japanese). Thank you,


#2

Not really. I suppose you could iterate through each character of a string and break when you find a character that matches a letter in the English alphabet. This is not a good solution though, since English is only one of many languages written with the Latin alphabet. So it’s not really specific to English; other languages like French or Spanish would be detected as English as well.

The proper way to render text with multiple fonts is to use the TextLayout class. TextLayout supports automatic font fallback on several modern operating systems (OS X 10.5+, iOS 3.2+, Windows Vista with the Platform Update and later). This means that if the font you are using does not contain some of the characters you wish to display, another font that does contain them will be chosen automatically, so you should never really end up with garbage characters being displayed.


#3

[quote=“jazzunko”]Hi,

Is there any way to know whether a String is English or not? Here is my situation: I basically want to show the Helvetica font in a popup menu, but change the font when a menu item contains another language, in order to prevent garbage characters. Ideally, I want to create a menu item in which multiple fonts are mixed (e.g. English is drawn with Helvetica, and another font is selected for Japanese). Thank you,[/quote]

It’s easy: build a Markov chain for each language you want to detect, one from a good list of English words, another from Spanish words, etc.
Then, to detect the language of a string, run the string through each of these transition matrices and sum the values of the transitions that each successive pair of characters activates. The matrix that yields the highest sum tells you the most probable language of the string.
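
A minimal sketch of that idea in plain C++, under some assumptions that are not from the post above: the names `BigramModel` and `detectLanguage` are hypothetical, the text is treated as single-byte characters, and the score is a sum of log-probabilities with add-one smoothing rather than raw matrix values (a substitution, so that unseen transitions do not dominate the result).

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical character-bigram model for one language:
// counts how often character a is followed by character b.
struct BigramModel
{
    std::map<std::pair<char, char>, int> counts; // transitions a -> b
    std::map<char, int> totals;                  // transitions leaving a

    void train(const std::vector<std::string>& words)
    {
        for (const auto& w : words)
            for (std::size_t i = 0; i + 1 < w.size(); ++i)
            {
                ++counts[std::make_pair(w[i], w[i + 1])];
                ++totals[w[i]];
            }
    }

    // Sum of log-probabilities of every transition in the text.
    // Add-one smoothing over an assumed 256-symbol alphabet keeps
    // unseen transitions from producing log(0).
    double score(const std::string& text) const
    {
        const double alphabetSize = 256.0; // assumption: single-byte chars
        double sum = 0.0;
        for (std::size_t i = 0; i + 1 < text.size(); ++i)
        {
            const auto c = counts.find(std::make_pair(text[i], text[i + 1]));
            const auto t = totals.find(text[i]);
            const double num = (c != counts.end() ? c->second : 0) + 1.0;
            const double den = (t != totals.end() ? t->second : 0) + alphabetSize;
            sum += std::log(num / den);
        }
        return sum;
    }
};

// Returns the name of the model with the highest score for the text.
std::string detectLanguage(const std::map<std::string, BigramModel>& models,
                           const std::string& text)
{
    std::string best;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (const auto& entry : models)
    {
        const double s = entry.second.score(text);
        if (s > bestScore)
        {
            bestScore = s;
            best = entry.first;
        }
    }
    return best;
}
```

Each model would be trained once from its word list and kept around; strings shorter than two characters have no transitions, so every model scores them equally and the result is not meaningful.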

But if you only need to distinguish between English and Japanese, you can get a UTF-32 copy of the string and check whether it contains characters in the Japanese ranges of the Unicode tables (0x3000-0x30FF for CJK punctuation, hiragana and katakana, and 0x4E00-0x9FAF for kanji). If it does, the string is in Japanese.
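
The range check above can be sketched in plain C++ as follows. This assumes the string has already been converted to UTF-32 and stored in a `std::u32string`; the function name `containsJapaneseRangeChars` is hypothetical. Note that the ideograph range is shared with Chinese, so this only separates "Latin-only" from "contains CJK" text.

```cpp
#include <string>

// Returns true if any code point falls in the ranges quoted above:
// U+3000-U+30FF (CJK punctuation, hiragana, katakana) or
// U+4E00-U+9FAF (CJK ideographs, i.e. kanji).
bool containsJapaneseRangeChars(const std::u32string& text)
{
    for (char32_t c : text)
    {
        const bool kanaOrPunctuation = (c >= 0x3000 && c <= 0x30FF);
        const bool ideograph = (c >= 0x4E00 && c <= 0x9FAF);

        if (kanaOrPunctuation || ideograph)
            return true;
    }

    return false;
}
```

A menu-drawing routine could call this once per item and only switch away from Helvetica when it returns true.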


#4

Thank you for your kind answers, sonic59 and Xavi! I’m now trying to use TextLayout in LookAndFeel::drawPopupMenuItem() instead of drawFittedText, after calling Font::setFallbackFontName(). When I create the TextLayout object, I set the Helvetica font. However, even if a menu item contains Japanese, Helvetica is used and the text is garbled. I then read the discussions related to fallback fonts but couldn’t find a concrete answer. I’m probably making a mistake in how I use TextLayout and the fallback font…


#5

You’re welcome.
For your program I think a good (and portable) solution would be to serialize a Unicode font containing both Latin characters and kanji into your executable. That way you wouldn’t have problems rendering English and Japanese text with the same font, and the Latin characters and kanji would have the same “style”. There are a lot of free(?) fonts you can use, like these: http://cooltext.com/Fonts-Unicode-Japanese
Check this thread: http://www.rawmaterialsoftware.com/viewtopic.php?f=2&t=5699&hilit=fontbuilder


#6

That makes sense in the usual case, but in my case:

  • I want to show Helvetica as much as possible in order to keep the overall look and feel of my app.
  • It would be rare for Japanese or other non-English text to be entered, but I want to prevent garbled characters in that rare case.
    What I’m aiming for is probably not very smart, though…