juce::String to const char*?

Hi all,

Let’s suppose that I have created a juce::String from a C-style const char*,

const char* cstring = "Some text";
juce::String string(cstring);

How do I convert it back to a const char*? The standard library has std::string::c_str(), but is there a juce::String function doing the same? I have found juce::String::toWideCharPointer(), which gives me a const wchar_t*, but that’s not what I want. I need to pass the string to another function which expects a C-style const char* (not wide character). I figured that I could get a UTF-8 pointer and just cast it to const char*,

const char* new_cstring = static_cast<const char*> (string.toUTF8());

since UTF-8 is backward-compatible with ASCII, but that seems like a very ugly solution and probably wrong. Is there a better way?

Any help appreciated…

Yes, that’s the way. But when you’re passing it to a function you don’t need to cast it, so it’s not ugly at all, e.g.

someOldCFunction (myString.toUTF8());

I think this reads better than c_str because it makes it totally clear that what you’re getting is UTF-8, which may be relevant in some situations.

…and BTW I’d always recommend never putting the pointer in a temporary variable like you did, because just like c_str, the pointer will become invalid when the string changes, so there’s a danger that if you keep the pointer hanging around, you’ll accidentally use it later when it’s toast.

Yeah, I know it will be invalidated, just like with c_str(), so I am copying it to another char* using strcpy if I want to keep it around.

Thanks for the quick reply!

If I remember how juce::String is implemented, you can probably roll your own simple class that wraps a juce::String and exposes it as a char* with suitable conversion operators. Just keep a reference to the original String when your class constructs and you won’t have to make a copy.

Jules, is this possible?

Sure, it’s possible to use strcpy or to build your own wrapper object. But the best way to keep a copy of that char* is to just keep the String object, and then call toUTF8() on it whenever you need the raw pointer.

I wouldn’t use strcpy, because it’s just so 1990s. And it’s far less efficient than taking a copy of the String, which would simply involve a ref-count, not a malloc/copy.

And I wouldn’t wrap it in an object with an implicit cast to const char*, because this isn’t an operation that should be hidden from view. I deliberately avoided giving the String class an operator const char*() because I think it’s best to always see “toUTF8()” in code that uses it, so that any reader will immediately be aware that a raw pointer is being taken, and that the encoding is UTF-8.

And it’s worth noting that toUTF8() is completely free - it just returns a pointer to the underlying string, so compilers will optimise it down to nothing. So there’s no reason at all to bother caching the pointer that it returns.

For information, I’m wondering if the two implementations discussed here are really equivalent… I had a hard time writing a function which converts ANSI-coded text files into Unicode. Because the text files contain French characters, I was unable to succeed using any standard juce::String function, because of the juce_wchar behaviour.

For example, the file contains the character “é” somewhere, and I wanted to detect it so that I could convert it into CharPointer_UTF8 ("\xc3\xa9"). The only way to get the value ‘-23’ (or the unsigned equivalent, 233) was to use static_cast<const char*>. Any solution based on juce_wchar gives me ‘9’ instead… Maybe I have done something wrong somewhere, but in the end I was able to make it work. Moreover, the text files I am talking about were written with JUCE 1.49, if I remember correctly :mrgreen:
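For anyone following along, here is a minimal standalone sketch (plain C++, no JUCE) of where those numbers come from. It assumes the file is Latin-1 / Windows-1252, where “é” is the single byte 0xE9. The helper names byteValues and asSignedChar are made up for this illustration.

```cpp
#include <vector>

// Returns every byte of a C string as an unsigned value in the range 0..255.
std::vector<int> byteValues(const char* text)
{
    std::vector<int> values;
    for (const char* p = text; *p != '\0'; ++p)
        values.push_back(static_cast<unsigned char>(*p));
    return values;
}

// Returns the value the same 8 bits have when read as a two's-complement
// signed char (-128..127).
int asSignedChar(unsigned char byte)
{
    return byte >= 128 ? static_cast<int>(byte) - 256 : static_cast<int>(byte);
}
```

So the Latin-1 byte for “é” is 233 when read as unsigned and -23 when read as signed, while the UTF-8 form of the same character is the two bytes 0xC3 0xA9 (195, 169).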

Working with Windows French ANSI text files is really a pain…

Is “é” standard ASCII? I don’t think so… you would need a special conversion table, and treat the JUCE string as plain ASCII with no conversion.

I could be wrong though.

I’m not an expert on ASCII codes, but the ASCII tables I have seen always had the same “extended characters”:

http://www.asciitable.com/

Anyway, I brought this up because the solution I found to replace the characters does not look optimal to me, and because I think a “standard” JUCE function should return ASCII codes between 0 and 255 instead of 0 and 127… It’s just my opinion :mrgreen:

ASCII is only the characters from 0 to 127, see here for example.
AFAIK the extended ASCII range may vary depending on the system language and such.

Chris

:shock: You do know that you can’t get a value beyond 127 with a signed 8-bit char, right?

As TheVinn said exactly: “you would need a special conversion table, and treat the JUCE string as plain ASCII with no conversion.”

First, I have found a workaround, so I have code which allows me to do the conversion successfully :wink:

My problem was that the standard JUCE functions of the String class are unable to give something as simple as the ASCII value of the extended characters, because of the use of juce_wchar: they only give values between 0 and 127. So I had to use a cast to get what I needed. I would have liked a function which returns a char pointer, to get values between 0 and 255 or -128 and 127, whatever…

And one of the reasons why it annoyed me is that Visual Studio 2008 can show the content of String objects in debug mode. Well, sort of… If the String contains “special French characters”, they are displayed correctly when they are coded as ANSI, and treated wrongly by JUCE (which considers them as UTF8). And when the String is displayed correctly in JUCE, what Visual Studio displays is wrong :lol:

Well, is it that wrong to think that one of the String functions should return the content of the data with 8 bits instead of 7? I don’t think I am the only one who uses Strings for languages other than English with special characters, like French or Spanish?

And my bad about the length of the original ASCII table :wink:

That’s not simple. In fact, it’s impossible because ASCII does not contain any extended characters.

What you’re talking about is not ASCII, it’s ANSI code-pages - and to convert unicode to a specific code-page, you need to use a conversion table for the codepage you want. I’ve not included any of that bloat in juce, because it’s rarely needed these days, since most things are finally moving to unicode and utf-8. Sorry if Visual studio wrongly assumes that the const char* is ANSI rather than UTF8, but that’s not the fault of the juce class.
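To make the “conversion table” idea concrete, here is a minimal standalone sketch in plain C++. It assumes the code page is Windows-1252, which is only a guess for Western European Windows files; any other code page needs its own table. The names cp1252High, appendUtf8 and cp1252ToUtf8 are made up for this illustration and are not part of JUCE.

```cpp
#include <cstdint>
#include <string>

// Unicode code points for the Windows-1252 bytes 0x80..0x9F.
// 0 marks the five bytes that Windows-1252 leaves undefined.
static const std::uint16_t cp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

// Appends the UTF-8 encoding of a code point below U+10000.
static void appendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts Windows-1252 bytes to UTF-8. Bytes 0x00..0x7F and 0xA0..0xFF map
// to the code point with the same number; 0x80..0x9F go through the table.
// Undefined bytes become U+FFFD (the Unicode replacement character).
std::string cp1252ToUtf8(const std::string& ansi)
{
    std::string utf8;
    for (unsigned char byte : ansi)
    {
        std::uint32_t cp = byte;
        if (byte >= 0x80 && byte <= 0x9F)
        {
            cp = cp1252High[byte - 0x80];
            if (cp == 0)
                cp = 0xFFFD;
        }
        appendUtf8(utf8, cp);
    }
    return utf8;
}
```

The raw bytes have to be read without going through a String first. The result should then be safe to hand to something like juce::String::fromUTF8(), since it really is UTF-8 at that point.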

I understand that what I have tried to do is not used by everybody here :wink: And I don’t think JUCE is the cause of Visual Studio strange behaviour :mrgreen:

However, before I found the solution to do the conversion from ANSI to Unicode / UTF8, I was looking for a “simple” solution in the JUCE classes, I mean a function of the String class that I could just call to get what I expect, and I did not find it. I was able to succeed because I saw this thread. I have to confess I am not at all familiar with the use of casts.

So, what I propose is just the following: adding a function to the String class which returns a char, so anybody who handles ANSI-coded data would be able to access characters coded between -128 and -1 or 128 and 255, without using something I personally judge not “JUCE spirit-compatible” like a static cast. Or maybe just adding something about that to the JUCE documentation. That’s it.

Then you may also say no, I don’t want to get a “balls breaking” point for such a request :lol: It’s just that everything related to coding a multi-target JUCE application with French strings on Windows is tedious enough as it is… (I have just thrown away all my JUCE_T("") in favour of the UTF8 representation of strings, thanks to the IntroJucer utility :mrgreen: )

Moreover, I can tell that Visual Studio 2008’s cpp/h files and behaviour are ANSI-like, so we may say that ANSI is still used today by a certain number of people. I don’t know about Visual Studio 2010 and 2012…

The reason you’re getting a balls-breaking is because you seem to think that a String can be used to hold ANSI data - it can’t! It holds UTF-8, end of story! So to ask for a way to “cast” it to get ANSI data out makes no sense!

Maybe you’ve somehow managed to force some ANSI into a String class by pretending that it’s UTF-8 or ASCII? That’s a bad idea - the String class assumes that its data is in UTF-8 format, and if it’s not then Bad Things will happen.

OK, so let’s start again from the beginning. I have just used the function File::loadFileAsString() to read the content of an ANSI-coded text file.

What should I have done to avoid putting ANSI-coded data into a String object in the first place? Load the file as binary data and do the table lookup at that point?

Maybe you should state in the documentation of this function that it does not work with ANSI text files…

Ok, fair point about the documentation for loadFileAsString - I’ll make that more explicit.

BUT here’s the real issue:

If your code is supposed to run on users’ machines, and will read ANSI files of unknown origin, then you can’t possibly know the correct code-page for them. Unless you actually ask the user to choose a code-page, it’s better to just say that your program requires UTF-8 input, so it’s not your problem.

Or… if this file is something you’re working on locally or embedding into your app, why not encode it in UTF-8? There’s little reason for using ANSI nowadays except in some legacy situations.

I agree with you on all points.

My problem is that I have updated an old version of a program I wrote several years ago with JUCE, which reads and writes text files. The new version reads and writes Unicode text files, of course, but it has to stay compatible with the data generated by the old version, which was ANSI. My program has to read the files that users created back then, not only mine. So I have to read ANSI text files in my program and convert them to UTF8. That’s why I didn’t have any other choice.

Ok, well in that case you’re probably safe to assume that it was encoded with their local code-page, so a little conversion routine that uses the C library would do the job. If they’ve changed their machine’s locale since creating the file then you’ll have problems, but I guess that’s unlikely.
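A rough sketch of that kind of routine, using only the standard C library via C++. It assumes the caller has already selected the user’s locale with std::setlocale (LC_CTYPE, ""), and that the text has no embedded NUL bytes. The name localeBytesToWide is made up for this illustration.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Converts bytes in the current C locale's encoding to a wide string.
// Returns an empty string if the bytes are not valid in that encoding.
std::wstring localeBytesToWide(const std::string& bytes)
{
    // Each wide character consumes at least one input byte, so this is enough
    // room for the result plus the terminator.
    std::vector<wchar_t> buffer(bytes.size() + 1);

    const std::size_t written = std::mbstowcs(buffer.data(), bytes.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1))
        return std::wstring();

    return std::wstring(buffer.data(), written);
}
```

The wide string can then be used to build a String (juce::String has a constructor taking a const wchar_t*), and the file can be written back out as UTF-8 from there. What the bytes above 127 turn into depends entirely on the active locale, which is the caveat about the user changing their machine’s locale.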

Yep, that’s what I have done :wink: