Juce::String to const char*?

timur · July 29, 2012, 11:18am

Hi all,

Let’s suppose that I have created a juce::String from a C-style const char*:

const char* cstring = "Some text";
juce::String string(cstring);

How do I convert it back to a const char*? The standard library has std::string::c_str(), but is there a juce::String function doing the same? I have found juce::String::toWideCharPointer(), which gives me a const wchar_t*, but that’s not what I want. I need to pass the string to another function which expects a C-style const char* (not wide character). I figured that I could get a UTF-8 pointer and just cast it to const char*,

const char* new_cstring = static_cast<const char*> (string.toUTF8());

since UTF-8 is backward-compatible with ASCII, but that seems like a very ugly solution and probably wrong. Is there a better way?

Any help appreciated…

jules · July 29, 2012, 11:45am

Yes, that’s the way. But when you’re passing it to a function you don’t need to cast it, so it’s not ugly at all, e.g.

someOldCFunction (myString.toUTF8());

I think this reads better than c_str because it makes it totally clear that what you’re getting is UTF-8, which may be relevant in some situations.
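To see why passing the pointer straight into the call is safe, here is a minimal sketch that uses std::string::c_str() as a stand-in for toUTF8(), since it follows the same lifetime rule. someOldCFunction is hypothetical and is assumed only to read the string during the call. A pointer taken from a temporary stays valid until the end of the full expression, which covers the whole call.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical legacy C-style API: it only reads the string during the call
// and does not keep the pointer afterwards.
std::size_t someOldCFunction(const char* text)
{
    return std::strlen(text);
}

// Builds a temporary string and passes its raw pointer straight into the call.
// The temporary lives until the end of the full expression, so the pointer
// returned by c_str() stays valid for the whole call.
std::size_t passDirectly(const std::string& prefix)
{
    return someOldCFunction((prefix + " world").c_str());
}
```

The pointer is never stored anywhere, so there is nothing left over that could dangle once the call returns.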

jules · July 29, 2012, 11:47am

…and BTW I’d always recommend never storing the pointer in a local variable like you did, because just like with c_str(), the pointer becomes invalid when the string changes. If you keep the pointer hanging around, there’s a danger that you’ll accidentally use it later when it’s toast.
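The same hazard can be sketched with std::string standing in for juce::String: keep the owning string, and fetch the raw pointer only at the point of use. The function name lengthAfterAppend is illustrative only.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Keep the owning string around and take the raw pointer only where it is
// needed; never cache the pointer across modifications.
std::size_t lengthAfterAppend(std::string text, const char* suffix)
{
    // const char* cached = text.c_str();  // would dangle after the append below
    text += suffix;                        // may reallocate the character buffer
    return std::strlen(text.c_str());      // fresh pointer, taken after the change
}
```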

timur · July 29, 2012, 11:57am

Yeah, I know it will be invalidated, just like with c_str(), so if I want to keep it around I copy it into another char buffer using strcpy.

Thanks for the quick reply!

TheVinn · July 29, 2012, 11:12pm

If I remember how juce::String is implemented, you can probably roll your own simple class that wraps a juce::String and exposes it as a char* with suitable conversion operators. Just keep a reference to the original String when your class is constructed and you won’t have to make a copy.

Jules, is this possible?

jules · July 30, 2012, 7:01am

Sure, it’s possible to use strcpy or to build your own wrapper object. But the best way to keep a copy of that char* is to just keep the String object, and then call toUTF8() on it whenever you need the raw pointer.

I wouldn’t use strcpy, because it’s just so 1990s. And it’s far less efficient than taking a copy of the String, which simply involves incrementing a ref-count, not a malloc/copy.
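Jules’s point about copies being cheap relies on reference counting. The sketch below is not JUCE’s actual implementation; it is a hypothetical SharedText handle built on std::shared_ptr to show the idea: copying the handle increments a count, and both copies point at the same characters with no new allocation.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical ref-counted, immutable string handle. Copying the handle only
// bumps a reference count; the characters themselves are never duplicated.
class SharedText
{
public:
    explicit SharedText(std::string s)
        : data(std::make_shared<const std::string>(std::move(s))) {}

    const char* toRawPointer() const { return data->c_str(); }
    long useCount() const { return data.use_count(); }

private:
    std::shared_ptr<const std::string> data;
};
```

juce::String does its own internal ref-counting, so in real JUCE code the equivalent is simply to keep a copy of the String and call toUTF8() when the raw pointer is needed.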

And I wouldn’t wrap it in an object with an implicit cast to const char*, because this isn’t an operation that should be hidden from view. I deliberately avoided giving the String class an operator const char*() because I think it’s best to always see “toUTF8()” in code that uses it, so that any reader will immediately be aware that a raw pointer is being taken, and that the encoding is UTF-8.
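A sketch of the design Jules describes, using a hypothetical Utf8Text wrapper over std::string: it offers a named accessor and deliberately no conversion operator, and a static_assert documents that an implicit conversion to const char* is impossible.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical wrapper that deliberately has no operator const char*():
// callers must write .toUtf8Pointer(), so taking a raw pointer (and the fact
// that it is UTF-8) stays visible in the calling code.
class Utf8Text
{
public:
    explicit Utf8Text(std::string utf8) : text(std::move(utf8)) {}
    const char* toUtf8Pointer() const noexcept { return text.c_str(); }

private:
    std::string text;
};

static_assert(!std::is_convertible<Utf8Text, const char*>::value,
              "no implicit conversion to a raw pointer");
```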

And it’s worth noting that toUTF8() is completely free - it just returns a pointer to the underlying string, so compilers will optimise it down to nothing. So there’s no reason at all to bother caching the pointer that it returns.

IvanC · October 13, 2012, 8:40am

For information, I’m wondering whether the two approaches discussed here are really equivalent… I had a hard time writing a function that converts ANSI-coded text files into Unicode. Because the text files contain French characters, I couldn’t get it to work with any of the standard juce::String functions, because of the way juce_wchar behaves.

For example, somewhere in the file I have the character “é”, and I wanted to detect it in order to convert it to its UTF-8 form, CharPointer_UTF8 ("\xc3\xa9"). The only way I found to get the value ‘-23’ (or its unsigned equivalent, 233) was to use static_cast<const char*>. Any solution based on juce_wchar gives me ‘9’ instead… Maybe I did something wrong somewhere, but in the end I was able to make it work. Also, the text files I am talking about were written with JUCE 1.49, if I remember correctly.
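The ‘-23’ here is just the byte 0xE9 (“é” in Windows-1252 and ISO-8859-1) read through a signed char. A small sketch, assuming 8-bit chars and two’s-complement representation, as on all mainstream compilers; whether plain char is signed is implementation-defined (it is signed by default with MSVC and with GCC/Clang on x86).

```cpp
#include <cassert>

// Returns the byte as 0..255, regardless of whether plain char is signed.
int byteValue(char c)
{
    return static_cast<unsigned char>(c);
}

// Returns the byte as -128..127, which is what a signed char shows.
int signedByteValue(char c)
{
    return static_cast<signed char>(c);
}
```

Converting through unsigned char is the usual way to get 0..255 out of raw bytes; neither value is a Unicode code point unless the code page is known.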

Working with Windows French ANSI text files is really a pain…

TheVinn · October 13, 2012, 3:06pm

Is “é” standard ASCII? I don’t think so… you would need a special conversion table, and treat the JUCE string as plain ASCII with no conversion.

I could be wrong though.

IvanC · October 15, 2012, 9:46am

I’m not an expert on ASCII codes, but the ASCII tables I have seen always had the same “extended characters”:

http://www.asciitable.com/

Anyway, I brought this up because the solution I found for replacing the characters doesn’t seem optimal to me, and because I think a “standard” JUCE function should return ASCII codes between 0 and 255 instead of 0 and 127… That’s just my opinion.

ckk · October 15, 2012, 9:51am

ASCII covers only the characters from 0 to 127, see here for example.
AFAIK the “extended ASCII” range varies depending on the system language and such.

Chris

jrlanglois · October 16, 2012, 6:24pm

You do know that you can’t get a value beyond 127 from a signed char when chars are 8 bits wide, right?

As TheVinn said exactly: “you would need a special conversion table, and treat the JUCE string as plain ASCII with no conversion.”

IvanC · October 17, 2012, 5:42am

First, I have found a workaround, so I now have code that does the conversion successfully.

My problem was that the standard functions of the JUCE String class can’t give me something as simple as the ASCII value of the extended characters, because they use juce_wchar: they only give values between 0 and 127. So I had to use a cast to get what I needed. I would have liked a function that returns a char pointer, so I could get values between 0 and 255 or -128 and 127, whatever…

One of the reasons it annoyed me is that Visual Studio 2008 can show the content of String objects in debug mode. Well, sort of… If the String contains “special French characters”, they are displayed correctly when they are encoded as ANSI, but then JUCE handles them wrongly (because it treats them as UTF-8). And when the String is correct from JUCE’s point of view, what Visual Studio displays is wrong.

Well, is it that wrong to think that one of the String functions should return the content of the data as 8-bit values instead of 7-bit ones? I don’t think I am the only one using Strings for languages other than English that have special characters, like French or Spanish?

And my bad about the size of the original ASCII table.

jules · October 17, 2012, 9:18am

That’s not simple. In fact, it’s impossible because ASCII does not contain any extended characters.

What you’re talking about is not ASCII, it’s ANSI code-pages - and to convert Unicode to a specific code-page, you need to use a conversion table for the code-page you want. I’ve not included any of that bloat in juce, because it’s rarely needed these days, since most things are finally moving to Unicode and UTF-8. Sorry if Visual Studio wrongly assumes that the const char* is ANSI rather than UTF-8, but that’s not the fault of the juce class.
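A minimal sketch of the table lookup Jules describes, for a single code page: Windows-1252, which is what “ANSI” usually means on French and other Western European Windows systems. Assumptions: the input really is Windows-1252 (another code page needs another table), and the five bytes that code page leaves undefined become U+FFFD, which is this sketch’s own choice. Bytes 0x00 to 0x7F and 0xA0 to 0xFF map to the identical code points, so only 0x80 to 0x9F need a table.

```cpp
#include <cassert>
#include <string>

// Code points for Windows-1252 bytes 0x80..0x9F. The bytes the code page
// leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to U+FFFD here.
static const char32_t cp1252High[32] = {
    0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
    0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
};

// Appends one Basic Multilingual Plane code point as 1 to 3 UTF-8 bytes;
// every Windows-1252 character lies in the BMP.
static void appendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Converts Windows-1252 bytes to a UTF-8 byte string.
std::string cp1252ToUtf8(const std::string& ansi)
{
    std::string out;
    out.reserve(ansi.size() * 2);
    for (char c : ansi) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (b >= 0x80 && b <= 0x9F)
            appendUtf8(out, cp1252High[b - 0x80]);
        else
            appendUtf8(out, b);  // 0x00..0x7F and 0xA0..0xFF equal their code points
    }
    return out;
}
```

For example, cp1252ToUtf8("caf\xE9") yields "caf\xC3\xA9", the same two bytes for “é” mentioned earlier in the thread; a buffer like that is proper UTF-8 and could then be handed to juce::String::fromUTF8().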

IvanC · October 17, 2012, 10:16am

I understand that what I have been trying to do is not something everybody here needs. And I don’t think JUCE is the cause of Visual Studio’s strange behaviour.

However, before I found the solution for converting from ANSI to Unicode / UTF-8, I was looking for a “simple” solution in the JUCE classes, I mean a String function I could just call to get what I expected, and I did not find one. I only succeeded because I saw this thread. I have to confess I am not at all familiar with using casts.

So what I propose is just the following: add a function to the String class that returns a char, so anybody dealing with ANSI-coded data could access characters coded between -128 and -1 (or 128 and 255) without using something I personally don’t consider “JUCE spirit-compatible”, like a static_cast. Or maybe just add a note about this to the JUCE documentation. That’s it.

Of course you may also say no; I don’t want to get a “balls-breaking” for such a request. It’s just that everything related to coding a multi-target JUCE application with French strings on Windows is tedious enough as it is… (I have just replaced all my JUCE_T("") literals with UTF-8 representations of the strings, thanks to the IntroJucer utility.)

Moreover, I can tell you that Visual Studio 2008’s handling of cpp/h files is ANSI-based, so it’s fair to say that ANSI is still used today by a fair number of people. I don’t know about Visual Studio 2010 and 2012…

jules · October 17, 2012, 10:47am

The reason you’re getting a balls-breaking is because you seem to think that a String can be used to hold ANSI data - it can’t! It holds UTF-8, end of story! So to ask for a way to “cast” it to get ANSI data out makes no sense!

Maybe you’ve somehow managed to force some ANSI into a String class by pretending that it’s UTF-8 or ASCII? That’s a bad idea - the String class assumes that its data is in UTF-8 format, and if it’s not then Bad Things will happen.

IvanC · October 17, 2012, 1:27pm

OK, so let’s start again from the beginning. I just used File::loadFileAsString() to read the content of an ANSI-coded text file.

What should I have done to avoid putting ANSI-coded data into a String object in the first place? Load the file as binary data and do the table lookup at that point?

Maybe the documentation of this function should state that it does not work with ANSI text files…
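Loading the file as binary data first does avoid putting non-UTF-8 bytes into a String. Below is a standard-C++ sketch of just the reading step; readFileBytes is a hypothetical helper, and in JUCE, File::loadFileAsData() into a MemoryBlock serves the same purpose. The raw bytes would then go through a code-page table before being turned into a UTF-8 string.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>

// Reads a file's raw bytes without any text decoding, so legacy "ANSI"
// bytes such as 0xE9 arrive unchanged and can be converted explicitly
// before building a UTF-8 string. Returns an empty string if the file
// cannot be opened.
std::string readFileBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```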

jules · October 17, 2012, 1:37pm

Ok, fair point about the documentation for loadFileAsString - I’ll make that more explicit.

BUT here’s the real issue:

If your code is supposed to run on users’ machines, and will read ANSI files of unknown origin, then you can’t possibly know the correct code-page for them. Unless you actually ask the user to choose a code-page, it’s better to just say that your program requires UTF-8 input, so it’s not your problem.

Or… if this file is something you’re working on locally or embedding into your app, why not encode it in UTF-8? There’s little reason for using ANSI nowadays except in some legacy situations.

IvanC · October 17, 2012, 1:50pm

I agree with you on all points.

My problem is that I have updated a program I wrote with JUCE several years ago, which reads and writes text files. The new version reads and writes Unicode text files of course, but it has to be compatible with the data generated by the old version, which was ANSI. My program has to read the files that users created back then, not only my own. So I have to read ANSI text files in my program and convert them to UTF-8. That’s why I didn’t have any other choice.

jules · October 17, 2012, 1:58pm

Ok, well in that case you’re probably safe to assume that it was encoded with their local code-page, so a little conversion routine that uses the C library would do the job. If they’ve changed their machine’s locale since creating the file then you’ll have problems, but I guess that’s unlikely.
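A sketch of such a C-library routine, under stated assumptions: the program calls std::setlocale(LC_CTYPE, "") once at startup so the user’s locale (on Windows this should select the user’s default ANSI code page) replaces the default "C" locale; std::mbstowcs then decodes the bytes, and the wide characters are re-encoded as UTF-8 by hand. It assumes each wchar_t holds one whole code point, which is true for single-byte code pages such as Windows-1252 but not for UTF-16 surrogate pairs. localeBytesToUtf8 is a hypothetical name.

```cpp
#include <cassert>
#include <clocale>
#include <cstdlib>
#include <string>
#include <vector>

// Decodes a NUL-terminated byte string in the C library's current locale
// encoding, then re-encodes it as UTF-8. Returns an empty string if the
// bytes are invalid in that encoding. Surrogate pairs are not combined.
std::string localeBytesToUtf8(const char* bytes)
{
    const std::size_t n = std::mbstowcs(nullptr, bytes, 0);
    if (n == static_cast<std::size_t>(-1))
        return {};

    std::vector<wchar_t> wide(n + 1);
    std::mbstowcs(wide.data(), bytes, n + 1);

    std::string out;
    for (std::size_t i = 0; i < n; ++i) {
        const unsigned long cp = static_cast<unsigned long>(wide[i]);
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

As Jules notes, this is only as reliable as the assumption that the file was written under the same locale the program is now running in.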

IvanC · October 17, 2012, 2:41pm

Yep, that’s what I did.

Related topics (replies · views · last activity):
- String to char* (General JUCE discussion): 35 · 3886 · May 12, 2017
- Char * and juce::String (General JUCE discussion): 3 · 1737 · May 12, 2017
- Things that are confusing about String (Feature Requests): 24 · 2081 · January 31, 2022
- Convert Juce::String to std::string and vice versa (General JUCE discussion): 4 · 6837 · May 12, 2017
- Const char* and char* (General JUCE discussion): 10 · 1000 · November 7, 2011