[RFC] Typesafe printf-like


#1

Hi all,

I’m in the process of writing a typesafe, printf-like feature for Juce.
The basic idea is to avoid the crashes we are used to seeing with code like String::formatted("some text: %s", String("some value")).
Also, since I use the TRANS macro a lot for i18n, the code pattern
String::formatted(TRANS("The %s %s is now closed"), TRANS("blue"), TRANS("house"));
would work, and would allow positional arguments in the translations (like "La %2$s %1$s est maintenant fermée" for an out-of-order translation).
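
To make the positional-argument idea concrete, here is a minimal sketch of how "%s" and "%N$s" could be expanded. It is not the actual code: expandPositional is a made-up helper, it uses std::string instead of Juce's String, and it only handles arguments that are already strings.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not Juce API): expands "%s", "%N$s" and "%%",
// taking every argument as an already-converted string.
std::string expandPositional(const std::string& fmt,
                             const std::vector<std::string>& args)
{
    std::string out;
    std::size_t next = 0; // argument consumed by the next plain "%s"

    for (std::size_t i = 0; i < fmt.size(); ++i)
    {
        if (fmt[i] != '%') { out += fmt[i]; continue; }
        ++i; // step past '%'
        if (i < fmt.size() && fmt[i] == '%') { out += '%'; continue; }

        // An optional "N$" prefix selects argument N (1-based).
        std::size_t j = i, n = 0;
        while (j < fmt.size() && std::isdigit(static_cast<unsigned char>(fmt[j])))
            n = n * 10 + static_cast<std::size_t>(fmt[j++] - '0');

        std::size_t index;
        if (j > i && j < fmt.size() && fmt[j] == '$')
        {
            if (n == 0) throw std::invalid_argument("positions start at 1");
            index = n - 1;
            i = j + 1; // step past "N$"
        }
        else
        {
            index = next++;
        }

        if (i >= fmt.size() || fmt[i] != 's')
            throw std::invalid_argument("only %s is handled in this sketch");
        if (index >= args.size())
            throw std::invalid_argument("not enough arguments");
        out += args[index];
    }
    return out;
}

// Usage: the call site keeps the same argument order for both languages.
void positionalExample()
{
    assert(expandPositional("The %s %s is now closed", { "blue", "house" })
           == "The blue house is now closed");
    assert(expandPositional("La %2$s %1$s", { "bleue", "maison" })
           == "La maison bleue");
}
```

The translator only edits the format string; the call site does not change.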

My main goals are:

  • Type safety: your code either compiles and runs successfully, fails to compile, or throws at runtime (since the format string is only processed at runtime).
  • A printf-compatible interface (all the “greatest common denominator” format strings work).
  • User-definable type specifiers. For example, you can register type “y” for MyClass and have a valid format string: Print("Dump of my class: %y", myClass)
  • You can replace the default formatting function with your own. As an example, you could have: Print("%64.s", String("some string to print in Base64"))

The last two options add a lot of complexity to the current code, and Jules and I were wondering whether we could drop them to get simpler code, at the cost of fewer features.
So, what do you think: are these features worth it? Do you already use them in printf under Linux?
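
For reference, the "compiles, or throws at runtime" behaviour from the first goal could look roughly like this with C++11 variadic templates. This is only a sketch under simplifying assumptions: std::ostringstream stands in for Juce's String, the helper names accepts and printTo are invented, and only %d, %f, %s and %% are understood (no flags, width or precision).

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical rule table: which conversion character accepts a value of type T.
template <typename T>
bool accepts(char conv)
{
    if (std::is_integral<T>::value)       return conv == 'd';
    if (std::is_floating_point<T>::value) return conv == 'f';
    return conv == 's'; // everything else is printed as a string
}

// Base case: no arguments left, so any remaining directive is an error.
inline void printTo(std::ostringstream& out, const char* fmt)
{
    for (; *fmt; ++fmt)
    {
        if (*fmt == '%' && *++fmt != '%')
            throw std::invalid_argument("more directives than arguments");
        out << *fmt;
    }
}

// Recursive case: copy text up to the next directive, check it against the
// static type of the first argument, print it, then recurse on the rest.
template <typename T, typename... Rest>
void printTo(std::ostringstream& out, const char* fmt,
             const T& value, const Rest&... rest)
{
    for (; *fmt; ++fmt)
    {
        if (*fmt != '%') { out << *fmt; continue; }
        const char conv = *++fmt;
        if (conv == '%') { out << '%'; continue; }
        if (!accepts<typename std::decay<T>::type>(conv))
            throw std::invalid_argument("argument type does not match directive");
        out << value;
        printTo(out, fmt + 1, rest...);
        return;
    }
    throw std::invalid_argument("more arguments than directives");
}

template <typename... Args>
std::string Print(const char* fmt, const Args&... args)
{
    std::ostringstream out;
    printTo(out, fmt, args...);
    return out.str();
}

// Usage: a mismatch throws instead of crashing.
void printExample()
{
    assert(Print("%s has %d items", "list", 3) == "list has 3 items");
    bool threw = false;
    try { Print("%s", 42); } // int passed where a string is expected
    catch (const std::invalid_argument&) { threw = true; }
    assert(threw);
}
```

A type that cannot be streamed at all fails at compile time on the `out << value` line, which covers the third outcome.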


#2

First, this would be very nice. It’s not easy getting colleagues to switch to boost::format, because, e.g., syslog(LOG_ERR, “%s”, (boost::format(“blah: %s, %s”) % foo % bar).str().c_str()) is so much more verbose than syslog(LOG_ERR, “blah: %s, %s”, foo, bar) and log << setLevel(LOG_ERR) << boost::format(“blah: %s, %s”) % foo % bar “doesn’t look like C”. Having functions that look like sprintf, syslog, etc. but are type-safe and extensible sounds like a trivial thing, but it’s incredibly helpful in practice. So, thanks for doing this!

Anyway, the last two options, the ones you suggested dropping, both look like they may be more complicated than necessary, so if they’re causing problems, I’d say drop them. But there may be ways to get a lot of the same benefits without most of the complexity. Without knowing your exact design, it’s hard to know exactly what to suggest, but let me toss out some ideas.

User-definable type specifiers aren’t all that useful, because there are only so many unused type specifiers, and a lot of types. Why not just use a single new specifier for all user-defined types, like Cocoa’s %@ or Python’s %r, or just let user-defined classes work with the existing specifiers, as in boost::format (where you can even do things like format("%#x") % myRational and it does what you’d hope)?

Either way, you don’t need any kind of registration mechanism; function overloads take care of it automatically. For %@, either it’s rendered by calling format(myClass, flags, width, precision, length) or it’s rendered by stream insertion on an internal stream that’s already been modified by flags, width, precision, and length. If you want to just use %s/etc., then format gets an extra specifier parameter, or the stream has been modified by the specifier. If the user has defined the appropriate overload for format or operator<<, it works; if not, it fails to compile. (Unless the user has defined format on a base class or another type that MyClass can be unambiguously implicitly converted to, in which case that’s probably exactly what they want.)
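
A minimal sketch of the overload idea, to show what I mean. The names FormatSpec and renderArgument and the Rational type are all invented for illustration, and std::string stands in for Juce's String. The library calls format unqualified on a template argument, so for class types the overload is found through argument-dependent lookup when the template is instantiated, even if it was declared after the library header.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical parsed directive; the field names are assumptions.
struct FormatSpec
{
    std::string flags;      // e.g. "#", "0", or user-defined flag characters
    int width = 0;
    int precision = -1;     // -1 means "not given"
    char specifier = 's';
};

// Overloads for built-in types must be declared before the template below,
// because argument-dependent lookup does not apply to them.
inline std::string format(int value, const FormatSpec& spec)
{
    std::ostringstream out;
    if (spec.specifier == 'x') out << std::hex;
    if (spec.flags.find('#') != std::string::npos) out << std::showbase;
    out << value;
    return out.str();
}

// Library side: unqualified call on a dependent argument. If no matching
// overload exists for T, this line fails to compile.
template <typename T>
std::string renderArgument(const T& value, const FormatSpec& spec)
{
    return format(value, spec);
}

// ---- user code, written after the "library" part above ----
namespace app
{
    struct Rational { int num, den; };

    // Reuses the int overload, so "%#x" works on both halves.
    std::string format(const Rational& r, const FormatSpec& spec)
    {
        return ::format(r.num, spec) + "/" + ::format(r.den, spec);
    }
}

// Usage: "%#x" applied to a user-defined type.
void overloadExample()
{
    FormatSpec spec;
    spec.flags = "#";
    spec.specifier = 'x';
    assert(renderArgument(app::Rational{10, 16}, spec) == "0xa/0x10");
    assert(renderArgument(255, spec) == "0xff");
}
```

No registration table is involved; adding a type means adding one function next to it.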

The only reason I’ve ever seen anyone want user-defined specifiers is so they can define multiple specifiers to render the same type (or type family) differently—e.g., %t formats MyDate as an ISO string, but %T formats it as seconds since 1970. You can get the same thing—with a whole lot more flexibility (there are only so many unused letters, after all)—by just adding more flags (anything in a reasonable subset of ASCII that’s not already used by printf is a flag) and letting each overload do what it wants with the flags. So, with the “%@” solution, “%@” with MyDate means ISO string, “%n@” means seconds since 1970, and you’ve still got dozens more flags to define other ways to render MyDate without interfering with the dozens of other types you want to print. With the existing-specifiers solution, it’s even easier (and more flexible): “%s” is ISO string, “%f” is seconds since 1970, “%#llX” is integral seconds since 1970 as a 16-character hex string, etc.
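
As a rough illustration of the flag idea, here is what a MyDate overload could look like for “%@” versus “%n@”. The MyDate struct, the format signature and the meaning of the n flag are assumptions taken from the example above, not an existing API.

```cpp
#include <cassert>
#include <ctime>
#include <string>

// Hypothetical date type holding seconds since 1970 (UTC).
struct MyDate { std::time_t secondsSince1970; };

// 'flags' holds whatever characters appeared between '%' and '@'.
std::string format(const MyDate& d, const std::string& flags)
{
    // "%n@": plain seconds since 1970.
    if (flags.find('n') != std::string::npos)
        return std::to_string(static_cast<long long>(d.secondsSince1970));

    // "%@": ISO 8601 string. std::gmtime uses a shared static buffer,
    // so this sketch is not thread-safe.
    const std::tm* utc = std::gmtime(&d.secondsSince1970);
    if (utc == nullptr)
        return "(invalid date)";

    char buf[32];
    std::strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", utc);
    return buf;
}

// Usage: same type, two renderings, no new specifier letters needed.
void dateExample()
{
    assert(format(MyDate{0}, "") == "1970-01-01T00:00:00Z");
    assert(format(MyDate{86400}, "n") == "86400");
}
```

Any other type is free to give n a different meaning, since each overload interprets its own flags.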

If you really want user-defined specifiers, you still don’t need registration, and don’t have to worry about conflicts: all the unused letters become specifiers instead of flags, so “%y” doesn’t work for standard types, but it does what you want for MyClass, and it can also do something different for MyOtherClass. The only problem is that this means fewer errors at compile time (unless you want to start passing specifiers, and possibly other things, as non-type template parameters instead of as function arguments, but that leads to a lot of problems).

There are other ways to extend the format with more fields that users can use in their extensions that don’t add much complexity—e.g., parse an optional digit string (or *) inside parentheses before the length modifier, and that gets passed as an extra parameter (or inserted as a stream manipulator) that can mean whatever the user wants. Once you’ve got the separate fields as separate parameters or stream manipulators they’re orthogonal in your internals and ignorable in your default formatters.

Replacing the default formatting function also sounds pretty complicated, and I think it could add as much confusion as benefit. Having “%64.s” sometimes mean 64-padded 0-truncated string (OK, admittedly that specific case isn’t that useful, since anything becomes 64 spaces…), and sometimes base64 string, seems dangerous.

Anyway, if you have more flags (and/or specifiers), you don’t need to reinterpret the standard fields this way. Let “%64s” mean string padded to 64 width as usual, and make “%bs” mean base-64. But again, people don’t need to replace the default function; just replace the individual overloads. I suppose if you want the 64 width to mean base-64 for all types (so “%64d” on a MyDate first formats the date as a 32-bit int seconds since 1970, and then the resulting string is UTF-8 encoded and base-64’d?) you need to be able to replace the default function, but why would you want that?


#3

Well, in fact the code tries to match printf behaviour as much as possible.
The code extracts each formatting expression (for example: “%0.3d”) and passes it to a Formatter function together with a (void *) pointing to the argument.
The formatting function is a function pointer that can be set at runtime (if the feature is required) or at compile time.
These pointers are stored in an array indexed by (char)type.
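
Roughly, the table looks like the sketch below. The Formatter signature and the names registerFormatter, formatInt and dispatch are simplified stand-ins rather than the real code, and std::string replaces Juce's String.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Assumed signature: the directive text plus an untyped pointer to the argument.
typedef std::string (*Formatter)(const std::string& directive, const void* arg);

// One slot per possible unsigned char value; a null slot means "unknown type".
static Formatter formatters[256] = {};

void registerFormatter(char type, Formatter f)
{
    formatters[static_cast<unsigned char>(type)] = f;
}

// Default formatter for 'd': hands the directive to snprintf unchanged.
// The caller must guarantee that 'arg' really points to an int.
std::string formatInt(const std::string& directive, const void* arg)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, directive.c_str(),
                  *static_cast<const int*>(arg));
    return buf;
}

// Looks up the formatter by the conversion character, which is the last
// character of the directive.
std::string dispatch(const std::string& directive, const void* arg)
{
    if (directive.empty())
        throw std::runtime_error("empty directive");
    const char type = directive[directive.size() - 1];
    const Formatter f = formatters[static_cast<unsigned char>(type)];
    if (f == nullptr)
        throw std::runtime_error("no formatter registered for this type");
    return f(directive, arg);
}

// Usage: register once, then format through the table.
void tableExample()
{
    registerFormatter('d', &formatInt);
    int value = 7;
    assert(dispatch("%03d", &value) == "007");
}
```

The (void *) is where type safety has to be enforced by the caller, which is why the typed front end sits on top of this.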

In the complex version, the code uses SFINAE to detect whether String::operator << exists for a given type, so it implements “%@” as “use the stream operator to format the argument”. Any specific formatting is ignored in that case.
This can’t provide a works-for-all design, however, since the code will be included before users can declare their own operator <<, hence the function pointers above.
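
The detection part can be sketched as follows. This is not the actual code: std::ostream stands in for Juce's String, the names IsStreamable and renderAt are invented, and throwing in the fallback is just one possible choice. Note that the trait's answer is fixed the first time it is instantiated for a type, so the operator has to be visible by then.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

namespace detail
{
    // Chosen when "stream << value" is a valid expression for T.
    template <typename T>
    auto streamTest(int)
        -> decltype(void(std::declval<std::ostream&>() << std::declval<const T&>()),
                    std::true_type());

    // Fallback when the expression above does not compile (SFINAE).
    template <typename>
    std::false_type streamTest(...);
}

template <typename T>
struct IsStreamable : decltype(detail::streamTest<T>(0)) {};

// "%@" for streamable types: use the stream operator, ignore other formatting.
template <typename T>
typename std::enable_if<IsStreamable<T>::value, std::string>::type
renderAt(const T& value)
{
    std::ostringstream out;
    out << value;
    return out.str();
}

// "%@" for everything else: report the problem at runtime (assumed behaviour).
template <typename T>
typename std::enable_if<!IsStreamable<T>::value, std::string>::type
renderAt(const T&)
{
    throw std::invalid_argument("no operator<< available for this type");
}

// ---- user code ----
struct Point { int x, y; };
std::ostream& operator<<(std::ostream& os, const Point& p)
{
    return os << '(' << p.x << ", " << p.y << ')';
}
struct Opaque {};

void sfinaeExample()
{
    static_assert(IsStreamable<Point>::value, "Point has operator<<");
    static_assert(!IsStreamable<Opaque>::value, "Opaque has none");
    assert(renderAt(Point{1, 2}) == "(1, 2)");
}
```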

In the simple version, operator << is used as a fallback, and if it doesn’t exist, the compiler will spit out an error whenever you try to compile Print("%@ something", MyClass(3)). No SFINAE.
The function pointers are replaced by overloads.

Nope. You can’t drop registration and rely on overloads alone, since you don’t want people to modify the Print class code. So at compile time, if the declaration of the format() function or stream operator is not available (and it won’t be available, since you’ll have #include <juce.h> first in your file), it’ll fail. You must have a way to specify/register the function at runtime.
However, in the “simple” version, that feature won’t be available, so you’ll have to write Print("%s", myClass.toString(yourFlags)), which is not a real issue IMHO.


#4

Funny that I hear about typesafe printf in two places at the same time: I just watched the stream of Alexandrescu’s talk at the GoingNative 2012 conference, where typesafe printf is the canonical example illustrating variadic templates in C++11. Not with as many features, though.
Have you seen the presentation as well?


#5

I read a paper by Alexandrescu years ago, when he was writing Loki.
Anyway, the only reason I’m writing this is that I need to port another codebase that uses printf-like formatting everywhere, and it’s a PITA to convert it to Juce’s String::formatted function, which uses wide-char printf rules internally (so it crashes with UTF-8 strings).