[IDEA] String::split


#1

Trying to split "foo,bar" I'm looking for:

auto _out = _in.split(",")

Turns out I can do:

auto _out = StringArray::fromTokens(_in, ",", "");

... but it would be nice to invoke a method on String.  Kinda feels like it belongs there...

π


#2

+1


#3

Not really.. It's natural for the StringArray class to depend on String, but to do what you're saying the String class would have to depend on StringArray, which feels kind of wrong and recursive to me..


#4

I can see it does escalate complexity, but only by small controlled amount.

And this kind of circular dependency is fairly standard C++ fare.. isn't it just a matter of adding a #include "StringArray.h" at the top of "String.h"?

(I know there are pitfalls of doing this, but this pair of classes should be comfortably safe).

As real-world concepts, String and StringArray are interwoven conceptually; sometimes you break down a String into a StringArray, other times you assemble a StringArray out of a bunch of Strings.

i.e. There is a circular dependency at the conceptual level.

So surely it would be the most natural thing for that to be mirrored at the code level..?

π


#5

As real-world concepts, String and StringArray are interwoven conceptually; sometimes you break down a String into a StringArray, other times you assemble a StringArray out of a bunch of Strings.

No, there's currently no circular dependency, not even conceptually. Nothing in the String class itself assumes the existance of a StringArray, all the split/join functionality lives within StringArray.


#6
StringArray String::split(String token, String stringToTokenize) {
    StringArray sa;
    std::string s;
    while( std::getline( stringToTokenize, s, token ) ) {
        sa.add( s );   
    }
    return sa;
}

borrowed from: 

http://stackoverflow.com/questions/5167625/splitting-a-c-stdstring-using-tokens-e-g

 


#7

you should think about what kind of container you want for it when you do the split.

std::vector<std::string> ?

std::vector<juce::String> ?

const char * []?

juce:StringArray? 

StringArray::fromTokens() seems like the ideal solution for the problem presented.  your request seems like you aren't that familiar with the framework yet or how to fully use all the classes in the framework. 


#8

ok, lemme try to say something that makes sense this time :)

I like design discussions -- I always learn from them!

I can see it is much of a muchness.

Just flagged it because the current implementation creates a little twinge in the back of my brain by going against the "object should encapsulate it's behaviour" OOP paradigm.

And I like that pattern because conversely if an object has external behaviours it is trickier to locate / keep track of them.

e.g. first responder on the IRC channel to 'how to split a string' had just rolled his own(!) i.e. failed to find it.

It comes at a cost of cojoining String and StringArray via a circular #include, .. I don't see any harm in lumping them together -- it seems natural.

π


#9

This has nothing at all to do with encapsulation!

Splitting a string is just one of an infinite number of operations that you could do that convert a String into some other kind of object. It's not the responsibility of the String class to offer you all these functions.

I do agree that String is probably the first place you'd look for this kind of operation, but that's really the only argument in favour of putting it in there.

And here's a very important fact: If you look at StringArray::fromTokens, you'll see that it doesn't actually use a String. It uses a StringRef. So it works without needing a String object at all - it can act directly on a raw char literal or any other blob of characters that can be cast to a StringRef, which avoids unnecessary allocation.

In fact the modern C++ way to do this would actually be with a templated free function that could take a String or some other equivalent iterator type.


#10

I see.

So rather than relocating StringArray::fromTokens, it may make sense to leave it be, and create an additional String::split that simply forwards to it.

That was actually my original thought, before I veered off track into actually relocating the definition.

 As I see it currently...

 Arguments for:

  • It's the first place the user would search for the feature.
  • Increases selfdocumenting-ness of the framework -- consumer browses String's methods & naturally is led to StringArray.
  • More natural syntax: foo.split(",") vs StringArray::fromTokens(foo,",",""").
  • Still allows StringArray::fromTokens to be used on other string types.

Arguments against:

  •  Extra indirection step -- not sure if compilers can optimise this out.
  •  Introduces circular #include-s / co-dependency of String and StringArray.

π


#11

In my opinion putting this in the String class feels a bit wrong. It's not mutating the String or returning a new String and as Jules pointed out the functionality isn't really String specific. It's really just a helper method.

If I wanted a StringArray of tokens, whatever the input to that was, I would immediately look at StringArray.

If anything this should be a non-member function. That seems to be the way the standard library treats these things.


#12

It's not mutating the String or returning a new String and as Jules pointed out the functionality isn't really String specific. It's really just a helper method.

 That logic doesn't make sense.

 Consider...

class Complex {
    float re,im;
    float mag() {return sqrt(re*re+im*im)};
}

By your logic, mag() doesn't belong in Complex, as it is neither mutating the object nor returning a Complex.

The key criterion is that it is operating upon the object instance.

If I wanted a StringArray of tokens, whatever the input to that was, I would immediately look at StringArray.

But suppose you're new to the framework. You don't know StringArray exists! You have a String and you wish to split it. How to progress? Googling 'juce string split'? Sketchy.. Locating the file containing String's definition and looking in the same folder for String helper functions? Easy with 20:20 hindsight.

If anything this should be a non-member function. That seems to be the way the standard library treats these things.

I don't think it should be constrained to existing in one place only. How about primarily as a free function, and then both String and StringArray have member functions that forward to it?

π


#13

Having a duplicate function definitely sounds like a bad idea. Imagine the amount of things in JUCE that could live in two places! The correct thing to do in my opinion is be consistent with where things are placed and the relationships between classes.

 

Your example of Complex is a bad one. Firstly, mag as a member function is a bad idea because it's less easily optimized by the compiler due to the implicit this pointer. Admittedly the function will probably get inlined but I think it's still bad practice.

And if you take a look at std::complex the abs function is implemented as a non-member function (along with many other operations). It's largely considered good practice to keep member the number of member functions as small as possible to make the class easy to read and reason about. Adding effective helper methods can easily be done with non-member functions.

There is the argument that member syntax is nicer but hopefully the standard's proposal to unify member and non-member function calling  will go through soon!


#14

Wow! I gave that example without a shred of doubt that mag() belonged inside a Complex. But as you say, stdlib decides upon a free function. And I can see why it makes sense. The concept of an absolute value makes sense in a variety of contexts. Integer, real, complex, vector. Probably it would make sense to return the determinant of a matrix also.

Now I get what Jules was saying by "modern C++ practice is towards templated free functions".

I also remember Bjarne railing against the association of C++ with OOP.

Still mulling the idea of using 'duplicate' member functions that simply forward to the relevant free function... having the same functionality in different places gives the programmer flexibility, and makes it easy to find stuff. But I can see it creates bloat.

π