StringPool: how often should I call getPooledString()?

I’m doing a lot of String comparison, so I’m trying to cut down on the overhead using a StringPool. I understand the basic idea (that it forces identical Strings to point to the same underlying array, so that comparing them is just a matter of comparing their pointers). But the documentation doesn’t give me a good indication of when and where to use getPooledString(). Strings are passed around a lot and converted to other things (var, StringRef, arrays, etc.). Is it safe to consider them “pooled” forever, or should I run getPooledString() each time before comparing them?

The following example should illustrate my question.

    StringPool pool;

    String s1 = String("one"),
           s2 = String("one");

    bool compare1 = s1 == s2;
    // no pointer comparison (strings have not been pooled)

    String p1 = pool.getPooledString(s1),
           p2 = pool.getPooledString(s2);

    bool compare2 = p1 == p2;
    // pointer comparison (p1 and p2 have been pooled)

    var v1 = p1,
        v2 = p2;

    bool compare3 = v1.toString() == v2.toString();
    // is there pointer comparison here?

    Array<var> a1 = Array<var>{ v1 },
               a2 = Array<var>{ v2 };

    bool compare4 = a1.getFirst().toString() == a2.getFirst().toString();
    // how about now?

A related question: what is the overhead cost of calling getPooledString()? I’m suspicious because the tutorial on ValueTrees indicates that creating an Identifier is quite costly because of String pooling. So I’m wondering whether under certain circumstances, pooling Strings might do more harm than good.

You should watch @dave96’s talk from ADC on ValueTrees, specifically the part about creating identifiers.

I’ve watched this talk several times, and learned the practice of storing static Identifiers together in one namespace. But I don’t remember it addressing the questions I’m asking here. At the moment, I’m not working with Identifiers, but with Strings, and I’m trying to implement my own StringPool. My system will take a lot of user-specified Strings, which will need to be compared quite regularly, so I’m trying to work out the best way of pooling them. The main question is: can I get away with pooling them only once, when they are first given, or should I pool them every time before I compare them?

I think you should probably do what Identifier itself does. That is, make a class which pools the string on construction and then keeps a copy of that pooled string as a data member. Or you could just use the Identifier class, and pass around Identifiers instead of Strings.
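
A minimal sketch of that idea in plain C++, so it can be tried without JUCE. SimplePool and PooledName are hypothetical names, and std::unordered_set stands in for StringPool, so treat this as an illustration of the pattern rather than how Identifier is actually written:

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a string pool: keeps one copy of each distinct string.
class SimplePool
{
public:
    // Returns the address of the single stored copy of `text`.
    const std::string* intern (const std::string& text)
    {
        // insert() leaves the set unchanged if an equal string is already stored.
        return &*strings.insert (text).first;
    }

    std::size_t size() const { return strings.size(); }

private:
    // Node-based container: element addresses stay valid while the set grows.
    std::unordered_set<std::string> strings;
};

// Hypothetical Identifier-like wrapper: pools once, on construction.
class PooledName
{
public:
    PooledName (SimplePool& pool, const std::string& text)
        : pooled (pool.intern (text)) {}

    // Pointer comparison only. Valid only for names created from the same pool.
    bool operator== (const PooledName& other) const { return pooled == other.pooled; }
    bool operator!= (const PooledName& other) const { return pooled != other.pooled; }

    const std::string& toString() const { return *pooled; }

private:
    const std::string* pooled;   // points into the pool, which must outlive this object
};
```

The pool lookup happens in the constructor and nowhere else, so copies of a PooledName can be passed around and compared without touching the pool again.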

Sometimes the best solution is hidden in plain sight! I don’t know why I didn’t think of this already. Also, poking around the Identifier class will probably help me to answer the questions above.

Looking at the code for the Identifier class, I’ve answered both of my own questions.

  1. From my experiments, it seems that two pooled Strings stay comparable come what may (but I was comparing them incorrectly in my first post; you have to use getCharPointer()).

     StringPool pool;
    
     String s1 = String("one"),
            s2 = String("one");
    
     bool compare1 = s1.getCharPointer() ==
                     s2.getCharPointer();
    
     if (compare1) DBG("match"); else DBG("no match");  // prints "no match"
    
     String p1 = pool.getPooledString(s1),
            p2 = pool.getPooledString(s2);
    
     bool compare2 = p1.getCharPointer() ==
                     p2.getCharPointer();
    
     if (compare2) DBG("match"); else DBG("no match"); // prints "match"
    
     // convert Strings to vars
     var v1 = p1,
         v2 = p2;
    
     bool compare3 = v1.toString().getCharPointer() ==
                     v2.toString().getCharPointer();
    
     if (compare3) DBG("match"); else DBG("no match"); // prints "match"
    
     // put the vars into Arrays
     Array<var> a1 = Array<var>{ v1 },
                a2 = Array<var>{ v2 };
    
     bool compare4 = a1.getFirst().toString().getCharPointer() ==
                     a2.getFirst().toString().getCharPointer();
    
     if (compare4) DBG("match"); else DBG("no match"); // prints "match"
    
     // copy arrays...
     Array<var> a3(a1),
                a4(a2);
    
     // and insert some arbitrary new data.
     a3.insert(0, var("new"));
    
     bool compare5 = a3.getLast().toString().getCharPointer() ==
                     a4.getFirst().toString().getCharPointer();
    
     if (compare5) DBG("match"); else DBG("no match");  // still prints "match"!
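
My understanding of why this holds: String is reference-counted, so copying one (into a var, an Array, or a copy of an Array) shares the same character buffer instead of duplicating it. The sketch below imitates that in plain C++. SharedText is a hypothetical class built on std::shared_ptr, not JUCE code, and it also shows the catch: anything that builds a new string gets a new buffer, which would need pooling again.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a reference-counted string: copies share one buffer.
class SharedText
{
public:
    explicit SharedText (std::string text)
        : buffer (std::make_shared<const std::string> (std::move (text))) {}

    // Plays the role of getCharPointer() in the experiment above.
    const char* data() const { return buffer->data(); }

    // Building a modified string allocates a fresh buffer.
    SharedText withSuffix (const std::string& suffix) const
    {
        return SharedText (*buffer + suffix);
    }

private:
    std::shared_ptr<const std::string> buffer;   // shared by every copy of this object
};

// Copies a value through two containers, like String -> var -> Array -> Array copy.
inline const char* pointerAfterCopies (const SharedText& original)
{
    std::vector<SharedText> first { original };
    std::vector<SharedText> second (first);
    return second.front().data();
}
```

So pooling once when the string first arrives should be enough, as long as what gets passed around afterwards are copies of the pooled String and not strings rebuilt from it.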
    

As for the second question, it seems that getPooledString() is exactly what is warned about in the documentation for creating Identifiers (I can’t see anything else going on in the constructor), so it should be used judiciously.
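
To put rough numbers on "judiciously": any pool lookup has to look at the characters at least once to find the stored copy, so pooling right before every comparison costs at least as much as the plain comparison it replaces. A small sketch with a hypothetical CountingPool (again std::unordered_set, not StringPool) that counts lookups for both strategies:

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical pool that counts its lookups; a stand-in, not juce::StringPool.
struct CountingPool
{
    const std::string* intern (const std::string& text)
    {
        ++lookups;                              // each call hashes and compares characters
        return &*strings.insert (text).first;   // address of the single stored copy
    }

    std::unordered_set<std::string> strings;
    int lookups = 0;
};

// Pools every string once up front, then compares pointers only.
inline int matchesPoolingOnce (CountingPool& pool, const std::vector<std::string>& inputs,
                               const std::string& target, int passes)
{
    std::vector<const std::string*> handles;
    for (const auto& s : inputs)
        handles.push_back (pool.intern (s));

    const std::string* wanted = pool.intern (target);
    int matches = 0;

    for (int pass = 0; pass < passes; ++pass)
        for (const std::string* h : handles)
            if (h == wanted)
                ++matches;

    return matches;
}

// Pools both sides again before every single comparison.
inline int matchesPoolingEveryTime (CountingPool& pool, const std::vector<std::string>& inputs,
                                    const std::string& target, int passes)
{
    int matches = 0;

    for (int pass = 0; pass < passes; ++pass)
        for (const auto& s : inputs)
            if (pool.intern (s) == pool.intern (target))
                ++matches;

    return matches;
}
```

With three inputs and 100 passes, the first version does 4 lookups and the second does 600, for the same result. Whether the first version beats comparing the Strings directly depends on how many comparisons follow each pooling, so it is worth measuring in the real system.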