readNextLine is slow


#1

I’m working on a 3D application and I have to read big (50 MB) text files containing 3D model data. As the file format is line oriented I tried to do it the clean way (the juce way) and use InputStream::readNextLine(). It turns out to be very slow.

The profiler revealed that the program spends its time in String::operator+=, in fact ten times more than in any other relevant function, including the actual I/O operations.

As the lines in my file type are generally longer than 32 characters, I tried to recompile juce with s.preallocateStorage(256) instead of the original s.preallocateStorage(32) in InputStream::readNextLine, but it only gave a marginal improvement.

I think that inside InputStream::readNextLine the code should not use a juce String to accumulate the characters. Reference counting, nice allocation growing and "juceness" in general are not important here; the characters could be collected in a simple C buffer and converted to a juce String once at the end.

Jules, what do you think? If this isn't possible in juce, I will have to fall back to reading the file in blocks and using good old character pointers…
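For reference, a minimal sketch of that fallback in standard C++17 (not JUCE): load the file with one bulk read, then walk a pointer over the buffer and record each line as a `std::string_view`, so nothing is copied per character. `loadFile` and `splitLines` are hypothetical helpers, and the sketch assumes the whole file fits in memory. It treats `\n`, `\r\n` and a lone `\r` as line breaks, the same rule `readNextLine()` uses.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: reads a whole file into a std::string with one bulk read.
std::string loadFile (const std::string& path)
{
    std::ifstream in (path, std::ios::binary | std::ios::ate);  // open positioned at the end
    if (! in)
        return {};

    const std::streamoff size = in.tellg();                     // file size in bytes
    if (size <= 0)
        return {};

    std::string data (static_cast<std::size_t> (size), '\0');
    in.seekg (0);
    in.read (&data[0], size);                                   // single bulk read
    return data;
}

// Hypothetical helper: splits an in-memory buffer into lines by walking a pointer.
// The returned views point into 'data', so 'data' must outlive them.
std::vector<std::string_view> splitLines (std::string_view data)
{
    std::vector<std::string_view> lines;
    const char* p = data.data();
    const char* const end = p + data.size();

    while (p < end)
    {
        const char* const lineStart = p;

        while (p < end && *p != '\n' && *p != '\r')   // scan to the next terminator
            ++p;

        lines.emplace_back (lineStart, static_cast<std::size_t> (p - lineStart));

        if (p < end)
        {
            const char terminator = *p++;

            if (terminator == '\r' && p < end && *p == '\n')
                ++p;                                    // "\r\n" is a single line break
        }
    }

    return lines;
}
```

Typical use would be `std::string text = loadFile (path);` followed by iterating over `splitLines (text)` while `text` stays alive.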

Thanks


#2

Fair point - how about this as an optimisation:

[code]const String InputStream::readNextLine()
{
    String s;
    const int maxChars = 256;
    char buffer [maxChars];   // plain local buffer, flushed into s in chunks
    int charsInBuffer = 0;

    while (! isExhausted())
    {
        const char c = readByte();
        const int64 lastPos = getPosition();   // position just after c

        if (c == '\n')
        {
            break;
        }
        else if (c == '\r')
        {
            // swallow the '\n' of a "\r\n" pair, otherwise rewind
            if (readByte() != '\n')
                setPosition (lastPos);

            break;
        }

        buffer [charsInBuffer++] = c;

        if (charsInBuffer == maxChars)
        {
            s.append (buffer, maxChars);
            charsInBuffer = 0;
        }
    }

    if (charsInBuffer > 0)
        s.append (buffer, charsInBuffer);

    return s;
}[/code]


#3

Thanks Jules, that’s exactly what I needed here. With the 50 MB test file, the read time came down from 40 secs to 5 secs :slight_smile:

A minor comment: you should use tchar instead of char as the element type of buffer (otherwise it doesn't compile for me). I hope you'll include this in the next release; it’s already in my personal build of juce.

Thanks again, I love how responsive you are. Imagine asking the MFC team at Microsoft for the same thing…


#4

[quote=“Gwynhale”]Thanks again, I love your reactivity, imagine I ask the same thing from the MFC team at Microsoft …[/quote]

There’s still an MFC team?

Bring me my broadsword! There is killing to be done.


#5

Yes, sorry, I posted that code without even trying to compile it! Changing it to tchar is correct.

I’m sure the MFC team will have been moved to new roles by now, hopefully to washing-up duties in the Microsoft canteen.


#6

I don’t understand this part.
If a line ends with only ‘\r’, is it still treated as a whole line?
Shouldn’t it take ‘\r\n’ to end the line, i.e. a continue instead of a break?


#7

If there’s a \r without a \n after it, you’d still want that to count as a line, wouldn’t you? Some systems use a single \r as a new-line (classic Mac OS did, for example).


#8