FileInputStream::readNextLine() terribly slow?

How come FileInputStream::readNextLine() is so terribly slow? The equivalent functions in Java or Node can handle a text file containing 170k lines within a second.

But the JUCE implementation takes a whopping 12.5 seconds!

My code:

    void MainComponent::loadWordlist() {
        File cef = File::getSpecialLocation(File::currentExecutableFile);
        File file(cef.getParentDirectory().getChildFile("word_list.txt"));

        if (!file.existsAsFile()) {
            Logger::outputDebugString("File does not exist!");
            return;
        }

        Logger::outputDebugString("pre loading");

        // getMillisecondCounterHiRes() returns a double, so don't truncate it here
        auto start = Time::getMillisecondCounterHiRes();
        FileInputStream is(file);

        while (!is.isExhausted()) {
            auto line = is.readNextLine();

            if (line.length() == 8) {
                if (line.containsOnly("abcdefghijklmnopqrstuvwxyz")) {
                    word_list.push_back(line);
                }
            }
        }

        auto finish = Time::getMillisecondCounterHiRes();
        auto time_elapsed = static_cast<int>(finish - start);
        Logger::outputDebugString("loading took: " + std::to_string(time_elapsed) + " millis");

        std::random_device rd;
        std::mt19937 g(rd());
        std::shuffle(word_list.begin(), word_list.end(), g);

        file_loaded = true;
    }

Output:

    loading took: 12489 millis

Is there a better way of handling this?

Do you have other numbers that point specifically to readNextLine()? Your code here encompasses more than just that function. You might want to use a profiler to get actual numbers.

What do you mean by “do you have other numbers that point to readNextLine()”? I literally clocked how long it took to iterate over the FileInputStream.

FWIW, this method is called in the constructor of the MainComponent.

This is the equivalent code in Java:

    private ArrayList<String> retrieveWordList() {
        ArrayList<String> list = new ArrayList<String>();

        long start = System.currentTimeMillis();
        InputStream is = getClass().getResourceAsStream("word_list.txt");
        Scanner in = new Scanner(is);

        String the_line;
        while (in.hasNextLine()) {
            the_line = in.nextLine();
            if (the_line.length() == 8) {
                String regex = "[a-z]+";
                if (the_line.matches(regex)) {
                    list.add(the_line);
                }
            }
        }
        in.close();

        long end = System.currentTimeMillis();
        long elapsed = end - start;
        System.out.println("File loaded in: " + Long.toString(elapsed) + " ms.");

        return list;
    }

This code executes in 197 ms in Java…

No, you read the file and did other things; you are not measuring that API alone. Just loop over the reads and time that. That number will actually tell you something about the read itself. It may be exactly the same, but it is best to minimize the scope of the problem first.

Right, well this:

    while (!is.isExhausted()) {
        auto line = is.readNextLine();
    }

took 5810 ms.

And this:

    while (!is.isExhausted()) {
        auto line = is.readNextLine();

        if (line.length() == 8) {
            if (line.containsOnly("abcdefghijklmnopqrstuvwxyz")) {
                list.push_back(line);
            }
        }
    }

took 6086 ms.

The previous times were from another PC; these are on a Ryzen 3700X with an NVMe SSD… It’s embarrassingly slow.

Just ran a couple of quick tests on a Mac Mini (12-core Intel i7, SSD) with a 200k line text file, and I’m seeing similar numbers.

File::readLines() is running considerably faster here – you might be better off using that and doing a pass to remove the lines that don’t meet the length or character requirements.

I haven’t written any JUCE code that has to deal with text files of this size, though, so others here may know of a better approach.
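
A rough sketch of what that could look like, assuming the same file and word_list variables as in the original code:

    // File::readLines() reads the whole file in one go and splits it into lines.
    StringArray lines;
    file.readLines(lines);

    // Second pass: keep only the 8-letter words made of lowercase a-z.
    for (auto& line : lines) {
        if (line.length() == 8 && line.containsOnly("abcdefghijklmnopqrstuvwxyz")) {
            word_list.push_back(line);
        }
    }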

Well, I found a workaround: I now simply load the entire file as one String. This takes around 100 ms.

The filtering is done ‘on demand’ when a new word is needed from the list.
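
Roughly, one way to do that with File::loadFileAsString() and StringArray::addLines() (variable names are just illustrative):

    // Read the whole file into memory in a single call (~100 ms here).
    String contents = file.loadFileAsString();

    // Split into individual lines; the actual filtering is deferred until a
    // new word is requested from the list.
    StringArray lines;
    lines.addLines(contents);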

> I now simply load the entire file

That was going to be my recommendation. I could imagine that Java/Node is doing something similar under the hood…

You can also wrap your FileInputStream in a BufferedInputStream; it will speed up reading from the file significantly.
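
For reference, the wrapping might look something like this (the 64 KB buffer size is an arbitrary choice, and the variable names are illustrative):

    FileInputStream fileStream(file);

    // BufferedInputStream reads ahead from the source stream into an internal
    // buffer, so readNextLine() is served from memory instead of many small
    // file reads.
    BufferedInputStream in(fileStream, 65536);

    while (!in.isExhausted()) {
        auto line = in.readNextLine();
        // ... same per-line filtering as before
    }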


Tried that and not really.