Why is FileInputStream::readNextLine() so terribly slow? When I use the equivalent functions in Java or Node, they can process a text file containing 170k lines in under a second.
But the JUCE implementation takes a whopping 12.5 seconds!
My code:
void MainComponent::loadWordlist() {
    // word_list.txt is expected to sit next to the executable
    File cef = File::getSpecialLocation(File::currentExecutableFile);
    File file(cef.getParentDirectory().getChildFile("word_list.txt"));
    if (!file.existsAsFile()) {
        Logger::outputDebugString("File does not exist!");
        return;
    }
    Logger::outputDebugString("pre loading");
    // getMillisecondCounterHiRes() returns a double, so keep it as one
    double start = Time::getMillisecondCounterHiRes();
    FileInputStream is(file);
    if (!is.openedOk()) {
        Logger::outputDebugString("Could not open file!");
        return;
    }
    while (!is.isExhausted()) {
        auto line = is.readNextLine();
        // keep only words that are exactly 8 lowercase letters
        if (line.length() == 8) {
            if (line.containsOnly("abcdefghijklmnopqrstuvwxyz")) {
                word_list.push_back(line);
            }
        }
    }
    double finish = Time::getMillisecondCounterHiRes();
    double time_elapsed = finish - start;
    Logger::outputDebugString("loading took: " + String(time_elapsed, 1) + " millis");
    // shuffle the accepted words
    std::random_device rd;
    std::mt19937 g(rd());
    std::shuffle(word_list.begin(), word_list.end(), g);
    file_loaded = true;
}
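The length/character filter doesn't depend on the file I/O at all, so one way to keep it out of the measurement is to pull it into a pure function that can be tested and timed on its own. A minimal sketch in standard C++ (the name isCandidateWord is hypothetical, and it assumes the word list is plain ASCII):

```cpp
#include <string>

// Hypothetical helper: returns true if `word` is exactly 8 characters long and
// every character is a lowercase ASCII letter. This mirrors the
// length() == 8 / containsOnly("abc...z") check above, assuming ASCII input.
bool isCandidateWord(const std::string& word)
{
    if (word.size() != 8)
        return false;

    for (char c : word)
        if (c < 'a' || c > 'z')
            return false;

    return true;
}
```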
Do you have other numbers that point to readNextLine() specifically? Your code here does more than just call that function. You might want to use a profiler to get actual numbers.
What do you mean by "do you have other numbers that point to readNextLine()"? I'm literally timing how long it takes to iterate over the FileInputStream.
FWIW, this method is called in the constructor of the MainComponent.
This is the equivalent code in Java:
private ArrayList<String> retrieveWordList() {
    ArrayList<String> list = new ArrayList<String>();
    long start = System.currentTimeMillis();
    InputStream is = getClass().getResourceAsStream("word_list.txt");
    Scanner in = new Scanner(is);
    String the_line;
    while (in.hasNextLine()) {
        the_line = in.nextLine();
        if (the_line.length() == 8) {
            String regex = "[a-z]+";
            if (the_line.matches(regex)) {
                list.add(the_line);
            }
        }
    }
    in.close();
    long end = System.currentTimeMillis();
    long elapsed = end - start;
    System.out.println(" File loaded in: " + Long.toString(elapsed) + " ms.");
    return list;
}
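For comparison with the Java Scanner loop above, a standard C++ version (no JUCE involved) would read lines with std::getline from a std::ifstream, which reads through the buffer managed by the stream's std::filebuf. This is only a sketch: readWordList is a hypothetical name, and the '\r' stripping assumes the file might have Windows (CRLF) line endings.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical standard-library analogue of the Java Scanner loop: reads the
// file line by line via std::getline and keeps 8-letter lowercase words.
std::vector<std::string> readWordList(const std::string& path)
{
    std::vector<std::string> list;
    std::ifstream in(path);
    if (!in)
        return list; // could not open the file

    std::string line;
    while (std::getline(in, line))
    {
        // Drop a trailing '\r' in case the file uses CRLF line endings.
        if (!line.empty() && line.back() == '\r')
            line.pop_back();

        if (line.size() != 8)
            continue;

        bool allLower = true;
        for (char c : line)
            if (c < 'a' || c > 'z') { allLower = false; break; }

        if (allLower)
            list.push_back(line);
    }
    return list;
}
```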
No, you read the file and did other things as well, so you are not measuring that API alone. Just loop over the reads and time that. That number will tell you something about the read itself. It may come out exactly the same, but it is best to narrow a problem down to the smallest possible scope.
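A minimal way to time just one piece of code, sketched with std::chrono::steady_clock from the standard library rather than JUCE's Time class (timeMillis is a hypothetical helper name):

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock time
// in milliseconds, measured with a monotonic clock. Put only the code under
// test inside `work` so the number reflects that code alone.
double timeMillis(const std::function<void()>& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto finish = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(finish - start).count();
}
```

In the JUCE code it could wrap only the loop, e.g. timeMillis([&] { while (!is.isExhausted()) is.readNextLine(); }), leaving the filtering and the shuffle outside the measurement.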
while (!is.isExhausted()) {
    auto line = is.readNextLine();
}
took 5810 ms.
And this:
while (!is.isExhausted()) {
    auto line = is.readNextLine();
    if (line.length() == 8)
    {
        if (line.containsOnly("abcdefghijklmnopqrstuvwxyz"))
        {
            list.push_back(line);
        }
    }
}
took 6086 ms.
The previous times were from another PC; this is on a Ryzen 3700X with an NVMe SSD… It's embarrassingly slow.
I just ran a couple of quick tests on a Mac Mini (12-core Intel i7, SSD) with a 200k-line text file, and I'm seeing similar numbers.
File::readLines() runs considerably faster here; you might be better off using that and then doing a pass to remove the lines that don't meet the length or character requirements.
I haven't written any JUCE code that has to deal with text files of this size, though, so others here may know of a better approach.
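The "read everything first, then filter" idea can be illustrated with the standard library as a stand-in for File::readLines() (which fills a StringArray). This is not JUCE code and makes no claim about how readLines() works internally; it is just a sketch of one bulk read into memory followed by a separate filtering pass. loadThenFilter is a hypothetical name.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: read the whole file into memory in one go, then split
// on '\n' and keep only 8-letter lowercase words in a separate pass.
std::vector<std::string> loadThenFilter(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return {}; // could not open the file

    std::ostringstream contents;
    contents << in.rdbuf(); // bulk-copy the whole file into memory
    const std::string text = contents.str();

    std::vector<std::string> words;
    std::size_t begin = 0;
    while (begin < text.size())
    {
        std::size_t end = text.find('\n', begin);
        if (end == std::string::npos)
            end = text.size();

        std::string line = text.substr(begin, end - begin);
        if (!line.empty() && line.back() == '\r')
            line.pop_back(); // tolerate CRLF line endings

        bool keep = line.size() == 8;
        for (char c : line)
            if (c < 'a' || c > 'z') { keep = false; break; }
        if (keep)
            words.push_back(line);

        begin = end + 1; // continue after the newline
    }
    return words;
}
```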