DirectoryIterator possible stack overflow when scanning big folders


#1

Using Windows 8.

I tried scanning my C: drive for audio/midi files, which resulted in an exception (system heap related) with thousands of DirectoryIterator::next calls. Turns out there's a folder in C:\Windows\WinSxS which contains ~64000 subfolders.

Can this be prevented? My application allows the user to input folders for scanning, and I can't tell what folders he might input there.

To reproduce, basically run these two lines wherever (You have to have a folder akin to this WinSxS folder though):

DirectoryIterator* di = new DirectoryIterator(File("C:\\"), true, "*.mid;*.midi;*.wav;*.aif;*.aiff;*.flac;*.ogg;*.wm;*.wma;.mp3", File::findFiles);

while (di->next());

#2

I had exactly this problem the other day and here's how I limited the search to 100,000 files:

int total = 0;

for (DirectoryIterator di(location, true); di.next();)
{
File newFile = di.getFile();

if (newFile.hasFileExtension("fileext"))
fileArray.add(newFile);
if (total < 100000)
++total;
else
break;
}

 


#3

After debugging it, it seems like the recursive nature of the next() method makes it prone to stack overflow when not finding a match for a long time. It seems like there's no other option but changing the implementation to a non-recursive one. Also it is safe to say that this crash will be platform independent.


#4

I can't see how that'd be a problem unless the depth of folders is 64000.. Just having 64000 folders side-by-side shouldn't be too bad, because it'd still only take up 1 level on the stack, and the list itself would be in the heap (??)


#5

I think there's something you're missing. When you conclude that there are no matches and that you've reached a new directory, you are setting a new subIterator and returning next(...). On that call to next(), if subIterator->next() doesn't find any matches, that call to next() will start its own while loop, the same while loop that the previous call to next() was working on (same hierarchy), thus, sibliing folders do affect the depth of the stack.

The easiest way I found to fix it was to add a boolean value to the next() method called 'checkOnlySubIterator', if true, after exhausting the subIterator, the method will return to the previous call to continue the while loop of the parent folder.

So, the changes are as follows:

juce_DirectoryIterator.h:

    bool next (bool* isDirectory,
               bool* isHidden,
               int64* fileSize,
               Time* modTime,
               Time* creationTime,
               bool* isReadOnly,
               bool checkOnlySubIterator);

juce_DirectoryIterator.cpp:

bool DirectoryIterator::next()
{
    return next(nullptr, nullptr, nullptr, nullptr, nullptr, nullptr, false);
}


bool DirectoryIterator::next (bool* const isDirResult, bool* const isHiddenResult, int64* const fileSize,
    Time* const modTime, Time* const creationTime, bool* const isReadOnly, bool checkOnlySubIterator)
{
    hasBeenAdvanced = true;

    if (subIterator != nullptr)
    {
        if (subIterator->next (isDirResult, isHiddenResult, fileSize, modTime, creationTime, isReadOnly, false))
            return true;

        subIterator = nullptr;

        if (checkOnlySubIterator)
        {
            return false;
        }
    }

    String filename;
    bool isDirectory, isHidden = false;

    while (fileFinder.next (filename, &isDirectory,
                            (isHiddenResult != nullptr || (whatToLookFor & File::ignoreHiddenFiles) != 0) ? &isHidden : nullptr,
                            fileSize, modTime, creationTime, isReadOnly))
    {
        ++index;

        if (! filename.containsOnly ("."))
        {
            bool matches = false;

            if (isDirectory)
            {
                if (isRecursive && ((whatToLookFor & File::ignoreHiddenFiles) == 0 || ! isHidden))
                    subIterator = new DirectoryIterator (File::createFileWithoutCheckingPath (path + filename),
                                                         true, wildCard, whatToLookFor);

                matches = (whatToLookFor & File::findDirectories) != 0;
            }
            else
            {
                matches = (whatToLookFor & File::findFiles) != 0;
            }

            // if we're not relying on the OS iterator to do the wildcard match, do it now..
            if (matches && (isRecursive || wildCards.size() > 1))
                matches = fileMatches (wildCards, filename);

            if (matches && (whatToLookFor & File::ignoreHiddenFiles) != 0)
                matches = ! isHidden;

            if (matches)
            {
                currentFile = File::createFileWithoutCheckingPath (path + filename);
                if (isHiddenResult != nullptr)     *isHiddenResult = isHidden;
                if (isDirResult != nullptr)        *isDirResult = isDirectory;

                return true;
            }
            if (subIterator != nullptr)
            {
                if (next(isDirResult, isHiddenResult, fileSize, modTime, creationTime, isReadOnly, true))
                {
                    return true;
                }
            }
        }
    }
    return false;
}

juce_DirectoryContentsList:checkNextFile (side effect):

if (fileFindHandle->next (&fileFoundIsDir, &isHidden, &fileSize,
                          &modTime, &creationTime, &isReadOnly, false))

I hope this helps.


#6

Aha, I didn't spot that bit of tail-recursion! Thanks, I'll see what I can do...


#7

Thanks. I pulled and checked the new code - working just fine!