BR: findChildFiles shouldn't follow links

    auto f = juce::File::getSpecialLocation(juce::File::tempDirectory);
    auto top = f.getChildFile("a");
    auto bot = top.getChildFile("b/c/d/e");
    bot.createDirectory();

    auto ln = bot.getChildFile("f");
    top.createSymbolicLink (ln, true);

    auto files = top.findChildFiles (juce::File::findFilesAndDirectories, true);
    for (auto f : files)
        DBG(f.getFullPathName());
    DBG(files.size());

The above finds 297 files. That seems a bit odd. I’d expect it to either find 5 or ∞. Why does it just stop after a while?

However, if I change the code to:

    auto f = juce::File::getSpecialLocation(juce::File::tempDirectory);
    auto top = f.getChildFile("a");
    auto bot = top.getChildFile("qwre/jhgf/khjg/nvbc");
    bot.createDirectory();

    auto ln = bot.getChildFile("f");
    top.createSymbolicLink (ln, true);

    auto files = top.findChildFiles (juce::File::findFilesAndDirectories, true);
    for (auto f : files)
        DBG(f.getFullPathName());
    DBG(files.size());

it deadlocks. It’s sitting at 100% CPU while memory slowly goes up.

I’m not sure exactly what the behaviour should be in this case. If we want to allow following symlinks, but also want to avoid spinning when traversing filesystems containing cycles, then the directory iterator would have to keep track of previously-visited directories, and exit out if it saw the same directory twice - or we’d have to add an option to avoid following symlinks altogether. Both approaches seem to work, but I worry that keeping track of visited directories might add quite a lot of overhead when traversing large filesystems. We’d also have to make the new behaviour opt-in, because following symlinks may be desirable in some cases.

I wrote the following program to check the behaviour of the std::filesystem library when recursing through a similar hierarchy. I tested on Ubuntu 21.10, building with g++ 11.2. I see practically identical behaviour to the JUCE program above:

  • When we only create one set of cyclical folders (with the marked line commented out), the program terminates after finding around 200 files.
  • With the marked line active, the program spins indefinitely.

It seems that JUCE’s behaviour isn’t completely unreasonable here…

#include <algorithm>
#include <filesystem>
#include <iostream>
#include <ranges>
#include <vector>

int main() {
  const std::filesystem::path top{"/home/reuk/Desktop/test/a"};
  remove_all(top);

  for (const auto bot : {
           top / "b" / "c" / "d" / "e",
           top / "f" / "g" / "h" / "i", // also try with this line commented out
       }) {
    create_directories(bot);
    create_directory_symlink(top, bot / "link");
  }

  const auto iter = std::filesystem::recursive_directory_iterator{
      top, std::filesystem::directory_options::follow_directory_symlink};

  std::vector<std::filesystem::path> files;
  std::ranges::copy(iter | std::ranges::views::transform(
                               [](const auto &x) { return x.path(); }),
                    std::back_inserter(files));

  for (const auto &i : files)
    std::cout << i << '\n';

  std::cout << files.size() << '\n';
}

It seems that JUCE’s behaviour isn’t completely unreasonable here…

I don’t think deadlocking the app is reasonable. There is no way to know what dumb things users do with their system. And then you get support calls with a hung app.

bool File::copyDirectoryTo (const File& newDirectory) const
{
    if (isDirectory() && newDirectory.createDirectory())
    {
        for (auto& f : findChildFiles (File::findFiles, false))
            if (! f.copyFileTo (newDirectory.getChildFile (f.getFileName())))
                return false;

        for (auto& f : findChildFiles (File::findDirectories, false))
            if (! f.copyDirectoryTo (newDirectory.getChildFile (f.getFileName())))
                return false;

        return true;
    }

    return false;
}

I think this function has the same issue in that it goes into endless loops.

I understand the desire to not deadlock an app, but if findChildFiles() doesn’t follow symlinks then you’re probably also going to get support calls from users asking why an app or plugin can’t find a file that is sitting there, perfectly valid in the file system, but mysteriously gets ignored; in fact I would argue this is the more likely scenario.

Most users that even know what symlinks are, and how to create/use them, know the woes of cyclic graphs and infinite recursion. As a user, my expectation is that an app is going to respect symlinks I create in the file system, and I have plenty of reasons why I might want to organize things using symlinks, and in fact do this often. Tthe Venn diagram of users that both know how to create symlinks, and yet don’t understand that symlinking a child folder to a parent folder is bad, is pretty small. Please don’t punish the rest of us for the odd user out there that learns this the hard way (this is why we can’t have nice things).

I think an opt-in or opt-out option for whether or not to traverse symlinks is a fine solution, and allows developers to take their chances with their users based on their own needs and requirements. And it’s a much better solution than just outright not supporting links. What about a visitor pattern that had a definable stack depth limit? Then I as the developer can somewhat manage how much memory I’m willing to allocate in an attempt to protect users from themselves - and if you manage to create a cyclic graph past the N nodes deep that I’m willing to allocate memory for, then that’s on you.

But honestly either an opt-in or opt-out solution seems the most straightforward, it’s just figuring out what the default behavior should be. Although, if symlinks are currently supported, I would say for backwards compatibility, ignoring symlinks should require an explicit opt-out.

2 Likes

We’d also have to make the new behaviour opt-in, because following symlinks may be desirable in some cases.

macos sandbox uses symlinks heavily so opt-in makes sense.

I’ve made some updates to the DirectoryIterator and findChildFiles functions to make the symlink-following behaviour configurable. The default behaviour has not changed.

5 Likes