Find all files in a directory and its subdirectories - C++

I would like to list all files in a given directory and all of its subdirectories.
I found some code and modified it, but it gets stuck in a never-ending loop and I don't understand why.
int getdir (string dir, vector<string> &files)
{
DIR *dp;
struct dirent *dirp;
if((dp = opendir(dir.c_str())) == NULL) {
cout << "Error(" << errno << ") opening " << dir << endl;
return errno;
}
while ((dirp = readdir(dp)) != NULL) {
files.push_back(string(dirp->d_name));
string test=dir+"/"+dirp->d_name;
getdir(test,files);
}
closedir(dp);
return 0;
}
My main:
int main()
{
string dir = string(".");
vector<string> files = vector<string>();
getdir(dir,files);
for (unsigned int i = 0;i < files.size();i++) {
cout << files[i] << endl;
}
return 0;
}
How could I fix it?

This is likely due to the "." directory entry, typically one of the first entries returned, which represents the current directory.
It causes your algorithm to try to list the entries for ./. and then ././., repeating endlessly until your program eventually crashes when it runs out of memory.
There's also a ".." directory entry, which represents the parent directory and causes a similar recursion problem.
As noted by Jerry Coffin, symbolic links can cause a very similar issue if a link points to a directory that is the parent or an ancestor of the link itself. This can be avoided with a much more complicated check, or simply by excluding DT_LNK type entries altogether.
Another issue is that you're calling getdir on files as well as on subdirectories.
Try the following changes:
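while ((dirp = readdir(dp)) != NULL) {
    string name(dirp->d_name);
    // Skip the current-directory and parent-directory entries.
    if (name != "." && name != "..") {
        string test=dir+"/"+name;
        files.push_back(test);
        // Recurse only into real directories; symlinks have d_type DT_LNK.
        if (dirp->d_type == DT_DIR) {
            getdir(test,files);
        }
    }
}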
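For reference, here is a minimal, self-contained sketch of the same fix. It assumes a POSIX system, and the function name listTree is made up for illustration. Instead of relying on d_type, which some file systems report as DT_UNKNOWN, it asks lstat(), which does not follow symbolic links, so linked directories are listed but never entered.

```cpp
#include <dirent.h>
#include <sys/stat.h>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): appends every entry found
// below `dir` to `files`. Returns 0 on success, -1 if `dir` cannot be opened.
int listTree(const std::string& dir, std::vector<std::string>& files)
{
    DIR* dp = opendir(dir.c_str());
    if (dp == NULL) {
        std::perror(dir.c_str());
        return -1;
    }
    struct dirent* entry;
    while ((entry = readdir(dp)) != NULL) {
        const std::string name(entry->d_name);
        if (name == "." || name == "..")
            continue;                      // skip self and parent
        const std::string path = dir + "/" + name;
        files.push_back(path);

        // lstat() reports a symlink as a symlink, so only real
        // directories are entered and link cycles cannot occur.
        struct stat st;
        if (lstat(path.c_str(), &st) == 0 && S_ISDIR(st.st_mode))
            listTree(path, files);
    }
    closedir(dp);
    return 0;
}
```

Calling listTree(".", files) from main fills files with paths such as ./sub and ./sub/x.txt. An unreadable subdirectory is reported on stderr and skipped rather than aborting the whole walk.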

Related

C++ folder opening and file counting

This is my code, but I can't prevent it from printing . and .., and it counts them as files. I don't understand why.
The output is:
.
1files.
..
2files.
course3.txt
3files.
course2.txt
4files.
course1.txt
5files.
But there are only 3 files, so it should say 3 files. Instead it also counts . and .., and I don't know what they mean.
int folderO(){
DIR *dir;
struct dirent *ent;
int nFiles=0;
if ((dir = opendir ("sampleFolder")) != NULL) {
/* print all the files and directories within directory */
while ((ent = readdir (dir)) != NULL) {
std::cout << ent->d_name << std::endl;
nFiles++;
std::cout << nFiles << "files." << std::endl;
}
closedir (dir);
}
else {
/* could not open directory */
perror ("");
return EXIT_FAILURE;
}
}
. and .. are meta directories, current directory and parent directory respectively.
What you have found is that subdirectories are being printed along with files, and so are symlinks and other "weird" Unix-y stuff. There are a couple of ways to filter those out if you don't want them printed:
If your system supports d_type in the dirent structure, check that d_type == DT_REG (a regular file) before printing. (GNU page on dirent listing possible d_types)
if (ent->d_type == DT_REG)
{
std::cout << ent->d_name << std::endl;
nFiles++;
std::cout << nFiles << "files." << std::endl;
}
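If d_type is not supported, stat the file and check that it is a regular file with S_ISREG(st_mode).
struct stat statresult;
// stat() resolves names from the working directory, so prepend the folder name.
std::string path = std::string("sampleFolder/") + ent->d_name;
if (stat(path.c_str(), &statresult) == 0)
{
if (S_ISREG(statresult.st_mode))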
{
std::cout << ent->d_name << std::endl;
nFiles++;
std::cout << nFiles << "files." << std::endl;
}
}
And of course there is the dumb-simple if statement that compares the name against "." and ".." (strcmp in C, std::string operator== in C++), but this will still list all other subdirectories.
. is current directory inode (technically, a hardlink), .. is parent directory.
These are there for navigation. They're directories, perhaps you can ignore them if they are directories?
A Google search would have revealed that these are special folder names with these meanings:
. the current directory
.. the parent directory
Any tutorial on iterating a directory shows you how to filter these out with a simple "if" statement.
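As a rough illustration of that "if" statement combined with the stat check mentioned earlier, the sketch below counts only regular files. It assumes POSIX dirent; countRegularFiles is a hypothetical name, not something from the question.

```cpp
#include <dirent.h>
#include <sys/stat.h>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: returns the number of regular files directly inside
// `folder`, or -1 if the folder cannot be opened.
int countRegularFiles(const std::string& folder)
{
    DIR* dir = opendir(folder.c_str());
    if (dir == NULL) {
        std::perror(folder.c_str());
        return -1;
    }
    int nFiles = 0;
    struct dirent* ent;
    while ((ent = readdir(dir)) != NULL) {
        // The simple "if": skip the current- and parent-directory entries.
        if (std::strcmp(ent->d_name, ".") == 0 ||
            std::strcmp(ent->d_name, "..") == 0)
            continue;
        // stat() needs a path that resolves from the working directory,
        // not just the bare entry name.
        const std::string path = folder + "/" + ent->d_name;
        struct stat st;
        // Note: stat() follows symlinks, so a link to a regular file counts too.
        if (stat(path.c_str(), &st) == 0 && S_ISREG(st.st_mode))
            ++nFiles;
    }
    closedir(dir);
    return nFiles;
}
```

For the sampleFolder in the question, which holds course1.txt, course2.txt and course3.txt, countRegularFiles("sampleFolder") should return 3, because ".", ".." and any subdirectories are left out.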

Drive Search does not locate .docx and .txt files

void findFile(const std::wstring &directory, wstring inputSearch)
{ // , char *currDrives){ //function for algorithm to search your computer to find what you want to load
//wcout << directory << endl;
//cout << "In" << endl;
wstring search = inputSearch;
std::wstring tmp = directory + L"\\*";
size_t found;
WIN32_FIND_DATAW file;
HANDLE search_handle = FindFirstFileW(tmp.c_str(), &file);
if (search_handle != INVALID_HANDLE_VALUE)
{
std::vector<std::wstring> directories;
do
{
found = tmp.find_last_of(L"/\\");
if (tmp.substr(found + 1) == inputSearch) {
cout << "Found File" << endl;
continueSearching = false;
foundFilePath = tmp;
}
if (file.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
{
if ((!lstrcmpW(file.cFileName, L".")) || (!lstrcmpW(file.cFileName, L".."))) {
continue;
}
}
tmp = directory + L"\\" + std::wstring(file.cFileName);
bruteForceComp++;
//std::wcout << tmp << std::endl;
if (file.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
directories.push_back(tmp);
} while (FindNextFileW(search_handle, &file) && continueSearching == true);
FindClose(search_handle);
for (std::vector<std::wstring>::iterator iter = directories.begin(), end = directories.end(); iter != end; ++iter) {
findFile(*iter, inputSearch);
}
}
}
We have this function that searches all sub-directories within a certain drive. We compare this to a search term that is inputted by the user. (The drive and the search term are the two parameters passed into the function) We locate .exe files fine and a bunch of other files, but when we try to locate a .docx or .txt file they do not appear. Any help with finding why would be appreciated. We're somewhat stumped.
Thanks!
foundFilePath is a static wstring by the way. It is where we store the matching path to the search term. Search term is inputted as word.exe or yay.docx also.

Reading file names from a directory

I'm reading all file names from a certain directory using this function:
void getdir(std::string dir, std::list<std::string>& files)
{
DIR *dp;
struct dirent *dirp;
if((dp = opendir(dir.c_str())) == NULL)
{
std::cout<< "Error: path " << dir << " unknown!\n";
}
else
{
while ((dirp = readdir(dp)) != NULL)
{
files.push_back(std::string(dirp->d_name));
}
closedir(dp);
}
}
When I print them out, I get '.' or '..' too with the filenames. But the file '.' or '..' is not in the directory.
I'm using ubuntu 12.04 :)
. is current directory, and .. is parent directory, you will find them in every directory.
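If a C++17 compiler is available, one alternative is std::filesystem::directory_iterator, which skips the "." and ".." entries by itself. The sketch below swaps that in for opendir/readdir; it is not the asker's code, and listNames is a made-up name.

```cpp
#include <filesystem>
#include <list>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical replacement for getdir(): collects the names found directly
// in `dir`. Returns false if the directory cannot be read.
bool listNames(const std::string& dir, std::list<std::string>& files)
{
    std::error_code ec;
    fs::directory_iterator it(dir, ec);   // non-throwing overload
    if (ec)
        return false;

    const fs::directory_iterator end;
    while (it != end) {
        // directory_iterator never yields "." or "..".
        files.push_back(it->path().filename().string());
        it.increment(ec);
        if (ec)
            return false;
    }
    return true;
}
```

With the dirent version, the equivalent fix is to compare dirp->d_name against "." and ".." and skip the push_back for those two names.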

How do I ignore hidden files (and files in hidden directories) with Boost Filesystem?

I am iterating through all files in a directory recursively using the following:
try
{
for ( bf::recursive_directory_iterator end, dir("./");
dir != end; ++dir )
{
const bf::path &p = dir->path();
if(bf::is_regular_file(p))
{
std::cout << "File found: " << p.string() << std::endl;
}
}
} catch (const bf::filesystem_error& ex) {
std::cerr << ex.what() << '\n';
}
But this includes hidden files and files in hidden directories.
How do I filter out these files? If needed I can limit myself to platforms where hidden files and directories begin with the '.' character.
Unfortunately there doesn't seem to be a cross-platform way of handling "hidden". The following works on Unix-like platforms:
First define:
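bool isHidden(const bf::path &p)
{
    // filename() returns a bf::path in Boost.Filesystem v3, so convert it explicitly.
    const std::string name = p.filename().string();
    // "." and ".." are navigation entries, not hidden files.
    return name != ".." &&
           name != "." &&
           !name.empty() && name[0] == '.';
}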
Then traversing the files becomes:
try
{
for ( bf::recursive_directory_iterator end, dir("./");
dir != end; ++dir)
{
const bf::path &p = dir->path();
//Hidden directory, don't recurse into it
if(bf::is_directory(p) && isHidden(p))
{
dir.no_push();
continue;
}
if(bf::is_regular_file(p) && !isHidden(p))
{
std::cout << "File found: " << p.string() << std::endl;
}
}
} catch (const bf::filesystem_error& ex) {
std::cerr << ex.what() << '\n';
}
Let's assume for now that you want to ignore files which start with a '.'. This is the standard indication in Unix for a hidden file. I suggest writing a recursive function to visit each file. In pseudocode, it looks something like this:
visitDirectory dir
    for each file in dir
        if the filename of file does not begin with a '.'
            if file is a directory
                visitDirectory file
            else
                do something with file (perhaps as a separate function call?)
This avoids the need to search the whole path of a file to determine whether or not we want to deal with it. Instead, we simply skip any directories which are "hidden."
I can think of several iterative solutions as well, if that's what you prefer. One is to have a stack or queue to keep track of which directory to visit next. Basically this emulates the recursive version with your own data structure. Alternatively, if you are stuck on parsing the full path of the file, simply make sure you get the absolute path. This will guarantee that you don't encounter a directory with a name like './' or '../', which would cause problems with checking for a hidden file.
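The stack-based variant could look roughly like the sketch below. It uses plain POSIX dirent and lstat rather than Boost, and visibleFiles is a hypothetical name. "Hidden" is taken to mean "starts with a dot", which conveniently also covers the "." and ".." entries.

```cpp
#include <dirent.h>
#include <sys/stat.h>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: collects every non-hidden regular file below `root`
// without recursion, never descending into hidden directories.
std::vector<std::string> visibleFiles(const std::string& root)
{
    std::vector<std::string> result;
    std::vector<std::string> pending(1, root);  // explicit stack of directories

    while (!pending.empty()) {
        const std::string dir = pending.back();
        pending.pop_back();

        DIR* dp = opendir(dir.c_str());
        if (dp == NULL) {
            std::perror(dir.c_str());
            continue;                           // unreadable directory: skip it
        }
        struct dirent* entry;
        while ((entry = readdir(dp)) != NULL) {
            if (entry->d_name[0] == '.')
                continue;                       // hidden entry, ".", or ".."
            const std::string path = dir + "/" + entry->d_name;
            struct stat st;
            if (lstat(path.c_str(), &st) != 0)
                continue;
            if (S_ISDIR(st.st_mode))
                pending.push_back(path);        // visit later instead of recursing
            else if (S_ISREG(st.st_mode))
                result.push_back(path);
        }
        closedir(dp);
    }
    return result;
}
```

Only the entry name is tested, never the full path, so starting from "./" or "../" does not trip the hidden check. Swapping the vector for a queue would give a breadth-first order instead.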

C++ Multi threaded directory scan code

I am looking for a way to write multithreaded C++ code that scans a directory and gets a list of all files underneath it. I have written single-threaded code that does this, shown below.
#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
#include <cstring> /* for strcmp() */
#include <vector>
#include <string>
#include <iostream>
#include <sys/stat.h> /* for stat() */
using namespace std;
int isDir(string path);
/*function... might want it in some class?*/
int getdir (string dir, vector<string> &dirlist, vector<string> &fileList)
{
DIR *dp;
struct dirent *dirp, *dirFp ;
if((dp = opendir(dir.c_str())) == NULL) {
cout << "Error(" << errno << ") opening " << dir << endl;
return errno;
}
while ((dirp = readdir(dp)) != NULL) {
if (strcmp (dirp->d_name, ".") != 0 && strcmp(dirp->d_name, "..") != 0) {
//dirlist.push_back(string(dirp->d_name));
string Tmp = dir.c_str()+ string("/") + string(dirp->d_name);
if(isDir(Tmp)) {
//if(isDir(string(dir.c_str() + dirp->d_name))) {
dirlist.push_back(Tmp);
getdir(Tmp,dirlist,fileList);
} else {
// cout << "Files :"<<dirp->d_name << endl;
fileList.push_back(string(Tmp));
}
}
}
closedir(dp);
return 0;
}
int isDir(string path)
{
struct stat stat_buf;
stat( path.c_str(), &stat_buf);
int is_dir = S_ISDIR( stat_buf.st_mode);
// cout <<"isDir :Path "<<path.c_str()<<endl;
return ( is_dir ? 1: 0);
}
int main()
{
string dir = string("/test1/mfs");
vector<string> dirlist = vector<string>();
vector<string> fileList = vector<string>();
getdir(dir,dirlist,fileList);
#if 0
for (unsigned int i = 0;i < dirlist.size();i++) {
cout << "Dir LIst" <<dirlist[i] << endl;
//string dirF = dir + "/" + dirlist[i];
//getdir(dirF,fileList);
}
#endif
for (unsigned int i = 0; i < fileList.size(); i++)
cout << "Files :"<<fileList[i]<< endl;
return 0;
}
The issue is that it is single-threaded, and I need to scan about 8000 directories that may contain files. I don't see how to parallelize it, because the number of directories varies: it is determined by an N-dimensional matrix.
Any help in this regard would be great. Thanks in advance.
boost::filesystem has directory_iterator and recursive_directory_iterator. The former gets all the contents of a directory without recursing into subdirectories; the latter also recurses into subdirectories.
With regard to thread safety, you could lock a mutex and then copy the results into a std::vector, or into two vector instances, one for files and one for directories. That way you will at least have a local snapshot copy.
To actually "freeze" the file system at that point, so that no other process can modify it, is not something you can normally do. You could try setting the file attributes to read-only and changing them back later, but you need permission to do that first.
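One way to combine the two suggestions is sketched below, under several assumptions: C++17 std::filesystem stands in for boost::filesystem, the names scanSubtree and scanTree are invented, and the work is split by top-level subdirectory. Each worker walks its subtrees into local vectors and takes the mutex only to merge them into the shared lists.

```cpp
#include <cstddef>
#include <filesystem>
#include <mutex>
#include <string>
#include <system_error>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical worker: walks one subtree into local vectors, then merges
// them into the shared lists while holding the mutex.
void scanSubtree(const fs::path& root, std::mutex& m,
                 std::vector<std::string>& dirList,
                 std::vector<std::string>& fileList)
{
    std::vector<std::string> localDirs, localFiles;
    std::error_code ec;
    fs::recursive_directory_iterator it(root, ec);
    const fs::recursive_directory_iterator end;
    while (!ec && it != end) {
        std::error_code statEc;
        // Symlinks are recorded as files and never treated as directories.
        if (it->is_directory(statEc) && !it->is_symlink(statEc))
            localDirs.push_back(it->path().string());
        else
            localFiles.push_back(it->path().string());
        it.increment(ec);
    }
    std::lock_guard<std::mutex> lock(m);
    dirList.insert(dirList.end(), localDirs.begin(), localDirs.end());
    fileList.insert(fileList.end(), localFiles.begin(), localFiles.end());
}

// Hypothetical entry point: every top-level subdirectory of `root` is one
// unit of work, shared among at most hardware_concurrency() threads.
void scanTree(const fs::path& root,
              std::vector<std::string>& dirList,
              std::vector<std::string>& fileList)
{
    std::vector<fs::path> tops;
    std::error_code ec;
    fs::directory_iterator it(root, ec);
    const fs::directory_iterator end;
    while (!ec && it != end) {
        std::error_code statEc;
        if (it->is_directory(statEc) && !it->is_symlink(statEc)) {
            dirList.push_back(it->path().string());
            tops.push_back(it->path());
        } else {
            fileList.push_back(it->path().string());
        }
        it.increment(ec);
    }

    std::size_t n = std::thread::hardware_concurrency();
    if (n == 0) n = 1;
    if (n > tops.size()) n = tops.size();

    std::mutex m;
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < n; ++w) {
        // Worker w handles subdirectories w, w + n, w + 2n, ...
        workers.emplace_back([&, w] {
            for (std::size_t i = w; i < tops.size(); i += n)
                scanSubtree(tops[i], m, dirList, fileList);
        });
    }
    for (std::thread& t : workers)
        t.join();
}
```

The order of the results depends on thread scheduling, so sort them if a stable order matters. Whether this is actually faster than the single-threaded version depends on the storage, since directory scans are often limited by I/O rather than CPU. On some toolchains the program must be built with -pthread.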