directory_iterator file_iter to rename files in a folder - c++

I want to rename the files in a directory. The directory contains 52 folders, each with a different name and around 40 files. For each folder, I want to take the folder's name and prepend it to the name of every file in that folder.
It worked fine when there were 31 or fewer files in each folder, but whenever a folder held more than 31 files, the rename algorithm I wrote failed. I can't figure out why it crashes when there are more files. Please enlighten me if you understand why.
Here is the code:
int main( int argc, char** argv ){
directory_iterator end_iter;
directory_iterator file_itr;
string inputName;
string checkName;
inputName.assign(argv[1]);
if (is_directory(inputName))
{
for (directory_iterator dir_itr(inputName); dir_itr != end_iter; ++dir_itr)
{
if (is_directory(*dir_itr))
{
for (directory_iterator file_itr(*dir_itr); file_itr != end_iter; ++file_itr)
{
string folderName(dir_itr->path().filename().string());
if (is_regular_file(*file_itr))
{
std::string fileType = file_itr->path().extension().string();
std::transform(fileType.begin(), fileType.end(), fileType.begin(), (int(*)(int))std::toupper);
if (fileType == ".JPG" || fileType == ".JPEG" || fileType == ".JPG" || fileType == ".PGM")
{
string filename(file_itr->path().string());
string pathName(file_itr->path().parent_path().string());
string oldName(file_itr->path().filename().string());
cout << folderName << endl;
folderName += "_";
folderName += oldName;
string newPathName = pathName + "\\" + folderName;
cout << pathName <<"\\"<< folderName << endl;
//RENAMING function
rename(file_itr->path(), path(newPathName.c_str()));
}
}
}
}
}
}
}

It's likely that Boost's directory_iterator implementation is getting confused by you renaming files that are in the directory listing.
From the docs:
Warning: If a file or sub-directory is removed from or added to a directory after the construction of a directory_iterator for the directory, it is unspecified whether or not subsequent incrementing of the iterator will ever result in an iterator whose value is the removed or added directory entry.
I recommend trying it in two phases. In the first phase, use the code you have now to build a vector<pair<string, string> > instead of renaming the file. Then, once you've scanned the directory, it should just be a matter of iterating through the list performing the actual renames.

Related

Recursive listing files in C++ doesn't enter all subdirectories

!!!Solved!!!
Thank you all for your help; it's all working now. I changed my code as suggested by @RSahu and got it to work.
Thanks for all your input. I had been really stuck on this.
To @Basile: I will definitely check that out, but I won't use it for this particular piece of code because it looks too complicated :) Thanks for the suggestion, though.
Original question
I'm trying to write C++ code to list all files in a given directory and its subdirectories.
Quick explanation
The idea is that the function list_dirs(_dir, _files, _current_dir) starts in the top directory and puts files into the vector _files; when it finds a directory, it calls itself on that directory. _current_dir is prepended to the file name for files in a subdirectory, because I need to know the path structure (the output is supposed to generate sitemap.xml).
list_dirs calls list_dir, which simply returns all entries in the current directory without distinguishing between files and directories.
My problem
What the code does now is list all files in the original directory and then all files in one subdirectory, skipping the files in all other subdirectories. It lists those subdirectories themselves, but not the files in them.
Even stranger, it lists files only in this one specific subdirectory and no other. I tried running it in multiple locations, but it never went into any other directory.
Thanks in advance, and please note that I am a beginner at C++, so don't be harsh ;)
LIST_DIR
int list_dir(const std::string& dir, std::vector<std::string>& files){
DIR *dp;
struct dirent *dirp;
unsigned fileCount = 0;
if ((dp = opendir(dir.c_str())) == NULL){
std::cout << "Error opening dir." << std::endl;
}
while ((dirp = readdir(dp)) != NULL){
files.push_back(std::string (dirp->d_name));
fileCount++;
}
closedir(dp);
return fileCount;
}
and LIST_DIRS
int list_dirs (const std::string& _dir, std::vector<std::string>& _files, std::string _current_dir){
std::vector<std::string> __files_or_dirs;
list_dir(_dir, __files_or_dirs);
std::vector<std::string>::iterator it = __files_or_dirs.begin();
struct stat sb;
while (it != __files_or_dirs.end()){
if (lstat((&*it)->c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){
/* how to do this better? */
if (*it == "." || *it == ".."){
__files_or_dirs.erase(it);
continue;
}
/* here it should go into sub-directory */
list_dirs(_dir + *it, _files, _current_dir + *it);
__files_or_dirs.erase(it);
} else {
if (_current_dir.empty()){
_files.push_back(*it);
} else {
_files.push_back(_current_dir + "/" + *it);
}
++it;
}
}
}
The main problem is in the line:
if (lstat((&*it)->c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){
You are using the name of a directory entry in the call to lstat. When the function is dealing with a sub-directory, the entry name does not represent a valid path. You need to use something like:
std::string entry = *it;
std::string full_path = _dir + "/" + entry;
if (lstat(full_path.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){
Suggestions for improvement
Update list_dir so that it doesn't include "." or ".." in the output. It makes sense to me to exclude those files to start with.
int list_dir(const std::string& dir, std::vector<std::string>& files){
DIR *dp;
struct dirent *dirp;
unsigned fileCount = 0;
if ((dp = opendir(dir.c_str())) == NULL){
std::cout << "Error opening dir." << std::endl;
return 0; // nothing to read; passing a NULL handle to readdir is undefined
}
while ((dirp = readdir(dp)) != NULL){
std::string entry = dirp->d_name;
if ( entry == "." or entry == ".." )
{
continue;
}
files.push_back(entry);
fileCount++;
}
closedir(dp);
return fileCount;
}
In list_dirs, there is no need to erase items from __files_or_dirs. The code can be simplified by using a for loop and removing the erase calls.
It's not clear to me what the purpose of _current_dir is. Perhaps it can be removed.
Here's an updated version of the function. _current_dir is used only to construct the value of the argument in the recursive call.
int list_dirs (const std::string& _dir,
std::vector<std::string>& _files, std::string _current_dir){
std::vector<std::string> __files_or_dirs;
list_dir(_dir, __files_or_dirs);
std::vector<std::string>::iterator it = __files_or_dirs.begin();
struct stat sb;
for (; it != __files_or_dirs.end() ; ++it){
std::string entry = *it;
std::string full_path = _dir + "/" + entry;
if (lstat(full_path.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){
/* how to do this better? */
/* here it should go into sub-directory */
list_dirs(full_path, _files, _current_dir + "/" + entry);
} else {
_files.push_back(full_path);
}
}
return 0; // the function is declared to return int, so it must return a value
}
For this line:
if (lstat((&*it)->c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){
Note that readdir and consequently list_dir only return the file name, not the full file path. So at this point (&*it)->c_str() only has a file name (e.g. "input.txt"), not the full path, so when you call lstat on a file in a subdirectory, the system can't find it.
To fix this, you will need to add in the file path before calling lstat. Something like:
string fullFileName;
if (dir.empty()){
fullFileName = *it;
} else {
fullFileName = dir + "/" + *it;
}
if (lstat(fullFileName.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode)){
You may have to use _current_dir instead of dir, depending on what they are actually for (I couldn't follow your explanation).
I am not sure about all of the problems in your code, but I can tell you that this line and the other one like it are going to cause you problems:
__files_or_dirs.erase(it);
When you call erase, you invalidate iterators and references at or after the point of the erase, including the end() iterator. You call erase without storing the returned iterator and then use it again after the call, which is undefined behaviour. You should at least change the line to the following, so that you capture the returned iterator, which points to the element just after the erased one (or equals end() if the erased element was the last):
it = __files_or_dirs.erase(it);
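As a small, self-contained illustration of that pattern (the helper name drop_dot_entries is invented for this sketch and is not part of the original code), a loop that erases while iterating advances it only when nothing was erased:

```cpp
#include <string>
#include <vector>

// Removes "." and ".." in place, relying on the iterator returned by erase().
void drop_dot_entries(std::vector<std::string>& names) {
    for (auto it = names.begin(); it != names.end(); /* advanced in the body */) {
        if (*it == "." || *it == "..") {
            it = names.erase(it);  // erase() hands back the element after the erased one
        } else {
            ++it;                  // only step forward when nothing was erased
        }
    }
}
```

Note that names.end() is re-evaluated on every pass, which matters because erase invalidates the old end iterator as well.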
It also appears from the code you posted that you have a redundancy between _dir and _current_dir. You do not modify either of them. You pass them in as the same value and they stay the same value throughout the function execution. Unless this is simplified code and you are doing something else, I would recommend you remove the _current_dir one and just stick with _dir. You can replace the line in the while loop with _dir where you are building the file name and you will have simplified your code which is always a good thing.
A simpler way on Linux is to use the nftw(3) function. It recursively scans the file tree and calls a handler function that you supply for each entry.

ifstream not working with dirent.h

I'm testing optimizations for Dijkstra's algorithm. To make it easier to open files, I used "dirent.h" to get all the test files in the working directory and then ifstream to open the chosen file.
The selectDirec function reads all the entries in the directory, ignores folders, and puts the file names in a vector called files.
void selectDirec(){
files.clear();
DIR *dir;
struct dirent *ent;
if ((dir = opendir (".")) != NULL) {
while ((ent = readdir (dir)) != NULL) {
if(opendir(ent->d_name) == NULL){
files.push_back(ent->d_name);
}
}
closedir (dir);
} else {
cout<<"directory error"<<endl;
}
}
After that, I use a function called selectFile, which assigns the name of the file the user chooses to a variable called fileName.
void selectFile(){
selectDirec();
for(int i = 0 ; i < files.size() ; i++){
cout<<i+1<<" : "<<files[i]<<endl;
}
int choice = 0;
do{
cout<<"enter file number"<<endl;
cin>>choice;
}while(choice > files.size());
choice--;
fileName = files[choice];
cout<<fileName<<":"<<endl;
}
After that, my readGraph function opens the file and continues with the graph operations.
void readGraph(){
ifstream ifile; ifile.open(fileName);
if(!ifile.is_open()){
cout<<"no file with the name specified"<<endl;
eflag = true;
return;
}
...
...
}
initialization:
vector<char *> files;
char * fileName ;
Now I have these 5 test files, which I got from http://algs4.cs.princeton.edu/44sp/:
tinyEWD.txt contains 8 vertices and 15 edges [140B]
mediumEWD.txt contains 250 vertices and 2,546 edges [40KB]
1000EWG.txt contains 1,000 vertices and 16,866 edges [313KB]
10000EWG.txt contains 10,000 vertices and 123,462 edges [2.4MB]
NYC.txt contains 264,346 vertices and 733,846 edges [12.7MB]
But there's a weird problem with these 3 files:
'mediumEWD' , '10000EWD.txt' , 'NYC.txt'
When I choose any of them, the code prints "no file with the name specified", which comes from the error branch in readGraph.
But when I enter their names manually and comment out selectDirec and selectFile, the program opens them successfully.
P.S. I checked the file names, spacing, and everything else.
P.S.2 I'm currently running this code on Ubuntu 14.04 LTS.
Thanks in advance.
if(opendir(ent->d_name) == NULL){
files.push_back(ent->d_name);
}
What is files? I suspect that you are using a std::vector<const char *>, or something along the same lines.
This won't work. d_name is a part of the dirent structure. Immediately afterwards, and certainly after the closedir(), that pointer is no longer valid, and points to deallocated memory.
Looks to me like you then proceed and attempt to use the no-longer valid pointer as the filename parameter to std::ifstream.
You should use a std::vector<std::string> to store the filenames, and use the c_str() member function to extract a pointer to a C-style string, for the open() call.
You can't be using a vector of std::strings here, this must be a vector of raw character pointers. That's because you're assigning one of its values to fileName, whatever it is, and then passing it directly to open() without using c_str(). So it can't be a vector of strings.

Writing image file names within folders into a text file c++

I wrote a bit of code to take the file names of the images in a directory and add them to a list in a text file. This worked fine, but when the images were in a subfolder, it just added the folder name as an entry in the text file.
I need it to check whether an entry is a folder and, if so, write the correct relative path for the images inside it, e.g. subfolder/image.jpg.
I can't work out what I'd need to add. This is what I've got so far:
#include<stdio.h>
#include<cstdlib>
#include<iostream>
#include<string.h>
#include<fstream>
#include<dirent.h>
void listFile();
std::ofstream myfile;
int main(){
listFile();
return 0;
}
void listFile(){
DIR *pDIR;
struct dirent *entry;
if( pDIR=opendir("/home/hduser/Example2Files/TrainImages/") ){
while(entry = readdir(pDIR)){
if( strcmp(entry->d_name, ".") != 0 && strcmp(entry->d_name, "..") != 0 )
myfile.open ("/home/hduser/Example2Files/TrainImages/train.txt",std::ios_base::app);
myfile << entry->d_name << "\n";
myfile.close();
}
closedir(pDIR);
}
}
To traverse into directories, you will (probably) have to modify your code such that you have a function that takes the name of a directory, and lists regular files within that directory. If it finds a directory, it should call recursively with the concatenated name of the current directory and the found directory.
To identify whether an entry is a directory, you can use something like entry->d_type == DT_DIR.

readdir(): re-reading certain files

I have a function whose task is to rename all files in a folder; however, it renames certain files a second time:
http://i.imgur.com/JjN8Qb2.png. The same kind of "error" keeps occurring for every tenth number onwards. What exactly is causing it?
The two arguments to the function are the path of the folder and the start value for the first file's number.
int lookup(std::string path, int *start){
int number_of_chars;
std::string old_s, file_format, new_s;
std::stringstream out;
DIR *dir;
struct dirent *ent;
dir = opendir (path.c_str());
if (dir != NULL) {
// Read pass "." and ".."
ent = readdir(dir);
ent = readdir(dir);
// Change name of all the files in the folder
while((ent = readdir (dir)) != NULL){
// Old string value
old_s = path;
old_s.append(ent->d_name);
// Get the format of the image
file_format = ent->d_name;
number_of_chars = file_format.rfind(".");
file_format.erase(0,number_of_chars);
// New string value
new_s = path;
out << *start;
new_s += out.str();
new_s.append(file_format);
std::cout << "Successfully changed name on " << ent->d_name << "\tto:\t" << *start << file_format << std::endl;
// Switch name on the file from old string to new string
rename(old_s.c_str(), new_s.c_str());
out.str("");
*start = *start+1;
}
closedir (dir);
}
// Couldn't open
else{
std::cerr << "\nCouldn't open folder, check admin privileges and/or provided file path\n" << std::endl;
return 1;
}
return 0;
}
You are renaming files within the same folder that holds the originals, so the loop can pick up files it has already renamed. You renamed 04.png to 4.png, but since you are iterating over all files in the folder, at some point you reach the "new" 4.png (in your sample, on the 40th iteration) and rename it to 40.png, and so on.
The easiest way to resolve this with minimal changes to the existing code is to "rename" (move) the files to a temporary folder with their new names. Something like:
new_s = temp_path;
out << *start;
new_s += out.str();
new_s.append(file_format);
// Switch name on the file from old string to new string
rename(old_s.c_str(), new_s.c_str());
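When you are done renaming all the files in path (after the while loop), delete the now-empty folder and "rename" (move) temp_path to path:
closedir (dir);
rmdir(path.c_str()); // POSIX rmdir only removes an empty directory
rename(temp_path.c_str(), path.c_str()); // assuming temp_path is a std::string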
Possible problems I see:
Renaming files causes them to be fed to your algorithm twice.
Your algorithm for computing the new filename is wrong.
You should be able to write a test for this easily, which in turn should help you fix the problem or write a more specific question. Other than that, I don't see any grave issues, but it would help if you reduced the scope of variables a bit, which would make sure that different iterations don't influence each other.

list top 10 files by size in a unix directory

I am trying to read a Unix directory (including all subdirectories) using C++ and list the top 10 largest files.
I have read that I can #include <dirent.h> and use struct dirent, but I am having trouble passing the directory name as a variable to opendir/readdir.
Basically, it doesn't recognise the name and reports that the file or directory was not found.
Can you please help me do this in C++ and print out the 10 largest files in the directory? Thanks.
DIR *dir;
struct dirent *ent;
dir = opendir ("homedir");
if (dir != NULL) {
while ((ent = readdir (dir)) != NULL) {
cout << ent->d_name <<endl;
}
closedir (dir);
} else {
cout << "Can't open directory" << endl;
}
You don't really give enough details, but when you are reading recursively, are you appending the names you read to the path of the directory that contains them? Reading a directory doesn't change the current directory, so your function should look more or less like:
std::vector<FileInfo>
readDirectoriesRecursively( std::string const& path )
{
    std::vector<FileInfo> results;
    for each name in path            // pseudocode: every entry except "." and ".."
        if name is a directory {
            std::vector<FileInfo> sub = readDirectoriesRecursively( path + '/' + name );
            results.insert( results.end(), sub.begin(), sub.end() );
        } else
            results.push_back( FileInfo( path + '/' + name ) );
    return results;
}
In the constructor of FileInfo, use stat to obtain the size. Once you have the results, sort by size, and output the first 10.
You're almost there. You have all the filenames. With these, you can do a stat to obtain the filesize for each file. When you sort the filesizes descending, you have the ten largest files.
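struct stat buf;
stat(ent->d_name, &buf); // d_name is relative to the opened directory; prepend its path unless that is the current directory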
See the detailed example in the man page.