I am trying to convert a path string to a normalized form in which any run of directory separators ("\\" or "/") is collapsed into a single preferred directory separator:
R"(C:\\temp\\Recordings/test)" -> R"(C:\temp\Recordings\test)"
Code:
#include <string>
#include <vector>
#include <iostream>
#include <filesystem>
std::string normalizePath(const std::string& messyPath) {
std::filesystem::path path(messyPath);
std::string npath = path.make_preferred().string();
return npath;
}
int main()
{
std::vector<std::string> messyPaths = { R"(C:\\temp\\Recordings/test)", R"(C://temp\\Recordings////test)" };
std::string desiredPath = R"(C:\temp\Recordings\test)";
for (auto messyPath : messyPaths) {
std::string normalizedPath = normalizePath(messyPath);
if (normalizedPath != desiredPath) {
std::cout << "normalizedPath: " << normalizedPath << " != " << desiredPath << std::endl;
}
}
std::cout << "Press any key to continue.\n";
int k;
std::cin >> k;
}
Output on Windows VS2019 x64:
normalizedPath: C:\\temp\\Recordings\test != C:\temp\Recordings\test
normalizedPath: C:\\temp\\Recordings\\\\test != C:\temp\Recordings\test
The std::filesystem::path documentation says:
A path can be normalized by following this algorithm:
1. If the path is empty, stop (normal form of an empty path is an empty path)
2. Replace each directory-separator (which may consist of multiple slashes) with a single path::preferred_separator.
...
Great, but which library function does this? I do not want to code this myself.
As answered by bolov:
std::string normalizePath(const std::string& messyPath) {
std::filesystem::path path(messyPath);
std::filesystem::path canonicalPath = std::filesystem::weakly_canonical(path);
std::string npath = canonicalPath.make_preferred().string();
return npath;
}
weakly_canonical does not throw an exception if the path does not exist; canonical does.
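If the path should be cleaned up without consulting the filesystem at all, a purely lexical variant may be enough. The sketch below uses std::filesystem::path::lexically_normal, which collapses repeated separators and resolves "." and ".." textually; the function name normalizePathLexically is hypothetical and not part of the answer above. Note that a backslash is only treated as a directory separator on Windows.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: normalize a path purely lexically (no filesystem access).
// lexically_normal() collapses repeated separators and resolves "." and "..";
// make_preferred() then converts separators to the platform's preferred one.
std::string normalizePathLexically(const std::string& messyPath) {
    std::filesystem::path path(messyPath);
    return path.lexically_normal().make_preferred().string();
}
```

Unlike weakly_canonical, this does not resolve symbolic links, so the two can give different results for paths that pass through a symlink.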
The question is simple: I want to find a file inside a directory, but I only have part of the filename. Here are the functions I wrote for this task:
void getfiles(const fs::path& root, const string& ext, vector<fs::path>& ret)
{
if(!fs::exists(root) || !fs::is_directory(root)) return;
fs::recursive_directory_iterator it(root);
fs::recursive_directory_iterator endit;
while(it != endit)
{
if(fs::is_regular_file(*it)&&it->path().extension()==ext) ret.push_back(it->path());//
++it;
}
}
bool find_file(const filesystem::path& dir_path, const filesystem::path file_name, filesystem::path& path_found) {
const fs::recursive_directory_iterator end;
const auto it = find_if(fs::recursive_directory_iterator(dir_path), end,
[file_name](fs::path e) {
cerr<<boost::algorithm::icontains(e.filename().native() ,file_name.native())<<endl;
return boost::algorithm::icontains(e.filename().native() ,file_name.native());//
});
if (it == end) {
return false;
} else {
path_found = it->path();
return true;
}
}
int main (int argc, char* argv[])
{
vector<fs::path> inputClass ;
fs::path textFiles,datasetPath,imgpath;
textFiles=argv[1];
datasetPath=argv[2];
getfiles(textFiles,".txt",inputClass);
for (int i=0;i<inputClass.size();i++)
{
ifstream lblFile(inputClass[i].string().c_str());
string line;
fs::path classname=inputClass[i].parent_path()/inputClass[i].stem().string();
cerr<<classname.stem()<<endl;
while (getline(lblFile,line))
{
bool find=find_file(datasetPath,line,imgpath);
if (find)
{
while(!fs::exists(classname))
fs::create_directories (classname);
fs::copy(imgpath,classname/imgpath.filename());
cerr<<"Found\n";
}
else
cerr<<"Not Found \n";
}
lblFile.close();
}
}
Console out:
"490"
vfv343434.jpeg||E9408000EC0
0
fsdfdsfdfsf.jpeg||E9408000EC0
0
1200E9408000EC0.jpeg||E9408000EC0
0
Not Found
However, when I hard-code the search string it works fine. I tried other ways of searching the string, such as std::find, but every method fails to find the substring. There seems to be a problem with the input string (line), yet when I printed all of its characters I saw no special characters or anything unusual.
With the search string set manually, it works as desired:
string search="E9408000EC0";
cerr<<e.filename().native()<<"||"<<search<<endl;
cerr<<boost::algorithm::icontains(e.filename().native() ,search)<<endl;
With that change, the output is:
"490"
vfv343434.jpeg||E9408000EC0
0
fsdfdsfdfsf.jpeg||E9408000EC0
0
1200E9408000EC0.jpeg||E9408000EC0
1
Found
I cannot reproduce this.
The only hunch I have is that on your platform, perhaps the string() accessor is not returning the plain string, but e.g. the quoted path. That would break the search. Consider using the native() accessor instead.
(In fact, since file_name is NOT a path but a string pattern, I suggest passing the argument as std::string_view or similar instead.)
Live On Coliru
#include <boost/filesystem.hpp>
#include <boost/algorithm/string.hpp>
#include <iostream>
namespace fs = boost::filesystem;
template <typename Out>
void find_file(const fs::path& dir_path, const fs::path file_name, Out out) {
fs::recursive_directory_iterator it(dir_path), end;
std::copy_if(it, end, out, [file_name](fs::path e) {
return boost::algorithm::icontains(e.filename().native(),
file_name.native());
});
}
int main() {
fs::path d = "a/b/c/e";
fs::create_directories(d);
{
std::ofstream ofs(d / "1200E9408000EC0.jpeg");
}
std::cout << fs::path("000EC0").native() << "\n";
std::vector<fs::path> found;
find_file(".", "000EC0", back_inserter(found));
for (auto &f : found)
{
std::cout << "Found: " << f << "\n";
}
}
Prints
000EC0
Found: "./a/b/c/e/1200E9408000EC0.jpeg"
UPDATE: Code Review
For the updated question, I came up with a somewhat improved tester that works the same with boost::filesystem and with std::filesystem.
There are many small improvements (removing repetition, explicit conversions, using optional to return optional matches, etc.).
Also added a whitespace trim to avoid choking on extraneous whitespace on the input lines:
Live On Coliru (-DUSE_BOOST_FS)
Live On Coliru (std library)
#include <boost/algorithm/string.hpp>
#include <fstream>
#include <iostream>
#include <optional>
#include <vector>
using boost::algorithm::icontains;
using boost::algorithm::trim;
#if defined(USE_BOOST_FS)
#include <boost/filesystem.hpp>
namespace fs = boost::filesystem;
using boost::system::error_code;
#else
#include <filesystem>
namespace fs = std::filesystem;
using std::error_code;
#endif
void getfiles(
const fs::path& root, const std::string& ext, std::vector<fs::path>& ret)
{
if (!exists(root) || !is_directory(root))
return;
for (fs::recursive_directory_iterator it(root), endit; it != endit; ++it) {
if (is_regular_file(*it) && it->path().extension() == ext)
ret.push_back(it->path()); //
}
}
std::optional<fs::path> find_file(const fs::path& dir_path, fs::path partial)
{
fs::recursive_directory_iterator end,
it = fs::recursive_directory_iterator(dir_path);
it = std::find_if(it, end, [partial](fs::path e) {
auto search = partial.native();
//std::cerr << e.filename().native() << "||" << search << std::endl;
auto matches = icontains(e.filename().native(), search);
std::cerr << e << " Matches: " << std::boolalpha << matches
<< std::endl;
return matches;
});
return (it != end)
? std::make_optional(it->path())
: std::nullopt;
}
auto readInputClass(fs::path const& textFiles)
{
std::vector<fs::path> found;
getfiles(textFiles, ".txt", found);
return found;
}
int main(int argc, char** argv)
{
std::vector<std::string> const args(argv, argv + argc);
auto const textFiles = readInputClass(args.at(1));
std::string const datasetPath = args.at(2);
for (fs::path classname : textFiles) {
// open the text file
std::ifstream lblFile(classname);
// use base without extension as output directory
classname.replace_extension();
if (!fs::exists(classname)) {
if (fs::create_directories(classname))
std::cerr << classname << " created" << std::endl;
}
for (std::string line; getline(lblFile, line);) {
trim(line);
if (auto found = find_file(datasetPath, line)) {
auto dest = classname / found->filename();
error_code ec;
copy(*found, dest, ec);
std::cerr << dest << " (" << ec.message() << ")\n";
} else {
std::cerr << "Not Found \n";
}
}
}
}
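The whitespace trim in the listing above comes from Boost. Where Boost is not available, a standard-library equivalent might look like the following sketch; trim_copy is a hypothetical name. A trailing '\r' left behind by Windows line endings is one plausible source of the invisible mismatch described in the question, though that is an assumption and not something the question confirms.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: return s without leading and trailing whitespace.
// The set includes '\r', so lines read from CRLF files are cleaned as well.
std::string trim_copy(std::string_view s) {
    constexpr std::string_view whitespace = " \t\r\n\f\v";
    const auto first = s.find_first_not_of(whitespace);
    if (first == std::string_view::npos)
        return {}; // the input was empty or all whitespace
    const auto last = s.find_last_not_of(whitespace);
    return std::string(s.substr(first, last - first + 1));
}
```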
Testing from scratch with
mkdir -pv textfiles dataset
touch dataset/{vfv343434,fsdfdsfdfsf,1200E9408000EC0}.jpeg
echo 'E9408000EC0 ' > textfiles/490.txt
Running
./a.out textfiles/ dataset/
Prints
"textfiles/490" created
"dataset/1200E9408000EC0.jpeg" Matches: true
"textfiles/490/1200E9408000EC0.jpeg" (Success)
Or on subsequent run
"dataset/fsdfdsfdfsf.jpeg" Matches: false
"dataset/1200E9408000EC0.jpeg" Matches: true
"textfiles/490/1200E9408000EC0.jpeg" (File exists)
BONUS
Doing some more diagnostics and avoiding repeatedly traversing the filesystem for each pattern. The main program is now:
Live On Coliru
int main(int argc, char** argv)
{
std::vector<std::string> const args(argv, argv + argc);
Paths const classes = getfiles(args.at(1), ".txt");
Mappings map = readClassMappings(classes);
std::cout << "Procesing " << map.size() << " patterns from "
<< classes.size() << " classes" << std::endl;
processDatasetDir(args.at(2), map);
}
And the remaining functions are implemented as:
// be smart about case-insensitive patterns
struct Pattern : std::string {
using std::string::string;
using std::string::operator=;
#ifdef __cpp_lib_three_way_comparison
std::weak_ordering operator<=>(Pattern const& other) const {
if (boost::ilexicographical_compare(*this, other)) {
return std::weak_ordering::less;
} else if (boost::ilexicographical_compare(other, *this)) {
return std::weak_ordering::greater; // other sorts before *this
}
return std::weak_ordering::equivalent;
}
#else
bool operator<(Pattern const& other) const {
return boost::ilexicographical_compare(*this, other);
}
#endif
};
using Paths = std::vector<fs::path>;
using Mapping = std::pair<Pattern, fs::path>;
using Patterns = std::set<Pattern>;
using Mappings = std::set<Mapping>;
Mappings readClassMappings(Paths const& classes)
{
Mappings mappings;
for (fs::path classname : classes) {
std::ifstream lblFile(classname);
classname.replace_extension();
for (Pattern pattern; getline(lblFile, pattern);) {
trim(pattern);
if (auto [it, ok] = mappings.emplace(pattern, classname); !ok) {
std::cerr << "WARNING: " << std::quoted(pattern)
<< " duplicates " << std::quoted(it->first)
<< std::endl;
}
}
}
return mappings;
}
size_t processDatasetDir(const fs::path& datasetPath, Mappings const& patterns)
{
size_t copied = 0, failed = 0;
Patterns found;
using It = fs::recursive_directory_iterator;
for (It it = It(datasetPath), end; it != end; ++it) {
if (!it->is_regular_file())
continue;
fs::path const& entry = *it;
for (auto& [pattern, location]: patterns) {
if (icontains(it->path().filename().native(), pattern)) {
found.emplace(pattern);
if (!exists(location) && fs::create_directories(location))
std::cerr << location << " created" << std::endl;
auto dest = location / entry.filename();
error_code ec;
copy(entry, dest, ec);
std::cerr << dest << " (" << ec.message() << ") from "
<< std::quoted(pattern) << "\n";
(ec? failed : copied) += 1;
}
}
}
std::cout << "Copied:" << copied
<< ", missing:" << patterns.size() - found.size()
<< ", failed: " << failed << std::endl;
return copied;
}
With some more "random" test data:
mkdir -pv textfiles dataset
touch dataset/{vfv343434,fsdfdsfdfsf,1200E9408000EC0}.jpeg
echo .jPeg > textfiles/all_of_them.txt
echo $'E9408000EC0 \n e9408000ec0\nE9408\nbOgUs' > textfiles/490.txt
Running as
./a.out textfiles/ dataset/
Prints:
WARNING: "e9408000ec0" duplicates "E9408000EC0"
Procesing 4 patterns from 2 classes
"textfiles/all_of_them" created
"textfiles/all_of_them/1200E9408000EC0.jpeg" (Success) from ".jPeg"
"textfiles/490" created
"textfiles/490/1200E9408000EC0.jpeg" (Success) from "E9408"
"textfiles/490/1200E9408000EC0.jpeg" (File exists) from "E9408000EC0"
"textfiles/all_of_them/vfv343434.jpeg" (Success) from ".jPeg"
"textfiles/all_of_them/fsdfdsfdfsf.jpeg" (Success) from ".jPeg"
Copied:4, missing:1, failed: 1
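The duplicate warning in this output comes from ordering the patterns case-insensitively, so that two spellings of the same pattern count as one set element. A Boost-free sketch of the same idea is shown below; CaseInsensitiveLess is a hypothetical comparator, it handles ASCII only, and it stands in for boost::ilexicographical_compare.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical comparator: strict weak ordering that ignores ASCII case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// A set using this comparator treats "E9408000EC0" and "e9408000ec0"
// as equivalent, so the second insert is rejected.
using PatternSet = std::set<std::string, CaseInsensitiveLess>;
```

With such a set, the bool returned by insert (or emplace) tells whether the pattern was new, which is what drives the WARNING line above.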
I have loaded both of my files into arrays and I'm trying to compare them to find which words from one file occur in the other. However, when I run my code I don't get any output.
This is the contents of both files.
file1
tdogicatzhpigu
file2
dog
pig
cat
rat
fox
cow
So the comparison is between the words from search1.txt and the text from text1.txt: I want to find the occurrence of each word from search1.txt in text1.txt.
What I eventually want to output is whether each word was found, and the index of its location inside the array.
e.g
"dog". Found, location 1.
Here is my code
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream file1("text1.txt");
if (file1.is_open())
{
string myArray[1];
for (int i = 0; i < 1; i++)
{
file1 >> myArray[i];
Any further help would be greatly appreciated. Thanks in advance.
I believe the goal is to search the text in file1 for each word in file2.
You can't use equality for the two strings, as they aren't equal. You'll need to use the std::string::find method:
std::string target_string;
std::getline(file1, target_string);
std::string keyword;
while (getline(file2, keyword))
{
const std::string::size_type position = target_string.find(keyword);
std::cout << "string " << keyword << " ";
if (position == std::string::npos)
{
std::cout << "not found.\n";
}
else
{
std::cout << "found at position " << position << "\n";
}
}
Edit 1:
An implemented example:
#include <iostream>
#include <string>
using std::cout;
using std::string;
using std::endl;
int main()
{
const std::string target_string = "tdogicatzhpigu";
const std::string key_list[] =
{
"dog",
"pig",
"cat",
"rat",
"fox",
"cow",
};
static const unsigned int key_quantity =
sizeof(key_list) / sizeof(key_list[0]);
for (unsigned int i = 0; i < key_quantity; ++i)
{
const std::string::size_type position = target_string.find(key_list[i]);
std::cout << "string " << key_list[i] << " ";
if (position == std::string::npos)
{
std::cout << "not found.\n";
}
else
{
std::cout << "found at position " << position << "\n";
}
}
return 0;
}
I know how to get the list of files on Unix. The C++ program that I tried is below. Now, how do I print the files sorted by size, largest first?
int main() {
DIR* drstrm = opendir(".");
if (drstrm == NULL) {
perror("error opening directory");
return 1;
}
struct dirent* directoryentry = readdir(drstrm);
while (directoryentry != NULL) {
cout << (*directoryentry).d_name << endl;
directoryentry = readdir(drstrm);
}
return 0;
}
Since you said you can use C++17, the filesystem library it introduces makes this really easy (and portable to systems that don't have opendir()/readdir()):
#include <iostream>
#include <vector>
#include <filesystem>
#include <algorithm>
#include <string>
int main(int argc, char **argv) {
if (argc != 2) {
std::cerr << "Usage: " << argv[0] << " DIRECTORY\n";
return 1;
}
std::vector<std::filesystem::directory_entry> files;
for (const auto &dirent : std::filesystem::directory_iterator(argv[1])) {
if (dirent.is_regular_file()) {
files.push_back(dirent);
}
}
std::sort(files.begin(), files.end(), [](const auto &a, const auto &b){
return a.file_size() > b.file_size(); });
for (const auto &dirent : files) {
// Quotes the filenames
// std::cout << dirent.path() << '\n';
// Doesn't quote
std::cout << static_cast<std::string>(dirent.path()) << '\n';
}
return 0;
}
Usage:
$ g++-8 -std=c++17 -O -Wall -Wextra test.cpp -lstdc++fs
$ ./a.out .
a.out
bigfile.txt
test.cpp
smallfile.txt
etc.
If you can't use C++17, the same approach still holds: put the file names and their sizes in a vector, and sort based on the sizes using > instead of the normal < (which would sort from smallest to largest). On POSIX systems, you can get the file size with stat(2).
To do this you are going to have to read the file info into a data structure (like a std::vector) and then sort the file info according to their size.
The old fashioned way could go something like this:
DIR* drstrm = opendir(".");
if(drstrm == NULL)
throw std::runtime_error(std::strerror(errno));
struct stat st; // this is to use decltype
// keep info from dirent & stat in one place
struct file_info
{
std::string name;
decltype(st.st_size) size;
};
// store list of files here to be sorted
std::vector<file_info> files;
while(dirent* entry = readdir(drstrm))
{
// get file info
if(::stat(entry->d_name, &st) == -1)
throw std::runtime_error(std::strerror(errno));
// is it a regular file?
if(!S_ISREG(st.st_mode))
continue;
// store it ready for sorting
files.push_back({entry->d_name, st.st_size});
}
// sort the file_info objects by size, largest first (the question asks for descending order)
std::sort(std::begin(files), std::end(files), [](file_info const& a, file_info const& b){
return a.size > b.size;
});
// print them out
for(auto const& file: files)
std::cout << file.name << ": " << file.size << '\n';
Fortunately in newer versions of C++ (C++17) you can use the new <filesystem> standard library:
namespace fs = std::filesystem; // for brevity
std::vector<fs::path> files;
for(auto const& ent: fs::directory_iterator("."))
{
if(!fs::is_regular_file(ent))
continue;
files.push_back(ent);
}
// largest first, since the question asks for descending order
std::sort(std::begin(files), std::end(files), [](fs::path const& a, fs::path const& b){
return fs::file_size(a) > fs::file_size(b);
});
for(auto const& file: files)
std::cout << file << ": " << fs::file_size(file) << '\n';
I have a directory with 15 folders, and each folder contains 100 text files. Each text file contains a column of numbers.
I need those numbers for some calculations, but I cannot figure out how to read them in. I was thinking about a 2D vector, but I need a data structure that holds different types (a string for the folder name and integers for the numbers).
What is my best solution?
What I have so far is code that searches all the files under a given path.
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <sstream>
#include <algorithm>
#include <tuple>
#include <boost/filesystem.hpp>
#include<dirent.h>
using namespace std;
namespace fs = boost::filesystem;
// prototype of the function that searches all the files under a given path
vector<double> getFilesFromDirectory(const fs::path& startDirectory);
int main()
{ // the directory
string dir = "/home/...";
// test call of my function
vector<double> myDataStructure = getFilesFromDirectory(dir);
// print out the value of myDataStructure
for (auto it = myDataStructure.begin(); it != myDataStructure.end(); it++)
{
cout << *it << " " << endl;
}
return 0;
}
// function that searches all the files under a given path
vector<double> getFilesFromDirectory(const fs::path& startDirectory)
{
vector<double> di;
// First check if the start path exists
if (!fs::exists(startDirectory) || !fs::is_directory(startDirectory))
{
cout << "Given path not a directory or does not exist" << endl;
exit(1);
}
// Create iterators for iterating all entries in the directory
fs::recursive_directory_iterator it(startDirectory); // Directory iterator at the start of the directory
fs::recursive_directory_iterator end; // Directory iterator by default at the end
// Iterate all entries in the directory and sub directories
while (it != end)
{
// Print leading spaces
for (int i = 0; i < it.level(); i++)
cout << "";
// Check if the directory entry is an directory
// When directory, print directory name.
// Else print just the file name.
if (fs::is_directory(it->status()))
{
// print out the path file
cout << it->path() << endl;
}
else
{
cout << it->path().filename() << endl;
// test
di = getValueFromFile(it->path().c_str());
// test, here I want to group the numbers of the file
// and each name of the folder
for(int i = 0; i < 15; i++)
{
di.push_back(mi(fs::basename(it->path()), it->path().c_str());
}
}
// When a symbolic link, don't iterate it. Can cause infinite loop.
if (fs::is_symlink(it->status()))
it.no_push();
// Next directory entry
it++;
}
return di;
}
If I understand the problem correctly, I'd write a class (or struct) to hold the contents of each file:
A string containing the path.
A vector containing every value in the column for that file.
In your main program, keep a vector containing each object you create.
Definition:
#ifndef __COLVALS_HPP__
#define __COLVALS_HPP__
#include <vector>
#include <string>
class ColVals {
private:
std::vector<double> _colValues;
std::string _pathName;
public:
ColVals(const std::string& pathName);
~ColVals() {}
void appendValue(const double colValue);
std::vector<double> getValues();
std::string getPath();
};
#endif // __COLVALS_HPP__
Implementation:
#include "colvals.hpp"
using namespace std;
ColVals::ColVals(const string& pathName) {
_pathName = pathName;
}
void ColVals::appendValue(const double colValue) {
_colValues.push_back(colValue);
}
vector<double> ColVals::getValues() {
return _colValues;
}
string ColVals::getPath() {
return _pathName;
}
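To show how such objects might be filled from the directory tree, here is a rough sketch. It uses std::filesystem and a plain struct instead of the ColVals class above; FileColumn and load_columns are hypothetical names, and it assumes every regular file under the root holds whitespace-separated numbers.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for ColVals: one file's path plus its column values.
struct FileColumn {
    fs::path path;
    std::vector<double> values;

    // Name of the folder that directly contains the file.
    std::string folder() const { return path.parent_path().filename().string(); }
};

// Reads every regular file below root; reading stops at the first token
// in a file that does not parse as a number.
std::vector<FileColumn> load_columns(const fs::path& root) {
    std::vector<FileColumn> columns;
    for (const auto& entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file())
            continue;
        FileColumn column{entry.path(), {}};
        std::ifstream in(entry.path());
        for (double value; in >> value;) {
            column.values.push_back(value);
        }
        columns.push_back(std::move(column));
    }
    return columns;
}
```

Grouping by folder then amounts to comparing folder() across the returned objects, for example by inserting them into a std::map keyed on the folder name.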