C++ Sorting Filenames In A Directory - c++

I would like some advice on the code below.
It does what I want, but I doubt it is the "proper" way of doing it.
Could you help me improve it, and share any better approaches if there are some?
I have files named in the format:
501.236.pcd
501.372.pcd
...
612.248.pcd etc.
I want to sort the filenames in ascending order using C++.
This is the code I use:
#include <algorithm>
#include <string>
#include <vector>
#include <iostream>
#include <boost/filesystem.hpp>
#include <sstream>
using namespace std;
using namespace boost::filesystem;
int main()
{
vector <string> str,parsed_str;
path p("./fake_pcd");
string delimiter = ".";
string token,parsed_filename;
size_t pos = 0;
int int_filename;
vector <int> int_dir;
//insert filenames in the directory to a string vector
for (auto i = directory_iterator(p); i != directory_iterator(); i++)
{
if (!is_directory(i->path())) //we eliminate directories in a list
{
str.insert(str.end(),i->path().filename().string());
}
else
continue;
}
//parse each string element in the vector, split from each delimiter
//add each token together and convert to integer
//put inside a integer vector
parsed_str = str;
for (std::vector<string>::iterator i=parsed_str.begin(); i != parsed_str.end(); ++i)
{
cout << *i << endl;
while ((pos = i->find(delimiter)) != string::npos) {
token = i->substr(0,pos);
parsed_filename += token;
i->erase(0, pos + delimiter.length());
}
int_filename = stoi(parsed_filename);
int_dir.push_back(int_filename);
parsed_filename = "";
}
cout << endl;
parsed_str.clear();
sort(int_dir.begin(), int_dir.end());
//print the sorted integers
for(vector<int>::const_iterator i=int_dir.begin(); i != int_dir.end(); i++) {
cout << *i << endl;
}
//convert sorted integers to string and put them back into string vector
for (auto &x : int_dir) {
stringstream ss;
ss << x;
string y;
ss >> y;
parsed_str.push_back(y);
}
cout << endl;
//change the strings so that they are like the original filenames
for(vector<string>::iterator i=parsed_str.begin(); i != parsed_str.end(); i++) {
*i = i->substr(0,3) + "." + i->substr(3,3) + ".pcd";
cout << *i << endl;
}
}
This is the output. The first part is the order in which directory_iterator returns the files, the second part is the filenames sorted as integers, and the last part is the integers converted back into strings in the original filename format.
612.948.pcd
612.247.pcd
501.567.pcd
501.346.pcd
501.236.pcd
512.567.pcd
613.008.pcd
502.567.pcd
612.237.pcd
612.248.pcd
501236
501346
501567
502567
512567
612237
612247
612248
612948
613008
501.236.pcd
501.346.pcd
501.567.pcd
502.567.pcd
512.567.pcd
612.237.pcd
612.247.pcd
612.248.pcd
612.948.pcd
613.008.pcd

Taking a few hints from e.g. Filtering folders in Boost Filesystem and in the interest of total overkill:
Live On Coliru Using Boost (also On Wandbox.org)
#include <boost/range/adaptors.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/filesystem.hpp>
#include <functional>
#include <iostream>
#include <optional>
#include <set>
#include <string>
namespace fs = boost::filesystem;
namespace {
using Path = fs::path;
struct Ranked {
std::optional<int> rank;
Path path;
explicit operator bool() const { return rank.has_value(); }
bool operator<(Ranked const& rhs) const { return rank < rhs.rank; }
};
static Ranked rank(Path const& p) {
if (p.extension() == ".pcd") {
auto stem = p.stem().native();
std::string digits;
using namespace boost::spirit::x3;
if (phrase_parse(begin(stem), end(stem), +digit >> eoi, punct, digits))
return { std::stoul(digits), p };
}
return { {}, p };
}
}
int main() {
using namespace boost::adaptors;
auto dir = boost::make_iterator_range(fs::directory_iterator("."), {})
| transformed(std::mem_fn(&fs::directory_entry::path))
| transformed(rank)
;
std::multiset<Ranked> index(begin(dir), end(dir));
for (auto& [rank, path] : index) {
std::cout << rank.value_or(-1) << "\t" << path << "\n";
}
}
Prints:
-1 "./main.cpp"
-1 "./a.out"
501008 "./501.008.pcd"
501236 "./501.236.pcd"
501237 "./501.237.pcd"
501247 "./501.247.pcd"
501248 "./501.248.pcd"
501346 "./501.346.pcd"
501567 "./501.567.pcd"
501948 "./501.948.pcd"
502008 "./502.008.pcd"
502236 "./502.236.pcd"
502237 "./502.237.pcd"
502247 "./502.247.pcd"
502248 "./502.248.pcd"
502346 "./502.346.pcd"
502567 "./502.567.pcd"
502948 "./502.948.pcd"
512008 "./512.008.pcd"
512236 "./512.236.pcd"
512237 "./512.237.pcd"
512247 "./512.247.pcd"
512248 "./512.248.pcd"
512346 "./512.346.pcd"
512567 "./512.567.pcd"
512948 "./512.948.pcd"
612008 "./612.008.pcd"
612236 "./612.236.pcd"
612237 "./612.237.pcd"
612247 "./612.247.pcd"
612248 "./612.248.pcd"
612346 "./612.346.pcd"
612567 "./612.567.pcd"
612948 "./612.948.pcd"
613008 "./613.008.pcd"
613236 "./613.236.pcd"
613237 "./613.237.pcd"
613247 "./613.247.pcd"
613248 "./613.248.pcd"
613346 "./613.346.pcd"
613567 "./613.567.pcd"
613948 "./613.948.pcd"
BONUS: No-Boost Solution
Now that the filesystem library has been standardized (C++17), the same can be done with std::filesystem and Range-v3:
Live On Wandbox
#include <cctype>
#include <cstdint>
#include <filesystem>
#include <functional>
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <range/v3/action/remove_if.hpp>
#include <range/v3/range/conversion.hpp>
#include <range/v3/view/filter.hpp>
#include <range/v3/view/subrange.hpp>
#include <range/v3/view/transform.hpp>
namespace fs = std::filesystem;
namespace {
using namespace ranges;
using Ranked = std::pair<std::optional<int>, fs::path>;
bool has_rank(Ranked const& v) { return v.first.has_value(); }
static Ranked ranking(fs::path const& p) {
if (p.extension() == ".pcd") {
auto stem = p.stem().native();
auto non_digit = [](uint8_t ch) { return !std::isdigit(ch); };
stem |= actions::remove_if(non_digit);
return { std::stoul(stem), p };
}
return { {}, p };
}
}
int main() {
using It = fs::directory_iterator;
for (auto&& [rank, path] : subrange(It("."), It())
| views::transform(std::mem_fn(&fs::directory_entry::path))
| views::transform(ranking)
| views::filter(has_rank)
| to<std::multimap>())
{
std::cout << rank.value_or(-1) << "\t" << path << "\n";
}
}
Prints e.g.
501236 "./501.236.pcd"
501346 "./501.346.pcd"
501567 "./501.567.pcd"
502567 "./502.567.pcd"

Related

Removing duplicates and counting duplicates in a text file with C++

I am a beginner at C++. I created a text file with two columns in it. However, there are around 1 million rows, and many of them are repeated. I want to delete the duplicates and record how many times each row occurred in a third column. This is what it would look like before and after:
Before:
10 8
11 7
10 8
10 8
15 12
11 7
After:
10 8 3
11 7 2
15 12 1
I don't really know where to start. Can someone point me in the right direction of what I should be looking up in order to do this?
You can create a std::map<std::pair<int, int>, int> and, for each row you read, check whether the pair is already in the map. If it is, increment its count; otherwise emplace it with a count of 1.
Something like this:
#include <iostream>
#include <map>
int main(int argc, char* argv[]) {
std::map<std::pair<int, int>, int> rows;
int num1;
int num2;
while (std::cin >> num1 >> num2) {
auto pair = std::make_pair(num1, num2);
if (rows.find(pair) != rows.end())
++rows[pair];
else
rows.emplace(pair, 1);
}
}
#include <string>
#include <fstream>
#include <unordered_map>
using namespace std;
int main()
{
string line;
unordered_map<string, int> count_map;
ifstream src("input.txt");
if (!src.is_open())
{
return -1;
}
while (getline(src, line))
{
if (line.empty())
continue;
count_map[line]++;
}
src.close();
ofstream dst("output.txt");
if (!dst.is_open())
{
return -2;
}
for (auto & iter : count_map)
{
dst << iter.first << " " << iter.second << endl;
}
dst.close();
return 0;
}
#include <iostream>
#include <fstream>
#include <string>
#include <map>
#include <set>
using namespace std;
int main() {
ifstream src("input.txt");
if (!src.is_open()) {
return -1;
}
// store each line, filter out all of duplicated strings
set<string> container;
// key is to maintain the order of lines, value is a pair<K, V>
// K is the itor pointed to the string in the container
// V is the counts of the string
map<int, std::pair<set<string>::iterator, int>> mp;
// key is the pointer which points to the string in the container
// value is the index of string in the file
map<const string *, int> index;
string line;
int idx = 0; // index of the string in the file
while (getline(src, line)) {
if (line.empty()) {
continue;
}
auto res = container.insert(line);
if (res.second) {
index[&(*res.first)] = idx;
mp[idx] = {res.first, 1};
idx++;
} else {
mp[index[&(*res.first)]].second += 1;
}
}
src.close();
ofstream dst("output.txt");
if (!dst.is_open()) {
return -2;
}
for (const auto & iter : mp) {
dst << *iter.second.first << " " << iter.second.second << endl;
}
dst.close();
return 0;
}
BTW, Redis can solve this problem easily if you are allowed to use it.
This can be done with a std::priority_queue, which automatically sorts the entries. With the data sorted like this, one only has to count the number of subsequent identical entries:
#include <queue>
#include <iostream>
#include <vector>
#include <utility> // for std::pair
int main() {
std::priority_queue<std::pair<int,int>> mydat;
mydat.push(std::make_pair(10,8));
mydat.push(std::make_pair(11,7));
mydat.push(std::make_pair(10,8));
mydat.push(std::make_pair(10,8));
mydat.push(std::make_pair(15,12));
mydat.push(std::make_pair(11,7));
std::vector<std::vector<int>> out;
std::pair<int,int> previous;
int counter;
while(!mydat.empty()) {
counter = 1;
previous = mydat.top();
mydat.pop(); // move on to next entry
while(!mydat.empty() && previous == mydat.top()) { // check empty() first: top() on an empty queue is undefined
previous = mydat.top();
mydat.pop();
counter++;
}
out.push_back({previous.first, previous.second, counter});
}
for(std::size_t i = 0; i < out.size(); ++i) {
std::cout << out[i][0] << " " << out[i][1] << " " << out[i][2] << std::endl;
}
}
godbolt demo
Output:
15 12 1
11 7 2
10 8 3

Finding item in string and say WHEN it was found - c++

I have a string of items (see code). I want to say when a specific item from that list is found. In my example I want the output to be 3 since the item is found after the first two items. I can print out the separate items to the console but I cannot figure out how to do a count on these two items. I think it is because of the while loop... I always get numbers like 11 instead of two separate 1s. Any tips? :)
#include <iostream>
#include <string>
using namespace std;
int main() {
string items = "box,cat,dog,cat";
string delim = ",";
size_t pos = 0;
string token;
string item1 = "dog";
int count = 0;
while ((pos = items.find(delim)) != string::npos)
{
token = items.substr(0, pos);
if (token != item1)
{
cout << token << endl; //here I would like to increment count for every
//item before item1 (dog) is found
items.erase(0, pos + 1);
}
else if (token == item1)
return 0;
}
return 0; //output: box cat
}
I replaced your search algorithm with the method explode, that separates your string by a delimiter and returns a vector, which is better suited for searching and getting the element count:
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#include <algorithm>
std::vector<std::string> explode(const std::string& s, char delim)
{
std::vector<std::string> result;
std::istringstream iss(s);
for (std::string token; std::getline(iss, token, delim); )
{
result.push_back(std::move(token));
}
return result;
}
int main()
{
std::string items = "box,cat,dog,cat";
std::string item1 = "dog";
char delim = ',';
auto resultVec = explode(items, delim);
auto itResult = std::find_if(resultVec.begin(), resultVec.end()
, [&item1](const auto& resultString)
{
return item1 == resultString;
});
if (itResult != resultVec.end())
{
auto index(std::distance(resultVec.begin(), itResult) + 1); // index is zero based
std::cout << index;
}
return 0;
}
By using std::find_if you can get the position of item1 by iterator, which you can use with std::distance to get the count of elements that are in front of it.
Credits for the explode method go to this post: Is there an equivalent in C++ of PHP's explode() function?
There are many roads to Rome. Here is an additional solution using std::regex.
The main approach is the same as in the accepted answer, but using modern C++17 language elements makes it a little more compact.
#include <algorithm>
#include <iostream>
#include <string>
#include <regex>
#include <iterator>
#include <vector>
const std::regex re{ "," };
int main() {
std::string items{ "box,cat,dog,cat" };
// Split String and put all sub-items in a vector
std::vector subItems(std::sregex_token_iterator(items.begin(), items.end(), re, -1), {});
// Search and check if found and show result
if (auto it = std::find(subItems.begin(), subItems.end(), "dog"); it != subItems.end())
std::cout << "Found at position: " << std::distance(subItems.begin(), it) + 1 << '\n';
else
std::cout << "Not found.\n";
return 0;
}

Is there a way to reduce memory consumption in my C++ code?

I am new to C++ and I am trying to solve an educational exercise on a quiz platform, but the platform allows no more than 64 MB of memory. My code uses more than 130 MB.
#include <sstream>
#include <string>
#include <fstream>
#include <iterator>
#include <vector>
#include <map>
using namespace std;
template<class Container>
void splitString(const std::string &basicString, Container &cont, char delim = ' ') {
std::stringstream ss(basicString);
std::string token;
while (std::getline(ss, token, delim)) {
cont.push_back(token);
}
}
int main() {
int target = 0;
int count = 0;
std::map<int, int> set;
string line;
ifstream fileR("input.txt");
std::vector<string> c;
if (fileR.is_open()) {
while (getline(fileR, line)) {
if (count == 0) {
target = std::stoi(line);
count++;
continue;
}
splitString(line, c);
for (auto &d : c) {
int key = std::stoi(d);
if (set.count(key)) {
set[key] += 1;
} else {
set[key] = 1;
}
}
c.clear();
}
fileR.clear();
fileR.close();
}
ofstream fileW;
fileW.open("output.txt");
bool found = false;
for (const auto &p : set) {
int d = target - p.first;
if (set.count(d)) {
if (p.first != d || set[d] > 1) {
fileW << 1;
found = true;
break;
}
}
}
if (!found) {
fileW << 0;
}
fileW.close();
return 0;
}
What can I add, remove or change to stay within the coveted 64 MB? I tried freeing memory manually, but it had no effect. I am not sure whether it is possible to write a more efficient algorithm.
Your vector (c) is declared outside the loop and is not cleared every time you call split string. This means every time you pass in a string to split, your vector contains stuff from the previous run. Is this intentional? If it is not, then move your vector into the loop, before you call split string, or clear it in your split string function. If it is intentional please provide more info about what your code is supposed to do.

Parsing a text file C++ from specific line to specific line

So, I'm new to C++. My task is to parse a text file that looks like this:
RE002%%
RE002%%
RE002%%
RE002%%
RE002%%
RE004%on%
$GPGGA,124749.80,5543.3227107,N,03739.1366738,E,1,08,1.11,147.9635,M,14.4298,M,,*5C
$GPGSV,3,1,10,27,13,078,43,05,31,307,48,16,24,042,43,02,10,267,43*7D
$GPGSV,3,2,10,26,03,031,36,07,75,215,51,09,57,121,53,30,40,234,50*76
$GPGSV,3,3,10,23,29,117,46,04,36,114,46*70
$GPGGA,124749.90,5543.3227105,N,03739.1366737,E,1,08,1.11,147.9664,M,14.4298,M,,*54
RE005%off%
It continues like this for a few thousand lines. I need to find the line that says RE004%on%, start processing lines from there until I reach RE005%off%, and repeat that over and over until the file ends. I was trying to do it with line.find, but I am pretty sure that is the wrong way to solve this problem.
#include <iostream>
#include <fstream>
#include <string>
#include <stdlib.h>
using namespace std;
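int main() {
    string line;
    bool active = false; // true between RE004%on% and RE005%off%
    ifstream logs_("C:/Users/Olya/Desktop/LogGLO.txt");
    ofstream tout("outLOGTime.txt");
    ofstream pout("outLOGPot.txt");
    if (logs_.is_open())
    {
        while (getline(logs_, line))
        {
            // switch processing on and off at the marker lines
            if (line.find("RE004%on%") != string::npos) { active = true; continue; }
            if (line.find("RE005%off%") != string::npos) { active = false; continue; }
            if (!active) continue;
            size_t dollar = line.find_first_of('$');
            size_t star = line.find_first_of('*');
            if (dollar == string::npos || star == string::npos || star < dollar) continue;
            // the characters between '$' and '*' are the ones the checksum covers
            string Checksum = line.substr(dollar + 1, star - dollar - 1);
            for (size_t i = 0; i < Checksum.size(); i++)
            {
                // checksum validation is not implemented yet
            }
            if (line.substr(0, 6) == "$GPGSV")
            {
                size_t k = 0; // number of commas seen so far in this line
                for (size_t i = 0, N = 7; i < line.size(); i++)
                {
                    if (line[i] == ',') k++;
                    if (k == N)
                    {
                        // the two characters after every 4th comma, starting at the 7th
                        pout << line.substr(i + 1, 2) << endl;
                        if ((N += 4) > 19) break;
                    }
                }
            }
        }
        logs_.close(); // close once, after the whole file has been read
    }
    else
        cout << "File is not open" << '\n';
    tout.close();
    pout.close();
    return 0;
}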
Unfortunately, your description is very unclear. Reading your code, I also cannot really understand what you intend to do. And you edited your text and changed the description, which does not make it easier for me.
But I made an educated guess.
I read all data between your given delimiters, validate the checksum and split the lines into tokens. Finally, I store all the tokenized lines in a vector. Then I filter for a specific value and output a column.
Please study it and try to understand it. It is not so complicated.
Thank you
#include <iostream>
#include <regex>
#include <vector>
#include <iterator>
#include <string>
#include <tuple>
#include <utility>
#include <algorithm>
#include <functional>
#include <numeric>
#include <fstream>
const std::regex re{ R"(\$(.*)\*[abcdefABCDEF\d]{2})" };
const std::regex delimiter{ "," };
using Tokens = std::vector<std::string>;
std::tuple<bool, Tokens> checkString(const std::string& str) {
// Return value of the function. Assume that string is not ok
std::tuple<bool, std::vector<std::string>> result(false, {});
// We want to find a string in the given format
std::smatch sm{};
if (std::regex_match(str, sm, re)) {
// OK, found. Validate checksum
if (std::string s = sm[1];std::stoul(str.substr(str.size() - 2), nullptr, 16) == std::accumulate(s.begin(), s.end(), 0U, std::bit_xor<unsigned char>())) {
// Tokenize string
Tokens tokens(std::sregex_token_iterator(str.begin(), str.end(), delimiter, -1), {});
// Build return value
result = std::make_tuple(true, std::move(tokens));
}
}
return result;
}
int main() {
std::vector<Tokens> csvData{};
// Open file and check if it is open
if (std::ifstream logs("r:\\LogGLO.txt"); logs) {
// Shall we process text lines or not
bool processingActive{ false };
// Read all lines of files
for (std::string line{}; std::getline(logs, line);) {
// Check if we should start or stop processing of the lines
if (line.substr(0, 9) == std::string("RE004%on%")) processingActive = true;
if (line.substr(0, 10) == std::string("RE005%off%")) processingActive = false;
// Check and read csv data
if (processingActive) {
const auto [ok, data] = checkString(line);
if (ok) csvData.push_back(std::move(data));
}
}
}
// So, now we have read all csv data
// Show eighth column of GPGSV data
for (const Tokens& t : csvData)
if (t[0] == "$GPGSV")
std::cout << t[7] << "\n";
return 0;
}

Finding ALL Non Repeating characters in a given string

So I was given the question:
Find ALL of the non-repeating characters in a given string;
After doing some Google searching it was clear to me that finding the first non repeating character was pretty common. I found many examples of how to do that, but I have not really found anything on how to find ALL of the non repeating characters instead of just the first one.
My example code so far is:
#include <cctype>
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;
char findAllNonRepeating(const string& s) {
unordered_map<char, int> m;
for (unsigned i = 0; i < s.length(); ++i) {
char c = tolower(s[i]);
if (m.find(c) == m.end())
m[c] = 1;
else
++m[c];
}
auto best = m.begin();
for (auto it = m.begin(); it != m.end(); ++it)
if (it->second <= best->second)
best = it;
return (best->first);
}
int main()
{
cout << findAllNonRepeating("dontknowwhattochangetofindallnonrepeatingcharacters") << endl;
}
I am not sure what I need to change or add to have this find all of the non repeating characters.
k, f, p, s should be the non repeating characters in this string.
Any hints or ideas are greatly appreciated!
As suggested, simply keep a frequency map. Then, once the string is processed, iterate over the map, returning only those values that occur exactly once.
#include <iostream>
#include <map>
#include <string>
#include <vector>
using namespace std;
std::vector<char> nonRepeating(const std::string& s)
{
std::map<char, int> frequency;
for(std::size_t i=0;i<s.size();i++)
{
frequency[s[i]]++;
}
std::vector<char> out;
for(auto it = frequency.begin(); it != frequency.end(); it++)
{
if(it->second == 1)
out.push_back(it->first);
}
return out;
}
int main() {
// your code goes here
std::string str = "LoremIpsum";
for(char c : nonRepeating(str))
{
std::cout << c << std::endl;
}
return 0;
}