C++ graph input problem: finding all the ways (words) of an input graph

We had an algorithm coding event at our school today, and they asked a question that no one could answer. I am trying to find an answer by using only the standard library. (I am trying to solve this without extra .h files, because in contests they want us to solve it like that.) So basically the question is as follows:
*Write a (C / C++) program that will get a graph model as an argument.
*You must get these values from the console when your application is started.
*Your program must write down all the possible word combinations by using the graph model.
Ex Input on Console to call your app: “yourapp.exe 5ABCD1BCD1CDE”
After your application name, the second word gives you information about the graph.
Notation: [STEPS][FROM1-TO1-TO2-...TOn]1[FROM2-TO1-TO2-...TOn]1 .....
[STEPS] The first integer value (5 in our example) is the maximum word length to measure.
[FROM TO ... TO] blocks show connections in the graph. Each node is symbolized by one upper-case letter.
The first one is the connection's start position; the others are destinations. Each connection (link) is one-way.
So: “ABCD” means we have connections from A to B, A to C and A to D
The first node in the text is the start point for word creation.
This input means you have a graph like: https://imgur.com/BioHGqA
Desired Output:
A
AB
AC
AD
ABD
ABC
ABCD
ABCE
ACD
ACE
--------------------------------------------END OF THE QUESTION-----------------------------------------------------
I personally tried to find the index numbers in the input, the connection starts, etc., but I couldn't figure out how to solve this properly. Please help :=)
#include <iostream>
#include <string>
#include <cctype>
using namespace std;
int inputLength,maxLength,digitLength;
string word,digitIndex,starters;
int main(int argc, char *argv[])
{
//Saving the graph input as a variable named word
word = argv[1];
//Finding the max word and input lengths
maxLength=word[0] -'0';
inputLength=word.length();
cout<<"Your graph input "<<word<<endl;
cout<<"Maximum word length : "<<maxLength<<endl;
//Finding the digitIndexes in input.
for(int i=0;i<inputLength;i++){
if(isdigit(word[i])){
digitIndex+=to_string(i);
}
}
digitLength=digitIndex.length();
cout<<"digit indexes : "<<digitIndex<<endl;
cout<<"digitindex[1] : "<<digitIndex[1]-'0'<<endl;
cout<<"your word : "<<word<<endl;
//Finding the connection starts
for(int i=0; i<inputLength;i++){
if(isdigit(word[i])){
starters+=word[i+1];
}
}
cout<<"starters : "<<starters<<endl;
}

Interesting problem, but easy to implement using standard algorithms and recursive calls. Also, the choice of data structure helps in designing such an application.
Unfortunately, the description of the input format is not fully clear. I understand the “steps” part, but there is no description for the rest of the digits. I assume that they are simply delimiters and have no further meaning.
We will split the big task into subtasks. And we will use a class Graph, where we store all the needed data and functions.
So, obviously, the first task is to split the input string. For that we extract the leading digit characters and convert them to the integer value “steps”. The rest of the string will be tokenized by using standard C++ functionality: std::sregex_token_iterator.
This is an iterator for iterating over a string, hence “sregex”. The begin()/end() part defines the range of input we operate on, then there is a std::regex describing what should (or should not) be matched in the input string. The matching strategy is given by the last parameter:
1 --> give me the stuff that I defined in the regex, and
-1 --> give me what is NOT matched by the regex
We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameters and copies the data between the first and the second iterator into the std::vector. The statement:
std::vector<std::string> split(std::sregex_token_iterator(init.begin(), init.end(), re, 1), {});
defines a variable “split” as a std::vector and uses the so called range-constructor of the std::vector.
You can see that I do not use the std::sregex_token_iterator’s "end()"-iterator for the std::vector explicitly. This iterator will be constructed from the empty brace-enclosed initializer list with the correct type, because it will be deduced to be the same as the type of the first argument, as the std::vector constructor requires.
Then we transform the resulting “from-to” strings into our target storage. This is a std::map with the “from” character as the key and a std::vector of chars (the targets) as the value. With that we always have an association from one start point (vertex) to all of its end points and, implicitly, the edges. This data structure spans a virtual tree, which we can later traverse to find the required result.
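For illustration, here is a minimal standalone sketch of just this split-and-transform step, outside the class, using the example input without its leading step count (the variable names mirror the full example below):
#include <iostream>
#include <iterator>
#include <map>
#include <regex>
#include <string>
#include <vector>
int main() {
    std::string init{ "ABCD1BCD1CDE" };   // example input after removing the leading "5"
    std::regex re{ R"([a-zA-Z]+)" };      // match the letter blocks; the digits act as delimiters
    // Token iterator with mode 1: give me what the regex matches
    std::vector<std::string> split(std::sregex_token_iterator(init.begin(), init.end(), re, 1), {});
    // split now contains "ABCD", "BCD", "CDE"
    // The first character of each block is the "from" vertex, the rest are its targets
    std::map<char, std::vector<char>> fromTo{};
    for (const std::string& s : split)
        fromTo[s[0]] = std::vector<char>(std::next(s.begin()), s.end());
    for (const auto& [from, targets] : fromTo) {
        std::cout << from << " -> ";
        for (const char t : targets) std::cout << t;
        std::cout << '\n';
    }
}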
We put all this in an “init” function, which we can call from the class constructor and also from the overloaded extractor operator. I added the extractor as additional functionality to make life easier, so you do not need to use the main function's argc and argv, but can read directly from std::cin via:
Graph graph;
std::cin >> graph;
Now we have all data in our map and can start to build the solution. We will store all resulting “ways” in a std::vector of std::string. For building the “ways” we “track” the way through the “graph”. So, every time we see a new vertex, we add it to the “track”, and if we reach the end of a route or if the route is longer than “steps”, we store the new track in “ways”.
So, the OP requested a special output format. To create this, we must use “Breadth First”- or “Level Order”-traversal. Meaning, before we descend (with a simple recursive algorithm), we need to go horizontally, resulting in 2 for loops. But no problem. Very simple.
For simple output functionality, I have overloaded the inserter operator.
And I do validate the input (lower-case letters are also allowed).
Please see the full working example below.
#include <iostream>
#include <vector>
#include <utility>
#include <regex>
#include <map>
#include <iterator>
#include <string>
#include <algorithm>
//std::string test{"5ABCD1BCD1CDE"};
std::regex re1(R"(([a-zA-Z]+))");
std::regex re2(R"(([0-9]+[a-zA-Z]+)+)");
using Map = std::map<char, std::vector<char>>;
class Graph {
public:
// Constructor
Graph() : steps(), fromTo(), root(), ways(), track() {}
Graph(const std::string input) : steps(), fromTo(), root(), ways(), track() { init(input); }
// Build the result
void build() { int level{ 0 }; rBuild(root, level); }
// inserter
friend std::ostream& operator << (std::ostream& os, const Graph& g) {
std::copy(g.ways.begin(), g.ways.end(), std::ostream_iterator<std::string>(os, " "));
return os << "\n";
}
// extractor
friend std::istream& operator >> (std::istream& is, Graph& g) {
if (std::string input{}; std::getline(is, input)) g.init(input);
return is;
}
private:
// Values derived from input
int steps{};
Map fromTo{};
char root{};
// The result
std::vector<std::string> ways{};
std::string track{};
// Recursive function to build all ways
void rBuild(const char vertex, int& level);
// Initialize source values
void init(const std::string& input);
};
void Graph::init(const std::string& input) {
fromTo.clear(), ways.clear(); track.clear(); steps = 0; root = '\0';
if (std::regex_match(input, re2)) {
// Get steps
size_t pos{}; steps = std::stoi(input, &pos); std::string init = input.substr(pos);
// Split string into substrings
std::vector<std::string> split(std::sregex_token_iterator(init.begin(), init.end(), re1, 1), {});
// Get root
root = split[0][0]; track += root; ways.push_back(track);
// Convert substrings to map entries
std::transform(split.begin(), split.end(), std::inserter(fromTo, fromTo.end()), [](std::string & s) {
return std::make_pair(s[0], std::vector<char>(std::next(s.begin()), s.end())); });
}
else
std::cerr << "\n***** Error: Wrong input format\n";
}
// Recursive function to build all ways through the graph
void Graph::rBuild(const char vertex, int& level) {
// Allow only a certain depth while descending
if (level < steps-1) {
// Search the start point for this entry
if (Map::iterator node{ fromTo.find(vertex) }; node != fromTo.end()) {
// Go through all edges to just the next vertex. This is not a breadth first traversal
// So, first we will go horizontally
for (const char to : node->second) {
// We want to track the way that we were going so far
track.push_back(to);
// Saving this track as a new way
ways.push_back(track);
// Restoring the original track before this way, so that we can generate the next way
track.pop_back();
}
// and now we will descend
for (const char to : node->second) {
// One level further down
++level;
// track will be one vertex longer
track.push_back(to);
// Recursive call, descend
rBuild(to, level);
// And backwards
track.pop_back();
--level;
}
}
}
}
int main(int argc, char* argv[]) {
if (argc == 2) {
std::string test = argv[1];
// Define and initialize the graph
Graph graph(test);
// Build the required strings
graph.build();
// Show result
std::cout << graph;
}
else {
std::cout << "\nEnter init string: \n";
if (Graph graph; std::cin >> graph) {
// Build the required strings
graph.build();
// Show result
std::cout << "\nResult:\n" << graph << "\n";
}
}
return 0;
}
What a pity that nobody will read that . . .

Related

Insert specific strings from a text file into an array

The ArticlesDataset.txt file contains all the metadata information of the documents. unigramCount contains all unique words and their number of occurrences for each document. There are 1500 publications recorded in the txt file. Here is an example entry for a document:
{"creator":["Romain Allais","Julie Gobert"],
"datePublished":"2018-05-30",
"docType":"article",
"doi":"10.1051\/mattech\/2018010",
"id":"ark:\/\/27927\/phz10hn2bh3",
"isPartOf":"Mat\u00e9riaux & Techniques",
"issueNumber":"5-6",
"language":["eng"],
"outputFormat":["unigram","bigram","trigram"],
"pageCount":7,
"pagination":"pp. null-null",
"provider":"portico",
"publicationYear":2018,
"publisher":"EDP Sciences",
"sequence":3.0,
"tdmCategory":["Applied sciences -Engineering"],
"title":"Environmental assessment of PSS",
"url":"http:\/\/doi.org\/10.1051\/mattech\/2018010",
"volumeNumber":"105",
"wordCount":4446,
"unigramCount":{"others":1,"air":1,"networks,":1,"conventional":1,"IEEE":1}}
My purpose is to pull out the unigram counts for each document and store them in a suitable array. How can I do it using the fstream library?
How can I improve the code below to reach my goal?
std::string dummy;
std::ifstream data("PublicationsDataSet.txt");
while (data.good())
{
getline(data, dummy, ',');
}
Your question delves into two different topics: one is parsing the data, the other is storing it in memory.
On the first point: you'll need a parser. You can either write one yourself, which involves a tokenizer to convert the "key words" into tokens and then an interpreter that builds a data object based on which token precedes or follows the data, e.g.:
'[' = start of an array, every value after this is part of the array
']' = end of the array, return to the previous parsing state
':' = separates key and value, left hand side is the key, right hand side is the value
...
This is a fine exercise to sharpen one's skills, but it is arduous and a potentially never-ending bug-fixing road. As other comments also recommend, finding a ready-made library is probably the easier route when you are in a time pinch or a project deadline is crunching.
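To make the token idea above concrete, here is a deliberately tiny, hypothetical sketch of such a scan. It recognizes only the delimiters listed above plus bare values, and is nowhere near a complete JSON parser (quoted strings with escapes, nested objects, numbers and error handling are all missing):
#include <cctype>
#include <iostream>
#include <string>
#include <vector>
// Toy token kinds for the delimiters described above, plus bare "values"
enum class TokenKind { ArrayStart, ArrayEnd, KeyValueSep, Value };
struct Token { TokenKind kind; std::string text; };
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    std::string current;
    auto flush = [&] {
        if (!current.empty()) { tokens.push_back({ TokenKind::Value, current }); current.clear(); }
    };
    for (const char c : input) {
        if (c == '[') { flush(); tokens.push_back({ TokenKind::ArrayStart, "[" }); }
        else if (c == ']') { flush(); tokens.push_back({ TokenKind::ArrayEnd, "]" }); }
        else if (c == ':') { flush(); tokens.push_back({ TokenKind::KeyValueSep, ":" }); }
        else if (std::isspace(static_cast<unsigned char>(c)) || c == ',') { flush(); }
        else { current += c; }
    }
    flush();
    return tokens;
}
int main() {
    for (const Token& t : tokenize(R"("language":["eng"])"))
        std::cout << static_cast<int>(t.kind) << " -> " << t.text << '\n';
}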
Another thing to point out: plain arrays in C++ are fixed in size, so since you are parsing the values you'll most likely use std::vector, which allows insertion. Once you are done processing the file, if you really intend to send the data back as a plain array, you can do that directly from the object:
std::vector<YourObjectType> parsedObject;
YourObjectType* arr = new YourObjectType[parsedObject.size()];
std::copy(parsedObject.begin(), parsedObject.end(), arr);
This is pseudo-code; lots of things will depend on the implementation, but it gives the idea.
A starting point for writing a parser is the article linked below; it goes into great detail on how a parser works and its components. Mind you, every parser implements its own language (yes, just like C++ and other languages, which are all parsed), so you'll need to expand on the concept with your own commands:
expression parser
Here's a simplified solution of what you could do using std::regex:
Read the lines of a stream (std::cin in this case) one by one.
Check if the line contains a unigramCount element.
If that's the case, walk the different entries within the unigramCount element.
About the regular expressions used:
"unigramCount":{}, allowing:
zero or more whitespaces basically everywhere, and
zero or more characters within the braces.
"<key>":<value>, where:
<key> is one or more characters other than a double quote,
<value> is one or more digits, and
you could have whitespaces at both sides of the :.
A good data structure for storing your unigramCount entries could be a std::map.
[Demo]
#include <iostream> // cout
#include <map>
#include <regex> // regex_match, regex_search, sregex_iterator
#include <string> // stoi
int main()
{
std::string line{};
std::map<std::string, int> unigram_counts{};
while (std::getline(std::cin, line))
{
const std::regex unigram_count_pattern{R"(^\s*\"unigramCount\"\s*:\s*\{.*\}\s*$)"};
if (std::regex_match(line, unigram_count_pattern))
{
const std::regex entry_pattern{R"(\"([^\"]+)\"\s*:\s*([0-9]+))"};
for (auto entry_it{std::sregex_iterator(line.cbegin(), line.cend(), entry_pattern)};
entry_it != std::sregex_iterator{};
++entry_it)
{
auto matches{*entry_it};
auto& key{matches[1]};
auto& value{matches[2]};
unigram_counts[key] = std::stoi(value);
}
}
}
for (auto& [key, value] : unigram_counts)
{
std::cout << "'" << key << "' : " << value << "\n";
}
}
// Outputs:
//
// 'IEEE' : 1
// 'air' : 1
// 'conventional' : 1
// 'networks,' : 1
// 'others' : 1

Input two matrices whose sizes are not specified

I need to input two matrices with unspecified sizes, using a blank row to mark the end of input for each matrix.
For example, input:
1 2
3 4
(blank row here, end of input matrix 1)
5 6 7
8 9 10
(blank row here, end of input matrix 2)
will get a 2*2 matrix and a 2*3 matrix.
My current idea is to build a matrix large enough (like 1000*1000), then set loops and use cin to input each element (the code only shows how I input matrix 1):
int matx1[1000][1000];
for (i = 0;i < 1000;i++)
{
for (j = 0;j < 1000;j++)
{
temp = getchar();
if (temp == '\n')
{
mat1.col = j;
break;
}
else
{
putchar(temp);
}
cin>>matx1[i][j];
}
temp = getchar();
if (temp == '\n')
{
mat1.row = i;
break;
}
else
{
putchar(temp);
}
}
When I run this in Xcode, errors happen: the putchar() function interrupts my input in the terminal by printing a number each time I press Enter, and the input result is in chaos.
I also tried the following code to avoid using putchar():
for (i = 0; i < 1000; i++)
{
temp = getchar();
if (temp == '\n')
{
break;
}
else
{
matx1[i][0] = temp;
for (j = 1; j < 1000; j++)
{
cin >> matx1[i][j];
if (getchar() == '\n')
{
break;
}
}
}
}
Still, there are serious problems. The temp variable stores a char, and even if I convert it to an int via its ASCII value, it works only if the first element of each line is smaller than 10; otherwise the first element of each line is stored incorrectly.
So, the main question is:
How do I switch to a new line of the same matrix after pressing Enter once, and switch to inputting the next matrix after pressing Enter again?
Or, put differently: how do I get the '\n' event without interfering with the original input stream?
To solve the problem at hand there is a more or less standard approach: you want to read CSV data.
In your case, it is a little bit more difficult, because you have a special format in your CSV data: first a space-separated list, and then an empty line between 2 entries.
Now, how could this be done? C++ is an object-oriented language with many existing algorithms. You can define a proxy class and overload the extractor operator. The proxy class, and especially the extractor, will do all the work.
The extractor, and that is the core of the question, is, as said, a little bit trickier. How can this be done?
In the extractor we will first read a complete line from an std::istream using the function std::getline. After having the line, we see a std::string containing "data-fields", delimited by a space. The std::string needs to be split up and the "data-fields"-contents shall be stored.
The process of splitting up strings is also called tokenizing. The "data-fields"-content is also called "token". C++ has a standard function for this purpose: std::sregex_token_iterator.
And because we have something that has been designed for such purpose, we should use it.
This is an iterator for iterating over a string, hence sregex. The begin part defines the range of input we operate on, then there is a std::regex describing what should (or should not) be matched in the input string. The matching strategy is given by the last parameter:
1 --> give me the stuff that I defined in the regex, and
-1 --> give me what is NOT matched by the regex.
We can use this iterator for storing the tokens in a std::vector. The std::vector has a range constructor, which takes 2 iterators as parameters and copies the data between the first and the second iterator into the std::vector.
The statement
std::vector token(std::sregex_token_iterator(line.begin(), line.end(), separator, -1), {});
defines a variable "token" of type std::vector<std::string>, splits up the std::string and puts the tokens into the std::vector. For your case we will use std::transform to change your strings into integers.
Very simple.
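As a small standalone sketch of just this tokenize-and-convert step (one hard-coded row used purely for illustration):
#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>
#include <vector>
int main() {
    std::string line{ "5 6 7" };        // one row of the matrix
    const std::regex separator(" ");    // the delimiter
    // Tokenize on spaces (-1: keep what is NOT matched) and convert each token to int
    std::vector<int> row{};
    std::transform(std::sregex_token_iterator(line.begin(), line.end(), separator, -1),
                   std::sregex_token_iterator(),
                   std::back_inserter(row),
                   [](const std::string& s) { return std::stoi(s); });
    for (const int v : row) std::cout << v << ' ';   // prints: 5 6 7
}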
Next step: we want to read from a file. The file also contains repeated data of the same kind; here the repeated items are the rows.
And as above, we can iterate over such similar data, regardless of whether it comes from file input or anything else. For this purpose C++ has the std::istream_iterator. This is a template; as a template parameter it gets the type of data that it should read and, as a constructor parameter, it gets a reference to an input stream. It doesn't matter if the input stream is std::cin, a std::ifstream or a std::istringstream. The behaviour is identical for all kinds of streams.
And since we do not have files on SO, I use (in the example below) a std::istringstream to store the input CSV file. But of course you can open a file by defining a std::ifstream csvFile(filename). No problem.
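Before adding the proxy class, here is a bare-bones sketch of the std::istream_iterator idea in isolation (the numbers and the std::istringstream are only for illustration; a std::ifstream would work the same way):
#include <iostream>
#include <iterator>
#include <sstream>
#include <vector>
int main() {
    std::istringstream numbers{ "1 2 3 4" };   // could just as well be std::cin or a std::ifstream
    // Range-construct a vector from two stream iterators: reads ints until the stream is exhausted
    std::istream_iterator<int> first(numbers), last;
    std::vector<int> data(first, last);
    for (const int v : data) std::cout << v << ' ';   // prints: 1 2 3 4
}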
We can now read the complete csv-file and split it into tokens and get all data, by simply defining a new variable and use again the range constructor.
Matrix matrix1( std::istream_iterator<ColumnProxy>(testCsv), {} );
This very simple one-liner will read the complete csv-file and do all the expected work.
Please note: I am using C++17 and can define the std::vector without template argument. The compiler can deduce the argument from the given function parameters. This feature is called CTAD ("class template argument deduction").
Additionally, you can see that I do not use the "end()"-iterator explicitly.
This iterator will be constructed from the empty brace-enclosed initializer list with the correct type, because it will be deduced to be the same as the type of the first argument, as the std::vector constructor requires.
I hope I could answer your basic question. Please see the full-blown C++ example below:
#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <regex>
#include <algorithm>
std::istringstream testCsv{ R"(1 2
3 4
5 6 7
8 9 10
)" };
// Define Alias for easier Reading
//using Columns = std::vector<std::string>;
using Columns = std::vector<int>;
using Matrix = std::vector<Columns>;
// The delimiter
const std::regex re(" ");
// Proxy for the input Iterator
struct ColumnProxy {
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, ColumnProxy& cp) {
// Read a line
cp.columns.clear();
if (std::string line; std::getline(is, line)) {
if (!line.empty()) {
// Split values and copy into resulting vector
std::transform(std::sregex_token_iterator(line.begin(), line.end(), re, -1),
std::sregex_token_iterator(),
std::back_inserter(cp.columns),
[](const std::string & s) {return std::stoi(s); });
}
else {
// Notify the caller. End of matrix
is.setstate(std::ios::eofbit | std::ios::failbit);
}
}
return is;
}
// Type cast operator overload. Cast the type 'Columns' to std::vector<std::string>
operator Columns() const { return columns; }
protected:
// Temporary to hold the read vector
Columns columns{};
};
int main()
{
// Define variable matrix with its range constructor. Read complete CSV in this statement, So, one liner
Matrix matrix1( std::istream_iterator<ColumnProxy>(testCsv), {} );
// Reset failbit and eofbit
testCsv.clear();
// Read 2nd matrix
Matrix matrix2(std::istream_iterator<ColumnProxy>(testCsv), {});
return 0;
}
Again:
What a pity that nobody will read this . . .

How to loop through vectors for specific strings

I am struggling to write a loop that takes a field of a vector, checks whether it appears for the first time, and otherwise jumps to the next vector until this field contains a new string.
My input file (.csvx) looks something like:
No.; ID; A; B; C;...;Z;
1;1_380; Value; Value; Value;...; Value;
2;1_380; Value; Value; Value;...; Value;
3;1_380; Value; Value; Value;...; Value;
...
41;2_380; Value; Value; Value;...; Value;
42;2_380; Value; Value; Value;...; Value;
...
400000; 6_392; Value; Value; Value;...; Value;
Note: the file is relatively large...
I managed to parse my file into a vector<vector<string> > and split lines at semicolons to access any field.
Now I would like to access the first "ID", i.e. 1_380, and store parameters from the same line, then go to the next ID, 2_380, and again store those parameters, and so on...
This is my code so far:
#include <cstdlib>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <algorithm>
#include <boost/algorithm/string.hpp>
using namespace std;
/*
* CSVX Reader defined to fetch data from
* CSVX file into vectors
*/
class CSVXReader
{
string fileName, delimiter;
public:
CSVXReader(string filename, string delm = ";") :
fileName(filename), delimiter(delm)
{}
vector<vector<string> > getData(); //Function to fetch data
}; //from CSVX file
/*
* Parse through CSVX file line by line
* and return the data in vector of vector
* of strings
*/
vector<vector<string> > CSVXReader::getData()
{
ifstream file(fileName);
vector<vector<string> > dataList; //Vector of vector
//contains all data
string line = "";
while (getline(file, line)) //Iterate through each line
//and split the content
//using delimiter
{
vector<string> vec; //Vector contains a row from
//input file
boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
dataList.push_back(vec);
}
file.close();
return dataList;
}
int main(int argc, char** argv)
{
CSVXReader reader("file.csvx"); //Creating an object
//of CSVXReader
vector<vector<string> > dataList = reader.getData();//Get the data from
//CSVX file
for(vector<string> vec : datalist) //Loop to go through
//each line of
//dataList
//(vec1,vec2;vec3...)
if(vec[1] contains "_" && "appears for the first time")
{store parameters...};
else{go to next line};
return 0;
}
As you can see, I have no clue how to declare my loop properly...
To be clear, I want to check the second field of each vector "vec": is it new? -> store the data of the same line; if not -> jump to the next line, i.e. the next vector, until a new ID appears.
Looking forward to any advice!
Since you wrote pseudo-code, it is difficult to write real code.
But in general, if you want to detect whether an item has occurred already, you can utilize a std::unordered_set to implement the "appears for the first time" check.
Using your pseudo-code:
#include <unordered_set>
//...
std::unordered_set<std::string> stringSet;
//...
for(vector<string>& vec : datalist)
{
if(vec[1] contains "_" && !stringSet.count(vec[1]))
{
//...
stringSet.insert(vec[1]);
}
}
The condition checks if the item is in the unordered_set. If it is, then we skip, if not, then we process the item and add it to the unordered_set.
Basically you do not need all the code that the other answers provide. You need just one statement to copy the data to where you want to have them.
Let us assume that you have read your data already in your dataList. And you defined a new std::vector<std::vector<std::string>> parameter{}; where you want to store the unique result.
The algorithm library has a function called std::copy_if. It copies data only if a predicate (a condition) is true. Your condition is that a line is different from the previous line: then it is a new line with new data and you will copy it. If a line is equal to the previous line's data, then do not copy it.
So, we will remember the important data from the last line and then, in the next line, compare the data with the stored value. If it is different, store the parameter; if not, then not. After each check, we assign the current value to the last value. As the initial "last value" we will use an empty string, so the first line will always be different. The statement will then look like this:
std::copy_if(dataList.begin(), dataList.end(), std::back_inserter(parameter),
[lastID = std::string{}](const std::vector<std::string> & sv) mutable {
bool result = (lastID != sv[1]);
lastID = sv[1];
return result;
}
);
So we copy all data from the begin to the end of dataList into the parameter vector if, and only if, the second string in the source vector (index 1) is different from our previously remembered value.
Rather straightforward.
An additional optimization would be to immediately sort out the correct parameters and not store the complete vector with all data in the first place, but only the necessary data. This will reduce the required memory drastically.
Modify your while loop to:
string line = "";
string oldValue{};
while (getline(file, line)) //Iterate through each line
//and split the content
//using delimiter
{
vector<string> vec; //Vector contains a row from
//input file
boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
if (oldValue != vec[1]) {
dataList.push_back(vec);
}
oldValue = vec[1];
}
With that you get it right from the beginning.
An additional solution is shown below:
#include <vector>
#include <iostream>
#include <string>
#include <iterator>
#include <regex>
#include <fstream>
#include <sstream>
#include <algorithm>
std::istringstream testFile{R"(1;1_380; Value1; Value2; Value3; Value4
2;1_380; Value5; Value6; Value7; Value8
3;1_380; Value9 Value10
41;2_380; Value11; Value12; Value13
42;2_380; Value15
42;2_380; Value16
500;3_380; Value99
400000; 6_392; Value17; Value18; Value19; Value20
400001; 6_392; Value21; Value22; Value23; Value24)"
};
class LineAsVector { // Proxy for the input Iterator
public:
// Overload extractor. Read a complete line
friend std::istream& operator>>(std::istream& is, LineAsVector& lv) {
// Read a line
std::string line; lv.completeLine.clear();
std::getline(is, line);
// The delimiter
const std::regex re(";");
// Split values and copy into resulting vector
std::copy( std::sregex_token_iterator(line.begin(), line.end(), re, -1),
std::sregex_token_iterator(),
std::back_inserter(lv.completeLine));
return is;
}
// Cast the type 'CompleteLine' to std::string
operator std::vector<std::string>() const { return completeLine; }
protected:
// Temporary to hold the read vector
std::vector<std::string> completeLine{};
};
int main()
{
// This is the resulting vector which will contain the result
std::vector<std::vector<std::string>> parameter{};
// One copy statement to copy all necessary data from the file to the parameter list
std::copy_if (
std::istream_iterator<LineAsVector>(testFile),
std::istream_iterator<LineAsVector>(),
std::back_inserter(parameter),
[lastID = std::string{}](const std::vector<std::string> & sv) mutable {
bool result = (lastID != sv[1]);
lastID = sv[1];
return result;
}
);
// For debug purposes: Show result on screen
std::for_each(parameter.begin(), parameter.end(), [](std::vector<std::string> & sv) {
std::copy(sv.begin(), sv.end(), std::ostream_iterator<std::string>(std::cout, " "));
std::cout << '\n';
}
);
return 0;
}
Please note: In function main, we do everything in one statement: std::copy_if. The source is in this case a std::istream, so a std::ifstream (a file) or whatever you want. On SO I use a std::istringstream because I cannot use files here. But it is the same; just replace the variable in the std::istream_iterator. We iterate over the file with the std::istream_iterator.
What a pity that nobody will read this . . .
Ok fellas, I was playing around with my code and realized that @Armin's second solution (the modified while loop) doesn't consider unordered lists, i.e. if an element shows up again much later, it is compared with the previous element (oldValue) and inserted, although it already exists in my container...
After some reading (and more has to come, obviously), I tend toward @Paul's unordered_set. My first question arises right here: why didn't you suggest set instead? From what I found, unordered_set is apparently faster for search operations. In my personal, very limited mind this is difficult to understand... but I don't want to dig too deep here.
Is this your reason? Or are there other advantages that I missed?
Despite your suggestion, I tried to use set, which in my situation seems better because it is ordered. And again my code refuses to run:
set<vector<string> > CSVReader::getData() {
ifstream file(fileName);
set<vector<string> > container;
string line = "";
string uniqueValue{};
while (getline(file, line)) //Iterate through each line and split the content using delimiter
{
//Vector contains a row from RAO file
vector<string> vec;
boost::algorithm::split(vec, line, boost::is_any_of(delimiter));
uniqueValue = vec[2];
//Line (or vector) is added to container if the uniqueValue, e.g. 1_380, appears for the first time
if(!container.count(uniqueValue))
{
container.insert(vec);
}
}
file.close();
return container;
}
The error says:
error: no matching function for call to 'std::set<std::vector<std::__cxx11::basic_string<char> > >::count(std::__cxx11::string&)'
if(!localDetails.count(localDetail))
Since I followed your example, what did I do wrong?
PS: Just reading about SO policies... hope this additional question is acceptable though

How do I make an alphabetized list of all distinct words in a file with the number of times each word was used?

I am writing a program using Microsoft Visual C++. In the program I must read in a text file and print out an alphabetized list of all distinct words in that file with the number of times each word was used.
I have looked up different ways to alphabetize a string but they do not work with the way I have my string initialized.
// What is inside my text file
Any experienced programmer engaged in writing programs for use by others knows
that, once his program is working correctly, good output is a must. Few people
really care how much time and trouble a programmer has spent in designing and
debugging a program. Most people see only the results. Often, by the time a
programmer has finished tackling a difficult problem, any output may look
great. The programmer knows what it means and how to interpret it. However,
the same cannot be said for others, or even for the programmer himself six
months hence.
string lines;
getline(input, lines); // Stores what is in file into the string
I expect an alphabetized list of words with the number of times each word was used. So far, I do not know how to begin this process.
It's rather simple: std::map automatically sorts based on the key in the key/value pairs you get. The key/value pair represents word/count, which is what you need. You need to do some filtering for special characters and such.
EDIT: std::stringstream is a nice way of splitting a std::string using the whitespace delimiter, as it's the default delimiter. Therefore, using stream >> word you will get whitespace-separated words. However, this might not be enough due to punctuation. For example, "Often," has a comma which we need to filter out. Therefore, I used std::replace_if, which replaces puncts and digits with whitespaces.
Now a new problem arises. In your example, you have "must.Few", which will be returned as one word. After replacing '.' with a whitespace we have "must Few". So I'm using another stringstream on the filtered "word" to make sure I have only words in the final result.
In the second loop you will notice if(word == "") continue;. This can happen if the string is not trimmed. If you look at the code you will find out that we aren't trimming after replacing puncts and digits. That is, "Often," becomes "Often " with a trailing whitespace. The trailing whitespace causes the second loop to extract an empty word. This is why I added the condition to ignore it. You can trim the filtered result and then you wouldn't need this check.
Finally, I have added an ignorecase boolean to check whether you wish to ignore the case of the word or not. If you wish to do so, the program will simply convert the word to lowercase and then add it to the map. Otherwise, it will add the word the same way it found it. By default, ignorecase = true; if you wish to consider case, just call the function differently: count_words(input, false);.
Edit 2: In case you're wondering, the statement counts[word] will automatically create a key/value pair in the std::map IF there isn't any key matching word. So when we call ++: if the word isn't in the map, it will create the pair and increment the value by 1, so the newly added word starts at a count of 1. If the word already exists in the map, this will increment the existing value by 1, and hence it acts as a counter.
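To make that behaviour concrete, here is a tiny standalone illustration of std::map's operator[] acting as a counter (the word used is arbitrary):
#include <iostream>
#include <map>
#include <string>
int main() {
    std::map<std::string, std::size_t> counts;
    counts["often"]++;   // key not present yet: value-initialized to 0, then incremented to 1
    counts["often"]++;   // key present: incremented to 2
    std::cout << counts["often"] << '\n';   // prints 2
}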
The program:
#include <iostream>
#include <map>
#include <sstream>
#include <cstring>
#include <cctype>
#include <string>
#include <iomanip>
#include <algorithm>
std::string to_lower(const std::string& str) {
std::string ret;
for (char c : str)
ret.push_back(tolower(c));
return ret;
}
std::map<std::string, size_t> count_words(const std::string& str, bool ignorecase = true) {
std::map<std::string, size_t> counts;
std::stringstream stream(str);
while (stream.good()) {
// wordW may have multiple words connected by special chars/digits
std::string wordW;
stream >> wordW;
// filter special chars and digits
std::replace_if(wordW.begin(), wordW.end(),
[](const char& c) { return std::ispunct(c) || std::isdigit(c); }, ' ');
// now wordW may have multiple words seperated by whitespaces, extract them
std::stringstream word_stream(wordW);
while (word_stream.good()) {
std::string word;
word_stream >> word;
// ignore empty words
if (word == "") continue;
// add to count.
ignorecase ? counts[to_lower(word)]++ : counts[word]++;
}
}
return counts;
}
void print_counts(const std::map<std::string, size_t>& counts) {
for (auto pair : counts)
std::cout << std::setw(15) << pair.first << " : " << pair.second << std::endl;
}
int main() {
std::string input = "Any experienced programmer engaged in writing programs for use by others knows \
that, once his program is working correctly, good output is a must.Few people \
really care how much time and trouble a programmer has spent in designing and \
debugging a program.Most people see only the results.Often, by the time a \
programmer has finished tackling a difficult problem, any output may look \
great.The programmer knows what it means and how to interpret it.However, \
the same cannot be said for others, or even for the programmer himself six \
months hence.";
auto counts = count_words(input);
print_counts(counts);
return 0;
}
I have tested this with Visual Studio 2017 and here is part of the output:
a : 5
and : 3
any : 2
be : 1
by : 2
cannot : 1
care : 1
correctly : 1
debugging : 1
designing : 1
As others have already noted, a std::map handles the counting you care about quite easily.
Iostreams already have a built-in way to tokenize an input stream into words. In this case, though, we only want to "think" of letters as characters that can make up words. A stream uses a locale to make that sort of decision, so to change how it's done, we need to define a locale that classifies characters as we see fit.
struct alpha_only: std::ctype<char> {
alpha_only(): std::ctype<char>(get_table()) {}
static std::ctype_base::mask const* get_table() {
// everything is white space
static std::vector<std::ctype_base::mask>
rc(std::ctype<char>::table_size,std::ctype_base::space);
// except lower- and upper-case letters, which are classified accordingly:
std::fill(&rc['a'], &rc['z'] + 1, std::ctype_base::lower);
std::fill(&rc['A'], &rc['Z'] + 1, std::ctype_base::upper);
return &rc[0];
}
};
With that in place, we tell the stream to use our ctype facet, then simply read words from the file and count them in the map:
std::cin.imbue(std::locale(std::locale(), new alpha_only));
std::map<std::string, std::size_t> counts;
std::string word;
while (std::cin >> word)
++counts[to_lower(word)];
...and when we're done with that, we can print out the results:
for (auto w : counts)
std::cout << w.first << ": " << w.second << "\n";
I'd probably start by inserting all of those words into an array of strings, then start with the first index of the array and compare it with all of the other indexes; if you find matches, add 1 to a counter. After you went through the array you could display the word you were searching for and how many matches there were, and then go on to the next element and compare that with all of the other elements in the array and display, etc. Or, if you want to make a parallel array of integers that holds the number of matches, you could do all the comparisons at one time and the displays at one time.
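For what it's worth, a minimal sketch of that parallel-array counting idea could look like the following (the sample words are hard-coded purely for illustration; file reading and alphabetical sorting are left out):
#include <iostream>
#include <string>
#include <vector>
int main() {
    // Words already read from the file (hypothetical sample)
    std::vector<std::string> words{ "good", "output", "is", "a", "must", "good", "output" };
    std::vector<std::string> distinct;   // each word, first occurrence only
    std::vector<int> counts;             // parallel array: counts[i] belongs to distinct[i]
    for (const std::string& w : words) {
        bool found = false;
        for (std::size_t i = 0; i < distinct.size(); ++i) {
            if (distinct[i] == w) { ++counts[i]; found = true; break; }
        }
        if (!found) { distinct.push_back(w); counts.push_back(1); }
    }
    for (std::size_t i = 0; i < distinct.size(); ++i)
        std::cout << distinct[i] << " : " << counts[i] << '\n';
}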
EDIT:
Everyone's answer seems more elegant because of the map's inherent sorting. My answer functions more as a parser, that later sorts the tokens. Therefore my answer is only useful to the extent of a tokenizer or lexer, whereas Everyone's answer is only good for sorted data.
You first probably want to read in the text file. You want to use a streambuf iterator to read in the file (found here).
You will now have a string called content, which is the content of you file. Next you will want to iterate, or loop, over the contents of this string. To do that you'll want to use an iterator. There should be a string outside of the loop that stores the current word. You will iterate over the content string, and each time you hit a letter character, you will add that character to your current word string. Then, once you hit a space character, you will take that current word string, and push it back into the wordString vector. (Note: that means that this will ignore non-letter characters, and that only spaces denote word separation.)
Now that we have a vector of all of our words in strings, we can use std::sort, to sort the vector in alphabetical order.(Note: capitalized words take precedence over lowercase words, and therefore will be sorted first.) Then we will iterate over our vector of stringWords and convert them into Word objects (this is a little heavy-weight), that will store their appearances and the word string. We will push these Word objects into a Word vector, but if we discover a repeat word string, instead of adding it into the Word vector, we'll grab the previous entry and increment its appearance count.
Finally, once this is all done, we can iterate over our Word object vector and output the word followed by its appearances.
Full Code:
#include <vector>
#include <fstream>
#include <iostream>
#include <streambuf>
#include <algorithm>
#include <string>
class Word //define word object
{
public:
Word(){appearances = 1;}
~Word(){}
int appearances;
std::string mWord;
};
bool isLetter(const char x)
{
return((x >= 'a' && x <= 'z') || (x >= 'A' && x <= 'Z'));
}
int main()
{
std::string srcFile = "myTextFile.txt"; //what file are we reading
std::ifstream ifs(srcFile);
std::string content( (std::istreambuf_iterator<char>(ifs) ),
( std::istreambuf_iterator<char>() )); //read in the file
std::vector<std::string> wordStringV; //create a vector of word strings
std::string current = ""; //define our current word
for(auto it = content.begin(); it != content.end(); ++it) //iterate over our input
{
const char currentChar = *it; //make life easier
if(currentChar == ' ')
{
wordStringV.push_back(current);
current = "";
continue;
}
else if(isLetter(currentChar))
{
current += *it;
}
}
std::sort(wordStringV.begin(), wordStringV.end(), std::less<std::string>());
std::vector<Word> wordVector;
for(auto it = wordStringV.begin(); it != wordStringV.end(); ++it) //iterate over wordString vector
{
std::vector<Word>::iterator wordIt;
//see if the current word string has appeared before...
for(wordIt = wordVector.begin(); wordIt != wordVector.end(); ++wordIt)
{
if((*wordIt).mWord == *it)
break;
}
if(wordIt == wordVector.end()) //...if not create a new Word obj
{
Word theWord;
theWord.mWord = *it;
wordVector.push_back(theWord);
}
else //...otherwise increment the appearances.
{
++((*wordIt).appearances);
}
}
//print the words out
for(auto it = wordVector.begin(); it != wordVector.end(); ++it)
{
Word theWord = *it;
std::cout << theWord.mWord << " " << theWord.appearances << "\n";
}
return 0;
}
Side Notes
Compiled with g++ version 4.2.1 with target x86_64-apple-darwin, using the compiler flag -std=c++11.
If you don't like iterators you can instead do
for (std::size_t i = 0; i < content.size(); ++i)
{
const char currentChar = content[i];
}
It's important to note that if you are capitalization agnostic simply use std::tolower on the current += *it; statement (ie: current += std::tolower(*it);).
Also, you seem like a beginner and this answer might have been too heavyweight, but you're asking for a basic parser and that is no easy task. I recommend starting by parsing simpler strings like math equations. Maybe make a calculator app.

Want to read important double value at the end of line of istream C++

I'm trying to read in a large matrix, calculated from a text file, for a finite element code. The matrix is spatially dependent, though, and thus I need to be able to conveniently organize the data. The outside source that calculated the values for the matrix was kind enough to put the following lines at the top of the text file:
No. activity levels : 3
No. pitch-angles : 90
No. energies : 11
No. L-shells : 10
Which basically tell me the number of positions the matrix is known at. I want to be able to easily pick out these values because it will allow me to preallocate the size of the matrix, as well as know immediately how much I need to interpolate for values not given by this text file. I am trying to do that with the following code
#include<iostream>
#include<fstream>
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<vector>
using namespace std;
int main(){
string diffusionTensorFileName = "BAS_drift_averaged_chorus_kp.txt";
string sline;
int alphaSize=0;
ifstream diffusionTensorFile(diffusionTensorFileName.c_str());
while(getline(diffusionTensorFile,sline)){
if(strncmp(sline.c_str(),"No. pitch-angles : 90",sline.size()-1)==0 && sline.size()-1 != 0){
alphaSize = atoi(sline.c_str());
printf("alphaSize %d \n", alphaSize);
vector<double> alpha(alphaSize);
}
}
}
atoi of course doesn't work very well, and I can't seem to get strtod or any of those functions to work either. Any thoughts? I'm also open to this being the completely wrong way to do this and alternate suggestions on how to proceed.
I think the easiest way would be to use the scan_is method of the std::ctype facet imbued in the stream's locale. Its job is to search for the first character that matches a given classification and return a pointer to it. We'll take the result of that call and use std::stoi (C++11) to parse it into an integer.
std::locale loc(diffusionTensorFile.getloc());
auto& f = std::use_facet<std::ctype<char>>(loc);
while (std::getline(diffusionTensorFile, sline))
{
const char* begin = sline.data();
const char* end = sline.data() + sline.size();
const char* result;
if ((result = f.scan_is(f.digit, begin, end)) != end)
{
alphaSize = std::stoi(result);
// do something with alphaSize
}
}
Live Demo