Parsing lines in a text file into vectors - C++

#include <string>
#include <iostream>
#include <vector>
#include <sstream>
#include <fstream>
using namespace std;
vector <string> tokenizeString(string filename, string delimiter);
int main() {
vector<string> tokens = tokenizeString("cityLocation.txt", "-");
for (int i = 0; i < tokens.size(); i++) {
cout << tokens[i];
}
return 0;
}
vector <string> tokenizeString (string filename, string delimiter) {
size_t pos = 0;
vector<string>tokens;
string token;
ifstream cityText(filename);
string line;
while (getline(cityText, line)) {
while ((pos = line.find(delimiter)) != string::npos) {
token = line.substr(0,pos);
tokens.push_back (token);
line.erase(0, pos + delimiter.length());
}
}
return (tokens);
}
So this is my code, and my text file data are
[1,1]-3-Big_City
[1,2]-3-Big_City
[1,3]-3-Big_City
[2,1]-3-Big_City
[2,2]-3-Big_City
[2,3]-3-Big_City
[2,7]-2-Mid_City
[2,8]-2-Mid_City
[3,1]-3-Big_City
My code is skipping all the Big_City and Mid_City values.
It prints out only the first and second column data.
My delimiter is supposed to be '-'.
I haven't tried saving the data into vectors yet, but would like some recommendations on how to do that.

That is because the inner loop stops as soon as no more delimiters are found, so the last field after the final delimiter is never added. You can fix this with a post-test loop that executes one more time when pos == string::npos, adding line.substr(lastpos, pos - lastpos) as a token; with pos == string::npos that is the substring from lastpos to the end of the string.
vector <string> tokenizeString (string filename, string delimiter) {
vector<string>tokens;
string token;
ifstream cityText(filename);
string line;
while (cityText >> line) {
size_t pos = 0, lastpos=0;
do {
pos = line.find(delimiter, lastpos);
token = line.substr(lastpos,pos-lastpos);
tokens.push_back (token);
lastpos = pos + delimiter.length(); // skip the whole delimiter, not just one character
} while (pos != string::npos);
}
return (tokens);
}
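Regarding the request for a recommendation on saving the data into vectors: one option is to parse each line into a small record and keep a vector of records. The sketch below assumes every line has the shape [x,y]-category-name; CityRecord, parseCityLine and readCities are hypothetical names, not part of the original code.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the field names are assumptions, not from the question.
struct CityRecord {
    int x = 0;
    int y = 0;
    int category = 0;
    std::string name;
};

// Parses one line of the assumed form "[x,y]-category-name".
// Returns false if the line does not match that shape.
bool parseCityLine(const std::string& line, CityRecord& out) {
    std::istringstream in(line);
    char open = 0, comma = 0, close = 0, dash1 = 0, dash2 = 0;
    if (!(in >> open >> out.x >> comma >> out.y >> close
             >> dash1 >> out.category >> dash2)) {
        return false;
    }
    if (open != '[' || comma != ',' || close != ']' ||
        dash1 != '-' || dash2 != '-') {
        return false;
    }
    // Everything after the second '-' is taken as the name.
    return static_cast<bool>(std::getline(in, out.name));
}

// Reads all well-formed lines from a stream; malformed lines are skipped.
std::vector<CityRecord> readCities(std::istream& in) {
    std::vector<CityRecord> cities;
    std::string line;
    while (std::getline(in, line)) {
        CityRecord record;
        if (parseCityLine(line, record)) {
            cities.push_back(record);
        }
    }
    return cities;
}
```

With the file from the question, readCities could be called on a std::ifstream opened on cityLocation.txt.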

Related

C++ reading string using more delimiters

I'm quite new to c++. My problem is that I have a string that can be any length and ends with \n. For example:
const string s = "Daniel,20;Michael,99\n"
(It's always "name,age;name,age;name,age.............\n")
and I want to separate the names and ages and put them into two vectors so they can be stored. But I don't know how to handle a string with more than one separator. The example would be separated like this:
Vector name contains {Daniel,Michael}
Vector age contains {20,99}
You can use stringstream and getline for this purpose, but since you have a very specific format, a simple std::string::find is likely enough to solve your problem. Here is a simple example:
#include <vector>
#include <string>
#include <cstdio>
#include <cstdlib>
#include <cstddef>
int main() {
std::string const s = "Daniel,20;Michael,99;Terry,42;Jack,34";
std::vector<std::string> names;
std::vector<int> ages;
std::size_t beg = 0;
std::size_t end = 0;
while ((end = s.find(',', end)) != s.npos) {
names.emplace_back(s, beg, end - beg);
char* pend;
ages.push_back(std::strtol(s.c_str() + end + 1, &pend, 10));
end = beg = pend - s.c_str() + 1;
}
for (auto&& n : names) std::puts(n.c_str());
for (auto&& a : ages) std::printf("%d\n", a);
}
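For comparison, the stringstream and getline approach mentioned above could look roughly like this. It is only a sketch: parsePeople is a hypothetical name, and it assumes that records without a comma or with a non-numeric age should simply be skipped.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits "name,age;name,age;...\n" into two parallel vectors.
// Records without a comma or with a non-numeric age are skipped.
void parsePeople(const std::string& s,
                 std::vector<std::string>& names,
                 std::vector<int>& ages) {
    std::istringstream records(s);
    std::string record;
    // Outer split on ';' yields one "name,age" record at a time.
    while (std::getline(records, record, ';')) {
        std::istringstream fields(record);
        std::string name;
        int age = 0;
        // Inner split: text up to ',' is the name, the rest is parsed as an int.
        if (std::getline(fields, name, ',') && (fields >> age)) {
            names.push_back(name);
            ages.push_back(age);
        }
    }
}
```

The trailing '\n' needs no special handling here because the integer extraction stops at the first non-digit character.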
Sorry my C++ skills have faded, but this is what I would do :-
vector <string> names;
vector <string> ages;
string inputString = "Daniel,20;Michael,99;Terry,42;Jack,34";
string word = "";
for(int i = 0; i<inputString.length(); i++)
{
if(inputString[i] == ';')
{
ages.push_back(word);
word = "";
}
else if (inputString[i] == ',')
{
names.push_back(word);
word = "";
}
else
{
word = word + inputString[i];
}
}
ages.push_back(word);

How to split string read from text file into array using c++

I want to split the strings on each line of my text file into an array, similar to the split() function in Python. My desired syntax is a loop that enters every split string into the next index of an array,
so for example if my string:
"ab,cd,ef,gh,ij"
, every time I encounter a comma then I would:
datafile >> arr1[i]
and my array would end up:
arr1 = [ab,cd,ef,gh,ij]
A mock-up that does not read a text file is provided below:
#include <iostream>
#include <fstream>
#include <stdio.h>
#include <string.h>
#include <string>
using namespace std;
int main(){
char str[] = "ab,cd,ef,gh,ij"; //" ex str in place of file contents/fstream sFile;"
const int NUM = 5;
string sArr[NUM];//empty array
char *token = strtok(str, ",");
for (int i=0; i < NUM; i++)
while((token!=NULL)){
("%s\n", token) >> sArr[i];
token = strtok(NULL, ",");
}
cout >> sArr;
return 0;
}
In C++ you can read a file line by line and directly get a std::string.
Below you will find an example with a split() function as you requested, and a main() that reads a file:
Example
data file:
ab,cd,ef,gh
ij,kl,mn
c++ code:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
std::vector<std::string> split(const std::string & s, char c);
int main()
{
std::string file_path("data.txt"); // I assumed you have that kind of file
std::ifstream in_s(file_path);
std::vector <std::vector<std::string>> content;
if(in_s)
{
std::string line;
std::vector <std::string> vec;
while(getline(in_s, line))
{
for(const std::string & str : split(line, ','))
vec.push_back(str);
content.push_back(vec);
vec.clear();
}
in_s.close();
}
else
std::cout << "Could not open: " + file_path << std::endl;
for(const std::vector<std::string> & str_vec : content)
{
for(unsigned int i = 0; i < str_vec.size(); ++i)
std::cout << str_vec[i] << ((i == str_vec.size()-1) ? ("") : (" : "));
std::cout << std::endl;
}
return 0;
}
std::vector<std::string> split(const std::string & s, char c)
{
std::vector<std::string> splitted;
std::string word;
for(char ch : s)
{
if(ch == c)
{
// End of a field: store it unless it is empty, and never keep the delimiter itself.
if(!word.empty())
{
splitted.push_back(word);
word.clear();
}
}
else
word += ch;
}
if(!word.empty())
splitted.push_back(word);
return splitted;
}
output:
ab : cd : ef : gh
ij : kl : mn
I hope it will help.
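Note that the split() above drops empty fields, whereas Python's str.split(',') keeps them. If that behaviour matters, a variant along the following lines might be closer to the Python function. This is a minimal sketch; splitKeepEmpty is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Splits s on a single-character delimiter and keeps empty fields,
// so "a,,b," yields {"a", "", "b", ""} and "" yields {""}.
std::vector<std::string> splitKeepEmpty(const std::string& s, char delimiter) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = s.find(delimiter, start);
        if (pos == std::string::npos) {
            // No more delimiters: the remainder is the last field.
            fields.push_back(s.substr(start));
            break;
        }
        fields.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}
```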
So, a few things to fix. Firstly, arrays and NUM are kind of limiting - you have to fix up NUM whenever you change the input string, so C++ provides std::vector which can resize itself to however many strings it finds. Secondly, you want to call strtok until it returns nullptr once, and you can do that with one loop. With both your for and NUM you call strtok too many times - even after it has returned nullptr. Next, to put the token into a std::string, you would assign using my_string = token; rather than ("%s\n", token) >> my_string - which is a broken mix of printf() formatting and C++ streaming notation. Lastly, to print the elements you've extracted, you can use another loop. All these changes are illustrated below.
char str[] = "ab,cd,ef,gh,ij";
std::vector<std::string> strings;
char* token = strtok(str, ",");
while ((token != nullptr))
{
strings.push_back(token);
token = strtok(NULL, ",");
}
for (const auto& s : strings)
cout << s << '\n';
Your code is overly complicated and wrong.
You probably want this:
#include <iostream>
#include <string>
#include <string.h>
using namespace std;
int main() {
char str[] = "ab,cd,ef,gh,ij"; //" ex str in place of file contents/fstream sFile;"
const int NUM = 5;
string sArr[NUM];//empty array
char *token = strtok(str, ",");
int max = 0;
while ((token != NULL)) {
sArr[max++] = token;
token = strtok(NULL, ",");
}
for (int i = 0; i < max; i++)
cout << sArr[i] << "\n";
return 0;
}
This code is still poor: no bounds checking is done, so more than NUM tokens would overflow sArr.
But anyway, you should rather do it the C++ way as suggested in the other answers.
Use boost::split
#include <boost/algorithm/string.hpp>
[...]
std::vector<std::string> strings;
std::string val("ab,cd,ef,gh,ij");
boost::split(strings, val, boost::is_any_of(","));
You could do something like this
std::string str = "ab,cd,ef,gh,ij";
std::vector<std::string> TokenList;
std::string::size_type lastPos = 0;
std::string::size_type pos = str.find_first_of(',', lastPos);
while(pos != std::string::npos)
{
std::string temp(str, lastPos, pos - lastPos);
TokenList.push_back(temp);
lastPos = pos + 1;
pos = str.find_first_of(',', lastPos);
}
if(lastPos != str.size())
{
std::string temp(str, lastPos, str.size());
TokenList.push_back(temp);
}
for(int i = 0; i < TokenList.size(); i++)
std::cout << TokenList.at(i) << std::endl;

Counting occurrences of word in vector of characters

I have written a program to store a text file in a vector of characters.
#include<iostream>
#include<fstream>
#include <algorithm>
#include<vector>
using namespace std;
int main()
{
vector<char> vec;
ifstream file("text.txt");
if(!file.eof() && !file.fail())
{
file.seekg(0, std::ios_base::end);
std::streampos fileSize = file.tellg();
vec.resize(fileSize);
file.seekg(0, std::ios_base::beg);
file.read(&vec[0], fileSize);
}
int c = count(vec.begin(), vec.end(), 'U');
cout << c;
return 0;
}
I want to count the occurrences of "USER" in the text file, but using count I can only count single characters. How can I count the number of occurrences of "USER" in the vector of characters?
For example
text.txt
USERABRUSER#$$* 34 USER ABC RR IERUSER
Then the count of "USER" is 4. Words can only be in uppercase.
std::string has a find member function that will find an occurrence of one string inside another. You can use that to count occurrences something like this:
size_t count(std::string const &haystack, std::string const &needle) {
std::size_t occurrences = 0;
auto len = needle.size();
std::size_t pos = 0; // must be size_t so the comparison with npos is well-defined
while (std::string::npos != (pos = haystack.find(needle, pos))) {
++occurrences;
pos += len;
}
return occurrences;
}
For example:
int main() {
std::string input{ "USERABRUSER#$$* 34 USER ABC RR IERUSER" };
std::cout << count(input, "USER");
}
...produces an output of 4.
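If the data has to stay in a vector<char> as in the question, the same idea can be expressed with std::search from <algorithm>. The following is only a sketch: countInVector is a hypothetical name, and like the function above it counts non-overlapping occurrences.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Counts non-overlapping occurrences of needle in a vector<char>.
std::size_t countInVector(const std::vector<char>& haystack,
                          const std::string& needle) {
    if (needle.empty()) {
        return 0; // an empty needle would otherwise match forever
    }
    std::size_t occurrences = 0;
    auto it = haystack.begin();
    while ((it = std::search(it, haystack.end(),
                             needle.begin(), needle.end())) != haystack.end()) {
        ++occurrences;
        // Continue after the match; safe because the match lies fully inside haystack.
        it += static_cast<std::ptrdiff_t>(needle.size());
    }
    return occurrences;
}
```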
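This is how I would do it, assuming only whole whitespace-separated words need to be counted (for the sample line in the question it finds one "USER", not four, because "USERABRUSER#$$*" and "IERUSER" count as different words):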
#include <fstream>
#include <sstream>
#include <iostream>
#include <unordered_map>
#include <string>
using namespace std;
int main() {
unordered_map<string, size_t> data;
string line;
ifstream file("text.txt");
while (getline(file, line)) {
istringstream is(line);
string word;
while (is >> word) {
++data[word];
}
}
cout << data["USER"] << endl;
return 0;
}
Let's try again. Once again, a vector isn't necessary. This is what I would consider to be the most C++ idiomatic way. It uses std::string's find() method to repeatedly find the substring in order until the end of the string is reached.
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
int main() {
// Read entire file into a single string. The extra parentheses around the
// first argument keep this from being parsed as a function declaration.
std::ifstream file_stream("text.txt");
std::string file_contents((std::istreambuf_iterator<char>(file_stream)),
std::istreambuf_iterator<char>());
unsigned count = 0;
const std::string substr = "USER";
for (std::size_t i = file_contents.find(substr); i != std::string::npos;
i = file_contents.find(substr, i + substr.length())) {
++count;
}
std::cout << count << '\n';
}

C++ reading a mathematical function and sorting

I'm reading a function from a file in the format f(x,y,f(x),g). Once I read the input it is stored as a vector, and I am trying to get each value between the commas, so in this case I want to get x, y, f(x) and g as separate chars/strings. I'm stuck, any ideas?
Here is the solution I came up with:
#include <iostream>
#include <string>
#include <vector>
#include <sstream>
using namespace std;
//Split string into vector of strings
vector<string> split(string str, char delimiter)
{
vector<string> internal;
stringstream ss(str); // Turn the string into a stream.
string tok;
while(getline(ss, tok, delimiter))
{
internal.push_back(tok);
}
return internal;
}
int main()
{
string myInput = "f(x,y,f(x),g)";
//Extract the string between outer brackets
size_t startIndex = myInput.find_first_of("(") + 1;
size_t endIndex = myInput.find_last_of(")");
string innerStr = myInput.substr(startIndex, endIndex-startIndex);
//Split the result by comma
vector<string> sep = split(innerStr, ',');
for(unsigned int i = 0; i < sep.size(); ++i)
{
cout << sep[i] << endl;
}
}
Hope it helps
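One caveat: splitting on every comma only works while the arguments themselves contain no commas. For an input such as f(x,g(a,b),y), the argument g(a,b) would be cut in two. A possible way around this is to track the parenthesis depth and split only at depth zero. The sketch below assumes balanced parentheses; splitTopLevel is a hypothetical name.

```cpp
#include <string>
#include <vector>

// Splits an argument list on commas that are not nested inside parentheses.
// Assumes the parentheses in args are balanced.
std::vector<std::string> splitTopLevel(const std::string& args) {
    std::vector<std::string> parts;
    std::string current;
    int depth = 0;
    for (char ch : args) {
        if (ch == '(') {
            ++depth;
        } else if (ch == ')') {
            --depth;
        }
        if (ch == ',' && depth == 0) {
            // Top-level comma: the current argument is complete.
            parts.push_back(current);
            current.clear();
        } else {
            current += ch;
        }
    }
    // Store the final argument unless the input was empty.
    if (!current.empty() || !parts.empty()) {
        parts.push_back(current);
    }
    return parts;
}
```

It would be applied to the inner string, in the same place where split(innerStr, ',') is called above.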

C++ split string by line

I need to split string by line.
I used to do in the following way:
int doSegment(char *sentence, int segNum)
{
assert(pSegmenter != NULL);
Logger &log = Logger::getLogger();
char delims[] = "\n";
char *line = NULL;
if (sentence != NULL)
{
line = strtok(sentence, delims);
while(line != NULL)
{
cout << line << endl;
line = strtok(NULL, delims);
}
}
else
{
log.error("....");
}
return 0;
}
I input "we are one.\nyes we are." and invoke the doSegment method. But when debugging, I found that the sentence parameter is "we are one.\\nyes we are", and the split failed. Can somebody tell me why this happened and what I should do? Is there any other way to split a string in C++? Thanks!
I'd use std::getline or std::string::find to go through the string.
The code below demonstrates the getline approach:
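#include <iostream>
#include <sstream>
#include <string>
int doSegment(char *sentence)
{
// Check for null before building the stream: constructing a
// std::string from a null pointer is undefined behavior.
if (sentence != NULL)
{
std::stringstream ss(sentence);
std::string to;
while(std::getline(ss,to,'\n')){
std::cout << to << std::endl;
}
}
return 0;
}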
You can call std::string::find in a loop and then use std::string::substr to extract each piece.
std::vector<std::string> split_string(const std::string& str,
const std::string& delimiter)
{
std::vector<std::string> strings;
std::string::size_type pos = 0;
std::string::size_type prev = 0;
while ((pos = str.find(delimiter, prev)) != std::string::npos)
{
strings.push_back(str.substr(prev, pos - prev));
prev = pos + delimiter.size();
}
// To get the last substring (or only, if delimiter is not found)
strings.push_back(str.substr(prev));
return strings;
}
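As for why the original split failed: a debugger showing "\\n" suggests that the input holds a backslash followed by the letter n as two separate characters, rather than a single newline character. That is an assumption about how the input was entered, not something the question confirms. If it is the case, one option is to convert those sequences into real newlines before splitting. A minimal sketch, with unescapeNewlines as a hypothetical name:

```cpp
#include <string>

// Replaces every two-character sequence backslash + 'n' with a real newline.
// Other backslash sequences are left untouched.
std::string unescapeNewlines(const std::string& text) {
    std::string result;
    result.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\\' && i + 1 < text.size() && text[i + 1] == 'n') {
            result += '\n';
            ++i; // skip the 'n' that belonged to the escape sequence
        } else {
            result += text[i];
        }
    }
    return result;
}
```

Alternatively, a splitter that accepts a string delimiter can be given the two-character delimiter "\\n" directly.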
#include <sstream>
#include <string>
#include <vector>
std::vector<std::string> split_string_by_newline(const std::string& str)
{
auto result = std::vector<std::string>{};
auto ss = std::stringstream{str};
for (std::string line; std::getline(ss, line, '\n');)
result.push_back(line);
return result;
}
#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <algorithm>
#include <iterator>
using namespace std;
vector<string> splitter(string in_pattern, string& content){
vector<string> split_content;
regex pattern(in_pattern);
copy( sregex_token_iterator(content.begin(), content.end(), pattern, -1),
sregex_token_iterator(),back_inserter(split_content));
return split_content;
}
int main()
{
string sentence = "This is the first line\n";
sentence += "This is the second line\n";
sentence += "This is the third line\n";
vector<string> lines = splitter(R"(\n)", sentence);
for (string line: lines){cout << line << endl;}
}
This builds a string with multiple lines, splits it into a vector, and prints the elements in a for loop.
Using the library range-v3:
#include <range/v3/all.hpp>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> split_string_by_newline(const std::string_view str) {
return str | ranges::views::split('\n')
| ranges::to<std::vector<std::string>>();
}
Using C++23 ranges:
#include <ranges>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> split_string_by_newline(const std::string_view str) {
return str | std::ranges::views::split('\n')
| std::ranges::to<std::vector<std::string>>();
}
This fairly inefficient way just loops through the string until it encounters a '\n' newline character. It then creates a substring and adds it to a vector.
std::vector<std::string> Loader::StringToLines(std::string string)
{
std::vector<std::string> result;
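std::string::size_type markbegin = 0;
for (std::string::size_type i = 0; i < string.length(); ++i) {
if (string[i] == '\n') {
result.push_back(string.substr(markbegin, i - markbegin));
markbegin = i + 1;
}
}
// Keep the last line when the input does not end with '\n'.
if (markbegin < string.length())
result.push_back(string.substr(markbegin));
return result;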
}