C++: Read CSV-file separated by ; AND \n [duplicate] - c++

This question already has answers here:
How can I read and parse CSV files in C++?
(39 answers)
Closed 7 years ago.
Sorry if it's just pure stupidity but I'm stuck at a file reading problem via C++.
This is the CSV data that I'd like to read:
5;1;0;3;3;5;5;3;3;3;3;2;3;3;0
5;1;0;3;3;5;0;3;3;3;3;2;0;0;3
5;1;1;3;3;0;0;0;0;3;5;2;3;3;3
0;3;5;5;0;2;0;3;3;0;5;1;1;0;0
0;0;3;5;5;2;0;0;0;0;5;5;1;1;0
0;0;0;0;5;2;0;0;0;0;0;5;5;1;0
;;;;;;;;;;;;;;
Code;Bezeichnung;Kosten;;;;;;;;;;;;
0;Ebene;6;;;;;;;;;;;;
1;Fluss;10; (begrenzt nutzbar);;;;;;;;;;;
2;Weg;2;;;;;;;;;;;;
3;Wald;8;;;;;;;;;;;;
4;Brücke;5;;;;;;;;;;;;
5;Felswand;12;;;;;;;;;;;;
Here, I'd like to read the first values (the block separated from the rest by the ;;;; row) and store them in a 2-dimensional array. That would not be a problem if the data were separated entirely by ';'. But if I use
while (getline(csvread, s, ';'))
{
[...]
}
I get information like this: {5}{1}{0}{3}{3}{5}{5}{3}{3}{3}{3}{2}{3}{3}{0\n5}{1}
so it basically keeps the newline and does not treat it as a delimiter.
So is there a way to use getline with two delimiters? Or am I completely off?
I also thought about reading the file line by line into a string, appending a ; to each line and writing it back to a file so that I could reuse getline with ;. But this can't seriously be the best option, right?

You should do the '\n' and ';' splitting separately:
// here split into lines by '\n'
while (getline(csvread, line, '\n'))
{
// in here, split line by ;
std::vector<std::string> elems;
boost::split(elems, line, boost::is_any_of(";"));
// do something with elems
}
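If Boost is not available, the same two-level split can be sketched with the standard library alone. The helper name read_grid is hypothetical, and the sketch assumes that the numeric grid ends at the first row whose first cell is empty (the ;;;; row in the sample data):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads rows of ';'-separated integers from `in` and
// stops at the first row whose first cell is empty (assumed to be the
// ";;;;" separator row from the question's sample data).
std::vector<std::vector<int>> read_grid(std::istream& in) {
    std::vector<std::vector<int>> grid;
    std::string line;
    while (std::getline(in, line)) {              // outer split: by '\n'
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();                      // tolerate Windows line endings
        }
        std::istringstream cells(line);
        std::string cell;
        std::vector<int> row;
        while (std::getline(cells, cell, ';')) {  // inner split: by ';'
            if (cell.empty()) break;              // empty cell: stop reading this row
            row.push_back(std::stoi(cell));       // throws on non-numeric text
        }
        if (row.empty()) break;                   // separator row: the grid is done
        grid.push_back(row);
    }
    return grid;
}
```

Passing the open std::ifstream (csvread in the question) to read_grid would return the grid as a vector of rows; the legend rows after the separator are left unread in the stream.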

You can use a splitting function like this:
std::vector<std::string> split(const std::string& source, const std::string& delimiter){
std::vector<std::string> result;
size_t last = 0;
size_t next = 0;
while ((next = source.find(delimiter, last)) != std::string::npos){
result.push_back(source.substr(last, next - last));
last = next + delimiter.length();
}
result.push_back(source.substr(last));
return result;
}
Now simply:
std::vector<std::vector<std::string>> parsedCSV;
while (getline(csvread, s, '\n'))
{
parsedCSV.push_back(split(s,";"));
}

I recently had to read CSV data as well and stumbled upon the same 'problem'. Here's what I did:
Read in a full line with getline(csvread, s); this reads up to the next newline.
Split the string on every occurrence of ;. I used this StackOverflow answer as inspiration for splitting a string; the code is listed below.
I didn't care much about performance as I only had to run this program once, so I won't comment on the speed of this workaround.
Good luck!
Edit: apparently Boost offers code to split a string, which might be cleaner. Consider the code below if you want to avoid Boost.
#include <string>
#include <sstream>
#include <vector>
// source: https://stackoverflow.com/a/236803/4841248
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, elems);
return elems;
}

try something like this:
std::vector<std::string> cells;
while (getline(csvread, s) ){
boost::split(cells, s, boost::is_any_of(";"));
....
}

Related

Split text with array of delimiters

I want a function that splits text by an array of delimiters. I have a demo that works perfectly, but it is really slow. Here is an example of the parameters.
text:
"pop-pap-bab bob"
vector of delimiters:
"-"," "
the result:
"pop", "-", "pap", "-", "bab", "bob"
So the function loops through the string and tries to find delimiters. If it finds one, it pushes the text and the delimiter that was found to the result array; if the text only contains spaces or is empty, the text is not pushed.
std::string replace(std::string str,std::string old,std::string new_str){
size_t pos = 0;
while ((pos = str.find(old)) != std::string::npos) {
str.replace(pos, old.length(), new_str);
}
return str;
}
std::vector<std::string> split_with_delimeter(std::string str,std::vector<std::string> delimeters){
std::vector<std::string> result;
std::string token;
int flag = 0;
for(int i=0;i<(int)str.size();i++){
for(int j=0;j<(int)delimeters.size();j++){
if(str.substr(i,delimeters.at(j).size()) == delimeters.at(j)){
if(token != ""){
result.push_back(token);
token = "";
}
if(replace(delimeters.at(j)," ","") != ""){
result.push_back(delimeters.at(j));
}
i += delimeters.at(j).size()-1;
flag = 1;
break;
}
}
if(flag == 0){token += str.at(i);}
flag = 0;
}
if(token != ""){
result.push_back(token);
}
return result;
}
My issue is that the function is really slow since it has three nested loops. I am wondering if anyone knows how to make it faster. I am sorry if I wasn't clear enough; my English isn't the best.
It might be a good idea to use Boost.Xpressive. It is a powerful tool for string operations and saves you from struggling with string::find_xx, hand-written for loops or regex strings.
Concise explanation:
+as_xpr(" ") matches one or more repetitions, like + in a regex, and the "-" prefix makes the match non-greedy (shortest match).
If you define a regex parser as sregex rex = "(" >> +(_w | as_xpr('_')) >> ":" >> +_d >> ")", it would match (port_num:8080). Here ">>" concatenates parsers, and +(_w | as_xpr('_')) matches word characters or "_" repeatedly.
#include <algorithm>
#include <cstdio>
#include <vector>
#include <string>
#include <iostream>
#include <boost/xpressive/xpressive.hpp>
using namespace std;
using namespace boost::xpressive;
int main() {
string source = "Nigeria is a multi&&national state in--habited by more than 2;;50 ethnic groups speak###ing 500 distinct languages";
vector<string> delimiters{ " ", " ", "&&", "-", ";;", "###"};
vector<sregex> pss{ -+as_xpr(delimiters.front()) };
for (const auto& d : delimiters) pss.push_back(pss.back() | -+as_xpr(d));
vector<string> ret;
size_t pos = 0;
auto push = [&](auto s, auto e) { ret.push_back(source.substr(s, e)); };
for_each(sregex_iterator(source.begin(), source.end(), pss.back()), {}, [&](smatch const& m) {
if (m.position() - pos) push(pos, m.position() - pos);
pos = m.position() + m.str().size();
}
);
push(pos, source.size() - pos);
for (auto& s : ret) printf("%s\n", s.c_str());
}
The output is split by multiple string delimiters.
Nigeria
is
a
multi
national
state
in
habited
by
more
than
2
50
ethnic
groups
speak
ing
500
distinct
languages
Maybe, as an alternative, you could use a regex? It may also be too slow for you, though.
With a regex life would be very simple.
Please see the following example:
#include <iostream>
#include <string>
#include <vector>
#include <regex>
#include <iterator>
const std::regex re(R"((\w+|[\- ]))");
int main() {
std::string s{"pop-pap-bab bob"};
std::vector<std::string> part{std::sregex_token_iterator(s.begin(),s.end(),re),{}};
for (const std::string& p : part) std::cout << p << '\n';
}
We use std::sregex_token_iterator in combination with std::vector's range constructor to extract everything that matches the regex and put it all into the std::vector.
The regex itself is also simple. It specifies words or delimiters.
Maybe it's worth a try.
NOTE: You've complained that your code is slow, but keep in mind that most answers only offer options that could potentially speed up the program. Even if the author of an option measured a speed-up, that option may be slower on your machine, so do not forget to measure the execution speed yourself.
If I were you, I would create a separate function that receives an array of strings and outputs an array of delimited strings. The problem with this approach may be that if one delimiter contains another delimiter, the result may not be what you expect, but it makes it easier to iterate through different options for string splitting and find the best one.
My solution would look like this (though it requires C++20):
#include <iomanip>
#include <iostream>
#include <ranges>
#include <string>
#include <string_view>
#include <vector>
std::vector<std::string> split_elems_of_array(const std::vector<std::string>& array, const std::string& delim)
{
std::vector<std::string> result;
for(const auto& str: array)
{
for (const auto word : std::views::split(str, delim))
{
std::string chunk(word.begin(), word.end());
if(!chunk.empty() && chunk != " ")
result.push_back(chunk + delim);
}
}
return result;
}
std::vector<std::string> split_string(std::string str, std::vector<std::string> delims)
{
std::vector<std::string> result = {std::string(str)};
for(const auto&delim: delims)
result = split_elems_of_array(result, delim);
return {result.begin(), result.end()};
}
On my machine, my approach is about 76 times faster: 67 ms versus 5112 ms. The string length is 1000000, and there are 100 delimiters of length 100.
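Since the timings above are specific to one machine, it is worth measuring on your own. A minimal timing sketch, assuming std::chrono::steady_clock is precise enough for the workload (the helper name time_ms is hypothetical and was not part of the original measurement):

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `f` once and returns the elapsed wall-clock
// time in whole milliseconds.
template <class F>
long long time_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

A call such as time_ms([&] { result = split_string(text, delims); }) would time one run; repeating the call several times and comparing the values gives a more reliable picture than a single run.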
Here is the standard splitting algorithm. If you split pop-pap-bab bob by {'-', ' '}, it gives you ["pop", "pap", "bab", "bob"]. It does not store the delimiters and does not check for empty text, but you can change it to do those things too.
Define a vector of strings named result.
Define a string variable named buffer.
Loop over your string. If the current character is not a delimiter, append it to buffer.
If the current character is a delimiter, append buffer to result and clear buffer.
Return result at the end.
#include <algorithm>
#include <string>
#include <vector>
std::vector<std::string> split(const std::string& str, const std::vector<char>& delimiters)
{
    std::vector<std::string> result;
    std::string buffer;
    for (const auto ch : str)
    {
        // not a delimiter: keep collecting characters
        if (std::find(delimiters.begin(), delimiters.end(), ch) == delimiters.end())
            buffer += ch;
        else
        {
            // delimiter: store what was collected (may be empty) and start over
            result.push_back(buffer);
            buffer.clear();
        }
    }
    if (!buffer.empty())
        result.push_back(buffer);
    return result;
}
Its time complexity is O(n·m), where n is the length of the string and m is the number of delimiters.
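The same single-pass idea can be extended to string delimiters that are kept as tokens, which is what the question asked for. This is only a sketch: the names split_keep_delims and is_blank are hypothetical, and it follows the question's description in dropping text and delimiters that consist only of spaces. It uses std::string::compare so that no temporary substring is allocated per comparison.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if s is empty or contains only spaces.
static bool is_blank(const std::string& s) {
    return s.find_first_not_of(' ') == std::string::npos;
}

// Splits `text` at any of `delims` in one pass. Non-blank delimiters are
// kept as tokens of their own; blank text and blank delimiters are dropped.
std::vector<std::string> split_keep_delims(const std::string& text,
                                           const std::vector<std::string>& delims) {
    std::vector<std::string> result;
    std::string token;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::string* hit = nullptr;
        for (const auto& d : delims) {
            // compare() checks in place instead of building a substring
            if (!d.empty() && text.compare(i, d.size(), d) == 0) {
                hit = &d;
                break;
            }
        }
        if (hit == nullptr) {          // ordinary character
            token += text[i++];
            continue;
        }
        if (!is_blank(token)) result.push_back(token);
        token.clear();
        if (!is_blank(*hit)) result.push_back(*hit);
        i += hit->size();              // skip the whole delimiter
    }
    if (!is_blank(token)) result.push_back(token);
    return result;
}
```

As in the question's code, the first matching delimiter in the list wins, so a delimiter that is a prefix of another should be listed after the longer one.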

Reading .txt file and organizing into two-dimensional array

I'm looking to take a somewhat lengthy text file, 50 rows by 2 columns, have a user input the file name, and read it into a two-dimensional array. The text file is a combination of organized names (including commas) and numbers.
I can get the console to display the text file itself, but I'm stuck when it comes to organizing the data into the array. I'm trying to devise a loop using getline and find so that the program goes through the .txt, stops at a comma and records every character before that comma into a location (i.e. [0][0]) of the array. I'm aware that using vectors would be easier, but I'd like to solve this with an array.
Also, there is the issue of reading names (strings) into the array (int).
Please test this code:
#include <vector>
#include <fstream>
#include <string>
#include <sstream>
#include <iterator>
template<typename Out>
void split(const std::string &s, char delim, Out result) {
std::stringstream ss;
ss.str(s);
std::string item;
while (std::getline(ss, item, delim)) {
*(result++) = item;
}
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, std::back_inserter(elems));
return elems;
}
int main()
{
std::ifstream file("test.txt");
std::string line;
std::vector<std::vector<std::string>> arr;
if (file)
{
// read whole lines so that names containing spaces stay intact
while (std::getline(file, line))
{
std::vector<std::string> v = split(line, ',');
arr.push_back(v);
}
}
return 0;
}
my test.txt:
m,2
n,4
o,6
p,8
q,10

Attempting to get data from lines in a file

I'm quite new to C++, so sorry if this is a dumb question!
For a project we are given a file with a couple of thousand lines of values, each line having 9 different numbers.
I want to create a for/while loop that, for each loop, stores the 8th and 9th integer of a line as a variable so that I can do some calculations with them. The loop would then move onto the next line, store the 8th and 9th numbers of that line as the same variable, so that I can do the same calculation to it, ending when I've run out of lines.
My problem is less to do with reading the file, I'm just confused how I'd tell it to take only the 8th and 9th value from each line.
Thanks for any help, it is greatly appreciated!
Designed for readability rather than speed. It also performs no checking that the input file is the correct format.
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
template<class T> T ConvertFromString(const std::string& s)
{
    std::istringstream ss(s);
    T x{};
    ss >> x;
    return x;
}
std::vector<int> values8;
std::vector<int> values9;
std::ifstream file("myfile.txt");
std::string line;
while (std::getline(file, line))
{
    std::istringstream ss(line);
    for (int i = 0; i < 9; i++)
    {
        std::string token;
        ss >> token;
        switch (i)
        {
        case 7: // 8th value (indices start at 0)
            values8.push_back(ConvertFromString<int>(token));
            break;
        case 8: // 9th value
            values9.push_back(ConvertFromString<int>(token));
            break;
        }
    }
}
First, split the string, then convert the pieces to numbers using atoi or std::stoi. You can then take the 8th and 9th values from the array or vector of numbers.
//Split string
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim)) {
elems.push_back(item);
}
return elems;
}
std::vector<std::string> split(const std::string &s, char delim) {
std::vector<std::string> elems;
split(s, delim, elems);
return elems;
}
//new code goes here
std::string line; // assumed to hold one line already read from the file
std::vector<std::string> lineSplit = split(line, ' ');
std::vector<int> numbers;
for (std::size_t i = 0; i < lineSplit.size(); i++)
numbers.push_back(std::stoi(lineSplit[i])); // or atoi(lineSplit[i].c_str())
int numb1 = numbers[7];//8th
int numb2 = numbers[8];//9th
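Putting the pieces together, the whole loop could look roughly like the sketch below. The helper name read_eighth_and_ninth is hypothetical, and it assumes the nine numbers on each line are separated by whitespace; lines with fewer than nine numbers are skipped instead of causing an out-of-range access.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the 8th and 9th whitespace-separated integers
// of every line that has at least nine of them.
std::vector<std::pair<int, int>> read_eighth_and_ninth(std::istream& in) {
    std::vector<std::pair<int, int>> out;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<int> values;
        int v = 0;
        while (fields >> v) values.push_back(v);   // stops at the first non-number
        if (values.size() >= 9) {
            out.emplace_back(values[7], values[8]); // indices start at 0
        }
    }
    return out;
}
```

Called with an open std::ifstream, it yields one pair per valid line, so the calculation can be done in a simple loop over the result.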

creating vectors for individually parsed sentences c++

class Read
{
public:
Read(const char* filename)
:mFile(filename)
{
}
void setString()
{
while(getline(mFile, str, '.'))
{
getline(mFile, str, '.');
str.erase(std::remove(str.begin(), str.end(), '\n'), str.end());
}
}
private:
ifstream mFile;
string str;
};
int main()
{
Read r("sample.txt");
return 0;
}
My ultimate goal is to parse through each sentence in the file so I used getline setting the delimiter to '.' to get each individual sentence. I want to create a sentence vector but am not really sure how to do so.
The file is pretty big so it will have a lot of sentences. How do I create a vector for each sentence?
Will it simply be vector < string > str? How will it know the size?
EDIT: I added a line of code to remove the '\n'
EDIT: Got rid of !eof
while(!myFile.eof())
getline(mFile, str, '.');
Where did you find that? Please put it back. Try:
std::vector<std::string> sentences;
while(std::getline(mFile, str, '.'))
sentences.push_back(str);
The vector container has a .size() function to return the number of populated elements. You should google "std::vector" and read through the functions in the API.
Vectors are dynamic arrays. You need not worry about the size of the vector. You can use the push_back() function to add elements to the vector. I have made some changes to your code. Please check whether this works for you.
#include <fstream>
#include <string>
#include <vector>
using namespace std;
class Read
{
public:
Read(const char* filename)
:mFile(filename)
{
}
void setString()
{
while(getline(mFile, str, '.'))
{
vec.push_back(str);
}
}
private:
ifstream mFile;
string str;
vector<string> vec;
};
int main()
{
Read r("sample.txt");
return 0;
}
#include <vector>
using namespace std;
...
vector<string> sentences;
sentences.push_back(line);
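The vector is a dynamic array and it will resize itself as you keep adding sentences. If you know the number of sentences in advance, you can improve performance by reserving the capacity up front. Use reserve rather than resize here, because resize would create that many empty strings and push_back would then append after them:
sentences.reserve(number of sentences here)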