How can I read a CSV file into a vector in C++? - c++

I'm working on a project that converts Python code to C++ for better performance. The Python project is named Advanced EAST. So far I have the input data for the nms function in a .csv file like this:
"[ 5.9358170e-04 5.2773970e-01 5.0061589e-01 -1.3098677e+00
-2.7747922e+00 1.5079222e+00 -3.4586751e+00]","[ 3.8175487e-05 6.3440394e-01 7.0218205e-01 -1.5393494e+00
-5.1545496e+00 4.2795391e+00 -3.4941311e+00]","[ 4.6003381e-05 5.9677261e-01 6.6983813e-01 -1.6515008e+00
-5.1606908e+00 5.2009044e+00 -3.0518508e+00]","[ 5.5172237e-05 5.8421570e-01 5.9929764e-01 -1.8425952e+00
-5.2444854e+00 4.5013981e+00 -2.7876694e+00]","[ 5.2929961e-05 5.4777789e-01 6.4851379e-01 -1.3151239e+00
-5.1559062e+00 5.2229333e+00 -2.4008298e+00]","[ 8.0250458e-05 6.1284608e-01 6.1014801e-01 -1.8556541e+00
-5.0002270e+00 5.2796564e+00 -2.2154367e+00]","[ 8.1256607e-05 6.1321974e-01 5.9887391e-01 -2.2241254e+00
-4.7920742e+00 5.4237065e+00 -2.2534993e+00]
Each unit is 7 numbers, but there is a '\n' after the first four numbers.
I want to read this CSV file into my C++ project
so that I can do the math in C++ and make it faster.
using namespace std;
void read_csv(const string &filename)
{
//File pointer
fstream fin;
//open an existing file
fin.open(filename, ios::in);
vector<vector<vector<double>>> predict;
string line;
while (getline(fin, line))
{
std::istringstream sin(line);
vector<double> preds;
double pred;
while (getline(sin, pred, ']'))
{
preds.push_back(preds);
}
}
}
For now my code is not working, of course,
and I have no idea how to proceed.
Please help me read the CSV data into my code.
Thanks

Unfortunately parsing strings (and consequently files) is very tedious in C++.
I highly recommend using a library, ideally a header-only one, like this one.
If you insist on writing it yourself, maybe you can draw some inspiration from this StackOverflow question on how to parse general CSV files in C++.

You could look at POSIX getdelim(), e.g. getdelim(&buf, &cap, ',', fp), which reads from a FILE* rather than from an fstream.
But the other issue will be those quotes, unless you /know/ the file is always formatted exactly this way, it becomes difficult.
One hack I have used in the past that is NOT PERFECT, if the first character is a quote, then the last character before the comma must also be a matching quote, and not escaped.
If it is not a quote then getdelim() some more, but the auto-alloc feature of getdelim means you must use another buffer. In C++ I end up with a vector of all the pieces of getdelim results that then need to be concatenated to make the final string:
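std::vector<char*> gotLine;                      // fp is an already opened FILE*
gotLine.push_back(static_cast<char*>(malloc(2)));
*gotLine.back() = static_cast<char>(fgetc(fp));  // first character of the field
gotLine.back()[1] = 0;
bool gotquote = *gotLine.back() == '"'; // perhaps different classes of quote
if (*gotLine.back() != ',')
    for(;;)
    {
        char* gotSub = nullptr;  // getdelim() allocates the buffer when given nullptr
        size_t cap = 0;
        ssize_t subLen = getdelim(&gotSub, &cap, ',', fp);
        if (subLen < 0) { free(gotSub); break; }  // EOF or error
        gotLine.push_back(gotSub);
        if (!gotquote) break;
        // gotSub ends with the ',' delimiter, so a closing quote would sit at subLen-2
        if (subLen>1 && gotSub[subLen-2]=='"') // again different classes of quote
            if (subLen==2 || gotSub[subLen-3]!='\\') // needs to be a while loop
                break;
    }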
Then just concatenate all these string segments back together.
Note that getdelim supports null bytes. If you expect null bytes in the content, and not represented by the character sequences \000 or \# you need to store the actual length returned by getdelim, and use memcpy to concatenate them.
Oh, and if you allow utf-8 extended quotes it gets very messy!
The case this doesn't cover is a string that ends \\" or \\\\". Ideally you need a while loop that counts the backslashes immediately before the quote, and accepts the quote if the count is even.
Note that this leaves the issue of unescaping the quoted content, i.e. converting any \" into ", and \\ into \, etc., and also discarding the enclosing quotes.
In the end a library may be easier if you need to deal with completely arbitrary content. But if the content is "known" you can live without.
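For the "known" format shown in the question, a minimal sketch could skip general CSV parsing entirely and just collect the numbers between each pair of square brackets. The function names parse_bracketed and read_csv_rows are made up for this illustration, and the code assumes every group is written as "[ n n n ... ]" with numbers separated by spaces or newlines, as in the sample data.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turns text such as "[ 1.0 2.0\n3.0]","[ 4.0 5.0 6.0]"
// into one vector<double> per bracketed group.
std::vector<std::vector<double>> parse_bracketed(const std::string& text) {
    std::vector<std::vector<double>> rows;
    std::size_t open = text.find('[');
    while (open != std::string::npos) {
        const std::size_t close = text.find(']', open);
        if (close == std::string::npos) break;  // unterminated group: stop
        // operator>> treats spaces and newlines alike, so the line break
        // in the middle of a group needs no special handling.
        std::istringstream group(text.substr(open + 1, close - open - 1));
        std::vector<double> row;
        double value;
        while (group >> value) row.push_back(value);
        rows.push_back(row);
        open = text.find('[', close);
    }
    return rows;
}

// Hypothetical helper: reads the whole stream into memory, then parses it.
std::vector<std::vector<double>> read_csv_rows(std::istream& in) {
    const std::string text((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    return parse_bracketed(text);
}
```

Passing an opened std::ifstream to read_csv_rows should give one inner vector of 7 doubles per unit. The quotes and commas between groups are simply ignored, which is only safe because the format is known in advance.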

Related

c++ Function to add an extra '\' to a filepath?

I have about 3500 full file paths to sort through (ex. "C:\Users\Nick\Documents\ReadIns\NC_000852.gbk"). I just learned that c++ does not recognize the single backslash when reading in a file path. I have about 3500 file paths that I am reading in so it would be overly tedious to manually change each one.
I have this for loop that finds the single backslash and inserts a double backslash at that index. This:
string line = "C:\Users\Nick\Documents\ReadIns\NC_000852.gbk";
for (unsigned int i = 0; i < filepath.size(); i++) {
if(filepath[i] == '\') {
filepath.insert(i, '\');
}
}
However, c++, specifically on c::b, does not compile because of the backslash character. Is there a way to add in the extra backslash character with a function?
I am reading the filepaths in from a text file, so they are being read into the string filepath variable, this is just a test.
Use double backslash as '\\' and "C:\\Users...". Because single backslash with the next character makes an escape.
Also the string::insert() method's 2nd argument expects number of characters, which is missing in your code.
With all those fixes, it compiles fine:
string filepath = "C:\\Users\\Nick\\Documents\\ReadIns\\NC_000852.gbk";
// ^^ ^^ ^^ ^^ ^^
for (unsigned int i = 0; i < filepath.size(); i++) {
if(filepath[i] == '\\') {
// ^^
filepath.insert(i, 1, '\\');
} // ^^^^^^^
}
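The loop above compiles, but it never terminates on a path that contains a backslash: each inserted backslash pushes the original one to index i+1, where it is matched again on the next iteration. Below is my preferred way: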
for(auto pos = filepath.find('\\'); pos != string::npos; pos = filepath.find('\\', ++pos))
filepath.insert(++pos, 1, '\\');
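Since the question asks for a function, the same idea can be sketched as one that builds a new string instead of inserting in place. The name double_backslashes is made up for this illustration.

```cpp
#include <string>

// Hypothetical helper: returns a copy of path in which every backslash
// is followed by a second backslash.
std::string double_backslashes(const std::string& path) {
    std::string result;
    result.reserve(path.size());
    for (char c : path) {
        result += c;
        if (c == '\\') result += '\\';  // '\\' is a single backslash character
    }
    return result;
}
```

Building a separate result string avoids rescanning characters that were just inserted, which is the pitfall of inserting while iterating by index.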
If you only need to replace each backslash with a single other character (for example a forward slash, which Linux requires and Windows generally accepts), you can use std::replace() and avoid the loop, as mentioned in this answer:
std::replace(filepath.begin(), filepath.end(), '\\', '/');
I assumed that, you already have a file created which contains single backslashes and you are using that for parsing.
But from your comments, I notice that apparently you are getting the file paths directly at runtime (i.e. while running the .exe). In that case, as @MSalters has mentioned, you need not worry about such transformations (i.e. changing the backslashes).
The problem that you're seeing is because in C++, string literals are commonly enclosed in "" quotes. This brings up one minor problem: how do you put a quote inside a string literal, when that quote would end the string literal. The solution is escaping it with a \. This can also be used to add a few other characters to a string, such as \n (newline). And since \ now has a special meaning in string literals, it's also used to escape itself. So "\\" is a string containing just one character (and of course a trailing NUL).
This also applies to character literals: char example[4] = {'a', '\\', 'b', 0} is an alternative way to write "a\\b".
Now this is all about compile time, when the compiler needs to separate C++ code and string contents. Once your executable is running, a backslash is just one char. std::cout << "a\\b" prints a single backslash, because there's only one in memory. std::string word; std::cin >> word will read a single word, and if you enter one backslash then word will contain one backslash. The compiler isn't involved in that.
So if you read 3500 filenames from a std::ifstream list_of_filenames and then use that to create a further 3500 std::ifstreams, you only need to worry about backslashes in specifying that very first filename in code. And if you take that filename from argv[1] instead, you don't need to care at all.
One way to get rid of special handling of backslash is to keep all file names in a separate disk file as such and use file stream objects such as ifstream to get file names in C++ format.
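char szFilename[MAX_PATH] = {0}; // ifstream reads narrow chars, so use char rather than TCHAR
ifstream ObjInFiles( "E:\\filenames.txt" );
ObjInFiles.getline( szFilename, MAX_PATH );
ObjInFiles.close();
Suppose the first file name stored in filenames.txt is e:\temp\abc.txt. After executing getline() above, the variable szFilename holds exactly those characters, with single backslashes. That is the same string you would write in source code as the literal "e:\\temp\\abc.txt".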

Pull out data from a file and store it in strings in C++

I have a file which contains records of students in the following format.
Umar|Ejaz|12345|umar@umar.com
Majid|Hussain|12345|majid@majid.com
Ali|Akbar|12345|ali@geeks-inn.com
Mahtab|Maqsood|12345|mahtab@myself.com
Juanid|Asghar|12345|junaid@junaid.com
The data has been stored according to the following format:
firstName|lastName|contactNumber|email
The total number of lines(records) can not exceed the limit 100. In my program, I've defined the following string variables.
#define MAX_SIZE 100
// other code
string firstName[MAX_SIZE];
string lastName[MAX_SIZE];
string contactNumber[MAX_SIZE];
string email[MAX_SIZE];
Now, I want to pull data from the file, and using the delimiter '|', I want to put data in the corresponding strings. I'm using the following strategy to put back data into string variables.
ifstream readFromFile;
readFromFile.open("output.txt");
// other code
int x = 0;
string temp;
while(getline(readFromFile, temp)) {
int charPosition = 0;
while(temp[charPosition] != '|') {
firstName[x] += temp[charPosition];
charPosition++;
}
while(temp[charPosition] != '|') {
lastName[x] += temp[charPosition];
charPosition++;
}
while(temp[charPosition] != '|') {
contactNumber[x] += temp[charPosition];
charPosition++;
}
while(temp[charPosition] != endl) {
email[x] += temp[charPosition];
charPosition++;
}
x++;
}
Is it necessary to attach a null character '\0' at the end of each string? And if I do not attach it, will it create problems when I actually use those string variables in my program? I'm new to C++, and I've come up with this solution. If anybody has a better technique, it is surely welcome.
Edit: Also I can't compare a char(acter) with endl, how can I?
Edit: The code that I've written isn't working. It gives me following error.
Segmentation fault (core dumped)
Note: I can only use .txt file. A .csv file can't be used.
There are many techniques to do this. I suggest searching StackOverflow for "[C++] read file" to see some more methods.
Find and Substring
You could use the std::string::find method to find the delimiter and then use std::string::substr to return a substring between the position and the delimiter.
std::string::size_type position = 0;
position = temp.find('|');
if (position != std::string::npos)
{
    firstName[x] = temp.substr(0, position);
}
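The find-and-substring approach extends to all four fields by repeating the search from just past the previous delimiter. A minimal sketch, with a made-up helper name split:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits line at every occurrence of delim using
// std::string::find and std::string::substr.
std::vector<std::string> split(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    std::string::size_type pos = line.find(delim);
    while (pos != std::string::npos) {
        fields.push_back(line.substr(start, pos - start));
        start = pos + 1;                // continue after the delimiter
        pos = line.find(delim, start);
    }
    fields.push_back(line.substr(start));  // last field has no trailing delimiter
    return fields;
}
```

For a record in the firstName|lastName|contactNumber|email format, split(temp, '|') should return four strings, which can then be copied into the corresponding arrays at index x.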
If you don't terminate a C-style string with a null character, there is no way to determine where the string ends, so C-style strings must be terminated. std::string objects keep track of their own length, so you never append '\0' to them yourself.
I would personally read the data into std::string objects:
std::string first, last, etc;
while (std::getline(readFromFile, first, '|')
&& std::getline(readFromFile, last, '|')
&& std::getline(readFromFile, etc)) {
// do something with the input
}
std::endl is a manipulator implemented as a function template. You can't compare a char with that. There is also hardly ever a reason to use std::endl because it flushes the stream after adding a newline which makes writing really slow. You probably meant to compare to a newline character, i.e., to '\n'. However, since you read the string with std::getline() the line break character will already be removed! You need to make sure you don't access more than temp.size() characters otherwise.
Your record also contains arrays of strings rather than arrays of characters, and you append individual chars to them. Either you wanted to use char something[SIZE] or you meant to store strings!
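Putting the std::getline() advice together for the four-field records in the question, one possible sketch reads a line at a time and then splits that line on '|'. The Student struct and the read_students function are assumptions made for this illustration; the question itself uses four parallel arrays.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type for one line of the file.
struct Student {
    std::string firstName, lastName, contactNumber, email;
};

// Hypothetical helper: parses lines of the form
// firstName|lastName|contactNumber|email and skips malformed lines.
std::vector<Student> read_students(std::istream& in) {
    std::vector<Student> students;
    std::string line;
    while (std::getline(in, line)) {          // the '\n' is already removed here
        std::istringstream fields(line);
        Student s;
        if (std::getline(fields, s.firstName, '|') &&
            std::getline(fields, s.lastName, '|') &&
            std::getline(fields, s.contactNumber, '|') &&
            std::getline(fields, s.email)) {  // the last field runs to end of line
            students.push_back(s);
        }
    }
    return students;
}
```

Reading whole lines first keeps a malformed record from swallowing part of the next one, and a std::vector removes the need for the fixed limit of 100.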

C++ fstream: how to know size of string when reading?

...as someone may remember, I'm still stuck on C++ strings. Ok, I can write a string to a file using a fstream as follows
outStream.write((char *) s.c_str(), s.size());
When I want to read that string, I can do
inStream.read((char *) s.c_str(), s.size());
Everything works as expected. The problem is: if I change the length of my string after writing it to a file and before reading it again, printing that string won't bring me back my original string but a shorter/longer one. So: if I have to store many strings on a file, how can I know their size when reading it back?
Thanks a lot!
You shouldn’t be using the unformatted I/O functions (read() and write()) if you just want to write ordinary human-readable string data. Generally you only use those functions when you need to read and write compact binary data, which for a beginner is probably unnecessary. You can write ordinary lines of text instead:
std::string text = "This is some test data.";
{
std::ofstream file("data.txt");
file << text << '\n';
}
Then read them back with getline():
{
std::ifstream file("data.txt");
std::string line;
std::getline(file, line);
// line == text
}
You can also use the regular formatting operator >> to read, but when applied to string, it reads tokens (nonwhitespace characters separated by whitespace), not whole lines:
{
std::ifstream file("data.txt");
std::vector<std::string> words;
std::string word;
while (file >> word) {
words.push_back(word);
}
// words == {"This", "is", "some", "test", "data."}
}
All of the formatted I/O functions automatically handle memory management for you, so there is no need to worry about the length of your strings.
Although your writing solution is more or less acceptable, your reading solution is fundamentally flawed: it uses the internal storage of your old string as a character buffer for your new string, which is very, very bad (to put it mildly).
You should switch to a formatted way of reading and writing the streams, like this:
Writing:
outStream << s;
Reading:
inStream >> s;
This way you would not need to bother determining the lengths of your strings at all.
This code is different in that it stops at whitespace characters; you can use getline if you want to stop only at \n characters.
You can write the strings and write an additional 0 (null terminator) to the file. Then it will be easy to separate strings later. Also, you might want to read and write lines
outfile << string1 << endl;
getline(infile, string2, '\n');
If you want to use unformatted I/O your only real options are to either use a fixed size or to prepend the size somehow so you know how many characters to read. Otherwise, when using formatted I/O it somewhat depends on what your strings contain: if they can contain all viable characters, you would need to implement some sort of quoting mechanism. In simple cases, where strings consist e.g. of space-free sequence, you can just use formatted I/O and be sure to write a space after each string. If your strings don't contain some character useful as a quote, it is relatively easy to process quotes:
// Manipulator: consumes an opening quote, or sets failbit if there is none.
std::istream& quote(std::istream& in) {
    char c;
    if (in >> c && c != '"') {
        in.setstate(std::ios_base::failbit);
    }
    return in;
}
out << '"' << string << '"';
std::getline(in >> std::ws >> quote, string, '"');
Obviously, you might want to bundle this functionality into a class.

How to read a file and get words in C++

I am curious as to how I would go about reading the input from a text file with no set structure (Such as notes or a small report) word by word.
The text for example might be structured like this:
"06/05/1992
Today is a good day;
The worm has turned and the battle was won."
I was thinking maybe getting the line using getline, and then seeing if I can split it into words via whitespace from there. Then I thought using strtok might work! However I don't think that will work with the punctuation.
Another method I was thinking of was getting everything char by char and omitting the characters that were undesired. Yet that one seems unlikely.
So, to cut a long story short:
Is there an easy way to read an input from a file and split it into words?
Since it's easier to write than to find the duplicate question,
#include <iostream>
#include <iterator>
#include <string>
std::istream_iterator<std::string> word_iter( my_file_stream ), word_iter_end;
size_t wordcnt = 0; // must be initialized before it is printed
for ( ; word_iter != word_iter_end; ++ word_iter, ++ wordcnt ) {
    std::cout << "word " << wordcnt << ": " << * word_iter << '\n';
}
The std::string argument to istream_iterator tells it to return a string when you do *word_iter. Every time the iterator is incremented, it grabs another word from its stream.
If you have multiple iterators on the same stream at the same time, you can choose between data types to extract. However, in that case it may be easier just to use >> directly. The advantage of an iterator is that it can plug into the generic functions in <algorithm>.
Yes. You're looking for std::istream::operator>> :) Note that it will remove consecutive whitespace but I doubt that's a problem here.
i.e.
std::ifstream file("filename");
std::vector<std::string> words;
std::string currentWord;
while(file >> currentWord)
words.push_back(currentWord);
You can use getline with a space character as the delimiter: file.getline(buffer, 1000, ' '); where buffer is a char array of at least 1000 characters.
Or perhaps you can use this function to split a string into several parts, with a certain delimiter:
string StrPart(string s, char sep, int i) {
string out="";
int n=0, c=0;
for (c=0;c<(int)s.length();c++) {
if (s[c]==sep) {
n+=1;
} else {
if (n==i) out+=s[c];
}
}
return out;
}
Notes: This function assumes that you have declared using namespace std;.
s is the string to be split.
sep is the delimiter
i is the part to get (0 based).
You can use the scanner technique to grab words, numbers, dates, etc. It is very simple and flexible. The scanner normally returns tokens (word, number, real, keyword, etc.) to a parser.
If you later intend to interpret the words, I would recommend this approach.
I can warmly recommend the book "Writing Compilers and Interpreters" by Ronald Mak (Wiley Computer Publishing)

How to read a word into a string ignoring a certain character

I am reading a text file which contains a word with a punctuation mark on it and I would like to read this word into a string without the punctuation marks.
For example, a word may be " Hello, "
I would like the string to get " Hello " (without the comma). How can I do that in C++ using ifstream libraries only.
Can I use the ignore function to ignore the last character?
Thank you in advance.
Try ifstream::get(Ch* p, streamsize n, Ch term).
An example:
char buffer[64];
std::cin.get(buffer, 64, ',');
// reads at most 63 characters (plus a terminating '\0'), stopping before a ','
// For the input "Hello," it would store "Hello"; the ',' itself stays in the stream
If you need to be more robust than simply a comma, you'll need to post-process the string. The steps might be:
Read the stream into a string
Use string::find_first_of() to help "chunk" the words
Return the word as appropriate.
If I've misunderstood your question, please feel free to elaborate!
If you only want to ignore , then you can use getline.
const int MAX_LEN = 128;
ifstream file("data.txt");
char buffer[MAX_LEN];
while(file.getline(buffer,MAX_LEN,','))
{
cout<<buffer;
}
EDIT: This uses std::string and does away with MAX_LEN
ifstream file("data.txt");
string string_buffer;
while(getline(file,string_buffer,','))
{
cout<<string_buffer;
}
One way would be to use the Boost String Algorithms library. There are several "replace" functions that can be used to replace (or remove) specific characters or strings in strings.
You can also use the Boost Tokenizer library for splitting the string into words after you have removed the punctuation marks.