Reading data in from a .csv into a usable format using C++

I would like to be able to read the data that I have into C++ and then start to manipulate it. I am quite new but have a tiny bit of basic knowledge. The most obvious way of doing this that strikes me (and maybe this comes from using Excel previously) would be to read the data into a 2D array. This is the code that I have so far.
#include <iostream>
#include <fstream>
#include <algorithm>
#include <string>
#include <sstream>
using namespace std;
string C_J;
int main()
{
float data[1000000][10];
ifstream C_J_input;
C_J_input.open("/Users/RT/B/CJ.csv");
if (!C_J_input) return -1;
for(int row = 0; row <1000000; row++)
{
string line;
getline(C_J_input, C_J, '?');
if ( !C_J_input.good() )
break;
stringstream iss(line);
for(int col = 0; col < 10; col++)
{
string val;
getline(iss, val, ',');
if (!iss.good() )
break;
stringstream converter(val);
converter >> data[row][col];
}
}
cout << data;
return 0;
}
Once I have the data read in, I would like to be able to read through it line by line and analyse it, looking for certain things. However, I think that could probably be the topic of another thread, once I have the data read in.
Just let me know if this is a bad question in any way and I will try to add anything more that might make it better.
Thanks!

At the request of the asker, this is how you would load the file into a string, split it into lines, and then split each line into elements:
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#include <sstream>
//This takes a string and splits it with a delimiter and returns a vector of strings
std::vector<std::string> &SplitString(const std::string &s, char delim, std::vector<std::string> &elems)
{
std::stringstream ss(s);
std::string item;
while (std::getline(ss, item, delim))
{
elems.push_back(item);
}
return elems;
}
int main(int argc, char* argv[])
{
//load the file with ifstream
std::ifstream t("test.csv");
if (!t)
{
std::cout << "Unknown File" << std::endl;
return 1;
}
//this is just a block of code designed to load the whole file into one string
std::string str;
//seek to the end so that tellg() reports the size of the file
t.seekg(0, std::ios::end);
str.reserve(t.tellg());//reserve enough memory in the string to hold the whole file
t.seekg(0, std::ios::beg);//move the read position back to the beginning before reading
//this copies everything in the stream (the file data) into the string.
//istreambuf_iterator walks the contents of the stream (t); a default-constructed one marks the end.
str.assign((std::istreambuf_iterator<char>(t)),
std::istreambuf_iterator<char>());
//if the file has size (is not empty) then analyze
if (str.length() > 0)
{
//the file is loaded
//split by delimiter (which is the newline character)
std::vector<std::string> lines;//this holds a string for each line in the file
SplitString(str, '\n', lines);
//each element in the vector holds a vector of elements (strings between commas)
std::vector<std::vector<std::string> > LineElements;
//for each line
for (auto it : lines)
{
//this is a vector of elements in this line
std::vector<std::string> elementsInLine;
//split on the comma; this would separate "one,two,three" into {"one","two","three"}
SplitString(it, ',', elementsInLine);
//take the elements in this line, and add it to the line-element vector
LineElements.push_back(elementsInLine);
}
//this displays each element in an organized fashion
//for each line
for (const auto &elements : LineElements)
{
//for each element IN that line
for (std::size_t i = 0; i < elements.size(); ++i)
{
//put a comma before every element except the first;
//comparing positions (not values) stays correct when a value repeats in the line
if (i != 0)
std::cout << ',';
std::cout << elements[i];
}
//the end of the line
std::cout << '\n';
}
}
else
{
std::cout << "File Is empty" << std::endl;
return 1;
}
system("PAUSE");
return 0;
}

On second glance, I've noticed a few obvious issues which will slow your progress greatly, so I'll list them here:
1) you are using two disconnected variables for reading the lines:
C_J - which receives data from getline function
line - which is used as the source of stringstream
I'm pretty sure that the C_J is completely unnecessary. I think you wanted to simply do
getline(C_J_input, line, ...) // so that the textline read will fly to the LINE var
// ...and later
stringstream iss(line); // no change
or, alternatively:
getline(C_J_input, C_J, ...) // no change
// ...and later
stringstream iss(C_J); // so that ISS will read the textline we've just read
Otherwise, the stringstream will never see what getline has read from the file: getline writes the data to a different place (C_J) than the one the stringstream reads from (line).
2) Another small point is that you are passing '?' to getline() as the line separator. CSVs usually use a newline character to separate the data lines. Of course, your input file may use '?'; I don't know. But if you want a newline instead, omit the parameter altogether: getline then uses '\n' as the delimiter, and this will probably be just fine.
3) Your array of float is, um, huge. Consider using a list instead. It will grow nicely as you read rows. You can even nest them, so list<list<float>> is also very usable. I'd actually probably use list<vector<float>>, since the number of columns is constant. Using a huge preallocated array is not a good idea: a local array of 1000000 x 10 floats takes about 40 MB, which usually exceeds the default stack size, and there will always be a file with one line too many, and ka-boom.
4) Your code contains a just-as-huge loop that iterates a constant number of times. A loop itself is OK, but the line count will vary. You don't actually need to count the lines, especially if you use list<> to store the values. Just as you've checked whether the file opened properly with if(!C_J_input), you can also check whether you have reached end-of-file:
if(C_J_input.eof())
; // true only after a read has run into the end of the file.
see here for an example
Well, that's a start. Good luck!
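Putting points 1 to 4 together, a minimal sketch could look like the following. It is not the asker's code: the helper name read_csv is made up for illustration, it uses vector<vector<float>> rather than a list, and it assumes newline-separated rows of comma-separated numbers with no quoted fields.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (the name is an assumption): parse comma-separated
// floats from any input stream into rows that grow as lines are read.
std::vector<std::vector<float>> read_csv(std::istream& in)
{
    std::vector<std::vector<float>> rows;
    std::string line;
    // getline returns the stream, which tests false once no more lines can
    // be read, so no fixed row count and no separate eof() check is needed.
    while (std::getline(in, line)) {
        if (line.empty())
            continue;                       // skip blank lines
        std::vector<float> row;
        std::istringstream fields(line);    // the SAME string getline just filled
        std::string cell;
        while (std::getline(fields, cell, ',')) {
            std::istringstream converter(cell);
            float value = 0.0f;
            if (converter >> value)         // cells that are not numeric are dropped
                row.push_back(value);
        }
        rows.push_back(row);
    }
    return rows;
}
```

To use it with a file, construct a std::ifstream from the path, check that it opened, and pass it to read_csv; the rows can then be walked with ordinary loops.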

Related

How can I read from a file and sort them by category

I'm trying to read a bunch of words from a file and sort them into what kind of words they are (Nouns, Adjective, Verbs ..etc). For example :
-Nouns;
zyrian
zymurgy
zymosis
zymometer
zymolysis
-Verbs_participle;
zoom in
zoom along
zoom
zonk out
zone
I'm using getline to read until the delimiter ';' but how can I know when it read in a type and when it read in a word?
The function below stops right after "-Nouns;"
int main()
{
map<string,string> data_base;
ifstream source ;
source.open("partitioned_data.txt");
char type [MAX];
char word [MAX];
if(source) //check to make sure we have opened the file
{
source.getline(type,MAX,';');
while( source && !source.eof())//make sure we're not at the end of file
{
source.getline(word,MAX);
cout<<type<<endl;
cout<<word<<endl;
source.getline(type,MAX,';');//read the next line
}
}
source.close();
source.clear();
return 0;
}
I am not fully sure about the format of your input file. But you seem to have a file with lines, and in that, items separated by a semicolon.
Reading this should be done differently.
Please see the following example:
#include <iostream>
#include <string>
#include <sstream>
#include <fstream>
std::istringstream source{R"(noun;tree
noun;house
verb;build
verb;plant
)"};
int main()
{
std::string type{};
std::string word{};
//std::ifstream source{"partitioned_data.txt"};
if(source) //check to make sure we have opened the file
{
std::string line{};
while(getline(source,line))//make sure we're not at the end of file
{
size_t pos = line.find(';');
if (pos != std::string::npos) {
type = line.substr(0,pos);
word = line.substr(pos+1);
}
std::cout << type << " --> " << word << '\n';
}
}
return 0;
}
There is no need for open and close statements. The constructor and destructor of std::ifstream do that for us.
Do not check eof in the while statement.
Do not use C-style arrays like char type [MAX]; use std::string instead.
Read a line in the while statement and check the validity of the operation there. Then work on the line that was read.
Search for the ';' in the string and, if it is found, take out the substrings.
If I knew the format of the input file, I could write an even better example for you.
Since I do not have files on SO, I used a std::istringstream instead. But there is NO difference compared to a file. Simply delete the std::istringstream and uncomment the ifstream definition in the source code.

Reading a text file and storing data into multiple arrays C++

I am trying to read a database file (as txt) where I want to skip empty lines and skip the column header line within the file and store each record as an array. I would like to take stop_id and find the stop_name appropriately. i.e.
If i say give me stop 17, the program will get "Jackson & Kolmar".
The file format is as follows:
17,17,"Jackson & Kolmar","Jackson & Kolmar, Eastbound, Southeast Corner",41.87685748,-87.73934698,0,,1
18,18,"Jackson & Kilbourn","Jackson & Kilbourn, Eastbound, Southeast Corner",41.87688572,-87.73761421,0,,1
19,19,"Jackson & Kostner","Jackson & Kostner, Eastbound, Southeast Corner",41.87691497,-87.73515882,0,,1
So far I am able to get the stop_id values but now I want to get the stop name values and am fairly new to c++ string manipulation
mycode.cpp
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
using namespace std;
int main()
{
string filename;
filename = "test.txt";
string data;
ifstream infile(filename.c_str());
while(!infile.eof())
{
getline(infile,line);
int comma = line.find(",");
data = line.substr(0,comma);
cout << "Line " << count << " "<< "is "<< data << endl;
count++;
}
infile.close();
string sent = "i,am,the,champion";
return 0;
}
You can use string::find 3 times to search for the third occurrence of the comma, and you must store the positions of the last 2 occurrences found in line, then use them as input data with string::substr and get the searched text:
std::string line ("17,17,\"Jackson & Kolmar\",\"Jackson & Kolmar, Eastbound, Southeast Corner\",41.87685748,-87.73934698,0,,1");
std::size_t found=0, foundBack;
int i;
for(i=0;i<3 && found!=std::string::npos;i++){
foundBack = found;
found=line.find(",",found+1);
}
std::cout << line.substr(foundBack+1,found-foundBack-1) << std::endl;
You can read the whole line of the file into a string and then use a stringstream to give you each piece one at a time, up to but excluding the commas. Then you can fill up your arrays. I am assuming that you want each line in its own array and an unlimited number of arrays. The best way to do that is to have a vector of vectors.
std::string Line;
std::vector<std::vector<std::string>> Data;
while (std::getline(infile, Line))
{
std::stringstream ss;
ss << Line;
Data.push_back(std::vector<std::string>()); //start a new, empty row for this line
std::string Temp;
while (std::getline(ss, Temp, ','))
{
Data[Data.size() - 1].push_back(Temp);
}
}
This way you will have a vector full of vectors, each containing strings of all the data in that line. To access the strings as numbers, you can use std::stoi(std::string), which converts a string to an integer. Note that splitting on every comma also splits quoted fields that contain commas, such as "Jackson & Kolmar, Eastbound, Southeast Corner".

Splitting sentences and placing in vector

I was given a code from my professor that takes multiple lines of input. I am currently changing the code for our current assignment and I came across an issue. The code is meant to take strings of input and separate them into sentences from periods and put those strings into a vector.
vector<string> words;
string getInput() {
string s = ""; // string to return
bool cont = true; // loop control.. continue is true
while (cont){ // while continue
string l; // string to hold a line
cin >> l; // get line
char lastChar = l.at(l.size()-1);
if(lastChar=='.') {
l = l.substr(0, l.size()-1);
if(l.size()>0){
words.push_back(s);
s = "";
}
}
if (lastChar==';') { // use ';' to stop input
l = l.substr(0, l.size()-1);
if (l.size()>0)
s = s + " " + l;
cont = false; // set loop control to stop
}
else
s = s + " " + l; // add line to string to return
// add a blank space to prevent
// making a new word from last
// word in string and first word
// in line
}
return s;
}
int main()
{
cout << "Input something: ";
string s = getInput();
cout << "Your input: " << s << "\n" << endl;
for(int i=0; i<words.size(); i++){
cout << words[i] << "\n";
}
}
The code puts strings into a vector but takes the last word of the sentence and attaches it to the next string and I cannot seem to understand why.
This line
s = s + " " + l;
will always execute, except for the end of input, even if the last character is '.'. You are most likely missing an else between the two if-s.
You have:
string l; // string to hold a line
cin >> l; // get line
The last line reads a whole line only if that line contains no whitespace, because operator>> stops at the first whitespace character. To read a line of text, use:
std::getline(std::cin, l);
It's hard to tell whether that is tripping up your code, since you haven't posted any sample input.
I would at least consider doing this job somewhat differently. Right now, you're reading a word at a time, then putting the words back together until you get to a period.
One possible alternative would be to use std::getline to read input until you get to a period, and put the whole string into the vector at once. Code to do the job this way could look something like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
int main() {
std::vector<std::string> s;
std::string temp;
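while (std::getline(std::cin, temp, '.')) {
// skip leading whitespace; a chunk that is only whitespace (such as the
// newline after the final period) holds no sentence, and passing npos
// to substr would throw std::out_of_range
std::string::size_type first = temp.find_first_not_of(" \t\n");
if (first != std::string::npos)
s.push_back(temp.substr(first));
}
std::copy(s.begin(), s.end(),
std::ostream_iterator<std::string>(std::cout, ".\n"));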
}
This does behave differently in one circumstance--if you have a period somewhere other than at the end of a word, the original code will ignore that period (won't treat it as the end of a sentence) but this will. The obvious place this would make a difference would be if the input contained a number with a decimal point (e.g., 1.234), which this would break at the decimal point, so it would treat the 1 as the end of one sentence, and the 234 as the beginning of another. If, however, you don't need to deal with that type of input, this can simplify the code considerably.
If the sentences might contain decimal points, then I'd probably write the code more like this:
#include <iostream>
#include <string>
#include <algorithm>
#include <vector>
#include <iterator>
class sentence {
std::string data;
public:
friend std::istream &operator>>(std::istream &is, sentence &s) {
std::string temp, word;
while (is >> word) {
temp += word + ' ';
if (word.back() == '.')
break;
}
s.data = temp;
return is;
}
operator std::string() const { return data; }
};
int main() {
std::copy(std::istream_iterator<sentence>(std::cin),
std::istream_iterator<sentence>(),
std::ostream_iterator<std::string>(std::cout, "\n"));
}
Although somewhat longer and more complex, at least to me it still seems (considerably) simpler than the code in the question. I guess it's different in one way--it detects the end of the input by...detecting the end of the input, rather than depending on the input to contain a special delimiter to mark the end of the input. If you're running it interactively, you'll typically need to use a special key combination to signal the end of input (e.g., Ctrl+D on Linux/Unix, or F6 on Windows).
In any case, it's probably worth considering a fundamental difference between this code and the code in the question: this defines a sentence as a type, where the original code just leaves everything as strings, and manipulates strings. This defines an operator>> for a sentence, that reads a sentence from a stream as we want it read. This gives us a type we can manipulate as an object. Since it's like a string in other ways, we provide a conversion to string so once you're done reading one from a stream, you can just treat it as a string. Having done that, we can (for example) use a standard algorithm to read sentences from standard input, and write them to standard output, with a new-line after each to separate them.

How to split a string into two integers over several lines C++

I've been trying to retrieve saved data from a text file. The data stored are both numbers, separated by a ~. I've managed to get it to print out one of the lines (the top line) however I've been unable to figure out how to proceed through the entire file.
There are only two numbers (integers) on each line, an X and Y position of another vector. The idea is to assign each integer to the respective variable in the vectors. I've not managed to get that far since I can't get it to go past line 1. But I'd thought that by having an array size of 2, and the array temporarily stores the value, assigns it to the vector, then overwrites it with the next value(s) that could work. But again not managed to get that far.
Below is the code I've been trying to use;
........
string loadZombieData;
loadFile >> loadZombieData; //Data gets read from the file and placed in the string
vector<string> result; //Stores result of each split value as a string
stringstream data(loadZombieData);
string line;
while(getline(data,line,'~'))
{
result.push_back(line);
}
for(int i = 0; i < result.size(); i++){
cout << result[i] << " ";
}
.......
Just to clarify, this is not my code, this is some code I found on Stackoverflow, so I'm not entirely certain how it all works yet. As I said, I've been trying to get it to read multiple lines, then using the for loop was going to assign the results to the other vector variables as needed. Any help is appreciated :)
Use two while loops:
std::vector<std::string> result;
std::vector<int> numbers;
std::string filename; //set this to the path of your data file
std::ifstream ifile(filename.c_str());
if (!ifile.is_open()) {
std::cerr << "Input file not opened! Something went wrong!" << std::endl;
exit(1); //a non-zero status signals failure
}
std::string temp;
//loop over the file using newlines as your delimiter
while (std::getline(ifile, temp, '\n')) {
//now temp has the information of each line.
//create a stringstream initialized with this information:
std::istringstream iss(temp);//this contains the information of ONE line
//now loop over the string stream object as you would have in your code sample:
while(getline(iss, temp,'~'))
{
//at this point temp is the value of a token, but it is a string
result.push_back(temp); //note: this only stores the TOKENS as strings
//so to store the token as a int or float, you need to convert it to that
//via another stringstream:
std::istringstream ss(temp);
//if your number type is float, change it here as well as in the vector
//initialization of `numbers`:
int num = 0;
//this checks the stream to ensure that conversion occurred.
//if it did, store the number, otherwise, handle the error (quit - but, this is up to you)
//if stringstreams aren't your cup of tea, try some others (refer to this link):
//http://stackoverflow.com/questions/21807658/check-if-the-input-is-a-number-or-string-c/21807705#21807705
if (!(ss >> num).fail()) {
numbers.push_back(num);
}
else {
std::cerr << "There was a problem converting the string to an integer!" << std::endl;
}
}
}
Note: this version stores the numbers in one flat sequence, i.e. without a sense of how many numbers were on a line. However, that is easy to reconcile: all you have to do is output n numbers per line. In your case, you know that every 2 numbers represent one line.
This requires:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <sstream>
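If you would rather keep the two numbers of each line together, one option is to store them as pairs. The following is a sketch only: the name read_positions is made up, and it assumes every valid line has exactly the form X~Y with two integers.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: read one "X~Y" pair per line; lines that do not
// match that shape are skipped.
std::vector<std::pair<int, int>> read_positions(std::istream& in)
{
    std::vector<std::pair<int, int>> positions;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int x = 0;
        int y = 0;
        char separator = '\0';
        // extract int, one character, int; all three must succeed
        if (fields >> x >> separator >> y && separator == '~')
            positions.emplace_back(x, y);
    }
    return positions;
}
```

Each element then gives positions[i].first and positions[i].second for assignment to the X and Y of the i-th object.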

Read a file backwards?

Is there a way to read a file backwards, line by line, without having to go through the file from the beginning to start reading backwards?
Use a memory-mapped file and walk backwards. The OS will page in the needed parts of the file in reverse order.
As per the comment, a possible (quite simple) alternative would be to read the lines into a vector. For example:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
int main()
{
std::ifstream in("main.cpp");
if (in.is_open())
{
std::vector<std::string> lines_in_reverse;
std::string line;
while (std::getline(in, line))
{
// Store the lines in reverse order.
lines_in_reverse.insert(lines_in_reverse.begin(), line);
}
}
}
EDIT:
As per jrok's and Loki Astari's comments, push_back() would be more efficient but the lines would be in file order, so reverse iteration (reverse_iterator) or std::reverse() would be necessary:
std::vector<std::string> lines_in_order;
std::string line;
while (std::getline(in, line))
{
lines_in_order.push_back(line);
}
A slightly improved version would be this:
1) Seek to the last character (one position before the end).
2) Get that position.
3) Read a char and print it.
4) Seek 2 positions back.
5) Repeat 3 and 4 until the first character has been printed.
ifstream in;
in.open("file.txt");
char ch;
int pos;
in.seekg(-1,ios::end);
pos=in.tellg();
for(int i=0;i<=pos;i++) // <= so that the first character of the file is printed too
{
ch=in.get();
cout<<ch;
in.seekg(-2,ios::cur);
}
in.close();
1) Open the file for reading, call fseek() to seek to the end of the file, then call ftell() to get the length of the file. Alternatively, you can get the file length by calling stat() or fstat().
2) Allocate a buffer of the file size obtained in step 1.
3) Read the entire file into that buffer. You can probably use fread() to read the file in one shot (assuming the file is small enough).
4) Use another char pointer to traverse the buffer from its end to its beginning.
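The four steps above can be sketched as follows. This is an illustration under assumptions: the function names are made up, a std::string serves as the buffer instead of a raw allocation, the walk uses rfind rather than a char pointer, and '\r' characters are left in place.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Steps 1 to 3: find the file size with fseek/ftell and read the whole
// file with a single fread. Returns an empty string if the file cannot
// be opened.
std::string read_whole_file(const char* path)
{
    std::string buffer;
    std::FILE* file = std::fopen(path, "rb");
    if (!file)
        return buffer;
    if (std::fseek(file, 0, SEEK_END) == 0) {
        const long size = std::ftell(file);
        if (size > 0) {
            buffer.resize(static_cast<std::size_t>(size));
            std::rewind(file);
            const std::size_t got = std::fread(&buffer[0], 1, buffer.size(), file);
            buffer.resize(got);               // keep only what was actually read
        }
    }
    std::fclose(file);
    return buffer;
}

// Step 4: walk the buffer from its end to its beginning, one line at a time.
std::vector<std::string> lines_from_end(const std::string& buffer)
{
    std::vector<std::string> lines;
    std::size_t end = buffer.size();
    if (end == 0)
        return lines;
    if (buffer[end - 1] == '\n')
        --end;                                // the final newline ends the last line
    while (true) {
        const std::size_t newline =
            (end == 0) ? std::string::npos : buffer.rfind('\n', end - 1);
        const std::size_t begin = (newline == std::string::npos) ? 0 : newline + 1;
        lines.push_back(buffer.substr(begin, end - begin));
        if (newline == std::string::npos)
            break;                            // that was the first line of the file
        end = newline;
    }
    return lines;
}
```

Calling lines_from_end(read_whole_file("input.txt")) would then give the lines last to first, where input.txt stands for whatever file you want to read.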
The short answer would be no. However, you can use the seek() function to move your pointer to where you want to go. Then read() some data from that point. If you know well how to manage buffers, then it should be pretty quick because you can read and cache the data and then search for the previous newline character(s). Have fun with \r\n which will be inverted...
-- Update: some elaboration on the possible algorithm --
This is not valid code, but it should give you an idea of what I'm trying to say here
File reads:
int fpos = in.size() - BUFSIZ;
char buf[BUFSIZ];
in.seek(fpos);
in.read(buf, BUFSIZ);
fpos -= BUFSIZ; // repeat until fpos < 0, although think of size % BUFSIZ != 0
// now buf has characters... reset buffer position
int bpos = BUFSIZ - 1;
Getting string:
// first time you need to call the read
if(bpos == -1) do_a_read();
// getting string
std::string s;
while(bpos >= 0 && buf[bpos] != '\n') {
s.insert(0, 1, buf[bpos]);
--bpos;
}
// if bpos == -1 and buf[0] != '\n' then you need to read another BUFSIZ chars
// and repeat the previous loop...
// before leaving, skip all '\n'
while(bpos >= 0 && buf[bpos] == '\n') {
--bpos;
}
return s;
To make '\r' easier to handle, you could have a first pass that transforms all '\r' to '\n'. Otherwise, all the tests for '\n' also need to test for '\r'.
My answer is similar to ones that use a vector to store the lines of the file, but I would instead use a list.
Imagine you have the following text in a file called input.txt:
hello
there
friend
I would read the file line-by-line, pushing each line not to the back of my list but to its front. Using this rather than push_back has the same effect as reading the contents of the file line-by-line into a vector and then reversing it or iterating through it backwards.
#include <iostream>
#include <fstream>
#include <list>
#include <string>
#include <iterator>
#include <algorithm>
int main(void) {
std::ifstream file;
file.open("input.txt");
// Make sure the file opened properly
if (!file) {
std::cerr << "Could not open input.txt\n";
return 1;
}
std::list<std::string> list;
std::string buffer;
while (std::getline(file, buffer)) {
list.push_front(buffer);
}
file.close();
std::copy(
list.begin(),
list.end(),
std::ostream_iterator<std::string>(std::cout, "\n")
);
return 0;
}
(Note that the bit at the bottom with std::copy is just to print the contents of the list with a newline character as a delimiter between elements.)
This then prints:
friend
there
hello
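This might help.
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;
int main()
{
ifstream myFile("my.txt");
int count;
cout << "Enter the number of lines you want to print ";
cin >> count;
char c;
string str = "";
// walk backwards from the end, one character at a time
for (int i = 1; count > 0; i++)
{
myFile.seekg(-i, std::ios::end);
if (!myFile.get(c))
break; // the seek went past the beginning of the file
if (c == '\n')
{
if (i == 1)
continue; // ignore the newline that terminates the last line
reverse(str.begin(), str.end());
count--;
cout << str << '\n';
str.clear();
}
else
{
str += c;
}
}
if (count > 0 && !str.empty())
{
// the first line of the file has no newline before it
reverse(str.begin(), str.end());
cout << str << '\n';
}
return 0;
}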