Splitting File Input Into Numerous Vectors - c++

I have a CSV file that has about 10 different columns. I'm trying to figure out the best way to approach this.
The data looks like this:
"20070906 1 0 0 NO"
There are about 40,000 records like this to analyze. I'm not sure which is better: splitting each column into its own vector, or putting each whole row into a vector.
Thanks!

I think this is a somewhat subjective question, but IMHO a single vector that contains the split-up rows will likely be easier to manage than separate vectors for each column. You could even create a row object for the vector to store, which makes accessing and processing the data in the rows and columns friendlier.
However, if you only process the data column by column, never by row or by individual entry, individual column vectors would be easier.
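For the column-only case, a minimal sketch of the "one vector per column" layout might look like this. It assumes the fields are whitespace-separated, as in the sample line (even though the file is described as CSV), and the column names date, a, b, c and flag, plus the names Columns, readColumns and sumColumnA, are hypothetical placeholders for the real ten columns.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Column-oriented layout: one vector per column.
// Field names are hypothetical stand-ins for the columns of "20070906 1 0 0 NO".
struct Columns {
    std::vector<std::string> date;
    std::vector<int> a, b, c;
    std::vector<std::string> flag;
};

// Reads whitespace-separated records line by line, skipping lines that don't parse.
Columns readColumns(std::istream& in) {
    Columns cols;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string date, flag;
        int a = 0, b = 0, c = 0;
        if (!(fields >> date >> a >> b >> c >> flag)) continue;  // malformed line
        cols.date.push_back(date);
        cols.a.push_back(a);
        cols.b.push_back(b);
        cols.c.push_back(c);
        cols.flag.push_back(flag);
    }
    return cols;
}

// Column-level processing touches only one vector, e.g. summing column a.
long long sumColumnA(const Columns& cols) {
    long long total = 0;
    for (int v : cols.a) total += v;
    return total;
}
```

The trade-off is that every "row" is now spread across five vectors at the same index, which is exactly why the row-object approach is usually easier once you need whole records.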

Since the data set is fairly small (assuming you are using a PC and not some other device, like a smartphone), you can read the file line by line into a vector of strings, then parse the elements one by one and populate a vector of structures holding the record data.
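A minimal sketch of that two-pass approach, again assuming whitespace-separated fields as in the sample; Record, readLines and parseRecords are hypothetical names, and the five fields stand in for the real columns.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One record per line; field names are hypothetical stand-ins
// for the columns of a line such as "20070906 1 0 0 NO".
struct Record {
    std::string date;
    int a = 0, b = 0, c = 0;
    std::string flag;
};

// Pass 1: read every line into memory (fine for ~40,000 short lines).
std::vector<std::string> readLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) lines.push_back(line);
    return lines;
}

// Pass 2: parse each stored line into a Record, skipping lines that don't parse.
std::vector<Record> parseRecords(const std::vector<std::string>& lines) {
    std::vector<Record> records;
    records.reserve(lines.size());
    for (const std::string& line : lines) {
        std::istringstream fields(line);
        Record r;
        if (fields >> r.date >> r.a >> r.b >> r.c >> r.flag) records.push_back(r);
    }
    return records;
}
```

Usage would be along the lines of `std::ifstream file("data.txt"); auto records = parseRecords(readLines(file));` (the file name is a placeholder). Parsing directly inside the read loop also works and avoids holding the raw lines, but keeping the passes separate makes each step easier to test.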

Related

Write data in separate columns in .csv file

I'm having trouble storing data in a CSV file.
I'm hashing a text file that has ~15,500,000 lines, and I have to make an Excel graph of the hash frequencies. The problem is that Excel supports at most 1,048,576 rows. So I was thinking of writing the hashes into columns of 1,000,000 rows each (16 columns for 15,500,000 hashes), but I haven't found a way to do this without comparing a bunch of numbers each time, which would take a long time with this many numbers.
Do you have any ideas on how to solve this? Maybe a vector of vectors could work as well; I might try that right now. Thanks

How to write to a column in a CSV file

I am currently pulling data from a CSV file. The CSV file has ~89 columns and 2000 rows of data. I am extracting several specific columns, such as all of columns 1, 2, 21, 22, 66, and 67, using a variety of getline calls and loops, and I store that data in vectors within the loops. Once I have read through the entire file, I have 6 vectors full of the data I want. I make some adjustments to that data and store it back into the vectors. Now I want to put the new data back into the columns I took it from, without picking out the other data that I don't want. What would be the best approach? I don't want to make 89 variables to hold all the other data; I would much rather overwrite those particular columns or do something similar.
Instead of using 6 vectors to store column data, you can use one vector of strings to hold the data from one row. Then update the elements at indices 1, 2, 21, 22, 66, and 67 in that vector and write the row to another file.
std::vector<std::string> row; // 89 elements after reading and parsing a row
Processing 500,000 rows this way should be fast enough. If it is not, try a column-oriented database, e.g. OpenTSDB
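A minimal sketch of the one-row-at-a-time idea, assuming a simple CSV with no quoted fields containing commas (a real CSV parser is needed otherwise). The adjusted values are passed as a map from 0-based column index to a vector of new values, one per line; splitRow, joinRow and rewriteColumns are hypothetical names. Note that the row index counts every line, including a header line if the file has one.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV line on commas (assumes no quoted commas inside fields).
std::vector<std::string> splitRow(const std::string& line) {
    std::vector<std::string> row;
    std::istringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) row.push_back(field);
    if (!line.empty() && line.back() == ',') row.push_back("");  // trailing empty field
    return row;
}

// Joins fields back into a CSV line.
std::string joinRow(const std::vector<std::string>& row) {
    std::string line;
    for (std::size_t i = 0; i < row.size(); ++i) {
        if (i > 0) line += ',';
        line += row[i];
    }
    return line;
}

// Copies every line from in to out; for each column listed in newColumns,
// the field on line k is replaced by newColumns[column][k]. All other
// columns pass through untouched.
void rewriteColumns(std::istream& in, std::ostream& out,
                    const std::map<std::size_t, std::vector<std::string>>& newColumns) {
    std::string line;
    std::size_t rowIndex = 0;
    while (std::getline(in, line)) {
        std::vector<std::string> row = splitRow(line);
        for (const auto& entry : newColumns) {
            const std::size_t column = entry.first;
            const std::vector<std::string>& values = entry.second;
            if (column < row.size() && rowIndex < values.size())
                row[column] = values[rowIndex];
        }
        out << joinRow(row) << '\n';
        ++rowIndex;
    }
}
```

Only one row's worth of strings lives in memory at a time, so the 83 untouched columns never need their own variables.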

Fixed Length Flat file Parsing

I have flat-file tables, say student.tbl and employee.tbl. Both are fixed-length files. For each one I have a supporting file with the field code, field description, field offset, and field size.
For example:
ename string 0 10
eage number 10 2
ecity string 12 10
I wrote code to fetch data from the flat files using the STL in C++. I am using vectors to load the data.
My simple algorithm for loading data from a fixed-length file:
1) Read the supporting file.
2) Load the supporting file's data into a 2D vector of strings, say column_records.
3) Read the table file.
4) Get the first line from the table file, say Data Line.
5) Get the first column's information from the supporting table, which is the first row of column_records.
6) Chop Data Line based on that column record.
7) Push the chopped data into a one-dimensional vector, say record_vector.
8) Repeat Step 5 with the next column until the last column's information has been processed.
9) Push record_vector into a 2D vector, say Table_Vector.
10) Repeat Step 4 with the next line until the last line of the fixed-length file has been reached.
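The steps above can be sketched generically: the layout file is parsed once into a vector of a single FieldSpec type, and the same two functions then serve student.tbl, employee.tbl, or any other table, so no per-table struct is needed. This assumes four tab-separated columns per layout line (name, type, offset, size, as in the example) and no header line; FieldSpec, readLayout and readTable are hypothetical names. Each data line still needs one substr per field, since every field has to be cut out, but the layout itself is never re-read.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One line of the supporting (layout) file, assumed to be
// tab-separated: name, type, offset, size.
struct FieldSpec {
    std::string name;
    std::string type;
    std::size_t offset = 0;
    std::size_t size = 0;
};

// Steps 1-2: read the layout once. std::stoul throws std::invalid_argument
// if offset or size is not a number (e.g. an unexpected header line).
std::vector<FieldSpec> readLayout(std::istream& in) {
    std::vector<FieldSpec> layout;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::string name, type, offset, size;
        if (std::getline(ss, name, '\t') && std::getline(ss, type, '\t') &&
            std::getline(ss, offset, '\t') && std::getline(ss, size, '\t')) {
            layout.push_back({name, type, static_cast<std::size_t>(std::stoul(offset)),
                              static_cast<std::size_t>(std::stoul(size))});
        }
    }
    return layout;
}

// Steps 3-10: chop every data line using the already-loaded layout.
// Fields past the end of a short line come back as empty strings.
std::vector<std::vector<std::string>> readTable(std::istream& in,
                                                const std::vector<FieldSpec>& layout) {
    std::vector<std::vector<std::string>> table;
    std::string line;
    while (std::getline(in, line)) {
        std::vector<std::string> record;
        record.reserve(layout.size());
        for (const FieldSpec& f : layout)
            record.push_back(f.offset < line.size() ? line.substr(f.offset, f.size)
                                                    : std::string());
        table.push_back(std::move(record));
    }
    return table;
}
```

Because a field's name and type travel with it in FieldSpec, code that needs a particular column can look it up by name in the layout instead of relying on a hard-coded struct or enumeration.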
It works fine, but my problem is in Step 5.
For every fixed-length record, I feel there are repeated iterations.
I know the column descriptions used for the first record could be retained for all the other records, but I am still iterating N*M times. I want the iteration to be 1*M.
I know I could store my column descriptions in an array of structs, but I have many types of tables, say students.tbl and employee.tbl, each with a different set of columns. So I think it would be a bad idea to have N struct declarations to load N supporting tables.
I want to use the same routine to fetch data from both tables, or from N tables. The format of the supporting tables will not change; it is tab-delimited.
How do I fetch data from a table with 1*M iterations?
I think I could use an enumeration for this, but I don't know how. Would an enumeration or a macro solve this issue?
I hope my working source code is not needed for this question. If you think it is, I will definitely update the question with it. My English is intermediate, so sorry for any inconvenience.
Thank you.

How to parse, read, and store only one column of .CSV file into an array in C++

I have a .CSV file that's storing data from a laser. It records the height of the laser beam every second.
The .CSV file ends up with one row per measurement, all in this format:
DR,04,#
where the # is the height reading.
For example, if the beam is at a height of 10, the row would read:
DR,04,10
I want my program in C++ to read only the height (third column of the .CSV) from each row and put it into an array. I do not want the first two columns at all. That way I end up with an array with just a bunch of height values from each measurement.
How do I do that?
You can use strtok() to separate out the three columns and then just take the last value.
You could also just take the string and scan for the first comma, and then scan from there for the second comma. What follows is the value you are after.
You could also use sscanf() to parse out the individual values.
This really isn't a difficult problem, and there are many ways to approach it. That is why people are commenting that you should have tried something first and asked here once you got stuck on a specific problem.

Sort a text file after reading it?

I have a text file that holds values like this:
30 Text
21 Text
12 Text
1 Text
3 Text
I want to read this into a 2D array to keep each number and its text identifier together. Once I've done this, I want to sort it into ascending order, since the text file will be unsorted.
What is the best way to go about this in C++? Should I put it in an array? My objective is just to get the 3 highest values from the text file. Is there a data structure that would be better suited to this, or a better way to go about it? I can structure the text file any way I like; it's not a fixed format, if that should be changed.
TIA
If you only want the top three values, the most efficient way may be to define three variables (or a three-element array), read the file line-by-line, and if a newly read line belongs in the top three, put it there.
But if you want to use containers, I'd go with a std::vector and use std::sort, assuming that the file is small enough that all the data fits in memory.
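I would prefer to put them into a std::map (if the keys are unique; if not, use a std::multimap instead). As you insert data into the map, it is always kept sorted. By default a map is sorted in ascending key order, so the 3 highest values are the last 3 items (iterate from rbegin()); alternatively, declare the map with std::greater<int> as its comparator so that the first 3 items are the highest.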