How to parse, read, and store only one column of a .CSV file into an array in C++

I have a .CSV file that's storing data from a laser. It records the height of the laser beam every second.
The .CSV file ends up having rows for each measurement that are all in this format:
DR,04,#
where the # is the height reading.
For example, if the beam is at a height of 10, the reading would say:
DR,04,10.
I want my program in C++ to read only the height (third column of the .CSV) from each row and put it into an array. I do not want the first two columns at all. That way I end up with an array with just a bunch of height values from each measurement.
How do I do that?

You can use strtok() to separate out the three columns and then just take the last value.
You could also just take the string and scan for the first comma, and then scan from there for the second comma. What follows is the value you are after.
You could also use sscanf() to parse out the individual values.
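For instance, here is a minimal sketch of the comma-scanning approach; the file name laser.csv is an assumption:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main() {
    std::ifstream in("laser.csv");   // hypothetical file name
    std::vector<double> heights;
    std::string line;
    while (std::getline(in, line)) {
        // Scan for the first comma, then the second; the height follows.
        auto first = line.find(',');
        auto second = line.find(',', first + 1);
        if (first == std::string::npos || second == std::string::npos)
            continue;                // skip malformed rows
        heights.push_back(std::stod(line.substr(second + 1)));
    }
    for (double h : heights)
        std::cout << h << '\n';
}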
This really isn't a difficult problem, and there are many ways to approach it, which is why people are complaining that you should have tried something first and asked a question here once you got stuck on a specific problem.

Related

spotfire plot list of elements

I have a data table that has this format:
and I want to plot temperature against time. Any idea how to do that?
This can be done in a TERR data function. I don't know how comfortable you are integrating Spotfire with TERR; there is an intro video here, for instance (the demo starts at about minute 7):
https://www.youtube.com/watch?v=ZtVltmmKWQs
With that in mind, I wrote the script without loading any library, so it is quite verbose and explicit, but hopefully simpler to follow step by step. I am sure there is a more elegant way, and there are better ways of making it flexible with column names, but this is a start.
Your input will be a data table (dt, the original data) and the output a new data table (dt.out, the transformed data). All column names (and some values) are addressed explicitly in the script (so if you change them it won't work).
#remove the []
dt$Values=gsub('\\[|\\]','',dt$Values)
#separate into two different data frames, one for time and one for temperature
dt.time=dt[dt$Description=='time',]
dt.temperature=dt[dt$Description=='temperature',]
#split the columns we want to separate into a list of vectors
dt2.time=strsplit(as.character(dt.time$Values),',')
dt2.temperature=strsplit(as.character(dt.temperature$Values),',')
#rearrange times
names(dt2.time)=dt.time$object
dt2.time=stack(dt2.time) #stack vectors
dt2.time$id=c(1:nrow(dt2.time)) #assign running id for merging later
colnames(dt2.time)[colnames(dt2.time)=='values']='time'
#rearrange temperatures
names(dt2.temperature)=dt.temperature$object
dt2.temperature=stack(dt2.temperature) #stack vectors
dt2.temperature$id=c(1:nrow(dt2.temperature)) #assign running id for merging later
colnames(dt2.temperature)[colnames(dt2.temperature)=='values']='temperature'
#merge time and temperature
dt.out=merge(dt2.time,dt2.temperature,by=c('id','ind'))
colnames(dt.out)[colnames(dt.out)=='ind']='object'
dt.out$time=as.numeric(dt.out$time)
dt.out$temperature=as.numeric(dt.out$temperature)
Because all of the example rows you've shown here contain exactly four list items, and you haven't specified otherwise, I'll assume that all of the data fits this format.
With this assumption, it becomes pretty trivial, albeit a little messy, to split the values out into columns using the RXReplace() expression function.
You can create four calculated columns, each with an expression like:
Int(RXReplace([values],"\\[([\\d\\-]+),([\\d\\-]+),([\\d\\-]+),([\\d\\-]+)]","\\1",""))
The third argument, "\\1", determines which number in the list to extract. Backslashes are doubled ("escaped") per the requirements of the RXReplace() function.
Note that this example assumes the numbers are all whole numbers. If you have decimals, you'd need to adjust each "phrase" of the regular expression to ([\\d\\-\\.]+), and you'd need to wrap the expression in Real() rather than Int() (if you leave this part out, the result will be a String type, which could cause confusion later on when working with the data).
Once you have the four columns, you'll be able to unpivot to get the data easily.

C++ Request a specific row from a file

Is there a way I can open a file that contains a large amount of data and retrieve only one specific row or index, without reading the rest of the content as well?
Update:
Based on what others have mentioned here in the comments, I have some follow-up questions.
Can anyone give me an example of how to put a fixed width on the rows/line breaks (whatever you want to call them), or point me to a good source where I can read more about it?
If I set this up correctly, will I be able to get a specific line from the file very fast, even if it contains several million rows?
If you want to access a file by records or rows, and the rows are not of fixed length, you'll have to create a structure that maps (or associates) row indices to file positions.
I recommend using std::vector<std::streampos>.
Read through the file once. Each time the stream is at the beginning of a row, record the current file position and append it to the vector.
If you need to access a row in the file:
1) Use the vector to get the file position of the row.
2) Seek to the row using the file position.
This technique will work with fixed length and variable length rows.
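A minimal sketch of this index-then-seek technique (the file name and row number are just examples):

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

int main() {
    std::ifstream file("data.txt");              // hypothetical file name
    std::vector<std::streampos> row_index;

    // Pass 1: record the file position at the start of every row.
    std::string line;
    std::streampos pos = file.tellg();
    while (std::getline(file, line)) {
        row_index.push_back(pos);
        pos = file.tellg();
    }

    // Later: fetch, say, the 42nd row (index 41) directly.
    std::size_t row = 41;
    if (row < row_index.size()) {
        file.clear();                            // reset the EOF state
        file.seekg(row_index[row]);
        std::getline(file, line);                // line now holds that row
    }
}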

Fixed-Length Flat File Parsing

I have flat file tables, say student.tbl and employee.tbl. Both are fixed-length files. For each of them I have a supporting file with the field code, field description, field offset, and field size.
For example:
ename string 0 10
eage number 10 2
ecity string 12 10
I wrote code to fetch data from the flat files using the STL in C++, loading the data into vectors.
My simple algorithm to load data from a fixed-length file:
1) Read the supporting file.
2) Load the supporting file data into a 2D vector of strings, say column_records.
3) Read the table file.
4) Get the next line from the table file, say Data Line.
5) Get the next column information from the supporting table, i.e. the next row of column_records.
6) Chop Data Line based on that column record.
7) Push the chopped data onto a one-dimensional vector, say record_vector.
8) Repeat from step 5 until the last column information has been processed.
9) Push record_vector onto a 2D vector, say Table_Vector.
10) Repeat from step 4 until the last line of the fixed-length file has been reached.
Well, I did it, and it works fine. But my problem is with step 5.
For every fixed-length record, I go through some repeated iterations.
I know that the column descriptions obtained while parsing the first fixed-length record could be retained and reused for all the other records, but instead I am doing the iteration N*M times (N data lines times M column descriptions). I want my iteration to be 1*M.
I know that I could store my column descriptions in an array of structs, but I have many types of tables, say student.tbl and employee.tbl, each with a different set of columns, so I think it would be a bad idea to write N struct declarations to load N supporting tables.
I want to use the same routine to fetch data from both tables, or from N tables. My supporting table format will not change; it is tab-delimited. That is my situation.
How do I fetch data from a table with 1*M iterations?
I hope I can use an enumeration to do this, but I don't know how. Would an enumeration or a macro solve this issue?
I hope my working source code will not be needed for this question; if you think it is, I will definitely update the question with it. My English is at a medium level, so sorry for any inconvenience.
Thank you.
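For what it's worth, a common way to get the 1*M behaviour without declaring a struct per table is to parse the supporting file once into a runtime vector of column descriptors and then reuse that vector for every data line. A rough sketch, with hypothetical file names:

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

struct ColumnDesc {                  // one row of the supporting file
    std::string name;
    std::string type;
    std::size_t offset = 0;
    std::size_t size = 0;
};

// Parse the tab-delimited supporting file once: the 1*M pass.
std::vector<ColumnDesc> loadColumns(const std::string& path) {
    std::vector<ColumnDesc> cols;
    std::ifstream in(path);
    ColumnDesc c;
    while (in >> c.name >> c.type >> c.offset >> c.size)
        cols.push_back(c);
    return cols;
}

// Chop one fixed-length line using the descriptors; reused for every row.
std::vector<std::string> chopLine(const std::string& line,
                                  const std::vector<ColumnDesc>& cols) {
    std::vector<std::string> record;
    for (const ColumnDesc& c : cols)
        record.push_back(line.substr(c.offset, c.size));
    return record;
}

int main() {
    // The same routine serves student.tbl, employee.tbl, or any other table.
    std::vector<ColumnDesc> cols = loadColumns("employee.sup");
    std::vector<std::vector<std::string>> table;
    std::ifstream in("employee.tbl");
    std::string line;
    while (std::getline(in, line))
        table.push_back(chopLine(line, cols));
}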

Splitting File Input Into Numerous Vectors

I have a CSV file that has about 10 different columns. I'm trying to figure out the best method to use here.
Data looks like this:
"20070906 1 0 0 NO"
There are about 40,000 records like this to be analyzed. I'm not sure what's best here: split each column into its own vector, or put each whole row into a vector.
Thanks!
I think this is a somewhat subjective question, but IMHO a single vector that contains the split-up rows will likely be easier to manage than separate vectors for each column. You could even create a row object for the vector to store, to make accessing and processing the data in the rows/columns friendlier.
That said, if you are only doing processing at the column level, and not at the row or entry level, individual column vectors would be easier.
Since the data set is fairly small (assuming you are using a PC and not some other device, like a smartphone), you can read the file line by line into a vector of strings and then parse the elements one by one, populating a vector of structures holding the record data.
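As a concrete illustration of the row-object idea, here is a minimal sketch; the struct's field names, and the whitespace-separated layout taken from the sample row, are assumptions:

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct Record {                     // hypothetical names for the five sample fields
    std::string date;
    int a = 0, b = 0, c = 0;
    std::string flag;
};

int main() {
    std::ifstream in("data.csv");   // hypothetical file name
    std::vector<Record> rows;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        Record r;
        if (ss >> r.date >> r.a >> r.b >> r.c >> r.flag)
            rows.push_back(r);      // one object per row
    }
    // Row-level and column-level access both stay simple, e.g. rows[i].a.
}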

C++ Read only random lines in a file

I have a requirement to read a text file, but it is too large, so I decided to read only some of its lines. Can I use a seek method to jump to a given line? Then I could read only that line; since the file is so large, reading the whole thing wastes a lot of time. If that is not possible, can anyone give a better solution (seek to a given line and read it)? I know binary files are read byte by byte.
An example of my file:
event1 0
subevent 1
subevent 2
event2 3
(In my file, after each event it displays a number of lines; I want to use that to seek relative to the previous event.)
Yes, you can seek to a point in the file and then read from there. One possible problem: if the lines are all different lengths, a random location in the file has a higher probability of falling in a longer line, so you're not getting evenly distributed probabilities across lines. If you really must have identical probabilities, then you need to make at least one pass over the file to find the start of each line; you can then store those offsets in a vector and randomly select a vector element to guide seeking to the line data in the file. If you only care a little bit, you can perhaps advance a small but random number of lines past the one you initially seek to. That evens the odds a bit and avoids the initial pass, but isn't perfect. hansmaad's comment adds a neat approach too, with perfect results and pretty good performance, but it requires that the lines be numbered in the file itself.
Unless each line has exactly the same length, you're going to have to scan through it.
If you want to jump around in it, you can scan through it, saving the offset of each line in a container of your choice, and then use that to seek to a specific line.
Assuming that the lines are variable / random length, I don't believe there is any built-in way to jump directly to the start of a particular line. You can seek to an arbitrary byte position in the file. However, this might land anywhere in the beginning / middle / end of a line.
My best suggestion would be to attack the problem in two steps:
First, make a complete pass through the file, byte by byte, searching for the start of each line. Record the byte position of each line and store it in an array, vector, etc. (Basically, you are creating an index that maps from line number to starting position.) Then, once you have this index built up, you can easily jump to a particular line by looking up its position in your index.
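A minimal sketch of this two-step approach, here used to pick a uniformly random line (the file name is an assumption):

#include <cstddef>
#include <fstream>
#include <random>
#include <string>
#include <vector>

int main() {
    std::ifstream file("big.txt");               // hypothetical file name
    std::vector<std::streampos> starts;

    // Step 1: one pass to index the starting position of every line.
    std::string line;
    std::streampos pos = file.tellg();
    while (std::getline(file, line)) {
        starts.push_back(pos);
        pos = file.tellg();
    }

    // Step 2: pick a line number uniformly and seek straight to it.
    if (!starts.empty()) {
        std::mt19937 rng{std::random_device{}()};
        std::uniform_int_distribution<std::size_t> pick(0, starts.size() - 1);
        file.clear();                            // reset the EOF state
        file.seekg(starts[pick(rng)]);
        std::getline(file, line);                // an evenly distributed random line
    }
}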
As far as I know, there is no built-in way to seek to a new line without already knowing where the lines are. I can't tell you the best way to achieve your goal, because most of your question details how you're trying to accomplish it, not what it is you're actually trying to accomplish. Therefore, I might go one of two ways with this:
1) If you actually need every last bit of data from the file (there is no metadata or other information that can be discarded):
Someone mentioned scanning through the file, tracking the lines as you go and building an index with it so you can read in one line at a time. This might work, and it would be the way to go if you actually need each line in its entirety, or if you only need the line number and plan on reading in small pieces at a time from there. However, without knowing details about your constraints or requirements, I would not recommend reading in entire lines using this method for one main reason: I have no way of knowing that one line will not itself be too large to load (what if there is only one line in the file?).
Instead, I would simply allocate a buffer of a size that is an appropriate amount to process at a time, and process the file in chunks of that size until you reach the end, streaming more data in as you go (see the sketch after this list). Without additional details, I can't tell you what that magic number should be, but the size of the largest chunk of information you might need to process is a good starting point as a minimum.
2) If you don't need every last bit of data from the file (you can discard some of the information in it), then you only need some of it. If you only need select pieces of data, then they are easier to find if they are tagged (which is what XML is for). There are lots of free XML parsers, or you can write your own. Then you'd search for tags instead of arbitrary line numbers, and changes to the file that result in the data being in a different location won't affect your ability to find it if it's tagged, as it would if you're just going by line numbers.
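To illustrate the chunked processing suggested in option 1, here is a minimal sketch; the buffer size and file name are arbitrary assumptions:

#include <fstream>
#include <vector>

int main() {
    std::ifstream in("huge.dat", std::ios::binary);   // hypothetical file name
    std::vector<char> buffer(64 * 1024);              // arbitrary 64 KiB chunk size
    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        std::streamsize got = in.gcount();            // may be short on the last read
        if (got == 0)
            break;
        // ... process buffer[0 .. got) here, streaming more in as you go ...
    }
}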