I have to do real-time plotting of sensor scan values. I am using gnuplot for this purpose. So far I am able to communicate with gnuplot from my C++ program. I tried some sample plots using a .DAT file and it works. Now my requirement is to plot the last 5 scans in a single plot for comparison (which means I need to store 10 arrays of data; each scan has two arrays, X and Y).
What I am trying to do is store the last 5 scans in column format in a .DAT file like the one below, where X and Y are my two arrays for each scan, and then use gnuplot commands such as "plot 'filename.dat' using 1:2", "plot 'filename.dat' using 3:4", and so on. I would then rewrite the file after every 5 scans.
X1 Y1 X2 Y2 X3 Y3 X4 Y4 X5 Y5
2.3 3.4 6.6 3.6 5.5 6.5 8.5 5.5 4.5 6.6
4.3 4.5 6.2 7.7 4.3 9.2 1.4 6.9 2.4 7.8
I just want to confirm before proceeding whether this is efficient for real-time processing. Also, is there any command in gnuplot to plot directly from two arrays without using .dat files? I did not find one in my search.
Any suggestions would be helpful.
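(For reference, here is a minimal sketch of the column-per-scan approach described above. It is written in Python purely to illustrate the plumbing; the gnuplot commands are the same when sent from C++ over a pipe, and the scan data below is made up.)

# Hypothetical sketch: write the last 5 scans as 10 columns (X1 Y1 ... X5 Y5)
# and plot each X/Y pair from the same file with "using".
import subprocess

# placeholder scan data: 5 scans, each an (X, Y) pair of equal-length arrays
scans = [([2.3, 4.3], [3.4, 4.5]) for _ in range(5)]

with open('filename.dat', 'w') as f:
    for i in range(len(scans[0][0])):            # one output row per sample index
        row = []
        for x, y in scans:
            row += [x[i], y[i]]
        f.write(' '.join(str(v) for v in row) + '\n')

# scan k occupies columns 2k-1 and 2k, so one plot command can draw all five
cmd = "plot " + ", ".join(
    "'filename.dat' using {}:{} with lines title 'scan {}'".format(2*k + 1, 2*k + 2, k + 1)
    for k in range(5))

gp = subprocess.Popen(['gnuplot', '-persist'], stdin=subprocess.PIPE, text=True)
gp.stdin.write(cmd + '\n')
gp.stdin.flush()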
Presumably, you are communicating with gnuplot via pipes. Since gnuplot is a separate process, it does not have access to your program's memory space, so it cannot plot your data without you sending it somehow. The most straightforward way is the one you mentioned (create a temporary file and send gnuplot a command to read/plot it). Another straightforward way is to use gnuplot's inline data. It works like this:
plot '-' using ... with ...
x1 y1
x2 y2
x3 y3
...
e
In this case, the data is written directly to the gnuplot pipe, with no need for a temporary file. (For more about the pseudo-file '-', see help datafile special-filenames in the gnuplot documentation.)
As for whether this approach is usable in real time: as long as gnuplot's rendering is fast compared to the time between re-renders, it should work fine. (There may also be memory issues if your arrays are huge, but I doubt that would limit any real application with only ten 1-D arrays -- and if the arrays were that big, you probably shouldn't be sending the whole thing to gnuplot anyway.)
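For example, a minimal sketch of this inline-data approach (in Python for brevity; from C++ you would write exactly the same lines to a pipe opened with popen("gnuplot", "w")):

# send one scan (x[], y[]) to gnuplot through a pipe using the '-' pseudo-file,
# so no temporary file is needed; the data values are placeholders
import subprocess

x = [0.0, 1.0, 2.0, 3.0]
y = [2.3, 3.4, 6.6, 3.6]

gp = subprocess.Popen(['gnuplot', '-persist'], stdin=subprocess.PIPE, text=True)
gp.stdin.write("plot '-' using 1:2 with linespoints title 'scan'\n")
for xi, yi in zip(x, y):
    gp.stdin.write("{} {}\n".format(xi, yi))   # inline data follows the plot command
gp.stdin.write("e\n")                          # 'e' ends the inline data block
gp.stdin.flush()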
Take a look at this: https://github.com/dkogan/feedgnuplot
It's a general-purpose tool to plot standard input. It is able, among other things, to make realtime plots of data as it comes in. If your data is in a format that is not directly supported, preprocess the stream with something like awk or perl.
I am trying to convert a CSV text file with three columns and 572 rows to a gridded binary file (.bin) using gfortran.
I have written two Fortran programs to achieve this.
The issue is that the resulting binary file ends up far too large (9.6 GB), which cannot be correct.
I have a sneaking suspicion that my nx and ny values in ascii2grd.f90 are not correct and that this is leading to the bad .bin file. With such a small input (only 572 rows), I expect the final .bin to be in the KB range, not GB.
temp.f90
!PROGRAM TO CONVERT ASCII TO GRD
program gridded
real lon(572),lat(572),temp(572)
open(2,file='/home/weather/data/file')
open(3,file='/home/weather/out.dat')
do 20 i=1,572
read(2,*)lat(i),lon(i),temp(i)
write(3,*)temp(i)
20 continue
stop
end
ascii2grd.f90
!PROGRAM TO CONVERT ASCII TO GRD
program ascii2grd
parameter(nx=26,ny=22,np=1)
real u(nx,ny,np),temp1(nx,ny)
integer :: reclen
inquire(iolength=reclen)a
open(12,file='/home/weather/test.bin',&
form='unformatted',access='direct',recl=nx*ny*reclen)
open(11,file='/home/weather/out.dat')
do k=1,np
read(11,*)((u(j,i,k),j=1,nx),i=1,ny)
10 continue
enddo
rec=1
do kk=1,np
write(12,rec=irec)((u(j,i,kk),j=1,nx),i=1,ny)
write(*,*)'Processing...'
irec=irec+1
enddo
write(*,*)'Finished'
stop
end
Sample from out.dat
6.90000010
15.1999998
21.2999992
999.000000
6.50000000
10.1000004
999.000000
18.0000000
999.000000
20.1000004
15.6000004
8.30000019
9.89999962
999.000000
Sample from file
-69.93500 43.90028 6.9
-69.79722 44.32056 15.2
-69.71076 43.96401 21.3
-69.68333 44.53333 999.00000
-69.55380 45.46462 6.5
-69.53333 46.61667 10.1
-69.1 44.06667 999.00000
-68.81861 44.79722 18.0
-68.69194 45.64778 999.00000
-68.36667 44.45 20.1
-68.30722 47.28500 15.6
-68.05 46.68333 8.3
-68.01333 46.86722 9.9
-67.79194 46.12306 999.00000
I would suggest a general strategy like the following:
Read the CSV with python/pandas (it could be many other things, although using python will be nice for step 2, as you'll see). The important thing is that many other languages are more convenient than Fortran for reading a CSV, and that will let you check that step 1 is working before moving on.
Output to binary with numpy's tofile(). Also note that numpy defaults to 'C' order for arrays, so you may need to specify 'F' (Fortran) order.
I have a utility at github called dataset2binary that automates this and may be of interest to you, or you could refer to the code in this answer. That is probably overkill, though, because you seem to be reading just one big array of a single datatype. Nevertheless, the code you'd want will be similar, just simpler.
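As a sketch of that workflow (not the dataset2binary code; the paths, the whitespace-separated layout shown in "Sample from file", and the nx by ny grid shape are assumptions taken from the question):

# read the 3-column whitespace-separated file and dump the temperature
# column as raw float32, laid out the way a Fortran u(nx,ny) array expects
import numpy as np
import pandas as pd

df = pd.read_csv('/home/weather/data/file', sep=r'\s+',
                 names=['lon', 'lat', 'temp'])
print(df.shape)                                # sanity check: expect (572, 3)

nx, ny = 26, 22                                # assumed grid shape, 26*22 = 572
grid = df['temp'].to_numpy(dtype=np.float32).reshape((nx, ny), order='F')

# ndarray.tofile() always writes in C order, so flatten explicitly in
# Fortran order to match a Fortran direct-access write of u(nx,ny)
grid.ravel(order='F').tofile('/home/weather/test.bin')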
I use Fortran 95, and now I'm facing a problem as follows:
I have 8 data files with 4 columns each, generated by another program (each file contains the solutions of differential equations for a different set of initial conditions).
The 4th column is my x variable and the 2nd column is my f(x).
So all I want is to create a new file with 9 columns (with x in the first column and the f(x) from each file in the other columns).
However, each file has different values of x (and its respective f), like 1.10, 1.30 and 1.40 in one and 1.15, 1.25 and 1.42 in another.
So it would be OK for me to take a "band" in x, like [1.00;1.20], write its average value as x in my new file, and then put each file's f(x) within that band under it.
But I haven't managed to figure out how to do it.
I would try plotting the files with the smooth csplines option into a temporary file:
set format x "%10.3f"
set format y "%10.3f"
set xrange [...]
set samples ...
set table "temp1.dat"
plot 'file1.dat' using 4:2 smooth csplines
unset table
This works if you can live with the spline interpolation. There is no way to print linearly interpolated points in csv format. You might want to learn a bit of Fortran (consider whether you will need it for your further research anyway) to do the linear interpolation, or any other programming language; see the sketch at the end of this answer.
To plot all files with one command check for example the answers on
Loop structure inside gnuplot?
Then, on linux, you can combine the generated data using colrm and paste.
cat temp1.dat | colrm 11 > x
cat temp1.dat | colrm 1 11 | colrm 12 > y1
cat temp2.dat | colrm 1 11 | colrm 12 > y2
...
paste x y1 y2 ... > combined.dat
Adjust the constants as needed.
Again, learning a programming language might also help.
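If you do pick up a programming language for the linear interpolation, here is a short sketch in Python (the file names, grid bounds and step are placeholders, not values from your data):

# Sketch: put every file's f(x) (column 2) onto one common x grid by
# linear interpolation over x (column 4), then write the 9-column file.
import numpy as np

files = ['file1.dat', 'file2.dat', 'file3.dat', 'file4.dat',
         'file5.dat', 'file6.dat', 'file7.dat', 'file8.dat']
x_common = np.arange(1.0, 2.0, 0.05)          # the chosen "bands" in x

columns = [x_common]
for name in files:
    data = np.loadtxt(name)
    x, f = data[:, 3], data[:, 1]             # 4th column is x, 2nd is f(x)
    order = np.argsort(x)                     # np.interp needs increasing x
    columns.append(np.interp(x_common, x[order], f[order]))

np.savetxt('combined.dat', np.column_stack(columns), fmt='%10.3f')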
I'm trying to do binary LSTM classification using Theano.
I have gone through the example code; however, I want to build my own.
I have a small set of "Hello" & "Goodbye" recordings that I am using. I preprocess these by extracting their MFCC features and saving the features in a text file. I have 20 speech files (10 of each word) and I generate one text file per recording, so 20 text files containing the MFCC features. Each file is a 13x56 matrix.
My problem now is: How do I use this text file to train the LSTM?
I am relatively new to this. I have gone through some literature on it as well, but have not really gained a good understanding of the concept.
Any simpler way of using LSTMs would also be welcome.
There are many existing implementations, for example a Tensorflow implementation and a Kaldi-focused implementation with all the scripts; it is better to check them first.
Theano is too low-level; you might try Keras instead, as described in its tutorial. You can run the tutorial "as is" to understand how things go.
Then you need to prepare a dataset. You need to turn your data into sequences of data frames, and for every data frame in a sequence you need to assign an output label.
Keras supports two types of RNN layers - layers returning sequences and layers returning single values. You can experiment with both; in code you just use return_sequences=True or return_sequences=False.
To train with sequences you can assign a dummy label to every frame except the last one, where you assign the label of the word you want to recognize. You need to place the input and output labels in arrays. So it will be:
X = [[word1frame1, word1frame2, ..., word1framen],[word2frame1, word2frame2,...word2framen]]
Y = [[0,0,...,1], [0,0,....,2]]
In X every element is a vector of 13 floats. In Y every element is just a number - 0 for intermediate frames and the word ID for the final frame.
To train with just labels you also place the input and output labels in arrays, but the output array is simpler. So the data will be:
X = [[word1frame1, word1frame2, ..., word1framen],[word2frame1, word2frame2,...word2framen]]
Y = [[0,0,1], [0,1,0]]
Note that the output is vectorized (np_utils.to_categorical) to turn it into vectors instead of plain numbers.
Then you create the network architecture. You can have 13 floats as input and a vector as output. In the middle you might have one fully connected layer followed by one LSTM layer. Do not use layers that are too big; start with small ones.
Then you feed this dataset into model.fit and it trains the model. You can estimate model quality on a held-out set after training.
You will have a problem with convergence since you have just 20 examples. You need far more examples, preferably thousands, to train an LSTM; with so little data you will only be able to use very small models.
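For reference, here is a minimal Keras sketch of the second ("just labels") setup described above. The shapes match the question (20 recordings of 56 frames x 13 MFCCs, 2 words), but the data is random placeholder data and the layer sizes and training settings are guesses, not recommendations:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.utils import to_categorical        # np_utils.to_categorical in older Keras

n_examples, n_frames, n_feats, n_classes = 20, 56, 13, 2

# X: one 13-float MFCC vector per frame; transpose your 13x56 matrices to 56x13
X = np.random.rand(n_examples, n_frames, n_feats)
y = np.array([0] * 10 + [1] * 10)             # 0 = "Hello", 1 = "Goodbye"
Y = to_categorical(y, n_classes)

model = Sequential()
model.add(LSTM(16, input_shape=(n_frames, n_feats)))   # return_sequences=False: one output per sequence
model.add(Dense(n_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=50, batch_size=4, validation_split=0.2)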
I've seen some similar questions, and from them I have put together a system that works for me, but I need to optimize it because this program alone is taking up a lot of CPU load.
Here is the problem exactly.
I have an incoming signal/stream of data which I need to plot in real time. I only want a limited number of points to be displayed at a time (say, 1024 points), so I plot the data points along the y-axis against an index from 0-1023 on the x-axis. The values of the incoming data range from 0-1023.
What I do currently (this is all in C++) is put the data into a circular buffer as it comes in, and each time the data gets updated (or every second/third data point) I write it out to a file and, using a pipe, plot the data from that file with gnuplot.
While this works almost perfectly, it causes a fair bit of load (depending on the input data rate, I saw up to 70% usage on both cores of my Core 2 Duo). I'll need to run some processor-intensive code alongside this small program, so I feel it is almost necessary to optimize it.
What I was hoping could be done is this: can I plot only the differences between the current plot and the new data (or plot each point as it comes in, without replotting the whole graph, such that the old point at that x index is removed)?
I have a fixed number of points on the graph, so replot won't do what I want; I need the old point at that x location to be removed.
Unfortunately, what you're trying to accomplish can't be done. You can mark a datafile as volatile or use the refresh keyword, but those only update the plot without re-reading the data, whereas you want to re-read the data and then update only the differences.
There are a few things that might be helpful though. 1) Your eye can only register roughly 26 frames per second, so if you have a way to make sure that you only send data to gnuplot about 26 times per second, that might help; see the sketch below. 2) How are you writing the datafiles? Are you dumping ASCII or binary? A binary dump might be faster (both for writing and for gnuplot to read). You'll have to experiment.
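To illustrate point 1, here is a rough sketch of the rate-limiting idea (Python for brevity; the same time-based guard translates directly to C++, and the buffer size and frame rate are just the numbers from the question):

# buffer incoming samples, but only push a redraw to gnuplot ~26 times a second
import subprocess, time

gp = subprocess.Popen(['gnuplot'], stdin=subprocess.PIPE, text=True)
gp.stdin.write("set xrange [0:1023]\nset yrange [0:1023]\n")

buf = [0] * 1024                    # circular buffer of the latest 1024 samples
write_idx = 0
last_draw = 0.0
MIN_INTERVAL = 1.0 / 26             # at most ~26 redraws per second

def on_sample(value):
    global write_idx, last_draw
    buf[write_idx] = value
    write_idx = (write_idx + 1) % len(buf)
    now = time.monotonic()
    if now - last_draw < MIN_INTERVAL:        # skip redraws that arrive too fast
        return
    last_draw = now
    gp.stdin.write("plot '-' with points notitle\n")
    gp.stdin.write('\n'.join(str(v) for v in buf) + '\ne\n')
    gp.stdin.flush()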
There is one hack which will probably not make your script go faster, but you can try it (if you know a reasonable yrange to set, and are using points to plot the data)...
#set up code:
set style line 1 lc rgb "blue"
set xrange [0:1023]
set yrange [0:1]
plot NaN notitle #Only need to do this once.
for [i=0:1023] set label i+1 at i,0 point ls 1 #Labels must have tags > 0 :-(
#this part gets repeated by your C code.
#you could move a few points at a time to make it more responsive.
set label 401 at 400,0.8 #move point number 400 to a different y value
refresh #show it at its new location.
You can use gnuplot to do dynamic plotting of data as explained in their FAQ, using the reread function. It seems to run at quite a low load and automatically scrolls the graph when it reaches the end. To keep the load low I found I had to add a ; sleep 1 after the awk command (in their example file dyn-ping-loop.gp), otherwise it spends too much CPU looping on the awk processing.
I store (non-equidistant) time series as tables in HDF5 files using the H5TB API. The format is like this:
time channel1 channel2
0.0 x x
1.0 x x
2.0 x x
There are also insertions of "detail data" like this:
time channel1 channel2
0.0 x x
1.0 x x
1.2 x x
1.4 x x
1.6 x x
1.8 x x
2.0 x x
Now I want to store the data in another data format, and therefore I would like to "query" the HDF5 file like this:
select ch1 where time > 1.6 && time < 3.0
I thought of several ways to do this query:
There is a built-in feature called a B-tree index. Is it possible to use this for indexing the data?
I need to do a binary search on the time channel and then read the channel values
I create an index myself (and update it whenever there is a detail insertion). What would be the best algorithm to use here?
The main motivation for an index would be to have fast query responses.
What would you suggest here?
I finally found another (fairly obvious) solution myself. The easiest way is to open the HDF5 file, read only the time channel, and create an in-memory map before reading the data channels. This process could even be optimized by reading the time channel with a sparse hyperslab.
Once the indices for a particular time are known, the data can be read.
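A minimal sketch of that idea with h5py (the dataset and field names here - "scan", "time", "channel1" - are assumptions, and it assumes the time column is kept sorted):

# read only the time column, binary-search the query bounds, then read
# just the matching slice of rows and pick out channel1
import h5py
import numpy as np

with h5py.File('data.h5', 'r') as f:
    table = f['scan']                 # an H5TB table shows up as a 1-D compound dataset
    times = table['time']             # reads just the "time" field into memory
    lo = np.searchsorted(times, 1.6, side='right')   # first index with time > 1.6
    hi = np.searchsorted(times, 3.0, side='left')    # first index with time >= 3.0
    rows = table[lo:hi]               # read only the matching row range
    ch1 = rows['channel1']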
Assuming you're not asking about how to parse the data out of an HDF5 file, merely about how to use the data once parsed...
Given class channel_data { ... };, a std::map<double, channel_data> should suit your needs, specifically std::map<>::lower_bound() and std::map<>::upper_bound().
A popular approach to solving this problem appears to be bitmap indexing. There are also papers written on doing this, but their authors do not appear to have published any code.