I am trying to upgrade some legacy f77 codes to analyse molecular dynamics trajectories, and I have had some success with modern Fortran plus OpenACC directives. My aim now is to read LAMMPS trajectory files, which are usually big (a few GB), using the GPU. First, I would like to know whether this makes sense. If it does, how do I do it efficiently?
The trajectory file contains the system configuration for multiple time steps. It consists of repeated blocks, each with the following structure: a time stamp, a few lines giving the number of particles and the box dimensions, and then positions, velocities, etc. for all particles at that time step. One such block appears for each successive time step.
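For concreteness, a minimal host-side reader for one such block might look like this (sketched in Python for brevity, though my code is Fortran; it assumes the standard "ITEM:"-style text dump, and the file I/O itself stays on the CPU, with only the parsed arrays being shipped to the GPU afterwards):

import numpy as np

def read_frame(f):
    # Each block starts with "ITEM: TIMESTEP"; an empty read means end of file.
    if not f.readline():
        return None
    timestep = int(f.readline())
    f.readline()                            # "ITEM: NUMBER OF ATOMS"
    natoms = int(f.readline())
    f.readline()                            # "ITEM: BOX BOUNDS pp pp pp"
    box = np.array([f.readline().split()[:2] for _ in range(3)], dtype=float)
    f.readline()                            # "ITEM: ATOMS id type x y z ..."
    atoms = np.loadtxt(f, max_rows=natoms)  # one row per particle
    return timestep, box, atoms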
I would really appreciate any suggestions regarding this. Thank you!!
I have a large (10-100GB) data file of 16-bit integer data, which represents a time series from a data acquisition device. I would like to write a piece of python code that scans through it, plotting a moving window of a few seconds of this data. Ideally, I would like this to be as continuous as possible.
The data is sampled at 4MHz, so to plot a few seconds of data involves plotting ~10 million data points on a graph. Unfortunately I cannot really downsample since the features I want to see are sparse in the file.
matplotlib is not really designed to do this. It is technically possible, and I have a semi-working matplotlib solution which allows me to plot any particular time window, but it's far too slow and cumbersome to do a continuous scan of incrementally changing data - redrawing the figure takes several seconds, which is far too long.
Can anyone suggest a python package or approach to doing this?
PyQtGraph is faster than Matplotlib, though I don't know if it can plot 10 million points a second. It also includes several methods to down-sample your data, so one of those might still be useful to you. Note that it requires Qt and PyQt.
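A rough sketch of how that might look (the filename and window size here are made up; setDownsampling and setClipToView are the PyQtGraph calls that keep drawing cheap):

import numpy as np
import pyqtgraph as pg

# Memory-map the raw 16-bit capture so nothing is loaded until sliced.
data = np.memmap('capture.bin', dtype=np.int16, mode='r')  # hypothetical file

app = pg.mkQApp()
plot = pg.plot(title='capture viewer')
plot.setDownsampling(auto=True, mode='peak')  # min/max decimation keeps spikes visible
plot.setClipToView(True)                      # only draw what is on screen
plot.plot(data[:40_000_000])                  # ~10 s of data at 4 MHz
app.exec_()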
Still, you have between 5e9 and 5e10 data samples. Even if you can plot 10 million of them at a time, that still means making between 500 and 5000 plots. Are you really going to inspect them all visually? You might consider implementing some kind of feature detection.
Something that worked for me in a similar problem (time-varying heat maps) was to run a batch job overnight, producing several thousand such plots and saving each as a separate image. At 10 s per figure, you can produce 3600 in 10 h. You can then simply scan through the images, which may provide the insight you're looking for.
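A minimal version of that batch job might look like this (the filename and window length are placeholders; the non-interactive Agg backend avoids opening any windows):

import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

data = np.memmap('capture.bin', dtype=np.int16, mode='r')   # hypothetical file
win = 8_000_000                                              # ~2 s at 4 MHz
for i, start in enumerate(range(0, len(data) - win, win)):
    fig, ax = plt.subplots(figsize=(16, 4))
    ax.plot(data[start:start + win])
    fig.savefig(f'window_{i:05d}.png', dpi=100)
    plt.close(fig)                                           # free the figure's memory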
I'm working on a particle sim and have run into a bit of a bottleneck: using a UAV to write to a RWStructuredBuffer of single floats is around 10 times too slow. From experimentation it seems there is no shortage of bandwidth; the access time itself bogs it down. Append writing is out of the question, since the outgoing data needs to be in a specific order. This is on DX10/SM4 hardware, so here are a few questions: Is there any way at all to speed things up (other than writing larger chunks of data, since the output from the shaders is non-consecutive)? If not, is DX11-grade hardware any quicker with UAVs?
The first thing to do (if you haven't already) is to profile your shader code by adding GPU queries to your system. Here is a link that explains it:
http://mynameismjp.wordpress.com/2011/10/13/profiling-in-dx11-with-queries/
It's written for DX11, but the same features exist in DX10, so it should be really simple to port over.
For compute there are several different aspects to tune, but the first would be to play with the thread-group size:
[numthreads(TGX, 1, 1)]
Try values like 8, 16, 32, and 64 to find the sweet spot (and don't forget to divide your dispatch count accordingly, as in the sketch below).
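The "divide your dispatch count" part just means the number of groups passed to Dispatch() must shrink as the group size grows, rounded up so every element is still covered. A hypothetical helper (shown in Python for brevity):

def dispatch_groups(n_elements, group_size):
    # Ceiling division: enough groups to cover all elements.
    return (n_elements + group_size - 1) // group_size

for tgx in (8, 16, 32, 64):
    print(tgx, dispatch_groups(1_000_000, tgx))  # groups needed at each size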
I am working on a 3D OpenGL (C++) application in which I have my own mesh structure based on the half-edge data structure. I want to build a simple way to load Wavefront OBJ files into my mesh structure. Of course, I can do so naively line by line, but there has to be a more efficient way (I know professional applications don't load the file naively line by line; that would be too slow for millions of vertices).
Can anyone point me to a tutorial or an example of a really fast OBJ loader? It would be preferable if it had something to do with a Half Edge data structure.
Edit:
There are two basic issues I am looking to get around:
1) Avoiding the general slowness of reading floating-point numbers from a file.
2) How do I intelligently determine the "adjacent" half-edge for each edge on the fly? I am imagining some sort of hashing function to determine whether the symmetric or next edge for the edge being created already exists and, if so, use that pointer.
I had a similar issue loading OBJ files a while ago, albeit searching for shared vertices as opposed to edges. Since the file format itself contains no connectivity information, the best way is to use a std::set. Each time you want to add an edge to your data structure, you can search the set to see if it already exists. Set searching is logarithmic in complexity, so it scales well with the size of your data structure. The only way to avoid this that I can think of is to choose a file format that contains the connectivity information you need, or, as Michael Slade suggested, create your own format and conversion tool.
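The same pairing idea, sketched in Python with a hash map rather than an ordered set (the names here are made up; the key point is that the twin of (a, b) is looked up as (b, a)):

half_edges = {}   # (v_from, v_to) -> half-edge record

def add_half_edge(v_from, v_to):
    he = {'origin': v_from, 'twin': None}
    twin = half_edges.get((v_to, v_from))   # the twin runs the opposite way
    if twin is not None:                    # pair them up if it already exists
        he['twin'] = twin
        twin['twin'] = he
    half_edges[(v_from, v_to)] = he
    return he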
Reading and decoding ASCII files is slow, particularly if the files have a million floating-point numbers to convert.
My idea: write a program, in any language you desire, to translate the .obj files into a binary format your program can read more or less directly into memory. Then run that program on the .obj files you want to load, and have your program load the translated files.
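A bare-bones version of such a converter might look like this (a Python sketch handling vertex positions only; a real one would also cover normals, texture coordinates, and faces):

import struct
import numpy as np

def obj_to_bin(src, dst):
    verts = []
    with open(src) as f:
        for line in f:
            if line.startswith('v '):                  # vertex position line
                verts.append([float(t) for t in line.split()[1:4]])
    arr = np.asarray(verts, dtype=np.float32)
    with open(dst, 'wb') as out:
        out.write(struct.pack('<I', len(arr)))         # little-endian vertex count
        out.write(arr.tobytes())                       # raw float32 x/y/z payload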
For extra points, you could have your OpenGL program do this translation on the fly and cache the results, checking file modification times and updating the cache as necessary.
I am a little overwhelmed by my task at hand. We have a toolkit we use for TWAIN scanning. Some of our customers are complaining about slower scan speeds when the deskew option is set. This is because, if their scanner does not support hardware deskew, it is done in post-processing on the CPU. I was wondering if anyone knows of a good (i.e. fast) algorithm to achieve this. It is hard for me to say what algorithm we are using now. What algorithms are out there for this, and how do they rank in speed and accuracy? If I knew the names of the algorithms, it would be easier for me to do a Google search on them.
Thank You.
-Tom
Are you scanning in color or B/W?
Deskew is processor-intensive. A Group4 TIFF or JPEG must be decompressed, the skew angle determined, the image deskewed, and then recompressed.
There are many image processing libraries out there with deskew, and I have evaluated many over the years. There are some huge differences in processing speed between the different libraries, and a lot of it comes down to how well the code is written rather than the algorithm used. There are big differences between commercial libraries even in just reading and writing images.
The fastest commercial deskew I have used by far comes from Unisoft Imaging (www.unisoftimaging.com). I assume much of it is written in assembler. Unisoft has been around for many years and is very fast and efficient. It supports many different deskew options, including black border removal and both color and B/W deskew. The Group4 routines are very solid and very fast. The library comes with many other image processing options, as well as TWAIN and native SCSI scanner support. It also supports Unix.
If you want a free deskew, you might want to have a look at Leptonica. It does not come with much documentation but is very stable and well written. http://www.leptonica.com/
Developing code from scratch could be quite time-consuming, and the result may be buggy and error-prone.
The other option is to do the processing in a separate process, so that scanning can run at the full speed of the scanner. At the moment you are probably doing everything serially, one task after another, hence the slowdown.
Consider doing it as post-processing, because deskew cannot be done in real time (unless it's hardware accelerated).
Deskew consists of two steps: skew detection and rotation. Detecting the skew angle can usually be done faster on a B&W (1-bit) image. Rotation speed depends on the quality of the interpolation. A good-quality deskew will take a lot of time to run, much more than scanning the pages takes.
A good high-speed scanner can do 120 double-sided pages per minute if it has hardware JPEG or TIFF Group 4 compression and your TWAIN library takes advantage of it (hint: do not use native mode). That is 240 images per minute, or four per second; you barely have enough time to save the file to the hard drive at that speed, let alone decompress, detect skew, rotate, and recompress. Quality deskew takes several seconds per page, unless you can use the video card's hardware acceleration to rotate and compress.
Do I understand correctly that you already have such an algorithm implemented? If so, are you sure there is no room for optimization? I'd start by profiling the existing solution.
Anyway, I guess you should look for a fast digital Radon transform algorithm.
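As a rough illustration of the idea (a coarse brute-force stand-in for a proper fast Radon transform, written in Python; the image, angle range, and step are placeholders): the skew angle is the rotation that makes the row projection profile sharpest.

import numpy as np
from scipy.ndimage import rotate

def estimate_skew(img, max_angle=5.0, step=0.25):
    # Dark text on a light page: threshold to a rough text mask.
    mask = (img < img.mean()).astype(float)
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        rotated = rotate(mask, angle, reshape=False, order=0)
        profile = rotated.sum(axis=1)         # row projection
        score = np.diff(profile).var()        # sharp text rows -> high variance
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle                         # deskew by rotating -best_angle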
Take a look at http://pagetools.sourceforge.net. They have a deskew algorithm implementation.
I'm trying to make a screen-flashing application that flashes the screen according to the music (which will be frequencies, such as healing frequencies, etc.).
I have already made the player and know how I will make the screen flash, but I need the flashing to follow the music closely; for example, if the music speeds up, the screen will flash faster. I understand that I would achieve this with an FFT or DSP (as I only need to know when the frequency content rises above some level, say at 20 Hz, to change the color and make the screen flash).
But I've found that I understand NOTHING, let alone being able to implement it in my application.
Can somebody help me out with learning either of them? My email is sismetic_chaos#hotmail.com. I really need help; I've been stuck for like 3 days, not coding or doing anything at all, trying to understand, but I don't.
PS: My application is written in C++ and Qt.
PS: Thanks for taking the time to read this and for the willingness to help.
Edit: Thanks to all for the answers. The problem is in no way solved yet, but I appreciate all the answers; I didn't think I would get so many answers and so much info. Thanks to you all.
This is a difficult problem, requiring more than an FFT. I'll briefly describe how I implemented beat detection when I was writing software for professional DJ equipment.
First of all, you'll need to cut down the amount of data you're dealing with, since there are only two or three beats per second, but tens of thousands of samples. You'll also need to look at different frequency ranges, since some types of music carry the tempo in the bassline, and others in percussion or other instruments. So pass the signal through several band-pass filters (I chose 8 filters, each covering one octave, from low bass to high treble), and then downsample each band by averaging the power over a few hundred samples.
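A sketch of that band-split and envelope stage (Python with SciPy; the filter order, band edges, and hop size here are illustrative choices, not the values from the original system):

import numpy as np
from scipy.signal import butter, lfilter

def band_envelopes(samples, rate=44100, n_bands=8, hop=512):
    envelopes = []
    lo = 55.0                                    # bottom of the lowest octave
    for _ in range(n_bands):
        hi = lo * 2                              # one octave per band
        b, a = butter(2, [lo / (rate / 2), hi / (rate / 2)], btype='band')
        band = lfilter(b, a, samples)
        power = band ** 2
        # Average the power over `hop` samples to downsample the band.
        env = power[: len(power) // hop * hop].reshape(-1, hop).mean(axis=1)
        envelopes.append(env)
        lo = hi
    return envelopes                             # each at rate / hop samples/s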
Every few seconds, you'll have a thousand or so samples in each band. Your next tool is an autocorrelation, to identify repetitive patterns in the music. The peaks of the autocorrelation tell you what the beat is more or less likely to be; but you'll need to make up some heuristics to compare all the frequency bands to find a beat that you can be confident in, and to avoid misleading syncopations. If you can manage that, then you'll have a reasonable guess at the tempo, but no idea of the phase (i.e. exactly when to flash the screen).
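The autocorrelation stage on one band's envelope might look like this (again a hedged sketch; the BPM search range is an assumption):

import numpy as np

def tempo_from_envelope(env, env_rate, bpm_range=(60, 180)):
    env = env - env.mean()
    ac = np.correlate(env, env, mode='full')[len(env) - 1:]  # lags >= 0
    lo = int(env_rate * 60 / bpm_range[1])   # shortest beat period, in samples
    hi = int(env_rate * 60 / bpm_range[0])   # longest beat period
    lag = lo + np.argmax(ac[lo:hi])          # strongest repetition in range
    return 60.0 * env_rate / lag             # estimated beats per minute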
Now you can look at a smoothed version of the audio data for peaks, some of which are likely to correspond to beats. Initially, look for the strongest peak over the course of a few seconds and take that as a downbeat. In conjunction with the tempo you estimated in the first stage, you can predict when the next beat is due, measure where you actually saw something like a beat, and adjust your estimate to more closely match the data. You can also maintain a confidence level based on how well the predicted beats match the measured peaks; if that drops too low, restart the beat detection from scratch.
There are a lot of fiddly details to this, and it took me some weeks to get it working nicely. It is a difficult problem.
Or for a simple visualisation effect, you could simply detect peaks and flash the screen for each one; it will probably look good enough.
The output of an FFT will give you the frequency spectrum of an audio sample, but extracting the tempo from the FFT output is probably not the way you want to go.
One thing you can do is to use peak detection to identify the volume "spikes" that typically occur on the "down-beats" of the music. If you can identify the down-beats, then you can use a resource like bpmdatabase.com to find the tempo of the song. The tempo will tell you how fast to flash and the peaks you detected will tell you when to start flashing. Have your app monitor your flashes to make sure that they generally occur at the same time as a peak (if the two start to diverge, then the tempo may have changed mid-song).
That may sound straightforward, but this is actually a very non-trivial thing to implement. You might want to read this SO question for more information. There are some quality links in the answers there.
If I'm completely mis-interpreting what you are trying to do and you need to do FFTs for something different, then you might want to look at using one of the existing FFT libraries to do the heavy lifting for you. Some examples are FFTW and KissFFT.
It sounds like maybe you're trying to get your visualizer to flash the screen in time with the music somehow. I don't think calculating the FFT is going to help you here. At any given instant, there will be many simultaneous frequency components, all over the audio spectrum (roughly 20 Hz to 20 kHz). But you're likely to be a lot more interested in the musical tempo (beats per minute -- more like 5 Hz or below), and that's not going to show up anywhere in an FFT of the raw audio signal.

You probably need something much simpler -- some sort of real-time peak detection. Whenever you see a peak greater than some threshold above the average volume, make your screen flash.
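That threshold test can be very small (a Python sketch; the threshold and decay constants are arbitrary placeholders, and `state` is a dict that persists between audio blocks):

import numpy as np

def should_flash(block, state, threshold=2.5, decay=0.99):
    # RMS level of the current audio block.
    level = float(np.sqrt(np.mean(block.astype(np.float64) ** 2)))
    avg = state.get('avg', level)
    state['avg'] = decay * avg + (1 - decay) * level   # slow-moving average volume
    return level > threshold * avg                     # flash on a sudden jump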
Of course, more complicated visualizations might well take advantage of the FFT, but not the one you're describing.
My recommendation would be to find a library that does this for you. Unless you have a lot of mathematics to back you up, I think you will be wasting a ton of time trying to learn FFTs, when all you really want is some sort of 'bass hits per minute' number with which you can adjust your graphics accordingly.
Check out this similar post: here
It took me about three weeks to understand the mathematics behind FFTs, and then another week to write something in Matlab using those concepts. If you are discouraged after three days, don't try to roll your own.
I hope this is helpful advice and not discouraging.
-Brian J. Stinar-
As previous answers have noted, an FFT is probably not the tool you need in order to solve your problem, which requires tempo detection rather than spectral analysis.
For an example of what can be done using an FFT - and of how a particular FFT implementation was integrated into a Qt application - take a look at this blog post, which describes the spectrum analyzer demo I developed. Code for the demo ships with Qt itself, in the demos/spectrum directory.