Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
Where can I find some GPS unit test data to validate my code?
For example:
Distance between two coordinates (miles / kilometers)
Heading/bearing from Point A to Point B
Speed from Point A to Point B given a duration
Right now I'm using Google Earth to fumble around with this, but it would be nice to know I'm validating my calculations against something, well, valid.
"GPS unit test data" is quite vague. You could easily have a pile of data, but if you don't know what they represent, what value are the tests?
If you're looking for a math sample of latitude/longitude calculations, check out the example on Wikipedia's Great Circle distances article: http://en.wikipedia.org/wiki/Great-circle_distance#Worked_example It has two points and works the math to compute the distance between them.
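As a sanity check alongside the Wikipedia worked example, the three quantities from the question can be sketched in Python. This is a minimal sketch that assumes a spherical Earth with a mean radius of 6371.0088 km, so results can differ slightly from ellipsoidal (WGS84) calculations; the function names are illustrative and do not come from any library.

```python
import math

EARTH_RADIUS_KM = 6371.0088   # assumption: mean Earth radius, spherical model
KM_PER_MILE = 1.609344

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) just above 1 for antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point A to point B, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0

def speed_kmh(distance_km, seconds):
    """Average speed in km/h over a leg that took `seconds` seconds."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return distance_km / (seconds / 3600.0)

def km_to_miles(km):
    return km / KM_PER_MILE
```

A quarter of the way around the equator, from (0, 0) to (0, 90), makes a handy test case: the expected distance is the radius times pi/2 and the bearing is due east (90 degrees).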
Or are you looking for the data that comes directly from a GPS unit? These are called NMEA sentences. An NMEA sentence from a GPS receiver begins with $GP (a "$" followed by the two-character talker ID "GP"), and the next 3 characters are the sentence code, followed by the sentence data. http://aprs.gids.nl/nmea/ has a list.
You could certainly Google for "sample nmea data". The magnalox site appears to have some downloadable sample files, but I didn't check them to see if they'd be useful to you.
A better option would probably be to record your own data. Connect your laptop to your GPS unit, set it up to capture the serial data being emitted from the GPS, set the GPS to record your track, and take it for a short test drive. You can then compare how you processed the captured data based on what you know from the stored track (and from your little drive.) You could even have a web cam record the screen of the GPS to record heading/bearing information that doesn't arrive in the sentences.
Use caution if screen scraping NMEA sentences from a web site. Every valid NMEA sentence begins with "$"; those carrying the GPS talker ID begin with "$GP".
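One way to be cautious with scraped sentences is to verify the checksum before using them. In NMEA 0183 the two hex digits after the "*" are the XOR of every character between the "$" and the "*". A minimal sketch in Python; the function names are illustrative, and the sample sentence body in the usage note is only an example.

```python
def nmea_checksum(body):
    """XOR of all characters between '$' and '*' (both excluded)."""
    checksum = 0
    for ch in body:
        checksum ^= ord(ch)
    return checksum

def format_nmea(body):
    """Wrap a sentence body in '$...*hh' with its computed checksum."""
    return "${}*{:02X}".format(body, nmea_checksum(body))

def is_valid_nmea(sentence):
    """True if the sentence starts with '$' and its checksum matches."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, given = sentence[1:].rpartition("*")
    try:
        return nmea_checksum(body) == int(given[:2], 16)
    except ValueError:
        # missing or non-hex checksum field
        return False
```

For example, `is_valid_nmea(format_nmea("GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,"))` returns `True`, while the same sentence with one digit altered is rejected.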
RandomProfile offers randomly generated valid NMEA sentences.
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
This is my first post on here, so please excuse any mistakes.
I have a column of cells. Each cell contains a variable number of lines. Most lines contain a date, but the format varies slightly: sometimes it is MM/DD/YYYY, sometimes MM/DD/YY, etc. My goal is to extract the date associated with a specific word in each line. Also, each cell is on a row with an identifying number, so I need the output to be along the same row.
Example:
I have tried every extract date formula I can find and I have run into three problems:
how to pull multiple dates from the cell,
how to compensate for the fact that some rows have dates that are formatted differently, and
how to pull dates only associated with certain words on the same line as the date.
It appears that my best option would be to use Regular Expressions. However, I have just started playing around with VBA and every function I have found that seems related to my issue I have been unable to adapt to my specific problem. I was using this post as a guide to build my function initially, but I cannot get it to work: Extracting Multiple Dates from a single cell
Originally, I tried breaking the lines up by doing text to column and this formula:
=IF(SEARCH("Red",D2),DATE(MID(D2,SEARCH("??/??/20??",D2)+6,4),MID(D2,SEARCH("??/??/20??",D2),2),MID(D2,SEARCH("??/??/20??",D2)+3,2)), "No Red Date")
However, text to columns did not work because of irregular spacing. Blue 1 and Blue 2 are just there to handle cells that contain multiple Blue dates, which is often the case.
NOT AN ANSWER: This doesn't really need code; you can use MID, FIND and SUBSTITUTE. After a quick play, I used the following:
=IF(FIND(C$1,$B2,1)-11<11,MID($B2,1,10),MID($B2,FIND(C$1,$B2,1)-11,10))
Which gives this,
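For the regular-expression route the question mentions, the extraction logic can be sketched as follows. The sketch is in Python rather than VBA, purely to illustrate the idea; the function name, the sample keywords and the rule that a two-digit year means 20YY are assumptions, not part of the original workbook.

```python
import re
from datetime import date

# M/D/YYYY or M/D/YY, with one- or two-digit month and day
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4}|\d{2})\b")

def dates_for_keyword(cell_text, keyword):
    """Return every date found on lines of the cell that contain `keyword`."""
    found = []
    for line in cell_text.splitlines():
        if keyword.lower() not in line.lower():
            continue  # this line belongs to a different word
        for month, day, year in DATE_RE.findall(line):
            y = int(year)
            if y < 100:
                y += 2000  # assumption: two-digit years are 20YY
            try:
                found.append(date(y, int(month), int(day)))
            except ValueError:
                pass  # looks like a date but is not one, e.g. 13/45/2020
    return found
```

Calling the function once per word of interest (for example "Red" and "Blue") returns one list per word, which can then be written along the row of the cell it came from.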
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
I have to create an application that reads a live data feed from more than 200 tables simultaneously and processes the data. I want to discuss the best approach for optimum speed, given that each table receives 20+ records every minute. So far I can think of the following solutions:
1) Use multiple threads, each independently handling some 20-odd symbols.
2) Use two threads, one for reading data and one for processing it, although the reader thread will take longer because it has to read all tables sequentially.
My database is MySQL and I am not looking to move to a NoSQL database right now. I am using C++ to solve this problem. I feel that if, instead of 200+ tables, I could get the live data feed in a single table, my second approach would be more appropriate and faster.
Is the use of MySQL required? If not, you might get a speed increase from a NoSQL database. Furthermore, retrieving data from a database is often the bottleneck; at that data volume you generally want to load as much as you can into RAM and read it from there, as that is much faster.
You could write a query that retrieves only the data newer than a certain timestamp (the timestamp at which your last query executed), load that into memory, do all the speed-critical operations there, and clean up old entries that are no longer required.
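The "only fetch what is new" query can be sketched as follows. The sketch uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name `ticks_abc` and the columns `id`, `ts` and `price` are hypothetical. It keeps a high-water mark of (timestamp, id) rather than the timestamp alone, so rows that share the last-seen timestamp are not skipped.

```python
import sqlite3

def fetch_new_rows(conn, table, last_seen):
    """Return (rows, new_last_seen) for rows newer than last_seen = (ts, id).

    `table` is interpolated into the SQL, so it must come from a fixed,
    trusted list of table names, never from user input.
    """
    ts, row_id = last_seen
    cur = conn.execute(
        f"SELECT id, ts, price FROM {table} "
        "WHERE ts > ? OR (ts = ? AND id > ?) "
        "ORDER BY ts, id",
        (ts, ts, row_id),
    )
    rows = cur.fetchall()
    if rows:
        newest = rows[-1]
        last_seen = (newest[1], newest[0])  # (ts, id) of the newest row
    return rows, last_seen

def make_demo_db():
    """Create an in-memory database with one hypothetical feed table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ticks_abc (id INTEGER PRIMARY KEY, ts INTEGER, price REAL)"
    )
    return conn
```

Each polling pass calls `fetch_new_rows` once per table with that table's own high-water mark and hands the returned rows to the in-memory processing step.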
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I'm working on a research project and am assigned to do a bit of data scraping and writing code in R that can help extract current temperature for a particular zip code from a site such as wunderground.com. Now this may be a bit of an abstract question but does anyone know how to do the following:
I can extract the current temperature of a particular zip code by doing this:
temps <- readLines("http://www.wunderground.com/q/zmw:20904.1.99999")
edit(temps)
temps # gives me the source code of the page, where I can look up the line that contains the temperature
ldata <- temps[lnumber]
ldata
# then have a few gsub functions that basically extracts
# just the numerical data (57.8 for example) from that line of code
I have a CSV file that contains the zip code of every city in the country, and I have imported it into R. It is arranged in a table by zip, city and state. My challenge now is to write a method (using a Java analogy here because I'm new to R) that extracts 6-7 consecutive zip codes (after a particular one specified) and runs the above code for each, modifying the link within the readLines function by putting the respective zip code after the link segment zmw:XXXXX. I don't quite know how to extract the data from the table. Maybe with a for loop? But then I don't know how to use that to modify the link, and that is where I'm really stuck. I have a bit of Java background, so I understand HOW to approach this problem; I just don't know the syntax. I understand this is quite an abstract question as I didn't provide a lot of code, but I just want to know the functions/syntax that will help me extract the data from the table and use it to modify the link through a function rather than doing it manually.
So this is about the Weather Underground data.
You can download csv files from individual weather stations in wunderground, however you need to know the weather station identifier. Here is an example URL for a weather station in Kirkland, WA (KWAKIRKL8):
http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KWAKIRKL8&day=31&month=1&year=2014&graphspan=day&format=1
Here is some R code:
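library(RCurl)  # provides getURL()
url <- 'http://www.wunderground.com/weatherstation/WXDailyHistory.asp?ID=KWAKIRKL8&day=31&month=1&year=2014&graphspan=day&format=1'
s <- getURL(url)
s <- gsub("<br>\n","",s)  # strip the HTML line breaks embedded in the CSV output
wdf <- read.csv(text = s)  # parse the cleaned text into a data frame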
And here is a page with which you can manually find stations and their codes.
http://www.wunderground.com/wundermap/
Since you only need a few you can pick them out manually.
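For the loop asked about in the question, the logic is: find the starting zip code in the table, take the next few rows, and substitute each zip into the URL. A minimal sketch in Python (the same steps map onto a `for` loop with `paste0()` in R); the column names `zip`, `city` and `state` follow the question's description, the URL pattern is the one from the question, and the function names and sample rows are made up for illustration.

```python
import csv
import io

# URL pattern taken from the question; {zip} is the part that changes
URL_TEMPLATE = "http://www.wunderground.com/q/zmw:{zip}.1.99999"

def load_rows(csv_text):
    """Parse CSV text with a zip,city,state header into a list of dicts.

    Zip codes stay strings so that leading zeros are preserved.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))

def urls_after(rows, start_zip, count):
    """Build URLs for the `count` zip codes that follow `start_zip` in the table."""
    zips = [row["zip"] for row in rows]
    start = zips.index(start_zip)  # raises ValueError if the zip is not present
    return [URL_TEMPLATE.format(zip=z) for z in zips[start + 1:start + 1 + count]]
```

Each returned URL can then be passed to the page-reading step in turn, with the temperature extraction applied to each result.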
Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 7 years ago.
I defined a structure in C and read all the data into that structure using fread. What confuses me is what the actual "audio data" is: does it mean the original sample data?
And how can we extract frequencies from that audio data?
I can read the data successfully, but I can't work out what to do next.
Please explain.
You can easily read a WAV file; just follow this document.
https://ccrma.stanford.edu/courses/422/projects/WaveFormat/
As for extracting frequencies from the file, you would need to apply a Fourier transform to your data, which converts it from the time domain (amplitude versus time) to the frequency domain (amplitude versus frequency).
http://en.wikipedia.org/wiki/Fast_Fourier_transform
An audio file, typically, consists of a header and "samples". The samples can be 8, 16 or 32 bit and integer or floating point. Some audio files store the audio samples in a compressed form (mp3 for example), where others store the data as "raw samples".
To analyse the frequency, you need to perform a Fourier transform, which will give you an array of "how much at this frequency". A production-quality fast Fourier transform (FFT) is quite complex to describe, so it is usually taken from a library.
If the samples are in integer form, you'll have to convert from integer to floating point by dividing each sample by the max value (255, 32767 or 2^31-1).
Here's a package of C++ code to do FFT. There are several others out there.
http://fftwpp.sourceforge.net/
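The normalisation and transform steps described above can be sketched in a few lines of Python. This is a naive discrete Fourier transform, not a fast one: it gives the same magnitudes as an FFT but takes O(n^2) time, so it is only suitable for short blocks. It assumes 16-bit signed mono samples that have already been read from the file; the function names are illustrative.

```python
import cmath
import math

def normalize_int16(samples):
    """Convert 16-bit signed integers to floats in [-1.0, 1.0).

    Dividing by 32768 (rather than 32767) maps -32768 to exactly -1.0.
    """
    return [s / 32768.0 for s in samples]

def dft_magnitudes(x):
    """Magnitude of each frequency bin from 0 up to the Nyquist bin (naive DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def dominant_frequency(x, sample_rate):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = dft_magnitudes(x)
    peak_bin = max(range(1, len(mags)), key=mags.__getitem__)  # skip bin 0 (DC)
    return peak_bin * sample_rate / len(x)

def sine_int16(freq_hz, sample_rate, n):
    """Generate n samples of a full-scale 16-bit sine wave, for testing."""
    return [int(32767 * math.sin(2 * math.pi * freq_hz * t / sample_rate)) for t in range(n)]
```

Bin k of an n-sample block corresponds to the frequency k * sample_rate / n, which is how the peak bin is turned back into hertz.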
Here is another example of performing the FFT. This one displays the results in a Windows GUI.
http://www.relisoft.com/Freeware/index.htm
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
I have a set of training data consisting of 20 multiple choice questions (A/B/C/D) answered by a hundred respondents. The answers are purely categorical and cannot be scaled to numerical values. 50 of these respondents were selected for free product trial. The selection process is not known. What interesting knowledge can be mined from this information?
The following is a list of what I have come up with so far-
A study of percentages (Example - Percentage of people who answered B on Qs.5 and got selected for free product trial)
Conditional probabilities (Example - What is the probability that a person will get selected for free product trial given that he answered B on Qs.5)
Naive Bayesian classifier (This can be used to predict whether a person will be selected or not for a given set of values for any subset of questions).
Can you think of any other interesting analysis or data-mining activities that can be performed?
The usual suspects like correlation can be eliminated as the response is not quantifiable/scoreable.
Is my approach correct?
It is kind of reverse engineering.
For each respondent, you have 20 answers and one label, which indicates whether this respondent gets the product trial or not.
You want to know which of the 20 questions are critical to the trial/no-trial decision. I'd suggest you first build a decision tree model on the training data and study the tree carefully for insights; e.g. the decision nodes closest to the root contain the most discriminating questions.
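Short of building a full tree, the questions can be ranked by information gain, which is the criterion an ID3-style decision tree uses to choose its root split. A minimal sketch, assuming each respondent is a dict mapping question names to answers and the labels are 1 (selected) or 0 (not selected); the function names are illustrative.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(answers, labels):
    """Reduction in label entropy obtained by splitting on one question's answers."""
    groups = defaultdict(list)
    for answer, label in zip(answers, labels):
        groups[answer].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def rank_questions(responses, labels):
    """Return (question, gain) pairs sorted from most to least informative."""
    questions = responses[0].keys()
    gains = [
        (q, information_gain([r[q] for r in responses], labels))
        for q in questions
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)
```

A question whose answers split the respondents perfectly into selected and not selected scores the full entropy of the labels (1 bit for a 50/50 split), while a question unrelated to selection scores close to 0.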
The answers can be made numeric for analysis purposes, example:
RespondentID  IsSelected  Q1AnsA  Q1AnsB  Q1AnsC  Q1AnsD  Q2AnsA ...
12345         1           0       0       1       0       0
Use association analysis to see if there are patterns in the answers.
Q3AnsC + Q8AnsB -> IsSelected
Use classification (such as logistic regression or a decision tree) to model how users are selected.
Use clustering. Are there distinct groups of respondents? In what ways are they different? Use the "elbow" or scree method to determine the number of clusters.
Do you have other info about the respondents, such as demographics? A pivot table would be good in that case.
Is there missing data? Are there patterns in the way that people skipped questions?
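The 0/1 encoding and the association rule shown earlier in this answer can be sketched as follows. The column naming (`Q1AnsA`, `IsSelected`) follows the answer's example; the function names are illustrative. Support is taken here as the fraction of all respondents for whom both sides of the rule hold, and confidence as the fraction of respondents matching the left-hand side who were also selected.

```python
def one_hot(answers, questions, choices="ABCD"):
    """Turn {'Q1': 'C', ...} into {'Q1AnsA': 0, ..., 'Q1AnsC': 1, ...}."""
    return {
        f"{q}Ans{c}": int(answers[q] == c)
        for q in questions
        for c in choices
    }

def encode_row(answers, is_selected, questions):
    """One encoded respondent: the one-hot answer columns plus the IsSelected label."""
    row = one_hot(answers, questions)
    row["IsSelected"] = int(is_selected)
    return row

def rule_stats(rows, antecedent, consequent="IsSelected"):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    matching = [r for r in rows if all(r[col] == 1 for col in antecedent)]
    both = sum(1 for r in matching if r[consequent] == 1)
    support = both / len(rows)
    confidence = both / len(matching) if matching else 0.0
    return support, confidence
```

For the rule in the answer, `rule_stats(rows, ["Q3AnsC", "Q8AnsB"])` reports how often respondents who answered C on question 3 and B on question 8 were selected.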