Encoding ship data with AIS

How to encode vessel data (position, course, speed, etc.) as AIS (Automatic Identification System) AIVDM/AIVDO sentences.

You can find a PHP version of the encoder in my repo here: phpais.
It should be pretty straightforward to understand the code if you understand how AIS strings are composed.

This is a very vague question: it depends on which AIS message you want to send (see the second link for the different message types).
As you can see, it is a rather complex task, but basically you have to do what is done here (it-digin.com) in reverse order. There are several different message types; you can read more about the standard here: United States Coast Guard.
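Doing the decoding steps in reverse can be sketched for the simplest case, a message type 1 position report. This is a minimal Python sketch, not the phpais code: it packs the fields into a 168-bit string, converts each group of 6 bits to the AIS payload character set, and appends the NMEA checksum. Field widths and the "not available" default values follow the publicly documented AIVDM layout; `encode_type1` and the other helper names are hypothetical, and the single-fragment sentence on channel A with a zeroed radio-status field is an assumption made for illustration.

```python
# Sketch: encode an AIS message type 1 (position report) as an AIVDM sentence.
# Assumptions: single fragment, channel A, radio status left at zero.


def to_bits(value, width):
    """Two's-complement bit string of `value`, `width` bits wide."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def armor(bits):
    """Pack a bit string into AIS 6-bit ASCII; returns (payload, fill_bits)."""
    fill = (-len(bits)) % 6
    bits += "0" * fill
    chars = []
    for i in range(0, len(bits), 6):
        v = int(bits[i:i + 6], 2)
        # Values 0-39 map to '0'..'W', values 40-63 map to '`'..'w'.
        chars.append(chr(v + 48 if v < 40 else v + 56))
    return "".join(chars), fill


def nmea_checksum(body):
    """XOR of every character between '!' and '*', as two uppercase hex digits."""
    cs = 0
    for ch in body:
        cs ^= ord(ch)
    return f"{cs:02X}"


def encode_type1(mmsi, lat, lon, sog_knots, cog_deg, heading=511, nav_status=15):
    fields = [
        (1, 6),                        # message type
        (0, 2),                        # repeat indicator
        (mmsi, 30),                    # MMSI
        (nav_status, 4),               # navigation status, 15 = not defined
        (-128, 8),                     # rate of turn, -128 (0x80) = not available
        (round(sog_knots * 10), 10),   # speed over ground in 0.1 knot steps
        (0, 1),                        # position accuracy
        (round(lon * 600000), 28),     # longitude in 1/10000 minute
        (round(lat * 600000), 27),     # latitude in 1/10000 minute
        (round(cog_deg * 10), 12),     # course over ground in 0.1 degree steps
        (heading, 9),                  # true heading, 511 = not available
        (60, 6),                       # UTC second, 60 = not available
        (0, 2),                        # maneuver indicator
        (0, 3),                        # spare
        (0, 1),                        # RAIM flag
        (0, 19),                       # radio status (assumed zero here)
    ]
    bits = "".join(to_bits(v, w) for v, w in fields)  # 168 bits in total
    payload, fill = armor(bits)
    body = f"AIVDM,1,1,,A,{payload},{fill}"
    return f"!{body}*{nmea_checksum(body)}"
```

For example, `encode_type1(123456789, 51.5, -0.1, 12.3, 45.6)` returns a complete `!AIVDM,1,1,,A,...,0*hh` sentence. Other message types need their own field tables, and payloads longer than one sentence have to be split into multiple fragments.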

Did you try ais2csv? It works well with NMEA sentences:
https://github.com/dma-ais/AisLib/tree/master/ais-lib-cli/launch4j/ais2csv

Related

Google Natural Language Sentiment Analysis incorrect result

We have Google's Natural Language AI integrated into our product for sentiment analysis (https://cloud.google.com/natural-language). One of our customers complained that when they write "BAD", it shows a positive sentiment.
On further investigation, we found that when the Google Natural Language API's sentiment analysis is called with the input "BAD" or "Bad" (note: all caps or first letter capitalized), it identifies the text as an entity (a location or consumer good) and returns a positive sentiment, whereas "bad" in all lowercase returns a negative one.
Has anyone faced a similar problem? How did you solve it?
One obvious fix would be to convert the text to lowercase, but that may break other use cases (for example, where entities are no longer recognized because the text is lowercase). Another approach we are building is to check our own dictionary of words with sentiments before calling the Google APIs, but that doesn't solve the underlying problem, which may occur with any other text.
Any input would help. Thank you!
The NLP API uses an underlying model that is neural in nature; its knowledge comes from training on real-world text. It is normal to get different results for different capitalizations, as they can correspond to different uses of the same word, e.g. Mike (person), mike (microphone, slang), MIKE (military alphabet entry).
The second key aspect is that the model is tuned for, and meant to be used on, larger pieces of text rather than single words, so good results cannot be expected in this case.
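The dictionary-first workaround from the question can be kept narrow by applying it only to very short inputs, which is where the answer says the model is least reliable. A minimal sketch, assuming a hypothetical local `LEXICON` and a hypothetical `call_api` callable that wraps your existing Natural Language API request (the real client call is not shown):

```python
# Hypothetical lexicon with illustrative scores in [-1, 1]; not from Google.
LEXICON = {"bad": -0.8, "good": 0.8, "terrible": -0.9}


def score_sentiment(text, call_api, max_words=2):
    """Return a sentiment score, using the local lexicon for very short inputs.

    `call_api` is a hypothetical callable wrapping the real API request.
    """
    words = text.split()
    if len(words) <= max_words:
        # Short input: look each word up case-insensitively in the lexicon.
        hits = [LEXICON[w.lower()] for w in words if w.lower() in LEXICON]
        if hits:
            return sum(hits) / len(hits)
    # Longer input, or no lexicon hit: send the original text, casing untouched.
    return call_api(text)
```

Longer text is sent to the API with its original casing, so entity detection on normal sentences is unaffected; the `max_words` threshold and the lexicon scores are placeholders to tune.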

Use LZMA to encode a stream of information

The professor gave me a research paper that shows a way to efficiently compress some kind of data.
It's not worth explaining the full algorithm since the question is not about that; I'll just introduce a small example that should help you understand what the real question is about.
Our compression algorithm has its own dictionary, which is a table (no matter how it is calculated; just assume that both the compressor and the decompressor have it). Each table row holds a string.
To compress a message, the compressor starts from the beginning and searches for a match in the dictionary. If it finds one, it sends a MATCH message with the row id; if nothing is found, it sends a SET message containing the data to set.
Note that a MATCH does not have to be a complete match: it can be followed by any number of MISSMATCH messages, each containing the offset of the wrong byte and the correct byte.
So for example the compressor might want to encode:
Now, the paper says they entropy-encode this "stream" of data using LZMA, and treats this as trivial without giving further details.
I've searched online but didn't come up with anything. Do you have any idea how this last step could be done, or any references?
There is a stream compression algorithm with a preset dictionary using LZMA in this open-source project: Zip-Ada. There, the preset dictionary is called "training data".
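The paper doesn't say how the messages are laid out before entropy coding, so the following is only one plausible reading: serialize the MATCH/SET/MISSMATCH messages into a flat byte stream and hand that to a stock LZMA compressor (Python's standard `lzma` module here). The opcode values and the fixed-width little-endian field layout are assumptions, and this sketch does not use a preset dictionary the way Zip-Ada does.

```python
import lzma
import struct

# Hypothetical one-byte opcodes for the three message kinds in the question.
OP_MATCH, OP_SET, OP_MISSMATCH = 0, 1, 2


def serialize(messages):
    """Turn ("MATCH", row), ("MISSMATCH", offset, byte), ("SET", data) tuples into bytes."""
    out = bytearray()
    for msg in messages:
        if msg[0] == "MATCH":
            out += struct.pack("<BI", OP_MATCH, msg[1])                # opcode, row id
        elif msg[0] == "MISSMATCH":
            out += struct.pack("<BIB", OP_MISSMATCH, msg[1], msg[2])   # opcode, offset, byte
        elif msg[0] == "SET":
            out += struct.pack("<BI", OP_SET, len(msg[1])) + msg[1]    # opcode, length, data
        else:
            raise ValueError(f"unknown message {msg[0]!r}")
    return bytes(out)


def deserialize(buf):
    """Inverse of serialize()."""
    messages, i = [], 0
    while i < len(buf):
        op = buf[i]
        if op == OP_MATCH:
            (row,) = struct.unpack_from("<I", buf, i + 1)
            messages.append(("MATCH", row))
            i += 5
        elif op == OP_MISSMATCH:
            offset, value = struct.unpack_from("<IB", buf, i + 1)
            messages.append(("MISSMATCH", offset, value))
            i += 6
        elif op == OP_SET:
            (n,) = struct.unpack_from("<I", buf, i + 1)
            messages.append(("SET", bytes(buf[i + 5:i + 5 + n])))
            i += 5 + n
        else:
            raise ValueError(f"bad opcode {op} at byte {i}")
    return messages


def compress_stream(messages):
    # LZMA's match finder picks up repeated opcodes and row ids in the stream.
    return lzma.compress(serialize(messages), preset=9)


def decompress_stream(blob):
    return deserialize(lzma.decompress(blob))
```

Fixed-width fields keep the sketch simple. Splitting opcodes, row ids and literal bytes into separate streams before compressing may help LZMA further, but that is a design choice to measure, not something the paper states.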

How to read/restore big data file (SEGY format) with C/C++?

I am working on a project that needs to handle large seismic datasets in SEGY format (from several GB to TB). The data represents a 3D underground structure.
The data structure is like:
1st trace, 2,3,5,3,5,....,6
2nd trace, 5,6,5,3,2,....,3
3rd trace, 7,4,5,3,1,....,8
...
What I want to ask is: in order to read and process the data quickly, do I have to convert it into another form, or is it better to read from the original SEGY file? And is there an existing C package to do that?
If you need to access it multiple times and
if you need to access it randomly and
if you need to access it fast
then load it into a database once.
Do not reinvent the wheel.
When dealing with data of that size, you may not want to convert it into another form unless you have to, though some software does just that. I found a list of free geophysics software on Wikipedia that looks promising; many of the packages are open source and read/write SEGY files.
Since you are a newbie to programming, you may want to consider if the Python library segpy suits your needs rather than a C/C++ option.
Several GB is rather medium-sized, if we are talking about post-stack data.
You may use SEGY and convert on the fly, or you may invent your own format; it depends on what you need to do. Without changing the SEGY format, it's enough to create indexes to the traces. If the SEGY file is sorted by inlines, access along inlines is faster, although crossline access is not too bad.
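Building the trace index mentioned above, without converting the SEGY file, could look roughly like this Python sketch (Python only for brevity; the same seek/read logic maps directly to C). It assumes fixed-length traces, big-endian headers, a sample count and format code in the binary header, and inline/crossline numbers at the SEG-Y rev 1 trace-header bytes 189-192 and 193-196. Many real files deviate from these, so treat the byte positions as assumptions to check against your data; `build_trace_index` is a hypothetical name.

```python
import struct

# Bytes per sample for a few SEG-Y data sample format codes.
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 5: 4, 8: 1}


def build_trace_index(path):
    """Map (inline, crossline) -> file offset of each trace header.

    Assumes fixed-length traces and a non-negative count of extended
    textual headers (the rev 1 value -1, "variable", is not handled).
    """
    index = {}
    with open(path, "rb") as f:
        f.seek(3200)                                        # skip the textual header
        binhdr = f.read(400)
        ns = struct.unpack(">H", binhdr[20:22])[0]          # file bytes 3221-3222
        fmt = struct.unpack(">H", binhdr[24:26])[0]         # file bytes 3225-3226
        n_ext = struct.unpack(">h", binhdr[304:306])[0]     # file bytes 3505-3506
        trace_len = 240 + ns * BYTES_PER_SAMPLE[fmt]
        offset = 3600 + max(n_ext, 0) * 3200
        while True:
            f.seek(offset)
            hdr = f.read(240)
            if len(hdr) < 240:                              # end of file
                break
            inline, xline = struct.unpack(">ii", hdr[188:196])
            index[(inline, xline)] = offset
            offset += trace_len
    return index
```

Once built (and ideally saved to disk), the index turns any inline or crossline request into a list of seeks, e.g. `[idx[(il, xl)] for xl in crosslines]`.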
If it is 3D seismic, the best way to get equally quick access to all inlines and crosslines is to use your own format based on bins of, e.g., 8x8 traces: by loading the relevant bins and selecting traces from them, access time can be very quick (2-3 seconds). Alternatively, you may use an SSD, or have 2.5x as much RAM as the size of your SEGY file.
To quickly access time slices, you have two options: 3D bins, or a second file stored as time slices (the quickest way). I did something similar 10 years ago, and access time on a 12 GB SEGY was acceptable: 2-3 seconds in all three directions.
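The layout of 8x8-trace blocks described in this answer can be illustrated with a small index calculation. This is a sketch under assumed conventions (0-based inline/crossline indices, blocks numbered row by row, traces stored row by row inside each block); the function names are hypothetical.

```python
def bin_location(il, xl, n_xl, b=8):
    """Return (bin_number, slot_in_bin) for 0-based inline/crossline indices."""
    bins_per_row = (n_xl + b - 1) // b          # ceil(n_xl / b)
    bin_number = (il // b) * bins_per_row + (xl // b)
    slot = (il % b) * b + (xl % b)              # position of the trace inside its bin
    return bin_number, slot


def bins_for_inline(il, n_xl, b=8):
    """Bins that must be read to reconstruct one inline."""
    return sorted({bin_location(il, xl, n_xl, b)[0] for xl in range(n_xl)})


def bins_for_crossline(xl, n_il, n_xl, b=8):
    """Bins that must be read to reconstruct one crossline."""
    return sorted({bin_location(il, xl, n_xl, b)[0] for il in range(n_il)})
```

With 8x8 blocks, one inline touches about n_xl/8 blocks and one crossline about n_il/8 blocks, so both directions cost a similar number of reads, which is the point of the layout.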
SEGY in a database? Wow... ;)
The answer depends upon the type of data you need to extract from the SEG-Y file.
If you need to extract only the headers (textual header, binary header, extended textual file headers and trace headers), they can easily be extracted by opening the SEG-Y file in binary mode and reading the relevant information from the locations given in the data exchange format documents (rev2). The extraction might depend on the type of data (post-stack or pre-stack). Also, some headers might require conversion from one format to another (e.g. textual headers are mostly encoded in EBCDIC). The complete details about the byte locations and encoding formats can be found in that documentation.
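Reading the textual and binary headers as described above takes only a few lines. A minimal Python sketch, assuming an EBCDIC textual header (decoded with Python's built-in `cp037` codec) and the standard big-endian binary-header positions; `read_segy_headers` is a hypothetical helper, and ASCII textual headers or little-endian files would need adjusting.

```python
import struct


def read_segy_headers(path):
    """Read the textual header and a few binary-header fields of a SEG-Y file."""
    with open(path, "rb") as f:
        text = f.read(3200).decode("cp037")     # EBCDIC textual header (assumed)
        binhdr = f.read(400)
    sample_interval, = struct.unpack(">H", binhdr[16:18])  # bytes 3217-3218, microseconds
    ns, = struct.unpack(">H", binhdr[20:22])               # bytes 3221-3222, samples per trace
    fmt, = struct.unpack(">H", binhdr[24:26])              # bytes 3225-3226, sample format code
    n_ext, = struct.unpack(">h", binhdr[304:306])          # bytes 3505-3506, extended headers
    # The textual header is 40 "card images" of 80 characters each.
    lines = [text[i:i + 80] for i in range(0, 3200, 80)]
    return {
        "text": lines,
        "sample_interval_us": sample_interval,
        "samples_per_trace": ns,
        "format_code": fmt,
        "n_extended_text": n_ext,
    }
```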
Extracting the trace data is a bit trickier and depends on various factors, such as the sample encoding and whether the number of samples is given in the trace headers. A careful reading of the documentation and knowing what type of SEG-Y data you are working with will make this task a lot easier.
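Part of what makes trace data tricky is the sample encoding: many older files use format code 1, 4-byte IBM System/360 floating point, which is not IEEE 754 and must be converted by hand. The sketch below shows that conversion plus a few other common format codes; it assumes big-endian samples and is not taken from segpy or any other library.

```python
import struct


def ibm32_to_float(word):
    """Convert one 32-bit IBM System/360 float (SEG-Y format code 1) to a Python float."""
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F                   # base-16 exponent, biased by 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)  # 24-bit fraction in [0, 1)
    return sign * fraction * 16.0 ** (exponent - 64)


def decode_samples(raw, fmt):
    """Decode big-endian trace samples for a few common format codes."""
    n = len(raw)
    if fmt == 1:                                     # 4-byte IBM float
        words = struct.unpack(f">{n // 4}I", raw)
        return [ibm32_to_float(w) for w in words]
    if fmt == 5:                                     # 4-byte IEEE float
        return list(struct.unpack(f">{n // 4}f", raw))
    if fmt == 2:                                     # 4-byte two's-complement integer
        return list(struct.unpack(f">{n // 4}i", raw))
    if fmt == 3:                                     # 2-byte two's-complement integer
        return list(struct.unpack(f">{n // 2}h", raw))
    raise ValueError(f"format code {fmt} not handled in this sketch")
```

The IBM layout is one sign bit, a 7-bit base-16 exponent biased by 64, and a 24-bit fraction, so for example the word `0x41100000` decodes to 1.0.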
Since you are working with the extracted data, I would recommend using existing libraries (segpy is one of the best Python libraries I have come across). There are also numerous freely available SEG-Y readers; Daniel Waechter has already mentioned a very nice list, and you can choose whichever one suits your requirements and supports your file type.
I recently tried to do something similar in C++ (although it has only been tested on post-stack data). The project can be found here.

Compare two strings and find how closely they are related by meaning

Problem:
I have two strings, say, "Billie Jean" and "Thriller". I need to programmatically compare them and find how closely they are related. They are both songs by the same artist, so they should get a higher score (probability, percentage, etc.) than, say, "Brad Pitt" and "Jamaican Farewell".
One way of doing this is an open-source Java tool named WikipediaMiner, which compares terms using the Wikipedia data dump, checking links, descriptions, etc.
Question:
Please suggest a better alternative that uses any or all of Wikipedia, DBpedia, Freebase and their cousins, or combines a different approach. I would really prefer open-source software that can be downloaded and set up on a server (e.g. Apache Mahout), rather than a paid web service.
It's not so much a matter of programming, but of data.
So it's not really a question for StackOverflow.
What you really want is probably WordNet. It is meant as a database for reasoning about the meaning of words; for example, the data explicitly states that data mining is a form of data processing. And which is a physical entity...
You see, the reasoning will only be as good as your data.
DBpedia may also include a mapping from WordNet to Wikipedia?
You can't tell that "Thriller" is a song, and not a music video, a film genre or a Lambchop album, without additional context.
After you've identified what your items are, it's "simply" a matter of traversing the graph of connections in Freebase, MusicBrainz, or whatever other information sources you are using.
You'll need to decide how you're going to weight things for scoring, though. Are two Michael Jackson songs more closely related because they share the same type, or are they more closely related to the artist Michael Jackson because they're directly connected to him?
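The graph-traversal step, and the weighting question it raises, can be illustrated with a cheapest-path search (Dijkstra) over a toy graph. The nodes, edges and costs below are illustrative assumptions (in practice they would come from Freebase, MusicBrainz, DBpedia or similar), and scoring relatedness as `1 / (1 + path cost)` is just one possible choice.

```python
import heapq

# Hypothetical weighted edges: lower cost = stronger link.
# Direct artist links are cheap; shared-type links ("song") are expensive.
EDGES = [
    ("Billie Jean", "Michael Jackson", 1.0),
    ("Thriller", "Michael Jackson", 1.0),
    ("Billie Jean", "type:song", 3.0),
    ("Thriller", "type:song", 3.0),
    ("Jamaican Farewell", "type:song", 3.0),
    ("Brad Pitt", "type:actor", 3.0),
]


def build_graph(edges):
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))   # treat links as undirected
    return graph


def relatedness(graph, src, dst):
    """1 / (1 + cheapest path cost); 0.0 if the two nodes are not connected."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return 1.0 / (1.0 + d)
        if d > dist[node]:
            continue                                # stale heap entry
        for nxt, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return 0.0
```

Making direct artist links cheaper than shared-type links is one answer to the weighting question: here the two Michael Jackson songs score higher than a pair that only shares the type "song", and unconnected nodes score 0.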

Looking for Ideas: How would you start to write a geo-coder?

Because the open source geo-coders cannot begin to compare to Google's or even Yahoo's, I would like to start a project to create a good open source geo-coder. Just to clarify, a geo-coder takes some text (usually with some constraints) and returns one or more lat/lon pairs.
I realize that this is a difficult and gargantuan task, so I am wondering how you might get started. What would you read? What algorithms would you familiarize yourself with? What code would you review?
And also, assuming you were going to develop this in a very agile way, what would you want the first prototype to be able to do?
EDIT: Let's set aside the data question for now. I am going to use OpenStreetMap data, along with a database of waypoints that I have. I would later plan to include other data sets as well, and I realize the geo-coder would be inherently limited by the quality of the original data.
The first (and probably blocking) problem would be: where do you get your data from? (unless you are willing to pay thousands of dollars for proprietary sets).
I guess you could build a geocoding API on top of OpenStreetMap (they publish data dumps on a regular basis), but their data was still very incomplete the last time I checked.
Algorithms are easy. Good mapping data, however, is expensive. Very expensive.
Google drove their cars all over the world, collecting this data among other things.
From a .NET point of view these articles might be interesting for you:
Writing Your Own GPS Applications: Part I
Writing Your Own GPS Applications: Part 2
Writing GIS and Mapping Software for .NET
I've only glanced at the articles but they've been on CodeProject's 'Most Popular' list for a long time.
And maybe this CodePlex project which the author of the articles above made available.
I would start at the absolute beginning by figuring out how you're going to get the data that matches a street address with a geocode. Either Google had people going around with GPS units, OR they got the information from some existing source. That existing source may have been... (all guesses)
The Postal Service
Some existing (printed) maps
A bunch of enthusiastic users who were early adopters of GPS technology and were more than willing to enter street addresses and GPS coordinates
Some government entity (or entities)
Their own satellites
etc
I guess what I'm getting at is the information was either imported from somewhere or was input by someone via some interface. As my starting point I would look at how to get that information. In an open source situation, you may be able to get a bunch of enthusiastic people to enter information.
So for my first prototype, boring as it would be, I would create a form for entering information.
Then you need to know the math for calculating the shortest distance (as the crow flies). From there, try to figure out how to include roads. (My guess is you would need a data point for each and every curve, where you store the geocode location of the curve and the angle of the road on a north/south and east/west vector. You'd probably need to take incline into account, too, to get accurate road measurements.)
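For the "as the crow flies" distance, the usual starting point is the haversine formula. A minimal sketch, assuming a spherical Earth with mean radius 6371 km (it ignores the Earth's flattening, which a production geocoder might want to account for); `nearest` is a hypothetical helper showing how it would pick the closest known point:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest(point, candidates):
    """Return the (name, lat, lon) candidate closest to `point` = (lat, lon)."""
    return min(candidates, key=lambda c: haversine_km(point[0], point[1], c[1], c[2]))
```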
That's just where I'd start.
But in all honesty, I wouldn't even start on this. Other programmers have done it already, I'm more interested in what hasn't already been done.
get my free raw data from somewhere like http://ipinfodb.com/ip_database.php
load it into a database, denormalizing for fast lookups
design my API
build it out as a RESTful web service
return results in varying formats: JSON, XML, CSV, raw text
The first prototype should accept a ZIP code and return lat/lon in raw text.
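A first prototype along those lines might look like the sketch below: load a ZIP-to-coordinate table into an in-memory dictionary (the denormalized lookup) and serve `lat,lon` as plain text. The CSV column names `zip,lat,lon` are an assumption, not the format of any particular data source, and the standard-library WSGI app stands in for a real RESTful service.

```python
import csv
import io
from urllib.parse import parse_qs


def load_zip_table(csv_text):
    """Load 'zip,lat,lon' rows (assumed column names) into a dict keyed by ZIP."""
    table = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        table[row["zip"].strip()] = (float(row["lat"]), float(row["lon"]))
    return table


def lookup_raw(table, zip_code):
    """Return 'lat,lon' as raw text, or an empty string if the ZIP is unknown."""
    hit = table.get(zip_code.strip())
    return "" if hit is None else f"{hit[0]},{hit[1]}"


def make_app(table):
    """Minimal WSGI app: GET /?zip=NNNNN -> 'lat,lon' in text/plain."""
    def app(environ, start_response):
        zip_code = parse_qs(environ.get("QUERY_STRING", "")).get("zip", [""])[0]
        body = lookup_raw(table, zip_code)
        status = "200 OK" if body else "404 Not Found"
        start_response(status, [("Content-Type", "text/plain")])
        return [body.encode()]
    return app
```

To try it locally, something like `wsgiref.simple_server.make_server("", 8000, make_app(table)).serve_forever()` would serve `GET /?zip=...` requests.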