Address Standardization/Correction/Geocoding

My place of employment is looking into buying a third-party tool for batch-based US and Canadian address correction, with geocoding.
What products have you used?
What do you like about them?
What do you not like about them?
Note that we are a C/C++ Unix shop.

We use Melissa Data. They have a number of solutions, including geocoding and address normalization. They have good APIs and the support has been great. Their solutions work on many platforms and languages, including C and C++ on Unix. I can't think of anything negative about them.

We use Trillium in our office. They provide C# libraries that you can incorporate into your projects. The library takes an address and returns a pretty complex standardized address object, but it does include information like geocodes (which are important for calculating tax).

Related

Client-Server: Data Model and Protocol, best practice and examples? [duplicate]

I'm looking for a mechanism to serialize data to be passed over a socket or shared memory in a language-independent way. I'm reluctant to use XML since this data is going to be very structured, and encoding/decoding speed is vital. Having a good C API that's liberally licensed is important, but ideally there should be support for a ton of other languages. I've looked at Google's Protocol Buffers and ASN.1. Am I on the right track? Is there something better? Should I just implement my own packed structure and not look for some standard?

Given your requirements, I would go with Google Protocol Buffers. It sounds like it's ideally suited to your application.

You could consider XDR. It has an RFC. I've used it and never had any performance problems with it. It was used in ONC RPC and comes with a tool called rpcgen. It is also easy to create a generator yourself when you just want to serialize data (which is what I ended up doing for portability reasons; it took me half a day).
There is an open source C implementation, but it may already be in a system library, so you wouldn't need the sources.
ASN.1 always seemed a bit baroque to me, but depending on your actual needs might be more appropriate, since there are some limitations to XDR.
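To give a feel for how little is involved in hand-rolling an XDR-style encoder, here is a minimal sketch of two of the basic encodings as I understand them from RFC 4506: a 32-bit unsigned integer is four bytes, most significant first, and a string is its length followed by the bytes, zero-padded to a multiple of four. The function names are made up for this illustration and are not the API of any XDR library or of rpcgen output.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper names; this is a sketch, not an existing XDR library.

// Append a 32-bit unsigned integer: 4 bytes, most significant byte first.
void xdr_put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Append a string: length as u32, the raw bytes, then zero padding so the
// item occupies a multiple of 4 bytes.
void xdr_put_string(std::vector<std::uint8_t>& out, const std::string& s) {
    xdr_put_u32(out, static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
    const std::size_t pad = (4 - s.size() % 4) % 4;
    out.insert(out.end(), pad, std::uint8_t{0});
}

// Read a 32-bit unsigned integer at pos; advances pos on success.
bool xdr_get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos,
                 std::uint32_t& v) {
    if (pos > in.size() || in.size() - pos < 4) return false;
    v = (static_cast<std::uint32_t>(in[pos]) << 24) |
        (static_cast<std::uint32_t>(in[pos + 1]) << 16) |
        (static_cast<std::uint32_t>(in[pos + 2]) << 8) |
        static_cast<std::uint32_t>(in[pos + 3]);
    pos += 4;
    return true;
}
```

Because every item is padded to four bytes, a reader can walk the buffer without any per-field alignment logic, which is a large part of why a home-grown generator is feasible.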

Just wanted to throw ASN.1 into the mix. ASN.1 is a format standard, but there are libraries for most languages, and the C interface via asn1c is much cleaner than the C interface for protocol buffers.

JSON is really my favorite for this kind of stuff. I have no prior experience with binary stuff in it though. Please post your results if you are planning on using JSON!

Thrift is a binary format created by Facebook. Here's a comparison with Google Protocol Buffers.

Check out Hessian

There is also Binary XML but it seems not stabilized yet. The article I link to gives a bunch of links which might be of interest.

Another option is SNAC/TLV, which is used by AOL in its OSCAR/AIM protocol.

Also check out Muscle. While it does quite a bit, it serializes to a binary format.

A few things you need to consider:
1. Storage
2. Encoding style (1-byte or 2-byte)
3. TLV standards
An ASN.1 parser is a good choice for binary representations. The best part is that ASN.1 is a well-established technology that is widely used both within ITU-T and outside of it. The notation is supported by a number of software vendors.
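For readers who have not met TLV (tag-length-value) before, a minimal sketch of the idea follows. The layout here (1-byte tag, 2-byte big-endian length, then the value) is an assumption chosen for illustration. It is not ASN.1 BER, and the names `Tlv`, `tlv_encode` and `tlv_decode` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type: one tag byte plus an opaque value.
struct Tlv {
    std::uint8_t tag;
    std::vector<std::uint8_t> value;
};

// Assumed layout: 1-byte tag, 2-byte big-endian length, value bytes.
// Returns false if the value does not fit in a 16-bit length.
bool tlv_encode(std::vector<std::uint8_t>& out, const Tlv& item) {
    if (item.value.size() > 0xFFFF) return false;
    out.push_back(item.tag);
    out.push_back(static_cast<std::uint8_t>(item.value.size() >> 8));
    out.push_back(static_cast<std::uint8_t>(item.value.size() & 0xFF));
    out.insert(out.end(), item.value.begin(), item.value.end());
    return true;
}

// Decode one record starting at pos; advances pos past it on success.
bool tlv_decode(const std::vector<std::uint8_t>& in, std::size_t& pos,
                Tlv& item) {
    if (pos > in.size() || in.size() - pos < 3) return false;  // header incomplete
    const std::size_t len =
        (static_cast<std::size_t>(in[pos + 1]) << 8) | in[pos + 2];
    if (in.size() - pos - 3 < len) return false;               // value truncated
    item.tag = in[pos];
    item.value.assign(in.begin() + static_cast<std::ptrdiff_t>(pos + 3),
                      in.begin() + static_cast<std::ptrdiff_t>(pos + 3 + len));
    pos += 3 + len;
    return true;
}
```

The width of the length field is exactly the "encoding style" decision from the list above: a 1-byte length caps values at 255 bytes, while a 2-byte length costs one extra byte per record.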

C++ library for reading data matrix code

I am looking for a C++ library for reading Data Matrix codes, specifically ECC 200 codes (so not QR codes). I have found libdmtx and zxing. zxing is Java, but there seems to be a C++ port. Does anyone have experience with reading ECC 200 codes with these libraries, or possibly with other libraries?

The DM support in the C++ port of ZXing is up to date with the Java (not true of many of the 1D codes). It's not enabled by default in the test apps but is easy to enable (and will be enabled by default in the future.)
I don't have any personal experience with actually using the DM decoder but it is included in the test suites and I believe available in the Android app.

Here's a real answer then.
I have used both libdmtx and libzxing successfully. Libdmtx was more straightforward, because it's limited to Data Matrix codes. In my experience the results were, strangely enough, not always deterministic.
Libzxing is fine as well, but in real production (millions of readouts) it will sometimes crash because its memory management is not perfect. It's really good, but not perfect for a real production environment.
Both libraries, libzxing and libdmtx, require the Data Matrix to be dead center in the image and quite large. That means you need to do pre-localisation yourself.
I managed to do this by just using image routines and looking for the 'L' shape, and then some smartness with a minimal-area square bounding box, etc. For the decoding and error correction step itself I used libzxing, which still isn't perfect.
If you go for a production environment, either do everything yourself within your own constraints or, if you are not comfortable with doing that, use a paid package, which in turn is never perfectly suited to your application and costs money.
The best port of libzxing-cpp is that of user glassechidna: https://github.com/glassechidna/zxing-cpp

I am currently trying to use libdmtx
http://www.libdmtx.org/
It has support for all kinds of interfaces. It seems to have good reviews here and in other places….
(But I am looking for help on building the utilities :-)

Since no "real" answer was posted to my question, at least no answer from someone with experience with one of these libraries for reading 2D matrix codes, I thought I would post my own experience.
I tried both libraries and both could read codes, but the performance was not good enough for my situation. In my situation the codes are frequently not "perfect": dots can be missing or have different sizes, and the code can be a bit skewed. Both libraries had problems reading these codes.
In the end I used a commercial (not free) library, Sapera. Sapera was able to read the non-perfect codes much better. I used Sapera because it was used at my company in the past, but it is quite possible that other commercial machine vision libraries (like Halcon) also perform well.

I have used Halcon extensively, including for decoding Data Matrix codes. I can tell you that it works really well. Even with distortions caused by, for example, reading off a circular body, or skewed prints, it is still able to read them very well, in a short amount of time.
The only downside, and a big one, is the price. The runtime license is very expensive, and you need a development license before you can purchase a runtime license, which is even more expensive. Unless your project has enough funds, this might not be an option. Good luck!

Sampling from a discrete probability distribution in C++

I am new to C++ and extremely surprised by the lack of accessible, common probability manipulation tools (i.e. the lack of things in Boost and the standard library). I've done a lot of scientific programming in other languages, but the standard and/or ubiquitous third-party add-ons always include a full range of probability tools. A friend billed up Boost to be the equivalent ubiquitous add-on for C++, but as I read the Boost documentation, even it seems to have a dearth of what I would consider extremely elementary built-ins.
I cannot find a built in that takes some sort of array of discrete probabilities and produces an index chosen according to those probabilities. I can of course write my own function for this, but I just wanted to check whether I am missing a standard way to do this.
Having to write my own functions at such a low-level is a bad thing, I feel, but I am writing a new simulation module for a larger project that is all in C++. My usual go-to tactic would be to write it in Python and link the Python to the C++, but because several other people are going to have to manage this code once I finish it, and none of them know Python, I think it would be more prudent to deliver it to them all in C++.
More generally, what do people do in C++ for things like sampling from standard distributions, in particular something as basic as a multi-variate normal distribution?

Perhaps I'm misunderstanding your intention, but it seems to me what you want is simply std::discrete_distribution.
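A minimal sketch of how that might look with the C++11 `<random>` header. The helper name `sample_counts` and its parameters are made up for this illustration. The weights do not need to be normalized, since the distribution divides by their sum.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draw `draws` indices according to `weights` and
// return how often each index came up. Index i is chosen with probability
// weights[i] / sum(weights).
std::vector<std::size_t> sample_counts(const std::vector<double>& weights,
                                       std::size_t draws, unsigned seed) {
    std::vector<std::size_t> counts(weights.size(), 0);
    if (weights.empty()) return counts;  // nothing to sample from

    std::mt19937 rng(seed);  // fixed seed, so runs are repeatable
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());

    for (std::size_t i = 0; i < draws; ++i) {
        ++counts[dist(rng)];  // dist(rng) yields an index in [0, weights.size())
    }
    return counts;
}
```

For example, `sample_counts({1.0, 9.0}, 10000, 42)` should put roughly nine tenths of the draws on index 1. Constructing the distribution once and reusing it, as above, avoids rebuilding its internal table on every draw.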

(Moved from comment.)
Did you look at Boost.Math.StatisticalDistributions? Specifically, its Discrete Probability Distributions?
Boost is not a library, it's a collection of libraries, so it can sometimes be difficult to find exactly what you're looking for – but that doesn't mean it isn't there. ;-]

As mentioned, you'll want to look at boost/math/distributions and friends to meet your needs.
Here's a very good, detailed tutorial on getting these working for you in Boost. You may also want to throw your weight behind stan as well, which looks quite promising within this space.

Boost's math libraries are pretty good for working with different distributions, but if you are only interested in sampling (as in the problem you mentioned in your post), then looking at the boost Random libraries might be more germane to your task. This link shows how to simulate rolling a weighted die, for example.

You should do less C++ bashing, and more question asking - we try to be helpful and respectful on SO. Questions like yours are often tagged as inflammatory.
Boost::math seems to provide exactly what you're looking for: https://www.quantnet.com/cplusplus-statistical-distributions-boost/ - I'm not 100% sure about how well it handles multi-variate distributions though (nor am I an expert on statistics).
Get it here: http://www.boost.org/doc/libs/1_49_0/libs/math/doc/html/index.html
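On the multivariate normal part of the question: one common approach that needs only the standard library is to take a Cholesky factor L of the covariance matrix and compute x = mean + L z, where z holds independent standard normal draws. The sketch below assumes a small dense covariance stored as nested vectors. The names `Matrix`, `cholesky` and `sample_mvn` are hypothetical and not part of Boost or the standard library.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Lower-triangular Cholesky factor L with L * L^T == cov.
// Throws if cov is not (numerically) positive definite.
Matrix cholesky(const Matrix& cov) {
    const std::size_t n = cov.size();
    Matrix L(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j <= i; ++j) {
            double sum = cov[i][j];
            for (std::size_t k = 0; k < j; ++k) sum -= L[i][k] * L[j][k];
            if (i == j) {
                if (sum <= 0.0)
                    throw std::domain_error("covariance not positive definite");
                L[i][i] = std::sqrt(sum);
            } else {
                L[i][j] = sum / L[j][j];
            }
        }
    }
    return L;
}

// One draw x = mean + L * z, with z a vector of independent N(0, 1) values.
template <class Rng>
std::vector<double> sample_mvn(const std::vector<double>& mean,
                               const Matrix& L, Rng& rng) {
    std::normal_distribution<double> std_normal(0.0, 1.0);
    const std::size_t n = mean.size();
    std::vector<double> z(n);
    for (double& zi : z) zi = std_normal(rng);

    std::vector<double> x(mean);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j <= i; ++j) x[i] += L[i][j] * z[j];
    return x;
}
```

Factor the covariance once with `cholesky` and reuse L for every draw. For large or ill-conditioned matrices a linear algebra library would be a safer choice than this hand-written factorization.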

What are the advantages of developing applications in C++ as compared to managed languages?

Hi I want to know why people develop library applications and employee management applications in C++ (this application, for example), when clearly the same thing can be done in C# and VB.NET in a much prettier way. Even banking applications are mostly in C++. Is there a good reason why, apart from the fact that compiled C++ code executes faster?
Can anyone shed some light on this?

C: 1972
C++: 1979
C#: 2000
Now think of the lifetime of a library, especially in a bank. Plus, you get to use the libraries (theoretically) on almost every computer system in existence (as opposed to C#).
You will also still find a lot of COBOL (1960) there.

The main reasons for using C++ in, say, banking applications are:
Legacy code. A large financial firm typically has ~10-20-30 years of business specific C/C++ libraries developed in-house, plus a bunch of business specific vendor libraries which may not be available for C#
A LOT of that financial code runs on Unix/Linux. While you can purely theoretically build C# code for Linux, it's definitely NOT an established technology you want to bet billion dollar amounts on.

C++ is usable on other types of systems, whereas C# and VB.NET are not.

Apart from technical reasons (such as C++ being an "unmanaged" language with quite different capabilities and properties than the .NET languages), this can simply be due to preferences. Not all people find that C# and VB.NET are the best tool for every task. Or the prettiest. Why do you think so? And why should others not have similarly good reasons for choosing another tool of their liking?
Update, in reply to Konrad's comment:
It's correct that "preference" is indeed too narrow a term. There are other facets to it:
Managers / bosses can turn their (possibly badly informed) preferences into business policies;
A corporation's decade-old codebase can mean that when it comes to choosing the programming language for some new task, you'll evaluate languages with a different perspective. You want to or need to reuse the existing code, so interop with the old code's language must be possible.

It might be a factor of the knowledge economy of a particular company. For example, the bigger a company gets, or the less staff turnover they have, the harder it will be to replace competence, process and tooling to accommodate, for example, a new language. C/C++ has been around for quite some time, and many developers as well as development shops have that background.
Concerning banking applications, the reason is, I would guess, mostly because you have a close to metal environment which allows you to utilise realtime programming in a dependable fashion.

Every language has its pros and cons and no one language is best for every application. Programs in C++ are harder to write, but can take advantage of platform-specific hardware and features. Because they're compiled, they also tend to run a bit faster. C# programs are easier to write, but aren't able to access underlying resources and can't be ported to non-Windows platforms very easily.
In short, it really depends on the application needs. If you need raw speed and explicit resource management, go with C++. If you want ease of coding and clarity, go with C#.

Fast sketching tools for drawing C/C++ structs, pointers, etc

I would like to know what you use to sketch relations between different entities in C/C++. This can be a very broad issue, so I'll try to clarify my question a bit more and give an example.
I'm looking for something that is simple enough for the user and lets me easily sketch containers, pointers, etc. in an informal way.
The aim is to document some structs relations to pass them to junior developers. A look at the drawings is supposed to accelerate the understanding of the code.
My solutions at this moment are to use:
1) Paper & pencil.
2) Microsoft PowerPoint/Word Autoshapes.
3) Freeware Dia.
Other ones could be:
4) Microsoft Visio, but my company does not own licenses.
5) UML tools. I don't want to go this way. This is what I mean a more formal solution.
I know tools like Rational Rose are xxx, and I tried boUML and violet and they are fine in some parts, but I prefer the flexibility of options 1), 2) or 3).
Finally, let me write down a more concrete example:
Let's say I want to sketch a map that contains another map as the mapped value, and that one contains a struct as the mapped value, which holds a vector of pointers to one type and a pointer to another type. Also, there exist other structs that hold pointers to the objects pointed to by the previous map, so there are objects pointed to from different places.
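In code, the arrangement described above might look roughly like the sketch below. Every name in it is hypothetical, and both the string keys and the use of `std::shared_ptr` for the shared objects are assumptions made for illustration; the real code could just as well use raw pointers.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// All type and member names are placeholders for this illustration.
struct Item {};   // the type held through the vector of pointers
struct Other {};  // the "other type" that is pointed to

// The struct stored as the mapped value of the inner map.
struct Record {
    std::vector<std::shared_ptr<Item>> items;
    std::shared_ptr<Other> other;
};

// Outer map -> inner map -> Record (string keys are an assumption).
using InnerMap = std::map<std::string, Record>;
using OuterMap = std::map<std::string, InnerMap>;

// Another struct pointing at objects that are also reachable through the
// maps, so the same object is referenced from more than one place.
struct Observer {
    std::shared_ptr<Item> watched_item;
    std::shared_ptr<Other> watched_other;
};
```

The diagram needs to show three things this snippet only implies: the nesting of the containers, the ownership of each `Item`, and the cross-links from `Observer` into objects held by the maps.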
This is just one example I have, but you can easily come up with one from your experience.
What would you use to sketch this example or a similar one you have dealt with?
Best regards,
Tomas.

Visio is great for quickly creating these types of illustrations / diagrams. I recommend at least trying to get your company to purchase a license.
If Visio is truly not an option for you, the next step may be to consider Open Source alternatives to Visio.

I have two things I use.
My whiteboard. Whiteboards are really tough to beat for diagramming something quickly.
UMLPad. It's small, so it doesn't have a ton of unrelated features to deal with, it is targeted to UML diagrams, and it is GPL.

For design issues, involving thoughts by a good many people, we've used "Post-It Design". The idea is simple:
Pick a whiteboard
Represent an entity as a Post-It (name + some comments)
Draw the relationships on the white board moving the post-its around as required
And when you're done? Take a photo of the whole thing for posterity and email it to the people involved :)
It may seem artisanal, but it really reminds me of the paper design approach to GUIs.

Have you tried Google Docs Drawing? The link is one of the diagrams I've done with it.

I like yuml as a very easy way to create diagrams, that also keep that informal look. And no real drawing needed :)

I would use graphviz, but since you say "something that is simple enough as a user", dia is probably a better alternative.

I've used ArgoUML but you'll have to decide whether it's simple enough for what you have in mind.

For the sake of completeness: there's also StarUML, which is (Windows) freeware and lets you create UML diagrams pretty quickly.

Visual Paradigm UML: the community edition is free and is good enough for sketching.
Open Office Draw works for most of what you want to do too.

You specify that you are not just using UML. Most tools these days are aimed specifically at UML, so you may want to look for a generic drawing tool.
On some projects I use Open Office Draw, because the company doesn't allow me to use other software (the company won't want to pay). It's similar to PowerPoint, or a reduced, simplified version of Corel Draw:
http://www.openoffice.org/
In other cases, I have tried both commercial and open source apps, but didn't like them.
At home, I use (paid software):
http://www.novagraph.com/
along with Open Office Draw.
This one is also good (paid software):
http://www.smartdraw.com
Good luck.

If you have an existing codebase you wish for a developer to understand (it sounds like you are trying to help junior devs come up to speed quicker), why not run your code through doxygen?
With various output types and the ability to draw class hierarchies it really is a useful tool. The added benefit of something like the HTML output is that you don't have to cram everything into a finite amount of space, since all relationships are hyperlinked. Users can just browse the source, at a type level, without having to worry much about the details.
