How is a country defined by its latitude and longitude - web-services

I am designing a database to store hotel details, and I have to classify hotels by country, state, city and region, each of which is defined as a separate table. The hotel table has a foreign key to each of them, plus the hotel's latitude and longitude.
But I also have to define each country, state, city and region by its latitude and longitude. A simple min/max latitude and longitude (a bounding box) isn't enough, as some cities may be round or otherwise irregular, and a box would introduce significant error.
How do I define the global position of a city? I need a reasonable error rate (say 20%).

I think the concept you are looking for is the centroid. Calculating these on your own would be quite difficult. You should probably use a geocoding API like the one provided by Google.
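To make the idea of a centroid concrete, here is a minimal sketch, assuming you already have a boundary polygon for the city as (longitude, latitude) pairs and that the area is small enough to treat the coordinates as planar. The function name and the sample polygon are hypothetical; for real boundaries a geocoding API or a GIS library is probably the safer route.

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    points: list of (x, y) vertices in order, e.g. (longitude, latitude).
    Coordinates are treated as planar, which is only an approximation
    on a globe and gets worse for large regions.
    """
    area2 = 0.0  # twice the signed area of the polygon
    cx = 0.0
    cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) and area = area2 / 2, hence 3 * area2.
    return cx / (3 * area2), cy / (3 * area2)


# Hypothetical city outline: a 2 x 2 degree square.
city_outline = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center_lon, center_lat = polygon_centroid(city_outline)
```

Storing the centroid together with an approximate radius per city is one way to get a position that tolerates the kind of error margin mentioned in the question.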

Related

AWS Personalize items attributes

I'm trying to implement personalization and am having problems with the Items schema.
Imagine I'm Amazon: I have products, their brands and their categories. In what kind of Items schema should I include this information?
Should I include brand name as string as categorical field? Should I rather include brand ID as string or numeric? or should I include both?
What about categories? I've the same questions.
Metadata Fields
Metadata includes string or non-string fields that aren't required or don't use a reserved keyword. Metadata schemas have the following restrictions:
Users and Items schemas require at least one metadata field.
Users and Interactions datasets can contain up to five metadata fields. An Items dataset can contain up to 50 metadata fields.
If you add your own metadata field of type string, it must include the categorical attribute. Otherwise, Amazon Personalize won't use the field when training a model.
https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html
There are basically two ways to include your metadata in Items/Users datasets:
If it can be represented as a numeric value, provide the actual value if it makes sense.
If it can be represented as a string, provide the string value and make sure that categorical is set to true.
But let's look at why they need you to mark your string metadata as categorical. The answer is pretty simple.
Let's start with an example.
If your Items were Amazon.com products and you wanted to provide a ratings metadata field, then:
You could take all of the ratings, including the full review text sent by customers, and simply put them in a metadata field.
You could take just the star ratings, calculate the average, and put that in a metadata field.
The second option probably makes more sense in general. Having random, long product reviews as metadata changes pretty much nothing. Personalize doesn't understand whether the review itself is good or bad, or whether the author also recommends another product, so it doesn't really add anything to the recommendations.
However, if you simply "cut" your dataset and calculate the average rating, as in the second option, it makes a lot more sense. Maybe some of your customers like crappy products? Maybe they want to buy them because they are famous YouTubers who create videos about them? Based on their previous interactions and much more, Personalize will be able to perform slightly better, because now it knows that this product has a rating of 5/5 or 3/5.
I wanted to show you that in some cases, providing Items metadata as a string makes no sense. That's why your string metadata must be categorical. It means that it should be a finite set of values, so that it adds some knowledge for Personalize about a given Item and why some people might want to interact with it.
Going back to your question:
Should I include brand name as string as categorical field? Should I rather include brand ID as string or numeric? or should I include both?
I would simply go with brand ID as a string. You could also go with brand name, but a brand can be renamed while still being the same brand, so the ID is more stable. Also, two different brands could have the same name because they operate in different markets; using the ID solves that too.
The "categorical": true switch in your schema just tells Personalize:
Hey, do you see that string field? It's a categorised, finite set of values. If you train a model for me, please include this one during the training, it's important!
And as the documentation says, if you provide a string metadata field that is not marked as categorical, then Personalize will "think":
Hmm... this field is a string, it has pretty random values and it's not marked as categorical. It's probably just a leftover from an Items export job. Let's ignore that.
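As an illustration of the advice above, here is a minimal sketch of an Items schema built as a Python dict and serialized to the JSON that Personalize expects. The metadata field names BRAND_ID, CATEGORY_ID and AVERAGE_RATING are assumptions chosen for this example, not names taken from the question; check the linked documentation for the current list of required and reserved fields.

```python
import json

# Sketch of an Items schema: ITEM_ID is the required identifier,
# the remaining fields are hypothetical metadata fields.
items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        # String metadata must be marked categorical to be used in training.
        {"name": "BRAND_ID", "type": "string", "categorical": True},
        {"name": "CATEGORY_ID", "type": "string", "categorical": True},
        # Numeric metadata is provided as the actual value.
        {"name": "AVERAGE_RATING", "type": "float"},
    ],
    "version": "1.0",
}

# JSON text suitable for passing as the schema definition.
schema_json = json.dumps(items_schema, indent=2)
```

The brand and category IDs are stored as strings even if they look numeric, because they are labels rather than quantities that can be meaningfully compared or averaged.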

Map locations from ASA to Power BI

I have a Stream Analytics job that sends location latitude and longitude values to Power BI. I am trying to create a map on powerbi.com from these locations. I am creating a map visual and adding my latitude and longitude fields to the map's Latitude and Longitude fields, but nothing is generated. It expects a Location field as well. What data should I provide there?
Yes. You need something in the Location field (but it can be a timestamp or an ID or anything). That's what decides how many bubbles you have. Then, for each bubble, it averages the latitude and longitude to determine where to put the bubble.
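The grouping behaviour described above can be sketched as follows. This is only an illustration of the idea (one bubble per distinct Location value, placed at the average coordinates), not Power BI's actual implementation; the function name and sample rows are hypothetical.

```python
from collections import defaultdict


def bubble_positions(rows):
    """Return one averaged (latitude, longitude) per location key.

    rows: iterable of (location_key, latitude, longitude) tuples.
    """
    # For each key keep running sums: [lat_sum, lon_sum, count].
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for key, lat, lon in rows:
        acc = sums[key]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {key: (s[0] / s[2], s[1] / s[2]) for key, s in sums.items()}


# Two readings share the key "device-a", so they collapse into one bubble.
readings = [
    ("device-a", 10.0, 20.0),
    ("device-a", 20.0, 40.0),
    ("device-b", 5.0, 5.0),
]
positions = bubble_positions(readings)
```

Using a unique ID or timestamp as the Location value therefore gives one bubble per row, while a shared value merges rows into a single averaged point.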
The Location field is for specifying a location by name, such as a city, rather than by coordinates. If you provide "London", for instance, it will place the shape at the center of London as defined by the maps provider.
As for longitude and latitude, they have to be in a specific format for Power BI to recognize them as map fields.
Take a look at the links below for further info on how to change the data type using the editor.
http://community.powerbi.com/t5/Desktop/How-to-map-latitude-longitude/td-p/2479
http://www.radacad.com/how-to-do-power-bi-mapping-with-latitude-and-longitude-only
Hope this helps.

Modeling Data - Invoices and Line Items

I'm creating a web based point of sale (think cash register) solution with Django as the backend. I've always taken the 'classic' approach of modeling invoices and their line items.
InvoiceTable
    id
    date
    customer
    salesperson
    discount
    shipping
    subtotal
    tax
    grand_total
    [...]
InvoiceLineItems
    invoice_id // foreign key
    product_id
    unit_price
    qty
    item_discount
    extended_price
    [...]
After attempting to research best practices, I've found that there aren't many - at least no definitive source that's widely used.
The Kimball Group suggests: "Rather than holding onto the operational notion of a transaction header “object,” we recommend that you bring all the dimensionality of the header down to the line items."
See http://www.kimballgroup.com/2007/10/02/design-tip-95-patterns-to-avoid-when-modeling-headerline-item-transactions/ and http://www.kimballgroup.com/2001/07/01/design-tip-25-designing-dimensional-models-for-parent-child-applications/.
I'm new to development (only having used desktop database software before) - but from my understanding this makes sense as we can drill down the data any way we want for reporting purposes (though I imagine we could do the same with the first method by joining the tables).
My Questions
The invoice ID will need to be repeated for each row (so we can generate data like totals for the invoice). Is this an intentional feature of this way of modeling the data?
We often have invoice level data like notes, discounts, shipping charges, etc. - How do we represent these using this method? Some discounts are product specific - so they belong on the line item anyway, others are invoice wide (think of a deal where you buy two separate products and receive a discount on the two) - could we somehow allocate it across the line items? Same with shipping charges, allocate it by dividing it among the line items?
What do we do with invoice 'notes' - we have printed and/or internal notes, would we put the data in the line items and just repeat it for each line item? That seems to go against data normalization. Put it in a related table?
Any open source projects that use this method that I could take a look at? Not sure how to search for them.
It sounds like you're confusing relational design and dimensional design.
A relational design is for facilitating transaction processing, and minimizing data anomalies and duplication. It's your operational database. A dimensional design is for facilitating analysis.
A relational design will have an invoices table and a line_items table and a dimensional design will have a company_invoices_customer fact table with a grain of invoice line item.
Since this is for POS, I assume you want a relational design first.
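As a rough illustration of the relational shape described above, here is a minimal sketch using Python's built-in sqlite3 module so that it runs without a Django project. The column names, the use of integer cents for prices, and the sample rows are assumptions for this example; a real POS schema would carry many more fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    id       INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    notes    TEXT              -- invoice-level note lives on the header
);
CREATE TABLE line_items (
    id               INTEGER PRIMARY KEY,
    invoice_id       INTEGER NOT NULL REFERENCES invoices(id),
    product_id       INTEGER NOT NULL,
    unit_price_cents INTEGER NOT NULL,
    qty              INTEGER NOT NULL
);
""")

conn.execute(
    "INSERT INTO invoices (id, customer, notes) VALUES (?, ?, ?)",
    (1, "Acme", "Deliver to rear door"),
)
conn.executemany(
    "INSERT INTO line_items (invoice_id, product_id, unit_price_cents, qty) "
    "VALUES (?, ?, ?, ?)",
    [(1, 101, 250, 2), (1, 102, 1000, 1)],
)


def invoice_subtotal_cents(conn, invoice_id):
    """Sum the extended price of every line item on one invoice."""
    row = conn.execute(
        "SELECT COALESCE(SUM(unit_price_cents * qty), 0) "
        "FROM line_items WHERE invoice_id = ?",
        (invoice_id,),
    ).fetchone()
    return row[0]


subtotal = invoice_subtotal_cents(conn, 1)
```

Here the subtotal is derived from the line items at query time instead of being stored on the header, which avoids the two values drifting apart.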
As for your questions:
First, there are tons of good data modelling patterns for this scenario. See https://dba.stackexchange.com/questions/12991/ready-to-use-database-models-example/23831#23831
The invoice ID will need to be repeated for each row (so we can
generate data like totals for the invoice). Is this an intentional
feature of this way of modeling the data?
Yes
We often have invoice level data like notes, discounts, shipping
charges, etc. - How do we represent these using this method?
Probably easiest/simplest to have a "notes" field on the invoice table.
For charges and discounts you should use abstraction (see Table Inheritance), and add them as Order Adjustments. See the book by Silverston in the link above.
Some discounts are product specific - so they belong on the line item
anyway, others are invoice wide (think of a deal where you buy two
separate products and receive a discount on the two) - could we
somehow allocate it across the line items?
The price of the item should be calculated at runtime based on its default price and any discounts or charges that apply in the current "scenario" (for example, discounts for government, nearby, or on a sale day). You could have hierarchical line items that reference each other, to keep things in order. Again, see the Silverston book.
What do we do with invoice 'notes' - we have printed and/or internal
notes, would we put the data in the line items and just repeat it for
each line item?
If you want line item notes, add a notes column on the line items table.
That seems to go against data normalization. Put it in a related
table?
If notes are nullable, and you want to be strict about normalization, then yes, add an invoice_notes table.

Django models question

I am working on a Django model and, not being a database expert, I could use some advice. I essentially have a model that has a many-to-many relationship with another model, but I need to store unique values for each relationship each time I include something.
So for instance in chemistry you may have many elements that include hydrogen, but each element has a unique amount of hydrogen in it. So for instance a water entry would be connected to hydrogen and oxygen and the amount would be two hydrogen atoms and one oxygen.
I want hydrogen and water in this scenario to be stored in the database as elements, so I can query against them for other elements using them.
What is the best way to model this?
Thanks!
Read the Django documentation on extra fields on many-to-many relationships and pay close attention to the Beatles example; it's exactly what you need.
Person -> Element
Group -> Chemical_Compound
Membership -> Element_2_Chemical
Element_2_Chemical should have an int field which details how many elements you have in each chemical compound.
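The mapping above boils down to a join table that carries an extra column. Here is a minimal sketch of that shape using Python's built-in sqlite3 module so it runs without a Django project; in Django the same structure would be declared with a `through` model on the many-to-many field. The column name `atom_count` and the sample data are assumptions for this example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE element (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE chemical_compound (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
-- The "through" table: one row per (compound, element) pair,
-- with the extra value stored on the relationship itself.
CREATE TABLE element_2_chemical (
    compound_id INTEGER NOT NULL REFERENCES chemical_compound(id),
    element_id  INTEGER NOT NULL REFERENCES element(id),
    atom_count  INTEGER NOT NULL,
    PRIMARY KEY (compound_id, element_id)
);
""")
db.executemany("INSERT INTO element (id, name) VALUES (?, ?)",
               [(1, "hydrogen"), (2, "oxygen")])
db.execute("INSERT INTO chemical_compound (id, name) VALUES (?, ?)",
           (1, "water"))
db.executemany("INSERT INTO element_2_chemical VALUES (?, ?, ?)",
               [(1, 1, 2), (1, 2, 1)])


def composition(db, compound_name):
    """Return {element name: atom count} for one compound."""
    rows = db.execute(
        "SELECT e.name, link.atom_count "
        "FROM element_2_chemical AS link "
        "JOIN element AS e ON e.id = link.element_id "
        "JOIN chemical_compound AS c ON c.id = link.compound_id "
        "WHERE c.name = ?",
        (compound_name,),
    ).fetchall()
    return dict(rows)


water = composition(db, "water")
```

Because the count lives on the relationship row, the same element can appear in many compounds with a different amount in each.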
In your metaphor, you say "I want hydrogen and water in this scenario to be stored in the database as elements, so I can query against them for other elements using them."
Does it mean that "water" may be on any part of the relationship you are modeling? Does "water" relate to "hydrogen" in (almost) the same way as "milk" relates to "water"?
If the answer is Yes, then you should use a directed-acyclic-graph model (hopefully, you won't have cycles in your relationship: A->B->C->A). Look into the django-dag ( http://pypi.python.org/pypi/django-dag/ ) and django-treebeard-dag ( http://pypi.python.org/pypi/django-treebeard-dag/0.2 ) packages.
If the answer is No, so you have a clear distinction between what's a "container" and what's a "containee", use a normal many-to-many relationship between two different models, like the "Membership" example in the Django documentation ( https://docs.djangoproject.com/en/dev/topics/db/models/#extra-fields-on-many-to-many-relationships ).
In any case you'll have to add more info to the "edge" of the relationship.
Following your chemical metaphor strictly, you may not be modeling enough information, because some molecules have the same composition but a different structure (they are called "isomers"). For instance, pentane, 2-methylbutane and 2,2-dimethylpropane all have five carbons and twelve hydrogens, but they are very different from one another...
My point is that an "enhanced many-to-many" is generally a complex model, so take care not to leave anything out of it.

How to find if lat/long falls in an area using Django and geopy

I'm trying to create a Django app that would take an inputted address and return a list of political races that person would vote in. I have maps of all the districts (PDFs). And I know that I can use geopy to convert an inputted address into coordinates. How do I define the voter districts in Django so that I can run a query to see what districts those coordinates fall in?
This is a non-trivial problem too large in scope to answer in specific detail here. In short, you'll need to use GeoDjango (part of contrib). There is a section dedicated to importing spatial data.
Once you have your data loaded, you can use spatial lookups to find what district a particular coordinate intersects.
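To show what such a lookup computes conceptually, here is a minimal pure-Python sketch of a point-in-polygon test using ray casting. It is not GeoDjango's implementation (GeoDjango delegates this to the spatial database), and the function names and the sample districts are hypothetical; it also treats coordinates as planar and ignores holes and multi-part districts.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon: list of (lon, lat) vertices in order, without repeating
    the first vertex at the end.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y0 > lat) != (y1 > lat):
            # Longitude where the edge crosses that horizontal line.
            x_cross = x0 + (lat - y0) * (x1 - x0) / (y1 - y0)
            if lon < x_cross:
                inside = not inside  # each crossing to the right flips state
    return inside


def districts_containing(lon, lat, districts):
    """Return the names of all districts whose polygon contains the point."""
    return [name for name, poly in districts.items()
            if point_in_polygon(lon, lat, poly)]


# Hypothetical districts as simple squares.
districts = {
    "district-1": [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)],
    "district-2": [(4.0, 0.0), (8.0, 0.0), (8.0, 4.0), (4.0, 4.0)],
}
matches = districts_containing(2.0, 2.0, districts)
```

In a real project the district polygons would be stored in geometry fields and the same question answered with a spatial lookup in the database, which also handles indexing and projections.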
As to where to get the voter district data, you might start with www.data.gov's geodata catalog.