I have a simple request that selects a country on a map by its code (e.g. FR for France) in Google Charts:
http://chart.apis.google.com/chart?&cht=t&chs=440x220&chtm=europe&chco=f5f5f5,edf0d4,6c9642,13390a&chld=FR&chd=s:FR
I know that this can be extended to select several countries, but I cannot construct a proper request. Can someone give an example of how to do it (e.g. highlighting FR and IT)?
I think I've got the syntax more or less figured out:
http://chart.apis.google.com/chart?&cht=t&chs=440x220&chtm=europe&chco=f5f5f5,edf0d4,6c9642,13390a&chld=FR|IT|CZ|DE&chd=s:afhx
So one should use "|" to separate the country codes and then provide one data value per country using the simple alphanumeric encoding (one character per country).
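For what it's worth, a minimal Python sketch that builds such a multi-country URL; the helper name is just illustrative, while the palette and data string are the ones from the example above:

# Build a Google Chart map URL for several countries: chld takes the
# country codes separated by "|", and chd=s: supplies one simple-encoding
# character (A-Z, a-z, 0-9) per country, mapped onto the chco gradient.
BASE = "http://chart.apis.google.com/chart"

def map_url(countries, values, size="440x220", area="europe",
            colors="f5f5f5,edf0d4,6c9642,13390a"):
    assert len(countries) == len(values)  # one encoding char per country
    return (f"{BASE}?cht=t&chs={size}&chtm={area}&chco={colors}"
            f"&chld={'|'.join(countries)}&chd=s:{''.join(values)}")

print(map_url(["FR", "IT", "CZ", "DE"], list("afhx")))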
I am trying to add data to a Map with Danish postcodes, e.g. 1000-9000 (we have four digits in Denmark).
When I add them to a Map, the points scatter all over the world, as Power BI does not recognize them as Danish locations, even though my Power BI is set up in Danish and the Map shows Danish-spelled city names.
I tried to add the regions Jylland, Fyn, and Sjælland as a country hierarchy, but doing that placed Jylland (Jutland) in Norway...
I also tried to use city names instead of post codes, but then a city shows up in Sweden...
It makes no difference whether the post code column is in Text or Number format, and I have no option to use a Location format in the query.
Can anyone help me use Danish post codes for Map visualization? : )
Thanks
Ok, I solved it myself!
I found the place in the modelling part where I could force Power BI to accept my city names, region names, etc., and it now works.
In more detail: go to the middle one of the three left-side views, called Data (not Report, not Model), and click on the column whose format you want to change. Then find the Tools section and change the Data Category to, for example, Address or Country. Hope that helps.
I'm using Elasticsearch to build search for an ecommerce site.
One index will have the products stored in it; in the products index I'll store the categories along with the other attributes. A product can have multiple categories, but each attribute will have a single field value (e.g. color).
Let's say a user types in Black (color) Nike (brand) shoes (category).
I want to process this query so that I can extract the entities (brand, attribute, etc.) and write a request body search.
I have thought of the following options:
Applying a regex to the query first to extract those entities (but with this approach I'm not sure how fuzziness would work; the user may have a typo in any of the entities).
Using the OpenNLP extension (but that one only works at indexing time; in the scenario above we want it on the query side).
Using NER from any good NLP framework (this is not time- and cost-effective because I'll have millions of products in the engine, and they get updated/added frequently).
What's the best way to solve this?
Edit:
Found a couple of libraries that allow fuzzy text matching in regex (a sketch with one of them follows below), but there will be many entities to find, so what's the best way to optimise that?
Still not sure about OpenNLP
NER won't work in this case because there is a fixed number of entities, so the prediction is wrong when no entity is present in the query.
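For illustration, a minimal sketch of the fuzzy-regex idea using the third-party Python regex module, which supports approximate matching; the gazetteer lists are assumed examples:

# Requires: pip install regex (the third-party module, not stdlib re).
import regex

COLORS = ["black", "white", "red"]   # assumed example gazetteers
BRANDS = ["nike", "adidas"]

def find_entity(query, terms, max_edits=1):
    """Return the first term that fuzzily occurs in the query."""
    for term in terms:
        # {e<=N} tolerates up to N insertions/deletions/substitutions
        pattern = rf"(?:{regex.escape(term)}){{e<={max_edits}}}"
        if regex.search(pattern, query, regex.IGNORECASE):
            return term
    return None

print(find_entity("blak nike shoes", COLORS))  # -> 'black' (typo tolerated)
print(find_entity("blak nike shoes", BRANDS))  # -> 'nike'

With many entities, the usual optimisation is to compile one alternation per field instead of looping term by term; the matching idea stays the same.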
If you cannot achieve the desired results by tuning Elasticsearch's built-in scoring/boosting, most likely you'll need some kind of 'natural language query' (NLQ) processing:
Tokenize the free-form query. A regex can be used for splitting lexemes, but very often it is better to write a custom tokenizer.
Perform named-entity recognition to determine the possible field(s) for each keyword. At this step you will get associations like (Black -> color), (Black -> product name), etc. In fact, you don't need OpenNLP for this, as it should be just an index (keyword -> field(s)); you can try to use the Elasticsearch 'suggest' API for this purpose.
(optional) Recognize special phrases or combinations like "released yesterday" or "price below $20".
Generate the possible combinations of matches and, with the help of a special scoring function, determine the 'best' recognition result. The scoring function may be hardcoded (reflecting 'common sense' heuristics) or it may be the result of a machine learning algorithm.
From the recognition result (match metadata), produce a formal query to fetch the search results; this may be an Elasticsearch query with field hints, or even a SQL query. A sketch of these steps follows this list.
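Not a definitive implementation, but a minimal sketch of steps 1, 2, 4 and 5 under stated assumptions: a toy keyword -> field index standing in for the suggest-based one, and a deliberately naive scoring heuristic.

import itertools
import re

# A toy keyword -> field(s) index; in practice this would be built from
# the product catalogue (these entries are assumed examples).
KEYWORD_INDEX = {
    "black": ["color", "product_name"],
    "nike": ["brand"],
    "shoes": ["category"],
}

def tokenize(query):
    # Step 1: a trivial regex tokenizer; a custom one is often better.
    return re.findall(r"[a-z0-9]+", query.lower())

def recognize(query):
    # Step 2: look up candidate fields per token, fall back to full text.
    tokens = tokenize(query)
    candidates = [KEYWORD_INDEX.get(t, ["full_text"]) for t in tokens]
    # Step 4: enumerate combinations and score them; this heuristic
    # simply prefers specific fields over the full-text fallback.
    def score(combo):
        return sum(1 for field in combo if field != "full_text")
    best = max(itertools.product(*candidates), key=score)
    return list(zip(tokens, best))

def to_es_query(matches):
    # Step 5: turn the best recognition result into an Elasticsearch
    # bool query with field hints (and fuzziness for typos).
    return {"query": {"bool": {"must": [
        {"match": {field: {"query": token, "fuzziness": "AUTO"}}}
        for token, field in matches
    ]}}}

matches = recognize("Black Nike shoes")
# -> [('black', 'color'), ('nike', 'brand'), ('shoes', 'category')]
print(to_es_query(matches))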
In general, efficient NLQ processing needs significant development effort - I don't recommend implementing it from scratch unless you have enough resources and time for this feature. As an alternative, you can try to find an existing NLQ solution and integrate it, but most likely this will be a commercial product (I don't know of any good free/open-source NLQ components that are really ready for production use).
I would approach this problem as NER tagging, considering you already have a corpus of tags. My approach for this problem would be as below:
Create an annotated dataset of queries, with each word tagged with one of the tags, say {color, brand, category}.
Train an NER model (CRFs/LSTMs).
"This is not time & cost effective because I'll have millions of products in engine also they get updated/added on frequent basis"
To handle this situation, I suggest not using the words in the query as features, but rather the attributes of the words. For example, create an indicator function f(x', y) for word x with context x' (i.e. the word along with the surrounding words and their attributes) and tag y, which returns 1 or 0. A sample indicator function is below:
f('blue', y) = 1 if 'blue' is in the `color attribute` column of the DB, the word before 'blue' is in the `product attribute` column of the DB, and y is `color`; otherwise 0.
Create lots of these indicator functions, also known as feature maps.
These indicator functions are then used to train a model using CRFs or LSTMs. Finally, we use the Viterbi algorithm to find the best tagging sequence for the query. For CRFs you can use packages like CRFsuite or CRF++. With these packages, all you have to do is create the indicator functions, and the package will train a model for you. Once trained, you can use this model to predict the best sequence for your queries. CRFs are very fast.
Training this way, without using vector representations of the words, will let your model generalise without the need for retraining. [Look at NER using CRFs.]
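As a minimal sketch of this setup with sklearn-crfsuite (a Python wrapper around CRFsuite): the gazetteer sets below stand in for the DB columns, and the tiny training set is purely illustrative.

# Requires: pip install sklearn-crfsuite
import sklearn_crfsuite

COLOR_TERMS = {"black", "blue", "red"}   # stands in for the color column
BRAND_TERMS = {"nike", "adidas"}         # stands in for the brand column

def word_features(tokens, i):
    w = tokens[i].lower()
    # Attribute-based (gazetteer) features rather than the raw word,
    # so new products don't force retraining.
    return {
        "in_color_gazetteer": w in COLOR_TERMS,
        "in_brand_gazetteer": w in BRAND_TERMS,
        "prev_in_color": i > 0 and tokens[i - 1].lower() in COLOR_TERMS,
        "is_first": i == 0,
        "is_last": i == len(tokens) - 1,
    }

# A purely illustrative training set; real data would be annotated queries.
train = [(["black", "nike", "shoes"], ["COLOR", "BRAND", "CATEGORY"]),
         (["red", "adidas", "shoes"], ["COLOR", "BRAND", "CATEGORY"])]
X = [[word_features(toks, i) for i in range(len(toks))] for toks, _ in train]
y = [tags for _, tags in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)

query = ["blue", "nike", "shoes"]
print(crf.predict([[word_features(query, i) for i in range(len(query))]]))
# Viterbi decoding happens inside predict(); expected: COLOR BRAND CATEGORY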
I need your help.
I work for a survey company and I am responsible for creating its architecture and modeling a data warehouse that analyzes the results of an international survey (50 countries).
For the architecture, we decided to create a tabular model in Power BI to analyze our data and create our reports.
Below is the model as I envisioned it:
However, I have a design problem.
Since the survey is international, the wording of my dimensions differs from country to country.
My 1st question:
Would it make more sense to create only one Power BI embedded model for all countries, or 50 Power BI reports?
My 2nd question:
My model must be multilingual.
With my 50 countries, I have several languages (5 languages), and for the same language I have several variants.
The British English labels differ from the US English labels.
For example, in the Response dimension, for France IdReponse = 1 has the wording 'Vrai', while for the USA the wording is 'True' and for Britain it is 'OK'.
Do you know how to model multiple languages in a data warehouse?
About question #1 - It's always better if there is only one model; it will be much easier to maintain. It isn't clear from your question whether these 50 reports will show the same data (excluding the internationalization of texts like Vrai/True/OK), or whether each report/country should show its own subset of the data. In case all reports will show the same data, then it will definitely be better to make one common model and have all reports use it. You can do this in Power BI by making one "master" report and publishing it, and then having the rest of your "per country" reports use it as a data source. You will still need separate reports per country, because you will need to translate the texts (column names, static texts, etc.).
About question #2 - You can create lookup tables in your model (or even in the database; it's up to you). The key value (1) will be linked to the key of the table, and there will be one column per language. Depending on the language of the current report, you will select the appropriate column (e.g. French, British, etc.), and you can even fall back to, say, US English in case no translation has been entered for the current language (e.g. by making a computed column). Making a separate lookup table per language is also an option, but I think it would be more cumbersome to maintain that way.
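A minimal sketch of that lookup idea in Python; the labels come from the example in the question, while the language codes and fallback choice are assumptions:

# One row per key, one column per language, plus a fallback language.
LABELS = {
    # IdReponse: {language: label}
    1: {"fr-FR": "Vrai", "en-US": "True", "en-GB": "OK"},
}

def label_for(id_reponse, language, fallback="en-US"):
    row = LABELS.get(id_reponse, {})
    # Fall back to US English when no translation is entered.
    return row.get(language) or row.get(fallback)

print(label_for(1, "en-GB"))  # -> 'OK'
print(label_for(1, "de-DE"))  # -> 'True' (fallback)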
About question #1: Yes, you need only one data model.
About question #2: You load a question in the language it is asked in, and store the response as-is in the response DIM. You should then create a new column in your response DIM, such as Clean_response, where you transform the original response into a uniform value. For example, "Vrai", "OK", and "True" have the same meaning, so you may choose to put "Yes" in the Clean_response column. You can also convert different variations of "No" ("Nada", "noops", "nah") to a clean value of "No", but keep the original value too.
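A minimal sketch of that Clean_response transformation in Python; the variant lists are assumed examples:

# Map response variants onto uniform values, keeping the original too.
YES_VARIANTS = {"vrai", "ok", "true", "yes"}
NO_VARIANTS = {"no", "nada", "noops", "nah"}

def clean_response(original):
    v = original.strip().lower()
    if v in YES_VARIANTS:
        return "Yes"
    if v in NO_VARIANTS:
        return "No"
    return original  # leave unrecognised values untouched

rows = [{"Response": r, "Clean_response": clean_response(r)}
        for r in ["Vrai", "OK", "True", "nah"]]
print(rows)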
Labeling a column in the report should be handled in the report code. For example, a report written in French should use your dim column named "Question" but show "Interroger" as the heading on the report.
I am trying Tableau with data extracted from Salesforce. The input includes a "Country" field where the rows have different spellings for the same thing.
Example: Cananda, CANADA, CAnada etc.
Is there a way to fix this in Tableau?
The easiest solution is to create a group field based on your Country field.
Select Country in the data pane on the left sidebar, right-click, and choose Create Group. Select the elements that you want to group together and put them into a single group, say Canada, that contains all variations of the spelling.
This new group field initially has the name Country (group). You may want to rename it Country_Corrected. (Or, even better, rename the original field Country_Original and call the group field simply Country; then you can hide Country_Original.)
Groups are implemented using SQL CASE statements. They have many uses, but one application is to easily tolerate inconsistent spellings in your data source without having to change your data. In general, you can specify several transformations like this that take effect at query and visualization time. For very large data sets, or for very complicated transformations, you may eventually want to push some of them upstream in your data pipeline to get better performance. But make those optimizations later, once you've proven the necessity.
If the differences are just in case (upper vs. lower), you can right-click the Country dimension, create a calculated field called something like "New Country", and use the following formula to make the case consistent:
upper([Country])
Use this new "New Country" calculated dimension instead of your "Country" dimension, and it will group them all case-insensitively and display them in uppercase. (You can use "lower" instead of "upper" if preferred.)
I want to highlight a number of different regions (e.g. all counties) on a country map. Is this possible to achieve when using Google Charts, without using the default bubble-shaped markers? I want the highlights to look like regular drawn regions (i.e. custom regions).
I tried Raphael before and am considering switching back to it, as it has exactly what I'm after - http://raphaeljs.com/australia.html
I know this post is kind of old, but you should be able to do this; you'll have to hack the map a bit. If you look at the source of the map, you'll notice a bunch of path elements. Each path element has a "logicalname" property (not attribute). The logical name is just a bunch of data points separated by #'s. I believe the fourth index should be the region code, e.g. 002 for Africa or US for the United States. If you did:
$("path").each(function(){ if($(this).prop("logicalname").indexOf("#{REGIONCODE}#") != -1) {/*TO STUFF HERE*/} });
You should be able to target specific regions. Please note that the region codes supplied to the map are dependent on the map resolution.