I need your help.
I work for a survey company and I am responsible for creating its architecture and modeling a data warehouse that analyzes the results of an international survey (50 countries).
For the architecture, we decided to create a tabular model in PowerBI to analyze our data and to create our reports.
Here below is the model as I thought:
However, I have a design problem.
Since the survey is international, the wording of my dimensions differs from country to country.
My 1st question:
-Would it make more sense to create only one PowerBI embedded model for all countries or 50 PowerBI reports?
My 2nd question:
My model must be multilingual
With my 50 countries, I have several languages (5 languages) and for the same language, I have several variants.
The British English labels differ from the US English labels.
For example, for the Response dimension for France the IdReponse = 1 has the wording 'Vrai' while for the USA the wording is 'True' and for the Britain is 'OK'.
Do you know how to model multi language in a data warehouse?
About question #1 - It's always better, if there is only one model. It will be much easier to maintain. It isn't clear from your question will these 50 reports show the same data (excluding the internationalization of texts like Vrai/True/OK), or each report/country should show it's own subset of the data. In case all reports will show the same data, then definitely it will be better to make one common model and all report use it. You can do this with Power BI by making one "master" report and publishing it, and then the rest of your "per country" reports use it as a data source. And you will need separate reports per country, because you will need to translate the texts (column names, static texts, etc.).
About question #2 - You can create lookup tables in your model (maybe even in the database, it's up to you). The key value (1) will be linked to the key of the table, and there will be columns per language. Depending on the language of the current report, you will select the appropriate column (e.g. French, British, etc.) and even you can fallback to let's say US English, in case there is no translation entered for the current language (e.g. by making a computed column). It is also an option to make separate lookup table per language, but I think it will be more cumbersome to maintain this way.
About question #1: Yes you need only one data model.
About question #2: You Load a question in the language it is asked and the response you get as is in the response DIM. You should create a new column in your response DIM such as Clean_response where you transformed original response to a uniformed value. for example "Vrai", "OK", "True" has same meaning so you may chose to put "Yes" in the Clean_response column. You can also convert different variation of "No", "Nada", "noops", "nah" to a clean value of "No", but keep the original value too.
Labeling a column in the report should be handle in the report code. For example writing a report in French should use your dim column name "Question" and show it as "interroger" as a heading on the report.
Related
I'm trying to implement personalization and having problems with Items schema.
Imagine I'm Amazon, I've products their brands and their categories. In what kind of Items schema should I include this information?
Should I include brand name as string as categorical field? Should I rather include brand ID as string or numeric? or should I include both?
What about categories? I've the same questions.
Metadata Fields Metadata includes string or non-string fields that
aren't required or don't use a reserved keyword. Metadata schemas have
the following restrictions:
Users and Items schemas require at least one metadata field,
Users and Interactions datasets can contain up to five metadata
fields. An Items dataset can contain up to 50 metadata fields.
If you add your own metadata field of type string, it must include the
categorical attribute. Otherwise, Amazon Personalize won't use the
field when training a model.
https://docs.aws.amazon.com/personalize/latest/dg/how-it-works-dataset-schema.html
There are simply 2 ways to include your metadata in Items/Users datasets:
If it can be represented as a number value, then provide the actual value if it makes sense.
If it can be represented as string, then provide the string value and make sure, that categorical is set to true.
But let's take a look into "Why does they need me, to categorize my strings metadata?". The answer is pretty simple.
Let's start with an example.
If you would have Items as Amazon.com products and you would like to provide rates metadata field, then:
You could take all of the rates including the full review text sent by clients and simply put it as metadata field.
You can take just stars rating, calculate the average and put it as metadata field.
Probably the second one is making more sense in general. Having random, long reviews of product as metadata, pretty much changes nothing. Personalize doesn't understands if the review itself is good or bad, or if the author also recommends another product, so pretty much it doesn't really add anything to the recommendations.
However if you simply "cut" your dataset and calculate the average rating, like in the 2. point, then it makes a lot more sense. Maybe some of our customers like crappy products? Maybe they want to buy them, because they are famous YouTubers and they create videos about that? Based on their previous interactions and much more, Personalize will be able to perform just slightly better, because now it knows, that this product has rating of 5/5 or 3/5.
I wanted to show you, that for some cases, providing Items metadata as string makes no sense. That's why your string metadata must be categorical. It means, that it should be finite set of values, so it adds some knowledge for Personalize about given Item and why some of people might want to interact with it.
Going back to your question:
Should I include brand name as string as categorical field? Should I rather include brand ID as string or numeric? or should I include both?
I would simply go with brand ID as string. You could also go with brand name, but probably single brand can be renamed, when it's still the same brand, so picking up the ID would be more constant. Also two different brands could have the same names, because they are present on different markets, so picking up the ID solves that.
The "categorical": true switch in your schema just tells Personalize:
Hey, do you see that string field? It's categorised, finite set of values. If you train a model for me, please include this one during the training, it's important!
And as it's said in documentation, if you will provide string metadata field, which is not marked as categorical, then Personalize will "think" that:
Hmm.. this field is a string, it has pretty random values and it's not marked as categorical. It's probably just a leftover from Items export job. Let's ignore that.
I need some help getting past a road block I've come across in creating my application in APEX.
This application will be to track financial disbursements from a company. It will utilize a one to many relationship. One associate to many different transaction details.
Using Quick SQL in APEX 19.2 I have created a couple tables. DISB and DISB_DTLS
DISB
Assignor vc
Processor vc
RCVD_DA date
PROC_DA date
ACT_NO number
APPROVER vc
STATUS vc
NOTES vc
DISB_DTLS
AMT number
etc
etc...
The problem I'm having is that I want to have the primary table DISB be for the associate. Hence "One Associate to Many Disbursements. However, we have so many details that it would make the interactive grid APEX uses way to big and squished when doing a Master Detail form. Yet the only way to modify two tables or a view would be a master detail form. That's why I put some disbursement info in the primary table DISB and not the DTLS table.
I know there are some creative applications out there, and need some help discovering what I can do in regards to updating multiple tables from one form, if possible. Or alternatives. I want to make this process easy for the associates. This was all in one spreadsheet at one point.
Thanks,
Joe
I recommend you don't compromise Database design over the UI.
What you can do in this case is filter segmentation.
Complete your Master-Detail as initially thought.
Some detail columns can be logically grouped so I would put some filters somewhere on the page which the users selects a Logical group of columns to be displayed. That way you hide/show the columns to ensure they fit on the screen. Think of Filters as radio buttons or even checkboxes, let the user choose what shows on the screen.
I'm creating a web based point of sale (think cash register) solution with Django as the backend. I've always taken the 'classic' approach of modeling invoices and their line items.
InvoiceTable
id
date
customer
salesperson
discount
shipping
subtotal
tax
grand_total
[...]
InvoiceLineItems
invoice_id // foreign key
product_id
unit_price
qty
item_discount
extended_price
[...]
After attempting to research best practices, I've found that there aren't many - at least no definitive source that's widely used.
The Kimball Group suggests: "Rather than holding onto the operational notion of a transaction header “object,” we recommend that you bring all the dimensionality of the header down to the line items."
See http://www.kimballgroup.com/2007/10/02/design-tip-95-patterns-to-avoid-when-modeling-headerline-item-transactions/ and http://www.kimballgroup.com/2001/07/01/design-tip-25-designing-dimensional-models-for-parent-child-applications/.
I'm new to development (only having used desktop database software before) - but from my understanding this makes sense as we can drill down the data any way we want for reporting purposes (though I imagine we could do the same with the first method by joining the tables).
My Questions
The invoice ID will need to be repeated for each row (so we can generate data like totals for the invoice). Is this an intentional feature of this way of modeling the data?
We often have invoice level data like notes, discounts, shipping charges, etc. - How do we represent these using this method? Some discounts are product specific - so they belong on the line item anyway, others are invoice wide (think of a deal where you buy two separate products and receive a discount on the two) - we could we somehow allocate it across the line items? Same with shipping charges, allocate it by dividing it among the line items?
What do we do with invoice 'notes' - we have printed and/or internal notes, would we put the data in the line items and just repeat it for each line item? That seems to go against data normalization. Put it in a related table?
Any open source projects that use this method that I could take a look at? Not sure how to search for them.
It sounds like you're confusing relational design and dimensional design.
A relational design is for facilitating transaction processing, and minimizing data anomalies and duplication. It's your operational database. A dimensional design is for facilitating analysis.
A relational design will have an invoices table and a line_items table and a dimensional design will have a company_invoices_customer fact table with a grain of invoice line item.
Since this is for POS, I assume you want a relational design first.
As for your questions:
First there are tons of good data modelling patterns for this scenario. See https://dba.stackexchange.com/questions/12991/ready-to-use-database-models-example/23831#23831
The invoice ID will need to be repeated for each row (so we can
generate data like totals for the invoice). Is this an intentional
feature of this way of modeling the data?
Yes
We often have invoice level data like notes, discounts, shipping
charges, etc. - How do we represent these using this method?
Probably easiest/simplest to have a "notes" field on the invoice table.
For charges and discounts you should use abstraction (see Table Inheritance), and add them as Order Adjustments. See the book by Silverston in the link above.
Some discounts are product specific - so they belong on the line item
anyway, others are invoice wide (think of a deal where you buy two
separate products and receive a discount on the two) - we could we
somehow allocate it across the line items?
The price of the item should be calculated at runtime based on it's default price, and any discounts or charges that apply in the current "scenario", example discount for government, nearby, on sale day. You could have hierarchical line items that reference each other, to keep things in order. Again, see Silverston book.
What do we do with invoice 'notes' - we have printed and/or internal
notes, would we put the data in the line items and just repeat it for
each line item?
If you want line item notes, add a notes column on the line items table.
That seems to go against data normalization. Put it in a related
table?
If notes are nullable, and you want to be strict about normalization, then yes, add a invoice_notes table.
I have a tabular form which is updated throughout the year and i wanted to prevent users from editing certain rows. Currently the 'row type' is hard coded however I want the application admin to control which 'row types' are readable / write at certain times. My answered question, click here.
Currently a dynamic action is fired which prevents the rows that contain the type 'manager figure' and 'sales_target' being edited.
I have created a table with the three row types against each customer. Each status is set by a number: 0 to 3 (These i will decode into something meaningful for users).
0 - Row with that row type is read only.
1 - Users can enter into the row with that row type.
2 - row is read only with that row type.
3 - row is complete and set to read only.
I have created a new form (new tab) for the admin user to maintain each status.
Currently for Customer 'Big Toy Store' rows should be set as follows:
Manager Figure row should be read only (since set to 2)
Sales should be readable (since set to 0)
Sales target should be writable (since set to 1)
Please can i be pointed in the right direction, ive looked into jquery but struggling to work out how to pass the output of an sql query to it, so it can be used to determine which rows should be read only.
Link:apex.oracle.com
workspace: apps2
user: developer.user
password: DynamicAction
application name: Application 71656 Read only Rows for Tabular Form
I'm not sure that a tabular form is a good format to work out this idea. As you can see, you require quite a bit of javascript to produce the results you want. Not only that, but this is all client side too, and thus there are some security risks to take into account. After all, I could just run some Firebug and disable or revert all things you did, and even change the numbers. Especially with sales figures, which is something you most definitely do want altered by everybody and is also the nature of your question, security is important.
There are more elegant ways here for you to control this, and not in the least to reduce the amount of highly customized javascript code. For example, you could do away with the tabular form, and instead implement a modal popup from an interactive report. Since the modal popup would be an iframe and thus a different page, you can create a form page. On a form page you have a lot more control over what happens to certain elements. You can specify conditions, read-only conditions, or use authorization schemes. All things you can not evidently use in a tabular form.
I'd think you'd do yourself a service by thinking this over again, and explore a different option. How much of a dealbreaker is using a tabular form actually?
You need the user. You need to know what group he belongs to, and then this has to be checked against the different statusses and rows have to be en/disabled. Do you really want this to happen on the client side?
I'm not saying it can't be done in a tabular form and javascript. It can, I'm just really doubting this is the correct approach!
I am creating a database for an achievement system (like something you would see in a Blizzard game). I would like to have a GUI that displays the current progress of all achievements in the game which means I will need to query the progress of all achievements for a user in order to populate the GUI. I plan on having somewhere around 100 achievements.
This brings about a design question. What is the best way to design the database and querying code to query the progress of ~100 bit fields?
It seems like the brute force method would be to get the entire row of achievements and then for each field in the row do some hardcoded string comparison to determine which achievement we are dealing with.
Another possible solution may be to have a big switch statement based on the column index of the table and handle each achievement for each case (requires not modifying the table or you have to refactor a lot of C++ code).
I'm curious to hear any other designs you guys may have for this.
Thanks!
I suggest building a solution using 3 tables. These tables are users, achievements and user_achievements. A user would be identified with a u_id in the users table. An achievement would be identified with a a_id in the achievements table. You would then keep track of users achievements by inserting a row in the user_achievements table that includes a u_id to identify the user and a a_id to identify the achievement. The user_achievements table would also contain a column that would specify the % completion of that achievement for the given user.
Came across this question and even though it's 5 years old, perhaps someone would be interested in following approach.
Achievements are usually broken down to numbers (the rest, like Name, Description of each achievement can be put to site/app core to avoid bloating the DB).
lets be simple, we are not FB and don't need separate table for them, so in "users" table we add just 1 single column: "Achievements" it is a varchar(50). Number in brackets (50) will depend on your actual needs to this column (i.e. how much data it stores).
so you end up having in each cell of the Achievements column a numerical sequence: 10982039482084109384
Read this line of digits as follows, from left to right: user has reached "1098 profile views", received "2039 likes", etc. Optionally, add a separator for easier distinction + to instantly handle cases when as first user had 25 likes, then 125, then 2039 (2 digits, 3 digits, 4 digits - or another alternative is to use 0025 then 0125 then 2039 given you know max digits is 4 per achievement). But still lets say we decide to use separators, i.e. a comma:
1098,2039,4820,8410,9384
Then once you need a data, just SELECT achievements belonging to specific userID and subsequently (if you added a separator)
explode (',', $array)
then your site php core knows that first 4 digits stand for "profile views" and lets say this means that he has a level 10 badge for profile views (1 badge for 100 views).
Thereon, you can easily do operations with no further need for SQL queries. Example, user wants to know his progress on achieving a level 20 badge, you display: he has a 1098/2000 (or 55%) progress.
At that, achievement Description, Name, level information is stored in site core, while percentage is calculated on the go.
Hope the logic is clear and may be useful to any1 in community out there.
Cheers!