Primary Key and Index - google-cloud-platform


Table Schema
Columns: A, B, C, D, E
I have a composite primary key on (A, B) and a query that selects all rows where A = 123.
Will this query scan the whole table? If it does, is the only solution to create an index on column A?

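It WILL seek, not scan (a seek is more efficient). Because A is the leading column of the primary key (A, B), the rows with A = 123 are stored next to each other in key order, so the database can jump to the first matching row and read forward until A changes.
Note that the search terms must be "seekable". Equality is seekable (so you are OK), but, for example, LIKE '%foo%' is not, because the leading wildcard gives no fixed value to seek to.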
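To illustrate why a condition on the leading key column can seek, here is a minimal Python sketch. It is not how Cloud Spanner or any other engine is implemented; it only assumes that keys are kept sorted by (A, B), the way a key-ordered table or B-tree stores them, and uses made-up key values.

```python
from bisect import bisect_left

def seek_prefix(keys, a):
    """Find all (A, B) keys with the given A by seeking, then reading forward.

    `keys` must be sorted, mimicking rows stored in primary-key order.
    """
    # (a,) sorts before every (a, b), so this lands on the first matching key
    i = bisect_left(keys, (a,))
    matches = []
    # Matching keys are contiguous, so stop at the first key with a different A
    while i < len(keys) and keys[i][0] == a:
        matches.append(keys[i])
        i += 1
    return matches

def scan(keys, a):
    """Find the same keys by checking every row (a full scan)."""
    return [k for k in keys if k[0] == a]

# Hypothetical composite keys (A, B), kept sorted
keys = sorted([(100, 7), (123, 2), (123, 1), (124, 1), (99, 3)])
print(seek_prefix(keys, 123))  # [(123, 1), (123, 2)]
```

Both functions return the same rows, but the seek touches only the matching range. A condition only on B could not use this trick, since rows with the same B are spread throughout (A, B) order.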

Related

Filtering by keyword (string) in PowerBI

I have a data visualization in PowerBI that is powered by SQL on the back-end. I have one particular field that I'd like to be able to filter by string-matching. Here's what the field looks like:
me_drug_occurence
AB
C
ABC
BC
B
Each letter is unique and stands for a type of drug; more letters mean a combination of drugs. No letter appears more than once in a record. I want to use the "Filters on this page" option and let the user filter by drug A, B, and C. Selecting A, for example, would show any record that contains A (records AB and ABC). Selecting A and B would show any record that contains A OR B (records AB, ABC, BC, and B), and so on.
My issue is that there doesn't seem to be an out-of-the-box way to do this in Power BI. If I simply drag this column to the "Filters on this page" sidebar, I only get options to filter by the different drug combinations.
If I choose "Advanced filtering", I can get closer to my goal, but it forces the user to type in the keywords manually.
So, my question is how can I accomplish a filter on this visualization that would look something like this:
Filter:
A
B
C
where the user could filter for any record that contains A, B, or C, or some combination of them. Do I need to create a custom measure?

You are already on the right path.
Use Advanced filtering with OR conditions, i.e.:
drug contains C or drug contains B or drug contains A
This keeps only the records that match at least one of the three conditions.
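The rule from the question (show a record if it contains any selected letter) is an OR of "contains" tests. A minimal Python sketch of that logic, independent of Power BI, using the sample values of me_drug_occurence from the question:

```python
def filter_records(records, selected):
    """Keep records that contain ANY selected drug letter (OR of 'contains' tests)."""
    if not selected:
        # Assumption: an empty selection means "no filter", like a cleared slicer
        return list(records)
    return [r for r in records if any(drug in r for drug in selected)]

records = ["AB", "C", "ABC", "BC", "B"]  # values from the question
print(filter_records(records, ["A"]))       # ['AB', 'ABC']
print(filter_records(records, ["A", "B"]))  # ['AB', 'ABC', 'BC', 'B']
```

The plain substring test works here only because each drug is a single letter; multi-character drug codes would need the field split into separate values first.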
Edit:
This is the closest I could come to your requirement.
1. Use a slicer: add the drug occurrence field (me_drug_occurence) to it, set it to Dropdown, and make it searchable.
2. As the end user, type A; the slicer lists the N values that contain A. Select all of them.
3. Then type C, and select the values it lists as well.
The report then shows the records matching any of the selected values.
There is also a custom visual on the marketplace called Text Filter. Import it from the marketplace; it lets you search the text with a "contains" match, but you cannot add three different conditions to it.

Compare values from two columns and get the matching value from another table in Power BI

I am fairly new to Power BI and need your help with a task I am stuck on.
Basically, I have two tables, and I need to compare the values from Table 1 with the rows of Table 2 and return the output.
Table 1
I need to compare the values in columns A and B and get a match from Table 2.
For example, if row 1 has BY Green and BS HIGH, I need to look up that combination in the matrix table below and return the output in the Value column as either 0 or 1.
Table 2
For example, in the first row of Table 2, BY Green has the value 0 under BS Low.

Try this...
INDEX() returns a value from the matrix based on the intersection of the two MATCH() results. The first is the vertical match on Table 1 column A; the second is the horizontal match on Table 1 column B. The value found at that intersection is returned.
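A minimal Python sketch of what INDEX/MATCH does here. The labels and values are hypothetical: only BY Green / BS Low = 0 comes from the question.

```python
# Hypothetical matrix shaped like Table 2: rows are Table 1 column A values,
# columns are Table 1 column B values (illustrative data only)
row_labels = ["BY Green", "BY Red"]
col_labels = ["BS Low", "BS High"]
matrix = [
    [0, 1],  # BY Green
    [1, 1],  # BY Red
]

def index_match(a, b):
    """Mimic =INDEX(matrix, MATCH(a, rows, 0), MATCH(b, cols, 0))."""
    r = row_labels.index(a)  # vertical exact match (0-based; Excel's MATCH is 1-based)
    c = col_labels.index(b)  # horizontal exact match
    return matrix[r][c]      # value at the intersection

print(index_match("BY Green", "BS Low"))  # 0
```

Like an exact MATCH returning #N/A, `list.index` raises ValueError when a label is missing.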
... My apologies, I just saw this was a Power BI request... no worries...
First, you need to reshape Table 2 into a lookup table:
Click a cell in Table 2 (don't edit it), then choose Data > From Table/Range, which opens the Power Query window. Select columns B through F (not A), then choose Transform > Unpivot Columns to create the new lookup table. This can either be saved as a new table or used by reference.
Next, merge Table 1 with PQ_Table 2 (be sure to select BOTH columns in BOTH tables, in the same order). After the merge, expand the new table column. I only selected the value to return, but you can return all the columns to verify the match and then delete the unneeded ones.
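The unpivot-then-merge steps can be sketched in Python to show why they work: unpivoting turns the matrix into rows keyed by both columns, and the merge looks up each Table 1 row by that pair. This is not Power Query M code, and all values except BY Green / BS Low = 0 are hypothetical.

```python
# Hypothetical Table 2 as a matrix: one row per BY value, one column per BS value
table2 = {
    "BY Green": {"BS Low": 0, "BS High": 1},
    "BY Red": {"BS Low": 1, "BS High": 1},
}

# Unpivot: (BY, BS) -> Value, one entry per matrix cell
lookup = {
    (by, bs): value
    for by, columns in table2.items()
    for bs, value in columns.items()
}

# Hypothetical Table 1 rows: (column A, column B)
table1 = [("BY Green", "BS High"), ("BY Green", "BS Low"), ("BY Blue", "BS Low")]

# Left outer merge on BOTH columns; rows with no match get None
merged = [(a, b, lookup.get((a, b))) for a, b in table1]
for row in merged:
    print(row)
```

Merging on both columns at once is what makes the (BY, BS) pair, not either value alone, select the cell.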
Hope this helps...
Good Luck.

What is the difference between Scan and Query in DynamoDB? When should I use Scan vs. Query?

A Query operation, as described in the DynamoDB documentation:
A query operation searches only primary key attribute values and supports a subset of comparison operators on key attribute values to refine the search process.
and the scan operation:
A scan operation scans the entire table. You can specify filters to apply to the results to refine the values returned to you, after the complete scan.
Which is best based on performance and cost?

When creating a DynamoDB table, choose the primary key and Local Secondary Indexes (LSIs) so that a Query operation returns the items you want (LSIs can only be defined when the table is created).
Query operations require an equality (=) condition on the partition key, but support conditions (=, <, <=, >, >=, BETWEEN, begins_with) on the sort key.
Scan operations are generally slower and more expensive, because the operation has to read every item in your table to find the items you are requesting.
Example:
Table: CustomerId, AccountType, Country, LastPurchase
Primary Key: CustomerId + AccountType
In this example, you can use a Query operation to get:
A CustomerId with a conditional filter on AccountType
A Scan operation would need to be used to return:
All Customers with a specific AccountType
Items based on conditional filters by Country, i.e. all customers from USA
Items based on conditional filters by LastPurchase, i.e. all customers that made a purchase in the last month
To avoid scan operations on frequently used operations create a Local Secondary Index (LSI) or Global Secondary Index (GSI).
Example:
Table: CustomerId, AccountType, Country, LastPurchase
Primary Key: CustomerId + AccountType
GSI: AccountType + CustomerId
LSI: CustomerId + LastPurchase
In this example a Query operation can allow you to get:
A CustomerId with a conditional filter on AccountType
[GSI] A conditional filter on CustomerIds for a specific AccountType
[LSI] A CustomerId with a conditional filter on LastPurchase
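A minimal Python model of why the GSI removes the scan. It assumes a table or index can be pictured as items grouped by partition key and ordered by sort key; the items are hypothetical, and this is not the DynamoDB API. In DynamoDB itself you would pass the index name as IndexName to Query.

```python
from collections import defaultdict

# Hypothetical items using attributes of the example table
items = [
    {"CustomerId": "c1", "AccountType": "basic", "Country": "USA"},
    {"CustomerId": "c1", "AccountType": "premium", "Country": "USA"},
    {"CustomerId": "c2", "AccountType": "basic", "Country": "UK"},
    {"CustomerId": "c3", "AccountType": "premium", "Country": "USA"},
]

def build_key_structure(items, partition_key, sort_key):
    """Group items by partition key and order each group by sort key,
    roughly how a table or secondary index organizes its items."""
    groups = defaultdict(list)
    for item in items:
        groups[item[partition_key]].append(item)
    for group in groups.values():
        group.sort(key=lambda it: it[sort_key])
    return groups

# GSI from the example: partition key AccountType, sort key CustomerId
gsi = build_key_structure(items, "AccountType", "CustomerId")

# Without the GSI: examine every item (a Scan with a filter)
via_scan = [it["CustomerId"] for it in items if it["AccountType"] == "premium"]
# With the GSI: read only the "premium" partition (a Query on the index)
via_gsi = [it["CustomerId"] for it in gsi.get("premium", [])]
print(via_scan, via_gsi)  # ['c1', 'c3'] ['c1', 'c3']
```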

Suppose your DynamoDB table has customer_country as its partition key. With Query, customer_country is a required condition, and any further filters apply only to the items that belong to that customer_country.
With Scan, the filter is applied across all partition key values: DynamoDB first reads all the data and then applies the filter to what it read.
For example,
here customer_country is the partition key
and id is the sort key:
-----------------------------------
customer_country | name | id
-----------------------------------
VV | Tom | 1
VV | Jack | 2
VV | Mary | 4
BB | Nancy | 5
BB | Lom | 6
BB | XX | 7
CC | YY | 8
CC | ZZ | 9
------------------------------------
If you perform a Query, it applies only to one customer_country value.
The condition on it must use the equality operator (=).
So only the items with that partition key value are read.
If you perform a Scan, it reads every item in the table and filters the data after reading it.
Note: avoid Scan on large tables; it consumes read capacity units (RCUs) for all the data it reads, so it can easily exceed your provisioned RCUs.
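Using the rows from the table above, a minimal Python sketch of the difference in work done. "Items read" is only a rough proxy: actual read capacity is based on the size of the data read, not the item count. This models the idea; it does not call DynamoDB.

```python
# Items from the example table: (customer_country, name, id)
items = [
    ("VV", "Tom", 1), ("VV", "Jack", 2), ("VV", "Mary", 4),
    ("BB", "Nancy", 5), ("BB", "Lom", 6), ("BB", "XX", 7),
    ("CC", "YY", 8), ("CC", "ZZ", 9),
]

# Group items by partition key, roughly how DynamoDB stores them
partitions = {}
for item in items:
    partitions.setdefault(item[0], []).append(item)

def query(country):
    """Read only the items in one partition (equality on the partition key)."""
    read = partitions.get(country, [])
    return read, len(read)

def scan(predicate):
    """Read every item, then apply the filter to what was read."""
    read = items
    return [it for it in read if predicate(it)], len(read)

q_result, q_read = query("VV")
s_result, s_read = scan(lambda it: it[0] == "VV")
print(q_read, s_read)  # 3 8
```

Both return the same three VV items, but the Scan reads all eight items to get them.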

It's similar to a relational database.
With a Query, you use the primary key in the condition, so the complexity is about O(log n), since most key structures are balanced trees (such as B-trees).
With a Scan, you have to read the whole table and then apply the filter to every single row to find the right result, which is O(n). It is much slower if your table is big.
In short, use Query if you know the primary key, and use Scan only as a last resort.
Also, consider a global secondary index to support other kinds of queries on different keys and meet your performance goals.

In terms of performance, I think it's good practice to design your table so that applications use Query instead of Scan, because a Scan always reads the entire table before it filters out the desired values, which means it takes more time and consumes more read capacity. For more information, please refer to the official documentation.

Query is much better than Scan performance-wise. Scan, as its name implies, scans the whole table. But you must be well aware of the table's key, sort key, indexes, and their related sort keys in order to know when you can use Query.
If you filter using:
key
key and sort key
index
index and its related sort key
use Query! Otherwise use Scan, which is more flexible about which columns you can filter on.
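You can NOT Query if (in the key condition):
you need more than two fields (a key condition can use only the partition key and sort key of one table or index)
you only have a sort key (of the primary key or an index)
you only have regular fields (not key, index, or sort key); within a Query, regular fields can only go in a filter expression, which is applied after the items are read
you mix keys of different indexes (e.g. index1 with the sort key of index2)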
...
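The rules above can be condensed into a small decision helper. This is a simplified sketch under stated assumptions, not a DynamoDB API: it assumes Query needs an equality condition on the partition key of the table or of one index, optionally plus that same table's or index's sort key, and that everything else becomes a filter. It does not check whether a sort key operator is supported. The index names are hypothetical; the key schemas come from the CustomerId/AccountType/LastPurchase example earlier on this page.

```python
from typing import NamedTuple, Optional

class KeySchema(NamedTuple):
    name: str                      # "table" or an index name (hypothetical)
    partition_key: str
    sort_key: Optional[str] = None

def plan(equality_attrs, range_attrs, schemas):
    """Return (operation, target, key_condition_attrs, filter_attrs)."""
    all_attrs = set(equality_attrs) | set(range_attrs)
    # Query needs '=' on the partition key of the table or of one index
    candidates = [s for s in schemas if s.partition_key in equality_attrs]
    if not candidates:
        return ("Scan", None, [], sorted(all_attrs))
    # Prefer a table/index whose sort key is also used in the conditions
    best = max(candidates, key=lambda s: s.sort_key in all_attrs)
    key_attrs = {best.partition_key}
    if best.sort_key in all_attrs:
        key_attrs.add(best.sort_key)
    # Everything else can only be filtered after the items are read
    return ("Query", best.name, sorted(key_attrs), sorted(all_attrs - key_attrs))

schemas = [
    KeySchema("table", "CustomerId", "AccountType"),
    KeySchema("AccountTypeIndex", "AccountType", "CustomerId"),    # GSI
    KeySchema("LastPurchaseIndex", "CustomerId", "LastPurchase"),  # LSI
]
print(plan({"CustomerId"}, {"LastPurchase"}, schemas))
print(plan({"Country"}, set(), schemas))
```

The first call picks the LSI, since it covers both conditions as a key condition; the second falls back to Scan because Country is not a partition key anywhere.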
A good explanation:
https://medium.com/#amos.shahar/dynamodb-query-vs-scan-sql-syntax-and-join-tables-part-1-371288a7cb8f

Informatica: something like CDC without adding any column to the target table

I have a source table named A in Oracle.
Initially, table A is loaded (copied) into table B.
Next, I run DML on table A, such as INSERT, DELETE, and UPDATE.
How do I reflect these changes in table B
without creating any extra column in the target table?
No timestamp is available for the rows,
so I have to compare the rows in the source and the target.
E.g., if a row is deleted in the source, it should be deleted in the target;
if a row is updated, update it in the target; and if a source row is not available in the target, insert it into the target.
Please help!

Use both A and B as sources.
Do a full outer join using a Joiner transformation (or, if both tables are in the same database, you can join them in the Source Qualifier).
In an Expression transformation, create a flag based on the following scenarios:
A's key fields are null => flag = 'Delete'
B's key fields are null => flag = 'Insert'
Both A's and B's key fields are present => compare the non-key fields of A and B; if any of them differ, set flag = 'Update', else 'No Change'
Now you can send the records to the target (B), applying the appropriate operation with an Update Strategy transformation.
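The join-and-flag logic above can be sketched in Python to check which flag each row gets. It assumes A and B have the same column names and that rows are dicts; the data is hypothetical. In Informatica the flag would then drive the Update Strategy (for example DD_INSERT, DD_UPDATE or DD_DELETE).

```python
def diff_rows(source, target, key_fields):
    """Flag each key as Insert/Update/Delete/No Change via a full outer join.

    source: rows of A; target: rows of B; key_fields: key column names.
    """
    def key(row):
        return tuple(row[f] for f in key_fields)

    a = {key(r): r for r in source}
    b = {key(r): r for r in target}
    flags = []
    for k in sorted(a.keys() | b.keys()):  # full outer join on the key
        if k not in a:
            flags.append((k, "Delete"))     # A's key is null: row only in B
        elif k not in b:
            flags.append((k, "Insert"))     # B's key is null: row only in A
        elif a[k] != b[k]:
            flags.append((k, "Update"))     # keys match, non-key fields differ
        else:
            flags.append((k, "No Change"))
    return flags

source_a = [{"id": 1, "name": "x"}, {"id": 2, "name": "y2"}, {"id": 4, "name": "w"}]
target_b = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}, {"id": 3, "name": "z"}]
print(diff_rows(source_a, target_b, ["id"]))
```

Comparing whole rows is equivalent to comparing the non-key fields here, because matched rows already have equal keys.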

If you do not need to keep track of the operations applied to the target table (since no extra column is allowed), the fastest way is simply:
1) Truncate B
2) Insert A into B

Compare Keys in Informatica

I am trying to solve an Informatica problem.
I have two tables, Table A and Table B, with the following structure:
Table A
A_Key
A_Name
A_Address
Table B
B_Key
B_Name
B_Address
A_Key (Foreign Key)
I need to make sure that every A_Key in Table B exists as an A_Key in Table A.
Since I am new to the Informatica Data Quality tool, I am trying to figure out how to implement this logic.
One approach I can think of is creating a rule.
Does anyone have a better solution?

You can implement a Joiner between Table B (as master) and Table A (as detail). The join type must be Detail Outer Join, so you get all rows in Table B and their matches in Table A. Then you can find the rows in Table B that don't have a match in Table A using a Filter transformation, and finally count those rows using an Aggregator transformation.
Table A
Joiner A-B --- Filter (A_Key from Table A is null) --- Aggregator (count rows)
Table B
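The same pipeline (outer join keeping all of Table B, filter on a missing match, count) can be sketched in Python. The rows are hypothetical, and a set lookup stands in for the Joiner.

```python
# Hypothetical rows; only the key columns matter for this check
table_a = [{"A_Key": 1}, {"A_Key": 2}, {"A_Key": 3}]
table_b = [
    {"B_Key": 10, "A_Key": 1},
    {"B_Key": 11, "A_Key": 3},
    {"B_Key": 12, "A_Key": 7},  # 7 does not exist in Table A
]

a_keys = {row["A_Key"] for row in table_a}

# Join + Filter: keep Table B rows with no matching A_Key in Table A
orphans = [row for row in table_b if row["A_Key"] not in a_keys]

# Aggregator: count the orphaned rows
print(len(orphans), [row["B_Key"] for row in orphans])  # 1 [12]
```

A count of zero means every A_Key in Table B exists in Table A.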
