Efficient for/each loop to match phrases? - regex

I want to use a for/each loop to search for the names in one table (table1) within the text columns of another table (table2), using regular expressions. For a single name, the query looks like this:
SELECT id FROM "table2"
where tags ~* 'south\s?\*?africa'
or description ~* 'south\s?\*?africa'
order by id asc;
But I do not know how to put this into a for/each loop.
table1:
t1ID | NAME
1    | Shiraz
2    | south africa
3    | Limmatplatz
table2:
t2ID | TAGS                    | DESCRIPTIONS
101  | shiraz;Zurich;river     | It is too hot in Shiraz and Limmatplatz
201  | southafrica;limmatplatz | we went for swimming
I have a list of names in table1. Another table has some text information that might contain those names.
For each row of table2 that contains a name from table1, I would like to get its id together with the id of the matching name.
For example:
t2id | t1id
101  | 1
101  | 3
201  | 2
201  | 3
My tables have 60,000 and 550,000 rows.
I need an approach that is efficient in terms of run time.

You don't need a loop. A simple join works.
SELECT t2.id AS t2id, t1.id AS t1id
FROM table1 t1
JOIN table2 t2 ON t2.tags ~* replace(t1.name, ' ', '\s?\*?')
               OR t2.description ~* replace(t1.name, ' ', '\s?\*?')
ORDER BY t2.id;
But performance will be terrible for big tables.
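To make the join condition concrete, here is a minimal Python sketch that applies the same pattern construction to the sample rows from the question. It is only an illustration of the matching logic, not a replacement for the SQL: the helper names `name_to_pattern` and `match_pairs` are made up for this example, and `re.escape` is an addition that the SQL version does not have.

```python
import re

# Sample rows copied from the question.
table1 = [(1, "Shiraz"), (2, "south africa"), (3, "Limmatplatz")]
table2 = [
    (101, "shiraz;Zurich;river", "It is too hot in Shiraz and Limmatplatz"),
    (201, "southafrica;limmatplatz", "we went for swimming"),
]


def name_to_pattern(name):
    # Mirrors replace(name, ' ', '\s?\*?'): every space in the name becomes
    # "optional whitespace, optional literal *". re.escape is an extra safety
    # step (assumption) so names with regex metacharacters match literally.
    words = [re.escape(word) for word in name.split(" ")]
    return re.compile(r"\s?\*?".join(words), re.IGNORECASE)  # like ~*


def match_pairs(names, records):
    # Compile each pattern once, then test it against every record.
    # This is the same nested-loop work the regex join has to do.
    patterns = [(t1id, name_to_pattern(name)) for t1id, name in names]
    pairs = []
    for t2id, tags, description in records:
        for t1id, pattern in patterns:
            if pattern.search(tags) or pattern.search(description):
                pairs.append((t2id, t1id))
    return sorted(pairs)


pairs = match_pairs(table1, table2)
for t2id, t1id in pairs:
    print(t2id, t1id)
```

Every name is tested against every record, which is why the unindexed join scales so poorly with 60,000 and 550,000 rows.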
There are several things you can do to improve performance:
- Normalize table2.tags into a separate 1:n table, or into an n:m relationship to a tag table if tags are used repeatedly (the typical case). Details: "How to implement a many-to-many relationship in PostgreSQL?"
- Use trigram or text search indexes. See: "PostgreSQL LIKE query performance variations"
- Use a LATERAL join to actually use those indexes. See: "LATERAL JOIN not using trigram index"
- Ideally, use the new capability in Postgres 9.6 to search for phrases with full text search. From the release notes: "Full-text search can now search for phrases (multiple adjacent words)"
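As a rough illustration of the first suggestion (normalizing the tags), the sketch below splits the sample tags into a 1:n table and joins on plain equality, which an ordinary index can serve. It uses Python's built-in SQLite rather than Postgres so that it runs anywhere. The table name `table2_tag`, the index name, and the normalization rule (lowercase, spaces removed) are assumptions made for the example. It only covers the tags column; descriptions would still need a trigram or full text index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical 1:n table: one row per (table2 row, tag).
    CREATE TABLE table2_tag (t2id INTEGER, tag TEXT);
    CREATE INDEX table2_tag_tag_idx ON table2_tag (tag);
""")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "Shiraz"), (2, "south africa"), (3, "Limmatplatz")],
)

# Split the semicolon-separated tags from the question into single rows,
# lowercased so that the join can use plain equality.
raw_tags = {101: "shiraz;Zurich;river", 201: "southafrica;limmatplatz"}
for t2id, tags in raw_tags.items():
    conn.executemany(
        "INSERT INTO table2_tag VALUES (?, ?)",
        [(t2id, tag.lower()) for tag in tags.split(";")],
    )

# Equality join on the normalized name (lowercase, spaces removed).
tag_pairs = conn.execute("""
    SELECT tg.t2id, t1.id
    FROM table1 t1
    JOIN table2_tag tg ON tg.tag = lower(replace(t1.name, ' ', ''))
    ORDER BY tg.t2id, t1.id
""").fetchall()
print(tag_pairs)
```

Row 101 matches Limmatplatz only through its description, so that pair is missing here; the sketch shows the tag lookup alone.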

Related

Informatica Cloud Data Integration - find non matching rows

I am working on Informatica Cloud Data Integration. I have two tables, Tab1 and Tab2. The joining column is id. I want to find all records in Tab1 that do not exist in Tab2. What transformations can I use to achieve this?
Tab1
id name
1 n1
2 n2
3 n3
Tab2
id
1
5
6
I want to get the records with id 2 and 3 from Tab1, as they do not exist in Tab2.
You can use a SQL override in the Source Qualifier:
Select * from Tab1 where id not in (select id from Tab2)
Alternatively, you can do it with Informatica transformations, like below.
Do a lookup on Tab2, with the join condition on id.
In an Expression transformation, create a flag:
out_flag = iif(isnull(:lkp(id)), 'pass', 'fail')
Put a Filter transformation next and set its condition to out_flag = 'pass'.
The whole mapping should look like this:
        Lkp
         |
Sq --> exp --> fil --> tgt
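The lookup, flag, and filter steps can be sketched in plain Python to show what each transformation contributes. This is only a model of the logic using the sample rows from the question, not Informatica syntax; the names `lookup`, `flag_row`, and `unmatched` are invented for the illustration.

```python
# Sample rows from the question.
tab1 = [{"id": 1, "name": "n1"}, {"id": 2, "name": "n2"}, {"id": 3, "name": "n3"}]
tab2 = [{"id": 1}, {"id": 5}, {"id": 6}]

# Lookup step: index Tab2 by the join column.
lookup = {row["id"]: row for row in tab2}


def flag_row(row):
    # Expression step: 'pass' when the lookup finds nothing (the isnull case),
    # 'fail' when Tab2 has a row with the same id.
    return "pass" if lookup.get(row["id"]) is None else "fail"


# Attach the flag to every Tab1 row, then keep only the 'pass' rows (filter step).
flagged = [dict(row, out_flag=flag_row(row)) for row in tab1]
unmatched = [row for row in flagged if row["out_flag"] == "pass"]

print([row["id"] for row in unmatched])
```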

Querying on Bigquery repeated fields

Below is the schema of my BigQuery table. I am selecting sentence_id, store and BU_Model and inserting the data into another table in BigQuery. The data types generated for the new table are integer, repeated and repeated, respectively.
I want to flatten/unnest the repeated fields so that they are created as STRING fields in my second table. How can this be achieved using Standard SQL?
+- sentences: record (repeated)
|  |- sentence_id: integer
|  |- autodetected_language: string
|  |- processed_language: string
|  +- attributes: record
|  |  |- agent_rating: integer
|  |  |- store: string (repeated)
|  +- classifications: record
|  |  |- BU_Model: string (repeated)
The query that I am using to create the second table is below. I would like to query BU_Model as a STRING column.
SELECT sentence_id ,a.attributes.store,a.classifications.BU_Model
FROM staging_table , unnest(sentences) a
The expected output should look like this:
Staging table:
41783851  regions   Apparel
          district  Footwear
12864656  regions
          district
Final target table:
41783851  regions   Apparel
41783851  regions   Footwear
41783851  district  Apparel
41783851  district  Footwear
12864656  regions
12864656  district
I tried the query below and it seems to work as expected, but it means that I would have to unnest every repeated field. My table in BigQuery has 50+ repeated columns. Is there an easier way around this?
SELECT
sentence_id,
flattened_stores,
flattened_Model
FROM `staging`
left join unnest(sentences) a
left join unnest(a.attributes.store) as flattened_stores
left join unnest(a.classifications.BU_Model) as flattened_Model
Assuming you still want three columns in your output, with the arrays flattened into strings:
SELECT sentence_id ,
ARRAY_TO_STRING(a.attributes.store, ',') store,
ARRAY_TO_STRING(a.classifications.BU_Model, ',') BU_Model
FROM staging_table , unnest(sentences) a
UPDATE to address recent changes in question
In BigQuery Standard SQL, using LEFT JOIN UNNEST() (as you did in your last query) is the most reasonable way to get the result you want.
In BigQuery Legacy SQL, you can use the FLATTEN syntax, but it has the same drawback of needing to be repeated for all 50+ columns.
Very simplified example:
#legacySQL
SELECT sentence_id, store, BU_Model
FROM (FLATTEN([project:dataset.stage], BU_Model))
Conclusion: I would go with the LEFT JOIN UNNEST() approach.
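To see why the two approaches give different shapes, here is a small Python model of both, run on the sample values from the question. It is a sketch of the semantics only, not BigQuery code. For brevity the nested fields `attributes.store` and `classifications.BU_Model` are stored as flat keys (an assumption), and the helper name `left_join_unnest` is invented.

```python
from itertools import product

# Sample values from the question, with the nested arrays pulled up to flat keys.
sentences = [
    {"sentence_id": 41783851, "store": ["regions", "district"],
     "BU_Model": ["Apparel", "Footwear"]},
    {"sentence_id": 12864656, "store": ["regions", "district"], "BU_Model": []},
]


def left_join_unnest(row, repeated_fields):
    # Each repeated field contributes its elements; an empty array contributes
    # a single None, the way LEFT JOIN UNNEST keeps the row with a NULL.
    value_lists = [row[field] or [None] for field in repeated_fields]
    for combo in product(*value_lists):
        yield (row["sentence_id"], *combo)


# One output row per combination of array elements.
flat_rows = [
    out for row in sentences
    for out in left_join_unnest(row, ["store", "BU_Model"])
]

# ARRAY_TO_STRING style: one output row per sentence, arrays joined with commas.
joined_rows = [
    (row["sentence_id"], ",".join(row["store"]), ",".join(row["BU_Model"]))
    for row in sentences
]

for out in flat_rows:
    print(out)
print(joined_rows)
```

Note that the unnest version multiplies the lengths of all arrays per sentence, so applying it to many repeated columns at once can produce a very large number of rows.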

How to capture slicer value by DAX measure

How to capture slicer value by DAX measure in all circumstances? Let's have sample data:
+-----------+---------+-------+
| category | species | units |
+-----------+---------+-------+
| fruit | apple | 1 |
| fruit | banana | 1 |
| vegetable | carrot | 1 |
| vegetable | potato | 1 |
+-----------+---------+-------+
I added two measures:
Measure 1:
species selected = SELECTEDVALUE(Table1[species])
Measure 2:
IsFiltered = ISFILTERED(Table1[species])
Case 1. All items in both slicers selected.
Case 2. (problematic case). Fruits selected and Carrots selected (it is possible when we untie slicers interactions).
When we select the fruit category in one slicer and carrot in the other, there is a problem. The resulting set of items is obviously empty. However, carrot has definitely been selected in the species slicer, which the IsFiltered measure confirms by evaluating to True. Is there a way to capture that value in a DAX measure?
Since both the category and species slicers come from columns on the same table, if you have both fruit and carrot selected, then the resulting table is empty and any measures (except ones that remove both filters) will therefore be working with blanks. You cannot have both filters apply simultaneously and expect them to act independently (even if the two slicer visuals don't cross-filter).
If you don't want your species selected measure to be influenced by category, the simplest thing to do would be to turn off filtering (under Format > Edit interactions) from the category slicer to the visual containing species selected.
This isn't always what you want though, so another possibility is to create a new table for the species slicer which has no filtering relationship from Table1. This will allow you to work with the slicers selections separately if that's something you need to do. (I've definitely had to do this before when I wanted a slicer to behave more like a parameter than a filter.)
Edit: To do what I suggested, create a new Table2 in the query editor that references Table1, remove all columns other than species, and remove duplicates if necessary. You should now have a single-column table that is a list of unique species.
When you close and apply, Power BI will likely create a relationship between the two tables automatically, but you need to make sure it's exactly what you want. It needs to be a many-to-one relationship with a single filter direction.
Once this is done, you'll need to replace the Table1[species] slicer with Table2[species] slicer as well as change references in measures where necessary.

SQLite C++ Compare two tables within the same database for matching records

I want to compare two tables within the same SQLite database for matching records, using a C++ interface. Here are my two tables:
Table name : temptrigrams
ID TEMPTRIGRAM
---------- ----------
1 The cat ran
2 Compare two tables
3 Alex went home
4 Mark sat down
5 this database blows
6 data with a
7 table disco ninja
++78
Table Name: spamtrigrams
ID TRIGRAM
---------- ----------
1 Sam's nice ham
2 Tuesday was cold
3 Alex stood up
4 Mark passed out
5 this database is
6 date with a
7 disco stew pot
++10000
The first table has two columns and 85 records, and the second table has two columns and 10007 records.
I would like to compare the records in the TEMPTRIGRAM column of the first table against the TRIGRAM column of the second table and return the number of matches across the tables. So if ID 1, 'The cat ran', appears in 'spamtrigrams', I would like that counted, with the total returned at the end as an integer.
Could somebody please explain the syntax of a query that performs this action?
Thank you.
This is a join query with an aggregation. My guess is that you want the number of matches per trigram:
select t1.temptrigram, count(t2.trigram)
from temptrigrams t1 left outer join
     spamtrigrams t2
     on t1.temptrigram = t2.trigram
group by t1.temptrigram;
If you just want the number of matches:
select count(t2.trigram)
from temptrigrams t1 join
     spamtrigrams t2
     on t1.temptrigram = t2.trigram;
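Both queries can be tried against an in-memory SQLite database. The sketch below uses Python's built-in sqlite3 module rather than the C++ interface, together with the table names from the question. None of the sample rows shown in the question actually match, so one row in spamtrigrams is a hypothetical addition made purely to produce a match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temptrigrams (id INTEGER PRIMARY KEY, temptrigram TEXT);
    CREATE TABLE spamtrigrams (id INTEGER PRIMARY KEY, trigram TEXT);
""")
conn.executemany(
    "INSERT INTO temptrigrams (temptrigram) VALUES (?)",
    [("The cat ran",), ("Alex went home",), ("this database blows",)],
)
conn.executemany(
    "INSERT INTO spamtrigrams (trigram) VALUES (?)",
    [
        ("Tuesday was cold",),
        ("this database is",),
        ("Alex went home",),  # hypothetical row, added so that one match exists
    ],
)

# Number of matches per trigram; the LEFT OUTER JOIN keeps unmatched trigrams with 0.
per_trigram = conn.execute("""
    SELECT t1.temptrigram, COUNT(t2.trigram)
    FROM temptrigrams t1
    LEFT OUTER JOIN spamtrigrams t2 ON t1.temptrigram = t2.trigram
    GROUP BY t1.temptrigram
    ORDER BY t1.temptrigram
""").fetchall()

# Total number of matches as a single integer.
total_matches = conn.execute("""
    SELECT COUNT(t2.trigram)
    FROM temptrigrams t1
    JOIN spamtrigrams t2 ON t1.temptrigram = t2.trigram
""").fetchone()[0]

print(per_trigram)
print(total_matches)
```

The comparison is case-sensitive by default, so 'The Cat Ran' would not match 'The cat ran'. Appending COLLATE NOCASE to the join condition makes the comparison case-insensitive for ASCII letters.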

Kettle Pentaho - Getting records from two table which does not match on common Key (Merge)

I have two tables in a transformation and I need to get the records that do not match on a common key, i.e. I am joining tables A and B,
and from table A I need those records that are not present in table B.
It would be helpful if someone could tell me which step I can use in Kettle Spoon to do this transformation.
You can achieve this with the Merge Join step. Under Join Type choose LEFT OUTER. After this step your results will look like this:
key_a | value_a | key_b | value_b
1     | 1       | null  | null
2     | 2       | null  | null
3     | 3       | 3     | 3
Then choose the Filter rows step and set key_b as the field and the condition to IS NULL.
If you also need the records from table B that have no match in table A, choose FULL OUTER as the Join Type.
If both your tables are in a database of the same type, this can easily be achieved by using the Table input step and doing the join in the query itself:
SELECT table_a.key
, table_a.value
FROM table_a
LEFT JOIN table_b
ON table_a.key = table_b.key
WHERE table_b.key IS NULL
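The query can be checked on the three sample keys from the result table above. This sketch runs it in an in-memory SQLite database through Python's sqlite3 module, which is an assumption made for portability; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (key INTEGER, value INTEGER);
    CREATE TABLE table_b (key INTEGER, value INTEGER);
    INSERT INTO table_a VALUES (1, 1), (2, 2), (3, 3);
    INSERT INTO table_b VALUES (3, 3);
""")

# LEFT JOIN keeps every table_a row; rows without a partner in table_b
# come back with NULLs on the table_b side, which the WHERE clause selects.
only_in_a = conn.execute("""
    SELECT table_a.key, table_a.value
    FROM table_a
    LEFT JOIN table_b ON table_a.key = table_b.key
    WHERE table_b.key IS NULL
    ORDER BY table_a.key
""").fetchall()

print(only_in_a)
```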