Query array column in BigQuery by condition

Query array column in BigQuery by condition - google-cloud-platform

I have a table in Bigquery with this format:
+------------+-----------------+------------+-----------------+---------------------------------+
| event_date | event_timestamp | event_name | event_params.key| event_params.value.string_value |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110 | 2929929292 | my_event | previous_page | /some-page |
+------------+-----------------+------------+-----------------+---------------------------------+
| | layer | /some-page/layer |
| +-----------------+---------------------------------+
| | session_id | 99292 |
| +-----------------+---------------------------------+
| | user._id | 2929292 |
+------------+-----------------+------------+-----------------+---------------------------------+
| 20201110 | 2882829292 | my_event | previous_page | /some-page |
+------------+-----------------+------------+-----------------+---------------------------------+
| | layer | /some-page/layer |
| +-----------------+---------------------------------+
| | session_id | 29292 |
| +-----------------+---------------------------------+
| | user_id | 229292 |
+-------------------------------------------+-----------------+---------------------------------+
I want to perform a query to get all rows where event_params.value.string_value contains the regex /layer.
I have tried this:
SELECT
"event_params.value.string_value",
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100
But I'm getting this output:
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
| event_params.value.string_value |
+---------------------------------+
Some ideas of what I'm doing wrong?

You are selecting a string - you should select a column.
The other problem is that you're cross joining the table with its arrays - effectively bloating up the table.
Your solution is to use a subquery in the WHERE clause:
SELECT
* -- Not sure what you actually need from the table ...
FROM `my_project.my_dataset.my_events_20210110`
WHERE
-- COUNT(*)>0 means "if you find more than zero" then return TRUE
(SELECT COUNT(*)>0 FROM UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
)
LIMIT 100
If you actually want the values from the array your quick solution is removing the quotes:
SELECT
event_params.value.string_value
FROM `my_project.my_dataset.my_events_20210110`,
UNNEST(event_params) AS event_param
WHERE event_param.key = 'layer' AND
REGEXP_CONTAINS(event_param.value.string_value, r'/layer')
LIMIT 100

Related

PowerBI : Count Distinct values in one column based on Distinct Values in another column

i have a data for group no and its set value. When the set value is same for all the batches i dont want those batches to be counted. but if there are more than 1 set values in each batch then the dax query should count it as 1.
My current data is like this
| group_no | values |
| ---------- | ---------------------- |
| H110201208 | 600 |
| H110201208 | 600 |
| H110201208 | 680 |
| H101201215 | 665 |
| H109201210 | 640 |
| H123201205 | 600 |
| H125201208 | 610 |
| H111201212 | 610 |
| H111201212 | 630 |
I want my output like this
| Group no | Grand Total |
| ---------- | ----------- |
| H101201215 | 1 |
| H109201210 | 1 |
| H110201208 | 3 |
| H111201212 | 2 |
| H123201205 | 1 |
| H125201208 | 1 |
i want to create another table like the one above using dax so that i can plot graphs n percentages based on its output
i want to do this in powerbi using DAX language.

TABLE =
GROUPBY (
Groups, //SourceTable
Groups[ group_no ],
"GrandTotal", COUNTX ( CURRENTGROUP (), DISTINCTCOUNTNOBLANK ( Groups[ values ] ) )
)

How do i add additional rows in M QUERY

I want to add more rows using the Query editor (Power query/ M Query) in only the Start Date and End Date column:
+----------+------------------+--------------+-----------+-------------+------------+
| Employee | Booking Type | Jobs | WorkLoad% | Start Date | End date |
+----------+------------------+--------------+-----------+-------------+------------+
| John | Chargeable | CNS | 20 | 04/02/2020 | 31/03/2020 |
| John | Chargeable | CNS | 20 | 04/03/2020 | 27/04/2020 |
| Bernard | Vacation/Holiday | SN | 100 | 30/04/2020 | 11/05/2020 |
| Bernard | Vacation/Holiday | Annual leave | 100 | 23/01/2020 | 24/02/2020 |
| Bernard | Chargeable | Tech PLC | 50 | 29/02/2020 | 30/03/2020 |
+----------+------------------+--------------+-----------+-------------+------------+
I want to find the MIN(Start Date) and MAX(End Date) and then append the range of start to end dates to this table only in the Start Date and End Date column in the Query Editor (Power Query/ M Query). Preferrable if I can create another table2 duplicating the original table and append these rows.
For example:
+----------+------------------+--------------+-----------+-------------+------------+
| Employee | Booking Type | Jobs | WorkLoad% | Start Date | End date |
+----------+------------------+--------------+-----------+-------------+------------+
| John | Chargeable | CNS | 20 | 04/02/2020 | 31/03/2020 |
| John | Chargeable | CNS | 20 | 04/03/2020 | 27/04/2020 |
| Bernard | Vacation/Holiday | SN | 100 | 30/04/2020 | 11/05/2020 |
| Bernard | Vacation/Holiday | Annual leave | 100 | 23/01/2020 | 24/02/2020 |
| Bernard | Chargeable | Tech PLC | 50 | 29/02/2020 | 30/03/2020 |
| | | | | 23/01/2020 | 23/01/2020 |
| | | | | 24/01/2020 | 24/01/2020 |
| | | | | 25/01/2020 | 25/01/2020 |
| | | | | 26/01/2020 | 26/01/2020 |
| | | | | 27/01/2020 | 27/01/2020 |
| | | | | 28/01/2020 | 28/01/2020 |
| | | | | 29/01/2020 | 29/01/2020 |
| | | | | 30/01/2020 | 30/01/2020 |
| | | | | 31/01/2020 | 31/01/2020 |
| | | | | ... | ... |
| | | | | 11/05/2020 | 11/05/2020 |
+----------+------------------+--------------+-----------+-------------+------------+

The List.Dates function is pretty useful here.
Generate the dates in your range, duplicate that to two columns and then append.
let
StartDate = List.Min(StartTable[Start Date]),
EndDate = List.Max(StartTable[End Date]),
DateList = List.Dates(StartDate, Duration.Days(EndDate - StartDate), #duration(1,0,0,0)),
DateCols = Table.FromColumns({DateList, DateList}, {"Start Date", "End Date"}),
AppendDates = Table.Combine({StartTable, DateCols})
in
AppendDates

Find a string in the entire table (all fields, all columns, all rows) in Django

I have a module (table) in my Django app with 24 fields (columns), and I want to search a string in it. I want to see a list that show me which one of the rows has this string in its fields.
Please have a look at this example:
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| id | name | year | country | attribute1 | attribute2 | attribute3 | ... | attribute20 |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 1 | Tie | 1993 | USA | Bond | Busy | Busy | ... | Free |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 2 | Ness | 1980 | Germany | Free | Busy | Both | ... | Busy |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 3 | Both | 1992 | Sweden | Free | Free | Free | ... | Busy |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
| 24 | Lex | 2001 | Russia | Busy | Free | Free | ... | Both |
+-----+------+------+---------+------------+------------+------------+-----+-------------+
What I am looking to get (by using filters, etc.) is something like this: (When I filter the records base on the word "Both" in the entire table and all of the records. Each row that contains "Both" is in the result below)
+----+------+------+---------+------------+------------+------------+-----+-------------+
| id | name | year | country | attribute1 | attribute2 | attribute3 | ... | attribute20 |
+----+------+------+---------+------------+------------+------------+-----+-------------+
| 1 | Ness | 1980 | Germany | Free | Busy | Both | ... | Busy |
+----+------+------+---------+------------+------------+------------+-----+-------------+
| 2 | Both | 1992 | Sweden | Free | Free | Free | ... | Busy |
+----+------+------+---------+------------+------------+------------+-----+-------------+
| 3 | Lex | 2001 | Russia | Busy | Free | Free | ... | Both |
+----+------+------+---------+------------+------------+------------+-----+-------------+
You can see that the string ("Both") appears in different rows in different columns. (one "Both" is under the column "attribute3", the other "Both" is under column "Name", and the last "Both" is under column "attribute20".
How you get this result in Django by queryset?
Thanks

Assuming you have modeled the above table as a Django model named Person
from django.db.models import Q
query_text = "your search string"
Person.objects.filter(
Q(name__contains=query_text) |
Q(year__contains=query_text) |
Q(attribute1__contains=query_text)
and so on for all your attributes
)
The above code will do a case sensitie search. if instead you want it to be case insenssitive search, use name__icontains instead of say name__contains in the above code.
As suggested by #rchurch4 in comment and based on this so answer, here's how one could search the entire table with fewer lines of code:
from functools import reduce
from operators import or_
all_fields = Person._meta.get_fields()
search_fields = [i.name for i in all_fields]
q = reduce(or_, [Q(**{'{}__contains'.format(f): search_text}) for f in search_fields], Q())
Person.objects.filter(q)

Count same field values in Django queryset

I have a Django model with three fields: product, condition and quantity with data such as:
| Product | Condition | Quantity |
+---------+-----------+----------+
| A | new | 2 |
| A | new | 3 |
| A | new | 4 |
| A | old | 1 |
| A | old | 2 |
| B | new | 2 |
| B | new | 3 |
| B | new | 1 |
| B | old | 4 |
| B | old | 2 |
I'd like to sum the quantities of the entries where product and condition are equal:
| Product | Condition | Quantity |
+---------+-----------+----------+
| A | new | 9 |
| A | old | 3 |
| B | new | 6 |
| B | old | 6 |
This answer helps to count entries with the same field value, but I need to count two fields.
How could I implement this?

from django.db.models import Sum
Model.objects.values('product', 'condition').order_by().annotate(Sum('quantity'))

Indices statistics is not updateted in a Amazon Aurora read replica

When I inserted a data, writer DB Indices statistics is updateted.
But, Reader DB Indices statistics is not updateted.
Sample is Here.
database.
create database test;
table.
CREATE TABLE test(
test_id INT NOT NULL PRIMARY KEY,
test_txt CHAR(8) DEFAULT NULL,
INDEX test_txt(test_txt)
);
writer.
mysql> insert into test values(1, 'test1');
Query OK, 1 row affected (0.01 sec)
mysql> show index from test;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test | 0 | PRIMARY | 1 | test_id | A | 1 | NULL | NULL | | BTREE | | |
| test | 1 | test_txt | 1 | test_txt | A | 1 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
reader.
mysql> show index from test;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| test | 0 | PRIMARY | 1 | test_id | A | 0 | NULL | NULL | | BTREE | | |
| test | 1 | test_txt | 1 | test_txt | A | 0 | NULL | NULL | YES | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
2 rows in set (0.00 sec)
Why cardinality is not updated only reader??
parameter group is default.
please tell me the reason why.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Query array column in BigQuery by condition - google-cloud-platform

Related

PowerBI : Count Distinct values in one column based on Distinct Values in another column

How do i add additional rows in M QUERY

Find a string in the entire table (all fields, all columns, all rows) in Django

Count same field values in Django queryset

Indices statistics is not updateted in a Amazon Aurora read replica

Categories

Resources