How to write a complex calculated field in Data Studio with regex?

I have been trying to make this regular expression filter work in Google Data Studio. It is supposed to do the following:
Check the field "src_id" and COUNT all the values containing "widget".
Check the field "Page" and COUNT all the values starting with "/" and ending with "/start".
Check the field "real_title" and NOT COUNT any value containing "-".
I have tried the code below, but it does not give the correct result:
COUNT(CASE WHEN REGEXP_MATCH(src_id, "^widget" ) THEN 1
WHEN REGEXP_MATCH(Page, ".*(/start)$") then 1
WHEN REGEXP_MATCH(real_title, "^[^-]") then 1
ELSE 0 END)
I expect the result to be "52", but it gives me "582". I need help spotting what I'm doing wrong.

The problem is your COUNT(): it counts every entry, including the zeroes.
Either use SUM(), or just use the CASE statement on its own and sum it where you need it.
Examples
TestField1
COUNT(CASE WHEN REGEXP_MATCH(Page Title , "^How.*" ) THEN 1
ELSE 0 END)
This returns 58 - the number of page titles on my site.
TestField2
Sum(CASE WHEN REGEXP_MATCH(Page Title, "^How.*" ) THEN 1
ELSE 0 END)
This returns 7 - the number of titles on the site that start with "How"
You really don't need the sum() function in most cases because you can sum the field in the places you need it.
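The difference between the two test fields can be sketched outside Data Studio. The Python snippet below is only an illustration: the page titles are made up, and re.fullmatch stands in for REGEXP_MATCH on the assumption that REGEXP_MATCH has to match the whole value.

```python
import re

# Hypothetical page titles, made up for illustration.
titles = ["How to bake bread", "About us", "How regex works", "Contact"]

# Rough equivalent of: CASE WHEN REGEXP_MATCH(title, "^How.*") THEN 1 ELSE 0 END
flags = [1 if re.fullmatch(r"^How.*", t) else 0 for t in titles]

count_result = len(flags)  # COUNT counts every non-null value, zeroes included
sum_result = sum(flags)    # SUM adds up only the 1s

print(count_result, sum_result)
```

Here the count equals the number of titles, while the sum equals the number of titles that start with "How".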

Related

How to merge these two records into one row, removing the null values, in Informatica using a transformation? Please see the snapshot for the scenario.

Input-
Code value Min Max
A abc 10 null
A abc Null 20
Output-
Code value Min Max
A abc 10 20
You can use an Aggregator transformation to remove the nulls and get a single row. This solution is based on your sample data only.
Use an Aggregator with the ports below:
inout_Code (group by)
inout_value (group by)
in_Min
in_Max
out_Min= MAX(in_Min)
out_Max = MAX(in_Max)
Then connect out_Min, out_Max, Code and value to the target.
You will get one record per combination of Code and value, and the null values will be gone.
If you have many more Code/value combinations, some of the Min and Max columns are null, and you want multiple records, you will need more complex mapping logic. Let me know if this helps. :)
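The grouping logic of the Aggregator can be sketched in plain Python. This is not Informatica code, only an illustration of "group by Code and value, take MAX while ignoring nulls"; the helper name max_ignoring_nulls is hypothetical.

```python
from collections import defaultdict

# The two input records from the question; None stands for null.
rows = [
    {"Code": "A", "value": "abc", "Min": 10, "Max": None},
    {"Code": "A", "value": "abc", "Min": None, "Max": 20},
]

def max_ignoring_nulls(values):
    """Return the largest non-null value, or None if every value is null."""
    present = [v for v in values if v is not None]
    return max(present) if present else None

# Group rows by (Code, value), like the two group-by ports.
groups = defaultdict(list)
for row in rows:
    groups[(row["Code"], row["value"])].append(row)

# One output row per group, like out_Min = MAX(in_Min) and out_Max = MAX(in_Max).
merged = [
    {
        "Code": code,
        "value": value,
        "Min": max_ignoring_nulls(r["Min"] for r in grp),
        "Max": max_ignoring_nulls(r["Max"] for r in grp),
    }
    for (code, value), grp in groups.items()
]

print(merged)
```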

HiveQL: Parse strings and count

I am using HiveQL to work with millions of rows of domain name text data stored in HDFS. The following is a hand-selected subset to illustrate lexical diversity. There are duplicate entries.
dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.
mgmtsubnet.mgmtvcn.oraclevcn.com.
asdf.mgmtvcn.oraclevcn.com.
dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.
localhost.
a.localhost.
img.pulsemgr.com.
36.136.154.156.in-addr.arpa.
accounts.spotify.com.
_dmarc.ixia-devops.com.
&eventtype=close&reason=4&duration=35.
&eventtype=close&reason=3&duration=10336.
I am trying to get a count of # of rows based on the last two levels of the domain, where sometimes the 2nd level is absent (i.e. localhost.). For example:
domain_root count
oraclevcn.com. 4
localhost. 1
a.localhost. 1
pulsemgr.com. 1
in-addr.arpa. 1
spotify.com. 1
ixia-devops.com. 1
It would also be nice to see how to filter out domains where the 2nd level is absent.
I am not sure where to start. I have seen use of the SPLIT() function, but that may not be robust since there could be many levels to a domain name, for example: a.b.c.d.e.f.g.h.i etc.
Any ideas or implementations are appreciated.
Below is a query that uses regexp_extract.
select domain_root, count(*) from (select regexp_extract('dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.', '[A-Za-z0-9-]+\\.[A-Za-z0-9-]+\\.$', 0) as domain_root from table) A group by A.domain_root -- replace first argument with column name
The regex extracts the domain root, allowing alphanumeric characters and the special character '-' in each level. The backslashes are doubled because Hive string literals treat a single backslash as an escape character.
Hope this helps.
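To see what this pattern does with the sample rows from the question, here is a small Python sketch. It is not HiveQL: it only applies the same regular expression with Python's re module and counts the results. Rows that do not have two trailing levels (such as "localhost." and the "&eventtype=..." lines) do not match and are dropped here, which corresponds to the "filter out domains where the 2nd level is absent" variant rather than the first expected table.

```python
import re
from collections import Counter

# Sample rows copied from the question.
domains = [
    "dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.",
    "mgmtsubnet.mgmtvcn.oraclevcn.com.",
    "asdf.mgmtvcn.oraclevcn.com.",
    "dnsvm.mgmtsubnet.mgmtvcn.oraclevcn.com.",
    "localhost.",
    "a.localhost.",
    "img.pulsemgr.com.",
    "36.136.154.156.in-addr.arpa.",
    "accounts.spotify.com.",
    "_dmarc.ixia-devops.com.",
    "&eventtype=close&reason=4&duration=35.",
    "&eventtype=close&reason=3&duration=10336.",
]

# Same pattern as in the query: the last two dot-terminated labels.
pattern = re.compile(r"[A-Za-z0-9-]+\.[A-Za-z0-9-]+\.$")

def domain_root(name):
    """Return the last two levels of the name, or None if there are fewer than two."""
    match = pattern.search(name)
    return match.group(0) if match else None

roots = [domain_root(d) for d in domains]
counts = Counter(r for r in roots if r is not None)

for root, n in counts.most_common():
    print(root, n)
```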

PowerBi - Weeknumber not in the correct order

I'm new to Power BI and I'm running into the following problem:
Week number + year are not shown in the correct order. See the following screenshots:
I've concatenated the week number with the year, based on a column called "PublishDate".
This is my DAX formula for Weeknum:
Weeknum = YEAR ( [PublishDate] ) & "" & WEEKNUM ( [PublishDate], 2 )
I do notice that 1 through 9 are not shown with a 0 in front. Could this be causing it?
I agree with getting the '0' in the right place. Once you change the data type from text to a number, if that '0' isn't there, it will be out of order as well.
I prefer editing the query and changing the data type from the beginning:
Finding the column that needs a data type change and modifying it there:
You can change it from text to whole number.
The problem is that the values are sorted alphabetically because they are of data type text. So yes, the fact that '9' does not have a '0' in front of it does cause your problem. You can solve this by formatting the result of the WEEKNUM function like this (you also do not need the & "" &):
Weeknum = YEAR ( [PublishDate] ) & FORMAT(WEEKNUM ( [PublishDate], 2 ),"00")
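The effect of the zero padding on text sorting can be illustrated with a short Python sketch. The year and week pairs below are made up for the example; the f-string format 02d plays the role of FORMAT(..., "00").

```python
# Hypothetical (year, week number) pairs, made up for illustration.
weeks = [(2019, 2), (2019, 10), (2019, 9)]

# Without padding, text sorting puts week 10 before week 2 and week 9.
unpadded = sorted(f"{year}{week}" for year, week in weeks)

# With two-digit padding, text order and chronological order agree.
padded = sorted(f"{year}{week:02d}" for year, week in weeks)

print(unpadded)
print(padded)
```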

Make =IF Function Output Numbers For "Scoring": Google Sheets

I am exploring methods of giving scores to different data points within a dataset. The points come from a mix of numeric and text attributes that I check for certain characteristics, e.g. if Col. A contains more than X number of "|", then give it a 1. If not, it gets a 0 for that category. I also have some that give the point when the value is >X.
I have been trying to do this with =IF, for example, =IF([sheet] = [Text], "1","0").
I can get it to give me 1 or 0, but I am unable to get a point total with sum.
I have tried changing the formatting of the cells to "number" and "plain text", and have also left it as automatic, but I can't get it to sum. Thoughts? Is there maybe a better way to do this?
FWIW - I'm trying to score based on about 12 factors.
Best,
Alex
The issue here might be that you're having the cell evaluate to either the string "0" or the string "1" rather than the number 0 or the number 1. That would explain why you're seeing the right things but the math isn't coming out right - the cell contents look like numbers, but they're really text, which the summation would then ignore.
One option would be to drop the quotation marks and write something like this:
=IF(condition, 1, 0)
This has the condition evaluate to 1 if it's true and 0 if it's false.
Alternatively, you could write something like this:
=(condition) * 1
This will take the boolean TRUE or FALSE returned by condition and convert it to either the numeric value 1 (true) or the numeric value 0 (false).
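The difference between text and numeric results can be sketched in Python. This is only an analogy: sheet_sum is a hypothetical helper that mimics a sum which skips text cells, as described above; it is not how Google Sheets is implemented.

```python
def sheet_sum(cells):
    """Add up numeric cells and skip text cells (hypothetical stand-in for SUM)."""
    return sum(
        c for c in cells
        if isinstance(c, (int, float)) and not isinstance(c, bool)
    )

conditions = [True, False, True]  # made-up results of the scoring checks

text_scores = ["1" if c else "0" for c in conditions]  # like =IF(condition, "1", "0")
number_scores = [1 if c else 0 for c in conditions]    # like =IF(condition, 1, 0)
product_scores = [c * 1 for c in conditions]           # like =(condition) * 1

print(sheet_sum(text_scores), sheet_sum(number_scores), sheet_sum(product_scores))
```

The text version looks the same on screen but contributes nothing to the total, while both numeric versions add up as expected.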

Filtering on the count with the Django ORM

I have a query that's basically "count all the items of type X, and return the items that exist more than once, along with their counts". Right now I have this:
Item.objects.annotate(type_count=models.Count("type")).filter(type_count__gt=1).order_by("-type_count")
but it returns nothing (the count is 1 for all items). What am I doing wrong?
Ideally, it should get the following:
Type
----
1
1
2
3
3
3
and return:
Type, Count
-----------
1 2
3 3
In order to count the number of occurrences of each type, you have to group by the type field. In Django this is done by using values to get just that field. So, this should work:
Item.objects.values('type').annotate(
    type_count=models.Count("type")
).filter(type_count__gt=1).order_by("-type_count")
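Running the ORM query needs a configured Django project, so here is a plain Python sketch of the same idea (group by type, keep groups with a count above 1, order by count descending). It uses collections.Counter on the sample values from the question and is not a replacement for the query above.

```python
from collections import Counter

# The "Type" column from the question.
types = [1, 1, 2, 3, 3, 3]

# Group by type and count, like values('type').annotate(Count('type')).
type_counts = Counter(types)

# Keep counts greater than 1 and sort by count, highest first,
# like .filter(type_count__gt=1).order_by("-type_count").
result = sorted(
    ((t, n) for t, n in type_counts.items() if n > 1),
    key=lambda pair: -pair[1],
)

print(result)
```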
It's a logical error ;)
type_count__gt=1 means type_count > 1, so if the count == 1 it won't be displayed :)
Use type_count__gte=1 instead; it means type_count >= 1 :)