Google data studio : Error in formula for custom fields with REGEXP_MATCH - regex

I'm currently playing with Google Data Studio and I'm having an error that I can't get rid of.
I'm trying to create a custom field that will store some values depending on the result of my regex, see below the code :
GDS is not accepting this formula as I'm getting the error : Invalid Formula.
The documentation about REGEXP_MATCH is also saying that it returns true or false but when I just get the return of the regex '.' (looking for any character), I do not get any of these values. Instead it shows me {$theCharacterFound} ex : {A}.
Hope someone will be able to tell me what I am doing wrong !
EDIT : I found out in this topic that it is apparently a problem with the postgreSQL connector (that I'm using) so we can only hope that Google will fix it...

I think you are missing an r before the regex literal:
CASE
WHEN REGEXP_MATCH(my_field_text, r'\bWord1\b') THEN 'True'
WHEN REGEXP_MATCH(my_field_text, r'\bWord2\b') THEN 'False'
ELSE NULL
END
Note also that I placed word boundaries around your search term words. This will prevent Word1 from matching a substring in a large string, e.g. AWord1s, which you might not want to count as a match.

Related

Google Data Studio how to replace a string character by position?

I am new into Google Data Studio and I am trying to cleanse some Google Analytic data.
For example I have a filed called page which shows the page name. For some pages I have duplicates e.g: contact/product/car and contact/product/car/ (ending in this case with /)
I want to create a field that always replace the last charterer of the page name if it ends with '/' with a space
I have tried this function: REPLACE(ENDS_WITH(Page,"/"), '/','')
But is not working instead giving me true or false.
Someone can help me with this?
Please use regular expression for doing this job:
REGEXP_REPLACE(Page,"/$","")

Trying to create a field in Google Data Studio that only counts if the name of a list begins with US

I am trying to create a field in Google Data Studio that sums the revenue for lists that begin with US. I know I have to use regex, but it continue to tell me there is an unexpected end to the formula.
Here is the code.
Revenue WHERE REGEXP_MATCH(List, '^US')
Please let me know if you have any questions.
Thanks!
You can use
WHERE REGEXP_MATCH(List, '^US.*')
^^
Or even
WHERE REGEXP_MATCH(List, 'US.*')
See REGEXP_MATCH documentation:
REGEXP_MATCH attempts to match the entire string contained in field_expression. For example, if field_expression is "ABC123":
REGEXP_MATCH(field_expression, 'A') returns false.
REGEXP_MATCH(field_expression, 'A.*') returns true.

Trying to write an SQL query with regexp_matches() look behind positive in postgresql

From a PostgreSQL database, I'm trying to match 6 or more digits that come after a string that looks like "(OCoLC)" and I thought I had a working regular expression that would fit that description:
(?<=\(ocolc\))[0-9]{6,}
Here are some strings that it should return the digits for:
|a(OCoLC)08507541 will return 08507541
|a(OCoLC)174097142 will return 174097142
etc...
This seems to work to match strings when I test it on regex101.com, but when I incorporate it into my query:
SELECT
regexp_matches(v.field_content, '(?<=\(ocolc\))[0-9]{6,}', 'gi')
FROM
varfield as v
LIMIT
1;
I get this message:
ERROR: invalid regular expression: quantifier operand invalid
I'm not sure why it doesn't seem to like that expression.
UPDATE
I ended up just resorting to using a case statement, as that seemed to be the best way to work around this...
SELECT
CASE
WHEN v.field_content ~* '\(ocolc\)[0-9]{6,}'
THEN (regexp_matches(v.field_content, '[0-9]{6,}', 'gi'))[1]
ELSE v.field_content
END
FROM
varfield as v
as electricjelly noted, I'm kind of after just the numeric characters, but they have to be preceded by the "(OCoLC)" string, or they're not exactly what I'm after. This is part of a larger query, so I'm running a second case statement a boolean flag in cases where the start of the string wasn't "(OCoLC)". These seems to be more helpful anyway, as I'm going to probably want to preserve those other values somehow.
After looking over your question it seems your error is caused from a syntax problem, not so much from the function not being available on your version of PostgreSQl, as I tested it on 9.6 and I received the same error.
However, what you seem to want is to pull the numbers from a given field as in
|a(OCoLC)08507541 becomes 08507541
an easy way you could accomplish this would be to use regex_replace
the function would be:
regexp_replace('table.field', '\D', '', 'g')
the \D in the function finds all non-numbers and replaces it with a nothing (hence the '') and returns everything else
It looks like after doing some more searching, this is only a feature of versions of PostgreSQL server >= 9.6
https://www.postgresql.org/docs/9.6/static/functions-matching.html#POSIX-CONSTRAINTS-TABLE
The version I am running is version 9.4.6
https://www.postgresql.org/message-id/E1ZsIsY-0006z6-6T#gemulon.postgresql.org
So, the answer is it's not available for this version of PostgreSQL, but presumably this would work just fine in the latest version of the server.

Untranslatable character when extracting dates from strings

I am attempting to extract dates from a free-text field (because our process is awesome like that :\ ) and keep hitting Teradata error 6706. The regex I'm using is: REGEXP_SUBSTR(original_field,'(\d{2})\/(\d{2})\/(\d{4})',1) AS new_field. I'm unsure of the field's type HELP TABLE has a blank in the Type column for the field.
I've already tried converting using TRANSLATE(col USING LATIN_TO_UNICODE), as well as UNICODE_TO_LATIN, those both actually cause the error by themselves. A straight CAST(original_field AS VARCHAR(255)) doesn't fix the issue, though that cast does work. I've also tried stripping various special characters (new-line, carriage return, etc.) from the field before letting the REGEXP_SUBSTR take a crack at it, both by itself and with the CAST & TRANSLATEs I already mentioned.
At this point I'm not sure what the issue could be, and could use some guidance on additional options to try.
The final version that worked ended up being
, CASE
WHEN TRANSLATE_CHK(field USING LATIN_TO_UNICODE) = 0 THEN
REGEXP_SUBSTR(TRANSLATE(field USING LATIN_TO_UNICODE),'(\d{2})\/(\d{2})\/(\d{4})',1)
ELSE NULL
END AS Ref_Date
For whatever reason, using a TRIM inside the TRANSLATE seems to cause an issue. Only once I striped any and all functions from inside the TRANSLATE did the TRANSLATE, and thus the REGEXP_SUBSTR, work.

RGoogleAnalytics replacing unexpectedly escaped characters with gsub

I'm using RGoogleAnalytics, I'm just at the learning stage at the moment.
I'm following the code in the tutorial here https://code.google.com/p/r-google-analytics/
But when I try to run
ga.goals <- conf$GetGoals()
ga.goals
I get an error message telling me there is an unexpected escaped character '\.' at pos 7
I get a similar message for the next two lines of code (GetSegments)
This question deals with a similar problems in the Facebook Graphs API
How to replace "unexpected escaped character" in R
I've tried using a similar bit of code
confGoalsSub <- gsub('\\.', ' ', conf$GetGoals())
to remove the escaped characters, but I get another error :
cannot coerce type 'closure' to vector of type 'character'
Out of desperation I have tried confGoalsSub <- gsub('\\.', ' ', conf) which returns a character vector that is just garbage (it's just the code for conf with the decimal points stripped out).
Can anyone suggest a better expression than gsub that will return a useful object?
EDIT: As per the suggestion below I've now added the brackets at the end of the function call but I still get the same error message about unexpected escape characters. I get the same error when I try to call other, similar function such as $GetSegments().
I saw on one video at the weekend that this package was broken for a long time, although the speaker did not provide details as to why. Perhaps I should give up and try one of the other Google Analytics packages in R.
Seems odd, given that this one is supposed to be Google supported.
I think this error arises when the RJSON library isn't able to parse the Google Analytics Data Feed properly and convert it into a nested list. The updated version of [RGoogleAnalytics] (http://cran.r-project.org/web/packages/RGoogleAnalytics/index.html) fixes this problem. Currently, you won't be able to retrieve Goals and Segments from your Google Analytics account using the library but beyond that it supports the full range of dimensions and metrics.