Google Data Studio: Show users for certain URL-path - regex

I want to show and compare user traffic from Google Analytics in Data Studio. I need to split the traffic between the localized versions of our site.
Pages Paths and user groups
The base language is German, hosted at www.domain.ltd/. The URL path for English content is www.domain.ltd/en/ and for Polish content it is www.domain.ltd/pl/.
I want to separate the user traffic for each language path and compare the results in a line chart.
RegEx
I set up a new field with RegEx:
CASE
WHEN REGEXP_MATCH(Page, '^.*garbe-industrial\\.de\\/en\\/.*') THEN "Traffic EN"
WHEN REGEXP_MATCH(Page, '^.*garbe-industrial\\.de\\/pl\\/.*') THEN "Traffic PL"
ELSE "Traffic DE"
END
I added the new field to a line chart, but the chart does not show any data.
I tried different approaches:
RegEx: .*\/en\/.* instead of the domain-path version (1)
--
(1) Update: changed the format to "code" in order to make the full RegEx visible.

I wonder if your regular expression is too specific in the code example. I would think that just putting in .*/en/.* and .*/pl/.* would work to capture those pages. I do something similar for our Spanish pages. This works on my pages:
case when regexp_match(page, '.*/es/.*') then "Spanish"
else "English"
end
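To check the looser patterns outside Data Studio, here is a minimal Python sketch that mirrors the CASE logic. The function name traffic_label is made up for illustration, and the sample paths are assumptions. Note that Python's re.search matches anywhere in the string, while REGEXP_MATCH has to match the whole value, which is why the Data Studio version needs the surrounding .* on both sides.

```python
import re


# Hypothetical helper; the /en/ and /pl/ path segments come from the question.
def traffic_label(page: str) -> str:
    """Return a language label for a page URL or path."""
    if re.search(r"/en/", page):
        return "Traffic EN"
    if re.search(r"/pl/", page):
        return "Traffic PL"
    return "Traffic DE"  # everything else counts as the German default


# Sample values are assumptions, with and without a hostname.
for page in ["/en/contact", "www.domain.ltd/pl/news", "/kontakt"]:
    print(page, "->", traffic_label(page))
```

Because the pattern only looks for the path segment, it works whether the Page dimension holds the full URL or just the path.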

Google Sheets Highlighting using Conditional Formatting for Text that DOESN'T Contain "x" - TWO Criteria

I plan to use Google Sheet's conditional formatting to highlight cells where the text DOES NOT contain:
Retail
FinServ
Manufacturing
Field Service
Managed Services
Digital Transformation
Ecommerce
Data and Analytics
For the above phrases, I want to be able to add additional details, separated by an underscore (_), and have the row still NOT be highlighted. For ex: Retail_Blog should still NOT be highlighted because it begins with one of the phrases above.
To do this, I'm currently using the formula:
=regexmatch(F:F,"Retail|FinServ|Manufacturing|Field Service|Managed Services|Digital Transformation|Ecommerce|Data and Analytics")=FALSE
This formula works great for the specifications above, but I would also like it to adhere to another rule.
For the phrases below, I would like the formula to highlight cells if they DON'T EXACTLY match the phrases. For ex: "Meetings" should NOT be highlighted, but "meetings," "Meeting," and "Meetings_whatever" SHOULD be highlighted.
Meetings
Website Updates
Press Release and Distribution
Calendar Planning
Also, this formula would be for the range F:F.
Formula
=regexmatch(F:F,"Retail|FinServ|Manufacturing|Field Service|Managed Services|Digital Transformation|Ecommerce|Data and Analytics|^Meetings$|^Website Updates$|^Press Release and Distribution$|^Calendar Planning$")=FALSE
Explanation
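^ anchors the match to the start of the cell text
$ anchors the match to the end, so ^Meetings$ only matches a cell that contains exactly "Meetings"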
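The same logic can be tried outside Sheets with a small Python sketch. It is only an approximation under the assumption that Python's re.search behaves like REGEXMATCH for these simple patterns (both look for a match anywhere in the text, and both are case-sensitive by default). The names PREFIX_PHRASES, EXACT_PHRASES and should_highlight are made up for illustration.

```python
import re

# Phrases that may carry a suffix such as "_Blog"; they match anywhere in the cell.
PREFIX_PHRASES = [
    "Retail", "FinServ", "Manufacturing", "Field Service", "Managed Services",
    "Digital Transformation", "Ecommerce", "Data and Analytics",
]
# Phrases that must match the whole cell, hence the ^...$ anchors.
EXACT_PHRASES = [
    "Meetings", "Website Updates", "Press Release and Distribution",
    "Calendar Planning",
]

# None of the phrases contain regex metacharacters, so no escaping is needed.
PATTERN = re.compile(
    "|".join(PREFIX_PHRASES) + "|" + "|".join(f"^{p}$" for p in EXACT_PHRASES)
)


def should_highlight(cell: str) -> bool:
    """Mirror of =REGEXMATCH(...)=FALSE: highlight when nothing matches."""
    return PATTERN.search(cell) is None


for cell in ["Retail_Blog", "Meetings", "meetings", "Meeting", "Meetings_whatever"]:
    print(repr(cell), should_highlight(cell))
```

Only the first two sample cells stay unhighlighted; the other three fail the exact match and are flagged.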

How to convert MS Word Smart Quotes and em-dashes to simple quotes and dashes in Ckeditor 4

Hi, I really like the new CKEditor 4 Advanced Content Filtering along with the pastefromword plugin. I have read the docs on which HTML tags to allow, and I understand why it kindly converts my client's MS Word crap into HTML entities. However, I'd like to intervene a little and convert the smart quotes to straight quotes and all em dashes to plain dashes before the text gets sent to the CMS database. I can't find any docs or examples on this.
I can see there were many questions about this on the old CKEditor forum (http://ckeditor.com/forums/CKEditor-3.x/Replacing-smart-quotes-regular-quotes, http://ckeditor.com/forums/CKEditor-3.x/Problem-copyingpasting-MS-Word), but they didn't get answered.
I'm also hoping the ckeditor team reads these forums as this is where they suggest we post questions now.
CKEditor dev here.
If you want the Paste From Word plugin to do this, you could add a rule in the plugin that replaces the contents of text nodes.
To achieve this, add a property named 'text' somewhere around here (on the same level as the 'comment' property):
https://github.com/ckeditor/ckeditor-dev/blob/master/plugins/pastefromword/filter/default.js#L1106
It should be a function that accepts one parameter - the text node content, e.g.:
text: function( content ) {
	return content.replace( /[\u201E\u201C]/g, '"' ); // Unicode for „ and “
}
This way whenever the PFW plugin filter encounters a text node it'll replace its contents with whatever is returned by the above mentioned function.
Caveat: there are quite a few Unicode symbols that represent quotation marks and dashes, so the character class has to list every one you want replaced.
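To give an idea of how many characters are involved, here is a small sketch of a fuller mapping. It is written in Python rather than as CKEditor code, so it only illustrates the table; the same code points could be used in the replace call of the 'text' function. The selection of characters and the names PLAIN and to_plain are assumptions, not a complete list.

```python
# Assumed mapping; extend it with whatever characters show up in the pasted text.
PLAIN = str.maketrans({
    "\u201C": '"',  # left double quotation mark
    "\u201D": '"',  # right double quotation mark
    "\u201E": '"',  # double low-9 quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as apostrophe)
    "\u201A": "'",  # single low-9 quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def to_plain(text: str) -> str:
    """Replace typographic quotes and dashes with their ASCII counterparts."""
    return text.translate(PLAIN)


print(to_plain("\u201CHello\u201D \u2014 it\u2019s \u2018fine\u2019"))
```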
By the way: you may not want to get too attached to the current Paste From Word plugin - we're planning a major refactor of it for v4.6.
I hope this was helpful.

CloudSearch wildcard query not working with 2013 API after migration from 2011 API

I've recently upgraded a CloudSearch instance from the 2011 to the 2013 API. Both instances have a field called sid, which is a text field containing a two-letter code followed by some digits e.g. LC12345. With the 2011 API, if I run a search like this:
q=12345*&return-fields=sid,name,desc
...I get back 1 result, which is great. But the sid of the result is LC12345 and that's the way it was indexed. The number 12345 does not appear anywhere else in any of the resulting document fields. I don't understand why it works. I can only assume that this type of query is looking for any terms in any fields that even contain the number 12345.
The reason I'm asking is because this functionality is now broken when I query using the 2013 API. I need to use the structured query parser, but even a comparable wildcard query using the simple parser is not working e.g.
q.parser=simple&q=12345*&return=sid,name,desc
...returns nothing, although the document is definitely there i.e. if I query for LC12345* it finds the document.
If I could figure out how to get the simple query working like it was before, that would at least get me started on how to do the same with the structured syntax.
Why it's not working
CloudSearch v1 (2011) had a different way of tokenizing mixed alpha+numeric strings. Here's the logic as described in the archived docs (emphasis mine).
If a string contains both alphabetic and numeric characters and is at
least three and no more than nine characters long, the alphabetic and
numeric portions of the string are treated as separate tokens. For
example, the string DOC298 is tokenized into two terms: doc 298
CloudSearch v2 (2013) text processing follows Unicode Text Segmentation, which does not specify that behavior:
Do not break within sequences of digits, or digits adjacent to letters (“3a”, or “A3”).
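The difference can be sketched in a few lines of Python. This is a simplified model of the two quoted rules, not CloudSearch's actual tokenizer: it assumes ASCII letters and digits only, and the function names are made up for illustration.

```python
import re


def tokenize_2011(term: str) -> list[str]:
    """Simplified model of the quoted 2011 rule."""
    term = term.lower()
    mixed = any(c.isalpha() for c in term) and any(c.isdigit() for c in term)
    if mixed and 3 <= len(term) <= 9:
        # Alphabetic and numeric runs become separate tokens.
        return re.findall(r"[a-z]+|[0-9]+", term)
    return [term]


def tokenize_2013(term: str) -> list[str]:
    """Simplified model of the 2013 behaviour: letters and digits stay together."""
    return [term.lower()]


def prefix_hit(tokens: list[str], prefix: str) -> bool:
    """True if any token starts with the prefix, like a trailing-* query."""
    return any(tok.startswith(prefix.lower()) for tok in tokens)


for tokenize in (tokenize_2011, tokenize_2013):
    tokens = tokenize("LC12345")
    print(tokenize.__name__, tokens, prefix_hit(tokens, "12345"))
```

Under the 2011 model LC12345 is indexed as lc and 12345, so a prefix query for 12345* finds it. Under the 2013 model the only token is lc12345, which starts with lc, not with 12345.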
Solution
You should just be able to search *12345 to get back results with any prefix. There may be some edge cases like getting back results you don't want (things with more preceding digits like AB99912345); I don't know enough about your data to say whether those are real concerns.
Another option would be to index the numeric part separately from the alphabetic prefix, but that's additional work that may be unnecessary.
I'm guessing you are using CloudSearch in English, so maybe this isn't your specific problem, but also watch out for stop words in your search queries:
https://docs.aws.amazon.com/cloudsearch/latest/developerguide/configuring-analysis-schemes.html#stopwords
In your example, the word "jo" is a stop word in Danish and in other languages, and every supported language has a dictionary of very common stop words. If you don't specify a language for your text field, it defaults to English. You can see them here: https://docs.aws.amazon.com/cloudsearch/latest/developerguide/text-processing.html#text-processing-settings

Regex for dates format

I am working on a web application based on ASP.NET MVC 5, and I have a big problem with a field that lets the user choose the format for displaying dates in the application.
The goal is to write a RegularExpressionAttribute with a regex that validates the date format entered by the user.
Acceptable formats must be:
m/d/y,
m-d-y,
m:d:y,
d/m/y,
d-m-y,
d:m:y,
y/m/d,
y-m-d,
y:m:d
Each symbol may be repeated one to four times (from 'y' up to 'yyyy'), and the symbols may also be upper case.
After a lot of hard-coding I came up with this acceptable one:
((([mM]{1,4})([\/]{1})([dD]{1,4})([\/]{1})([yY]{1,4}))|(([mM]{1,4})([\-]{1})([dD]{1,4})([\-]{1})([yY]{1,4}))|(([mM]{1,4})([\:]{1})([dD]{1,4})([\:]{1})([yY]{1,4})))|((([dD]{1,4})([\/]{1})([mM]{1,4})([\/]{1})([yY]{1,4}))|(([dD]{1,4})([\-]{1})([mM]{1,4})([\-]{1})([yY]{1,4}))|(([dD]{1,4})([\:]{1})([mM]{1,4})([\:]{1})([yY]{1,4})))|((([yY]{1,4})([\/]{1})([mM]{1,4})([\/]{1})([dD]{1,4}))|(([yY]{1,4})([\-]{1})([mM]{1,4})([\-]{1})([dD]{1,4}))|(([yY]{1,4})([\:]{1})([mM]{1,4})([\:]{1})([dD]{1,4})))|((([yY]{1,4})([\/]{1})([dD]{1,4})([\/]{1})([mM]{1,4}))|(([yY]{1,4})([\-]{1})([dD]{1,4})([\-]{1})([mM]{1,4}))|(([yY]{1,4})([\:]{1})([dD]{1,4})([\:]{1})([mM]{1,4})))
This one works, but given my limited regex knowledge and experience I hope to get some help and a better way to solve this puzzle.
Thanks.
You have to generalize a bit.
m{1,4}([:/-])d{1,4}\1y{1,4}|d{1,4}([:/-])m{1,4}\2y{1,4}|y{1,4}([:/-])m{1,4}\3d{1,4}
Explanation:
instead of e.g. [mM], use m and set the option for case-insensitive matching
([:/-]) captures any of the allowed delimiters as a group
\1...\3 are backreferences to delimiter groups 1...3, so the second delimiter must be the same as the first
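As a quick way to try the pattern, here is a minimal sketch in Python rather than C#. The regex is the one from the answer; the names DATE_FORMAT and is_valid_format are made up, and fullmatch is used on the assumption that the whole input has to be a format string.

```python
import re

# Pattern from the answer, split per alternative; each one has its own
# delimiter group and refers back to it (\1, \2, \3).
DATE_FORMAT = re.compile(
    r"m{1,4}([:/-])d{1,4}\1y{1,4}"
    r"|d{1,4}([:/-])m{1,4}\2y{1,4}"
    r"|y{1,4}([:/-])m{1,4}\3d{1,4}",
    re.IGNORECASE,
)


def is_valid_format(fmt: str) -> bool:
    """True if the whole string is one of the accepted date formats."""
    return DATE_FORMAT.fullmatch(fmt) is not None


for fmt in ["mm/dd/yyyy", "D-M-Y", "yyyy:mm:dd", "m/d-y", "mmmmm/d/y"]:
    print(fmt, is_valid_format(fmt))
```

The first three samples are accepted. The fourth mixes delimiters and the fifth repeats m five times, so both are rejected.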

Stripping superscript from plaintext

I often grab quotes from articles whose citations come with superscripted footnote markers, which are a pain in the ass when copied. Because the text is pasted as plaintext and not as HTML, the markers show up as ordinary letters in the text.
Is there a way I could run this through a regex to take out these superscripts?
For example
In the abeginning bGod ccreated the dheaven and the eearth.
Should become
In the beginning God created the heaven and the earth.
I can't think of a way to have regex search for misspellings and a corresponding sequential set of numbers and letters.
Any thoughts? I'm also using Sublime Text 3 for the majority of my writing, but I wouldn't mind outsourcing this to an AppleScript, or text replacement app (aText, textExpander, etc.).
Matching Code vs. Matching a Screen
It's hard to tell without seeing an example, but this should be doable if you copy the text from code view, as opposed to the regular browser view. (Ctrl or Cmd-J is your friend). Since writing the rules will take time, this will only be worthwhile for large chunks of text.
In code view, your superscript will be marked up in a way that can be targeted by regex. For instance:
and therefore bananas make you smartera
in the browser view (where the a at the end is a citation note) may look like this in code view:
and therefore bananas make you smarter<span class="mycitations">a</span>
In your editor, using regex, you can process the text to remove all tags, or just certain tags. The rules may not always be easy to write, and of course there are many disclaimers about using regex to parse html.
However, if your source is always the same (Wikipedia for instance), then you can create and save rules that should work across many pages.
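A minimal sketch of such a rule in Python, assuming the citations are wrapped exactly like the example above. The class name mycitations comes from that example; real pages will use other markup, so the pattern has to be adapted per source. The usual caveats about parsing HTML with regex apply here too.

```python
import re

# Assumed markup: citation markers wrapped in <span class="mycitations">...</span>.
CITATION = re.compile(r'<span class="mycitations">.*?</span>')
# Crude catch-all for any remaining tags.
ANY_TAG = re.compile(r"<[^>]+>")


def strip_citations(html: str) -> str:
    """Drop citation spans with their content, then remove leftover tags."""
    without_notes = CITATION.sub("", html)
    return ANY_TAG.sub("", without_notes)


source = 'and therefore bananas make you smarter<span class="mycitations">a</span>'
print(strip_citations(source))
```

The citation spans are removed together with their content first; stripping the remaining tags afterwards keeps the ordinary text intact.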