Removing sensitive information from logs using regex

In my Ruby app I have the following regex that helps me remove sensitive information from logs:
/(\\"|")secure[^:]+:\s*\1.*?\1/
It works when the logs contain the following:
{"secure_data": "Test"}
but when the value is an object instead of a string, it does not work:
{"secure_data": {"name": "Test"}}
How can I update the regex to work with both scenarios?
https://rubular.com/r/h9EBZot1e7NUkS

You may use this regex with negated character classes and an alternation:
"secure[^:]+:\s*(?:"[^"]*"|{[^}]*})
Inside the non-capturing group (?:"[^"]*"|{[^}]*}) we match either a quoted string or an object that starts with { and ends with }.
Updated RegEx Demo
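As a rough check of the pattern above against both log shapes from the question, here is a minimal sketch. It uses Python's re module as a stand-in for Ruby's engine, and the redact helper and the [FILTERED] placeholder are hypothetical names chosen for illustration.

```python
import re

# Pattern from the answer above; the braces are escaped here only for readability.
SECURE_FIELD = re.compile(r'"secure[^:]+:\s*(?:"[^"]*"|\{[^}]*\})')


def redact(line, placeholder="[FILTERED]"):
    """Replace every matched secure key/value pair with a placeholder (hypothetical helper)."""
    return SECURE_FIELD.sub(placeholder, line)


print(redact('{"secure_data": "Test"}'))
print(redact('{"secure_data": {"name": "Test"}}'))
```

Because [^}]* stops at the first closing brace, an object nested inside secure_data would only be matched up to that brace, so this approach suits flat values only.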

The following should work for what you're trying to do, though I'd suggest using a JSON parser instead.
{"secure[^:]*?:\s({?(?:(?:,[^"]*?)?"[^"]*?"(?::\s"[^"]*?")?)*?)*?}?}
With this regex, the object in secure_data may also contain multiple key-value pairs (with string values) and will still match. Other objects will not.
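The JSON-parser suggestion could look roughly like the sketch below. It assumes each log line is one complete JSON document and that sensitive keys start with "secure" (mirroring the regex); redact_secure, redact_log_line and the placeholder text are hypothetical names, and Python is used here only for illustration.

```python
import json


def redact_secure(value, placeholder="[FILTERED]"):
    """Recursively replace the value of any key starting with 'secure' (assumed convention)."""
    if isinstance(value, dict):
        return {
            key: placeholder if key.startswith("secure") else redact_secure(item, placeholder)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_secure(item, placeholder) for item in value]
    return value


def redact_log_line(line):
    """Parse one JSON log line, redact it, and serialise it again."""
    return json.dumps(redact_secure(json.loads(line)))


print(redact_log_line('{"secure_data": {"name": "Test"}}'))
```

Working on the parsed structure sidesteps nesting depth and escaped quotes, which are the cases the regex-only approaches struggle with.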

Related

Google Tag Manager - Regex match

I want to check whether a specific string is included in a GTM variable. The value of this variable is a URI-decoded first-party cookie value that looks like this:
"\"prodirversion\":5,\"panellanguage\":\"de\",\"preferences\":false,"\"marketing\":true,\"necessary\":true,\"statistics\":false,\"social_"
I now want to check if the following string is included.
marketing":true
I created another variable with a regex table and tried different regular expressions, but nothing seems to work. The patterns work in an online regex tester but not in Google Tag Manager.
My guess would be one of the following, but none of them works.
marketing\\":true
or
marketing.{3}true
or
marketing\\.{2}true
Some regex engines raise an error if the " character in marketing\\":true is not escaped.
Try escaping it like this: marketing\\\":true, and it should match.
Update:
marketing":true seems to work in GTM.
From that, we can conclude that the \ escape characters shown in the input string are display-only in GTM and should be ignored when testing or debugging the regex.
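The conclusion above can be illustrated with a small sketch. The cookie value is a hypothetical one modelled on the question, json.dumps is used only to imitate a view that displays quotes with backslash escapes, and Python's re stands in for GTM's matcher, so treat this as an analogy and not as GTM's actual behaviour.

```python
import json
import re

# Hypothetical cookie value, modelled on the one in the question (no backslashes in it).
actual_value = '"marketing":true,"necessary":true,"statistics":false'

# The same string as an escaping display might show it: quotes prefixed with backslashes.
displayed_value = json.dumps(actual_value)

plain_pattern = r'marketing":true'
escaped_pattern = r'marketing\\":true'  # requires a literal backslash before the quote

print(displayed_value)
print(bool(re.search(plain_pattern, actual_value)))      # pattern without backslash vs. real value
print(bool(re.search(escaped_pattern, actual_value)))    # pattern with backslash vs. real value
print(bool(re.search(escaped_pattern, displayed_value))) # pattern with backslash vs. displayed form
```

A pattern written against the displayed form only matches the displayed form, which is consistent with the tester-versus-GTM discrepancy described in the question.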

Using RegEx with Alteryx to replace string

I have a simple issue: Using Alteryx, I want to take a string, match a certain pattern and return the matched pattern.
This is my current approach:
Regex_replace("CP:ConsumerProducts&Retail</td><td><strong><fontcl","[^\<]+","$1")
According to various sources and tools like regex101, the first matched sequence should be "CP:ConsumerProducts&Retail". However, Alteryx returns
<<<<
Alteryx uses the Perl RegEx Syntax (https://help.alteryx.com/2018.2/boost/syntax_perl.html), therefore, it should have no problem with the pattern itself.
I believe I am missing something obvious but I cannot figure it out.
I have received a reply through a different forum. A solution that works for me is to use the following pattern: ([^\<]+).*
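One plausible reading of the <<<< result: the original pattern has no capture group, so every run of non-"<" characters is replaced by an empty $1 and only the four "<" characters survive. The sketch below reproduces that with Python's re module as a stand-in for Alteryx's engine; the empty replacement string emulates the unset $1, which is an assumption about how Alteryx expands it.

```python
import re

text = "CP:ConsumerProducts&Retail</td><td><strong><fontcl"

# Original pattern: no capture group. Assumption: an unset $1 expands to "",
# so each run of non-"<" characters is simply deleted.
without_group = re.sub(r"[^<]+", "", text)

# Working pattern: capture the leading run, let .* consume the rest, keep group 1.
with_group = re.sub(r"([^<]+).*", r"\1", text)

print(without_group)
print(with_group)
```

The trailing .* matters because a replace function substitutes every match; consuming the remainder in the first match leaves nothing else to replace.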

Regular expression not working in google analytics

I'm trying to build a regular expression to capture URLs which contain a certain parameter value, 7136D38A-AA70-434E-A705-0F5C6D072A3B.
I've set up a simple regex to capture a URL with anything before and anything after this parameter (i.e. all URLs which contain it). I've tested this on an online checker, http://scriptular.com/, and it seems to work fine. However, Google Analytics says it is invalid when I try to use it. Any idea what is causing this?
The URL will be in the format
/home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd
so I just want to capture URLs that contain that specific "z" parameter.
Regex:
^.+(?=7136D38A-AA70-434E-A705-0F5C6D072A3B).+$
You just need
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B.+$
Or (a bit safer):
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&.+$)
And I think you can even use
=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&)
See demo
Your regex is invalid because the GA regex flavor does not support look-arounds (and you have a (?=...) positive look-ahead in yours).
Here is a good GA regex cheatsheet.
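To sanity-check the shortest look-around-free pattern outside GA, a sketch like the following may help. Python's re is not GA's engine, so this only illustrates the matching logic; the second and third URLs are made-up variants of the one in the question.

```python
import re

# Shortest pattern from the answer above: the value must end the URL or be followed by "&".
pattern = re.compile(r"=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&)")

urls = [
    "/home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd",
    "/home/index?z=7136D38A-AA70-434E-A705-0F5C6D072A3B",       # hypothetical: value is last
    "/home/index?z=7136D38A-AA70-434E-A705-0F5C6D072A3BFF&p=1",  # hypothetical: longer value
]

for url in urls:
    print(bool(pattern.search(url)), url)
```

The ($|&) tail is what makes this variant "a bit safer": it rejects values that merely start with the target string.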
To match /home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd you can use:
\S*7136D38A-AA70-434E-A705-0F5C6D072A3B\S*

REGEX: excluding a string from a pattern that is already excluding particular characters

I'm trying to write a regular expression to validate a URL path. We originally had the pattern [^#\?:]+, which would grab everything up until the first ?, : or # in the path.
We now want to also exclude the string 'index.cfm'.
I can't work out how to include this though. I've looked at lookarounds, but I can't work out how to use them in conjunction with the pattern we already have.
EDIT: Here's an edited solution according to your comment.
^.*?(?=[#?:]|index\.cfm|$)
Here's a demo using the site you mentioned: http://regexr.com?31rk9.
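A quick way to see what the lookahead pattern above returns is the sketch below, written in Python for illustration; the sample paths and the path_prefix helper are hypothetical.

```python
import re

# Lazy match from the start, stopping just before #, ?, :, the string index.cfm, or end of input.
PATH_PREFIX = re.compile(r"^.*?(?=[#?:]|index\.cfm|$)")


def path_prefix(url_path):
    """Return the part of the path before the first excluded character or 'index.cfm'."""
    return PATH_PREFIX.match(url_path).group(0)


for sample in ["/products/list?page=2", "/shop/index.cfm?x=1", "/about"]:
    print(repr(path_prefix(sample)))
```

The $ alternative inside the lookahead is what lets paths without any excluded token match in full.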
This should work for you:
^(?P<url>[^#?:](?!index\.cfm))+

Extract text between two given strings

Hopefully someone can help me out. I've been all over Google now.
I'm doing some zone OCR of documents, and want to extract some text with regex. It is always like this:
"Til: Name Name Name org.nr 12323123".
I want to extract the name part. It can be 1-4 names, but "Til:" and "org.nr" always come before and after it.
Anyone?
If you can't use capturing groups (check your documentation) you can try this:
(?<=Til:).*?(?=org\.nr)
This solution uses lookbehind and lookahead assertions, which are not supported by every regex flavour. If they are supported, this regex returns only the part you want, because the text inside the assertions is not included in the match; the assertions only check that those patterns are present.
Use the pattern:
Til:(.*)org\.nr
Then take the first capturing group (index 1 in most APIs, since index 0 is the whole match) to get the content between the parentheses.
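Both approaches can be compared side by side in a short sketch. Python is used here as an assumption, since the OCR tool's regex flavour is not stated; the sample line is the one from the question.

```python
import re

line = "Til: Name Name Name org.nr 12323123"

# Lookaround version: the whole match is already just the text between the markers.
lookaround = re.search(r"(?<=Til:).*?(?=org\.nr)", line)

# Capturing-group version: the markers are part of the match, group 1 holds the names.
captured = re.search(r"Til:(.*)org\.nr", line)

# Both results include the surrounding spaces, so strip them.
print(lookaround.group(0).strip())
print(captured.group(1).strip())
```

If the tool cannot trim whitespace afterwards, adding \s* next to each marker in either pattern is one way to keep the spaces out of the result.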