GCP - DLP - Regex


I am trying to process a BigQuery table with a custom infoType of the regex variety.
The regex I am using: ^(\d{5})$
In the table below, I want to tag only the "Code" values, which are exactly 5 digits. With the above regex, there are 0 matches.
With the following regex: \d{5}
it matches every run of 5 digits (including the two in the "Other" column).
How do I make it match only when the 5 digits start at the beginning of a "cell"(?) and the cell ends with the 5th digit? Thanks a lot, I've been bogged down by this.
Name | Other | Code
Blah | Test12345 | 12345
Bleh | 54311Test | 54311

Try the following RegEx:
\b\d{5}\b

Your first instinct of using ^(\d{5})$ should have worked, but it did not because of a bug in the custom regex feature.
The Cloud DLP API team is aware of this issue and is working on a fix.
Update: The bug has been fixed, so this works now. Using \b(regex)\b works as well.
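
To see why anchoring matters, the three patterns can be compared on the values from the sample table. This is a minimal sketch using Python's re module, which is an assumption made for illustration only: Cloud DLP runs its own regex engine, and the helper name matching_cells is hypothetical. Each cell is treated as a separate string, which is the per-cell behaviour the question asks for.

```python
import re

# Cell values from the sample table (the "Other" and "Code" columns).
cells = ["Test12345", "12345", "54311Test", "54311"]

def matching_cells(pattern, values):
    """Return the values in which the pattern is found anywhere (re.search)."""
    regex = re.compile(pattern)
    return [value for value in values if regex.search(value)]

unanchored = matching_cells(r"\d{5}", cells)        # any run of 5 digits
anchored = matching_cells(r"^(\d{5})$", cells)      # the whole cell is 5 digits
word_bounded = matching_cells(r"\b\d{5}\b", cells)  # 5 digits as a whole word

print(unanchored)
print(anchored)
print(word_bounded)
```

In this sketch the two stricter patterns agree on the sample data. They differ on a cell such as "zip 12345": the \b form accepts it, while the anchored form does not.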

Related

Fetch all the entries/word/line after a specific word in Grafana using regex

I am new to regex and need some help fetching everything after a specific word. I am working on Grafana panels and using the "Filter data by values" option under the edit options.
Let the specific word be "sample_word"; I want to fetch everything after it, no matter what it is.
My sample string is below:
700 <10> 2022-11-21T05:00:09 sample_word="abc.net"] 2022-11-21T05:00:09 | api.call | line 100 | INFO | [123456]
I tried the following but couldn't figure out the solution:
/.+?(?=sample_word)/g

You could use a fixed-width positive lookbehind here:
(?<=\bsample_word).*
This would match all content in the log line after sample_word.
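
The difference between the lookahead attempted in the question and the lookbehind suggested here can be checked on the sample line. The sketch below uses Python's re module as a stand-in, which is an assumption: Grafana's own regex engine may treat lookbehind differently.

```python
import re

line = ('700 <10> 2022-11-21T05:00:09 sample_word="abc.net"] '
        '2022-11-21T05:00:09 | api.call | line 100 | INFO | [123456]')

# The question's attempt: a lazy match that stops right before "sample_word",
# so it returns the text BEFORE the word.
before_match = re.search(r".+?(?=sample_word)", line)
before = before_match.group(0) if before_match else None

# The lookbehind asserts that "sample_word" sits immediately before the match
# without including it, so the match is the text AFTER the word.
after_match = re.search(r"(?<=\bsample_word).*", line)
after = after_match.group(0) if after_match else None

print(repr(before))
print(repr(after))
```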

How to extract the year from a URL path using REGEXP_EXTRACT in Google Data Studio?

I'm building out a Google Data Studio dashboard and I need to create a calculated field for the year a post was published. The year is in the URI path, but I'm not sure how to extract it using REGEXP_EXTRACT. I've tried a number of solutions proposed on here but none of them seem to work on Data Studio.
In short, I have a URI like this: /theme/2019/jan/blog-post-2019/
How do I use the REGEXP_EXTRACT function to get the first 2019 after theme/ and before /jan?

Try this:
REGEXP_EXTRACT(Page, 'theme\/([0-9]{4})\/[a-z]{3}\/')
where:
theme\/ means literally "theme/";
([0-9]{4}) is a capturing group containing 4 characters from 0 to 9 (i.e. four digits);
\/[a-z]{3}\/ means a slash, followed by 3 lowercase letters (supposing that you want the regex to match all the months), followed by another slash. If you want something more restrictive, try with \/(?:jan|feb|mar|...)\/ for the last part.
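
The pattern can be tried outside Data Studio as a quick sanity check. This is a sketch in Python, assuming that REGEXP_EXTRACT returns the content of the capturing group; the variable names are illustrative only.

```python
import re

uri = "/theme/2019/jan/blog-post-2019/"

# Same pattern as in the answer; "/" needs no escaping in Python.
match = re.search(r"theme/([0-9]{4})/[a-z]{3}/", uri)

# Group 1 is the four-digit year between "theme/" and the month segment.
year = match.group(1) if match else None
print(year)
```

Because the pattern requires "theme/" before the digits and a three-letter segment after them, the second 2019 in "blog-post-2019" is not a candidate.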

As you mentioned, I think you only want to extract the year from within the string. The following should achieve that for you.
Adapt the query to your needs:
SELECT *
FROM Sample_table
WHERE REGEXP_EXTRACT(url, "(?<=\/theme\/)(?<year>\d{4})(?=\/[a-zA-Z]{3})")

Regex to remove everything after -i- (with -i-)

I was trying to find a solution to my problem.
Input: prd-abcd-efgh-i-0dflnk55f5d45df
Output: prd-abcd-efgh
Tried Splunk query: index=aws-* (host=prd-abcd-efgh*) | rex field=host "^(?<host>[^.]+)"| dedup host | stats count by host,methodPath
I want to remove everything that comes after "-i-" (including the "-i-" itself) using a simple regex. I tried the regex "^(?<host>[^.]+)" listed here:
https://answers.splunk.com/answers/77101/extracting-selected-hosts-with-regex-regex-hosts-with-exceptions.html
Please help me solve it.

replace(host, "(?<=-i-).*", "")
Example here: https://regex101.com/r/blcCcQ/2
Here, (?<=-i-) is a lookbehind: the match begins right after -i-, so the -i- itself is not part of what gets replaced.

I have no knowledge of Splunk, but the normal way to do that would be to match the part you don't want and replace it with an empty string.
The regex for doing that could be:
-i-.*
Then replace the match with an empty string.

Something simple like this should work:
([a-z-]+)-i-.+
The first capture group will return only the part preceding -i-.
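
The three suggestions above can be compared on the sample host name. This is a sketch in Python rather than Splunk, so it only illustrates what each pattern matches; the variable names are illustrative.

```python
import re

host = "prd-abcd-efgh-i-0dflnk55f5d45df"

# Match "-i-" and everything after it, then replace the match with nothing.
stripped = re.sub(r"-i-.*", "", host)

# Capture the part before "-i-" instead of deleting the rest.
capture_match = re.match(r"([a-z-]+)-i-.+", host)
captured = capture_match.group(1) if capture_match else None

# The lookbehind variant starts matching after "-i-", so "-i-" is kept.
lookbehind = re.sub(r"(?<=-i-).*", "", host)

print(stripped)
print(captured)
print(lookbehind)
```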

splunk: Get the first three numbers from ip address

I'm trying to get the first three sets of numbers of an IP address which is in this format: 10.10.10.10
Desired value would be 10.10.10

Try this regex: ^(.+)(?=\.\d+$)
And next time, please post what you have tried, along with how you plan to reach the solution.
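
How the greedy group and the lookahead interact can be seen in a short Python sketch (Python's re is used here as an assumption; the pattern itself is taken unchanged from this answer).

```python
import re

ip = "10.10.10.10"

# (.+) is greedy and first grabs the whole string, then backtracks until the
# lookahead succeeds: a dot and the final run of digits must follow, but they
# are not consumed by the match.
match = re.search(r"^(.+)(?=\.\d+$)", ip)
first_three = match.group(1) if match else None
print(first_three)
```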

Regex to match a valid IPv4 address:
/^(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])$/
Regex to match the first three blocks of a valid IPv4 address:
/^(([01]?\d?\d|2[0-4]\d|25[0-5])\.){2}([01]?\d?\d|2[0-4]\d|25[0-5])$/
or, if it is still fine when it matches the dot after the third block:
/^(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}$/
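
A quick way to see what these anchored patterns accept is to run them against a few strings. The sketch below uses Python's re module with the patterns copied from this answer, minus the surrounding slashes; the constant names are hypothetical.

```python
import re

# Full address: three "octet." groups followed by a final octet.
FULL_IPV4 = re.compile(
    r"^(([01]?\d?\d|2[0-4]\d|25[0-5])\.){3}([01]?\d?\d|2[0-4]\d|25[0-5])$"
)

# Three blocks only: because of ^ and $, the input must consist of exactly
# three blocks, not a full address.
THREE_BLOCKS = re.compile(
    r"^(([01]?\d?\d|2[0-4]\d|25[0-5])\.){2}([01]?\d?\d|2[0-4]\d|25[0-5])$"
)

for candidate in ["10.10.10.10", "256.1.1.1", "10.10.10"]:
    print(candidate,
          bool(FULL_IPV4.match(candidate)),
          bool(THREE_BLOCKS.match(candidate)))
```

Since both patterns are anchored at both ends, they validate a whole string rather than extract a prefix from a longer one.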

I was able to get it this way:
rex field=IP "(?<first_three>\d+\.\d+\.\d+)\.\d+"

Another method:
... | rex field=ip_addr "(?<split_ip>.+)\.[0-9]+"
where:
ip_addr - the field name
split_ip - the field in which the truncated IP address will be stored
Example:
Splunk Query:
| stats count | eval ip = "115.124.35.123" | rex field=ip "(?<split_ip>.+)\.[0-9]+" | table split_ip
Output:
115.124.35

The below works for me.
rex field=_raw "(?<ip_address>^\d+\.\d+\.\d+\.\d+)"|timechart count by ip_address

Use the regex below:
^(?P<result>.+(?=\.\d+))
https://regex101.com/r/bO4tY5/3

https://regex101.com/ is a super useful tool for this kind of stuff. It lets you write your regex and test it for different strings in real time.
Once you've got what you need, stick it into your Splunk search query with the rex command.
To answer your exact problem:
The regex code, where MY_FIELD_NAME_HERE is the name of the extracted field:
(?<MY_FIELD_NAME_HERE>\d+\.\d+\.\d+)\.\d+
The regex with examples from regex101:
https://regex101.com/r/qTTf4e/2
The command required for the Splunk query language, where ORIGINAL_FIELD is your original field holding 10.10.10.10 and MY_FIELD_NAME_HERE is the extracted field:
... | rex field="ORIGINAL_FIELD" "(?<MY_FIELD_NAME_HERE>\d+\.\d+\.\d+)\.\d+"
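
Outside Splunk, the same named-group idea can be tried in a few lines. This is a sketch in Python, where named groups are written (?P<name>...) instead of (?<name>...); the function name first_three_octets and the group name first_three are hypothetical.

```python
import re

def first_three_octets(ip):
    """Return the first three dot-separated numbers, or None if no match."""
    # The named group captures three numbers; the trailing \.\d+ must be
    # present but stays outside the group.
    match = re.match(r"(?P<first_three>\d+\.\d+\.\d+)\.\d+", ip)
    return match.group("first_three") if match else None

print(first_three_octets("10.10.10.10"))
print(first_three_octets("115.124.35.123"))
```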

RegEx backreference followed by a number in Dreamweaver

I want to search for a specific pattern which contains a numeric "1" and replace it with the same string followed by the numeric "2". But if I call $12 then the output is the literal "$12". The regex engine seemingly tries to find the memory slot 12, but I intended to address the memory slot 1, and then write "2".
I tried to create a fiddle but this doesn't reproduce the error, so apparently it has something to do with my editor. I am using Dreamweaver CS6. If not with Dreamweaver then maybe my Dreamweaver settings.
Also, I just found this question which refers to my exact same problem – but the answer provided there doesn't work for me. $012 just writes "$012". I guess the Dreamweaver RegExp engine is peculiar like that.
Any ideas?
EDIT:
Given the example text …
This is item 1
This is house 3
… and the pattern ((?:item|house) )\d
what I tried | what I'm getting
$12 | $12
$012 | $012
\g{1}2 | \g{1}2
$g{1}2 | $g{1}2
$1&#50; | item&#50; // or "house"
${1}2 | ${1}2
"$1"+2 | "item"+2
The desired result is always:
This is item 2
This is house 2
Because it was asked: yes, I am sure that the RegExp checkbox is activated and yes, I am sure that I'm in the Code view, not the Design view. I always work in Code view.
My Dreamweaver is CS6 Version 12.0 Build 5861.

This is a well-known bug in Dreamweaver. Fortunately, there are workarounds.
For argument's sake, let's say you are looking for letters and want to append a 2.
Method 1
I tested the following in Dreamweaver CS6.
Input: abc
Search in code view: ([a-z]+)
Replace: $1&#50;
Output in code view: abc&#50;
Output in design view: abc2
Note that the output in code view is abc&#50;, but because &#50; encodes 2, on the web page you see abc2.
Method 2: Two-step approach
Same search.
Replace: $1SOMETHINGDISTINCTIVE
Then search for SOMETHINGDISTINCTIVE and replace with 2
Finally
Of course some would argue that the real workaround is to work in Komodo IDE (or whatever editor they fancy), but that is not your question. :)
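
The two-step approach is engine-independent, so it can be sketched outside Dreamweaver. The example below uses Python's re module, which is an assumption for illustration, together with the sample text and pattern from the question.

```python
import re

text = "This is item 1\nThis is house 3"
pattern = r"((?:item|house) )\d"
PLACEHOLDER = "SOMETHINGDISTINCTIVE"

# Step 1: a non-digit marker follows the backreference, so the engine cannot
# read the group number as anything other than 1.
step_one = re.sub(pattern, r"\1" + PLACEHOLDER, text)

# Step 2: a plain search and replace swaps the marker for the digit.
result = step_one.replace(PLACEHOLDER, "2")
print(result)
```

The marker has to be a string that cannot occur in the document, otherwise step 2 would also rewrite unrelated text.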

Let's say the test string, i.e. the string to match or select, is
aabbbccbbbaacc2
Case 1: Using a backreference for matching or selecting
Find/Search:
(a+)(b+)(c+)\2\1\3\d
Case 2: Using a backreference for match or select and replace
Say I expect the result to be
aacc9bbb
Find/Search:
(a+)(b+)(c+)\2\1\3\d
Replace With:
\1\039\2
or
\1$039\2
So it's not \3 but \03 or $03 when it is followed by a numeric character in the Replace With field.
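
Other engines resolve the same ambiguity with their own syntax. As a point of comparison only, here is a sketch in Python, where \g<3> ends the group number explicitly; this syntax belongs to Python's re module and is not something Dreamweaver is claimed to support.

```python
import re

test_string = "aabbbccbbbaacc2"
pattern = r"(a+)(b+)(c+)\2\1\3\d"

# \g<3> closes the group reference, so the 9 that follows is a literal digit
# rather than part of a two-digit group number.
result = re.sub(pattern, r"\g<1>\g<3>9\g<2>", test_string)
print(result)
```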
