Retrieving email string using regex - regex

So I am looking to extract a set of characters from unique reset verification codes that I get in my emails. Meaning, what im trying to extract will be different every time. This is the example:
"You requested one-time code for authentication.
Your code is 7a8c28
Enter the code to verify your login."
I am trying to extract the "7a8c28" (without the quotation marks).
This is the regex expression I have written because I am trying to remove the whitespace after the "is":
[^is_\s*]*$
However, that expression above spits out a single period, and not the 7a8c28.
Am i missing something here? Or is there a better expression to use? Thank you for any assistance.

\sis\s(\w+)\b
captures the code as captured group 1 ($1).

\bis\h*(\S+)$
will capture in group 1 the value you're searching for.
Demo & explanation

Related

Regex, Grafana Loki, Promtail: Parsing a timestamp from logs using regex

I want to parse a timestamp from logs to be used by loki as the timestamp.
Im a total noob when it comes to regex.
The log file is from "endlessh" which is essentially a tarpit/honeypit for ssh attackers.
It looks like this:
2022-04-03 14:37:25.101991388 2022-04-03T12:37:25.101Z CLOSE host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26
2022-04-03 14:38:07.723962122 2022-04-03T12:38:07.723Z ACCEPT host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096
What I want to match, using regex, is the second timestamp present there, since its a utc timestamp and should be parseable by promtail.
I've tried different approaches, but just couldn't get it right at all.
So first of all I need a regex that matches the timestamp I want.
But secondly, I somehow need to form it into a regex that exposes the value in some sort?
The docs offer this example:
.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)
Afaik, those are named groups, and that is all that it takes to expose the value for me to use it in the config?
Would be nice if someone can provide a solution for the regex, and an explanation of what it does :)
You could for example create a specific pattern to match the first part, and capture the second part:
^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b
Regex demo
Or use a very broad if the format is always the same, repeating an exact number of non whitespace characters parts and capture the part that you want to keep.
^(?:\S+\s+){2}(?<timestamp>\S+)
Regex demo

How to exclude/remove text within an import.io field extract automatically

I'm using regex within import.io to only match fields when they start with a certain string which works correctly. So for example i use the following to match items that start with 'testing testing ':
^(testing\stesting\s.+
It there any way to have it return the value excluding this string (presumably using xpath in addition to regex?) So if the field value is "testing testing 1234" then i would want it to just return "1234" without the "testing testing" at the front?
Obviously I can do this manually afterwards but want to try and find a way of doing it automatically as part of the export?
Thanks,
Dave
You could use (?:\d+) this would get all the digits within a capturing group. I usually use http://regexr.com/ to build regex, it offers great cheatsheets and reference material.

RegEx SQL, issue escaping quotes

I am trying to use PSQL, specifically AWS Redshift to parse a line. Sample data follows
{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}
{"appId":"sx-voice-call","b.level":76,"foreground":9}
I am trying the following regex in order to to extract the appId field, but my query is returning empty fields.
'appId\":\"[\w*]\",'
Query
SELECT app_params,
regexp_substr(app_params, 'appId\":\"[\w*]\",')
FROM sample;
You can do that as follows:
(\"appId\":\"[^"]*\")(?:,)
Demo: http://regex101.com/r/xP0hW3
The first extracted group is what you want.
Your regex was not matching because \w does not match -
Adding this here despite this being an old question since it may help someone viewing this down the road...
If your lines of data are valid json, you can use Redshift's JSON_EXTRACT_PATH_TEXT function to extract the value a given key. Emphasis on the json being valid, as it will fail if even one line cannot be parsed and Redshift will throw a JSON parsing error.
Example using given data:
select json_extract_path_text('{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}','appId');
returns sx-calllog
This is especially useful since Redshift does not support lookahead/lookbehind (it is POSIX regex) & extract groups.
You can try using some lookahead and look behinds to isolate just the text inside the quotes for the appid. (?<=appId\":\")(?=.*\",)[^\"]*. I tested this out a bit using your examples you provided here.
To explain the regex a bit more: (?<=appId\":\")(?=.*\",)[^\"]*
(?<=appId\":\"): positive look behind for appid":". Since you don't want the appid text itself being returned (just the value), you can preface the regex with a look behind to say "find me the following regex, but only when it is following the look behind text.
(?=.*\",): positive look ahead for the ending ",. You don't want quotes to be returned in your match, but as with number 1 you want your regex to be bounded a bit and a look ahead does that.
[^\"]*: The actual matching portion. You want to find the string of chars that are NOT ". This will match the entire value and stop matching right before the closing ".
EDIT: Changed the 3rd step a little bit, removed the , from that last piece, it is not needed and would break the match if the value were to actually contain a ,.

Reg Ex Facebook

I am trying to extract some information from facebook using Regex. Here is a link with an example:
https://graph.facebook.com/210989592315921
I was interested in what would the regular expression be in order to extract just the number of likes from this string.
I have tried for example this expression:
"likes":\s[0-9]$
Thank you in advance for any advice regarding this matter,
Mark
You should follow "#Hope I helped" comment and use a json parser. You can't be sure the text is going to be formatted always the same way:
Are you always going to have a single space between : and the number ?
By the way, here is the error you are looking for, your current regex matches a single figure, not a multiple digit number, you should use something like: [0-9]+ and probably remove the $ which is not correct in your example, as you have a comma after the number.

Extract text between two given strings

Hopefully someone can help me out. Been all over google now.
I'm doing some zone-ocr of documents, and want to extract some text with regex. It is always like this:
"Til: Name Name Name org.nr 12323123".
I want to extract the name-part, it can be 1-4 names, but "Til:" and "org.nr" is always before and after.
Anyone?
If you can't use capturing groups (check your documentation) you can try this:
(?<=Til:).*?(?=org\.nr)
This solution is using look behind and lookahead assertions, but those are not supported from every regex flavour. If they are working, this regex will return only the part you want, because the parts in the assertions are not matched, it checks only if the patterns in the assertions are there.
Use the pattern:
Til:(.*)org\.nr
Then take the second group to get the content between the parenthesis.