How to write a twitter regex expression? - regex

I need a twitter expression:
twitter.com\/[\w\/\.=-]*
This one works fine, except it does capture generic twitter widgets.js library
I would like to capture ALL twitter.com EXCEPT those who cantain "widgets.js" (using regex)
ex:
twitter.com/myaccount: GOOD
twitter.com/JohnDOes: GOOD
twitter.com/widgets.js: BAD
Regards

You can do this with a negative lookahead:
twitter.com\/(?!widgets\.js)[\w\/\.=-]*
Depending on whether you are matching part of a larger text or individual strings, you may want to add some anchors:
^twitter.com\/(?!widgets\.js$)[\w\/\.=-]*$

Related

Regex, Grafana Loki, Promtail: Parsing a timestamp from logs using regex

I want to parse a timestamp from logs to be used by loki as the timestamp.
Im a total noob when it comes to regex.
The log file is from "endlessh" which is essentially a tarpit/honeypit for ssh attackers.
It looks like this:
2022-04-03 14:37:25.101991388 2022-04-03T12:37:25.101Z CLOSE host=::ffff:218.92.0.192 port=21590 fd=4 time=20.015 bytes=26
2022-04-03 14:38:07.723962122 2022-04-03T12:38:07.723Z ACCEPT host=::ffff:218.92.0.192 port=64475 fd=4 n=1/4096
What I want to match, using regex, is the second timestamp present there, since its a utc timestamp and should be parseable by promtail.
I've tried different approaches, but just couldn't get it right at all.
So first of all I need a regex that matches the timestamp I want.
But secondly, I somehow need to form it into a regex that exposes the value in some sort?
The docs offer this example:
.*level=(?P<level>[a-zA-Z]+).*ts=(?P<timestamp>[T\d-:.Z]*).*component=(?P<component>[a-zA-Z]+)
Afaik, those are named groups, and that is all that it takes to expose the value for me to use it in the config?
Would be nice if someone can provide a solution for the regex, and an explanation of what it does :)
You could for example create a specific pattern to match the first part, and capture the second part:
^\d{4}-\d{2}-\d{2} \d\d:\d\d:\d\d\.\d+\s+(?P<timestamp>\d{4}-\d{2}-\d{2}T\d\d:\d\d:\d\d\.\d+Z)\b
Regex demo
Or use a very broad if the format is always the same, repeating an exact number of non whitespace characters parts and capture the part that you want to keep.
^(?:\S+\s+){2}(?<timestamp>\S+)
Regex demo

REGEX number not in a list failing with a long list

I have a list of the following numbers and want a Regular expression that matches when a number is not in the list.
0,1,2,3,4,9,11,12,13,14,15,16,18,19,250
I have written the following REGEX statement.
^(?!.*(0|1|2|3|4|9|11|12|13|14|15|16|18|19|250)).*$
The problem is that it correctly gives a match for 5,6,7,8 etc but not for 17 or 251 for example.
I have been testing this on the online REGEX simulators.
This should resolve your issue..
^(?!\D*(0|1|2|3|4|9|11|12|13|14|15|16|18|19|250)\b).*$
In your earlier regex you were basically saying eliminate all numbers that start with 0/1/2/3/4/9!
So your original regex would actually match 54/623/71/88 but not the others. Also the 11-19 and 250 in the list were rendered useless.
Although as others have I would also recommend you to not use regex for this, as I believe it is an overkill and a maintenance nightmare!
Also an extra note "Variable length look arounds are very inefficient too" vs regular checks.
I would do \b\d+\b to get each number in the string and check if they are in your list. It would be way faster.
You can use the discard technique by matching what you do not want and capturing what you really want.
You can use a regex like this:
\b(?:[0-49]|1[1-689]|250)\b|(\d+)
Here you can check a working demo where in blue you have the matches (what you don't want) and in green the content you want. Then you have to grab the content from the capturing group
Working demo
Not sure what regex engine you are using, but here I created a sample using java:
https://ideone.com/B7kLe0

Regular expression not working in google analytics

Im trying to build a regular expression to capture URLs which contain a certain parameter 7136D38A-AA70-434E-A705-0F5C6D072A3B
Ive set up a simple regex to capture a URL with anything before and anything after this parameter (just just all URLs which contain this parameter). Ive tested this on an online checker: http://scriptular.com/ and seems to work fine. However google analytics is saying this is invalid when i try to use it. Any idea what is causing this?
Url will be in the format
/home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd
so i just want to capture URLs that contain that specific "z" parameter.
regex
^.+(?=7136D38A-AA70-434E-A705-0F5C6D072A3B).+$
You just need
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B.+$
Or (a bit safer):
^.+=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&.+$)
And I think you can even use
=7136D38A-AA70-434E-A705-0F5C6D072A3B($|&)
See demo
Your regex is invalid because GA regex flavor does not support look-arounds (and you have a (?=...) positive look-ahead in yours).
Here is a good GA regex cheatsheet.
To match /home/index?x=23908123890123&y=kjdfhjhsfd&z=7136D38A-AA70-434E-A705-0F5C6D072A3B&p=kljdaslkjasd you can use:
\S*7136D38A-AA70-434E-A705-0F5C6D072A3B\S*

Extract text between two given strings

Hopefully someone can help me out. Been all over google now.
I'm doing some zone-ocr of documents, and want to extract some text with regex. It is always like this:
"Til: Name Name Name org.nr 12323123".
I want to extract the name-part, it can be 1-4 names, but "Til:" and "org.nr" is always before and after.
Anyone?
If you can't use capturing groups (check your documentation) you can try this:
(?<=Til:).*?(?=org\.nr)
This solution is using look behind and lookahead assertions, but those are not supported from every regex flavour. If they are working, this regex will return only the part you want, because the parts in the assertions are not matched, it checks only if the patterns in the assertions are there.
Use the pattern:
Til:(.*)org\.nr
Then take the second group to get the content between the parenthesis.

Regex to extract part of a url

I'm being lazy tonight and don't want to figure this one out. I need a regex to match 'jeremy.miller' and 'scottgu' from the following inputs:
http://codebetter.com/blogs/jeremy.miller/archive/2009/08/26/talking-about-storyteller-and-executable-requirements-on-elegant-code.aspx
http://weblogs.asp.net/scottgu/archive/2009/08/25/clean-web-config-files-vs-2010-and-net-4-0-series.aspx
Ideas?
Edit
Chris Lutz did a great job of meeting the requirements above. What if these were the inputs so you couldn't use 'archive' in the regex?
http://codebetter.com/blogs/jeremy.miller/
http://weblogs.asp.net/scottgu/
Would this be what you're looking for?
'/([^/]+)/archive/'
Captures the piece before "archive" in both cases. Depending on regex flavor you'll need to escape the /s for it to work. As an alternative, if you don't want to match the archive part, you could use a lookahead, but I don't like lookaheads, and it's easier to match a lot and just capture the parts you need (in my opinion), so if you prefer to use a lookahead to verify that the next part is archive, you can write one yourself.
EDIT: As you update your question, my idea of what you want is becoming fuzzier. If you want a new regex to match the second cases, you can just pluck the appropriate part off the end, with the same / conditions as before:
'/([^/]+)/$'
If you specifically want either the text jeremy.miller or scottgu, regardless of where they occur in a URL, but only as "words" in the URL (i.e. not scottgu2), try this, once again with the / caveat:
'/(jeremy\.miller|scottgu)/'
As yet a third alternative, if you want the field after the domain name, unless that field is "blogs", it's going to get hairy, especially with the / caveat:
'http://[^/]+/(?:blogs/)?([^/]+)/'
This will match the domain name, an optional blogs field, and then the desired field. The (?:) syntax is a non-capturing group, which means it's just like regular parenthesis, but won't capture the value, so the only value captured is the value you want. (?:) has a risk of varying depending on your particular regex flavor. I don't know what language you're asking for, but I predominantly use Perl, so this regex should pretty much do it if you're using PCRE. If you're using something different, look into non-capturing groups.
Wow. That's a lot of talking about regexes. I need to shut up and post already.
Try this one:
/\/([\w\.]+)\/archive/