RegEx all URLs that do NOT contain a string - regex

I seem to be having a bit of a brain fart atm. I've got Google counting my transitions correctly but I'm getting false positives.
This is the current goal RegEx which works great.
^/click/[0-9]+\.html\?.*
But I also want the RegEx to NOT count anything that has &confirm=1. I'm quite stuck as to how to do that in the RegEx. I thought I might be able to use [^(?:&confirm=1)] but I don't think that's valid.

Use "exclude", not "include" filter option

Try this:
^/click/[0-9]+\.html\?(?!.*\bconfirm=1).*
I changed it slightly so it will still exclude if confirm=1 is the first param (preceded by the ? rather than &)
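A minimal sketch of how the lookahead behaves, run here with Python's re module rather than in Google Analytics (whose regex support may differ). The sample paths are made up for illustration.

```python
import re

# Pattern from the answer above: a click URL whose query string
# does not contain a confirm=1 parameter anywhere.
GOAL = re.compile(r"^/click/[0-9]+\.html\?(?!.*\bconfirm=1).*")

# Hypothetical sample paths, not taken from real traffic.
samples = [
    "/click/123.html?src=mail",
    "/click/123.html?src=mail&confirm=1",
    "/click/123.html?confirm=1&src=mail",
]

# True means the path would be counted towards the goal.
results = {path: bool(GOAL.match(path)) for path in samples}
for path, counted in results.items():
    print(f"{path} -> {'counted' if counted else 'skipped'}")
```

As written, the lookahead also skips values that merely start with 1, such as confirm=10, which may or may not be what you want.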

I'm afraid you can't. I've tried doing this before, and what I found was that you used to be able to do this with a negative lookahead (see Rubens), but Google Analytics stopped supporting this at some point (source: http://productforums.google.com/forum/#!topic/analytics/3YnwXM0WYxE).

Maybe I'm a little late.
What about just writing :
[^(&confirm=1)]
?
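For what it's worth, in most regex flavours square brackets denote a character class, so this pattern matches one single character that is not in the listed set rather than "anything except the string &confirm=1". A small sketch in Python (the sample URL is hypothetical):

```python
import re

# A negated character class: matches exactly ONE character that is
# not one of ( & c o n f i r m = 1 )
CHAR_CLASS = re.compile(r"[^(&confirm=1)]")

# Hypothetical URL that should have been excluded.
url = "/click/123.html?src=mail&confirm=1"

# search() succeeds as soon as it finds any character outside the set,
# so the URL is still matched even though it contains &confirm=1.
first = CHAR_CLASS.search(url)
print(first.group())
```

So a URL containing &confirm=1 is not excluded by this pattern; it matches as soon as any other character appears.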

Related

Rewrite regex without negation

I have written this regex to help me extract some links from some text files:
https?:\/\/(?:.(?!https?:\/\/))+$
Because I am using golang/regexp lib, I'm not able to use it, due to my negation (?!..
What I would like to do with it is to select all the text from the last occurrence of http/https till the end.
sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/#query2
=> Output: http://websites.com/path/subpath/#query2
Can anyone help me with a solution? I've spent several hours trying different ways of reproducing the same result with no success.
Try this regex:
https?:[^:]*$
The lookaheads exist for a reason.
However, if you insist on a supposedly equivalent alternative, a general strategy you can use is:
(?!xyz)
is somewhat equivalent to:
$|[^x]|x(?:[^y]|$)|xy(?:[^z]|$)
With that said, hopefully I didn't make any mistakes:
https?:\/\/(?:$|(?:[^h]|$)|(?:h(?:[^t]|$))|(?:ht(?:[^t]|$))|(?:htt(?:[^p]|$))|(?:http(?:[^s:]|$))|(?:https?(?:[^:]|$))|(?:https?:(?:[^\/]|$))|(?:https?:\/(?:[^\/]|$)))*$
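Another lookahead-free option, if a capture group is acceptable, is to put a greedy .* in front of the URL pattern so that the match is pushed to the last occurrence. This is a different technique from the two answers above. The sketch is in Python for illustration; since it uses no lookaround it should carry over to Go's regexp package, but that is an assumption and was not tested here. The second input string (with a port number) is made up.

```python
import re

# Greedy prefix: .* swallows as much as it can, so group 1 starts at
# the LAST place where https?:// can still match.
LAST_URL = re.compile(r".*(https?://.*)$", re.DOTALL)

# The shorter pattern suggested above, for comparison.
SHORT = re.compile(r"https?:[^:]*$")

text = ("sometextsometexhttp://websites.com/path/subpath/#query1"
        "sometexthttp://websites.com/path/subpath/#query2")

last = LAST_URL.search(text).group(1)
print(last)

# Hypothetical input where the last URL contains a colon (a port).
with_port = "xhttp://a.com/yhttp://b.com:8080/z"
print(LAST_URL.search(with_port).group(1))
print(SHORT.search(with_port))
```

The comparison at the end shows the trade-off: the [^:]* version is shorter, but it finds nothing when the last URL itself contains a colon.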

Monitoring bad links with RegEx in Google Analytics

How do I optimize this to find all links ending in weird typos, yet still exclude correct links (ending with .html) from the results?
htmll$|hhtml$|httml$|htmml$|btml$|hml$|htl$
Thanks in advance!
Wow, that's some pretty restrictive regex rules but that kinda makes it interesting.
Since we have no character negation but we do have character classes, we could do:
[a-gi-z]tml$|h[a-su-z]ml|ht[a-ln-z]l|htm[a-km-z]
for my second suggestion and:
h.+tml|ht.+ml|htm.+l|html.+
to replace the first option leading to a total of:
[a-gi-z]tml$|h[a-su-z]ml|ht[a-ln-z]l|htm[a-km-z]|h.+tml|ht.+ml|htm.+l|html.+
EDIT: Having noticed that the .+'s can catch things we don't want, this should be changed slightly:
(.*[a-gi-z]tml|h.*[a-su-z]ml|ht.*[a-ln-z]l|htm.*[a-km-z])$
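Before pasting a pattern like this into Analytics, it can help to run it against the typos from the question. Below is a rough check using Python's re module, which is assumed to behave like the Analytics engine for this simple syntax. The /page. prefix is a made-up path.

```python
import re

# Final pattern from the EDIT above.
FINAL = re.compile(
    r"(.*[a-gi-z]tml|h.*[a-su-z]ml|ht.*[a-ln-z]l|htm.*[a-km-z])$"
)

# Endings listed in the question, plus the correct "html".
endings = ["htmll", "hhtml", "httml", "htmml", "btml", "hml", "htl", "html"]

# Hypothetical page paths built from those endings.
paths = {ending: f"/page.{ending}" for ending in endings}

# True means the pattern reports the path as a bad link.
flagged = {ending: bool(FINAL.search(path)) for ending, path in paths.items()}
for ending, hit in flagged.items():
    print(f"{ending:6} {'flagged' if hit else 'not flagged'}")
```

With these sample paths the check flags htmll, httml, htmml and btml and leaves html alone, but it does not flag hhtml, hml or htl, so it may be worth testing any variant against your own URLs before relying on it.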

Need a regex that will give me pagepath without domain

I've been trying to use regexextract in docs (or excel) to get the pagepath of a url, i.e. what comes after the tld.
Example: http://google.com/this-folder/this-page-is-here
I just want it to extract /this-folder/this-page-is-here, but so far I can only get this-page-is-here or /this-folder separately.
Sorry, I'm not too good with regex. Can anyone help me out?
This is what I've tried
=regexextract(A1; "\//*\/*.*\/(.*)")
which returns this-page-is-here
But I've been trying it so long I don't even understand life anymore. Can someone show me how you're supposed to do this?
=REGEXEXTRACT(A1,"//.+?(/.*)")
Tested and working. You need to add the ? to make the .+ non-greedy (stop matching ASAP).
Taking your version and adding the ? fixes it as well (I also removed an extra / at the beginning):
=regexextract(A1; "//.*?/(.*)")
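To see the effect of the ? outside the spreadsheet, here is a small sketch in Python using the URL from the question. Python's re module is not the engine the spreadsheet uses, so treat it as an illustration of greedy versus non-greedy matching only.

```python
import re

url = "http://google.com/this-folder/this-page-is-here"

# Non-greedy: .+? stops at the first "/" after the host,
# so the group captures the whole path.
lazy = re.search(r"//.+?(/.*)", url).group(1)

# Greedy: .+ runs to the last "/" it can,
# so the group captures only the final segment.
greedy = re.search(r"//.+(/.*)", url).group(1)

print(lazy)    # /this-folder/this-page-is-here
print(greedy)  # /this-page-is-here
```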

Regular expression to exclude local addresses

I'm trying to configure my Foxy Proxy program and one of the features is to provide a regular expression for an exclusion list.
I'm trying to blacklist the local sites (ending in .local), but it doesn't seem to work.
This is what I attempted:
^(?:https?://)?\d+\.(?!local)+/.*$
^(?:https?://)?\d+\.(?!local)(\d)+/.*$
I also researched on Google and Stack Exchange with no success.
Since you indicate in the comments that you actually need a whitelist solution, I went with that:
Try: ^(?:https?://)?[\w.-]+\.(?!local)\w+/.*$
http://regex101.com/r/xV4gS0
Your regex expressions match host names which start with a series of digits followed by a period and then not followed by the string "local". If this is a "blacklist", then that hardly seems like what you want.
If you're trying to match all hostnames which end in .local, you'd want something like the following for the hostname portion:
[^/]*\.local(?:/|$)
with appropriate escapes inserted depending on regex context.
If your original question was incorrect and you really need a whitelist, then you'd want something like:
^(?:(?!\.local)[^\/])*(?:\/|$)
as illustrated in http://regex101.com/r/yB0uY4
Thank you everyone for the help. Indeed, it turns out that for this program, listing "not .local" as a blacklist is not the same as listing "all .local" as a whitelist.
I also made a rookie mistake in my pattern: I meant "\w" instead of "\d". Thank you Peter Alfvin for catching that.
So my final working solution is what Bart suggested:
^(?:https?://)?[\w.-]+\.(?!local)\w+/.*$ as a whitelist.
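A quick sanity check of that whitelist pattern, sketched in Python. FoxyProxy's own regex engine may differ in details, and the host names below are invented for the example.

```python
import re

# Whitelist pattern from the final solution above.
WHITELIST = re.compile(r"^(?:https?://)?[\w.-]+\.(?!local)\w+/.*$")

# Hypothetical URLs: two public-looking hosts and two .local hosts.
urls = [
    "http://example.com/index.html",
    "example.org/",
    "https://intranet.local/wiki",
    "printer.local/status",
]

# True means the URL is on the whitelist (i.e. it is not a .local host).
allowed = {u: bool(WHITELIST.match(u)) for u in urls}
for u, ok in allowed.items():
    print(f"{u} -> {'whitelisted' if ok else 'not whitelisted'}")
```

Note that the pattern requires a / after the host name, so a bare host without a trailing slash would not match.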

Regex - match a string not contain a 'semi-word'

I tried to make regex syntax for that but I failed.
I have 2 variables
PlayerInfo[playerid][pLevel]
and
Character[playerid]
and I want to catch only the second variable. I mean only the word that doesn't contain PlayerInfo, but contains [playerid].
"(\S+)\[playerid\]" catches both words, and (\S+[^PlayerInfo])\[playerid\] skips some variables because they contain p, l, a, y ...
I need to replace, in Notepad++, all variables like Text[playerid] with ExClass [playerid][Text].
A couple of plausible solutions:
1. Notepad++ has a plugin called Python Script. Running regex from there gives full regex functionality (the Python version, anyway) and a lot of powerful potential beyond that. I use the online python regex tester to help out.
2. The RegRexReplace plugin helps create regex plugins in Notepad++, so when you do hit a limitation, you find out a lot quicker.
3. Or of course default to your alternate editor (I'm assuming you have one?), or this online regex tool, which is absolutely amazing. You can perform the action on the text online as well.
(I'd try to build a regex for you, but I'm a bit lost as to what you're looking for, unless Ivo Abeloos got it. If you're still coming up short, maybe post a code example along with the values displayed?)
Good luck!
It seems that Notepad++ has supported negative lookbehind since v6.
In notepad++ you could try to replace (.+)\[(.+)\] with ExClass\[\2\]\[\1\]
Try to use negative lookbehind.
(?<!PlayerInfo)\[playerid\]
EDIT: unfortunately notepad++ does not support negative lookbehind.
I tried to make a workaround based on the following naive idea:
(.[^o]|[^f]o)[playerid]
But this expression does not work either. Notepad++ seems to fail on the alternation operator. Thus the answer is: it is impossible to do exactly what you want. Try to solve the problem in another way or use an alternative tool.
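If an alternative tool is an option (for instance the Python route mentioned earlier), the negative lookbehind idea can be sketched with Python's re module as below. The sample line is made up, and the replacement is written without the space after ExClass, which is an assumption about the intended output.

```python
import re

# Capture the variable name, then require that the text just before
# "[playerid]" is NOT "PlayerInfo" (fixed-width negative lookbehind).
PATTERN = re.compile(r"(\w+)(?<!PlayerInfo)\[playerid\]")

# Hypothetical source line containing both kinds of variable.
source = "PlayerInfo[playerid][pLevel] = Character[playerid];"

# \1 is the captured variable name, moved into the second index.
rewritten = PATTERN.sub(r"ExClass[playerid][\1]", source)
print(rewritten)
# PlayerInfo[playerid][pLevel] = ExClass[playerid][Character];
```

One side effect of this form is that any name ending in PlayerInfo (for example MyPlayerInfo) is skipped as well.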