Regex with non-capturing hashbangs

I'm trying to write a regex that will parse the hash portion of a URL, removing whichever conventionally formatted hashbang may be present.
For example, I wish to remove any of the following:
#
#/
#!
#!/
This is what I currently have:
/[(?:#|#\/|#!|#!\/)]+/
However, this produces empty strings at the start and splits the rest of the string into pieces. For example,
"#!/E/F".split(/[(?:#|#\/|#!|#!\/)]/); // ["", "", "", "E", "F"]
Whereas the desired outcome is simply a single element:
["E/F"]
Could someone please point out the error in my regex?
[If it makes a difference, I produced the above output using the JavaScript console in Firebug.]

Use string.replace instead of string.split.
#!?\/?
Use the above regex and replace the match with an empty string.
> '#!/E/F'.replace(/#!?\/?/g, '');
'E/F'
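A minimal sketch of this answer as a reusable function. The `^` anchor is an addition made here, not part of the answer: without it (and with the `g` flag), a later `#` inside the fragment would also be removed. `stripHashbang` is a hypothetical name.

```javascript
// Hypothetical helper: strip one leading hashbang ("#", "#/", "#!" or "#!/").
// The ^ anchor limits the match to the very start of the string.
function stripHashbang(hash) {
  return hash.replace(/^#!?\/?/, '');
}

console.log(['#', '#/', '#!', '#!/', '#!/E/F'].map(stripHashbang));
// [ '', '', '', '', 'E/F' ]
```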

Your regex seems awfully complicated. Maybe this is more what you're looking for:
"#!/E/F".split(/(#!\/|#\/|#!|#)/);
Did you check out the JavaScript regex documentation?
It might work differently from what you imagined. Inside square brackets, ( ? : and | have no special meaning: [...] is a character class that matches any single one of the characters listed, so your pattern splits on every individual #, ! or /.
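To illustrate the difference, here is a small sketch contrasting the question's character class with a real non-capturing group. The anchored group pattern is an assumption of this sketch, not taken from the answer above; the order of its alternatives matters, because putting `#` first would match only the `#` and leave `!/E/F`.

```javascript
// Inside [...], "(", "?", ":" and "|" are ordinary characters, so this class
// matches any ONE of ( ? : # | / ! and split() cuts at every such character.
const charClass = /[(?:#|#\/|#!|#!\/)]/;
const pieces = '#!/E/F'.split(charClass); // [ '', '', '', 'E', 'F' ]

// A real non-capturing group, anchored at the start, matches the whole prefix
// once. Longer alternatives come first so "#!/" wins over "#".
const hashbang = /^(?:#!\/|#!|#\/|#)/;
const splitOnce = '#!/E/F'.split(hashbang);      // [ '', 'E/F' ] (empty string before the separator)
const replaced = '#!/E/F'.replace(hashbang, ''); // 'E/F'

console.log(pieces, splitOnce, replaced);
```

Splitting still leaves an empty first element, which is why replacing the prefix is the simpler route.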

If you're using JavaScript, you can just use:
location.assign(location.href.replace(/#.*$/, ""));
However, if you only want to remove the hashbangs listed above, use:
var repl = location.href.replace(/#(!\/?|\/)?$/, '');
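A quick way to check what the second regex does, wrapped in a hypothetical helper and run against made-up `example.com` URLs instead of `location.href`. Because of the `$` anchor, it removes the fragment only when the fragment is exactly one of the listed hashbangs; a fragment with a path after it, such as `#!/E/F`, is left untouched.

```javascript
// Hypothetical helper around the answer's regex: drop a trailing "#", "#/",
// "#!" or "#!/" and leave every other URL unchanged.
function removeBareHashbang(href) {
  return href.replace(/#(!\/?|\/)?$/, '');
}

console.log(removeBareHashbang('http://example.com/page#!/'));    // http://example.com/page
console.log(removeBareHashbang('http://example.com/page#!/E/F')); // unchanged
```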

Related

How to write a regex to extract the content in brackets after a given string (first match only)?

I would like to use a regular expression to extract the content between brackets that follows a specific string, taking only the first match.
Example text:
**-n --command PING being applied--:
Wed May 34 7:23:18 2010
[ZZZ_6323] Command [ping] failed with error [[TEZZZGH_IUE] [[EIJERTMMMMIJE_EIEJ] gdyugedyue Service [ABC] is not available in domain [DEF]. Check the content and review diejidjei. Service [ABC] Domain [DEF] ] did not ping back. It might be due to one of the following reasons:
=> Reason1
=> Reason3
=> Reason 4: deijdije djkeoidjeio.
info=4343 day=Mon year=2010*
I would like to extract the string between [] that follows the word Service, taking only the first match, since Service can appear again later. In this case, that is ABC.
Could someone help me?
I am not able to combine these three conditions.
Thanks
Assuming that you don't care about capturing square brackets inside the [ ] pair, by far the easiest way to do this is to use the following simple regex:
Service (\[[^\]]*\])
and extract only the first capturing group from the result with whatever regex API your language provides. For example, in JS you would write
string.match(/Service (\[[^\]]*\])/)[1]
to extract the first capturing group.
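If you instead want a regex that only ever yields the first occurrence (for example, when your tool returns every match), you can exploit the greedy nature of the * quantifier and change the regex to the following, where the trailing .* consumes the rest of the line so that later occurrences on that line cannot match: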
Service (\[[^\]]*\]).*
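A small sketch of both variants in JavaScript, run on a shortened copy of the log line from the question (the shortening is ours). Note that the capturing group in the first pattern includes the square brackets; moving them outside the group returns just the name.

```javascript
// Shortened copy of the log line from the question; "Service [...]" appears twice.
const log = '[ZZZ_6323] Command [ping] failed with error [[TEZZZGH_IUE] ' +
  'Service [ABC] is not available in domain [DEF]. Service [ABC] Domain [DEF] ] did not ping back.';

// Without the g flag, match() returns only the leftmost match plus its groups.
const withBrackets = log.match(/Service (\[[^\]]*\])/)[1];    // '[ABC]'
const withoutBrackets = log.match(/Service \[([^\]]*)\]/)[1]; // 'ABC'

// With g, every occurrence is returned; the trailing .* variant yields only one,
// because that single match runs to the end of the line.
const allMatches = log.match(/Service \[[^\]]*\]/g);  // [ 'Service [ABC]', 'Service [ABC]' ]
const firstOnly = log.match(/Service (\[[^\]]*\]).*/g);

console.log(withBrackets, withoutBrackets, allMatches.length, firstOnly.length);
```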
Service \[([^\]]+)\]
will match Service [, followed by one or more characters other than ] and a closing ], and capture the text between the brackets in group number 1. Since regex engines work left to right, the first match will be the leftmost match.
Test it live on regex101.com.
In PHP, you could do this (code snippet generated by RegexBuddy):
if (preg_match('/Service \[([^\]]+)\]/', $subject, $groups)) {
    $result = $groups[1];
} else {
    $result = "";
}
How should I write the definition of a group name? I know that it can look like (?<name>...), but I don't know how to combine it with Service \[([^\]]+)\] in a single pattern.
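In reply to that comment, a sketch in JavaScript (ES2018 or later), where a named group is written `(?<name>...)` and read back through `match.groups`. The group name `service` is our choice. PCRE, and therefore PHP's `preg_match`, accepts the same `(?<name>...)` syntax and stores the group under that name in the matches array as well.

```javascript
const text = 'Service [ABC] is not available in domain [DEF]. Service [ABC] Domain [DEF]';

// Same pattern as above, with the capturing group given the name "service".
const m = text.match(/Service \[(?<service>[^\]]+)\]/);
const service = m ? m.groups.service : ''; // 'ABC', or '' when there is no match

console.log(service);
```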

Regex to ignore the first 12 characters of a string

I'm trying to create a custom filter in Google Analytics to remove the parts of the URL query that I don't want to see. The URL has the following structure:
[domain]/?p=899:2000:15018702722302::NO:::
I would like to create a regex that skips the first 12 characters (that is, up to /?p=899:2000) and replaces whatever comes after them with nothing.
So I made this one: https://regex101.com/r/Xgbfqz/1 (which could be simplified to .{0,12}), but I would actually like to skip those characters and have the regex match only what comes after them, so that I can tell Google Analytics to replace that match with "".
The part of the URL that always has the same format is:
?p=[3numbers]:[0-4numbers]
Thank you
Your regular expression:
\/\?p=\d{3}\:\d{0,4}(.*)
Tested in Golang RegEx 2 and RegEx101
It searches for /?p=###:#### (where the second run of digits may be zero to four digits long) and captures the rest of the string to its right.
(extra) JavaScript:
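var paragraf = '[domain]/?p=899:2000:15018702722302::NO:::';
// "/?p=" + 3 digits + ":" + 0-4 digits, then capture everything after it in group 1
var regex = /\/\?p=\d{3}\:\d{0,4}(.*)/;
var match = regex.exec(paragraf);
alert('The rest of the right side of the string: ' + match[1]); // ":15018702722302::NO:::"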
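Or, more simply: "[domain]/?p=899:2000:15018702722302::NO:::".slice(12). Note that this returns everything after the first 12 characters of the whole string, counted from the start of the domain.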
You can try this:
/\?p\=\d{3}:\d{0,4}
which matches only ?p=[3numbers]:[0-4numbers].
I'm not sure about the replacing part, though.
https://regex101.com/r/Xgbfqz/1
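For the replacement step, one possible approach (a JavaScript sketch, not tested inside Google Analytics itself) is to capture the fixed prefix and write it back with `$1`, so only the tail disappears. `trimQuery` is a hypothetical name.

```javascript
// Hypothetical helper: keep "/?p=" + 3 digits + ":" + 0-4 digits, drop the rest.
// The prefix is captured in group 1 and written back with "$1".
function trimQuery(url) {
  return url.replace(/(\/\?p=\d{3}:\d{0,4}).*/, '$1');
}

console.log(trimQuery('[domain]/?p=899:2000:15018702722302::NO:::')); // [domain]/?p=899:2000
```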

How to remove all website addresses in bulk using regex

I have a lot of site URLs, and I want to delete the website address from all of them.
Example:
http://www.website1.com/product.php?id=
http://www.website2.net/list.php?cid=
http://www.website3.org/view.php?page=
Once removed:
product.php?id=
list.php?cid=
view.php?page=
I want to remove them in bulk, using either regex101 or a regex in Notepad++.
What regular expression would remove all of them?
I find PHP Live Regex easier to use for that purpose since you see the replace results directly (choose preg_replace instead of preg_match):
You can use this regex, choose replace, and keep only the first capturing group, $1:
(?:[a-z]{4,5}://[a-z.0-9]*\/)?([a-z.\?_=]*)([0-9]*)
Result:
product.php?id=
list.php?cid=
view.php?page=
See: http://www.phpliveregex.com/p/g5q
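The same replacement can be tried in JavaScript with the URLs from the question. Because `/` delimits a JavaScript regex literal, the slashes in `://` have to be escaped there; otherwise the pattern is the one above.

```javascript
const urls = [
  'http://www.website1.com/product.php?id=',
  'http://www.website2.net/list.php?cid=',
  'http://www.website3.org/view.php?page=',
].join('\n');

// Optional "scheme://host/" prefix (not captured), then the path and query in
// group 1 and any trailing digits in group 2. Replacing with "$1" keeps group 1 only.
const re = /(?:[a-z]{4,5}:\/\/[a-z.0-9]*\/)?([a-z.\?_=]*)([0-9]*)/g;
const result = urls.replace(re, '$1');

console.log(result);
// product.php?id=
// list.php?cid=
// view.php?page=
```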
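Use the following regex to search in Notepad++ (with Search Mode set to Regular expression):
.*\/
Then replace it with nothing (leave the Replace field empty).
Because .* is greedy, this matches everything from the beginning of the line up to the last /, and that whole span is removed.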
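A JavaScript sketch of the same greedy idea, using the question's URLs. As in Notepad++ with ". matches newline" left unchecked, `.` does not match a line break here, so each line is handled on its own.

```javascript
const lines = [
  'http://www.website1.com/product.php?id=',
  'http://www.website2.net/list.php?cid=',
  'http://www.website3.org/view.php?page=',
].join('\n');

// Greedy .* runs to the LAST "/" on each line; replacing with '' deletes that span.
const stripped = lines.replace(/.*\//g, '');

console.log(stripped);
```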

Regex URI portion: Remove hyphens

I need to split URIs by their second path segment:
/directory/this-part/blah
The issue I'm facing is that I have two URIs which logically need to be one:
/directory/house-&-home/blah
/directory/house-%26-home/blah
This comes back as:
house-&-home and house-%26-home
So logically I need a regex to retrieve the second portion but also remove everything between the hyphens.
I have this, so far:
/[^(/;\?)]*/([^(/;\?)]*).*
(?<=directory\/)(.+?)(?=\/)
Does this solve your issue? This returns:
house-&-home and house-%26-home
If you want to get the result:
house--home
then you should use a replace method. Because I am not sure what language you are using, I will give my example in Java:
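import java.util.regex.Matcher;
import java.util.regex.Pattern;
String regex = "(?<=directory/)(.+?)(?=/)"; // segment after "directory/", up to the next "/"
String str = "/directory/house-&-home/blah";
Matcher m = Pattern.compile(regex).matcher(str);
String result = m.find() ? m.group(1).replace("&", "") : ""; // "house--home"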
The replace call then swaps every & in the captured segment for the empty string "".
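The question also asks for house-%26-home to end up the same as house-&-home. A sketch of one way to do that, assuming it is acceptable to collapse whatever sits between the first pair of hyphens; `normalizedSegment` is a hypothetical helper, written in JavaScript rather than Java.

```javascript
// Hypothetical helper: take the segment right after "/directory/" and collapse
// the text between its first two hyphens, so both spellings map to one key.
function normalizedSegment(uri) {
  const m = uri.match(/^\/directory\/([^\/]+)/);
  if (!m) return null; // not a /directory/... URI
  return m[1].replace(/-[^-]*-/, '--');
}

console.log(normalizedSegment('/directory/house-&-home/blah'));   // house--home
console.log(normalizedSegment('/directory/house-%26-home/blah')); // house--home
```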

Article spinner with 2 tiers

I made an article spinner that used regex to find words in this syntax:
{word1|word2}
It then split them at the "|", but now I need a way to support second-tier (nested) brackets, such as:
{{word1|word2}|{word3|word4}}
When my code is given such a line, it takes "{{word1|word2}" and "{word3|word4}", which is not what I intended.
Instead, when given such a line, I want my code to extract "{word1|word2}|{word3|word4}", so that I can pass it to the original function and break it into the actual words.
I am using C#.
Here is pseudocode showing what it might look like:
Check string for regex match to "{{word1|word2}|{word3|word4}}" pattern
If found, store each one as "{word1|word2}|{word3|word4}" in MatchCollection (mc1)
Split the word at the "|" but not the one inside the brackets, and select a random one (aka, "{word1|word2}" or "{word3|word4}")
Store the new results aka "{word1|word2}" and "{word3|word4}" in a new MatchCollection (mc2)
Now search the string again, this time looking for "{word1|word2}" only and ignore the double "{{" "}}"
Store these in mc2.
I cannot split these up normally.
Here is the regex I use to search for "{word1|word2}":
Regex regexObj = new Regex(@"\{.*?\}", RegexOptions.Singleline);
MatchCollection m = regexObj.Matches(originalText); //How I store them
Hopefully someone can help, thanks!
Edit: I solved this using a recursive method. I was building an article spinner btw.
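To see why the original pattern misbehaves, here is a JavaScript sketch (.NET's `Regex` finds the same matches for this pattern): the lazy `.*?` starts at the first `{` and stops at the first `}`, so the outer brace ends up inside the first match.

```javascript
const nestedLine = '{{word1|word2}|{word3|word4}}';

// Lazy .*? stops at the FIRST "}", whatever the nesting depth.
const found = nestedLine.match(/\{.*?\}/g);
console.log(found); // [ '{{word1|word2}', '{word3|word4}' ]
```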
That cannot be parsed with a regular expression; you need a recursive descent parser instead. Or let a JSON parser do that work by mapping the input to JSON, replacing:
{ with [
} with ]
| with ,
wordX with "wordX" (regex \w+)
Then your input
{{word1|word2}|{word3|word4}}
becomes valid JSON
[["word1","word2"],["word3","word4"]]
and will map directly to PHP arrays when you call json_decode.
In C#, the same should be possible with JavaScriptSerializer.
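A sketch of that mapping in JavaScript, with `JSON.parse` in place of PHP's `json_decode`. It quotes runs of characters other than `{`, `|` and `}` instead of only `\w+`, so entries such as `Re-writing` or `for yourself` also survive; that tweak and the `spinToArray` name are our own. It only works when the whole input is a single brace group.

```javascript
// Hypothetical helper: turn "{a|b}" spinner syntax into nested arrays via JSON.
function spinToArray(s) {
  const json = s
    .replace(/[^{}|]+/g, (word) => JSON.stringify(word)) // wordX -> "wordX" (escapes quotes)
    .replace(/\{/g, '[')
    .replace(/\}/g, ']')
    .replace(/\|/g, ',');
  return JSON.parse(json);
}

console.log(JSON.stringify(spinToArray('{{word1|word2}|{word3|word4}}')));
// [["word1","word2"],["word3","word4"]]
```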
I'm not completely sure what you're asking for, but I'll give it a go:
If you want to get {word1|word2}|{word3|word4} out of any occurrence of {{word1|word2}|{word3|word4}} but not {word1|word2} or {word3|word4}, then use this:
@"\{(\{[^}]*\}\|\{[^}]*\})\}"
...which will match {{word1|word2}|{word3|word4}}, with {word1|word2}|{word3|word4} in the first capturing group.
I'm not sure if this will be helpful or even if it's along the right track, but I'll try to check back every once in a while for more questions or clarifications.
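In JavaScript the same pattern (without the C# `@` verbatim-string prefix) can be tried as follows. The follow-up split on a `|` that is directly followed by `{` is an assumption of this sketch: it only separates the two halves when each alternative is itself a braced group.

```javascript
const spinLine = '{{word1|word2}|{word3|word4}}';

// Group 1 is everything between the outer braces: "{...}|{...}".
const outer = spinLine.match(/\{(\{[^}]*\}\|\{[^}]*\})\}/);
const inner = outer[1]; // '{word1|word2}|{word3|word4}'

// Split only at a "|" followed by "{", i.e. the one between the inner groups.
const halves = inner.split(/\|(?=\{)/); // [ '{word1|word2}', '{word3|word4}' ]

console.log(inner, halves);
```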
s = "{Spinning|Re-writing|Rotating|Content spinning|Rewriting|SEO Content Machine} is {fun|enjoyable|entertaining|exciting|enjoyment}! try it {for yourself|on your own|yourself|by yourself|for you} and {see how|observe how|observe} it {works|functions|operates|performs|is effective}."
print spin(s)
If you want to use the [square|brackets|syntax], use this line in the process function:
'/\[(((?>[^\[\]]+)|(?R))*)\]/x',
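Since the `spin` function itself is not shown above, here is a hypothetical JavaScript sketch of a spinner that handles two (or more) tiers. Rather than recursion, it repeatedly resolves the innermost `{...|...}` groups until none are left; `rand` is an injectable stand-in for `Math.random` so the output can be made deterministic.

```javascript
// Hypothetical spinner: resolve innermost {a|b|...} groups first, repeat until none remain.
function spin(text, rand = Math.random) {
  const innermost = /\{([^{}]*)\}/g; // a brace group that contains no other braces
  let out = text;
  while (/\{[^{}]*\}/.test(out)) {
    out = out.replace(innermost, (_, body) => {
      const options = body.split('|');
      return options[Math.floor(rand() * options.length)];
    });
  }
  return out;
}

const s = '{Spinning|Re-writing|Rotating|Content spinning|Rewriting|SEO Content Machine} is ' +
  '{fun|enjoyable|entertaining|exciting|enjoyment}! try it {for yourself|on your own|yourself|by yourself|for you} ' +
  'and {see how|observe how|observe} it {works|functions|operates|performs|is effective}.';

console.log(spin(s));                                        // a random variant
console.log(spin('{{word1|word2}|{word3|word4}}', () => 0)); // word1
```

Each pass removes at least one brace pair, so the loop always terminates, and unmatched braces are simply left in place.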