Remove everything before a regex pattern

Remove everything before a regex pattern - regex

I have different data in this format:
ISIN: LU0799639926
I created a regex to filter the important data:
\w{2}\d{10}
The thing is that I want to delete everything that is before and behind my pattern.
I have already tried
[^\w{2}\d{10}]*
It selects everything but my pattern, it just doesn't work. Does anyone have a solution?

You can use a .* subpattern to get anything before and after, capture your substring into a capturing group and then replace with a $1 backreference:
.*(\w{2}\d{10}).*
Replace with $1.
See demo
Perhaps, you will be safer with .*([A-Z]{2}\d{10}).*, as \w may also capture digits, and [A-Z] will only match uppercase letters.
If you have multiple values in the input string, perhaps, you will be more interested in getting a delimited string, e.g.:
.*?([A-Z]{2}\d{10})
To replace with $1;.
See another demo

Inside the character class
[^\w{2}\d{10}]
{ and } are treated as the literals { and }, they loose their regex meaning.
Try:
.*(\w{2}\d{10})
This will catch the pattern you want, then you can easily replace it with whatever you want.

Related

Regex to match and replace a character in a pattern

I would like to replace a character "?" with "fi" in a string.
I could write a generic str replace for this. But I want to replace the "?" only if it appears in between two A-Za-z character and avoid the rest
Eg., "Okay?" should be "Okay?" and not "Okayfi"
but
Modi?es should be Modifies since it has ? in middle
What have I tried?
sentence = re.sub(r"(\?)\b", "fi", sentence)
Please see here.
https://regexr.com/3nvk3
Seems to work fine in regexr. but doesnt work well in code. Am I doing something wrong?

The best approach here is to find the original text with the ﬁ ligature and read it in with proper encoding.
Otherwise, you will have to use some workarounds.
You may use (?<=[a-zA-Z]) / (?=[A-Za-z]) lookarounds:
sentence = re.sub(r"(?<=[a-zA-Z])\?(?=[a-zA-Z])", "fi", sentence)
See the regex demo. The (?<=[a-zA-Z]) positive lookbehind matches a position immediately after an ASCII letter, and (?!=[A-Za-z]) positive lookahead matches a position immediately before an ASCII letter.
Or, you may also use a capturing group with backreferences:
sentence = re.sub(r"([a-zA-Z])\?([a-zA-Z])", r"\1fi\2", sentence)
See another regex demo. Note that \1 references the value captured with the first ([a-zA-Z]) group and \2 references the value captured into Group 2 (([a-zA-Z])).

RegEx for capturing everything except numbers and one word

I am quite stuck with a regex I can't get to work. It should capture everything except digits and the word fiktiv (not single characters of it!). Objective is to get rid of this content.
I have tried something like (?!\d|fiktiv).* on my sample string 123456788daswqrt fiktiv
https://regex101.com/r/kU8mF3/1
However this does match the fiktiv at the end as well.

One possibility would be to use a neglected character class, which can be used by putting a ^ in [] braces. So you basically say don't match digits, and as many non digits as you can get until a space occurs and the word fiktiv appears.
This capturing will be "saved" in the capturing group 1 for later use.
([^\d]+)\s+fiktiv
Testing could be done here:
https://regex101.com/

It should capture everything except digits and the word fiktiv (not single characters of it!). Objective is to get rid of this content.
So, you want to remove any character that is not a digit (that is, \D or [^0-9] pattern) and not a fiktiv char sequence.
You may use a regex with a capturing group and alternation:
(fiktiv)|[^0-9]
and replace with the contents of Group 1 using a $1 backreference, fiktiv, to restore it in the replaced string.
See the regex demo
C# implementation:
Regex.Replace(input‌, "(fiktiv)|[^0-9]", "$1")
Also, see Use RegEx in SQL with CLR Procs.

Mixing Lookahead and Lookbehind in 1 Regexp

I'm trying to match first occurrence of window.location.replace("http://stackoverflow.com") in some HTML string.
Especially I want to capture the URL of the first window.location.replace entry in whole HTML string.
So for capturing URL I formulated this 2 rules:
it should be after this string: window.location.redirect("
it should be before this string ")
To achieve it I think I need to use lookbehind (for 1st rule) and lookahead (for 2nd rule).
I end up with this Regex:
.+(?<=window\.location\.redirect\(\"?=\"\))
It doesn't work. I'm not even sure that it legal to mix both rules like I did.
Can you please help me with translating my rules to Regex? Other ways of doing this (without lookahead(behind)) also appreciated.

The pattern you wrote is really not the one you need as it matches something very different from what you expect: text window.location.redirect("=") in text window.location.redirect("=") something. And it will only work in PCRE/Python if you remove the ? from before \" (as lookbehinds should be fixed-width in PCRE). It will work with ? in .NET regex.
If it is JS, you just cannot use a lookbehind as its regex engine does not support them.
Instead, use a capturing group around the unknown part you want to get:
/window\.location\.redirect\("([^"]*)"\)/
or
/window\.location\.redirect\("(.*?)"\)/
See the regex demo
No /g modifier will allow matching just one, first occurrence. Access the value you need inside Group 1.
The ([^"]*) captures 0+ characters other than a double quote (URLs you need should not have it). If these URLs you have contain a ", you should use the second approach as (.*?) will match any 0+ characters other than a newline up to the first ").

Match Latin words which not in the hook

I'm trying to filter words which is not in the "[ ]".
Why is this not working?
[^\[][\u0000-\u024F]+[^\]]

The reason your expression is not working is that it matches all text inside brackets as well as outside.
This is the best I've been able to do:
/(?:^|])[^[]+/g
It includes the ]s in the match because look-behind is not allowed:
http://regexr.com/3c515
If look-behind were allowed, this would be the ticket:
/(?:^|(?<=]))[^[]+/g
https://regex101.com/r/lK9tS7/3

Because this will match [\u0000-\u024F]+ and 2 character which will be matches by [^\[]. If you want to your regex engine match the whole of pattern you need to use start and end anchors in your regex :
/^[^\[][\u0000-\u024F]+[^\]]$/m
But this will work if your string is contain words in each line, which is not a proper way.
As a better way you can use negative look arounds :
(?<!\[)[\u0000-\u024F]+(?!\])

Trying to figure out how to capture text between slashes regex

I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture text between the last slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?

You need to use lookaround assertions.
(?<=\/)[^\/]*(?=\/[^\/]*$)
DEMO
or
Use the below regex and then grab the string you want from group index 1.
\/([^\/]*)\/[^\/]*$

The easy way
Match:
every character that is not a "/"
Get what was matched here. This is done by creating a backreference, ie: put inside parenthesis.
followed by "/" and then the end of string $
Code:
([^/]*)/$
Get the text in group(1)
Harder to read, only if you want to avoid groups
Match exactly the same as before, except now we're telling the regex engine not to consume characters when trying to match (2). This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.

The issue with your code is your opening and closing slashes are part of your capture group.
Demo
text: /1/2/test/
regex: /\/(\[^\/\]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results. For instance, JavaScript might not have certain lookarounds, or may actually capture something in a non-capture group. However, the above should work as intended. In PHP, all / match characters must be escaped (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Remove everything before a regex pattern - regex

Inside the character class [^\w{2}\d{10}] { and } are treated as the literals { and }, they loose their regex meaning. Try: .*(\w{2}\d{10}) This will catch the pattern you want, then you can easily replace it with whatever you want.

Related

Regex to match and replace a character in a pattern

RegEx for capturing everything except numbers and one word

Mixing Lookahead and Lookbehind in 1 Regexp

Match Latin words which not in the hook

Trying to figure out how to capture text between slashes regex

Categories

Resources