Python: Regex to remove a substring starting from a specific character up to any letter

I need a Python regex that removes a substring starting from a specific character up to the first letter found.
Example:
hello world\r\n Processing ....Pass
hello world\r\n Processing .Fail
hello world\r\n Processing ......Error
hello world\r\n Processing ..Fail
hello world\r\n Processing .......<Any string>
Result should be:
hello world\r\n <Any string>
Here the number of dots after Processing varies, and I want to remove Processing together with however many dots follow it.
Basically, I want to remove everything between \r\n and the [A-Z] pattern, but not the pattern itself.
I tried this, but it also removes the pattern:
(?s)\\r\\n.*?\.[A-Z][^\w\s]

You can search using this regex:
(?s)(?<=\\r\\n ).+?(?=[A-Z])
and replace with just an empty string.
RegEx breakdown:
(?s): Enable DOTALL (single line) mode
(?<=\\r\\n ): Positive lookbehind to assert that we have literal text \r\n and a space before the current position
.+?: Lazily match one or more characters of any kind (including line breaks, because of DOTALL)
(?=[A-Z]): Lookahead to assert that we have an uppercase letter at the next position
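A minimal Python sketch of the same search-and-replace with the standard re module. It assumes, as in the question, that the text contains the literal four characters \r\n (backslash, r, backslash, n) rather than a real carriage return and line feed; for real line breaks the lookbehind would be (?<=\r\n ) instead. The helper name strip_prefix is hypothetical.

```python
import re

# (?s)         DOTALL, as in the answer
# (?<=\\r\\n ) lookbehind: literal backslash-r, backslash-n and a space precede the match
# .+?          lazily consume one or more characters...
# (?=[A-Z])    ...stopping just before the first uppercase letter, which is kept
PATTERN = re.compile(r'(?s)(?<=\\r\\n ).+?(?=[A-Z])')


def strip_prefix(line):
    """Hypothetical helper: remove the text between '\\r\\n ' and the next uppercase letter."""
    return PATTERN.sub('', line)


# Raw strings keep the backslashes literal, matching the question's sample data.
samples = [
    r'hello world\r\n Processing ....Pass',
    r'hello world\r\n Processing .Fail',
    r'hello world\r\n Processing ......Error',
]
for s in samples:
    print(strip_prefix(s))
```

Because the uppercase letter sits inside a lookahead, it is checked but not consumed, which is why it survives the replacement; the question's attempt put [A-Z] in the consumed part of the match.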

Related

Regex: stop before a backslash character?

I want to use a regex to replace a string in a file in PowerShell. This is the regex:
=.*\\app\\client\\.*\\
I applied this regex to the following string:
HOME= C:\app\client\Administrateur\product
And I want this result:
= C:\app\client\Administrateur
But I get this result:
= C:\app\client\Administrateur\
How can I tell the regex to stop before the next backslash?
Your pattern =.*\\app\\client\\.*\\ will match the last occurrence of \app\client\ and will then match up to the last backslash in the string.
To match what comes after app\client\ but not include the last backslash you could use a negated character class matching not a backslash:
=.*\\app\\client\\[^\\]*
If the part before \app\client\ cannot contain a backslash, you can use a negated character class there too. This prevents needless backtracking, because .* would first match all the way to the end of the string:
=[^\\]*\\app\\client\\[^\\]*
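The question uses PowerShell, but these three patterns behave the same way in Python's re module, so here is a small Python sketch comparing them on the sample string. The dictionary keys and variable names are only illustrative.

```python
import re

text = r'HOME= C:\app\client\Administrateur\product'

patterns = {
    # original: the final .*\\ is greedy, so it runs to the LAST backslash
    'original': r'=.*\\app\\client\\.*\\',
    # fix: [^\\]* can never consume a backslash, so it stops before the next one
    'negated class': r'=.*\\app\\client\\[^\\]*',
    # variant: also avoid .* at the start, which first runs to the end of the string
    'no leading .*': r'=[^\\]*\\app\\client\\[^\\]*',
}

# Illustrative: print what each pattern matches in the sample string.
results = {name: re.search(p, text).group(0) for name, p in patterns.items()}
for name, matched in results.items():
    print(f'{name:15} {matched}')
```

The last two patterns produce the same match here; the third only differs in how much backtracking the engine does to find it.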

Regex to find all spaces in lines beginning with a specific string

I am searching for a regex to find all the spaces in lines starting with a specific string (in an SVN dump file). Despite the "global" modifier, my regex returns only the first occurrence of the space character.
Part of the file I am working on:
...
pla bla bli
Node-path: branches/BU ILD/ml_cf/syst em/Translation/TranslationManager.class.php
Node-kind: file
Node-action: change
Text-delta: true
....
The regex:
/Node-path: \S*(\ )/g
finds only the first space (between U and I) but not the others on the line.
To find all the spaces on a line starting with a particular text using a PCRE regex, use this pattern:
/(?:^Node-path: |\G)\S+\K\h+/gm
(?:^Node-path: |\G) matches either Node-path: at the start of a line OR the position at the end of the previous match.
\G asserts position at the end of the previous match, or at the start of the string for the first match attempt (so spaces on the first line of the text can be matched even if it is not a Node-path: line; (?!^)\G rules that out).
\K resets the starting point of the reported match.
\h+ matches one or more horizontal whitespace characters (such as space or tab).
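Python's standard re module supports neither \G nor \K (nor \h), so the following sketch swaps in a different technique rather than porting the PCRE pattern: it first selects each Node-path: line, then finds the runs of spaces or tabs inside it and reports their offsets in the whole text. The function name spaces_in_node_paths is hypothetical.

```python
import re

# Select the text after "Node-path: " on every line that starts with it.
NODE_PATH = re.compile(r'^Node-path: (.*)$', re.MULTILINE)
# [ \t]+ stands in for PCRE's \h+ (horizontal whitespace).
HSPACE = re.compile(r'[ \t]+')


def spaces_in_node_paths(dump):
    """Hypothetical helper: return (start, end) offsets, in `dump`, of each run
    of spaces/tabs that appears after 'Node-path: ' on a Node-path line."""
    spans = []
    for line in NODE_PATH.finditer(dump):
        base = line.start(1)  # where the path begins inside the whole dump
        for ws in HSPACE.finditer(line.group(1)):
            spans.append((base + ws.start(), base + ws.end()))
    return spans


dump = (
    'pla bla bli\n'
    'Node-path: branches/BU ILD/ml_cf/syst em/Translation/TranslationManager.class.php\n'
    'Node-kind: file\n'
)
print(spaces_in_node_paths(dump))
```

Spaces on other lines, such as "pla bla bli", are never considered because only the Node-path: lines are searched.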

Find a word in a text that can start with a non-alphanumeric character

I need to find a word (which I know in advance) in a text of any length, like the following:
My text is very beautyfull. Yours $text is very bautyfull. Their #text, is very beautyfull
# and $ are only examples; I can have any non-alphanumeric character.
I have found the following regex:
(?<=^|[^a-zA-Z0-9])\Q<word>\E(?=$|[^a-zA-Z0-9])
This works very well if I search for #text or $text, but when I search for just text it matches all occurrences (three in the example above) instead of only the one plain occurrence of text.
Is there a way to do this with regex?
Add # and $ to the negated character class inside the positive lookbehind, so that it won't match text when it is preceded by #, $, a-z, A-Z or 0-9:
(?<=^|[^a-zA-Z0-9$#])\Qtext\E(?=$|[^a-zA-Z0-9])
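Python's re module rejects the answer's lookbehind as written, because (?<=^|[^...]) has alternatives of different widths and Python lookbehinds must be fixed-width; it also has no \Q...\E. A negative lookbehind expresses the same condition ("not preceded by a letter, digit, $ or #"), and re.escape takes the place of \Q...\E. A minimal sketch; find_word and its excluded_prefixes parameter are hypothetical names.

```python
import re


def find_word(text, word, excluded_prefixes='$#'):
    """Hypothetical helper: return the start offset of each occurrence of `word`
    that is not preceded by a letter, a digit or one of `excluded_prefixes`
    (assumed default: '$#'), and not followed by a letter or digit."""
    # (?<![...]) is also true at the start of the string, so it covers the
    # answer's "^|[^...]" alternation without a variable-width lookbehind.
    before = '(?<![A-Za-z0-9' + re.escape(excluded_prefixes) + '])'
    after = '(?![A-Za-z0-9])'
    # re.escape plays the role of \Q...\E for the word itself.
    pattern = before + re.escape(word) + after
    return [m.start() for m in re.finditer(pattern, text)]


sample = ('My text is very beautyfull. Yours $text is very bautyfull. '
          'Their #text, is very beautyfull')
print(find_word(sample, 'text'))   # [3]
print(find_word(sample, '#text'))
```

If any non-alphanumeric prefix should disqualify a match, as the question suggests, (?<!\S) (not preceded by a non-space character) is a simpler left boundary; that is a variation on the answer, not what it proposes.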

Replace whitespace between specific strings

I'm trying to replace whitespace with underscores in certain parts of my HTML document with Notepad++.
I can identify the area in which to search for whitespace as follows:
- Begins with: src="video/
- Ends with: mp4
For example I might have a line like this:
<video class="play" src="video/my file name with empty spaces.mp4">
and I would like to change it to be like this:
<video class="play" src="video/my_file_name_with_empty_spaces.mp4">
Tested in N++
Search: (?:src="video|(?<!^)\G)(?:(?!mp4).)*?\K\s+
Replace: _
Explanation
(?:src="video|(?<!^)\G) matches either the delimiter src="video or, via \G, the position right after the previous match, as long as that position is not at the beginning of the string ((?<!^)), where \G can also match
(?:(?!mp4).) matches one character that is not followed by mp4
*? lazily matches such characters, up to...
\s+ one or more whitespace characters (our match, which we replace with _)
before the whitespace, \K tells the engine to drop what has been matched so far from the match it returns
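Outside Notepad++, the same edit can be sketched in Python. The standard re module has no \G or \K, so this swaps in a different technique: match each src="video/...mp4 span once and rewrite the whitespace inside it with a callback. It assumes the file name stays within the quoted attribute ([^"]*?); underscore_video_names is a hypothetical name.

```python
import re

# Group 1: the opening delimiter; group 2: the file name; group 3: "mp4".
# Assumption: [^"]*? keeps the match inside the quoted src attribute.
SRC_SPAN = re.compile(r'(src="video/)([^"]*?)(mp4)')


def underscore_video_names(html):
    """Hypothetical helper: replace whitespace inside src="video/...mp4 with '_'."""
    def fix(m):
        # As with the Notepad++ pattern's \s+, each run of whitespace becomes one "_".
        return m.group(1) + re.sub(r'\s+', '_', m.group(2)) + m.group(3)
    return SRC_SPAN.sub(fix, html)


line = '<video class="play" src="video/my file name with empty spaces.mp4">'
print(underscore_video_names(line))
# <video class="play" src="video/my_file_name_with_empty_spaces.mp4">
```

Whitespace outside the src="video/...mp4 span, such as the space between <video and class, is left untouched because the callback only sees group 2.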

Is there a way to "recall" a char sequence already matched in the regex itself?

The regex I'm looking for has the following constraints:
it starts with "//"
then "[", a non-numeric sequence (called the delimiter in this list), and "]"
then a newline "\n"
then "[", zero or more numbers separated by the delimiter found earlier, and "]".
For example the following text matches the regex:
//[*#*]
[1*#*34*#*64]
and the following text doesn't match the regex:
//[*#*]
[1#34#64]
because the delimiter is not the same as the one matched in the first line.
The regex I have so far is:
^//\[(\D)+\]\n\[[(\d)+(\D)+]*(\d)+\]$|^//\[(\D)+\]\n\[\]$|^//\[(\D)+\]\n\[(\d)+\]$
but this regex obviously matches both of the previous examples.
Is there a way to "recall" a char sequence already matched in the regex itself?
You need something called a back-reference.
Use this regex in Python:
r'^//\[([^\]]+)\]\n\[\d+(\1\d+)*\]'
Sample run:
>>> import re
>>> string = """//[*#*]
... [1*#*34*#*64]"""
>>> print(re.search(r'^//\[([^\]]+)\]\n\[\d+(\1\d+)*\]', string).group(0))
//[*#*]
[1*#*34*#*64]
This matches your string in Python.
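The key point is that the non-matching example is rejected, so here is a small Python check of both inputs. Two assumptions go beyond the answer: the delimiter class also excludes digits ([^\]\d]+, following the question's "non number sequence"), and the list of numbers is wrapped in (?:...)? so that an empty [] is accepted, since the question allows zero numbers. is_valid is a hypothetical helper.

```python
import re

# \1 is the back-reference: it must repeat exactly the text captured by group 1.
# Assumptions beyond the answer: the delimiter excludes digits, and the number
# list is optional so that "[]" is accepted.
PATTERN = re.compile(r'//\[([^\]\d]+)\]\n\[(?:\d+(?:\1\d+)*)?\]')


def is_valid(text):
    """Hypothetical helper; fullmatch plays the role of ^...$ anchors."""
    return PATTERN.fullmatch(text) is not None


print(is_valid('//[*#*]\n[1*#*34*#*64]'))  # True: same delimiter on both lines
print(is_valid('//[*#*]\n[1#34#64]'))      # False: "#" is not "*#*"
print(is_valid('//[*#*]\n[]'))             # True: zero numbers
```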
You need to use a back-reference. In most languages you can refer to a capturing group with \N, where N is the group number (for example \1).
This pattern will work:
//\[([^]]++)]\n\[(?>\d++\1?)+]
To break it down:
// matches the literal //
\[([^]]++)] matches one or more characters other than ] inside square brackets and captures them in group 1 (the delimiter)
\n matches the newline
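\[(?>\d++\1?)+] matches, inside square brackets, one or more repetitions of digits optionally followed by the delimiter captured in group 1. (?>...) is an atomic group and ++ is a possessive quantifier, so the engine does not backtrack into them.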