Regex handling "|" in Text - regex

I got the following text:
Code = ABCD123 | Points = 30
Code = ABCD333 | Points = 44
In the end, I want to remove everything except the code. Expected output:
ABCD123
ABCD333
I actually tried it with
Code = | P.+
But I don't know how to get the "|" removed. Currently I am left with, for example, ABCD333 |.
I'm struggling there.

Assuming the code only consists of word characters, you may use the following:
^Code = (\w+).+$
...and replace with:
\1
Demo.
If the code can be anything, you may use something like this instead:
^Code = (.+?)[ ]\|.+$
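For reference, here is a minimal Python sketch of the same replacement, assuming the lines are available as plain strings. The names CODE_WORD, CODE_ANY and extract_code are made up for this illustration; only the two patterns come from the answer above.

```python
import re

# Patterns taken from the answer above; the names are hypothetical.
CODE_WORD = re.compile(r"^Code = (\w+).+$")      # code consists of word characters
CODE_ANY = re.compile(r"^Code = (.+?)[ ]\|.+$")  # code is anything up to " |"

def extract_code(line, pattern=CODE_WORD):
    """Replace the whole line with capture group 1 (the \\1 replacement)."""
    return pattern.sub(r"\1", line)

lines = [
    "Code = ABCD123 | Points = 30",
    "Code = ABCD333 | Points = 44",
]
for line in lines:
    print(extract_code(line))
```

With the sample input this prints ABCD123 and ABCD333. Passing CODE_ANY as the second argument switches to the more permissive pattern.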

Ctrl+H
Find what: ^Code = (\w+).+
Replace with: $1
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
Code = # literally
(\w+) # group 1, 1 or more word characters
.+ # 1 or more of any character except newline
Replacement:
$1 # content of group 1

Related

How can I delete the rest of the line after the second pipe character "|" for every line with python?

I am using Notepad++ and I want to get rid of everything after the second pipe character (including the second pipe character itself) on every line of my txt file.
Basically, the txt file has the following format:
3.1_1.wav|I like apples.|I like apples|I like bananas
3.1_2.wav|Isn't today a lovely day?|Right now it is 1 in the afternoon.|....
The result should be:
3.1_1.wav|I like apples.
3.1_2.wav|Isn't today a lovely day?
I have tried using \|.* but then everything after the first pipe character is matched.
In Notepad++ do this:
Find what: ^([^\|]*\|[^\|]*).*
Replace with: $1
Check "Regular expression", then click "Replace All".
Explanation:
^ - anchor at start of line
( - start group, can be referenced as $1
[^\|]* - scan over any character other than |
\| - scan over |
[^\|]* - scan over any character other than |
) - end group
.* - scan over everything until end of line
In the replacement, reference the captured group with $1.
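Since the question title asks about Python, here is a hedged sketch of the same idea there. It assumes each line is already a string without its trailing newline; the function names are hypothetical. The first function reuses the pattern from this answer, the second is a regex-free alternative based on str.split.

```python
import re

# Same pattern as in the Notepad++ answer; inside [...] the pipe needs no escaping.
SECOND_PIPE = re.compile(r"^([^|]*\|[^|]*).*")

def keep_two_fields_regex(line):
    """Keep everything before the second pipe, using the capture group."""
    return SECOND_PIPE.sub(r"\1", line)

def keep_two_fields_split(line):
    """Same result without a regex: keep the first two pipe-separated fields."""
    return "|".join(line.split("|")[:2])

lines = [
    "3.1_1.wav|I like apples.|I like apples|I like bananas",
    "3.1_2.wav|Isn't today a lovely day?|Right now it is 1 in the afternoon.|....",
]
for line in lines:
    print(keep_two_fields_regex(line))
```

Both functions leave a line unchanged if it contains no pipe at all.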
I'm not sure if this is the best way to do it, but try this:
[^wav]\|.*

Pyspark - Regex - Extract value from last brackets

I created the following regular expressions with the idea of extracting the last element in brackets. With only one pair of parentheses it works fine, but with two pairs it either extracts the first one (which is wrong) or extracts the value together with the brackets.
Do you know how to solve it?
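from pyspark.sql.functions import col, regexp_extract

# Assumes an existing SparkSession named `spark`
tmp = spark.createDataFrame(
    [
        (1, 'foo (123) oiashdj (hi)'),
        (2, 'bar oiashdj (hi)'),
    ],
    ['id', 'txt']
)
# "old" takes the first parenthesized text; "new" takes the trailing one, brackets included
tmp = tmp.withColumn("old", regexp_extract(col("txt"), r"(?<=\().+?(?=\))", 0))
tmp = tmp.withColumn("new", regexp_extract(col("txt"), r"\(([^)]+)\)?$", 0))
tmp.show()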
+---+--------------------+---+----+
| id|                 txt|old| new| needed
+---+--------------------+---+----+
|  1|foo (123) oiashdj...|123|(hi)| hi
|  2|    bar oiashdj (hi)| hi|(hi)| hi
+---+--------------------+---+----+
To extract the substring between parentheses with no other parentheses inside at the end of the string you may use
tmp = tmp.withColumn("new", regexp_extract(col("txt"), r"\(([^()]+)\)$", 1));
Details
\( - matches (
([^()]+) - captures into Group 1 any 1+ chars other than ( and )
\) - a ) char
$ - at the end of the string.
The 1 argument tells the regexp_extract to extract Group 1 value.
See the regex demo online.
NOTE: To allow trailing whitespace, add \s* right before $: r"\(([^()]+)\)\s*$"
NOTE2: To match the last occurrence of such a substring in a longer string, with exactly the same code as above, use
r"(?s).*\(([^()]+)\)"
The .* will grab all the text up to the end, and then backtracking will do the job.
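The three patterns above can be tried outside Spark. The sketch below uses Python's built-in re module rather than regexp_extract, which is an assumption made for easy testing (Spark uses Java regex, but these particular patterns use only constructs common to both). The helper extract_group1 and the constant names are hypothetical; the helper returns an empty string when nothing matches, mirroring what regexp_extract does.

```python
import re

AT_END = re.compile(r"\(([^()]+)\)$")              # brackets at the very end
AT_END_WS = re.compile(r"\(([^()]+)\)\s*$")        # trailing whitespace allowed
LAST_ANYWHERE = re.compile(r"(?s).*\(([^()]+)\)")  # last bracket pair anywhere

def extract_group1(pattern, text):
    """Return capture group 1 of the first match, or '' if there is none."""
    match = pattern.search(text)
    return match.group(1) if match else ""

for txt in ["foo (123) oiashdj (hi)", "bar oiashdj (hi)"]:
    print(extract_group1(AT_END, txt))
```

For both sample rows this prints hi. LAST_ANYWHERE relies on the greedy .* backtracking from the end of the string, so it also finds the last pair when more text follows it.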
This should work. Use it with the single line flag.
\([^\(\)]*?\)(?!.*\([^\(\)]*?\))
https://regex101.com/r/Qrnlf3/1

Looking for single occurrence between '{' and ':' in a large text

I'm new to the Regex world, so please be kind on the tantrums :-)
I would like to print only the first occurrence of a string between { and :.
Example in the following string:
({TRIGGER.VALUE}=0 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>75)
or
({TRIGGER.VALUE}=1 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>65)
I want it to output only Zabbix windows
How is that possible?
I tried {([a-zA-Z0-9 ]*): but it also prints the : and matches twice.
Thanks for reading!
Srini
You may use a PCRE regex with -o option (extracting the matches rather than returning the whole lines) to grab the text you need and use head -1 to only have the first match:
s='({TRIGGER.VALUE}=0 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>75) or ({TRIGGER.VALUE}=1 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>65)'
echo "$s" | grep -oP '(?<={)[\w\s]+(?=:)' | head -1
See an online demo
Pattern details:
(?<={) - there must be a { immediately to the left of the current location
[\w\s]+ - 1+ word and/or whitespace chars
(?=:) - there must be a : immediately to the right of the current location.
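The same lookaround pattern can be sketched in Python, assuming the text is held in a string. Here re.search plays the role of head -1, because it stops at the first match. The names HOST and first_host are hypothetical, and the brace is escaped only for readability.

```python
import re

s = ("({TRIGGER.VALUE}=0 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>75) or "
     "({TRIGGER.VALUE}=1 and {Zabbix windows:zabbix[process,discoverer,avg,busy].avg(10m)}>65)")

# "{" immediately to the left, ":" immediately to the right, neither one consumed.
HOST = re.compile(r"(?<=\{)[\w\s]+(?=:)")

def first_host(text):
    """Return the first text found between '{' and ':', or None."""
    match = HOST.search(text)
    return match.group(0) if match else None

print(first_host(s))
```

This prints Zabbix windows. {TRIGGER.VALUE} is skipped because the run of word characters after its brace is followed by a dot, not a colon.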

Finding single escaped characters

I would like to replace some special characters in a given text. Here is what I've tried.
import re

_RE_SPECIAL_CHARS = re.compile(r"(?:[^#\\]|\\.)+#")
text = r"ok#\#.py"
search = re.search(_RE_SPECIAL_CHARS, text)
print(text)
if search:
    print(_RE_SPECIAL_CHARS.sub("<star>", text))
else:
    print('<< NOTHING FOUND ! >>')
This prints :
ok#\#.py
<star>\#.py
What I need to have instead is ok<star>\#.py.
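You can use a lookbehind and match just the special character. Note that Python's built-in re module rejects this pattern, because the two lookbehind alternatives have different widths; it works in PCRE or with the third-party regex module:
re.compile(r"(?<=[^#\\]|\\.)#")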
See DEMO
Or you can capture the part before # in group 1 and replace with \1<star>
re.compile(r"((?:[^#\\]|\\.)+)#")
and
print(_RE_SPECIAL_CHARS.sub(r"\1<star>", text))
See DEMO
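Put together, the capture-group variant can be run as below with the built-in re module. This is a sketch using the sample text from the question; UNESCAPED_HASH and compiles_in_re are hypothetical names. The small helper only checks whether a pattern compiles, which makes it easy to see how the lookbehind variant fares in re.

```python
import re

text = r"ok#\#.py"

# Group 1 holds everything before the unescaped '#'; the replacement puts it back.
UNESCAPED_HASH = re.compile(r"((?:[^#\\]|\\.)+)#")
result = UNESCAPED_HASH.sub(r"\1<star>", text)
print(result)

def compiles_in_re(pattern):
    """Report whether the built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The lookbehind alternatives are 1 and 2 characters wide, which re does not allow.
print(compiles_in_re(r"(?<=[^#\\]|\\.)#"))
```

The replacement must be a raw string; in a normal string "\1" is the control character with code 1, not a backreference. One limitation of this pattern: the group needs at least one character, so a '#' at the very start of the text is left alone.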

how to match each line wrapped by start/end tag?

I want to convert my blog from markdown to html. I used [crayon lang="cpp"]...[/crayon] to paste code. I want to get each line wrapped by [crayon][/crayon] and add 4 spaces at the beginning of each such line. For example:
Some text
[crayon lang="bash"]
#!/bin/bash
[/crayon]
other text
[crayon lang="cpp"]
int main()
{
}
[/crayon]
I want it to be:
Some text
    #!/bin/bash
other text
    int main()
    {
    }
I don't know how to do it by regex. Could anyone help me?
Here is what I've tried:
\[crayon.*?\]([\d\D]*?)\[\/crayon\] \1 matches all lines wrapped by the [crayon][/crayon], but I can't add spaces.
(?'st'\[crayon.*?\])^.*$(?'-st'\[/crayon\]) doesn't match
A (relatively) easy way would be to do it in two steps:
1. Insert 4 spaces at the start of each line, but only for lines after '[crayon lang="..."]' and before '[/crayon]'
pattern : (?ms)^(?=(?:(?!\[crayon\b).)*\[/crayon])
replacement : '    ' (4 spaces)
2. Remove all '[crayon lang="..."]' and '[/crayon]'
pattern : \[/?crayon.*?][ \t]*(\r?\n|$)
replacement : '' (empty string)
A PHP demo:
<?php
$text = 'Some text
[crayon lang="bash"]
#!/bin/bash
[/crayon]
other text
[crayon lang="cpp"]
int main()
{
}
[/crayon]';
$text = preg_replace('#^(?=(?:(?!\[crayon\b).)*\[/crayon])#ms', '    ', $text);
$text = preg_replace('#\[/?crayon.*?][ \t]*(\r?\n|$)#', '', $text);
echo "$text\n";
?>
which would print:
Some text
#!/bin/bash
other text
int main()
{
}
A quick explanation of the perhaps terse regex ^(?=(?:(?!\[crayon\b).)*\[/crayon]):
^ # match the start of a line
(?= # start positive look ahead
(?: # start group
(?!\[crayon\b). # match any char as long as it doesn't have `[crayon` in front of it
)* # end group and repeat it zero or more times
\[/crayon] # match '[/crayon]'
) # end positive look ahead
In plain English that would read:
match any start of a line, only if there's a [/crayon] ahead of this line-start, and in between this line-start and [/crayon] there cannot be a [crayon.
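A hedged Python port of the two-step approach might look like the sketch below. STEP1, STEP2, SAMPLE and crayon_to_indented are hypothetical names. One detail is an addition that is not in the patterns above: step 2 starts with [ \t]*, because the step 1 lookahead also succeeds at the start of the [/crayon] line itself, so that line gets indented too and the indentation would otherwise be left behind in front of the following line.

```python
import re

# Step 1: zero-width match at every line start that has [/crayon] ahead
# with no [crayon in between.
STEP1 = re.compile(r"(?ms)^(?=(?:(?!\[crayon\b).)*\[/crayon\])")
# Step 2: remove the tags; the leading [ \t]* (an addition) also removes
# the indentation that step 1 placed in front of [/crayon].
STEP2 = re.compile(r"[ \t]*\[/?crayon.*?\][ \t]*(?:\r?\n|$)")

SAMPLE = (
    'Some text\n'
    '[crayon lang="bash"]\n'
    '#!/bin/bash\n'
    '[/crayon]\n'
    'other text\n'
    '[crayon lang="cpp"]\n'
    'int main()\n'
    '{\n'
    '}\n'
    '[/crayon]'
)

def crayon_to_indented(text):
    """Indent lines inside crayon blocks by 4 spaces, then drop the tags."""
    indented = STEP1.sub("    ", text)
    return STEP2.sub("", indented)

print(crayon_to_indented(SAMPLE))
```

On the sample, only the lines inside the two blocks end up indented, while "Some text" and "other text" stay flush left.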
I have an idea. You can use it if you think it's OK.
1. Scan line by line:
a. Look for the pattern \[crayon.+\]
b. If you don't find it, write the line as it is.
c. If you find it, don't write anything, and start looking for the pattern \[\/crayon\]
d. Until you find that pattern, write every line with 4 spaces added at the beginning.
e. When you find the closing pattern from (c), don't write anything, and start again from (a).
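The steps above can be sketched in Python as a small state machine, assuming the whole text fits in memory and that the opening and closing tags each sit on their own line. The function and variable names are hypothetical; the two patterns are the ones from the list.

```python
import re

OPEN_TAG = re.compile(r"\[crayon.+\]")   # step (a)
CLOSE_TAG = re.compile(r"\[/crayon\]")   # step (c)

def indent_crayon_blocks(text, indent="    "):
    """Indent lines between crayon tags and drop the tag lines themselves."""
    out = []
    inside = False
    for line in text.splitlines():
        if not inside and OPEN_TAG.search(line):
            inside = True                # opening tag: write nothing
        elif inside and CLOSE_TAG.search(line):
            inside = False               # closing tag: write nothing
        elif inside:
            out.append(indent + line)    # step (d)
        else:
            out.append(line)             # step (b)
    return "\n".join(out)

sample = 'Some text\n[crayon lang="bash"]\n#!/bin/bash\n[/crayon]\nother text'
print(indent_crayon_blocks(sample))
```

Compared with a single large regex, this version is easier to extend, for example to keep the lang value from the opening tag.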
How about \[crayon.*?\]\n(.*\n)*?\[\/crayon\]\n. This way \1 can capture each individual line.