powershell complicated regular expression - regex

I've been stuck trying to write a regular expression that matches the following condition. Basically, I have a text file that contains several text lines (composed of words and digits). For example:
Some_text Number 45 Some_text ptrn: anchor Some_text Number 22 Some_text
What I need is to return “45” (or whatever digits follow the word “Number”), but only if the line also contains “ptrn: anchor”. In other words, when “ptrn: anchor” is found in a line, the script should look back along the line to the first occurrence of the word “Number” and output the digits next to it.
I'm not very good with regular expressions and would very much appreciate any help.

This should do:
"Number\s*(\d+).*ptrn: anchor"
Note that if there are multiple numbers before ptrn: anchor in a single line, the first one will be returned.
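To check the behaviour outside PowerShell, here is a minimal Python sketch. It assumes the pattern behaves the same under Python's re module, since it only uses \s, \d, a capture group and .*; the function name number_before_anchor is made up for illustration.

```python
import re

# The pattern from the answer: capture the digits after "Number",
# but only if "ptrn: anchor" appears later on the same line.
PATTERN = re.compile(r"Number\s*(\d+).*ptrn: anchor")


def number_before_anchor(line):
    """Return the captured digits as a string, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


sample = "Some_text Number 45 Some_text ptrn: anchor Some_text Number 22 Some_text"
print(number_before_anchor(sample))                      # 45
print(number_before_anchor("Some_text Number 45 only"))  # None
```

The "Number 22" after the anchor is never considered, because the search finds the leftmost "Number" and then only needs "ptrn: anchor" somewhere after it.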

Related

How to Match Tilde-Delimited Data Using Regex

I have data like this:
~10~682423~15~Test Data~10~68276127~15~More Data~10~6813~15~Also Data~
I'm trying to use Notepad++ to find and replace the values within tag 10 (682423, 68276127, 6813) with zeroes. I thought the syntax below would work, but it selects the first occurrence of the text I want and the rest of the line, instead of just the text I want (~10~682423~, for example). I also tried dozens of variations from searching online, but they also either did the same thing or wouldn't return any results.
~10~.*~
You can use: (?<=~10~)\d+(?=~) and replace with 0. This uses lookarounds to check that ~10~ precedes the digit sequence and the (?=~) ensures a ~ follows the digit sequence. If any character could be after the ~10~ field, use (?<=~10~)[^~]+(?=~).
The problem with ~10~.*~ is that * is greedy: .* matches any character, including ~, so the match runs from the first ~10~ all the way to the last ~ on the line.
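The difference between the greedy attempt and the lookaround version can be seen in a short Python sketch. Python is used here only as a convenient way to run the expressions; the variable names are illustrative.

```python
import re

data = "~10~682423~15~Test Data~10~68276127~15~More Data~10~6813~15~Also Data~"

# Greedy .* runs to the last "~" on the line, so a single match swallows everything.
greedy = re.search(r"~10~.*~", data).group(0)

# The lookarounds assert the delimiters without consuming them,
# so only the digits of each tag-10 field are replaced.
zeroed = re.sub(r"(?<=~10~)\d+(?=~)", "0", data)

print(greedy == data)  # True
print(zeroed)          # ~10~0~15~Test Data~10~0~15~More Data~10~0~15~Also Data~
```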
Use
\b10~\d+
Replace with 10~0. \b10~ matches 10 only as a whole number (the 10 inside 210 is not matched), and \d+ matches one or more digits.
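A small Python sketch of the word-boundary variant, assuming the same sample layout; zero_tag_10 is a hypothetical helper name. The second call shows that a tag such as 210 is left alone.

```python
import re


def zero_tag_10(text):
    # \b requires that "10" is not preceded by another digit or letter,
    # so "210~..." does not match.
    return re.sub(r"\b10~\d+", "10~0", text)


print(zero_tag_10("~10~682423~15~Test Data~"))  # ~10~0~15~Test Data~
print(zero_tag_10("~210~682423~"))              # ~210~682423~
```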

How can I delete this part of the text with regex?

I have a problem that I really hope somebody can help me with. I want to delete some parts of the text in a Notepad++ document using regex. If there is other software I can use to delete this text, please let me know; I am a complete beginner with regex.
My document looks like this:
1
00:00:00,859 --> 00:00:03,070
text over here
2
00:00:03,070 --> 00:00:09,589
text over here
3
00:00:09,589 --> 00:00:10,589
some numbers here
4
00:00:10,589 --> 00:00:12,709
Text over here
5
00:00:12,709 --> 00:00:18,610
More text with numbers here
What I want to learn is how to delete the first two lines (the numbers) of every entry in the document, so that only the text parts (the "text over here" lines) remain.
I would really appreciate any kind of help!
My solution:
^[\s\S]{1,5}\d{1,3}:\d{1,3}:\d{1,3},\d{1,5}\s-->\s*?\d{1,3}:\d{1,3}:\d{1,3},\d{1,5}\s
This solution matches both cases: either all the data on one line, or the numbers on one line and the data on the next.
Demo: https://regex101.com/r/nKD0DQ/1/
Simplest solution:
\d+(\r\n|\r|\n)\d{2}:\d{2}.*(\r\n|\r|\n)
\d+ matches the line that contains the entry number, and (\r\n|\r|\n) matches its line break.
\d{2}:\d{2} matches the start of the next line (two 2-digit numbers separated by a colon), .* matches the rest of that line, and the second (\r\n|\r|\n) matches its line break. There is no need to match the whole timestamp, because a subtitle file has a well-defined, predictable structure and we already know we are on the right line.
In Notepad++, open Search -> Replace..., put the expression in Find what:, set Search Mode to Regular expression, and leave Replace with: empty. This gives the expected result: the text lines, with an empty line between each.
See it in action on regex101.
For subtitles, if you want accuracy, you can use this:
\d+(\r\n|\n|\r)(\d\d:){2}\d\d,\d{3}\s*-->\s*(\d\d:){2}\d\d,\d{3}(\r\n|\n|\r)
Tick Regular expression, put this in Find what, and leave Replace with empty.
Regex demo
SRT subtitles have a regular, ordered structure, and it is better to be precise than to lose text.
\d : a single digit.
+ : one or more occurrences of the preceding character or group.
\r\n : carriage return followed by line feed (a Windows newline).
* : zero or more occurrences of the preceding character or group.
| : or; matches either alternative.
{3} : matches the preceding character or group exactly three times.
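For anyone who would rather script this than use the Notepad++ dialog, the same expression can be applied with Python's re module. This is only a sketch: strip_srt_headers is a made-up name, and the sample is shortened from the question.

```python
import re

# Entry number line + full timestamp line, each with its line break.
SRT_HEADER = re.compile(
    r"\d+(\r\n|\n|\r)"
    r"(\d\d:){2}\d\d,\d{3}\s*-->\s*(\d\d:){2}\d\d,\d{3}"
    r"(\r\n|\n|\r)"
)


def strip_srt_headers(text):
    """Remove the number and timestamp lines, leaving only the subtitle text."""
    return SRT_HEADER.sub("", text)


srt = (
    "1\n"
    "00:00:00,859 --> 00:00:03,070\n"
    "text over here\n"
    "2\n"
    "00:00:03,070 --> 00:00:09,589\n"
    "text over here\n"
)
print(strip_srt_headers(srt))
```

Because the whole timestamp is spelled out, a text line that merely starts with digits and a colon is not removed.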
I'm going for a less specific regex:
^[0-9]*\n[0-9:,]*\s-->\s[0-9:,]*
Demo on regex101

How do you "quantify" a variable number of lines using a regexp?

Say you know the starting and ending lines of some section of text, but the characters in some lines and the number of lines between the starting and ending lines are variable, à la:
aaa
bbbb
cc
...
...
...
xx
yyy
Z
What quantifier do you use, something like:
aaa\nbbbb\ncc\n(.*\n)+xx\nyyy\nZ\n
to parse those sections of text as a group?
You can use the s flag, which makes the dot also match newlines, so that a pattern can span multiple lines. For example:
~\w+ ~s
There is a similar question here:
Javascript regex multiline flag doesn't work
If I understood correctly, you know that your text begins with aaa\nbbbb\ncc and ends with xx\nyyy\nZ\n. You could use aaa.+?bbbb.+?cc(.+?)xx.+?yyy.+?Z so that every quantifier is non-greedy and you don't accidentally capture two sections at once. The text in between would be in match group 1. You also need to enable the option that makes the dot match newlines.
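As a rough illustration of the non-greedy approach, here is a Python sketch in which re.DOTALL plays the role of the "dot matches newline" setting. The function name section_bodies and the sample text are assumptions for the example.

```python
import re

# re.DOTALL lets "." match "\n"; every quantifier is lazy (+?),
# so each match stops at the first closing xx/yyy/Z it can find.
SECTION = re.compile(r"aaa.+?bbbb.+?cc(.+?)xx.+?yyy.+?Z", re.DOTALL)


def section_bodies(text):
    """Return the variable lines of each section as a list of lists."""
    return [m.group(1).strip("\n").split("\n") for m in SECTION.finditer(text)]


text = (
    "aaa\nbbbb\ncc\nline 1\nline 2\nxx\nyyy\nZ\n"
    "aaa\nbbbb\ncc\nline 3\nxx\nyyy\nZ\n"
)
print(section_bodies(text))  # [['line 1', 'line 2'], ['line 3']]
```

With a greedy (.+) in place of (.+?), the two sections would be merged into a single match.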
Try this:
aaa( |\n)bbbb( |\n)cc( |\n)( |\n){0,1}(.|\n)*xx( |\n)yyy( |\n)Z
( |\n) matches a space or a newline (so your starting and ending phrases can be split into different lines)
RegExr
At the end of the day what worked for me using Kate was:
( )+aaa\n( )+bbbb\n( )+cc\n(.|\n)*( )+xx\n( )+yyy\n( )+Z\n
Using regexps like this, you can clear quite a bit of junk out of pages.

Regular expression to match 10-15 and 20-26 and 30-40 characters in each line

I'm new to regular expressions and have tried everything I could think of over the past three days, without success.
I have a log file with multiple lines, where each line is an event, and I need to match fixed character positions in each line.
Match characters 3-6, characters 10-16 and characters 20-24.
Sample event:
Ab FIN nm06feij act:ED1W Prcs:keansourcefile
I need to extract
FIN, 06feij and ED1W fields.
You can match using small slices and clever grouping:
sed -r 's/^.{3}(.{3}).{3}(.{6}).{5}(.{4}).*$/\1 \2 \3/g'
I tested this against the one example you gave and got exactly the output you expected.
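The same fixed-width slicing can be checked in Python with the identical pattern. This is a sketch under the assumption that every event line has the same layout as the sample; extract_fields is an illustrative name.

```python
import re

# Skip 3 chars, capture 3, skip 3, capture 6, skip 5, capture 4, ignore the rest.
FIELDS = re.compile(r"^.{3}(.{3}).{3}(.{6}).{5}(.{4}).*$")


def extract_fields(line):
    """Return the three fixed-position fields, or None if the line is too short."""
    match = FIELDS.match(line)
    return match.groups() if match else None


line = "Ab FIN nm06feij act:ED1W Prcs:keansourcefile"
print(extract_fields(line))  # ('FIN', '06feij', 'ED1W')
```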

RegEx in Notepad++ to find a wild character and replace the whole word

I have a test file with number values as below:
32405494
32405495
32405496
32407498
Using Notepad++, I want to search for the first 4 digits with a regular expression and replace the whole number with G3E_STYLERULE_SEQ.NEXTVAL
I am able to find these values using 3240*. My question is, how do I replace the whole number with G3E_STYLERULE_SEQ.NEXTVAL?
When I click the Replace All button, I get the following output:
G3E_STYLERULE_SEQ.NEXTVAL5494
G3E_STYLERULE_SEQ.NEXTVAL5495
G3E_STYLERULE_SEQ.NEXTVAL5496
G3E_STYLERULE_SEQ.NEXTVAL7498
However, I am expecting the following:
G3E_STYLERULE_SEQ.NEXTVAL
G3E_STYLERULE_SEQ.NEXTVAL
G3E_STYLERULE_SEQ.NEXTVAL
G3E_STYLERULE_SEQ.NEXTVAL
Any ideas to achieve this? Is it even possible through Notepad++? Are there any other text editors which I can use to achieve this?
Use something like this:
3240.*
. is the wildcard character in regex and * means that the previous character is repeated 0 or more times (your current regex actually matches 324 followed by zero or more 0 characters).
3240.* will therefore match 3240 and any other following characters.
You might also want to add a line anchor:
^3240.*
So that you don't replace numbers having 3240 in the middle too.
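The difference between 3240* and ^3240.* can be reproduced in a few lines of Python, shown here only as a sketch; re.MULTILINE makes ^ match at the start of every line, which is an assumption about how the editor treats each line.

```python
import re

numbers = "32405494\n32405495\n32405496\n32407498"
REPLACEMENT = "G3E_STYLERULE_SEQ.NEXTVAL"

# "3240*" is 324 followed by zero or more 0s, so only "3240" is replaced.
partial = re.sub(r"3240*", REPLACEMENT, numbers)

# "^3240.*" consumes the rest of the line, so the whole number is replaced.
whole = re.sub(r"^3240.*", REPLACEMENT, numbers, flags=re.MULTILINE)

print(partial.splitlines()[0])  # G3E_STYLERULE_SEQ.NEXTVAL5494
print(whole.splitlines()[0])    # G3E_STYLERULE_SEQ.NEXTVAL
```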
In Notepad++, you can use this regex:
^3240\d+
It matches the four digits you're searching for at the beginning of a line, followed by one or more digits.
Try this:
Search for: ^3240\d*$
Replace with: G3E_STYLERULE_SEQ.NEXTVAL