Remove text before = character in notepad++ - regex

First the text looked like this:
Ab Yz=15,Cd Wx=2,Ef Tu=20,...
I replaced all , with \r\n, so the text looked like this:
Ab Yz=15
Cd Wx=2
Ef Tu=20
Then I wanted only the numbers after the =, so I replaced ^.+[=] with nothing (an empty string), but my result was just 20.
Does Notepad++ treat the whole document as a single line, take the last =, and delete everything before it?
How can I fix this? Also, how can I remove the text after the = (including the =)?
Edit: I also tried ^.+[\=], ^.+(=) and ^.+(\=), but I got the same result.

I guess you have unintentionally checked the ". matches newline" option, which lets a . in a regex go beyond the end of a line: it matches newlines as well (AKA the DOTALL modifier). So you should uncheck it.
Also, there is no need to do this job in two separate steps. Use the regex [^=]+=(\d+),? and replace with \1\n
This will turn such an input string:
Ab Yz=15,Cd Wx=2,Ef Tu=20,Ef Tu=20,Ef Tu=20,Ef Tu=20,Ab Yz=15,Cd Wx=2,Ef Tu=20,Ef Tu=20,
To:
15
2
20
20
20
20
15
2
20
20
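The effect of the ". matches newline" option can be reproduced outside Notepad++. What follows is a minimal Python sketch, not Notepad++ itself: re.DOTALL stands in for the checkbox, and the sample text is the three-line version from the question.

```python
import re

text = "Ab Yz=15\nCd Wx=2\nEf Tu=20"

# With DOTALL, .+ runs across line breaks, so the greedy match
# swallows everything up to the last "=" in the whole text.
with_dotall = re.sub(r"^.+=", "", text, flags=re.MULTILINE | re.DOTALL)

# Without DOTALL, .+ stops at each line end, so every line is handled separately.
without_dotall = re.sub(r"^.+=", "", text, flags=re.MULTILINE)

# One-pass version from the answer, applied to the original comma-separated text.
one_pass = re.sub(r"[^=]+=(\d+),?", r"\1\n", "Ab Yz=15,Cd Wx=2,Ef Tu=20,")

print(repr(with_dotall))     # '20'
print(repr(without_dotall))  # '15\n2\n20'
print(repr(one_pass))        # '15\n2\n20\n'
```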

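Select Regular expression in the bottom-left corner of the Replace window, find ([A-Z]+) ([A-Z]+)= and replace with an empty string. Because the sample text is mixed-case, leave Match case unchecked so that [A-Z] also matches lowercase letters.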

To change all in one pass, you could do:
Find what: (?:^|,)[^=]+=([^,]+)(?:,|$)
Replace with: $1\r\n
Replace all

How to identify container IDs (4 letters and 7 digits) in a single-line EDI file using Notepad++

The ID of each container consists of 4 letters and 7 digits (in EDI files, with no space between them).
The file also contains strings of 11 digits that match the expression below.
The expression has the form:
(\w{4}\d{7})
This does not fully solve the problem, because \w matches digits as well as letters.
link for demo: https://regex101.com/r/vwH9nH/4
Another expression that comes closer is:
([A-Z]{4}\d{7})
This is more specific, but I could not get it to work in Notepad++ to extract the container IDs.
In notepad++ I try:
Ctrl+H
Find what: (([A-Z]{4}\d{7})\h*)|(?s:.)   (the first alternative is meant to match the container IDs)
Replace with: (?1$1\n:)
check Wrap around
check Regular expression
Replace all
Here is part of the file to paste into Notepad++:
UNB+UNOA:2+RCW OPS CENTER+TERMINAL+180808:1519+1533741570C3ED+++++RCW OPS CENTER'UNH+01533741570BAP+BAPLIE:D:95B:UN:SMDG22'BGM++CAPSTAN4.20180808151930+9'DTM+137:1808081519UTC:301'TDT+20+081S+++HSD:172:166+++9V7575:103:ZZZ:MONTE VERDE'LOC+5+BRSSA:139:6'LOC+61+COCTG:139:6'DTM+178:1808090412:201'DTM+133:1808091512:201'DTM+132:1808180041:201'RFF+VON:081N'LOC+147+0380412::5'MEA+WT++KGM:29515'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+DOHAI:139:6'RFF+BM:1'EQD+CN+SUDU8505087+45G1+++5'NAD+CA+HSD:172:20'LOC+147+0380312::5'MEA+WT++KGM:29586'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+DOCAU:139:6'RFF+BM:1'EQD+CN+UACU5363691+45G1+++5'NAD+CA+HLC:172:20'LOC+147+0380212::5'MEA+WT++KGM:29591'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+TGHU9702812+45G1+++5'NAD+CA+MSC:172:20'LOC+147+0380112::5'MEA+WT++KGM:29616'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+DOCAU:139:6'RFF+BM:1'EQD+CN+HLXU6240079+45G1+++5'NAD+CA+HLC:172:20'LOC+147+0380414::5'MEA+WT++KGM:29476'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+PRSJU:139:6'RFF+BM:1'EQD+CN+HASU4556735+45G1+++5'NAD+CA+HSD:172:20'LOC+147+0380314::5'MEA+WT++KGM:29476'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+DOHAI:139:6'RFF+BM:1'EQD+CN+SUDU6787839+45G1+++5'NAD+CA+HSD:172:20'LOC+147+0380214::5'MEA+WT++KGM:29481'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+TGHU9861619+45G1+++5'NAD+CA+MSC:172:20'LOC+147+0380114::5'MEA+WT++KGM:29492'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+HASU5014810+45G1+++5'NAD+CA+HSD:172:20'LOC+147+0301582::5'MEA+WT++KGM:29123'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+CLHU4693498+42G1+++5'NAD+CA+MSC:172:20'LOC+147+0301482::5'MEA+WT++KGM:29160'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+PECLL:139:6'RFF+BM:1'EQD+CN+TCLU4424005+42G1+++5'NAD+CA+HLC:172:20'LOC+147+0301382::5'MEA+WT++KGM:29183'LOC+9+BRSSA:139:6+TECSV'LOC+1
1+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+...
With this find and replace I get only one empty line, but I want all the container IDs in one column.
My expected output is:
SUDU8505087
UACU5363691
TGHU9702812
HLXU6240079
HASU4556735
SUDU6787839
TGHU9861619
HASU5014810
CLHU4693498
TCLU4424005
Replace
.*?([A-Z]{4}\d{7})((?![A-Z]{4}\d{7}).)*
by
$1\n
and get
SUDU8505087
UACU5363691
TGHU9702812
HLXU6240079
HASU4556735
SUDU6787839
TGHU9861619
HASU5014810
CLHU4693498
TCLU4424005
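To see why the character class matters, here is a small Python sketch (not Notepad++). The sample string is a shortened excerpt assembled from the EDI text in the question, and re.findall is used here instead of a replace, which is a different technique from the one in the answer.

```python
import re

# Shortened excerpt of the EDI data from the question (assumption: abbreviated for illustration).
edi = ("UNH+01533741570BAP+BAPLIE'EQD+CN+SUDU8505087+45G1+++5'"
       "NAD+CA+HSD:172:20'EQD+CN+UACU5363691+45G1+++5'")

# \w also matches digits, so a run of 11 digits is reported as an ID.
loose = re.findall(r"\w{4}\d{7}", edi)

# [A-Z]{4} insists on four capital letters before the seven digits.
strict = re.findall(r"[A-Z]{4}\d{7}", edi)

print(loose)   # ['01533741570', 'SUDU8505087', 'UACU5363691']
print(strict)  # ['SUDU8505087', 'UACU5363691']
print("\n".join(strict))  # one container ID per line
```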
Replace
.*?([A-Z]{4}\d{7})
by
$1\n
and get
SUDU8505087
UACU5363691
TGHU9702812
HLXU6240079
HASU4556735
SUDU6787839
TGHU9861619
HASU5014810
CLHU4693498
TCLU4424005
+42G1+++5'NAD+CA+HLC:172:20'LOC+147+0301382::5'MEA+WT++KGM:29183'LOC+9+BRSSA:139:6+TECSV'LOC+11+COCTG:139:6+TCC'LOC+83+COCTG:139:6'RFF+BM:1'EQD+CN+...
Then remove the last line by hand.

Move the beginning of each line to its end in Notepad++ or UltraEdit

I have a question about using Notepad++ or UltraEdit to copy the first column, or the first two columns, of my file and add them to the end of each line. The problem would be easy if my file had regular columns, but it doesn't. Here is what it looks like:
18,-8 22 30.82,70 2 34.25,
19,-8 23 10,70 1 42.97,
20,-8 23 40.42,700 51.85,
21,-8 24 10.1,70 0 0.89,
22,-8 24 40.05,69 59 10.09,
...
1318,-7 27 26.82,78 3 16.1,
I'd like my id numbers to be copied at the end of each line. I have tried the replace tools, but didn't find the correct expression in order to catch the beginning of the line.
One possible solution using Notepad++
Assuming that the columns are separated by commas (,):
You can record a macro that executes the following steps:
Press the Home / Pos1 key to set the caret to the first position in the current line
Search for , two times (or as many times as the number of columns that should be copied to the end of the line)
Press Shift + Home to select the text from the beginning of the line to the position of the caret
Copy the selected text by pressing Ctrl + C
Press End to set the caret to the end of the current line
Paste the copied text at the end of the line by pressing Ctrl + V
Move the caret to the next line by pressing ↓ (Arrow Down)
Run the macro until the end of the file is reached.
PS: Always back up your data before running the macro!
Try the following in Regular Expression search and replace mode:
Find:
^([0-9]*)(.*)$
Replace:
\1\2\1
Explanation
^ and $ are anchors for the beginning and end of a line, respectively.
^([0-9]*) matches from the start of the line until a non-digit is met (in your case, a comma). The ( and ) make the matched expression available for use in the Replace box via \1.
(.*)$ matches everything else until the end of the line. Again, the parentheses make the matched expression accessible, this time via \2.
So, since you want a copy of the first column at the end of the line, you can just do:
Replace: \1\2\1
If, instead, you wanted to move the first column to the end, you might want to do
Find: ^([0-9]*),(.*)$
Replace:
\2\1
Note the added comma in the find expression. Without it, the comma after the first column of data would be matched as part of the (.*) expression and would thus remain at the beginning of each line when the line is replaced with \2\1.
Edit: Oops, others have beaten me to (basically) the same answer, but I hope the explanation is helpful nevertheless.
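Find what: ^([0-9]*)(.*)
Replace with: \2\1
This moves the leading digits to the end of each line; use \1\2\1 as the replacement to copy them instead.
Hope this serves you.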
In Notepad++:
Open the Replace dialog: Search -> Replace...
To copy the first field to the end:
Find what: ^([0-9]+,)(.*)$
Replace with: \1\2\1
To move the first field to the end:
Find what: ^([0-9]+,)(.*)$
Replace with: \2\1
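The copy and move replacements can be checked quickly outside the editor. This is a Python sketch rather than Notepad++; re.MULTILINE makes ^ and $ work per line, and the two sample lines are taken from the question.

```python
import re

lines = "18,-8 22 30.82,70 2 34.25,\n19,-8 23 10,70 1 42.97,"

# Copy the first field (digits plus comma) to the end of each line.
copied = re.sub(r"^([0-9]+,)(.*)$", r"\1\2\1", lines, flags=re.MULTILINE)

# Move the first field to the end of each line.
moved = re.sub(r"^([0-9]+,)(.*)$", r"\2\1", lines, flags=re.MULTILINE)

print(copied)
# 18,-8 22 30.82,70 2 34.25,18,
# 19,-8 23 10,70 1 42.97,19,
print(moved)
# -8 22 30.82,70 2 34.25,18,
# -8 23 10,70 1 42.97,19,
```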

Regex to match all lines starting with a specific string

I have this very long cfg file, where I need to find the last occurrence of a line starting with a specific string. An example of the cfg file:
...
# format: - search.index.[number] = [search field]:element.qualifier
...
search.index.1 = author:dc.contributor.*
...
search.index.12 = language:dc.language.iso
...
jspui.search.index.display.1 = ANY
...
I need to get the last occurrence of a line starting with search.index.[number]; more specifically, I need that number. For the above snippet, that number would be 12.
As you can see, there are other lines too containing that pattern, but I do not want to match those.
I'm using Groovy as a programming/scripting language.
Any help is appreciated!
Have you tried:
def m = lines =~ /(?m)^search\.index\.(\d+)/
m[ -1 ][ 1 ]
Try this as your expression (the (?m) flag makes ^ match at the start of every line):
(?m)^search\.index\.(\d+)
Then, in Groovy, you can get the number from the last match with:
matcher[-1][1]
I don't think you should go for it but...
If you can do a multi-line search (which you need here anyway), the only way would be to read the file backward. So first, eat everything with .* (om nom nom) if you can make the dot match everything, or with (?:.|\s)* if you can't. Then match your pattern search\.index\.(\d+). Since the pattern must match at the beginning of a line, put (?:^|\n) in front of it (hoping you're not using some crazy format that doesn't use \n as the newline character).
So...
(?:.|\s)*(?:^|\n)search\.index\.(\d+)
The number should be in the first capturing group. (Tested in JavaScript.)
PS: I don't know Groovy, so sorry if this is not appropriate at all.
Edit:
This should also work:
search\.index\.(\d+)(?!(?:.|\s)*?(?:^|\n)search\.index\.\d+)
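Both ideas from this thread (collect every match and take the last one, or let a greedy prefix push the match to the last occurrence) can be sketched in Python. This is only an illustration, not Groovy; the cfg text is a shortened copy of the snippet in the question.

```python
import re

cfg = """\
# format: - search.index.[number] = [search field]:element.qualifier
search.index.1 = author:dc.contributor.*
search.index.12 = language:dc.language.iso
jspui.search.index.display.1 = ANY
"""

# Approach 1: find every line-anchored match and keep the last one.
numbers = re.findall(r"^search\.index\.(\d+)", cfg, flags=re.MULTILINE)
last_number = int(numbers[-1])

# Approach 2: a greedy .* (with DOTALL) consumes as much as possible,
# so the anchored pattern ends up matching its last occurrence.
greedy = re.search(r".*^search\.index\.(\d+)", cfg, flags=re.MULTILINE | re.DOTALL)
last_via_greedy = int(greedy.group(1))

print(last_number, last_via_greedy)  # 12 12
```

The first approach is usually easier to read; the second does everything inside one expression at the cost of more backtracking.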

RegEx Lookaround issue

I am using Powershell 2.0. I have file names like my_file_name_01012013_111546.xls. I am trying to get my_file_name.xls. I have tried:
.*(?=_.{8}_.{6})
which returns my_file_name. However, when I try
.*(?=_.{8}_.{6}).{3}
it returns my_file_name_01.
I can't figure out how to get the extension (which can be any 3 characters). The time/date part will always be _, 8 characters, _, 6 characters.
I've looked at a ton of examples and tried a bunch of things, but no luck.
If you just want to find the name and extension, you probably want something like this: ^(.*)_[0-9]{8}_[0-9]{6}(\..{3})$
my_file_name will be in backreference 1 and .xls in backreference 2.
If you want to remove everything else and return the answer, you want to substitute the "numbers" with nothing: 'my_file_name_01012013_111546.xls' -replace '_[0-9]{8}_[0-9]{6}', ''. You can't simply pull two bits (name and extension) of the string out as one match, because regex patterns match contiguous chunks only.
Try this (not tested); it should work for a 'my_file_name' of any length, any number of digits, and any kind of extension.
"my_file_name_01012013_111546.xls" -replace '(?<=[\D_]*)(_[\d_]*)(\..*)','$2'
non regex solution:
$a = "my_file_name_01012013_111546.xls"
$a.replace( ($a.substring( ($a.LastIndexOf('.') - 16 ) , 16 )),"")
The original regex you specified returns the longest match that is followed by the 16 characters described in the lookahead (_, 8 characters, _, 6 characters).
Once you append .{3}, it returns that longest match plus the next 3 characters. This is why you're getting this result.
The approach described by Inductiveload is probably better if you can use backreferences. I'd use the following regex: (.*)[_\d]{16}\.(.*) Otherwise, I'd do it in two separate stages:
get the initial part
get the extension
The reason you get my_file_name_01 when you add that is because lookaheads are zero-width. This means that they do not consume characters in the string.
As you stated, .*(?=_.{8}_.{6}) matches my_file_name because that string is followed by something matching _.{8}_.{6}; however, once that match is found, you've only consumed my_file_name, so the addition of .{3} will then consume the next 3 characters, namely _01.
As for a regex that would fit your needs, others have posted viable alternatives.
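The zero-width behaviour is easy to observe directly. The sketch below uses Python rather than PowerShell (an assumption made for illustration; these particular patterns behave the same way in both engines) and the file name from the question.

```python
import re

name = "my_file_name_01012013_111546.xls"

# The lookahead only checks what follows; it consumes nothing.
before = re.match(r".*(?=_.{8}_.{6})", name).group()

# Whatever comes after the lookahead starts consuming right where .* stopped.
with_three = re.match(r".*(?=_.{8}_.{6}).{3}", name).group()

# Removing the date/time part instead leaves name and extension joined.
cleaned = re.sub(r"_[0-9]{8}_[0-9]{6}", "", name)

print(before)      # my_file_name
print(with_three)  # my_file_name_01
print(cleaned)     # my_file_name.xls
```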

Regex Find & Replace - Find string of any character and specific length then replace 1 character

I have a document that has a range of numbers like this:
0300010000000394001001,27
0300010000000394001002,0
0300010000000394002001,182
0300010000000394002002,51
0300010000000394003001,156
0300010000000394003002,40
I need to find the new line character and replace with a number of spaces depending on the string length.
If it has 24 characters like this - 0300010000000394001002,0 then I need to replace the new line character at the end with 5 blank spaces.
If it has 25 characters like this - 0300010000000394002002,51 then I need to replace the new line character at the end with 4 blank spaces and so on.
In my text editor I can use find and replace. I search for the line length by ^(.|\s){24}$ for 24 characters - but this will obviously replace the whole line and I only need to replace the new line character at the end.
I want to specify a new line character AFTER ^(.|\s){24}$. Is this possible?
It sounds like you need two things.
Multi-line Mode (See "Using ^ and $ as Start of Line and...")
Backreferencing
Most editors that support regex support these naturally, but you'll have to let us know what editor you're using for us to be specific. Without knowing what editor you're using, all I can say is that you want to do some combination of the following:
regex          subst
-----          -----
^(.{24})\n     $1     <-- there are spaces here
^(.{24})^M     \1     <-- there are spaces here
^(.{24})\s       ^^^^^
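If a scripting language is available, the per-length replacements can be folded into one pass. The following Python sketch is only an illustration: WIDTH = 29 is inferred from the question (24 characters plus 5 spaces, or 25 characters plus 4 spaces), and the helper name pad is made up for this example.

```python
import re

WIDTH = 29  # assumption: 24 characters + 5 spaces = 25 characters + 4 spaces

text = ("0300010000000394001001,27\n"
        "0300010000000394001002,0\n"
        "0300010000000394002001,182\n")

def pad(match):
    # Replace the trailing newline with enough spaces to reach WIDTH.
    return match.group(1).ljust(WIDTH)

joined = re.sub(r"^(.*)\n", pad, text, flags=re.MULTILINE)

print(repr(joined))
```

A single fixed-length rule such as ^(.{24})\n replaced by \1 plus five spaces does the same thing for one line length at a time.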