Looking to modify text depending on a certain character string being present on the previous line - regex

I have text of the below format that I wish to modify using regex and notepad++.
Format of text:
16.232.39.195
dwdwevevveeve
148.92.235.232
49.58.203.107
130.221.168.79
vfeevvewdwdqwdq
2.170.109.98
254.250.64.253
10.102.107.236
155.146.118.222
ntyrovgmnfewijw
47.80.127.125
ewfwfmbbrmbve
26.232.99.92
10.0.46.127
229.154.77.234
35.15.165.153
fewomwvmvvm
157.27.74.183
78.244.169.225
114.7.107.67
xfevwwf
13.118.248.99
wefwfwwf
116.102.16.22
wfgheegfwf
22.4.118.222
61.205.56.191
Explanation of how I want to modify the above data format using regex and Notepad++:
The above data format is a list of IP addresses. You will note that on the line after some of the IP addresses, there is a text string, and after some other IP addresses, there is no text string. For the IP addresses that currently don't have a text string in the following line, I want to insert a text string. Let's say for illustrative purposes I want to insert the text string 'ABC123' in the line after each IP address that currently isn't followed by a text string. If I could do this successfully, my data would be modified to look like the below:
16.232.39.195
dwdwevevveeve
148.92.235.232
ABC123
49.58.203.107
ABC123
130.221.168.79
vfeevvewdwdqwdq
2.170.109.98
ABC123
254.250.64.253
ABC123
10.102.107.236
ABC123
155.146.118.222
ntyrovgmnfewijw
47.80.127.125
ewfwfmbbrmbve
26.232.99.92
ABC123
10.0.46.127
ABC123
229.154.77.234
ABC123
35.15.165.153
fewomwvmvvm
157.27.74.183
ABC123
78.244.169.225
ABC123
114.7.107.67
xfevwwf
13.118.248.99
wefwfwwf
116.102.16.22
wfgheegfwf
22.4.118.222
ABC123
61.205.56.191
ABC123
So in algorithmic terms I want to do something like:
for IP address on line A:
if string on line B = IP address
then insert a new line after line A that contains the string 'ABC123'
elseif string on line B = character string
then do nothing
repeat the above process for the next IP address in the document.
I know that if all of the text were on a single line, I would be able to solve the problem with the below regex / notepad++ method:
Find: (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
replace with: \1 ABC123 \2
But it is the fact that each IP address and text string is on a separate line that I am not sure how to solve.
Any advice on the best way to solve this would be greatly appreciated.

Ctrl+H
Find what: (\d{1,3}(?:\.\d{1,3}){3})\K(?=\R(?1)|\z)
Replace with: \nABC123
TICK Wrap around
SELECT Regular expression
Replace all
Explanation:
( # group 1
\d{1,3} # 1 up to 3 digits
(?: # non capture group
\. # a dot
\d{1,3} # 1 up to 3 digits
){3} # end group, must appear 3 times
) # end group 1
\K # forget all we have seen until this position
(?= # positive lookahead, make sure we have after:
\R # any kind of linebreak
(?1) # same pattern as used in group 1 (IP address)
| # OR
\z # end of file
) # end lookahead
Replacement:
\n # linefeed, you can use \r\n for Windows EOL
ABC123
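For anyone doing this outside Notepad++, here is a rough Python sketch of the same idea. Python's `re` module has no `\K`, `\R` or `(?1)`, so this version captures the IP and spells the lookahead out in full. The `add_label` name and the assumption of `\n` line endings are mine, not part of the recipe above.

```python
import re

# One dotted quad, same shape as group 1 in the Notepad++ pattern.
IP = r"\d{1,3}(?:\.\d{1,3}){3}"

# An IP-only line that is followed by another IP-only line or by end of text.
LONELY_IP = re.compile(rf"^({IP})$(?=\n{IP}$|\n?\Z)", re.MULTILINE)


def add_label(text, label="ABC123"):
    """Insert `label` on a new line after every IP that has no text line after it."""
    return LONELY_IP.sub(lambda m: f"{m.group(1)}\n{label}", text)


sample = "16.232.39.195\ndwdwevevveeve\n148.92.235.232\n49.58.203.107"
print(add_label(sample))
```

Because the lookahead does not consume the next line, a run of several IPs in a row gets a label after each one, which is what the `\K` plus lookahead combination does in Notepad++.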

In your sample data, IP addresses always have a space at the end when followed by text. Please note this is not a valid IP address.
In case this is correct in the initial data of your question, you can apply the following steps:
Ctrl+H
Find what: (^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$)
Replace with: \1\r\nABC123
Search mode: Regular expression
Click on Replace All

Regex to disregard partial matches across lines / matching too much

I have three lines of tab-separated values:
SELL 2022-06-28 12:42:27 39.42 0.29 11.43180000 0.00003582
BUY 2022-06-28 12:27:22 39.30 0.10 3.93000000 0.00001233
_____2022-06-28 12:27:22 39.30 0.19 7.46700000 0.00002342
The first two have 'SELL' or 'BUY' as first value but the third one has not, hence a Tab mark where I wrote ______:
I would like to capture the following using Regex:
My expression ^(BUY|SELL).+?\r\n\t does not work as it gets me this:
I do know why it outputs this; adding a lazy marker '?' obviously won't help. I can't get lookarounds to work either, if they are the right tool at all. I need something like 'match \r\n\t only, or \r\n(?:^\t), at the end of each line'.
The final goal is to make the three lines look like this at the end, so I will need to replace the match with capturing groups:
Can anyone point me to the right direction?
Ctrl+H
Find what: ^(BUY|SELL).+\R\K\t
Replace with: $1\t
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(BUY|SELL) # group 1, BUY or SELL
.+ # 1 or more any character but newline
\R # any kind of linebreak
\K # forget all we have seen until this position
\t # a tabulation
Replacement:
$1 # content of group 1
\t # a tabulation
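As a cross-check outside Notepad++, the same fill-down can be sketched in Python. `re` does not support `\K`, so the first line is captured and written back instead. The function name and the shortened sample rows are made up for illustration, and `\n` line endings are assumed.

```python
import re

# A BUY/SELL line followed by a line that starts with a tab.
# Group 1 is the keyword, group 2 is the rest of that line plus its newline.
FILL = re.compile(r"^(BUY|SELL)(.+\n)\t", re.MULTILINE)


def fill_missing_side(text):
    """Copy BUY/SELL down onto a following line whose first field is empty."""
    return FILL.sub(r"\1\2\1\t", text)


rows = "SELL\t2022-06-28\nBUY\t2022-06-28\n\t2022-06-28\n"
print(fill_missing_side(rows))
```

Like the Notepad++ version, one pass only fills a single empty line directly under a BUY/SELL line.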
You can use the following regex ((BUY|SELL)[^\n]+\n)\s+ and replace with \1\2.
Regex Match Explanation:
((BUY|SELL)[^\n]+\n): Group 1
(BUY|SELL): Group 2
BUY: sequence of characters "BUY" followed by a space
|: or
SELL: sequence of characters "SELL" followed by a space
[^\n]+: any character other than newline
\n: newline character
\s+: any space characters
Regex Replace Explanation:
\1: Reference to Group 1
\2: Reference to Group 2
Check the demo here. Tested on Notepad++ in a private environment too.
Note: Make sure to check the "Regular expression" checkbox.

Parse SWIFT(Financial) message string with REGEX in Powershell

I am working on a Powershell script to parse SWIFT messages (text based) into a database. I am using REGEX to find the appropriate strings in the file and extract them. I now run into the issue that one of the data fields can have CR/LF characters in the string - in the example below I would need to extract the second line as well.
:61:2111261126D12000,00NTRF11000004217657P//03MT211124101166
JANE DOE 1232
I tested this regex pattern (:61:.*[\r\n].*) in RegExr and it treats the [\r\n] characters as required for a match, so my plan was to have two expressions, one with and one without CR/LF characters, to identify both kinds of message (with a line break or without). However, the code below returns all matches no matter whether a line break is included or not. It seems that PS stops evaluating strings after CR/LF.
$transaction = $swift | select-string ':61:.*[\r\n].*' -AllMatches | % { $_.Matches } | % { $_.Value }
Can I use REGEX for this task or do I have to create a function to read the entire string and check for the next line tag to determine the end of this string?
Describe the first line more accurately, then whatever is left is necessarily the message:
$swift = @'
:61:2111261126D12000,00NTRF11000004217657P//03MT211124101166
JANE DOE 1232
'@
$swift |Select-String -Pattern '(?m):\d+:[^,]+,[^/]+//\d+MT\d+[\s\r\n]+.*$'
The regex pattern breaks down as follows:
(?m) # Multi-line mode, this will make `$` match end-of-line positions as well as end-of-string
:\d+: # 1 or more digits, surrounded by colons, matches `:61:`
[^,]+, # 1 or more non-commas followed by a comma, matches `2111261126D12000,`
[^/]+// # 1 or more non-slashes, followed by 2, matches `00NTRF11000004217657P//`
\d+MT\d+ # 1 or more digits followed by `MT` and more digits, matches `03MT211124101166`
[\s\r\n]+ # 1 or more white-space/CR/LF characters
.*$ # everything until the end of the current line, matches `JANE DOE 1232`
Since we're using [\s\r\n]+ to describe the potential line break, it'll still work when the linebreak is replaced with other whitespace characters.
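To try the pattern without PowerShell, here is a small Python sketch that runs it against the sample message held as one multi-line string. Python's `re` stands in for the .NET regex engine here, and `find_statement_lines` is a name I made up. Only the two-line sample from the question is covered, not a full SWIFT message.

```python
import re

# Same pattern as above; re.MULTILINE plays the role of the inline (?m).
FIELD_61 = re.compile(r":\d+:[^,]+,[^/]+//\d+MT\d+[\s\r\n]+.*$", re.MULTILINE)

swift = (
    ":61:2111261126D12000,00NTRF11000004217657P//03MT211124101166\n"
    "JANE DOE 1232"
)


def find_statement_lines(text):
    """Return every match of the pattern, continuation line included."""
    return [m.group(0) for m in FIELD_61.finditer(text)]


print(find_statement_lines(swift))
```

The important part is that the regex sees both lines in a single string; if the text is fed in one line at a time, no pattern can reach across the line break.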

Notepad++ and regex - how to title case string between two particular strings?

I have hundreds of bib references in a file, and they have the following syntax:
@article{tabata1999precise,
title={Precise synthesis of monosubstituted polyacetylenes using Rh complex catalysts.
Control of solid structure and $\pi$-conjugation length},
author={Tabata, Masayoshi and Sone, Takeyuchi and Sadahiro, Yoshikazu},
journal={Macromolecular chemistry and physics},
volume={200},
number={2},
pages={265--282},
year={1999},
publisher={Wiley Online Library}
}
I would like to title case (aka Proper Case) the journal name in Notepad++ using regular expression. For example, from Macromolecular chemistry and physics to Macromolecular Chemistry and Physics.
I am able to find all instances using:
(?<=journal\=\{).*?(?=\})
but I am unable to change the case via Edit > Convert Case to. Apparently it doesn't work on find all and I have to go one by one.
Next, I tried recording and running a macro but Notepad++ just hangs indefinitely when I try to run it (option to run until the end of the file).
So my question is: does anyone know the replace regex syntax I could use to change the case? Ideally, I would also like to use "|" exclusions for particular words such as " of ", " an ", " the ", etc. I tried to play with some of the examples provided here, but I was not able to integrate it into my look-aheads.
Thank you in advance, I'd appreciate any help.
This works for any number of words:
Ctrl+H
Find what: (?:journal={|\G)\K(?:(\w{4,})|(\w+))(\h*)
Replace with: \u$1\E$2$3
CHECK Wrap around
CHECK Regular expression
Replace all
Explanation:
(?: # non capture group
journal={ # literally
| # OR
\G # restart from last match position
) # end group
\K # forget all we have seen until this position
(?: # non capture group
(\w{4,}) # group 1, a word with 4 or more characters
| # OR
(\w+) # group 2, a word of any length
) # end group
(\h*) # group 3, 0 or more horizontal spaces
Replacement:
\u # uppercase the first letter of what follows
$1 # content of group 1
\E # end the case conversion
$2 # content of group 2
$3 # content of group 3
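If the file can be processed with a script instead, the same rule (capitalise words of 4 or more characters, leave shorter ones alone) can be sketched in Python. The function name and the `min_len` parameter are my own; the sketch also assumes the journal value contains no nested braces.

```python
import re


def title_case_journal(bibtex, min_len=4):
    """Capitalise words of min_len or more characters inside journal={...} fields."""

    def fix_word(word):
        w = word.group(0)
        return w[0].upper() + w[1:] if len(w) >= min_len else w

    def fix_field(field):
        # Keep the literal prefix, rewrite only the words in the value.
        return field.group(1) + re.sub(r"\w+", fix_word, field.group(2))

    # Group 1 is the literal prefix, group 2 is everything up to the closing brace.
    return re.sub(r"(journal=\{)([^}]*)", fix_field, bibtex)


entry = "journal={Macromolecular chemistry and physics},"
print(title_case_journal(entry))
```

A length threshold is a crude stand-in for a list of excluded words such as "of", "an" and "the"; a set of stop words checked in `fix_word` would be the more precise variant.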
If the format is always of the form:
journal={Macromolecular chemistry and physics},
i.e. journal followed by 4 words, then use the following:
Find: journal={(\w+)\s*(\w+)\s*(\w+)\s*(\w+)
Replace with: journal={\u\1 \u\2 \l\3 \u\4
You can modify that if you have more words to replace by adding more \u\x, where x is the position of the word.
Hope it helps to give you an idea to move forward for a better solution.
\u translates the next letter to uppercase (used for all other words)
\l translates the next letter to lowercase (used for the word "and")
\1 replaces the 1st captured () search group
\2 replaces the 2nd captured () search group
\3 replaces the 3rd captured () search group

Regex finding all commas between two words

I am trying to clean up a large .csv file that contains many comma-separated words, parts of which I need to consolidate. I have a subsection where I want to change all the commas to slashes. Let's say my file contains this text:
Foo,bar,spam,eggs,extra,parts,spoon,eggs,sudo,test,example,blah,pool
I want to select all commas between the unique words bar and blah. The idea is to then replace the commas with slashes (using find and replace), such that I get this result:
Foo,bar,spam/eggs/extra/parts/spoon/eggs/sudo/test/example,blah,pool
As per @EganWolf's input:
How do I include words in the search but exclude them from the selection (for the unique words) and how do I then match only the commas between the words?
Thus far I have only managed to select all the text between the unique words including them:
bar,.*,blah, bar:*, *,blah, (bar:.+?,blah)*,*\2
I experimented with negative lookahead but can't get any search results from my statements.
Using Notepad++, you can do:
Ctrl+H
Find what: (?:\bbar,|\G(?!^))\K([^,]*),(?=.+\bblah\b)
Replace with: $1/
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # start non capture group
\bbar, # word boundary then bar then a comma
| # OR
\G # restart from last match position
(?!^) # negative lookahead, make sure we are not at the beginning of a line
) # end group
\K # forget all we've seen until this position
([^,]*) # group 1, 0 or more non comma
, # a comma
(?= # positive lookahead
.+ # 1 or more any character but newline
\bblah\b # word boundary, blah, word boundary
) # end lookahead
Result for given example:
Foo,bar,spam/eggs/extra/parts/spoon/eggs/sudo/test/example,blah,pool
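For comparison, here is a hedged Python sketch of the same replacement. Instead of `\G` and `\K` (which `re` does not have), it captures the stretch between the two words and swaps the commas inside a callback. `slash_between` and its parameters are illustrative names, not anything from Notepad++.

```python
import re


def slash_between(line, start="bar", end="blah"):
    """Replace commas with slashes between the `start` and `end` words."""
    pattern = re.compile(
        rf"(\b{re.escape(start)},)(.*?)(,{re.escape(end)}\b)"
    )
    # Groups 1 and 3 keep their commas; only the middle part is rewritten.
    return pattern.sub(
        lambda m: m.group(1) + m.group(2).replace(",", "/") + m.group(3),
        line,
    )


row = "Foo,bar,spam,eggs,extra,parts,spoon,eggs,sudo,test,example,blah,pool"
print(slash_between(row))
```

The lazy `.*?` stops at the first `,blah` after `bar,`, so lines without both words are left unchanged.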
The following regex will capture the minimally required text to access the commas you want:
(?<=bar,)(.*?(,))*(?=.*?,blah)
See Regex Demo.
If you want to replace the commas, you will need to replace everything in capture group 2. Capture group 0 has your entire match.
An alternative approach would be to split your string by comma to create an array of words. Then join words between bar and blah using / and append the other words joined by ,.
Here is a PowerShell example of split and join:
$a = "Foo,bar,spam,eggs,extra,parts,spoon,eggs,sudo,test,example,blah,pool"
$split = $a -split ","
$slashBegin = $split.indexof("bar")+1
$commaEnd = $split.indexof("blah")-1
$str1 = $split[0..($slashbegin-1)] -join ","
$str2 = $split[($slashbegin)..$commaend] -join "/"
$str3 = $split[($commaend+1)..$split.count] -join ","
@($str1,$str2,$str3) -join ","
Foo,bar,spam/eggs/extra/parts/spoon/eggs/sudo/test/example,blah,pool
This could easily be made into a function with your entire line and keywords as inputs.

Notepad++ remove all non regex'd text

I have a large list of URLs that each contain a unique numeric string; the string falls between a / and a ?. I would like to remove all other text in Notepad++ that is not one of these strings. For example:
www.website.com/dsw/fv3n24nv1e4121v/123456789012?fwe=32432fdwe23f3 would end up as only 123456789012
I have figured out that the regex \b\d{12}\b will get me the 12 digits; now I just need to remove all of the text on either side. I have had a look and found some posts that suggest replacing with \t$1, $1\n, $1, and /1, but all of these do the exact opposite of what I want and just remove the 12-digit string.
You can use this regex and replace it with empty string,
^[^ ]*\/|\?[^ ]*$
Explanation:
^[^ ]*\/ --> Matches anything except space from start of string till it finds a /
\?[^ ]*$ --> Similarly, this matches anything except space starting from ? till end of input.
Ctrl+H
Find what: ^.*/([^?\r\n]+).*$
Replace with: $1
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.* # 0 or more any character but newline
/ # a slash
([^?\r\n]+) # group 1, 1 or more any character that is not ? or line break
.* # 0 or more any character but newline
$ # end of line
Result for given example:
123456789012
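The same extraction can be sketched in Python, which may be handy if the list is too large to edit comfortably. The pattern is the one explained above; `keep_ids` is a hypothetical helper name, and the sketch assumes one URL per line with no slash inside the query string.

```python
import re

# Everything up to the last slash, then the ID itself (group 1), then the rest.
ID_IN_URL = re.compile(r"^.*/([^?\r\n]+).*$", re.MULTILINE)


def keep_ids(text):
    """Reduce each URL line to the part between the last / and the ?."""
    return ID_IN_URL.sub(r"\1", text)


url = "www.website.com/dsw/fv3n24nv1e4121v/123456789012?fwe=32432fdwe23f3"
print(keep_ids(url))
```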