Regex to search a multi-line field in a text file - regex

I have a text file of logs, and I want to search for a field in it using a regular expression. I use Notepad++ on Windows and Vim on Ubuntu to read this log file, so a solution for either one is fine.
The text file has entries as below.
src.type= DEVICE_1 <-- there is a space and then a newline char after the last letter which is 1
dst.type= ZONE_1
someparam1
src.type= DEVICE_1
dst.type= ZONE_2
someparam2
Such entries keep repeating in the log text file.
I am interested in finding those lines which have DEVICE_1 in it but only for those occurrences which have a dst.type= ZONE_2 after it i.e.
I intend to find
src.type= DEVICE_1
dst.type= ZONE_2
but not
src.type= DEVICE_1
dst.type= ZONE_1
Notepad++ allows searching with regular expressions. I would like a working regex, or any other way (not necessarily involving regexes), to find the occurrences I am looking for in the text file.
I tried below in notepad++ search using regex without success:
src.type= DEVICE_1 \ndst.type= ZONE_2
Also tried [ ] character class.
How can I search for what I am looking to find?

In Vim, the following pattern seems to match what you want:
DEVICE_1\s*\n.*ZONE_2
Use /DEVICE_1\s*\n.*ZONE_2 to jump to the next match.
Use :g/DEVICE_1\s*\n.*ZONE_2/command to execute command on each match.
Use :vim DEVICE_1\s*\n.*ZONE_2 % | cw to list all the matches in the quickfix window.
Note that you can easily reuse the latest search pattern with //. It is a common strategy to work on your search pattern with /foo and, once you are satisfied, perform a substitution like this:
:%s//bar

In Notepad++, use the following regex, with the ". matches newlines" checkbox enabled:
src.type= DEVICE_1\s+dst.type= ZONE_2

There you go for Vim:
/^\zssrc.type= DEVICE_1\ze\_.\{2,2}\_^dst.type= ZONE_2$/
Breakdown of important expressions:
\zs - Start match here (will be highlighted from here);
\ze - End match here (will be highlighted to here);
\_. - same as ., but new line is also included;
\_^ - like ^, but \_ is required because we are in the middle of the regular expression.
For others, I'd refer you to Vim's documentation.
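Since the question also allows non-editor approaches, here is a minimal Python sketch of the same two-line match. The sample text and the names LOG, PAIR and find_pairs are made up for illustration; it assumes the dst.type line comes directly after the src.type line, as in the question.

```python
import re

# Hypothetical sample mirroring the excerpt in the question
# (note the trailing space after DEVICE_1).
LOG = (
    "src.type= DEVICE_1 \n"
    "dst.type= ZONE_1\n"
    "someparam1\n"
    "src.type= DEVICE_1 \n"
    "dst.type= ZONE_2\n"
    "someparam2\n"
)

# [ \t]* absorbs trailing blanks and \r?\n tolerates Windows line endings.
# The lookahead checks the next line without making it part of the match.
PAIR = re.compile(
    r"^src\.type= DEVICE_1[ \t]*\r?\n(?=dst\.type= ZONE_2[ \t]*\r?$)",
    re.MULTILINE,
)


def find_pairs(text):
    """Return 1-based line numbers of src lines directly followed by ZONE_2."""
    return [text.count("\n", 0, m.start()) + 1 for m in PAIR.finditer(text)]


print(find_pairs(LOG))  # prints [4]
```

Reporting line numbers rather than the matched text makes it easy to jump to each hit in Vim or Notepad++ afterwards.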

Related

Add word to the end of a each line in all files using regex notepad++

I have 10 files of 100 lines each. I need to do their translation.
This is one line in some file: "Client Notes,141"
and another similar line in another file: "Client Notes,700"
I want to modify all the concerned lines in all the 10 files to be:
"Client Notes,141,KundLinjer"
"Client Notes,700,KundLinjer"
"Client Notes,770,KundLinjer"
I tried with regular expressions and macros but I couldn't figure it out.
Thanks for help!
Assuming it is Notepad++ and not Notepad2:
Ctrl+H
Find what: \bClient Notes,\h*\d+\K
Replace with: ,KundLinjer
check Wrap around
check Regular expression
Replace all
Explanation:
\b : word boundary
Client Notes, : literally
\h* : 0 or more horizontal spaces
\d+ : 1 or more digits
\K : forget all we have seen until this position
Result for given example:
Client Notes,141,KundLinjer
Client Notes,700,KundLinjer
You may try the following find and replace:
Find:
^Client Notes,(\d+)
Replace:
Client Notes,$1,KundLinjer
To apply it to multiple files, use the directory option in the dialog to select the folder which contains the 10 files. If the 10 files are scattered across multiple places, first copy them into a single folder. Also, make sure regex mode is enabled for the find and replace.
In Notepad++ you can use the shortcut Ctrl+H and, under the Replace tab, set the search mode to Regular expression.
Find:
(Client Notes,\d+)
Replace:
\1,KundLinjer
You have the option to:
Replace All in All Opened Documents.
Explanation
The parentheses (Client Notes,\d+) 'capture' everything inside them so it can be reused in the replacement as \1 (if you had more captures, you could use \2, \3 etc.).
\d+ means any digit, one or more times (in your case 141, 700).
So we are replacing the text "Client Notes,(AnyNumber)" with "Client Notes,(AnyNumber),KundLinjer".
You could also replace (.*) with \1 KundLinjer if you want to add KundLinjer to all lines, no matter what.
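If you would rather script it than click through the dialog for every file, a minimal Python sketch along the same lines might look like this. The function names, the *.txt glob and the UTF-8 encoding are assumptions for illustration, not something stated in the question.

```python
import re
from pathlib import Path

# Match a whole "Client Notes,<digits>" line; trailing blanks are dropped.
# Lines that already end in ",KundLinjer" do not match, so a second run
# changes nothing.
PATTERN = re.compile(r"^(Client Notes,[ \t]*\d+)[ \t]*$", re.MULTILINE)


def add_suffix(text, suffix="KundLinjer"):
    """Append ',<suffix>' to every matching line in text."""
    return PATTERN.sub(lambda m: f"{m.group(1)},{suffix}", text)


def rewrite_folder(folder, glob="*.txt"):
    """Apply add_suffix to every file in folder (hypothetical layout)."""
    for path in Path(folder).glob(glob):
        original = path.read_text(encoding="utf-8")
        updated = add_suffix(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
```

Try add_suffix on a copy of one file first, then run rewrite_folder on the folder holding the 10 files.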

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is that the User column can also contain a |, which makes it even worse.
I can use a macro to fix this, but the file is a few millions lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open for suggestions on how to do this with another program, command line utility, either Linux or Windows.
Match \|[^|]+\|[^|]+(\|[^|]+$)
Replace $1
Basically, anchor to the end of the line, and remove columns [-2] and [-3], keeping the last one (I assume columns can't be empty. Replace + with * if they can)
If you need finer control than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
I've captured three groups and given them names. If you use a replace utility like sed or vimregex, you can replace remove with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
From Notepad++ hit ctrl + h then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex Explain:
\|\d+ : match 1st string that starts with | followed by number
\|\d+ : match 2nd string that starts with | followed by number
(\|[0-9a-z]+): match and capture the string after the 2nd number.
$ : forces the match to end at the end of the line.
Replacement:
$1 : replace the found string with whatever we have between the captured group which is whatever we have between the parentheses (\|[0-9a-z]+)
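Since you are open to other tools, here is a minimal Python sketch that avoids regexes altogether. It counts columns from the right, so extra | characters in the user name do not matter. The names drop_columns and convert are hypothetical, and it assumes the hash is always the last field and the two unwanted numbers sit immediately before it.

```python
def drop_columns(line, sep="|"):
    """Remove the two fields that precede the last field of a line."""
    stripped = line.rstrip("\r\n")
    # Splitting from the right with at most 3 splits yields
    # [everything before, 5th column, 6th column, last column].
    parts = stripped.rsplit(sep, 3)
    if len(parts) < 4:
        return stripped  # too few fields: leave the line unchanged
    head, _fifth, _sixth, last = parts
    return f"{head}{sep}{last}"


def convert(src_path, dst_path):
    """Stream the file line by line so millions of lines fit in memory."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(drop_columns(line) + "\n")
```

Writing to a separate output file keeps the original intact in case a line does not have the expected shape.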

Replace spaces with dashes, but only for text found between quotes in the text TAGS=""

Is it possible to do the following with Notepad++'s FIND/REPLACE function?
I have a text file where I want to replace spaces found in between the quotes of the text TAGS="*" with dashes.
Example:
TAGS="tag1,tag2,tag 3,tag4,tag 5"
should become:
TAGS="tag1,tag2,tag-3,tag4,tag-5"
So far I can find the text I want using:
FIND WHAT: TAGS="*"
But how do I have it replace spaces with dashes?
--------------------- UPDATE -----------------
My question before used tag1,tag2, but the actual data in the file does not have numbers, only words.
These following are three actual lines from the file. I need to find spaces between the quotes of TAGS="*" and replace only those spaces with dashes:
<DT>Kundalini Yoga - Pranayama - Breathing Techniques
<DT>40 Ways The World Makes Awesome Hot Dogs | Food Republic
<DT>Fix Windows boot, Fix your Boot sequence with BcdEdit, BootSect, BCDboot, WINRE,...
In the lines above, there are 3 instances of TAGS="*" which I've extracted here to make them easy to see:
TAGS="kundalini,yoga,fire breath,breathing,breath of fire"
TAGS="recipe,cooking,hot dog"
TAGS="windows stuff,bcdboot,bcdsect,repair,boot"
which, after the FIND/REPLACE, should look like:
TAGS="kundalini,yoga,fire-breath,breathing,breath-of-fire"
TAGS="recipe,cooking,hot-dog"
TAGS="windows-stuff,bcdboot,bcdsect,repair,boot"
Use the following regex:
Find what: (?:\G(?!^)|\bTAGS=")[^\s"]*\K\s+
Replace with: -
Details:
(?:\G(?!^)|\bTAGS=") - Finds either the end of the previous successful match (\G(?!^)) or the literal text TAGS=" preceded by a word boundary (\bTAGS=")
[^\s"]* - 0+ chars other than a space and "
\K - match reset operator discarding the text matched so far
\s+ - 1+ whitespaces
Use the following find/replace pattern in regex mode, and do a replace all to cover the entire document (or selection which you want). Note that I make no effort to check for TAGS="...", under the assumption that you don't have strings of the form tag123 or tag 123 anywhere else in your document.
Find:
tag\s+(\d*)
Replace:
tag-$1
Input:
tag1,tag2,tag 3,tag4,tag 5
Output:
tag1,tag2,tag-3,tag4,tag-5
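For the updated question, where the tags are plain words, the replacement has to be limited to the text between the quotes of TAGS="...". Outside Notepad++ this could be sketched in Python roughly as follows. The names TAGS_ATTR and dash_tags are made up, and it assumes the attribute value never contains a double quote.

```python
import re

# Group 1: the opening TAGS=", group 2: the value, group 3: the closing quote.
TAGS_ATTR = re.compile(r'(\bTAGS=")([^"]*)(")')


def dash_tags(text):
    """Replace whitespace runs with '-' only inside TAGS="..." values."""
    def fix(match):
        value = re.sub(r"\s+", "-", match.group(2))
        return match.group(1) + value + match.group(3)

    return TAGS_ATTR.sub(fix, text)


print(dash_tags('TAGS="recipe,cooking,hot dog"'))
```

Everything outside the attribute, such as the link title after it, keeps its spaces.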

Notepad++ replace text with RegEx search result

I would like replace a standard string in a file, with another that is a result of a regular expression. The standard text looks like:
<xsl:variable name="ServiceCode" select="###"/>
I would like to replace ### with a servicecode, that I can find later in the same file, from this URL:
<a href="/Services/xyz" target="_self">
The regular expression (?<=\/Services\/)(.*)(?=\" )
returns the required service code "xyz".
So, I opened Notepad++, added "###" to the "Find what" and this RegEx to the "Replace with" section, and expected that the ### text will be replaced by xyz.
But I got this result:
<xsl:variable name="ServiceCode" select="?<=/Services/.*?=" "/>
I am new to RegEx, do I need to use different syntax in the replace section than I use to find a string? Can someone give me a hint how to achieve the required result? The goal is to standardize tons of files with similar structure as now all servicecodes are hardcoded in several places in the file. Thanks.
You could use a lookahead for capturing the part ahead.
Search for: (?s)###(?=.*/Services/([^"]+)") and replace with: $1
(?s) makes the dot also match newlines (there is also a checkbox available in np++)
[^"] matches a character that is not "
The replacement $1 corresponds to capture of first parenthesized subpattern.
I am no expert at RegEx but I think I may be able to help. It looks like you might be going at this the wrong way. The regex search that you are using would normally work like this:
The parenthesis () in RegEx allow you to select part of your search and use that in the replace section.
You place (?<=\/Services\/)(.*)(?=\" ) into the "Find what" section in Notepad++.
Then in the "Replace with" section you could use \1 to insert what was matched by (.*). Note that the lookbehind (?<=\/Services\/) and the lookahead (?=\" ) are zero-width assertions: they do not capture anything, so they do not get a number of their own.
Depending on the structure of your files, you would need to use a RegEx search that selects both lines of code (and the specific parts you need), then use a combination of \1\2\3 etc. to replace everything exactly how it was, except for the ### which you could replace with the \number associated with xyz.
See http://docs.notepad-plus-plus.org/index.php/Regular_Expressions for more info.
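With many files to standardize, a small script may be easier than repeating the dialog. The following Python sketch is one way to do the same substitution. The names SERVICE and fill_service_code are hypothetical, and it assumes each file contains a single /Services/ URL and takes the first one it finds.

```python
import re

# Capture everything between "/Services/" and the closing double quote.
SERVICE = re.compile(r'/Services/([^"]+)"')


def fill_service_code(text, placeholder="###"):
    """Replace every placeholder with the service code found in the same text."""
    match = SERVICE.search(text)
    if match is None:
        return text  # no service URL: leave the text untouched
    return text.replace(placeholder, match.group(1))


sample = (
    '<xsl:variable name="ServiceCode" select="###"/>\n'
    '<a href="/Services/xyz" target="_self">\n'
)
print(fill_service_code(sample))
```

Unlike the lookahead regex, this does not care whether the URL appears before or after the placeholder.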

find a single quote at the end of a line starting with "mySqlQueryToArray"

I'm trying to use regex to find single quotes (so I can turn them all into double quotes) anywhere in a line that starts with mySqlQueryToArray (a function that makes a query to a SQL DB). I'm doing the regex in Sublime Text 3 which I'm pretty sure uses Perl Regex. I would like to have my regex match with every single quote in a line so for example I might have the line:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name'");
I want the regex to match in that line both of the quotes around $name but no other characters in that line. I've been trying to use (?<=mySqlQueryToArray.*)' but it tells me that the look behind assertion is invalid. I also tried (?<=mySqlQueryToArray)(?<=.*)' but that's also invalid. Can someone guide me to a regex that will accomplish what I need?
To find any number of single quotes in a line starting with your keyword you can use the \G anchor ("end of last match") by replacing:
(^\h*mySqlQueryToArray|(?!^)\G)([^\n\r']*)'
With \1\2<replacement>: see demo here.
Explanation
( ^\h*mySqlQueryToArray # beginning of line: check the keyword is here
| (?!^)\G ) # if not at the BOL, check we did match sth on this line
( [^\n\r']* ) ' # capture everything until the next single quote
The general idea is to match everything until the next single quote with ([^\n\r']*)' in order to replace it with \2<replacement>, but do so only if this everything is:
right after the beginning keyword (^mySqlQueryToArray), or
after the end of the last match ((?!^)\G): in that case we know we have the keyword and are on a relevant line.
\h* accounts for any leading indentation, as suggested by Xælias (\h being a shortcut for any kind of horizontal whitespace).
https://stackoverflow.com/a/25331428/3933728 is a better answer.
I'm not good enough with RegEx nor ST to do this in one step. But I can do it in two:
1/ Search for all mySqlQueryToArray strings
Open the search panel: ⌘F or Find->Find...
Make sure you have the Regex (.* ) button selected (bottom left) and the wrap selector (all other should be off)
Search for: ^\s*mySqlQueryToArray.*$
^ beginning of line
\s* any indentation
mySqlQueryToArray your call
.* whatever is behind
$ end of line
Click on Find All
This will select every occurrence of what you want to modify.
2/ Enter the replace mode
⌥⌘F or Find->Replace...
This time, make sure that wrap, Regex AND In selection are active.
Then search for '([^']*)' and replace with "\1".
' are your single quotes
(...) is the capturing group, referenced by \1 in the replace field
[^']* is for any character that is not a single quote, repeated
Then hit Replace All
I know this is a little more complex than the other answer, but this one tackles cases where your line contains several single-quoted strings. Like this:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name' and Value='1234'");
If this is too much, I guess something like find: (?<=mySqlQueryToArray)(.*?)'([^']*)'(.*?) and replace it with \1"\2"\3 will be enough.
You can use a regex like this:
(mySqlQueryToArray.*?)'(.*?)'(.*)
Working demo
Check the substitution section.
You can use \K, see this regex:
mySqlQueryToArray[^']*\K'(.*?)'
Here is a regex demo.
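If doing this outside the editor is an option, the two-step idea (select the relevant lines, then swap the quotes inside them) can be sketched in Python. The names CALL_LINE and requote are made up, and the sketch assumes every single quote on such a line should become a double quote, which is what the question asks for.

```python
import re

# A whole line that starts with the call, allowing leading indentation.
CALL_LINE = re.compile(r"^[ \t]*mySqlQueryToArray.*$", re.MULTILINE)


def requote(text):
    """Turn ' into " on lines starting with mySqlQueryToArray only."""
    return CALL_LINE.sub(lambda m: m.group(0).replace("'", '"'), text)
```

Lines that do not start with the call are never matched, so single quotes elsewhere in the file are left alone.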