Finding the string or substring within nth occurrence in a line - regex

I would like to find the third occurrence of string inside quotes in order to replace this string or part of this string. This is a typical line I have to deal with:
"/activities/absenceactivity/lang/#LANG_ID#/prop_dialog.php";"BA2_PD_SCR";"Opis
dogodka";"Event description";"Descrição do Evento";"ÐпиÑ
подÑÑ";"";"č®vykio aprašymas";"Descripción del evento";"";
I know that "([^"]*)" shows every occurrence of text and quotes but I would like to get just the third one, in this example "Opis dogodka" in order to perform Search & Replace in Sublime Text.
Problem is to find the third occurrence of string within the quotes, replace it entirely or just partially and make sure that the Regex provides also a solution for an empty
""
strings.
Thank you.

I'm sure there are ways to simplify this further, but if you're ok with brute force:
Sublime command:
Find: "[^"]*";"[^"]*";"([^"]*)".*
Replace: $1
NP++:
Find what: "([^"]*)";"([^"]*)";"([^"]*)".*
Replace with: $3
sed:
sed 's/"\([^"]*\)";"\([^"]*\)";"\([^"]*\)".*/\3/'

You can use {} pattern repetition:
/(?:"([^"]*)";){3}/ # first match group will be 'Opis dogodka'
Demo
Or, use a global type match and then take the third match. This might require logic such as slicing or looping:
/"([^"]*)";/g
Demo 2
Or, just manually put in the first two patterns to skip:
/^"[^"]*";"[^"]*";("[^"]*";)/
Demo 3
Or combine repetition to skip the first n-1 then capture the nth match:
/^(?:"[^"]*";){2}("[^"]*";)/
Demo 4

Related

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is the User row can also contain a |, to make it even worse.
I can use a macro to fix this, but the file is a few millions lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open for suggestions on how to do this with another program, command line utility, either Linux or Windows.
Match \|[^|]+\|[^|]+(\|[^|]+$)
Repalce $1
Basically, Anchor to the end of the line, and remove columns [-1] and [-2] (I assume columns can't be empty. Replace + with * if they can)
If you need finer detail then that, I'd recommend writing a Java or Python script to manual parse and rewrite the file for you.
I've captured three groups and given them names. If you use a replace utility like sed or vimregex, you can replace remove with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
You may have to remove the group namings and use \1 etc. instead, depending on what environment you use.
Demo
From Notepad++ hit ctrl + h then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex Explain:
\|\d+ : match 1st string that starts with | followed by number
\|\d+ : match 2nd string that starts with | followed by number
(\|[0-9a-z]+): match and capture the string after the 2nd number.
$ : This is will force regex search to match the end of the string.
Replacement:
$1 : replace the found string with whatever we have between the captured group which is whatever we have between the parentheses (\|[0-9a-z]+)

regex replacing at beginning of line in notepad++

for example I have txt with content
qqqqaa
qqss
ss00
I want to replace only one q at the beginning of line, that is to get
qqqaa
qss
ss00
I tried replace ^q in notepad++. But After I click replaceAll, I got
aa
ss
ss00
What is wrong? Is my regex wrong? What is the correct form?
The issue is that Notepad++ Replace All functionality replaces in a loop using the modified document.
The solution is to actually consume what we need to replace and keep within one regex expression like
^q(q*)
and replace with $1.
The pattern will find a q at the beginning of the line and then will capture into Group 1 zero or more occurrences of q after the first q, and in the replacement part the $1 will insert these qs inside Group 1 back into the string.
You can use ^q(.+) and replace with $1 if you also want to replace single q's.

VIM - Replace based on a search regex

I've got a file with several (1000+) records like :
lbc3.*'
ssa2.*'
lie1.*'
sld0.*'
ssdasd.*'
I can find them all by :
/s[w|l].*[0-9].*$
What i want to do is to replace the final part of each pattern found with \.*'
I can't do :%s//s[w|l].*[0-9].*$/\\\\\.\*' because it'll replace all the string, and what i need is only replace the end of it from
.'
to
\.'
So the file output is llike :
lbc3\\.*'
ssa2\\.*'
lie1\\.*'
sld0\\.*'
ssdasd\\.*'
Thanks.
In general, the solution is to use a capture. Put \(...\) around the part of the regex that matches what you want to keep, and use \1 to include whatever matched that part of the regex in the replacement string:
s/\(s[w|l].*[0-9].*\)\.\*'$/\1\\.*'/
Since you're really just inserting a backslash between two strings that you aren't changing, you could use a second set of parens and \2 for the second one:
s/\(s[w|l].*[0-9].*\)\(\.\*'\)$/\1\\\2/
Alternatively, you could use \zs and \ze to delimit just the part of the string you want to replace:
s/s[w|l].*p0-9].*\zs\ze\*\'$/\\/

Notepad++ can find but can not replace a string

I have a string like this:
INSERT INTO `test` VALUES (999,'stuff',NULL,'2014-12-01 08:09:10');
What I want is remove some string to get the value between single quotes:
stuff
I've used 2 regex:
^.*\d,'
and
',NULL.*$
When I use count in Npp, it returns Count: 1 match., but when I use replace, it returns Replace: no occurrence was found.
Do you have any idea?
You just need to combine both regexes using | OR operator. Finally in the replacement part, you need to give an empty string, so that it would remove all the matched characters.
^.*\d,'|'.*$
OR
^.*\d,'|',NULL.*$
DEMO
OR
Find What: ^.*\d,'([^']*)',NULL.*$
Replace With : \1
DEMO

find a single quote at the end of a line starting with "mySqlQueryToArray"

I'm trying to use regex to find single quotes (so I can turn them all into double quotes) anywhere in a line that starts with mySqlQueryToArray (a function that makes a query to a SQL DB). I'm doing the regex in Sublime Text 3 which I'm pretty sure uses Perl Regex. I would like to have my regex match with every single quote in a line so for example I might have the line:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name'");
I want the regex to match in that line both of the quotes around $name but no other characters in that line. I've been trying to use (?<=mySqlQueryToArray.*)' but it tells me that the look behind assertion is invalid. I also tried (?<=mySqlQueryToArray)(?<=.*)' but that's also invalid. Can someone guide me to a regex that will accomplish what I need?
To find any number of single quotes in a line starting with your keyword you can use the \G anchor ("end of last match") by replacing:
(^\h*mySqlQueryToArray|(?!^)\G)([^\n\r']*)'
With \1\2<replacement>: see demo here.
Explanation
( ^\h*mySqlQueryToArray # beginning of line: check the keyword is here
| (?!^)\G ) # if not at the BOL, check we did match sth on this line
( [^\n\r']* ) ' # capture everything until the next single quote
The general idea is to match everything until the next single quote with ([^\n\r']*)' in order to replace it with \2<replacement>, but do so only if this everything is:
right after the beginning keyword (^mySqlQueryToArray), or
after the end of the last match ((?!^)\G): in that case we know we have the keyword and are on a relevant line.
\h* accounts for any started indent, as suggested by Xælias (\h being shortcut for any kind of horizontal whitespace).
https://stackoverflow.com/a/25331428/3933728 is a better answer.
I'm not good enough with RegEx nor ST to do this in one step. But I can do it in two:
1/ Search for all mySqlQueryToArray strings
Open the search panel: ⌘F or Find->Find...
Make sure you have the Regex (.* ) button selected (bottom left) and the wrap selector (all other should be off)
Search for: ^\s*mySqlQueryToArray.*$
^ beginning of line
\s* any indentation
mySqlQueryToArray your call
.* whatever is behind
$ end of line
Click on Find All
This will select every occurrence of what you want to modify.
2/ Enter the replace mode
⌥⌘F or Find->Replace...
This time, make sure that wrap, Regex AND In selection are active .
Them search for '([^']*)' and replace with "\1".
' are your single quotes
(...) si the capturing block, referenced by \1 in the replace field
[^']* is for any character that is not a single quote, repeated
Then hit Replace All
I know this is a little more complex that the other answer, but this one tackles cases where your line would contain several single-quoted string. Like this:
mySqlQueryToArray($con, "SELECT * FROM Template WHERE Name='$name' and Value='1234'");
If this is too much, I guess something like find: (?<=mySqlQueryToArray)(.*?)'([^']*)'(.*?) and replace it with \1"\2"\3 will be enough.
You can use a regex like this:
(mySqlQueryToArray.*?)'(.*?)'(.*)
Working demo
Check the substitution section.
You can use \K, see this regex:
mySqlQueryToArray[^']*\K'(.*?)'
Here is a regex demo.