vscode regex get lines with duplicate strings separated by comma - regex

I have a vscode file with the following text:
"070230107121","46969","petcarerx","petcarerx"
"070230107121","46970","petcarerx","petcarerx"
"070230107121","47332","petcarerx","petcarerx"
"070230107121","47333","petcarerx","petcarerx"
"070230107121","47333","petcarerx","petcarerx"
"070230107121","46968","petcarerx","petcarerx"
"07087","46968","petcarerx","petcarerx"
"07087","46968","petcarerx","petcarerx"
If I do ctrl+f regex expression ^(.*)(\n\1)+$ it will find the identical lines, so in this case it finds two cases of identical lines:
I am trying to create a regular expression to find all lines where the first column is identical. so in this case; find all rows where the string that comes before the first comma is identical.
This regex expression gets everything before the first comma; ^(.+?),, is there someway I can combine that with my first regex expression to get all lines that are identical before the first comma?

You may use
^(.*?),.*(?:\n\1,.*)+$
Details
^ - start of a line
(.*?) - Capturing group 1 (\1 inline backreference can refer to it from the regex pattern, $1 if you need to refer to it from the replacement pattern)
, - a comma
.* - the rest of the line
(?:\n\1,.*)+ - 1 or more repetitions of a line break, then the same value as in Group 1 and then a comma and the rest of the line
$ - end of a line.
See the regex demo online.
Tested in VS Code:

Related

finding pattern between two "CERRADO}" strings using negative look-ahead

I have a text file containing lines like these:
CERRADO}165856}TICKET}DESCRIPTION}some random text here\r\n
other random text here}158277747\r\n
CERRADO}165856}TICKET}FR2CODE}more random text also here}1587269339\r\n
My ultimate goal is to concatenate those lines not beginnning with "CERRADO}" string with their preceding line. There might be an arbitrary number of lines not beginning with that string on the file. This is the end result:
CERRADO}165856}TICKET}DESCRIPTION}some random text here other random text here}158277747\r\n
CERRADO}165856}TICKET}FR2CODE}more random text also here}1587269339\r\n
My first attempt was to create a simple regex to match those lines.
CERRADO\}.+\r\n(?!CERRADO\})(.+\r\n)+
After having that regex right, to create a matching group and replace it getting rid of the \r\n patterns, here is what I have so far:
The proposed regex matches all the lines in the file and not just the wanted ones.
Any ideas would be appreciated
You may use
\R(?!CERRADO\})
and replace with a space.
The regex matches:
\R - a line break sequence that is...
(?!CERRADO\}) - not followed with CERRADO}.
Or,
^(CERRADO\}.*)\R(?!CERRADO\})
and replace with \1 . This regex matches:
^ - start of a line
(CERRADO\}.*) - Capturing group 1 (later referred to with \1 backreference from the replacement pattern): CERRADO} substring and then the rest of the line
\R - a line break sequence
(?!CERRADO\}) - not followed with CERRADO}.
To make multiple replacements with this one, you will need to hit Replace All several times.

REGEX - How can I select/mark 3 works delimited by tabs on a consecutive lines?

Happy New Year !
I have a problem. I don’t know how to marks\select some words delimited by tabs on a consecutive lines: Recent, Coments and Tags
please see this print screen:
I can easy to put | sign, like: Recent|Comments|Tags but this will select all the words in the files that repeats, and I want only those 3 on those lines.
What I want is to make a regex, to remove all text before those 3 words, and another regex to remove everything after those 3 words.
I try something like this ((?s)((^.*)^.*Recente.*$|^.*Coments.*$|^.*Tags.*^))(.*$)but is not very good. And I have to pay atention, because those words can repeated in the text files, so I have to select\mark exactly those 3, on that 3 consecutive line (that doesn't have any other words on it)
Since you mentioned in a comment that you want to do this in Notepad++ (a fact that should have been mentioned in the question text), and since the screenshot shows a single space after the first two words, you might try this regular expression:
.*\n([ \t]+Recente\s+Coments\s+Tags).*
It will select everything, but capture the 3 words including whitespace between them and whitespace preceding first word on same line.
If you then replace with $1, everything not in the capture group will be removed.
Actually, the spaces after the first two words don't matter to this regex.
Could you please try this in perl:
perl -0777 -ne 'while(m/((\s|\t)+)Recent\n\1Comments\n\1Tags/g){print "$&\n";}' /path/to/file
To breakdown:
Start with 1 or more tab characters (first capture group)
Then "Recent" followed by new line
Capture group, Comments and new line
Capture group, Tags
By the way, is "tab" really tab or multiple consecutive whitespaces (\s+)?

Regex replace one value between comma separated values

I'm having a bunch of comma separated CSV files.
I would like to replace exact one value which is between the third and fourth comma. I would love to do this with Notepad++ 'Find in Files' and Replace functionality which could use RegEx.
Each line in the files look like this:
03/11/2016,07:44:09,327575757,1,5434543,...
The value I would like to replace in each line is always the number 1 to another one.
It can't be a simple regex for e.g. ,1, as this could be somewhere else in the line, so it must be the one after the third and before the fourth comma...
Could anyone help me with the RegEx?
Thanks in advance!
Two more rows as example:
01/25/2016,15:22:55,276575950,1,103116561,10.111.0.111,ngd.itemversions,0.401,0.058,W10,0.052,143783065,,...
01/25/2016,15:23:07,276581704,1,126731239,10.111.0.111,ll.browse,7.133,1.589,W272,3.191,113273232,,...
You can use
^(?:[^,\n]*,){2}[^,\n]*\K,1,
Replace with any value you need.
The pattern explanation:
^ - start of a line
(?:[^,\n]*,){2} - 2 sequences of
[^,\n]* - zero or more characters other than , and \n (matched with the negated character class [^,\n]) followed with
, - a literal comma
[^,\n]* - zero or more characters other than , and \n
\K - an operator that forces the regex engine to discard the whole text matched so far with the regex pattern
,1, - what we get in the match.
Note that \n inside the negated character classes will prevent overflowing to the next lines in the document.
You can replace value between third and fourth comma using following regex.
Regex: ([^,]+,[^,]+,[^,]+),([^,]+)
Replacement to do: Replace with \1,value. I used XX for demo.
Regex101 Demo
Notepad++ Demo

remove all commas between quotes with a vim regex

I've got a CSV file with lines like:
57,13,"Bob, Bill and Susan",Student,Club,Funded,64,3200^M
I need them to look like
57,13,Bob-Bill-and-Susan,Student,Club,Funded,64,3200
I'm using vim regexes. I've broken it down into 4 steps:
Remove ^M and insert newlines:
:%s:<ctrl-V><ctrl-M>:\r:g`
Replace all with -:
:%s: :\-:g
Remove commas between quotes: Need help here.
Remove quotes:
:%s:\"\([^"]*\)\":\1:g
How do I remove commas between quotes, without removing all commas in the file?
Something like this?
:%s:\("\w\+\),\(\w\+"\):\1 \2:g
My preferred solution to this problem (removing commas inside quoted regions) is to use replacements with an expression instead of trying to get this done in one regex.
To do this you need to prepend you replacement with \= to get the replacement treated as a vim expression. From here you can extract just the parts between quotes and then manipulate the the matched part separately. This requires having two short regexes instead of one complicated one.
:%s/".\{-}"/\=substitute(submatch(0), ',', '' , 'g')/g
So ".\{-}" matches anything in quotes (non greedy) and substitute(submatch(0), ',', '' , 'g') takes what was matched and removes all of the commas and its return value is used as the actual replacement.
The relevant help page is :help sub-replace-special.
As for the other parts of your question. Step 1 is essentially trying to remove all carriage returns since the file format is actually the dos file format. You can remove them with the dos2unix program.
In Step 2 escaping the - in the replacement is unnecessary. So the command is just
:%s/ /-/g
In Step 4, you have an overly complicated regex if all you want to do is remove quotes. Since all you need to do is match quotes and remove them
:%s/"//g
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g
example: "this is , an, example"
\("\w*\) match start of " every letter following qoutes group \1 for back reference
\(,\) capture comma group \2 for back reference
(.*"\) match every other character upto the second qoute ->group 3 for backreference
:\1\3: only include groups without comma, discard group 2 from returned string which is \2
:%s:\("\w*\)\(,\)\(.*"\):\1\3:g removes commas

Add to end of line that contains a specific word and starts with x

I would like to add some custom text to the end of all lines in my document opened in Notepad++ that start with 10 and contain a specific word (for example "frog").
So far, I managed to solve the first part.
Search: ^(10)$
Replace: \1;Batteries (to add ;Batteries to the end of the line)
What I need now is to edit this regex pattern to recognize only those lines that also contain a specific word.
For example:
Before: 1050;There is this frog in the lake
After: 1050;There is this frog in the lake;Batteries
You can use the regex to match your wanted lines:
(^(10).*?(frog).*)
the .*? is a lazy quantifier to get the minimum until frog
and replace by :
$1;Battery
Hope it helps,
You should allow any characters between the number and the end of line:
^10.*frog.*
And replacement will be $0;Batteries. You do not even need a $ anchor as .* matches till the end of a line since . matches any character but a line break char.
NOTE: There is no need to wrap the whole pattern with capturing parentheses, the $0 placeholder refers to the whole match value.
More details:
^ - start of a line
10 - a literal 10 text
.* - zero or more chars other than line break chars as many as possible
frog - a literal string
.* - zero or more chars other than line break chars as many as possible
try this
find with: (^(10).*(frog).*)
replace with: $1;Battery
Use ^(10.*frog.*)$ as regex. Replace it with something like $1;Batteries