regex to replace numbers separated by either space or a tab - regex

I have rows of data that look like:
12 1234 6
33 154 10
1734 2345 7
I am trying to create a regex in VS Code for search and replace, where $1 $2 $3 in the replacement represent the different numbers in the line,
so that I can replace each line with something like:
(12) [1234] {6}
(33) [154] {10}
I am not sure how to match it so it captures all 3 numbers in one regex split out into the separate numbers
(\d+) is matching each number individually, but how do I get it to match all 3 numbers in $1 $2 $3 ?

In Visual Studio Code, you can use
Find what: ^(\d+)\s+(\d+)\s+(\d+)$
Replace with: ($1) [$2] {$3}
Or, if you need to keep the same amount of whitespace between the numbers:
Find what: ^(\d+)(\s+)(\d+)(\s+)(\d+)$
Replace with: ($1)$2[$3]$4{$5}
NOTE: the \s shorthand character class usually matches line breaks, but in Visual Studio Code, when the pattern has no \r nor \n, \s does not match line breaks, so it is safe to use it, as it won't match across lines.
If you strictly need to only match lines with digit sequences separated with space/tabs, then replace \s+ with [ \t]+.
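For anyone who wants to check the pattern outside the editor, here is a minimal Python sketch of the same replacement. It is only an approximation of the VS Code behaviour: Python's re module writes backreferences as \1 rather than $1, and its \s does match line breaks, so the sketch uses the [ \t]+ variant. The tab in the second sample row is my own addition to show both separators.

```python
import re

# Sample rows from the question; the second row uses a tab (added for illustration).
rows = "12 1234 6\n33\t154 10\n1734 2345 7"

# [ \t]+ instead of \s+, because \s in Python can match across line breaks.
pattern = re.compile(r"^(\d+)[ \t]+(\d+)[ \t]+(\d+)$", re.MULTILINE)

# Python uses \1, \2, \3 where VS Code uses $1, $2, $3.
result = pattern.sub(r"(\1) [\2] {\3}", rows)
print(result)
# (12) [1234] {6}
# (33) [154] {10}
# (1734) [2345] {7}
```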

Related

Remove all but the first four characters on each line

So I have a text file in Vscode that contains several lines of text like so:
1801: Joseph Marie Jacquard, a French merchant and inventor invent a loom that uses punched wooden cards to automatically weave fabric designs. Early computers would use similar punch cards.
So now I'm trying to isolate the year number/the first 4 characters of each line. I'm new to regex, and I know how to get the first 4 characters (I used ^.{4}) but how would I be able to find all EXCEPT for the first 4 characters so that I can replace them with nothing and be left with just the year numbers?
Find: (?<=^\d{4}).*
Replace: with nothing
(?<=^\d{4}) is a positive lookbehind, (?<=...), that requires the line to start (^) with 4 digits
.* matches everything else up to the line terminator, so the : is included in the match
A lookbehind (or lookahead) only asserts and is never part of the match, so the 4 digits you want to keep are not matched. You don't have to worry about capture groups or putting anything back in the replacement.
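A quick way to see that the lookbehind leaves the year untouched is to run the same pattern in Python. This is just a sketch: the two sample lines are shortened, made-up stand-ins for the real file, and re.MULTILINE is needed so that ^ matches at every line start as it does in VS Code.

```python
import re

# Shortened, made-up sample lines in the same "year: text" shape as the question.
lines = "1801: Joseph Marie Jacquard invents a loom.\n1822: Another entry."

# The lookbehind asserts "4 digits at line start" without consuming them,
# so only the rest of the line is matched and replaced with nothing.
years = re.sub(r"(?<=^\d{4}).*", "", lines, flags=re.MULTILINE)
print(years)
# 1801
# 1822
```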
You can use
Find:       ^(.{4}).+
Replace: $1
Details:
^ - start of a line (in Visual Studio Code, ^ matches any line start)
(.{4}) - capturing group #1 that captures any four chars other than line break chars
.+ - one or more chars other than line break chars, as many as possible.
The $1 backreference in the replacement pattern replaces the match with Group 1 value.
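The capture-group variant can be sketched the same way in Python (with \1 in place of $1). The sample text is made up; the second line is exactly four characters long to show that .+ needs at least one more character, so such a line is simply left alone.

```python
import re

# Made-up sample: one long line and one line that is already just the year.
text = "1801: Joseph Marie Jacquard invents a loom.\n1822"

# Group 1 keeps the first four characters; .+ consumes the rest of the line.
first_four = re.sub(r"^(.{4}).+", r"\1", text, flags=re.MULTILINE)
print(first_four)
# 1801
# 1822
```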

How to transpose pieces of data using Regular expression in Notepad++

I am very new to the world of regular expressions. I am trying to use Notepad++ using Regex for the following:
Input file is something like this and there are multiple such files:
Code:
abc
17
015
0 7
4.3
5/1
***END***
abc
6
71
8/3
9 0
***END***
abc
10.1
11
9
***END***
I need to be able to edit the text in all of these files so that all the files look like this:
Code:
abc
1,2,3,4,5
***END***
abc
6,7,8,9
***END***
abc
10,11,12
***END***
Also:
In some files the number of * around the word END varies, is there a way to generalize the number of * so I don't have to worry about it?
There is some additional data before abcs which does not need to be transposed, how do I keep that data as it is along with transposing the data between abc and ***END***.
Kindly help me. Your help is much appreciated!
Try the following find and replace, in regex mode:
Find: ^(\d+)\R(?!\*{1,}END\*{1,})
Replace: $1,
Here is an explanation of the regex pattern:
^ from the start of the line
(\d+) match AND capture a number
\R followed by a platform independent newline, which
(?!\*{1,}END\*{1,}) is NOT followed by ***END***
Note carefully the negative lookahead at the end of the pattern, which makes sure that we don't do the replacement on the final number in each section. Without this, the last number would bring the END marker onto the same line.
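To watch the lookahead do its job, here is a rough Python equivalent. Python's re has no \R, so the sketch assumes \r?\n as the line break, and the sample uses made-up digit-only lines, since (\d+) only matches lines consisting of digits.

```python
import re

# Made-up digit-only sample with two different END markers.
text = "abc\n1\n2\n3\n***END***\nabc\n6\n7\n*END*\n"

# \r?\n stands in for \R; the lookahead keeps the last number of each
# section on its own line instead of pulling the END marker up.
joined = re.sub(r"^(\d+)\r?\n(?!\*+END\*+)", r"\1,", text, flags=re.MULTILINE)
print(joined)
# abc
# 1,2,3
# ***END***
# abc
# 6,7
# *END*
```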
This will replace only between "abc" and "***END***", with any number of asterisks.
Ctrl+H
Find what: (?:(?<=^abc)\R|\G(?!^)).+\K\R(?!\*+END\*+)
Replace with: ,
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
(?: # non capture group
(?<=^abc) # positive look behind, make sure we have "abc" at the beginning of line before
\R # any kind of linebreak
| # OR
\G # restart from last match position
(?!^) # negative look ahead, make sure we are not at the beginning of line
) # end group
.+ # 1 or more any character but newline
\K # forget all we have seen until this position
\R # any kind of linebreak
(?!\*+END\*+) # negative lookahead, make sure we haven't ***END*** after
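If the same job has to be scripted across many files, note that Python's re module supports none of \G, \K or \R, so the pattern above cannot be reused as-is. The sketch below gets the same result with a plain line-by-line pass instead of a single regex. The function name transpose_blocks and the sample text are my own, and it assumes \n line endings.

```python
import re

END_MARKER = re.compile(r"\*+END\*+")  # any number of asterisks around END

def transpose_blocks(text: str) -> str:
    """Join the lines between "abc" and the END marker with commas."""
    out, pending, inside = [], [], False
    for line in text.split("\n"):
        if not inside:
            out.append(line)           # data outside a block is copied unchanged
            inside = (line == "abc")
        elif END_MARKER.fullmatch(line):
            if pending:
                out.append(",".join(pending))
            out.append(line)
            pending, inside = [], False
        else:
            pending.append(line)       # collect lines inside the block
    out.extend(pending)                # unterminated block: keep lines as they were
    return "\n".join(out)

sample = "header 1\nabc\n17\n015\n0 7\n***END***\nabc\n6\n71\n*END*\n"
print(transpose_blocks(sample))
```

Lines before the first "abc" pass through untouched, which covers the "additional data" requirement from the question.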

Regular expressions in notepad++ (Search and Replace)

I have a list of thousands of records within a .txt document.
some of them look like these records
201910031044 "00059" "11.31AG" "Senior Champion"
201910031044 "00060" "GBA146" "Junior Champion"
201910031044 "00999" "10.12G" "ProAM"
201910031044 "00362" "113.1LI" "Abcd"
Whenever a record similar to this occurs I'd like to get rid of the last words/numbers/etc in the last quotation marks (like "Senior Champion", "Junior Champion" etc. There are many possibilities here)
e.g. (before)
201910031044 "00059" "11.31AG" "Senior Champion"
after
201910031044 "00059" "11.31AG"
I tried the following regex but it wouldn't work.
Search: ^([0-9]{17,17} + "[0-9]{8,8}" + "[a-zA-Z0-9]").*$
Replace: \1 (replace string)
OK I forgot the . (dot) sign however even if I do not have a . (dot) sign it would not work. Not sure if it has anything to do when using the + sign used more than once.
I'd like to get rid of the last words/numbers/etc in the last quotation marks
This does the job:
Ctrl+H
Find what: ^.+\K\h+".*?"$
Replace with: LEAVE EMPTY
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.+ # 1 or more any character but newline
\K # forget all we have seen until this position
\h+ # 1 or more horizontal spaces
".*?" # something inside quotes
$ # end of line
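For comparison, here is a sketch of the same idea in Python, which has neither \K nor \h. As a stand-in (my substitution, not the pattern above), everything before the last field is captured in group 1 and put back, and [ \t] replaces \h. The records are taken from the question.

```python
import re

records = (
    '201910031044 "00059" "11.31AG" "Senior Champion"\n'
    '201910031044 "00999" "10.12G" "ProAM"'
)

# Group 1 plays the role of "everything before \K"; the trailing quoted
# field and the whitespace in front of it are dropped.
trimmed = re.sub(r'^(.+)[ \t]+"[^"\n]*"$', r"\1", records, flags=re.MULTILINE)
print(trimmed)
# 201910031044 "00059" "11.31AG"
# 201910031044 "00999" "10.12G"
```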
The RegEx looks for the 4th double quote:
^(?:[^"]*\"){4}([^|]*)
You can see this demo: https://regex101.com/r/wJ9yS6/163
You will still need to parse the lines, so probably easier opening in excel or parsing using code as a CSV.
You have a problem with the count of your characters:
you specify that the line should start with exactly 17 digits ([0-9]{17,17}). However, there are only 12 digits in the data 201910031044.
you can specify exactly 12 digits by using {12} or if it could be 12-17, then {12,17}. I'll assume exactly 12 based on the current data.
similarly, for the second column you specify that it's exactly 8 digits surrounded by quotes ("[0-9]{8,8}") but it only has 5 digits surrounded by quotes.
again, you can specify exactly 5 with {5} or 5-8 with {5,8}. I will assume exactly 5.
finally, there is no quantifier for the final field, so the regex tries to match exactly one character that is a letter or a number surrounded by quotes "[a-zA-Z0-9]".
I'm not sure if there is any limit on the number of characters, so I would go with one or more using + as quantifier "[a-zA-Z0-9]+" - if you can have zero or more, then you can use *, or if it's any other count from m to n, then you can use {m,n} as before.
Not a character count problem, but the final column can also contain dots, which the regex doesn't account for. You can just add . inside the square brackets, where it matches only a literal dot. It's usually a wildcard, but it loses its special meaning inside a character class ([]), so you get "[a-zA-Z0-9.]+"
Putting it all together, you get
Search: ^([0-9]{12} + "[0-9]{5}" + "[a-zA-Z0-9.]+").*$
Replace: \1
Which will get rid of anything after the third field in Notepad++.
This can be shortened a bit by using \d instead of [0-9] for digits and \s+ instead of " +" (a space followed by +) for whitespace. As a benefit, \s will also match other whitespace like tabs, so you don't have to manually account for those. This leads to
Search: ^(\d{12}\s+"\d{5}"\s+"[a-zA-Z0-9.]+").*$
Replace: \1
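The effect of the quantifier fixes is easy to confirm in Python, whose re module accepts both patterns unchanged. This is only a sanity check on one record from the question, not a claim about how Notepad++ runs it internally.

```python
import re

record = '201910031044 "00059" "11.31AG" "Senior Champion"'

original = r'^([0-9]{17,17} + "[0-9]{8,8}" + "[a-zA-Z0-9]").*$'
corrected = r'^(\d{12}\s+"\d{5}"\s+"[a-zA-Z0-9.]+").*$'

# The original demands 17 digits at the start, so it never matches.
print(re.search(original, record))       # None

# The corrected pattern keeps the first three fields and drops the rest.
cleaned = re.sub(corrected, r"\1", record)
print(cleaned)                           # 201910031044 "00059" "11.31AG"
```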
If you want to get rid of the last words/numbers/etc in the last quotation marks, you could capture everything before them in a group, then match the last quoted field with a negated character class so that it is removed.
If the values can be separated by spaces or tabs, you could use [ \t]+ to match those (\s could also match a newline).
Note that {17,17} and {8,8} may also be written as {17} and {8}, which in this case should be {12} and {5}.
^([0-9]{12}[ \t]+"[0-9]{5}"[ \t]+"[a-zA-Z0-9.]+")[ \t]{2,}"[^"\r\n]+"
In parts
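^ Start of line
( Capture group 1
[0-9]{12}[ \t]+ Match 12 digits and 1+ spaces or tabs
"[0-9]{5}"[ \t]+ Match 5 digits between " and 1+ spaces or tabs
"[a-zA-Z0-9.]+" Match 1+ times any of the listed between "
) Close group
[ \t]{2,} Match 2+ spaces or tabs (use [ \t]+ if the fields may be separated by a single space)
"[^"\r\n]+" Match ", then 1+ characters other than " or a line break, then the closing "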
In the replacement, use group 1: $1

Select digits on the end of line

I need to replace only digits at the end of line with semicolon ; using RegEx in Notepad++.
Before:
ddd 66 ffff 5
d 44 dds 55
After:
ddd 66 ffff;
d 44 dds;
I'm trying to find digits at the end of lines with expression
($)(\d+)
but Notepad++ can't find anything by use of this expression. How to achieve this?
Find:
\s\d+$
Replace:
;
\d+ will match one or more digits. $ will match the end of the line. It is a zero-width assertion, so it consumes no characters (don't worry, the line break itself will not be replaced in a find/replace operation). And so \d+$ will match one or more digits immediately followed by the end of the line.
I included \s (a single whitespace character) because it looks like you want to replace the space preceding the digits as well.
Note that you will need to use "Replace All" for this to work the way you want, because each regex match covers one instance only.
Try this find/replace:
find:
^(.*) \d+$
replace:
\1;
The find regex above matches a whole line that ends in a space followed by at least one digit, and captures everything before that final space. If a line does not end in a space followed by one or more digits, the regex should not match. The replacement is the capture group (what is in parentheses), which is everything up to but excluding the final space and number, followed by a semicolon.
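Both suggestions, and the reason the original ($)(\d+) finds nothing, can be checked with a short Python sketch. One deliberate difference: it uses [ \t] rather than \s in the first pattern, because Python's \s could otherwise match a line break and join a digits-only line to the previous one.

```python
import re

text = "ddd 66 ffff 5\nd 44 dds 55"

# Digits can never come after the end of a line, so this finds nothing.
print(re.search(r"($)(\d+)", text, flags=re.MULTILINE))   # None

# First answer: match the separator plus the trailing digits.
first = re.sub(r"[ \t]\d+$", ";", text, flags=re.MULTILINE)

# Second answer: capture everything before the final space and number.
second = re.sub(r"^(.*) \d+$", r"\1;", text, flags=re.MULTILINE)

print(first)
# ddd 66 ffff;
# d 44 dds;
print(first == second)   # True
```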

Notepad++ replace all using regular expression

I have lines of numbers
16
18
19
21
24
25
26
30
How can I put commas at the end of each number using regular expressions. For example: 16 will turn to 16, and 18 will turn to 18, and so on
The question is not completely clear to me.
1. Only digits in a row and nothing else
Then Bohemian's answer works:
^(\d+)$
and replace with \1,.
The ^ anchors the sequence of digits to the start of the row and the $ to the end.
2. The digits can be anywhere in the row together with other stuff
Then tafoo85's answer works:
(\d+) and replace with \1,.
But this will also replace "tafoo85" with "tafoo85," and "2fast4you" with "2,fast4,you".
To avoid this behaviour and match only "standalone" numbers, you would have to use word boundaries, but those are not available in Notepad++.
Because Notepad++ regexes are very limited, you would have to work around this issue in four steps:
^(\d+)$ and replace with \1,
^(\d+)(\s) and replace with \1,\2
(\s)(\d+)(\s) and replace with \1\2,\3
(\s)(\d+)$ and replace with \1\2,
3. Change only digits at the start of the row
use only the start of the row anchor ^
^(\d+) and replace with \1,.
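The three cases behave differently enough that a small Python sketch may help when deciding which one applies. The sample strings are made up apart from the "tafoo85" and "2fast4you" examples above. The last call uses \b, which Python's re does support, only to illustrate what the word-boundary approach would look like in an engine that has it.

```python
import re

# Case 1: only digits on each line.
only_digits = re.sub(r"^(\d+)$", r"\1,", "16\n18\n19", flags=re.MULTILINE)
print(only_digits)          # 16, / 18, / 19, on three lines

# Case 2: digits anywhere, including inside words.
anywhere = re.sub(r"(\d+)", r"\1,", "tafoo85 2fast4you")
print(anywhere)             # tafoo85, 2,fast4,you

# Case 3: only digits at the start of a line.
at_start = re.sub(r"^(\d+)", r"\1,", "16 apples\nno 18", flags=re.MULTILINE)
print(at_start)             # 16, apples / no 18

# Word boundaries, where available, pick out standalone numbers only.
standalone = re.sub(r"\b(\d+)\b", r"\1,", "tafoo85 42 2fast4you")
print(standalone)           # tafoo85 42, 2fast4you
```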
Find: ([0-9]+)
Replace with \1,
Find: (^[0-9]+$) (means the whole line is all digits - and capture it)
Replace: \1, (means the first captured group, then a comma)