Select digits on the end of line - regex

I need to replace only the digits at the end of each line with a semicolon (;) using a regex in Notepad++.
Before:
ddd 66 ffff 5
d 44 dds 55
After:
ddd 66 ffff;
d 44 dds;
I'm trying to find the digits at the end of each line with the expression
($)(\d+)
but Notepad++ can't find anything with it. How can I achieve this?

Find:
\s\d+$
Replace:
;
\d+ will match one or more digits. $ will match the end of the line. It is a zero-width assertion, so it consumes no characters (don't worry, the end of the line will not be replaced in a find/replace operation). And so \d+$ will match one or more digits immediately followed by the end of the line.
I included \s (a single whitespace character) because it looks like you want to replace the space preceding the digits as well.
Note that you will need to use "Replace All" for this to work the way you want, because each regex match covers one instance only.
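As a rough cross-check outside Notepad++, the same find/replace can be sketched with Python's re module. This is only an illustration: the variable names are made up, and re.MULTILINE is needed so that $ matches at the end of every line, which is roughly what Notepad++ does by default.

```python
import re

# Sample lines from the question.
text = "ddd 66 ffff 5\nd 44 dds 55"

# re.MULTILINE lets $ match at the end of each line, not only at the end of the string.
# re.sub replaces every match, which corresponds to "Replace All".
result = re.sub(r"\s\d+$", ";", text, flags=re.MULTILINE)
print(result)
```

Digits in the middle of a line (66, 44) are left alone because they are not followed by the end of the line.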

Try this find/replace:
find:
^(.*) \d+$
replace:
\1;
The find regex above matches a whole line that ends with a space followed by at least one digit. If a given line does not end with a space followed by one or more digits, the regex will not match it. The replacement is the capture group (what is in parentheses), which is everything up to but excluding the final space and number, followed by the semicolon.
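A minimal Python sketch of the capture-group variant, assuming one line is processed at a time (the helper name strip_trailing_number is hypothetical):

```python
import re

def strip_trailing_number(line: str) -> str:
    # Group 1 greedily captures everything before the final " <digits>";
    # the replacement keeps group 1 and appends a semicolon.
    return re.sub(r"^(.*) \d+$", r"\1;", line)

for sample in ["ddd 66 ffff 5", "d 44 dds 55", "no digits here"]:
    print(strip_trailing_number(sample))
```

Lines that do not end in a space plus digits come back unchanged, because the pattern does not match them at all.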

Related

regex to extract housenumber plus addition

I'm looking for a regex that matches housenumbers combined with additions for all addresses below:
Breestraat 4
Breestraat 45
Breestraat 456
Dubbele Straat 4a
Dubbele Straat 4-a
5 meistraat 1a
5meistraat 12
5meistraat 12a
Teststraat 22-III
Now the following regex works, except in the first case. The single-digit house number is missed because of the first \d in the regex (which prevents a starting digit from being captured).
\d?.(\d+.+)$
I'm scratching my head over how to get the house number '4' for the first line. So basically: how do I change "skip the starting digit" to "skip the starting digit, but still let it end up in the capturing group"?
You can use
\d+\D*$
\d+\S*$
See the regex demo #1 and regex demo #2.
The pattern matches
\d+ - one or more digits
\D* - zero or more non-digit chars
\S* - zero or more non-whitespace chars
$ - end of string.
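To see what the two patterns return for the addresses in the question, here is a small Python sketch. The helper house_number is a hypothetical name used only for illustration.

```python
import re

addresses = [
    "Breestraat 4", "Breestraat 45", "Breestraat 456",
    "Dubbele Straat 4a", "Dubbele Straat 4-a",
    "5 meistraat 1a", "5meistraat 12", "5meistraat 12a",
    "Teststraat 22-III",
]

DIGITS_THEN_NON_DIGITS = re.compile(r"\d+\D*$")
DIGITS_THEN_NON_SPACES = re.compile(r"\d+\S*$")

def house_number(address, pattern):
    # search() scans left to right, but the $ anchor forces the match to end
    # at the end of the string, so a leading digit such as "5" is skipped.
    match = pattern.search(address)
    return match.group(0) if match else None

for address in addresses:
    print(address, "->", house_number(address, DIGITS_THEN_NON_DIGITS),
          "|", house_number(address, DIGITS_THEN_NON_SPACES))
```

For this sample list both patterns return the same value on every line; they would differ on input such as "Straat 4 a", where a space separates the number from the addition.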
It's not entirely clear what exactly you are asking for.
Anyway, this is a pattern that matches the house number at the end of the string:
\d+[-\da-zI]*$
https://regexr.com/6l0g7
Anyway I'm aware this is not a valid answer

Regular expressions in notepad++ (Search and Replace)

I have a list of thousands of records within a .txt document.
some of them look like these records
201910031044 "00059" "11.31AG" "Senior Champion"
201910031044 "00060" "GBA146" "Junior Champion"
201910031044 "00999" "10.12G" "ProAM"
201910031044 "00362" "113.1LI" "Abcd"
Whenever a record similar to this occurs I'd like to get rid of the last words/numbers/etc in the last quotation marks (like "Senior Champion", "Junior Champion" etc. There are many possibilities here)
e.g. (before)
201910031044 "00059" "11.31AG" "Senior Champion"
after
201910031044 "00059" "11.31AG"
I tried the following regex but it wouldn't work.
Search: ^([0-9]{17,17} + "[0-9]{8,8}" + "[a-zA-Z0-9]").*$
Replace: \1 (replace string)
OK, I forgot the . (dot) sign; however, even if I do not have a . (dot) sign it does not work. I'm not sure whether it has anything to do with using the + sign more than once.
I'd like to get rid of the last words/numbers/etc in the last quotation marks
This does the job:
Ctrl+H
Find what: ^.+\K\h+".*?"$
Replace with: LEAVE EMPTY
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.+ # 1 or more any character but newline
\K # forget all we have seen until this position
\h+ # 1 or more horizontal spaces
".*?" # something inside quotes
$ # end of line
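Python's built-in re module supports neither \K nor \h, so the following sketch swaps in a capture group for \K and [ \t] for \h. It is an approximation of the Notepad++ replace above, not the same expression.

```python
import re

records = "\n".join([
    '201910031044 "00059" "11.31AG" "Senior Champion"',
    '201910031044 "00060" "GBA146" "Junior Champion"',
])

# Group 1 holds the part that \K would have dropped from the match;
# replacing the whole match with group 1 removes only the last quoted field.
pattern = re.compile(r'^(.+)[ \t]+".*?"$', flags=re.MULTILINE)
cleaned = pattern.sub(r"\1", records)
print(cleaned)
```

Because .+ is greedy, it backtracks only as far as the last quoted field, so the earlier quoted fields stay intact.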
The RegEx looks for the 4th double quote:
^(?:[^"]*\"){4}([^|]*)
You can see this demo: https://regex101.com/r/wJ9yS6/163
You will still need to parse the lines, so it is probably easier to open the file in Excel or to parse it as CSV in code.
You have a problem with the count of your characters:
you specify that the line should start with exactly 17 digits ([0-9]{17,17}). However, there are only 12 digits in the data 201910031044.
you can specify exactly 12 digits by using {12} or if it could be 12-17, then {12,17}. I'll assume exactly 12 based on the current data.
similarly, for the second column you specify that it's exactly 8 digits surrounded by quotes ("[0-9]{8,8}") but it only has 5 digits surrounded by quotes.
again, you can specify exactly 5 with {5} or 5-8 with {5,8}. I will assume exactly 5.
finally, there is no quantifier for the final field, so the regex tries to match exactly one character that is a letter or a number surrounded by quotes "[a-zA-Z0-9]".
I'm not sure if there is any limit on the number of characters, so I would go with one or more using + as quantifier "[a-zA-Z0-9]+" - if you can have zero or more, then you can use *, or if it's any other count from m to n, then you can use {m,n} as before.
This is not a character count problem, but the final column can also contain dots, which the regex doesn't account for. You can just add . inside the square brackets, where it matches only a literal dot. It's usually used as a wildcard but it loses its special meaning inside a character class ([]), so you get "[a-zA-Z0-9.]+"
Putting it all together, you get
Search: ^([0-9]{12} + "[0-9]{5}" + "[a-zA-Z0-9.]+").*$
Replace: \1
Which will get rid of anything after the third field in Notepad++.
This can be shortened a bit by using \d instead of [0-9] for digits and \s+ instead of " +" (a space followed by +) for whitespace. As a benefit, \s will also match other whitespace like tabs, so you don't have to manually account for those. This leads to
Search: ^(\d{12}\s+"\d{5}"\s+"[a-zA-Z0-9.]+").*$
Replace: \1
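A quick way to check both the diagnosis and the fix is to run the original and the corrected pattern over the sample records. The sketch below uses Python's re module rather than Notepad++, and the variable names are illustrative only.

```python
import re

records = "\n".join([
    '201910031044 "00059" "11.31AG" "Senior Champion"',
    '201910031044 "00060" "GBA146" "Junior Champion"',
    '201910031044 "00999" "10.12G" "ProAM"',
    '201910031044 "00362" "113.1LI" "Abcd"',
])

# Pattern from the question: wrong digit counts and no quantifier on the third field.
original = re.compile(r'^([0-9]{17,17} + "[0-9]{8,8}" + "[a-zA-Z0-9]").*$', re.MULTILINE)
# Corrected pattern from this answer.
fixed = re.compile(r'^(\d{12}\s+"\d{5}"\s+"[a-zA-Z0-9.]+").*$', re.MULTILINE)

original_match = original.search(records)  # None: there are 12 digits, not 17
cleaned = fixed.sub(r"\1", records)
print(original_match)
print(cleaned)
```

The original pattern finds nothing at all, while the corrected one keeps the first three fields of every record.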
If you want to get rid of the last words/numbers/etc in the last quotation marks you could capture in a group what is before that and match the last quotation marks and everything between it to remove it using a negated character class.
If what is between the values can be spaces or tabs, you could use [ \t]+ to match those (using \s could also match a newline)
Note that {17,17} and {8,8} may also be written as {17} and {8} which in this case should be {12} and {5}
^([0-9]{12}[ \t]+"[0-9]{5}"[ \t]+"[a-zA-Z0-9.]+")[ \t]{2,}"[^"\r\n]+"
In parts
^ Start of string
( Capture group 1
[0-9]{12}[ \t]+ Match 12 digits and 1+ spaces or tabs
"[0-9]{5}"[ \t]+ Match 5 digits between " and 1+ spaces or tabs
"[a-zA-Z0-9.]+" Match 1+ times any of the listed between "
) Close group
[ \t]{2,} Match 2+ spaces or tabs
"[^"\r\n]+" Match 1+ characters other than " or a line break, between "
In the replacement, use group 1: $1

How to cut last digits from number - REGEX

I have to find the first 11 digits and cut everything that follows from the eleventh digit.
I've been trying to do it with this pattern: /^(\d{11}.*?). However, it doesn't work.
Do you know what I'm doing wrong?
Depending on your regex flavour, you could use:
Find: ^\d{11}\K.+$
Replace: NOTHING
Explanation:
^ : beginning of line
\d{11} : 11 digits
\K : forget all we have seen until this position
.+ : 1 or more any character
$ : end of line
If you want to match the first characters, you need to use the anchor ^, which anchors the match at the beginning of the string.
If you want to match something and then reuse it, you need to capture it inside a capturing group and refer to it in the substitution with \1.
If you want to capture eleven digits - \d{11} will work for you.
So to sum up, you need pattern ^(\d{11}).* and replace with \1. .* will match 0 or more characters (any).
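As an illustration of that summary, a minimal Python sketch (the function name first_eleven_digits is hypothetical):

```python
import re

def first_eleven_digits(line: str) -> str:
    # Keep the 11 leading digits (group 1) and drop whatever follows them.
    # If the line does not start with 11 digits, nothing matches and it is returned as-is.
    return re.sub(r"^(\d{11}).*", r"\1", line)

for sample in ["12345678901234", "12345678901 abc", "12345"]:
    print(first_eleven_digits(sample))
```

The question's pattern put .*? inside the group, so the captured text was not limited to the digits; moving it outside the parentheses is what makes \1 contain exactly the 11 digits.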
After a lot of trying, it actually works with this one:
^(?=(\d{11})).+?

How would I find values in a file, but only on lines that don't start with #?

I've got a document that looks something like this:
# Document ID 8934
# Last updated 2018-05-06
52 84 12 70 23 2 7 20 1 5
4 2 7 81 32 98 2 0 77 6
(..and so on..)
In other words, it starts off with a few comment lines, then the rest of the document is just a bunch of numbers separated by spaces.
I'm trying to write a regex that gets all digits on all lines that don't start with #, but I can't seem to get it.
I've read over answers such as
Regular Expressions: Is there an AND operator?
Regex: Find a character anywhere in a document but only on lines that begin with a specific word
and pawed through sites such as http://regular-expressions.info, but I still can't get an expression that works (the best I can get is a lengthy version of ^[^#].*).
So how can I match digits (or text, or whatever) in a string, but only on lines that don't start with a certain character?
Your regex ^[^#].* uses a negated character class which matches not a # from the start of the string ^ and after that matches any character zero or more times.
This would for example also match t test
What you might do is use an alternation to match a whole line ^#.*$ that starts with a # or capture in a group one or more digits (\d+)
Your digits are captured in group 1. You could change the (\d+) to, for example, a character class ([\w+.]+) to match more than only digits.
(?:^#.*$|(\d+))
Details
(?: Non capturing group
^#.*$ Match from the start of the line ^ a # followed by any character zero or more times .* until the end of the string $
| Or
(\d+) capture one or more digits in a group
) Close non capturing group
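The "match what you don't want, capture what you do" idea can be sketched in Python as follows. The sample document is abbreviated from the question; re.MULTILINE is required so that ^ and $ work per line.

```python
import re

document = (
    "# Document ID 8934\n"
    "# Last updated 2018-05-06\n"
    "52 84 12 70 23 2 7 20 1 5\n"
    "4 2 7 81 32 98 2 0 77 6"
)

pattern = re.compile(r"(?:^#.*$|(\d+))", re.MULTILINE)

# Comment lines are consumed by the first alternative, which leaves group 1 unset (None);
# only matches made by the second alternative carry a value in group 1.
numbers = [m.group(1) for m in pattern.finditer(document) if m.group(1) is not None]
print(numbers)
```

The digits inside the comment lines (8934, 2018, 05, 06) never reach group 1, because the whole comment line is consumed by the first alternative before \d+ gets a chance at them.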
I think a much simpler method would be to first replace the comment lines with an empty string using this regex:
^#.*
And then you can just match all the numbers with this:
-?\d+ (the -? allows for negative numbers)
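A short Python sketch of this two-step approach, assuming the whole file is already in a string. It also shows why the first step matters: the optional minus sign would otherwise pick up the hyphens in the date.

```python
import re

document = (
    "# Document ID 8934\n"
    "# Last updated 2018-05-06\n"
    "52 84 12 70 23 2 7 20 1 5\n"
    "4 2 7 81 32 98 2 0 77 6"
)

# Step 1: blank out the comment lines (re.MULTILINE makes ^ match at every line start).
without_comments = re.sub(r"^#.*", "", document, flags=re.MULTILINE)

# Step 2: collect every (optionally negative) integer that is left.
numbers = [int(n) for n in re.findall(r"-?\d+", without_comments)]
print(numbers)

# Without step 1 the date would be misread as negative numbers.
date_pieces = re.findall(r"-?\d+", "# Last updated 2018-05-06")
print(date_pieces)
```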

Notepad++ replace all using regular expression

I have lines of numbers
16
18
19
21
24
25
26
30
How can I put a comma at the end of each number using regular expressions? For example: 16 will turn into 16, and 18 will turn into 18, and so on.
The question is not completely clear to me.
1. Only digits in a row and nothing else
Then Bohemian's answer works:
^(\d+)$
and replace with \1,.
The ^ anchors the sequence of digits to the start of the row and the $ to the end.
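For case 1, the same replacement can be sketched in Python. This is only an illustration with made-up variable names; re.MULTILINE is what makes ^ and $ apply to each row rather than to the whole text.

```python
import re

rows = "16\n18\n19\n21"

# Each row that consists only of digits gets a comma appended.
with_commas = re.sub(r"^(\d+)$", r"\1,", rows, flags=re.MULTILINE)
print(with_commas)

# A row that contains anything besides digits is left untouched.
mixed = re.sub(r"^(\d+)$", r"\1,", "16\nabc 18", flags=re.MULTILINE)
print(mixed)
```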
2. The digits can be anywhere in the row together with other stuff
Then tafoo85's answer works:
(\d+) and replace with \1,.
But this will replace also "tafoo85" with "tafoo85," and "2fast4you" with "2,fast4,you"
To avoid this behaviour and match only "standalone" numbers, you would have to use word boundaries, but those are not available in older Notepad++ versions (newer versions, built on the Boost regex library, support \b).
Because the regexes in those older versions are very limited, you would have to work around this issue in four steps:
^(\d+)$ and replace with \1,
^(\d+)(\s) and replace with \1,\2
(\s)(\d+)(\s) and replace with \1\2,\3
(\s)(\d+)$ and replace with \1\2,
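In an engine that supports lookarounds, the four steps can be folded into one pass. The sketch below uses Python's re module and a swapped-in technique, negative lookbehind and lookahead on \S, rather than anything from the answer itself; it treats a number as "standalone" when it is delimited by whitespace or the start or end of the row.

```python
import re

# Naive version from case 2: every run of digits gets a comma, even inside words.
naive = re.sub(r"(\d+)", r"\1,", "2fast4you")
print(naive)

# Lookaround version: the digits must not be preceded or followed by a non-space character.
STANDALONE_NUMBER = re.compile(r"(?<!\S)(\d+)(?!\S)")
fixed = STANDALONE_NUMBER.sub(r"\1,", "16 2fast4you 18 tafoo85")
print(fixed)
```

Unlike step three of the workaround, the lookarounds do not consume the surrounding whitespace, so directly adjacent numbers such as "1 2 3" are all handled in a single pass.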
3. Change only digits at the start of the row
Use only the start-of-row anchor ^:
^(\d+) and replace with \1,.
Find: ([0-9]+)
Replace with \1,
Find: (^[0-9]+$) (means the whole line is all digits, and captures it)
Replace: \1, (means the first captured group, then a comma)