Find & Replace digit by digit and space in Sublime Text - regex

I have lot of digits and I want to add spaces in between them, like so:
0123456789 --> 0 1 2 3 4 5 6 7 8 9
I'm trying to do this using the search and replace function in Sublime Text. What I've tried so far is using \S to find all characters (there are only digits so it doesn't matter) and \1\s to replace them. However, this deletes the digits and replaces them with s. Does anybody know how to do this?

You can use a combination of Lookahead and Lookbehind assertions to do this. Use Ctrl + H to open the Search and Replace, enable Regular Expression, input the following and click Replace All
Find What: (?<=\d)(?=\d)
Replace With: empty space
Live Demo
Explanation:
(?<= # look behind to see if there is:
\d # digits (0-9)
) # end of look-behind
(?= # look ahead to see if there is:
\d # digits (0-9)
) # end of look-ahead

Press Ctrl + H to open the Search and Replace dialog (make sure you click the .* at the left to enable regular expression mode):
Search for: (\d)(?!$)
Replace with: $1[space]
The regular expression (\d)(?!$) matches all the digits except the one at the end of the line.
Here's how it looks like:

you could use this pattern (\d)(?=\d) and replace with $1 <- $1{space}
Demo

Related

Regex to disregard partial matches across lines / matching too much

I have three lines of tab-separated values:
SELL 2022-06-28 12:42:27 39.42 0.29 11.43180000 0.00003582
BUY 2022-06-28 12:27:22 39.30 0.10 3.93000000 0.00001233
_____2022-06-28 12:27:22 39.30 0.19 7.46700000 0.00002342
The first two have 'SELL' or 'BUY' as first value but the third one has not, hence a Tab mark where I wrote ______:
I would like to capture the following using Regex:
My expression ^(BUY|SELL).+?\r\n\t does not work as it gets me this:
I do know why outputs this - adding an lazy-maker '?' obviously won't help. I don't get lookarounds to work either, if they are the right means at all. I need something like 'Match \r\n\t only or \r\n(?:^\t) at the end of each line'.
The final goal is to make the three lines look at this at the end, so I will need to replace the match with capturing groups:
Can anyone point me to the right direction?
Ctrl+H
Find what: ^(BUY|SELL).+\R\K\t
Replace with: $1\t
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(BUY|SELL) # group 1, BUY or SELL
.+ # 1 or more any character but newline
\R # any kind of linebreak
\K # forget all we have seen until this position
\t # a tabulation
Replacement:
$1 # content of group 1
\t # a tabulation
Screenshot (before):
Screenshot (after):
You can use the following regex ((BUY|SELL)[^\n]+\n)\s+ and replace with \1\2.
Regex Match Explanation:
((BUY|SELL)[^\n]+\n): Group 1
(BUY|SELL): Group 2
BUY: sequence of characters "BUY" followed by a space
|: or
SELL: sequence of characters "SELL" followed by a space
[^\n]+: any character other than newline
\n: newline character
\s+: any space characters
Regex Replace Explanation:
\1: Reference to Group 1
\2: Reference to Group 2
Check the demo here. Tested on Notepad++ in a private environment too.
Note: Make sure to check the "Regular expression" checkbox.
Regex

How to transpose pieces of data using Regular expression in Notepad++

I am very new to the world of regular expressions. I am trying to use Notepad++ using Regex for the following:
Input file is something like this and there are multiple such files:
Code:
abc
17
015
0 7
4.3
5/1
***END***
abc
6
71
8/3
9 0
***END***
abc
10.1
11
9
***END***
I need to be able to edit the text in all of these files so that all the files look like this:
Code:
abc
1,2,3,4,5
***END***
abc
6,7,8,9
***END***
abc
10,11,12
***END***
Also:
In some files the number of * around the word END varies, is there a way to generalize the number of * so I don't have to worry about it?
There is some additional data before abcs which does not need to be transposed, how do I keep that data as it is along with transposing the data between abc and ***END***.
Kindly help me. Your help is much appreciated!
Try the following find and replace, in regex mode:
Find: ^(\d+)\R(?!\*{1,}END\*{1,})
Replace: $1,
Demo
Here is an explanation of the regex pattern:
^ from the start of the line
(\d+) match AND capture a number
\R followed by a platform independent newline, which
(?!\*{1,}END\*{1,}) is NOT followed by ***END***
Note carefully the negative lookahead at the end of the pattern, which makes sure that we don't do the replacement on the final number in each section. Without this, the last number would bring the END marker onto the same line.
This will eplace only between "abc" and "***END***" with any number of asterisk.
Ctrl+H
Find what: (?:(?<=^abc)\R|\G(?!^)).+\K\R(?!\*+END\*+)
Replace with: ,
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline*
Replace all
Explanation:
(?: # non capture group
(?<=^abc) # positive look behind, make sure we have "abc" at the beginning of line before
\R # any kind of linebreak
| # OR
\G # restart from last match position
(?!^) # negative look ahead, make sure we are not at the beginning of line
) # end group
.+ # 1 or more any character but newline
\K # forget all we have seen until this position
\R # any kind of linebreak
(?!\*+END\*+) # negative lookahead, make sure we haven't ***END*** after
Screen capture (before):
Screen capture (after):

Regular expressions in notepad++ (Search and Replace)

I have a list of thousands of records within a .txt document.
some of them look like these records
201910031044 "00059" "11.31AG" "Senior Champion"
201910031044 "00060" "GBA146" "Junior Champion"
201910031044 "00999" "10.12G" "ProAM"
201910031044 "00362" "113.1LI" "Abcd"
Whenever a record similar to this occurs I'd like to get rid of the last words/numbers/etc in the last quotation marks (like "Senior Champion", "Junior Champion" etc. There are many possibilities here)
e.g. (before)
201910031044 "00059" "11.31AG" "Senior Champion"
after
201910031044 "00059" "11.31AG"
I tried the following regex but it wouldn't work.
Search: ^([0-9]{17,17} + "[0-9]{8,8}" + "[a-zA-Z0-9]").*$
Replace: \1 (replace string)
OK I forgot the . (dot) sign however even if I do not have a . (dot) sign it would not work. Not sure if it has anything to do when using the + sign used more than once.
I'd like to get rid of the last words/numbers/etc in the last quotation marks
This does the job:
Ctrl+H
Find what: ^.+\K\h+".*?"$
Replace with: LEAVE EMPTY
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline*
Replace all
Explanation:
^ # beginning of line
.+ # 1 or more any character but newline
\K # forget all we have seen until this position
\h+ # 1 or more horizontal spaces
".*?" # something inside quotes
$ # end of line
Screen capture (before):
Screen capture (after):
The RegEx looks for the 4th double quote:
^(?:[^"]*\"){4}([^|]*)
You can see this demo: https://regex101.com/r/wJ9yS6/163
You will still need to parse the lines, so probably easier opening in excel or parsing using code as a CSV.
You have a problem with the count of your characters:
you specify that the line should start with exactly 17 digits ([0-9]{17,17}). However, there are only 12 digits in the data 201910031044.
you can specify exactly 12 digits by using {12} or if it could be 12-17, then {12,17}. I'll assume exactly 12 based on the current data.
similarly, for the second column you specify that it's exactly 8 digits surrounded by quotes ("[0-9]{8,8}") but it only has 5 digits surrounded by quotes.
again, you can specify exactly 5 with {5} or 5-8 with {5,8}. I will assume exactly 5.
finally, there is no quantifier for the final field, so the regex tries to match exactly one character that is a letter or a number surrounded by quotes "[a-zA-Z0-9]".
I'm not sure if there is any limit on the number of characters, so I would go with one or more using + as quantifier "[a-zA-Z0-9]+" - if you can have zero or more, then you can use *, or if it's any other count from m to n, then you can use {m,n} as before.
Not a character count problem but the final column can also have dots but the regex doesn't account for. You can just add . inside the square brackets and it will only match dot characters. It's usually used as a wildcard but it loses its special meaning inside a character class ([]), so you get "[a-zA-Z0-9.]+"
Putting it all together, you get
Search: ^([0-9]{12} + "[0-9]{5}" + "[a-zA-Z0-9.]+").*$
Replace: \1
Which will get rid of anything after the third field in Notepad++.
This can be shortened a bit by using \d instead of [0-9] for digits and \s+ for whitespace instead of +. As a benefit, \s will also match other whitespace like tabs, so you don't have to manually account for those. This leads to
Search: ^(\d{12}\s+"\d{5}"\s+"[a-zA-Z0-9.]+").*$
Replace: \1
If you want to get rid of the last words/numbers/etc in the last quotation marks you could capture in a group what is before that and match the last quotation marks and everything between it to remove it using a negated character class.
If what is between the values can be spaces or tabs, you could use [ \t]+ to match those (using \s could also match a newline)
Note that {17,17} and {8,8} may also be written as {17} and {8} which in this case should be {12} and {5}
^([0-9]{12}[ \t]+"[0-9]{5}"[ \t]+"[a-zA-Z0-9.]+")[ \t]{2,}"[^"\r\n]+"
In parts
^ Start of string
( Capture group 1
[0-9]{12}[ \t]+ Match 12 digits and 1+ spaces or tabs
"[0-9]{5}"[ \t]+ Match 5 digits between " and 1+ spaces or tabs
"[a-zA-Z0-9.]+" Match 1+ times any of the listed between "
) Close group
[ \t]{2,} Match 1+ times
"[^"\r\n]+"
In the replacement use group 1 $1
Regex demo
Before
After

Regex for find and replace in N++ of number followed by anything

I have a text file with the following lines:
asm-java-2.0.0-lib
cib-slides-3.1.0
lib-hibernate-common-4.0.0-beta
astp
act4lib-4.0.0
I want to remove everything from, including the '-' before the numbers begin so the results look like:
2.0.0-lib
3.1.0
4.0.0-beta
act4lib
Does anyone know the correct regex for this? So far I have come up with -\D.*(a-z)* but its got too many errors.
Ctrl+H
Find what: ^.*?(?=\d|$)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.*? # 0 or more any character but newline, not greedy
(?= # start lookahead, zero-length assertion that makes sure we have after
\d # a digit
| # OR
$ # end of line
) # end lookahead
Result for given example:
2.0.0-lib
3.1.0
4.0.0-beta
Another solution that deals with act4lib-4.0.0:
Ctrl+H
Find what: ^(?:.*-(?=\d)|\D+)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
(?: # start non capture group
.* # 0 or more any character but newline
- # a dash
(?=\d) # lookahead, zero-length assertion that makes sure we have a digit after
| # OR
\D+ # 1 or more non digit
) # end group
Replacement:
\t # a tabulation, you may replace with what you want
Given:
asm-java-2.0.0-lib
cib-slides-3.1.0
lib-hibernate-common-4.0.0-beta
astp
act4lib-4.0.0
Result for given example:
2.0.0-lib
3.1.0
4.0.0-beta
4.0.0
Use
^\D+\-
If you want to completely remove lines without numbers then use this
^\D+(\-|$)
In case the packages contain numbers in their names like act4lib-4.0.0 then a longer variant is needed
^[\w-]+(\-(?=\d+\.\d+)|$)
It can be shortened to ^.+?(\-(?=\d+\.)|$) but I just want to be sure so I also check the minor version number
The ^ will match from the start of line

Notepad++ remove all non regex'd text

I have a large list of urls that has a unique numeric string in each, the string falls between a / and a ? I would like to remove all other text from notepad++ that are not these strings. for example
www.website.com/dsw/fv3n24nv1e4121v/123456789012?fwe=32432fdwe23f3 would end up as only 123456789012
I have figured out that the following regex \b\d{12}\b will get me the 12 digits, now I just need to remove all of the information that falls each side. I have had a look and found some posts that suggest replace with \t$1 , $1\n
, $1 , and /1 however all of these do the exact oposite of what I want and just remove the 12 digit string.
You can use this regex and replace it with empty string,
^[^ ]*\/|\?[^ ]*$
Demo
Explanation:
^[^ ]*\/ --> Matches anything expect space from start of string till it finds a /
\?[^ ]*$ --> Similarly, this matches anything except space starting from ? till end of input.
Ctrl+H
Find what: ^.*/([^?]+).*$
Replace with: $1
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.* # 0 or more any character but newline
/ # a slash
([^?\r\n]+) # group 1, 1 or more any character that is not ? or line break
.* # 0 or more any character but newline
$ # end of line
Result for given example:
123456789012