I'm trying to match multiple strings in a single line using regex in Sublime Text 3.
I want to match all values and replace them with null.
Part of the string that I'm matching against:
"userName":"MyName","hiScore":50,"stuntPoints":192,"coins":200,"specialUser":false
List of strings that it should match:
"MyName"
50
192
200
false
Result after replacing:
"userName":null,"hiScore":null,"stuntPoints":null,"coins":null,"specialUser":null
Is there a way to do this without using sed or any other substitution method, but just by matching the wanted pattern in regex?
You can use this find pattern:
:(.*?)(,|$)
And this replace pattern:
:null\2
The first group will match any symbol (dot) zero or more times (asterisk) with this last quantifier lazy (question mark), this last part means that it will match as little as possible. The second group will match either a comma or the end of the string. In the replace pattern, I substitute the first group with null (as desired) and I leave the symbol matched by the second group unchanged.
Here is an alternative on amaurs answer where it doesn't put the comma in after the last substitution:
:\K(.*?)(?=,|$)
And this replacement pattern:
null
This works like amaurs but starts matching after the colon is found (using the \K to reset the match starting point) and matches until a comma of new line (using a positive look ahead).
I have tested and this works in Sublime Text 2 (so should work in Sublime Text 3)
Another slightly better alternative to this is:
(?<=:).+?(?=,|$)
which uses a positive lookbehind instead of resetting the regex starting point
Another good alternative (so far the most efficient here):
:\K[^,]*
This may help.
Find: (?<=:)[^,]*
Replace: null
Related
I'm trying to combine two working regex patterns into one. Please let me know the correct syntax and if this can be better written.
Pattern 1: (?P<date>.*)\s+(?P<timezone>.*)\|.*\|.*\|(?P<ip>[\w*.:-]+)\|.*\|
Pattern 2: (?P<path>[^\/]+(?=\-[^\/-]*$))
Sample line:
06/Mar/2020:00:01:04 -0500|/TESTSTREAM|5766764|4.2.2.1|123290|path1/path2/x-fr-US.OPEN.1-Turtle-2020.30.04-64.mp3
The first expression matches the start of the string, the second matches the end, you can combine them by putting a non-greedy .*? between them, like this:
(?P<date>.*)\s+(?P<timezone>.*)\|.*\|.*\|(?P<ip>[\w*.:-]+)\|.*\|.*?(?P<path>[^\/]+(?=\-[^\/-]*$))
As you can see here this expression works, but it takes 1660 steps to match the string. This is because .* between | first capture the whole string up to the end, and then try to step back character by character in order to find the match.
If you use the non-greedy modifiers here: .*?, then the regex machine will initially match an empty string and then will need to move forward character by character until it finds the matching |. It will reduce the number of steps to 1183: demo
However, if you want to remove this backtracking (forward-tracking) at all, you can just very quickly skip as many non-| characters as possible with [^|]*. Similarly we can replace other .* patterns in the regex. The resulting regex finds a match in just 47 steps, more than 30-times less than the original regex:
(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|(?:[^\/\n]*\/)*(?P<path>.*)-.*
Demo here.
Update 2020-03-09
If you want to keep the last slash you can use this regex:
(?P<date>\S*)\s+(?P<timezone>[^|]*)\|[^|]*\|[^|]*\|(?P<ip>[\w*.:-]+)\|[^|]*\|.*?(?P<path>\/[^\/]*)-[^\/]*
I need help with RegEx I just can't figure it out I need to search for broken Hashtags which have an space.
So the strings are for Example:
#ThisIsaHashtagWith Space
But there could also be the Words "With Space" which I don't want to replace.
So important is that the String starts with "#" then any character and then the words "With Space" which I want to replace to "WithSpace" to repair the Hashtags.
I have a Document with 10k of this broken Hashtags and I'm kind of trying the whole day without success.
I have tried on regex101.com
with following RegEx:
^#+(?:.*?)+(With Space)
Even I think it works on regex101.com it doesn't in Notepad++
Any help is appreciated.
Thanks a lot.
BR
In your current regex you match a # and then any character and in a capturing group match (With Space).
You could change the capturing group to capture the first part of the match.
(#+.*?)With Space
Then you could use that group in the replacement:
$1WithSpace
As an alternative you could first match a single # followed by zero or more times any character non greedy .*? and then use \K to reset the starting point of the reported match.
Then match With Space.
#+(?:.*?)\KWith Space
In the replacement use WithSpace
If you want to match one or more times # you could use a quantifier +. If the match should start at the beginning of string you could use an anchor ^ at the start of the regex.
Try using ^(#.+?)(With\s+Space) for your regex as it also matches multiple spaces and tab characters - if you have multiple rows that you want to affect do gmi for the flags. I just tried it with the following two strings, each on a separate line in Notepad++
#blablaWith Space
#hello###$aWith Space
The replace with value is set to $1WithSpace and I've tried both replaceAll and replace one by one - seems to result in the following.
#blablaWithSpace
#hello###$aWithSpace
Feel free to comment with other strings you want replaced. Also be sure that you have selected the Regular Extension search mode in NPP.
Try this? (#.*)( ).
I tried this in Notepad++ and you should be able to just replace all with $1. Make sure you set the find mode to regular expressions first.
const str = "#ThisIsAHashtagWith Space";
console.log(str.replace(/(#.*)( )/g, "$1"));
We have tab spaced list of "key=value" pairs.
How we can split it, using regexp?
Case key=value must be transformed into value. Case key=value=value2 must be transformed into value=value2.
https://regex101.com/r/dR5dT0/1 - I've started solution like this, but can't find beautiful way to remove only "key=" part from text.
UPD BTW, do you know cool crash courses on regular expressions?
You can just use
=(\S*)
See regex demo
Since the list is already formatted, the = in the pattern will always be the name/value delimiter.
The \S matches any non-whitespace character.
The * is a quantifier meaning that the \S should occur zero or more times (\S* matches zero or more non-whitespace characters).
You can use this regex for matching:
/\w+=(\S+)/
and grab captured group #1
RegEx Demo
I have a string like aaa**b***c****ddd, and I want to get a sequence of matched text of pattern [^*]\*+[^*], which should I thank be [a**b, b***c, c***d]. However, when I test this in text editor like vim or emacs, the second (b***c) is not matched.
aaa**b***c***ddd
|--| |---|
first third
|---|
second, which I think should be matched but not
How should I modify the regular expression to match the second?
Yes you can, the trick consists to put all in a capturing group inside a lookahead to allow overlapping results:
(?=([^*]\*+[^*]))
But you can't use this do to replacements since this pattern matches nothing. (or perhaps if you can get the capture group length and the current offset)
EDIT:
it seems to be possible to obtain the capture group length with vim with strlen(submatch(1))
#CommuSoft is correct. One way to approach this problem would be to match the whole string against this regex and then the second time around, you match this regex against the substring that starts at (index_of_first_previous_match + 1) until the end of the string. Hope that is clear.
So if the index of your first match above (a**b) was 2. Then the new substring that you match against the regex the second time should start from index 3 till the end of the string. This will give you the two results.
However, Casimir's answer is much simpler.
I have a file with a some comma separated names and some comma separated account numbers.
Names will always be something like Dow, John and numbers like 012394,19862.
Using Notepad++'s "Regex Find" feature, I'd like to replace commas between numbers with pipes |.
Basically :
turn: Dow,John into: Dow,John
12345,09876 12345|09876
13568,08642 13568|08642
I've been using [0-9], to find the commas, but I can't get it to properly leave the number's last digit and replace just the comma.
Any ideas?
Search for ([0-9]), and replace it with \1|. Does that work?
use this regex
(\d),(\d)
and replace it with
$1|$2
OR
\1|\2
(?<=\d), should work. Oddly enough, this only works if I use replace all, but not if I use replace single. As an alternative, you can use (\d), and replace with $1|
General thoughts about replacing only part of a match
In order to replace a part of a match, you need to either 1) use capturing groups in the regex pattern and backreferences to the kept group values in the replacement pattern, or 2) lookarounds, or 3) a \K operator to discard left-hand context.
So, if you have a string like a = 10, and you want to replace the number after a = with, say, 500, you can
find (a =)\d+ and replace with \1500 / ${1}500 (if you use $n backreference syntax and it is followed with a digit, you should wrap it with braces)
find (?<=a =)\d+ and replace with 500 (since (?<=...) is a non-consuming positive lookbehind pattern and the text it matches is not added to the match value, and hence is not replaced)
find a =\K\d+ and replace with 500 (where \K makes the regex engine "forget" the text is has matched up to the \K position, making it similar to the lookbehind solution, but allowing any quantifiers, e.g. a\h*=\K\d+ will match a = even if there are any zero or more horizontal whitespaces between a and =).
Current problem solution
In order to replace any comma in between two digits, you should use lookarounds:
Find What: (?<=\d),(?=\d)
Replace With: |
Details:
(?<=\d) - a positive lookbehind that requires a digit immediately to the left of the current location
, - a comma
(?=\d) - a positive lookahead that requires a digit immediately to the right of the current location.
See the demo screenshot with settings:
See the regex demo.
Variations:
Find What: (\d),(?=\d)
Replace With: \1|
Find What: \d\K,(?=\d)
Replace With: |
Note: if there are comma-separated single digits, e.g. 1,2,3,4 you can't use (\d),(\d) since this will only match odd occurrences (see what I mean).