I've been working on a regex issue. I have a lot of lines formatted like this:
3240985|#Apple.-+240538|34346|346356356|36433565|6agf8s89auf
The end goal should look like this:
#Apple.-+240538|6agf8s89auf
#Apple.-+240538 is random characters, and 6agf8s89auf is random alphanumeric characters.
I've been using (.*?)[\|] and replacing the parts I need with blank characters in Notepad++ but it's impossible to complete it this way with the number of lines I have.
The regex for this kind of string is (?:(?<=^)|(?<=\|))(\d+(?:$|\|))
Demo: https://regex101.com/r/sO0fZ2/2
However, Find and Replace in Notepad++ may have some issues here, because Notepad++ finds and replaces strings only once per pass. Some other text editors, such as Sublime Text, find and replace the contents recursively. You can simply work around this by clicking the Replace All button multiple times.
[Screenshots: the input, and the result after clicking "Replace All in All Opened Documents" twice]
In Sublime Text, you can achieve this in a single click:
[Screenshots: the input and the result]
P.S.: I'm not aware of any feature in Notepad++ that finds and replaces content recursively. You can search for one and use it if it exists. However, I don't think this is a problem, because it only takes a couple more clicks.
There is a simple approach with an alternation:
^\d+\||\|\d+(?=\||$)
Details:
^\d+\| - Branch 1 matching a chunk of 1+ digits (\d+) at the beginning of the string (^) and a | after them
| - alternation operator meaning OR
\|\d+(?=\||$) - a literal pipe (\|, must be escaped) with 1+ digits after it (\d+) that are followed by a literal pipe or the end of the string. (?=...) is a positive lookahead that does not advance the regex index, so adjacent chunks can still be matched with the same pattern.
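As a rough check of the alternation pattern outside Notepad++, here is a minimal JavaScript sketch. The function name stripNumericFields is made up for illustration, and the g and m flags are an assumption so that every match is replaced and ^ and $ apply per line. The sample line is the one from the question.

```javascript
// Hypothetical helper (the name is illustrative, not from the answer above).
// Removes pipe-delimited fields that consist only of digits.
function stripNumericFields(text) {
  // Branch 1: digits at the start of a line plus the pipe after them.
  // Branch 2: a pipe plus digits, only when a pipe or end of line follows.
  // g = replace every match, m = let ^ and $ match at each line break.
  return text.replace(/^\d+\||\|\d+(?=\||$)/gm, "");
}

const line = "3240985|#Apple.-+240538|34346|346356356|36433565|6agf8s89auf";
console.log(stripNumericFields(line));
// prints: #Apple.-+240538|6agf8s89auf
```

Because the lookahead does not consume the following pipe, a single pass is enough here and no repeated Replace All is needed.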
I need to replace sequences of characters (of various lengths) with another character. I am working in Eclipse on XML files.
For example, -------- should be replaced by ********.
The replacement should be done only for sequences of at least 3 characters, not 1 or 2.
It is easy to find the matching sequences with a regex, for example -{3,30}, but I don't understand how to specify the replacement sequence.
I had this regex solution ready when the question was posted but didn't submit an answer, because I kept testing in Eclipse: even though the regex worked with the Find feature, using * as the replacement wasn't changing the text in the Eclipse editor.
Here is a shorter and a bit more efficient regex:
(?!^)\G-|(?=-{3})-
Replace with a *
RegEx Demo
Breakdown:
(?!^)\G: Match from end of the previous match
-: Match a -
|: OR
(?=-{3}): Make sure we have 3 hyphens ahead
-: Match a -
[Screenshot: the Eclipse editor with a match for this regex selected]
You can use
(?:\G(?!\A)|(?<!-)(?=-{3,30}(?!-)))-
See the regex pattern. Details:
(?:\G(?!\A)|(?<!-)(?=-{3,30}(?!-))) - either
\G(?!\A) - end of the preceding match
| - or
(?<!-)(?=-{3,30}(?!-)) - a position that is not immediately preceded with a - char and is immediately followed with 3 to 30 hyphens (not followed with another hyphen).
- - a hyphen.
The (?:\G(?!\A)|(?<!-)(?=-{3,30}(?!-)))- regex goes into the "Find What" field, * goes into the "Replace With" field.
Note that regular expressions are only meant to be used in search fields, replacement fields must only contain replacement patterns. Usually, the replacement pattern is a string containing literal string(s) and/or backreferences. Here, we do not need any backreference as the regex does not capture anything.
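JavaScript's regex engine has no \G anchor, so the patterns above cannot be used there as written. For comparison only, this sketch swaps in a different technique, a replacement callback, to get a similar effect in code: each run of three or more hyphens becomes a run of asterisks of the same length. The function name starRuns is hypothetical, and dropping the upper bound of 30 is an assumption made for simplicity.

```javascript
// Not the \G approach: a callback-based alternative for engines without \G.
// starRuns is a hypothetical name chosen for this sketch.
function starRuns(text) {
  // -{3,} matches a whole run of 3+ hyphens; shorter runs are left alone.
  // The callback returns one asterisk per matched hyphen.
  return text.replace(/-{3,}/g, (run) => "*".repeat(run.length));
}

console.log(starRuns("a--b---c-----d"));
// prints: a--b***c*****d
```

In an editor's Find/Replace dialog there is no callback, which is why the \G-based patterns match one hyphen at a time instead.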
I want to find all characters between two special characters. I can't find the solution, though, because the text contains new lines that are not included in the match. It's probably easy, but I can't seem to find the right regex for it.
How do I solve this problem?
I tried
\#(.*)\;
but it doesn't match across new lines, and
(?!\#)([\S\s])(?!=\;)
doesn't work either.
It selects everything, but doesn't capture the text as a group.
Source looks like this:
#first line of text;
#second line of text;
#third line could easy
be on a new line;
#forth etc;
#this could (#hi,#hi,#hi) also
happen though:));
#so.... any idea;
Every new entry starts with # and every entry ends with ;.
I see two problems in your regex:
Your [\S\s] has no quantifier, so it will match only one character.
You also need a non-greedy quantifier so that the match doesn't run across all the lines.
Also, where you wrote (?!#), I guess you meant to match any one of those characters, for which you should put them in a character set like this: [?!#]
You need this regex, where you can capture your text from group1
#([\w\W]*?);
Regex Demo
And like you attempted, if you want your full match to only select the intended text, you can use lookaround.
Regex Demo with lookarounds so your full match is intended text only
Also, [^;]* (which matches newlines too) is much faster than .*?, so you should prefer this regex:
(?<=[?!#])[^;]*(?=;)
Regex Demo with best performance
You just need to modify your first regex a little bit so that it looks like this:
#([\s\S]*?);
. matches only non-newline characters, so I replaced it with [\s\S]: the union of the whitespace and non-whitespace sets, i.e. the set of all characters. If your regex engine has the "single line" option, you can turn that on, and . will match new lines as well.
I also made * lazy. Otherwise it will just be one whole match that matches all the way to the last ;. For more info, see this question.
You don't need to escape the ;.
You have to either use the single-line flag /s or add whitespace characters (\s) as a second alternative next to the any-character dot (.). Also, your * quantifier must be lazy (non-greedy), so that the match stops at the first ; it finds.
#((?:.|\s)*?); or #(.*?);/s
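To see the lazy [\s\S]*? pattern at work, here is a small JavaScript sketch run against a shortened version of the sample data. The function name extractEntries is hypothetical; matchAll with the g flag is used to collect capture group 1 of every match.

```javascript
// extractEntries is an illustrative name, not part of any answer above.
function extractEntries(text) {
  // [\s\S] matches any character including newlines; *? stops at the first ';'.
  const pattern = /#([\s\S]*?);/g;
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

const source = [
  "#first line of text;",
  "#third line could easy",
  "be on a new line;",
  "#this could (#hi,#hi,#hi) also",
  "happen though:));",
].join("\n");

// Logs an array with one string per entry; multi-line entries keep their "\n".
console.log(extractEntries(source));
```

In engines that support the s (dotAll) flag, /#(.*?);/gs should be an equivalent way to write the same pattern.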
I need help with a regex I just can't figure out. I need to search for broken hashtags that contain a space.
So the strings are, for example:
#ThisIsaHashtagWith Space
But the words "With Space" could also occur on their own, and those I don't want to replace.
So the important part is that the string starts with "#", followed by any characters and then the words "With Space", which I want to replace with "WithSpace" to repair the hashtag.
I have a document with 10k of these broken hashtags and I've been trying the whole day without success.
I have tried on regex101.com
with following RegEx:
^#+(?:.*?)+(With Space)
Even though I think it works on regex101.com, it doesn't work in Notepad++.
Any help is appreciated.
Thanks a lot.
BR
In your current regex you match a # and then any character and in a capturing group match (With Space).
You could change the capturing group to capture the first part of the match.
(#+.*?)With Space
Then you could use that group in the replacement:
$1WithSpace
As an alternative, you could first match one or more # followed by any characters, as few as possible (.*?), and then use \K to reset the starting point of the reported match.
Then match With Space.
#+(?:.*?)\KWith Space
In the replacement use WithSpace
To match # one or more times, use the + quantifier, as in the patterns above. If the match should start at the beginning of the string, you could add an anchor ^ at the start of the regex.
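The capturing-group variant can be tried in JavaScript as well (JavaScript regular expressions do not support \K, so only the $1 form is shown). This is a sketch: fixHashtags is a hypothetical name, and the g flag is an assumption so that every broken hashtag in the text is repaired.

```javascript
// fixHashtags is an illustrative helper name.
function fixHashtags(text) {
  // Group 1 keeps the '#' and everything up to "With Space" on the same line;
  // the replacement writes it back followed by the repaired "WithSpace".
  return text.replace(/(#+.*?)With Space/g, "$1WithSpace");
}

console.log(fixHashtags("#ThisIsaHashtagWith Space"));
// prints: #ThisIsaHashtagWithSpace
console.log(fixHashtags("Plain text With Space"));
// prints: Plain text With Space
```

The second call shows that "With Space" without a preceding # on the same line is left untouched, since . does not match a line break.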
Try using ^(#.+?)(With\s+Space) for your regex, as it also matches multiple spaces and tab characters. If you have multiple rows that you want to affect, use gmi for the flags. I just tried it with the following two strings, each on a separate line in Notepad++:
#blablaWith Space
#hello###$aWith Space
The "Replace with" value is set to $1WithSpace, and I've tried both Replace All and replacing one by one; both give the following result:
#blablaWithSpace
#hello###$aWithSpace
Feel free to comment with other strings you want replaced. Also make sure that you have selected the Regular expression search mode in Notepad++.
Try this? (#.*)( ).
I tried this in Notepad++ and you should be able to just replace all with $1. Make sure you set the find mode to regular expressions first.
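// Greedy .* backtracks to the last space on the line; $1 keeps the text before it.
const str = "#ThisIsAHashtagWith Space";
console.log(str.replace(/(#.*)( )/g, "$1")); // #ThisIsAHashtagWithSpace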
I have been looking at the various topics on regex on SO, and they all say that to invert a match (select everything that doesn't fit the criteria) you simply use the [^] syntax or a negative lookahead.
I have tried both of these methods on my regex, but the results are not adequate; the [^] in particular seems to take all its contents literally (even when escaped).
What I need this for:
I have a massive single line containing an SQL dump, and I'm trying to remove all characters that are not the line id or the numerical value of one column.
My regex works in matching exactly what I'm looking for; what I need to do is to invert this match so I can remove all non-matching parts in my IDE.
My regex:
/(\),\(\d{1,4},)|(,\d{10},)/
This matches a "),(<number up to 4 digits>," or ",<number of ten digits>," .
The subject
My subject is a 500Kb line of an SQL dump looking something like this (I have already removed a-z and other unwanted characters in previous simple find/replaces):
),(39,' ',1,'01761472100','#','9 ','20',1237213277,0,1237215419,''),(40,' ',3,'01445731203','#',' ','-','22 2','210410//816',1237225423,0,1484651768,''),(4270,' /
My aim is to use a regex to achieve the following output:
),(39,,1237213277,,1237215419,),(40,,1237225423,,1484651768,),(4270,
Which I can then go over again and easily remove repetitions such as commas.
I have read that negation in regex is tricky. So, what is the syntax to make the regex I've written work inverted, to remove everything that does not match? What can you recommend as a way of solving this without spending hours manually reading the lines?
You may use the really helpful (*SKIP)(?!) construct in PCRE (equivalent to (*SKIP)(*F) or (*SKIP)(*FAIL)) to match the texts you want to keep, skip them, and then match all the other text to remove:
/(?:\),\(\d{1,4},|,\d{10},)(*SKIP)(?!)|./s
See the regex demo
Details:
(?:\),\(\d{1,4},|,\d{10},) - match 1 of the 2 alternatives:
\),\(\d{1,4}, - ),(, then 1 to 4 digits and then ,
| - or
,\d{10}, - a comma, 10 digits, a comma
(*SKIP)(?!) - omit the matched text and proceed to the next match
| - or
. - any char (since /s DOTALL modifier is passed to the regex)
The same can be done with
/(\),\(\d{1,4},|,\d{10},)?./s
and replacing with $1 backreference (since we need to put back the text captured with the patterns we need to keep), see another regex demo.
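JavaScript has no (*SKIP)(*F) verbs, but the second variant (optional capture group plus $1) can be sketched there. The helper name keepMatches and the shortened subject string are assumptions for illustration; the s flag makes . match newlines as in the answer above.

```javascript
// keepMatches is a hypothetical name for this sketch.
function keepMatches(subject) {
  // Group 1 optionally captures a fragment we want to keep; the trailing '.'
  // consumes one more character. Replacing with $1 writes back only the
  // captured fragment (an empty string when the group did not take part).
  return subject.replace(/(\),\(\d{1,4},|,\d{10},)?./gs, "$1");
}

const subject = "),(39,' ',1,'01761472100','#',1237213277,0,1237215419,''),(40,' ',3";
console.log(keepMatches(subject));
// prints: ),(39,,1237213277,,1237215419,),(40,
```

One caveat of this variant: the trailing . always consumes one character after a kept fragment, so a fragment at the very end of the input, or one that starts immediately after another kept fragment, would be dropped. The sample data does not contain such cases.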
I have a text document that I need to modify. Most of the words are separated by a "-" (minus) character.
So in sublime text, I tried this pattern:
(\w+)\-(\w+)
This pattern works perfectly fine, but there is one word in the document that naturally contains a "-" (minus) character (e.g. foo-bar).
So I need a pattern that finds all minus-separated words but excludes "foo-bar".
Sorry if this question has been asked before, but I couldn't find the answer I needed.
You can use a negative look-ahead (with optional i switch to match words in a case-insensitive way):
(?i)(?!\bfoo\-bar\b)\b(\w+)-(\w+)\b
Mind that this will only work with non-overlapping matches.
[Screenshot: example matches]
If you want to replace the hyphen with a space in the cases shown in the screenshot, you can use (?!\bfoo\-bar\b)\b(\w+)\-(?=\w) as the search regex and replace with $1 followed by a space (result: go there now).
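The second pattern can be checked in JavaScript with a short sketch. The name unhyphenate is hypothetical, and the replacement string "$1 " (group 1 followed by a space) is an assumption based on the stated goal of replacing the hyphen with a space.

```javascript
// unhyphenate is an illustrative name for this sketch.
function unhyphenate(text) {
  // (?!\bfoo-bar\b) rejects a match that would start at the word "foo-bar".
  // (\w+)- matches a word and its hyphen; (?=\w) requires a word character
  // next without consuming it, so chained words like a-b-c are all handled.
  return text.replace(/(?!\bfoo-bar\b)\b(\w+)-(?=\w)/g, "$1 ");
}

console.log(unhyphenate("go-there-now foo-bar"));
// prints: go there now foo-bar
```

Using a lookahead for the second word, instead of capturing it, is what lets the hyphen after "there" be matched in the same pass.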