Regex on second occurrence only
Is it possible to apply my regex only to the second occurrence of a specific symbol?
Regex being used:
#.*
Example data:
Stack#overflow:Stack#overflow
Desired output:
Stack#overflow:Stack
As you can see in the output, everything from the second # onward (including the # itself) has been removed, while the text before it stays.
I'm using Notepad++, or any other text editor that supports regex.
The pattern #.* matches from the first occurrence of # to the end of the line: the dot matches any char except a newline, so it also consumes every following # sign.
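The same behaviour can be reproduced outside the editor. This is a minimal Python sketch that assumes Python's `re` module, where the dot also excludes newlines by default. It shows that `#.*` starts at the first # and not at the second.

```python
import re

s = "Stack#overflow:Stack#overflow"

# #.* starts at the FIRST # and the dot then swallows every later #,
# so deleting the match leaves only the text before the first #.
match = re.search(r"#.*", s)
print(match.group())          # #overflow:Stack#overflow
print(re.sub(r"#.*", "", s))  # Stack
```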
If : should also act as a boundary, you could use a negated character class to match any char except # or :
[^#:]+#[^#:]+:[^#:]+
[^#:]+ Match 1+ chars except # or :
# Match # literally
[^#:]+ Match 1+ chars except # or :
: Match : literally
[^#:]+ Match 1+ chars except # or :
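Here is a quick check of the negated-character-class pattern, written as a Python sketch that uses `re.match` in place of the editor's search. Each `[^#:]+` run stops at the next # or :, so the match ends just before the second #.

```python
import re

s = "Stack#overflow:Stack#overflow"

# Every [^#:]+ run stops at the next # or :, so the overall match
# ends right before the second #.
pattern = re.compile(r"[^#:]+#[^#:]+:[^#:]+")
kept = pattern.match(s).group()
print(kept)  # Stack#overflow:Stack
```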
You can try
^[^#]*#[^#]*
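This pattern matches the part to keep, meaning everything before the second #. The Python sketch below applies it to each line separately. The second sample line is hypothetical. Matching per line matters because `[^#]` also matches a newline, so one long multi-line string could make a match spill into the next line.

```python
import re

lines = ["Stack#overflow:Stack#overflow", "foo#bar#baz"]

# ^[^#]*#[^#]* matches everything up to (not including) the second #.
# Each line is matched on its own because [^#] would also match a newline.
kept = [re.match(r"^[^#]*#[^#]*", line).group() for line in lines]
print(kept)  # ['Stack#overflow:Stack', 'foo#bar']
```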
Ctrl+H
Find what: ^.+\K#.+$
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.+ # 1 or more of any character except newline (greedy, so it backtracks to the last # on the line)
\K # forget all we have seen until this position
# # literally #
.+ # 1 or more of any character except newline
$ # end of line
Given:
Stack#overflow:Stack#overflow
Result for given example:
Stack#overflow:Stack
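Python's `re` module does not support `\K`. The sketch below therefore swaps in a capture group (`^(.+)#.+$` replaced with `\1`), which has the same effect as the Notepad++ pattern. Because `.+` is greedy, the cut happens at the last # of the line. With exactly two # signs that is the second occurrence. With more # signs it is not, as the second call shows.

```python
import re

s = "Stack#overflow:Stack#overflow"

# No \K in Python's re: capture the prefix instead and write it back.
# (.+) is greedy, so it backtracks only as far as the LAST # on the line.
pattern = re.compile(r"^(.+)#.+$", re.MULTILINE)
two = pattern.sub(r"\1", s)
many = pattern.sub(r"\1", "a#b#c#d")
print(two)   # Stack#overflow:Stack
print(many)  # a#b#c
```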
Try this approach:
regex = /(.*?#.*?)#.*/;
or just
(.*?#.*?)#.*
and replace the match with $1, which keeps only the text before the second #.
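This version uses lazy quantifiers, so it always cuts at the second # and not at the last one. A minimal Python sketch, where `\1` plays the role of `$1`:

```python
import re

# The lazy .*? parts stop at the first and second # respectively;
# \1 keeps everything before the second #.
pattern = re.compile(r"(.*?#.*?)#.*")
two = pattern.sub(r"\1", "Stack#overflow:Stack#overflow")
many = pattern.sub(r"\1", "a#b#c#d")
print(two)   # Stack#overflow:Stack
print(many)  # a#b
```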
Related
Add Find Special Character at beginning and ADD to END of string with regex
I have strings that start with a # symbol, and I would like to add the same # symbol to the end. A string can contain any mix of lowercase/uppercase letters, numbers, commas, periods, etc. Each string is on its own line, for example:
# www.admitme.app
# Carlo Baxter
# resumes in 15 minutes
Here is what I have tried with no success:
Find: (?=#)([\S\s]+)
Find: (?=#\B)([\S\s]+)
Find: (?=#\B)([A-Za-z0-9]+)
Replace: $1 #
The desired result is, for example:
# resumes in 15 minutes #
Yes, I'm a noob with regex... Thanks in advance,
Hank K
The following pattern works in your demo: (?=#\B)(.*)
It works because the dot does not match newlines (unless the "dot matches newline" flag is set), so .* stops at the end of each line. You were using [\s\S]*, which matches newlines too, even in multiline mode, because multiline mode only changes how ^ and $ behave.
You can do the same replacement without lookarounds or capture groups using one of these patterns. The point is to match any character except newlines using .* (and not to have the flag set that makes the dot match newlines):
#\B.*
# .*
In the replacement, use the matched text followed by a space and #:
$0 #
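The same replacement works as a Python sketch. The sample lines come from the question. `\g<0>` is Python's spelling of `$0`, the whole match. Because `re.DOTALL` is not set, `.*` stops at each newline, so every line is handled separately.

```python
import re

text = "# www.admitme.app\n# Carlo Baxter\n# resumes in 15 minutes"

# #\B matches a # that is not followed by a word character; .* then takes
# the rest of that line only (no DOTALL). \g<0> re-inserts the whole match.
result = re.sub(r"#\B.*", r"\g<0> #", text)
print(result)
```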
How would you replace only a single character in the middle of text with duplicates?
How would you use a regex in Notepad++ to replace a single character that appears more than once in every line, so that only its first occurrence in each line is replaced and the later duplicates are left alone?
test1:_|TEST:-TEST.|
test2:_|TEST:-TEST.|
test3:_|TEST:-TEST.|
As shown in the test data, each line has two colons. I'm trying to replace the first colon in each line with a ; and NOT the second one. The result of the regex should be:
test1;_|TEST:-TEST.|
test2;_|TEST:-TEST.|
test3;_|TEST:-TEST.|
Ctrl+H
Find what: ^.+?\K:
Replace with: ;
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
^ # beginning of line
.+? # 1 or more of any character except newline, not greedy
\K # forget all we have seen until this position
: # colon
I'm guessing that maybe this expression,
(\w+)\s*(?::)(\s*_\s*\|\s*\w+\s*:\s*-\w+\.\|)
with a replacement of $1;$2 might work.
Or, with fewer boundaries, this expression:
([^:]+):(.*)
with the same replacement.
It's done like this:
Find: (?m)^[^:\r\n]*\K:
Replace: ;
https://regex101.com/r/rT1vG9/1
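Python's `re` has no `\K`, so this sketch captures the text before the first colon and writes it back followed by `;`. Otherwise it mirrors `(?m)^[^:\r\n]*\K:`. The input is the test data from the question.

```python
import re

text = "test1:_|TEST:-TEST.|\ntest2:_|TEST:-TEST.|\ntest3:_|TEST:-TEST.|"

# ^ with re.MULTILINE anchors at every line start; [^:\n]* cannot run past
# a newline, so only the first colon of each line is matched and replaced.
result = re.sub(r"^([^:\n]*):", r"\1;", text, flags=re.MULTILINE)
print(result)
```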
Regex notepad++ and groups
I have the following data in my file:
234xt_
yad42_
23ft3_
45gdw_
...
where _ means a space. Using Notepad++, I want to rewrite it as:
'234xt', 'yad42', '23ft3', '45gdw'
I am using the following regex in the "Find what" field:
(^\w+)\s*\n
and $0, in the "Replace with" field, but it is not working as expected.
You may use
^(\w+) $
or
^(\w+)\h$
and replace with '$1',.
^ matches the start of a line, (\w+) captures one or more letters, digits or underscores into Group 1 (which you can access via the $1 or \1 backreference in the replacement pattern), then a space or \h matches a space or any horizontal whitespace, and $ asserts the position at the end of the line.
If the (white)space can be missing, add the appropriate quantifier after the space or \h: \h* matches 0 or more whitespace characters and \h? matches 1 or 0.
You should use \1 (the first capture group) instead of $0 (the whole match); see the example in the docs.
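Below, both the per-line replacement and the single-line result the question asks for are shown as a Python sketch. Python's `re` has no `\h`, so `[ \t]*` stands in for `\h*` here (an assumption about which blanks can occur). The join step is plain Python and not a regex replacement.

```python
import re

text = "234xt \nyad42 \n23ft3 \n45gdw "

# [ \t]* replaces \h*: optional trailing blanks before the end of the line.
line_pattern = re.compile(r"^(\w+)[ \t]*$", re.MULTILINE)

# Per-line replacement, as in the first answer ('$1', written as '\1',).
per_line = line_pattern.sub(r"'\1',", text)
print(per_line)

# Everything on one line, which is the layout the question asks for.
joined = ", ".join(f"'{word}'" for word in line_pattern.findall(text))
print(joined)  # '234xt', 'yad42', '23ft3', '45gdw'
```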
How to correctly build RegEx for multiline values in reg file
I would like to get values from a .reg file (REG EXPORT file) so I can compare them to another .reg file. I'm having problems creating the regex for this. Facts that make it harder for me:
I don't know what kind of registry key types are used in the file (that's why I want to build a regex for all the different types like string, dword, qword, multistring, ...).
I don't know whether the last character in the file is a newline or not.
I would like to return only the actual value, e.g. fa,ad,df,fa,ad,df,fa,ad if the regkey is "qword"=hex(b):fa,ad,df,fa,ad,df,fa,ad
$Text = @'
[HKEY_LOCAL_MACHINE\SOFTWARE\Test]
"String"="asfasdfasasfasdfasasfasdfasasfas"
"Binary"=hex:d3,45,34,53,45,34,53,45,34,53,45,34,53,45,34,53,45,34,5b,09,89,08,\
  34,09,8a,ef,02,30,40,9a,ad,fa,d0
"DWORD"=dword:fefefefe
"multistring"=hex(7):61,00,62,00,6c,00,61,00,73,00,66,00,62,00,00,00,62,00,61,\
  00,6c,00,73,00,66,00,62,00,61,00,73,00,64,00,66,00,00,00,62,00,61,00,6c,00,\
  73,00,64,00,66,00,61,00,64,00,6c,00,66,00,00,00,61,00,73,00,64,00,66,00,61,\
  00,73,00,64,00,66,00,00,00,61,00,73,00,64,00,66,00,00,00,61,00,73,00,64,00,\
  00,00,66,00,61,00,73,00,64,00,00,00,66,00,61,00,73,00,64,00,66,00,61,00,73,\
  00,66,00,61,00,73,00,64,00,66,00,00,00,61,00,73,00,64,00,66,00,61,00,73,00,\
  64,00,66,00,61,00,73,00,64,00,00,00,61,00,73,00,64,00,66,00,61,00,73,00,64,\
  00,66,00,00,00,00,00
"qword"=hex(b):fa,ad,df,fa,ad,df,fa,ad
'@

# this one works
$key = "multistring"
$regex = ('(?ms)\"{0}\"=hex\(7\):(.+)\n' -f [RegEx]::Escape($key))
[regex]::Matches($Text, $regex) | foreach { $_.Groups[1].Value }

# this one does not work because there is no newline after the last line...
$key2 = "qword"
$regex2 = ('(?ms)\"{0}\"=hex\(b\):(.+)\n' -f [RegEx]::Escape($key2))
[regex]::Matches($Text, $regex2) | foreach { $_.Groups[1].Value }
In your regex you use (?s), a modifier that makes the dot match any character including newlines, so .+ matches until the end of all lines.
You could use a capturing group to capture the part after the colon. First match the part up to the colon using \"{0}\"=hex\(7\):
Then match the rest of the line, and keep adding the following lines as long as a negative lookahead confirms that the next line does not start with a name in double quotes followed by an equals sign, like "qword"=.
Your code could look like:
$regex = ('\"{0}\"=hex\(7\):(.*(?:(?!\n"[^\n"]+"=)\n.*)*)' -f [RegEx]::Escape($key))
Explanation of the second part:
( Capturing group which will hold your value
.* Match any character except a newline 0+ times
(?: Non-capturing group
(?! Negative lookahead, assert that what follows is not
\n"[^\n"]+"= a newline, a ", 1+ chars that are neither a newline nor ", then "=
)\n.* Close the negative lookahead, then match a newline followed by any character except a newline 0+ times
)* Close the non-capturing group and repeat it 0+ times
) Close the capturing group
Example pattern:
\"multistring\"=hex\(7\):(.*(?:(?!\n"[^\n"]+"=)\n.*)*)
.+ is a greedy expression, and the modifier (?s) makes the . match all characters (including newlines), so (.+)\n will match everything up to the last newline. Try something like this:
$regex = ('"{0}"=hex\(b\):(.+(?:\n  .+)*)' -f [RegEx]::Escape($key2))
You need neither (?m) nor (?s) here, because you don't want . to include newlines, and you don't want to match beginnings or ends of lines inside the multiline string. .+(?:\n  .+)* matches the rest of the line after the prefix hex(b): and all subsequent lines beginning with two consecutive spaces. The (?:...) is just a non-capturing group, since there's no need to capture each line in a separate group.
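The continuation-line idea from the last answer can be ported to Python, as in the sketch below. It assumes that continuation lines start with exactly two spaces, as in the sample export. `reg_value` and its parameters are hypothetical names, and the sample text is a shortened version of the question's data. It also strips the `\` line-continuation markers so the value comes back as one comma-separated list. It works whether or not the text ends with a newline.

```python
import re
from typing import Optional


def reg_value(text: str, key: str, type_tag: str) -> Optional[str]:
    """Return the data of "key"=type_tag:... from a .reg export, or None."""
    # .+ stays on one line (no DOTALL); (?:\n  .+)* then adds every following
    # line that starts with two spaces, i.e. every continuation line.
    pattern = r'"{}"={}:(.+(?:\n  .+)*)'.format(re.escape(key), re.escape(type_tag))
    match = re.search(pattern, text)
    if match is None:
        return None
    # Remove the backslash + newline + two-space continuation markers.
    return re.sub(r"\\\n  ", "", match.group(1))


text = "\n".join([
    "[HKEY_LOCAL_MACHINE\\SOFTWARE\\Test]",
    '"Binary"=hex:d3,45,34,\\',
    "  34,09,8a",
    '"DWORD"=dword:fefefefe',
    '"qword"=hex(b):fa,ad,df,fa,ad,df,fa,ad',  # no trailing newline
])

print(reg_value(text, "qword", "hex(b)"))  # fa,ad,df,fa,ad,df,fa,ad
print(reg_value(text, "Binary", "hex"))    # d3,45,34,34,09,8a
```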
regex for first instance of a specific character that DOESN'T come immediately after another specific character
I have a function, translate(), that takes multiple parameters. The first parameter is the only required one and is a string, which I always wrap in single quotes, like this:
translate('hello world');
The other parameters are optional, but could be included like this:
translate('hello world', true, 1, 'foobar', 'etc');
And the string itself could contain escaped single quotes, like this:
translate('hello\'s world');
To the point: I now want to search through all code files for all instances of this function call and extract just the string. To do so I've come up with the following grep, which returns everything between translate(' and either ') or ',. Almost perfect:
grep -RoPh "(?<=translate\(').*?(?='\)|'\,)" .
The problem with this, though, is that if the call is something like this:
translate('hello \'world\', you\'re great!');
my grep only returns this:
hello \'world\
So I'm looking to modify it so that the part that currently looks for ') or ', instead looks for the first occurrence of ' that hasn't been escaped, i.e. doesn't immediately follow a \.
Hopefully I'm making sense. Any suggestions, please?
You can use this grep with a PCRE regex:
grep -RoPh "\btranslate\(\s*\K'(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*'" .
RegEx breakup:
\b # word boundary
translate # match literal translate
\( # match a (
\s* # match 0 or more whitespace
\K # reset the matched information
' # match starting single quote
(?: # start non-capturing group
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
) # end non-capturing group
(?: # start non-capturing group
\\\\. # match a backslash followed by the char that is "escaped"
[^'\\\\]* # match 0 or more chars that are not a backslash or single quote
)* # end non-capturing group, repeat it 0 or more times
' # match ending single quote
Here is a version without \K using lookarounds:
grep -oPhR "(?<=\btranslate\(')(?:[^'\\\\]*)(?:\\\\.[^'\\\\]*)*(?=')" .
I think the problem is the .*? part: the ? makes it a non-greedy pattern, meaning it'll take the shortest string that matches the pattern. In effect, you're saying, "give me the shortest string that's followed by quote+close-paren or quote+comma". In your example, "world\" is followed by a single quote and a comma, so it matches your pattern.
In these cases, I like to use the following reasoning:
A string is a quote, zero or more characters, and a quote: '.*'
A character is anything that isn't a quote (because a quote terminates the string): '[^']*'
Except that you can put a quote in a string by escaping it with a backslash, so a character is either "backslash followed by a quote" or, failing that, "not a quote": '(\\'|[^'])*'
Put it all together and you get the following (inside double quotes the shell turns \\\\ into \\, so the regex receives \\'):
grep -RoPh "(?<=translate\(')(\\\\'|[^'])*(?='\)|'\,)" .
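A Python sketch of the escaped-quote approach, which makes it possible to check the pattern without shell quoting getting in the way. It reuses the "unrolled" structure of the grep answer's pattern, with a capture group in place of `\K` or lookarounds. The sample calls are the ones from the question. Python's `re` is an assumption here, and `findall` returns only the captured string body.

```python
import re

code = r"""translate('hello world');
translate('hello world', true, 1, 'foobar', 'etc');
translate('hello\'s world');
translate('hello \'world\', you\'re great!');"""

# [^'\\]*          run of chars that are neither ' nor \
# (?:\\.[^'\\]*)*  an escaped char (\ plus any char), then another run
# The group captures the string body without the surrounding quotes, and the
# closing ' can only be an unescaped quote.
pattern = re.compile(r"\btranslate\(\s*'([^'\\]*(?:\\.[^'\\]*)*)'")
strings = pattern.findall(code)
for s in strings:
    print(s)
```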