Regex: match a string at start or after some special characters - regex

I'm using the Java Pattern class to find a string "keyword" which is at the beginning of the string or after a character that is in a list of characters. For example, if the list of characters is ' ' and '<', then:
match:
"keyword..."
"...<keyword..."
"... keyword..."
not match:
"...akeyword..."
I've tried all of these:
"[^ <]keyword"
"[ <^]keyword"
"[\\^ <]keyword" (note: in a Java/C# string literal the backslash needs to be escaped)
This question is similar to "Match only at string start or after whitespace", but with only basic regex skills I can't adapt it to this problem. I've tried:
"(?<!\\S<)keyword"
"(?<!([\\S<]))keyword"
This seems to be a very basic problem, so there may be a very easy and clear way.

This should work: (^|[< ])keyword
The group (...|...) contains ^ and [< ], so the keyword must either be at the start of the string or come right after a '<' or a space.

You could use an alternation | in a non capturing group (?:^|[ <]) to assert either the start of the string ^ or match a space or < in a character class and use a capturing group for keyword.
(?:^|[ <])(keyword)\b
Or you could use a positive lookbehind (?<=...) and match only keyword
(?<=^|[< ])keyword\b
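A minimal sketch of both variants, written in JavaScript rather than Java so it can be run directly; the pattern syntax is the same here, but in a Java string literal the backslash has to be doubled ("\\b"). The sample strings come from the question, and the variable names are only illustrative.

```javascript
// Sample strings from the question: the first three should match, the last should not.
const samples = ["keyword...", "...<keyword...", "... keyword...", "...akeyword..."];

// Alternation: start of string, or one of the allowed characters, then the keyword.
const withGroup = /(?:^|[ <])keyword\b/;

// Lookbehind: same condition, but the match itself contains only "keyword".
const withLookbehind = /(?<=^|[ <])keyword\b/;

const groupResults = samples.map(s => withGroup.test(s));
const lookbehindMatches = samples.map(s => {
  const m = s.match(withLookbehind);
  return m ? m[0] : null;
});

console.log(groupResults);      // [ true, true, true, false ]
console.log(lookbehindMatches); // [ 'keyword', 'keyword', 'keyword', null ]
```

With the alternation, the full match includes the leading space or '<', which is why the answer wraps keyword in its own capturing group.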

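(^keyword|[< ]keyword)
Put the characters you need inside the square brackets. Note that ^ means "start of string" only outside the brackets; inside them, anywhere but the first position, it is a literal caret.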

Related

Trying to match string A if string B is found anywhere before it

What I'm trying to do is, if a string consists of some substring that starts with "!" encapsulated in "[" and "]", to separate those brackets from the rest of the string via a space, e.g. "[!foo]" --> "[ !foo ]", "[!bar]" --> "[ !bar ]", etc. Since that substring can be variable length, I figured this had to be done with regex. My thought was to do this in two steps - first separate the first bracket, then separate the second bracket.
The first one isn't hard; the regex is just \[! and so I can just do str = str.replace(/\[!/g, "[ !"); in Javascript. It's the second part I can't get to work.
Because now, I need to match "]" if the string literal "[ !" is found anywhere before it. So a simple positive lookbehind doesn't match because it only looks directly behind: (?<=\Q[ !\E)\] doesn't match.
And I still don't understand why, but I'm not allowed to make the positive lookbehind non-fixed length; (?<=\Q[ !\E.*)\] throws the error Syntax Error: Invalid regular expression: missing / in the console, and this regex debugger yields a pattern error explaining "A quantifier inside a lookbehind makes it non-fixed width".
Putting a non-capturing group of non-fixed width between the lookbehind and the capturing group doesn't work; (?<=\Q[ !\E)(?:.*)\] doesn't match.
One thing that won't work is just trying to match "[ !" at the start of the string, because this whole "[!foo]" string is actually itself a substring of an even bigger string and isn't at the beginning.
What am I missing?
Using two lookarounds, you can assert that what is on the left is an opening square bracket with (?<=\[)
Then match an exclamation mark followed by one or more characters other than [ or ], using the negated character class in ![^[\]]+, and assert that what is on the right is a closing square bracket using (?=])
Note that lookbehind in JavaScript requires an engine with ES2018 support, so older browsers will reject the pattern.
(?<=\[)![^[\]]+(?=])
In the replacement use the matched substring $&
[
"[!foo]",
"[!bar]"
].forEach(s =>
console.log(s.replace(/(?<=\[)![^[\]]+(?=])/g, " $& "))
)
Or you could also use 3 capturing groups instead:
(\[)(![^\]]+)(\])
In the replacement use
$1 $2 $3
[
"[!foo]",
"[!bar]"
].forEach(s =>
console.log(s.replace(/(\[)(![^\]]+)(\])/g, "$1 $2 $3"))
)
You can use this regex: \[!([^]]+)\] with this substitution string: [ !\1 ]
(This syntax works in PCRE-style engines. In JavaScript, write the class as [^\]] and the group reference as $1.)
Explanation:
The regex:
\[!: the match begins with [!
([^]]+): capture in group 1 all the characters that are not ]
\]: match ]
The substitution: replace the full match with [ !{contents of group 1} ].
I hope it helps.
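As a sketch of how the single-group idea could look in JavaScript, assuming the bracketed part sits somewhere inside a longer string (the surrounding text below is made up for illustration):

```javascript
// Hypothetical input: the bracketed parts are not at the start of the string.
const input = "before [!foo] middle [!bar] after [plain]";

// Group 1 captures "!" plus everything up to the closing bracket.
// The bracket inside the class is escaped: in JavaScript, [^]] would not
// mean "any character except ]".
const spaced = input.replace(/\[(![^\]]+)\]/g, "[ $1 ]");

console.log(spaced); // before [ !foo ] middle [ !bar ] after [plain]
```

Both brackets are handled in one pass, so no lookbehind is needed, and brackets whose content does not start with "!" are left alone.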

How to replace text without changing quoted string with regex

I want to replace
$this->input->post("product_name");
with
$post_data["product_name"];
I want to use Notepad++ regex search and replace, but I couldn't find a proper solution.
In find --> $this->input->post("[\*w\]");
In replace --> $post_data["$1"];
but it's not working.
The $this->input->post("[\*w\]"); pattern does not work because:
$ is a special char matching the end of a line; you need to use \$ to match it as a literal char
( and ) are grouping metacharacters, so the literal parentheses must be written as \( and \)
[\*w\] is a malformed pattern, as there is no matching unescaped ] for the [ that opens a character class. Also, w just matches the letter w, not any letter, digit or underscore; \w does that.
You may use
Find What: \$this->input->post\("(\w*)"\);
Replace With: $post_data["$1"];
If there can be any char inside double quotes use .*? instead of \w*:
Find What: \$this->input->post\("(.*?)"\);
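The find pattern can be tried outside Notepad++ as well. The following is only a sketch in JavaScript, not Notepad++ itself: the two PHP lines are invented sample input, and the replacement string uses JavaScript's own syntax, where $$ stands for a literal dollar sign.

```javascript
// Hypothetical PHP source lines used as sample input.
const source = [
  '$name = $this->input->post("product_name");',
  '$price = $this->input->post("price");',
].join("\n");

// \$ matches a literal dollar sign; \( and \) match literal parentheses.
const pattern = /\$this->input->post\("(\w*)"\);/g;

// In a JavaScript replacement string, $$ is a literal "$" and $1 is group 1.
const rewritten = source.replace(pattern, '$$post_data["$1"];');

console.log(rewritten);
// $name = $post_data["product_name"];
// $price = $post_data["price"];
```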
Use this pattern to match the desired text: \$this->input->post\(("[^"]+")\);
And replace it with the pattern \$post_data\[\1\]
Explanation:
\$this->input->post - match $this->input->post literally
\(("[^"]+")\); - match ( literally, then match the double quotes and everything between them with "[^"]+" and store that in the first capturing group, then match ); literally
To replace
$this->input->post("product_name");
by
$post_data["product_name"];
do a replace, with regular expression mode activated, of
this->input->post\("(.*)"\);
by
post_data\["\1"\];
\x, where x is a number, refers to the x-th group captured by the parentheses. Here we capture any characters inside this->input->post("XXXX");
Don't forget to escape special characters with \.
Your special characters were []()

match everything but particular words using regex

var text = "!john david sue !jay";
I want to get all strings except words that begin with "!", like "!john" and "!jay". As a result I should get the strings "david" and "sue" in this case.
Why doesn't this regex work?
/[^(![a-z0-9]+)]/
You can use negative lookbehind:
(?<!!)\b\w+
Your regex does not work because your pattern is inside [^ ] (a negated character set). Inside a character set, characters such as ( and ) are matched literally, so ( stands for a literal ( rather than opening a group, and the set as a whole matches only a single character.
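A short sketch of the negative lookbehind applied to the string from the question (it needs a JavaScript engine with lookbehind support; the variable name plainWords is only illustrative):

```javascript
const text = "!john david sue !jay";

// (?<!!) rejects a word start that is directly preceded by "!";
// \b makes sure the match begins at the start of a word, not in the middle of one.
const plainWords = text.match(/(?<!!)\b\w+/g);

console.log(plainWords); // [ 'david', 'sue' ]
```

Without the \b, the engine could skip the first letter after "!" and match the rest of the word (for example "ohn"), so the word boundary is an essential part of the pattern.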

Remove all characters after a certain match

I am using Notepad++ to remove some unwanted strings from the end of a pattern, and for the life of me I can't work it out.
I have the following sets of strings:
myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';
I would like to leave just myApp.[variable] and get rid of, e.g. ,, );, + '...', etc.
Using Notepad++, I can match the strings themselves using ^myApp.[a-zA-Z0-9].*?\b (it's a bit messy, but it works for what I need).
But in reality I need to negate that regex, to match everything at the end, so I can replace it with a blank.
You don't need to go for negation. Just put your regex inside a capturing group and add an extra .*$ at the end. $ matches the end of a line. The whole matched line is then replaced by the characters captured by the first group. An unescaped . matches any character, so you need to escape the dot to match a literal dot.
^(myApp\.[a-zA-Z0-9].*?\b).*$
Replacement string:
\1
OR
Match only the following characters and then replace it with an empty string.
\b[,); +]+.*$
I think this works equally well:
^(myApp\.\w+).*$
Replacement string:
\1
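The capture-and-replace approach can be checked against the four lines from the question. This is a sketch in JavaScript rather than Notepad++, so the m flag is set explicitly to make ^ and $ work per line, and the group reference is written $1 instead of \1.

```javascript
// The four sample lines from the question.
const lines = [
  "myApp.ComboPlaceHolderLabel,",
  "myApp.GridTitleLabel);",
  "myApp.SummaryLabel + '</b></div>');",
  "myApp.NoneLabel + ')') + '</label></div>';",
].join("\n");

// Keep "myApp." plus the following word characters; drop the rest of each line.
const cleaned = lines.replace(/^(myApp\.\w+).*$/gm, "$1");

console.log(cleaned);
// myApp.ComboPlaceHolderLabel
// myApp.GridTitleLabel
// myApp.SummaryLabel
// myApp.NoneLabel
```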
From difference between \w and \b regular expression meta characters:
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
(^.*?\.[a-zA-Z]+)(.*)$
Use this. Replace with
$1
See demo.
http://regex101.com/r/lU7jH1/5

regex to check string is certain length

I am trying to write a regex to match pairs of cards (AA, KK, QQ ... 22) and I have the regex ([AKQJT2-9])\1. The problem I have is that this regex will match AA as well as AAbc etc. Is there a way to write the regex such that I can specify I want to match ([AKQJT2-9])\1 and only that (i.e. no more characters after)?
Enclose the regex in ^ and $:
^([AKQJT2-9])\1$
^ is the "start-of-string" anchor, and $ is the "end-of-string" anchor. If your regex flavor supports it, \A and \Z might be an even better choice since ^ and $ can also match start/end of a line in a multiline string, depending on your regex engine and configuration.
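A small sketch of the anchored pattern in JavaScript; the helper name isPair and the test inputs are made up for illustration. JavaScript has no \A or \Z, but without the m flag ^ and $ only match at the very start and end of the input.

```javascript
// Returns true only when the whole string is one rank repeated twice.
const isPair = hand => /^([AKQJT2-9])\1$/.test(hand);

const checks = ["AA", "22", "AAbc", "AK", "xAA"].map(isPair);

console.log(checks); // [ true, true, false, false, false ]
```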
You mean like this?
^([AKQJT2-9])\1$
It will only match if the string is "AA", "KK", …
If you want to capture both characters, but not the rest of the string, you'll have to use another pair of parentheses (the back-reference then becomes \2, because the outer group is now group 1):
($match, $unused) = $string =~ /(([AKQJT2-9])\2)/; # in Perl