Remove all characters after a certain match - regex

I am using Notepad++ to remove some unwanted strings from the end of a pattern and this for the life of me has got me.
I have the following sets of strings:
myApp.ComboPlaceHolderLabel,
myApp.GridTitleLabel);
myApp.SummaryLabel + '</b></div>');
myApp.NoneLabel + ')') + '</label></div>';
I would like to keep just myApp.[variable] and get rid of everything after it, e.g. ",", ");", "+ '...'", etc.
Using Notepad++, I can match the strings themselves using ^myApp.[a-zA-Z0-9].*?\b (it's a bit messy, but it works for what I need).
But in reality, I need to negate that regex, to match everything at the end, so I can replace it with a blank.

You don't need negation. Just put your regex inside a capturing group and add .*$ at the end. $ matches the end of a line. The whole matched line is then replaced by the characters captured by the first group.
Also, . matches any character, so you need to escape the dot (\.) to match a literal dot.
^(myApp\.[a-zA-Z0-9].*?\b).*$
Replacement string:
\1
OR
Alternatively, match only the trailing characters and replace them with an empty string:
\b[,); +]+.*$
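A minimal sketch of both replacements in Python, assuming Python's re module as a stand-in for the Notepad++ engine. re.MULTILINE is passed so that ^ and $ match at every line, as they do in Notepad++; SAMPLE, keep_prefix and strip_tail are illustrative names.

```python
import re

# The question's sample lines, one per line as in the editor buffer.
SAMPLE = "\n".join([
    "myApp.ComboPlaceHolderLabel,",
    "myApp.GridTitleLabel);",
    "myApp.SummaryLabel + '</b></div>');",
    "myApp.NoneLabel + ')') + '</label></div>';",
])

# Option 1: capture the part to keep, match the rest of the line, replace with \1.
KEEP = re.compile(r"^(myApp\.[a-zA-Z0-9].*?\b).*$", re.MULTILINE)

# Option 2: match only the unwanted tail and replace it with nothing.
TAIL = re.compile(r"\b[,); +]+.*$", re.MULTILINE)


def keep_prefix(text: str) -> str:
    return KEEP.sub(r"\1", text)


def strip_tail(text: str) -> str:
    return TAIL.sub("", text)


print(keep_prefix(SAMPLE))
print(strip_tail(SAMPLE))
```

On these four lines both options leave only the myApp.[variable] part of each line.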

I think this works equally as well:
^(myApp\.\w+).*$
Replacement string:
\1
From the question "Difference between \w and \b regular expression meta characters":
\w stands for "word character", usually [A-Za-z0-9_]. Notice the inclusion of the underscore and digits.
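A small Python sketch of the \w+ variant, assuming Python's re as a stand-in engine. The second input line (myApp.Label_2) is a made-up example showing that digits and the underscore stay in the kept name. Note that in Python 3, \w on str patterns also matches non-ASCII letters and digits.

```python
import re

# \w+ covers letters, digits and the underscore, so names like Label_2 stay whole.
WORD_NAME = re.compile(r"^(myApp\.\w+).*$", re.MULTILINE)


def keep_name(text: str) -> str:
    return WORD_NAME.sub(r"\1", text)


print(keep_name("myApp.GridTitleLabel);\nmyApp.Label_2 + '</b>');"))
```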

(^.*?\.[a-zA-Z]+)(.*)$
Use this, and replace with:
$1
See demo.
http://regex101.com/r/lU7jH1/5
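A Python sketch of this pattern, assuming Python's re as a stand-in. Python writes the group reference as \1 or \g<1> in the replacement, where Notepad++ also accepts $1. The myApp.Label_2 input is hypothetical; it shows that [a-zA-Z]+ stops at the first character that is not an ASCII letter, unlike the \w+ version.

```python
import re

# Group 1: everything up to the first dot plus the run of ASCII letters after it.
# Group 2: the rest of the line, which the replacement drops.
LETTERS_ONLY = re.compile(r"(^.*?\.[a-zA-Z]+)(.*)$", re.MULTILINE)


def keep_letters(text: str) -> str:
    return LETTERS_ONLY.sub(r"\g<1>", text)


print(keep_letters("myApp.SummaryLabel + '</b></div>');"))
print(keep_letters("myApp.Label_2);"))
```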

Related

Regex: How to get all words, special characters and white spaces between quotation marks?

Currently I have the regex ([^\[\][\[^\[\][\n"]+) to match text between quotes, but it does not capture whitespace: e.g. if I enter " hello ", it returns hello, without the spaces before and after the word.
Is there some expression I can use to just simply catch anything between two quotation marks?
Thank you.
Maybe this will help:
(?<!\\)(\"|')(.+?)(?:(?<!\\)\1)
And to get the text inside the quotes, get the second capture group.
Explanation
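(?<!\\) - Negative lookbehind: the quote must not be preceded by a literal backslash (\).
(\"|') - matches the opening quote of the "string" (double or single) and captures it as group 1.
(.+?) - . matches anything except newlines.
+? is a lazy quantifier: it matches one or more characters, but as few as possible, so the match stops at the first valid closing quote.
(?:(?<!\\)\1) - Non-capturing group for the closing quote.
It applies the (?<!\\) lookbehind described above to the closing quote as well, so an escaped quote (\") does not end the string.
\1 is a backreference to the first capture group ((\"|')), so the string must be closed by the same kind of quote that opened it. In a replacement string, the same group can be written as $1.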
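A Python sketch of this pattern, assuming Python's re (which supports the fixed-width lookbehind and the \1 backreference used here). quoted_strings is an illustrative name; it returns group 2 of every match, so surrounding spaces inside the quotes are kept.

```python
import re

# Group 1: the opening quote; group 2: the text; \1 requires the same quote to close.
# (?<!\\) skips quotes that are escaped with a backslash.
QUOTED = re.compile(r"""(?<!\\)(\"|')(.+?)(?:(?<!\\)\1)""")


def quoted_strings(text: str) -> list:
    return [m.group(2) for m in QUOTED.finditer(text)]


print(quoted_strings("He said \" hello \" and 'bye'"))
print(quoted_strings(r'"a \"b\" c"'))
```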
You should use the following regex:
\"\s*([^\"]+?)\s*\"
([^\"]+?) captures the text between the quotes; the \s* on either side matches the whitespace next to the quotes, so that whitespace is not part of the capture.
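A Python sketch of this second pattern, assuming Python's re as a stand-in. It makes the behavior visible: unlike what the question asks for, the \s* outside the group trims the spaces next to the quotes. trimmed_quoted is an illustrative name.

```python
import re

# \s* outside the group absorbs whitespace next to the quotes,
# so the captured text comes back trimmed.
TRIMMED = re.compile(r'\"\s*([^\"]+?)\s*\"')


def trimmed_quoted(text: str) -> list:
    # With one group, findall returns the captured text of each match.
    return TRIMMED.findall(text)


print(trimmed_quoted('x = " hello "'))
```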

How to replace text without changing quoted string with regex

I want to replace
$this->input->post("product_name");
with
$post_data["product_name"];
I want to use a Notepad++ regex, but I couldn't find a proper solution.
In find --> $this->input->post("[\*w\]");
In replace --> $post_data["$1"];
but it's not working.
The $this->input->post("[\*w\]"); pattern does not work because:
$ is a special char matching the end of a line; you need to use \$ to match it as a literal char.
[\*w\] is a malformed pattern, as there is no matching unescaped ] for the [ that opens a character class. Also, w just matches w, not any letter, digit or underscore; \w does that.
( and ) create a group, so they must be escaped as \( and \) to match literal parentheses.
You may use
Find What: \$this->input->post\("(\w*)"\);
Replace With: $post_data["$1"];
If there can be any char inside double quotes use .*? instead of \w*:
Find What: \$this->input->post\("(.*?)"\);
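A Python sketch of both Find What patterns, assuming Python's re as a stand-in. In re.sub the replacement uses \1 where Notepad++ uses $1, and $ is literal there. The user-id input is hypothetical; it shows that \w* rejects a hyphen while .*? accepts it.

```python
import re

WORD_KEY = re.compile(r'\$this->input->post\("(\w*)"\);')
ANY_KEY = re.compile(r'\$this->input->post\("(.*?)"\);')


def to_post_data(line: str, pattern=WORD_KEY) -> str:
    return pattern.sub(r'$post_data["\1"];', line)


print(to_post_data('$this->input->post("product_name");'))
print(to_post_data('$this->input->post("user-id");', ANY_KEY))
```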
Use this pattern to match the desired text: \$this->input->post\(("[^"]+")\);
And replace it with the pattern \$post_data\[\1\];
Explanation:
\$this->input->post - match $this->input->post literally
\(("[^"]+")\); - match ( literally, then match the double quotes and everything between them with "[^"]+" and store it in the first capturing group, then match ); literally
To replace
$this->input->post("product_name");
by
$post_data["product_name"];
do a replace with regular expressions enabled:
this->input->post\("(.*)"\);
by
post_data\["\1"\];
\N, where N is a number, refers to the text captured by the N-th pair of parentheses. Here the group captures every character between the quotes in this->input->post("XXXX");
Don't forget to escape special characters with \.
The special characters here were [, ], ( and ).
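A Python sketch of this approach, assuming Python's re as a stand-in. Instead of escaping the special characters by hand, it uses re.escape to escape the literal parts; CALL and rename_call are illustrative names. As in the answer, the leading $ is not part of the match, so it stays in place.

```python
import re

# re.escape backslash-escapes regex metacharacters such as ( and ),
# so the literal call text can be used without escaping it by hand.
CALL = re.compile(re.escape('this->input->post("') + r"(.*)" + re.escape('");'))


def rename_call(line: str) -> str:
    # The leading $ is outside the match, so it is kept as-is.
    return CALL.sub(r'post_data["\1"];', line)


print(rename_call('$this->input->post("product_name");'))
```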

Perl regex extract two consecutive words

I am trying to extract strings containing two words separated by one or more whitespace from a list.
Example:
@a = ("aaa12:.", "lala lulu", "erwer", ",", "lala loqw asqwd", "asdas sadsad", "asasd| asq");
@b = grep {/\w+\s+\w+/} @a;
this gives me
'lala lulu',
'lala loqw asqwd',
'asdas sadsad'
but I don't want to grep the one with three words...
I tried @b = grep {/^\w\s+\w$/} @a but then I don't get any matches. It should be simple, but I just don't get it. Which regex do I need here?
\w only matches one character. You want the following:
/^\w+\s+\w+\z/
^ matches the start of string.
\w+ matches one or more "word" characters.
\s+ matches one or more whitespace characters.
\w+ matches one or more "word" characters.
\z matches the end of the string.
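A Python equivalent of the grep, assuming Python's re as a stand-in for Perl. Python has no \z; its \Z matches only at the very end of the string, which is what Perl's \z does. two_word_items is an illustrative name.

```python
import re

WORDS = ["aaa12:.", "lala lulu", "erwer", ",", "lala loqw asqwd", "asdas sadsad", "asasd| asq"]

# Python's \Z matches only at the very end of the string, like Perl's \z.
TWO_WORDS = re.compile(r"^\w+\s+\w+\Z")


def two_word_items(items: list) -> list:
    # Counterpart of Perl's grep {/.../} @a
    return [s for s in items if TWO_WORDS.search(s)]


print(two_word_items(WORDS))
```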
I tried @b = grep {/^\w\s+\w$/} but then I don't get any matches
The only reason it doesn't work is that you left off the + quantifier on the \w at the beginning and at the end:
/^\w\s+\w$/
It would work fine as /^\w+\s+\w+$/
A better way, though, is to add some flexibility with whitespace: /^\s*\w+\s+\w+\s*$/
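A Python sketch of the more flexible pattern, assuming Python's re as a stand-in for Perl. The padded input "  lala lulu " is a made-up example; it is accepted, while the three-word item is still rejected.

```python
import re

# \s* at both ends tolerates leading/trailing whitespace around the two words.
LOOSE_TWO_WORDS = re.compile(r"^\s*\w+\s+\w+\s*$")


def loose_two_words(items: list) -> list:
    return [s for s in items if LOOSE_TWO_WORDS.search(s)]


print(loose_two_words(["  lala lulu ", "lala loqw asqwd", "asdas sadsad"]))
```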

Get string after string with trailing whitespaces

I currently need to figure out how to use regex and have reached a point I can't figure out.
The test strings (they actually come from OCR'd PDFs):
string1 = 'Beleg-Nr.:12123-23131'; // no spaces after the colon
string2 = 'Beleg-Nr.: 12121-214331'; // a tab after the colon
string3 = 'Beleg-Nr.: 12-982831'; // a tab and spaces after the colon
I want to get the numbers explicitly. For that I use this pattern:
pattern = '/(?<=Beleg-Nr\.:[ \t]*)(.*)
This will get me the pure numbers for string1 and string2 but isn't working on string3 (it gives me additional whitespace before the number).
What am I missing here?
Edit: Thanks for all the helpful advice. The software that does the OCR on the fly is able to suppress whitespace on its own in regexes. This did the trick. The resulting pattern is:
(?<=Beleg-Nr\.:[\s]*)(.*)
You can use the \s special symbol, which matches both spaces and tabs (so you don't need to combine them in a character class via []).
This works for me:
/(Beleg-Nr.:\s*)(.*)/
http://regexr.com?35rj6
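A rough Python equivalent of the capture-group approach, assuming Python's re as a stand-in. Python's re rejects variable-width lookbehind such as (?<=Beleg-Nr\.:\s*), so the lookbehind versions cannot be reproduced there directly. The sample strings use \t and spaces to recreate the whitespace described in the question; beleg_number is an illustrative name.

```python
import re

SAMPLES = [
    "Beleg-Nr.:12123-23131",       # nothing after the colon
    "Beleg-Nr.:\t12121-214331",    # a tab
    "Beleg-Nr.:\t  12-982831",     # a tab and spaces
]

# Consume the prefix and any whitespace, then read the number from group 1.
BELEG = re.compile(r"Beleg-Nr\.:\s*(.*)")


def beleg_number(text: str):
    m = BELEG.search(text)
    return m.group(1) if m else None


print([beleg_number(s) for s in SAMPLES])
```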
The problem is that [ ]* will match only spaces. You need to use \s, which matches any whitespace character (in JavaScript, for example, \s covers the space, \f, \n, \r, \t, \v, \u00A0, \u2028, \u2029 and other Unicode space characters):
/(?<=Beleg-Nr.:\s*)(.*)/
Side note:
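* is greedy, but that alone does not strip the whitespace here: the match starts at the leftmost position where the lookbehind succeeds, which is directly after the colon (with \s* matching nothing), so (.*) still picks up the whitespace. Starting the last group with a non-whitespace character, e.g. (\S.*), avoids this.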
Just replace the (.*) with a more restrictive pattern ([^ ]+$, for example). Also note that an unescaped . after Beleg-Nr matches any character, not only a dot.
The $ in my example matches the end of the line and thus ensures that all remaining characters are matched.
I'd suggest matching tabs as well:
pattern = '/(?<=Beleg-Nr\.:[ \t]*)([^ \t]+)$

NOTEPAD++ REGEX - I can't get what's in between two strings, I don't get it

I'm so close to understanding regex, but I'm a bit stumped: I thought I understood lazy and greedy.
Here is my current regex: <g_n><!\[CDATA\[([^]]+)(?=]]><\/g_n>)
My current regex makes:
<g_n><![CDATA[xxxxxxxxxx]]></g_n>
match to:
<g_n><![CDATA[xxxxxxxxxx
But I want to make it match like this:
xxxxxxxxxx
You want
<g_n><!\[CDATA\[(.*?)]]></g_n>
then if you want to replace it use
\1
in the replacement box
You're matching the whole string; the parentheses around .*? capture the text you want into group 1, which \1 refers to.
So the match is the whole string, and \1 holds just the part you want.
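A Python sketch of this find-and-replace, assuming Python's re as a stand-in for Notepad++. Like Notepad++, Python treats a ] that does not close a character class as a literal, so the unescaped ]] works here. unwrap_cdata is an illustrative name.

```python
import re

CDATA = re.compile(r"<g_n><!\[CDATA\[(.*?)]]></g_n>")


def unwrap_cdata(text: str) -> str:
    # Replace the whole element with just the captured content.
    return CDATA.sub(r"\1", text)


print(unwrap_cdata("<g_n><![CDATA[xxxxxxxxxx]]></g_n>"))
```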
To change the xxxxxxxxxx part instead:
Regex:
(<g_n><!\[CDATA\[)(?:.*?)(]]></g_n>)
Replacement
\1WHAT YOU WANT TO CHANGE TO\2
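A Python sketch of the "keep the wrapper, swap the content" replacement, assuming Python's re as a stand-in. It passes a function to re.sub rather than a template string, so new_value is inserted literally and a backslash or leading digit in it is not read as a group reference. replace_cdata is an illustrative name.

```python
import re

# Groups 1 and 2 keep the wrapper; the non-capturing middle is dropped.
WRAPPER = re.compile(r"(<g_n><!\[CDATA\[)(?:.*?)(]]></g_n>)")


def replace_cdata(text: str, new_value: str) -> str:
    # A function replacement inserts new_value verbatim between the two groups.
    return WRAPPER.sub(lambda m: m.group(1) + new_value + m.group(2), text)


print(replace_cdata("<g_n><![CDATA[xxxxxxxxxx]]></g_n>", "NEW"))
```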
It looks like you need to escape the two closing square brackets with backslashes, as they are literal characters in the string you're parsing:
<g_n><!\[CDATA\[.*+?\]\]><\/g_n>
Unescaped square brackets can be treated as character-class brackets rather than literal characters, in which case the pattern won't match the input string.
EDIT: I think the +? is redundant;
\[.*\]\]> ...
should suffice, since .* means any character, any number of times.
Tested with notepad++ 6.3.2:
find: (<g_n><!\[CDATA\[)([^]]+)(?=]]></g_n>)
replace: $1WhatYouWant
You can replace + with * in the pattern to also match an empty CDATA:
<g_n><![CDATA[]]></g_n>
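A Python sketch comparing the + and * forms, assuming Python's re as a stand-in for Notepad++. In Python, as in the tested pattern, a ] placed right after [^ is taken literally. The replacement uses \g<1> where Notepad++ uses $1; set_content is an illustrative name.

```python
import re

# [^]]* (zero or more) also accepts an empty CDATA section; [^]]+ does not.
CONTENT = re.compile(r"(<g_n><!\[CDATA\[)([^]]*)(?=]]></g_n>)")
CONTENT_PLUS = re.compile(r"(<g_n><!\[CDATA\[)([^]]+)(?=]]></g_n>)")


def set_content(text: str, pattern=CONTENT) -> str:
    # The lookahead leaves "]]></g_n>" in place; only prefix + content are replaced.
    return pattern.sub(r"\g<1>WhatYouWant", text)


print(set_content("<g_n><![CDATA[]]></g_n>"))
print(set_content("<g_n><![CDATA[]]></g_n>", CONTENT_PLUS))
```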