Keep only the strings in between quotes in Notepad++ - regex

In Notepad++, I use the expression (?<=").*(?=") to find all strings in between quotes. It would seem rather trivial to keep only those results. However, I cannot find an easy solution for this.
I think the problem is that Notepad++ is not able to make multiple selections. But there must be some kind of workaround, right? Perhaps I must invert the regex and then find/replace those results to end up with the strings I want.
For example:
blablabla "Important" blabla
blabla "Again important" blablabla
I want to keep:
Important
Again important

There is no great solution for this and depending on your use case I would recommend writing a quick script that actually uses your first expression and creates a new file with all of the matches (or something like this). However, if you just want something quick and dirty, this expression should get you started:
Find what: [^"]*(?:"([^"]*)")?
Replace with: \1\n
Explanation:
[^"]* # 0+ non-" characters
(?: # Start non-capturing group
" # " literally
( # Start capturing group
[^"]* # 0+ non-" characters
) # End capturing group
" # " literally
)? # End non-capturing group AND make it optional
The non-capturing group is optional because the end of your file may well not have a string in quotes, so that part is not required to match (we're more interested in removing whatever the first [^"]* matches).
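The "quick script" route recommended at the start of this answer could look like the minimal Python sketch below. It is an illustration only, not part of the original answer: the function name extract_quoted is hypothetical, and it uses a capture group instead of the lookaround expression from the question.

```python
import re


def extract_quoted(text):
    """Return the contents of every double-quoted string in text."""
    # "([^"]*)" : an opening quote, 0+ non-quote characters (captured), a closing quote
    return re.findall(r'"([^"]*)"', text)


sample = 'blablabla "Important" blabla\nblabla "Again important" blablabla'
kept = extract_quoted(sample)
print("\n".join(kept))
```

Writing the joined result to a new file instead of printing it would give the "new file with all of the matches" described above.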

Try something like this:
[^"\r\n]+"([^"]+)"[^"\r\n]+
And replace with $1. The above regex assumes there will be only 2 double quotes in each line.
[^"]+ matches non-quote characters.
[^"\r\n]+ matches non-quote, non newline characters.
regex101 demo
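As a rough cross-check outside Notepad++, the same find/replace can be sketched in Python. Note the assumptions: Python's re module writes the group reference as \1 rather than $1, and the variable names are illustrative.

```python
import re

# One quoted string per line is assumed, as stated in the answer.
pattern = re.compile(r'[^"\r\n]+"([^"]+)"[^"\r\n]+')

sample = 'blablabla "Important" blabla\nblabla "Again important" blablabla'

# Replace each whole line with just the captured quoted text.
result = pattern.sub(r"\1", sample)
print(result)
```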

Hard to be certain from your post, but I think you may want the following (see the update below):
(?<=")(.*)(?=")
The part you keep will be captured as \2.
(?<=")(.*)(?=")
\1 \2 \3
Your original regex string uses parentheses to group characters for evaluation. Parentheses ALSO group characters for capturing. That is what I added.
Update:
The regex pattern you provided doesn't seem to work correctly. Won't this work?
\"(.*)\"
\1 now captures the content.

Related

Find and replace with variable text

Trying to batch a bunch of conversions with a regex find and replace and I'm not actually sure it's possible, let alone how to achieve this.
Using Sublime text as an editor, open to other tools to accomplish this if possible.
Two sample lines :
Session::flash( 'error', 'Only users with permission may view the directory user.' );
Session::flash( 'error', 'System user ID does not exist.' );
Desired outcome:
flash('Only users with permission may view the directory user.')->error();
flash('System user ID does not exist.')->error();
Current Regex that matches:
Session::flash(\s*'error',.* )
Is it possible, that the text lines can be saved and reused in the replace lines? Hoping for a solution along the lines of $variable so that I may replace the strings with something like
Wishful line:
flash('$variable')->error();
Thanks folks!
You could use 2 capturing groups and refer to them in the replacement.
\bSession::flash\(\s*'([^']+)',\s*('[^']+')\s*\);
In the replacement use (the trailing () is needed to produce the ->error(); call from the desired outcome):
flash($2)->$1();
Explanation
\bSession::flash\(\s* Match a word boundary to prevent Session being part of a longer word, then match Session::flash( followed by 0+ whitespace chars
'([^']+)' Match ', then capture in group 1 one or more chars other than ' using a negated character class, then match ' again
,\s* Match a comma followed by 0+ whitespace chars
('[^']+') Capture in group 2 a ', then one or more chars other than ', then ' again
\s*\); Match 0+ whitespace chars followed by );
Regex demo
Result:
flash('Only users with permission may view the directory user.')->error();
flash('System user ID does not exist.')->error();
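For a batch conversion outside the editor, the two-group approach can be sketched in Python. This is an assumption-based illustration: Python writes the group references as \1 and \2 instead of $1 and $2, and the replacement here appends () after the captured type so that the output matches the desired outcome in the question.

```python
import re

pattern = re.compile(r"\bSession::flash\(\s*'([^']+)',\s*('[^']+')\s*\);")

lines = [
    "Session::flash( 'error', 'Only users with permission may view the directory user.' );",
    "Session::flash( 'error', 'System user ID does not exist.' );",
]

# \2 is the quoted message, \1 is the flash type (here: error).
converted = [pattern.sub(r"flash(\2)->\1();", line) for line in lines]
for line in converted:
    print(line)
```

Because the type is captured rather than hard-coded, the same replacement would also handle other types such as 'success', if the codebase uses them.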
What you're looking for here is a capture group and a backreference.
In a regular expression anything wrapped in ( and ) is captured for later use by whatever performed the regular expression match, which in this case is Sublime Text. The number of capture groups that are supported varies depending on the regular expression library in use, but you generally get at least 10.
In use, every occurrence of () creates a capture, with the first capture being numbered as 1, the second one 2 and so on (generally the entire match is also available as capture 0). Using the sequence \1 or $1 means "use the contents of the first capture group".
As an example, consider the regular expression ^([a-z]).\1. Breaking it down:
^ - match starting at the start of a line
( - start a capture
[a-z] - match a single lower case letter
) - end a capture
. - match any character
\1 - match whatever the contents of the first capture was
Given this input:
abc
aba
bab
This regular expression matches aba and bab because the first character in both cases is captured as \1 and needs to match later. However abc doesn't match because in that case \1 is a, but the third character is c.
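The same check can be reproduced with a few lines of Python (a small sketch; the variable names are illustrative):

```python
import re

# ^([a-z]) captures the first letter, . matches any character,
# and \1 requires the captured letter to appear again.
pattern = re.compile(r"^([a-z]).\1")

words = ["abc", "aba", "bab"]
matching = [w for w in words if pattern.match(w)]
print(matching)
```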
The result of the capture can be used in the replacement text in the same way. If you modify your regular expression, you can capture the text you want to keep and use it in the replacement.
As a note, your regex as outlined in your question above doesn't match in Sublime because ( starts a capture group, and thus does not match the ( that's actually in the text. If you're using Sublime and you turn on the Highlight Matches option in the Find/Replace panel, you'll see that your regex is not considered a match.
Find:
Session::flash\(\s*'error'\s*,(.*)\);
Replace:
flash(\1)->error();
Result:
flash( 'Only users with permission may view the directory user.' )->error();
flash( 'System user ID does not exist.' )->error();
This is more or less the regex outlined in your question, except:
The ( and ) in your regex have been replaced with \( and \), which is to tell the regex that this should match a literal ( and not be considered to start a capture.
The .* is changed to (.*) which means "whatever text appears here, capture it for later use".
The replacement text refers to the text captured as \1 and puts it back in the replacement.
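The difference between the unescaped and escaped parentheses can also be seen outside Sublime Text. The Python sketch below is only an illustration under the assumption that Python's re module treats these two patterns the same way for this purpose; the variable names are made up for the example.

```python
import re

line = "Session::flash( 'error', 'System user ID does not exist.' );"

# Pattern from the question: the unescaped ( opens a capture group,
# so nothing in the pattern matches the literal ( in the text.
unescaped = re.search(r"Session::flash(\s*'error',.* )", line)
print(unescaped)

# Escaped version from this answer, with (.*) capturing the message.
escaped = re.sub(
    r"Session::flash\(\s*'error'\s*,(.*)\);",
    r"flash(\1)->error();",
    line,
)
print(escaped)
```

The captured text includes the spaces around the quoted message, which is why they are carried over into the result.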

How to delete every third line on Notepad++?

I have text on new lines like so:
tom
tim
john
will
tod
hello
test
ttt
three
I want to delete every third line, so in the example above I want to remove john, hello, and three.
I know this calls for some regex, but I am not the best with it!
What I tried:
Search: ([^\n]*\n?){3} //3 in my head to remove every third
Replace: $1
The others I tried were just attempts with \n\r etc. Again, not the best with regex. The above attempt I thought was kinda close.
This will delete every third line that may contain more than one word.
Ctrl+H
Find what: (?:[^\r\n]+\R){2}\K[^\r\n]+(?:\R|\z)
Replace with: LEAVE EMPTY
check Wrap around
check Regular expression
Replace all
Explanation:
(?: # start non capture group
[^\r\n]+ # 1 or more non linebreak
\R # any kind of linebreak (i.e. \r, \n, \r\n)
){2} # end group, appears twice (i.e. 2 lines)
\K # forget all we have seen until this position
[^\r\n]+ # 1 or more non linebreak
(?: # start non capture group
\R # any kind of linebreak (i.e. \r, \n, \r\n)
| # OR
\z # end of file
) #end group
Result for given example:
tom
tim
will
tod
test
ttt
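To try the same idea outside Notepad++, here is a hedged Python sketch. Python's re module has no \K or \R, so this is a translation under stated assumptions rather than the pattern above: a capture group stands in for \K, an explicit alternation approximates \R, and \Z is used for end of input.

```python
import re

# (?:\r\n|\r|\n) approximates \R; group 1 holds the two lines that \K would skip.
NEWLINE = r"(?:\r\n|\r|\n)"
pattern = re.compile(
    r"((?:[^\r\n]+" + NEWLINE + r"){2})[^\r\n]+(?:" + NEWLINE + r"|\Z)"
)

text = "tom\ntim\njohn\nwill\ntod\nhello\ntest\nttt\nthree"

# Put the first two lines of each group back and drop the third.
result = pattern.sub(r"\1", text)
print(result)
```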
gedit ubuntu
Search for: (.*?)\n(.*?)\n(.*)\n
Replace with: \1\n\2\n
Since the OP says Sahil's answer "worked like a charm" I'll assume the text in notepad++ ended with a newline character. Otherwise, Sahil's and Toto's answers will fail to match the final set of words.
Sahil's pattern: (.*?)\n(.*?)\n(.*)\n takes 79 steps if the text ends in \n; otherwise 112 steps and fails.
His replacement expression needlessly uses two capture group references.
Toto's pattern: ((?:[^\r\n]+\R){2})[^\r\n]+\R takes 39 steps if the text ends in \n; otherwise 173 steps and fails.
His replacement expression uses one capture group reference.
My suggested pattern will take only 25 steps and uses no capture groups.
Your text is a series of non-whitespace characters followed by whitespace characters and so the following is the shortest, most accurate pattern which provides maximum speed:
\S+\s+\S+\s+\K\S+\s*
This pattern should be paired with an empty replacement.
\S means "non-white-space character"
\s means "white-space character"
+ means one or more of the preceding match
* means zero or more of the preceding match
\K resets the start of the reported match, so only what is matched after it gets replaced
The * on the final \s allows the final 3 lines of text to conclude without a trailing newline character. When doing this kind of operation on a big batch of text, it is important to be sure that the replacement is working properly on the whole text and no undesired substrings remain.
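A quick way to sanity-check this on sample text is the Python sketch below. It is an approximation, since Python's re module lacks \K: a capture group is put back by the replacement instead. The second input is an added assumption (not from the question) showing that the pattern counts whitespace-separated words, so it relies on one word per line.

```python
import re

# Group 1 plays the role of the text before \K and is restored by the replacement.
pattern = re.compile(r"(\S+\s+\S+\s+)\S+\s*")

one_word_per_line = "tom\ntim\njohn\nwill\ntod\nhello\ntest\nttt\nthree"
cleaned = pattern.sub(r"\1", one_word_per_line)
print(cleaned)

# With several words on a line, every third word is removed, not every third line.
multi_word = "a b\nc\nd\n"
multi_cleaned = pattern.sub(r"\1", multi_word)
print(repr(multi_cleaned))
```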
While I'm sure you've long forgotten about this regex task, it is important that future readers benefit from learning the best way to achieve the desired result.
Alternatively, you can use the ConyEdit plugin. The command line cc.dl 3.3 deletes the third line of each group, with 3 lines per group.

Trying to figure out how to capture text between slashes regex

I have a regex
/([/<=][^/]*[/=?])$/g
I'm trying to capture text between the last slashes in a file path
/1/2/test/
but this regex matches "/test/" instead of just test. What am I doing wrong?
You need to use lookaround assertions.
(?<=\/)[^\/]*(?=\/[^\/]*$)
DEMO
or
Use the below regex and then grab the string you want from group index 1.
\/([^\/]*)\/[^\/]*$
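Both variants can be compared side by side in Python. This is a small sketch with illustrative variable names; in Python the / needs no escaping because patterns are not delimited by slashes.

```python
import re

path = "/1/2/test/"

# Lookaround version: the match itself is the last segment.
by_lookaround = re.search(r"(?<=/)[^/]*(?=/[^/]*$)", path).group(0)

# Capture-group version: the segment is available as group 1.
by_group = re.search(r"/([^/]*)/[^/]*$", path).group(1)

print(by_lookaround, by_group)
```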
The easy way
Match:
1. every character that is not a "/"
   Get what was matched here. This is done by creating a capture group, i.e. putting it inside parentheses.
2. followed by "/" and then the end of string $
Code:
([^/]*)/$
Get the text in group(1)
Harder to read, only if you want to avoid groups
Match exactly the same as before, except now we're telling the regex engine not to consume characters when trying to match (2). This is done with a lookahead: (?= ).
Code:
[^/]*(?=/$)
Get what is returned by the match object.
The issue with your code is your opening and closing slashes are part of your capture group.
Demo
text: /1/2/test/
regex: /\/([^\/]*?)(?=\/)/g
captures a list of three: "1", "2", "test"
The language you're using affects the results. For instance, JavaScript might not have certain lookarounds, or may actually capture something in a non-capture group. However, the above should work as intended. In PHP, all / match characters must be escaped (according to regex101.com), which is why the cleaner [/] wasn't used.
If you're only after the last match (i.e., test), you don't need the positive lookahead:
/\/([^\/]*?)\/$/
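For comparison, here are both patterns from this answer in Python, where the delimiters and the g flag are replaced by re.findall and re.search (a sketch only; the names are illustrative):

```python
import re

path = "/1/2/test/"

# Every segment: the lookahead leaves the next / available for the following match.
all_segments = re.findall(r"/([^/]*?)(?=/)", path)

# Only the last segment: anchor on the trailing / and the end of the string.
last_segment = re.search(r"/([^/]*?)/$", path).group(1)

print(all_segments, last_segment)
```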

How can I capture the string "word1/word2/*" with a regular expression?

I need to build a regular expression that accepts the following pattern:
word1/word2/*
word1/*
"word2" is optional, and it needs to end with "/*".
I tried this regexp:
(word1)/(word2)?/\*
It matches this input: word1/word2/*
but not this: word1/*
Try this:
word1(?:/word2)?/\*
The (?: ) construct is a non-capturing group. The ? after it means "zero or one of the previous atom." So this matches word1 optionally followed by /word2, then a final slash and asterisk.
(I'm assuming that's a literal asterisk that you want to match, not a wildcard asterisk as in "any characters"; if it's the latter you want, replace \* with .*.)
(You can put the capturing groups () back in on word1 and word2 if you need them.)
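A short Python sketch of why the question's pattern fails and the non-capturing version works (variable names are illustrative; fullmatch is used so the whole input has to match):

```python
import re

# In the question's pattern both slashes are mandatory, so "word1/*" cannot match.
original = re.compile(r"(word1)/(word2)?/\*")
# Here the slash before word2 is optional together with word2.
fixed = re.compile(r"word1(?:/word2)?/\*")

inputs = ["word1/word2/*", "word1/*"]
original_hits = [s for s in inputs if original.fullmatch(s)]
fixed_hits = [s for s in inputs if fixed.fullmatch(s)]

print(original_hits)
print(fixed_hits)
```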
You need to move the first / into the following capture group (parenthesized subexpression):
(word1)(/word2)?/\*
If you want to capture word2 without the /, introduce an additional capture group:
(word1)(/(word2))?/\*
Depending on your environment, more sophisticated solutions may be available that avoid the additional capture group (non-capturing groups, look-around assertions).
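With the extra capture group, word2 ends up in group 3. In Python, for example, a group that did not take part in the match is reported as None (a minimal sketch; other environments may report an empty string instead):

```python
import re

pattern = re.compile(r"(word1)(/(word2))?/\*")

with_word2 = pattern.fullmatch("word1/word2/*")
without_word2 = pattern.fullmatch("word1/*")

print(with_word2.group(3))
print(without_word2.group(3))
```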
I have another answer; I believe it's leaner.
(((word)(1|2)/){1,2}\*)
Much less code, try it out.
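Before relying on this pattern it may be worth checking what else it accepts. The Python sketch below (sample inputs are assumptions added for illustration) suggests it is more permissive than the two forms in the question, because either word may appear in either position:

```python
import re

pattern = re.compile(r"(((word)(1|2)/){1,2}\*)")

samples = ["word1/word2/*", "word1/*", "word2/*", "word2/word1/*"]
accepted = [s for s in samples if pattern.fullmatch(s)]
print(accepted)
```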

Notepad++ regular expressions

First of all, regular expressions are quite possibly the most confusing thing I have ever dealt with. With that being said, I cannot believe how efficient they can make one's life.
So I am trying to understand the wildcard regex with no luck
Need to turn
f_firstname
f_lastname
f_dob
f_origincountry
f_landing
Into
':f_firstname'=>$f_firstname,
':f_lastname'=>$f_lastname,
':f_dob'=>$f_dob,
':f_origincountry'=>$f_origincountry,
':f_landing'=>$f_landing,
In the answer can you please briefly describe the regex you are using, I have been reading the tutorials but they boggle my mind. Thanks.
Edit: As Chris points out, you can improve the regex by cleaning up any white space there may be in the target string. I also replace the dot with \w as he did because it's better practice than using the .
Search: ^f_(\w+)\s*$
^ # start at the beginning of the line
f_ # look for f_
(\w+) # capture in a group one or more word characters
\s* # skip over (don't capture) any trailing whitespace
$ # end of the line
Replace: ':f_\1'=>$f_\1,
':f_ # beginning of replacement string
\1 # the group of characters captured above
'=>$f_ # some more characters for the replace
\1, # the capture group (again)
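The same search and replace can be tried in Python as a sanity check. This is a sketch under assumptions: the field list mirrors the question (with one trailing space added to exercise \s*), each name is processed separately instead of as one multi-line buffer, and in Python replacement strings $ is an ordinary character while \1 refers to the group.

```python
import re

pattern = re.compile(r"^f_(\w+)\s*$")
replacement = r"':f_\1'=>$f_\1,"  # \1 is inserted twice, $ stays literal

fields = ["f_firstname", "f_lastname", "f_dob ", "f_origincountry", "f_landing"]
converted = [pattern.sub(replacement, name) for name in fields]

for line in converted:
    print(line)
```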
Find: (^.*)
Replace with: ':$1'=>$$1,
Find What:
(f_\w+)
Here we're matching f_ followed by one or more word characters, \w+ (the plus means one or more times). Wrapping the whole thing in parentheses means we can reference this group in the replace pattern.
Replace With:
':\1'=>$\1,
This is simply your result phrase but instead of hardcoding the f words I've put \1 to reference the group in the search