Notepad 2 insert character after regular expression search - regex

I am having an issue with trying to figure out how to insert some text after I perform a regex search. I know there is a replace function, but I am not looking for that option, just inserting. The text editor I am using is Notepad2, but I am willing to try this in other text editors.
Here is the example that I have.
TEST|Test2|Test3|Test4
This is what I am looking for
Test|Test2|PrefixTest3|Test4
Notice that I am trying to insert the the phrase "Prefix" after the 2nd pipe and leave everything else alone.
I can successfully query the result by using this regex:
^[^|]*\|[^|]*|
But then I do not know how I can retain everything prior and after the search point. Any ideas?

You could simply use \K inorder to discard the previously matched characters.
^[^|]*\|[^|]*\|\K
Then replace the match with the string prefix.
DEMO

You may easily do that in Notepad2 using the regex-based Replace feature.
Find:       ^\([^|]*|[^|]*|\)
Replace: \1Prefix
Details:
^ - start of a line (Notepad2 never overflows line boundaries!)
\([^|]*|[^|]*|\) - Capturing group 1 matching a sequence of:
[^|]* - zero or more chars other than |
| - a literal (yes, no escaping is necessary, both escaped and unescaped | match a literal |) pipe symbol
[^|]*| - see above, gets to the second |.
The replacement contains a \1 backreference that inserts what was captured with the capturing group 1.
NOTE that Notepad2 regex engine is very limited. Here is what the Notepad2 documentation says:
Notepad2 supports only a limited subset of regular expressions, as provided by built-in engine of the Scintilla source code editing component. The advantage is that it has a very small footprint. There's currently no plans to integrate a more advanced regular expressions engine, but this may be an option for future development.
Note: Regular expression search is limited to single lines, only.
Also, you may refer to the inline comments inside Scintilla RESearch.cxx file describing the supported syntax. Bear in mind that the regex type used in the Notepad2 S&R tool is that of POSIX and not all of the described Scintilla regex features will work in the tool.
Note that Notepad2 does not seem to support alternation and limiting quantifiers (similar to Lua patterns), but \w matches Unicode letters together with ASCII ones. Sadly, I could not make ? quantifier work.

^([^|]*\|[^|]*\|)
Try this.Replace by $1prefix.See demo.Just capture the first group and then use it for replace.The first group can be accessed by $1.
http://regex101.com/r/pQ9bV3/11

Related

Regex: extract characters from two patterns

I have the following string:
https://www.google.com/today/sunday/abcde2.hopeho.3345GETD?weatherType=RAOM&...
https://www.google.com/today/monday/jbkwe3.ho4eho.8495GETD?weatherType=WHTDSG&...
I'd like to extract jbkwe3.ho4eho.8495GETD or abcde2.hopeho.3345GETD. Anything between the {weekday}/ and the ?weatherType=.
I've tried (?<=sunday\/)$.*?(?=\?weatherType=) but it only works for the first line and I want to make it applicable to all strings regardless the value of {weekday}.
I tried (?<=\/.*\/)$.*?(?=\?weatherType=) but it didn't work. Could anyone familiar with Regex can lend some help? Thank you!
[Update]
I'm new to regex but I was experimenting it on sublime text editor via the "find" functionality which I think should be PCRE (according to this post)
Try this regex:
(?:sun|mon|tues|wednes|thurs|fri|satur)day\/\K[^?]+(?=\?weatherType)
Click for Demo
Link to Code
Explanation:
(?:sun|mon|tues|wednes|thurs|fri|satur)day - matches the day of a week i.e, sunday,monday,tuesday,wednesday,thursday,friday,saturday
\/ - matches /
\K - unmatches whatever has been matched so far and pretends that the match starts from the current position. This can be used for the PCRE.
[^?]+ - matches 1 or more occurences of any character that is not a ?
(?=\?weatherType) - the above subpattern[^?]+ will match all characters that are not ? until it reaches a position which is immediately followed by a ? followed by weatherType
To make the match case-insensitive, you can prepend the regex with (?i) as shown here
In the examples given, you actually only need to grab the characters between the last forward slash ("/") and the first question mark ("?").
You didn't mention what flavor regex (ie, PCRE, grep, Oracle, etc) you're using, and the actual syntax will vary depending on this, but in general, something like the following (Perl) replacement regex would handle the examples given:
s/.*\/([^?]*)\?.*/$1/gm
There are other (and more efficient) ways, but this will do the job.

A regular expression that matches two long strings and ignores everything in between

I am searching through a 1.5 million line Premiere Pro project for any text that matches one of my audio filters and is set to mono.
Text that I am searching for begins with the <ChannelType> tag and ends with the <FilterMatchName>Tags. So it would looks like this
<ChannelType>0</ChannelType>
<FrameRate>5292000</FrameRate>
</AudioComponent>
<FilterPreset>0</FilterPreset>
<OpaqueData Encoding="base64" Checksum="53060659">AAAAAD8L8lo+AUr+Pac1NjwTmoUAAAAAP0uQDD37nIg9ui6MPjwU5j+AAAA+C/JaAAAAAD8qqqsAAAAAP4AAAD92L8w9py8FAAAAAHNvZnQgY29tcHJlc3Npb24AIiBkZWZhdWx0PSIwIiBzdGVwPSIxIiBtaW49IjAiIG1heD0iMSIvPgoJICA8Zmw=</OpaqueData>
<FilterIndex>-1</FilterIndex>
<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
If I were in a Word doc, I would just do a find as
<ChannelType>0</ChannelType>*<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
I am terrible with Regex. I was hoping someone could help me out. Everything I have tried either doesn't match anything, or matches EVERYTHING in the document. I am using Notepad++.
Since you are working in Notepad++, you have access to PCRE regular expressions. This one will get all the text between <ChannelType> and </FilterMatchName>
(?s)<ChannelType>.*?</FilterMatchName>
the (?s) allows the . to match newline characters
After matching <ChannelType>, the .*? lazily matches all characters up to...
the closing </FilterMatchName>, which we match.
Let me know if you have any questions. :)
What type of regular expressions are you using (which language/library)?
Basically you can use .* instead of * in regular expressions. IF your text is long though, it's better to use a Reluctant quantifier[1] if your re implementation allows it.
This is a good site with comparison of different re implementations and tutorials:
http://www.regular-expressions.info
[1] http://docs.oracle.com/javase/tutorial/essential/regex/quant.html

Regular expression using negative lookbehind not working in Notepad++

I have a source file with literally hundreds of occurrences of strings flecha.jpg and flecha1.jpg, but I need to find occurrences of any other .jpg image (i.e. casa.jpg, moto.jpg, whatever)
I have tried using a regular expression with negative lookbehind, like this:
(?<!flecha|flecha1).jpg
but it doesn't work! Notepad++ simply says that it is an invalid regular expression.
I have tried the regex elsewhere and it works, here is an example so I guess it is a problem with NPP's handling of regexes or with the syntax of lookbehinds/lookaheads.
So how could I achieve the same regex result in NPP?
If useful, I am using Notepad++ version 6.3 Unicode
As an extra, if you are so kind, what would be the syntax to achieve the same thing but with optional numbers (in this case only '1') as a suffix of my string? (even if it doesn't work in NPP, just to know)...
I tried (?<!flecha[1]?).jpg but it doesn't work. It should work the same as the other regex, see here (RegExr)
Notepad++ seems to not have implemented variable-length look-behinds (this happens with some tools). A workaround is to use more than one fixed-length look-behind:
(?<!flecha)(?<!flecha1)\.jpg
As you can check, the matches are the same. But this works with npp.
Notice I escaped the ., since you are trying to match extensions, what you want is the literal .. The way you had, it was a wildcard - could be any character.
About the extra question, unfortunately, as we can't have variable-length look-behinds, it is not possible to have optional suffixes (numbers) without having multiple look-behinds.
Solving the problem of the variable-length-negative-lookbehind limitation in Notepad++
Given here are several strategies for working around this limitation in Notepad++ (or any regex engine with the same limitation)
Defining the problem
Notepad++ does not support the use of variable-length negative lookbehind assertions, and it would be nice to have some workarounds. Let's consider the example in the original question, but assume we want to avoid occurrences of files named flecha with any number of digits after flecha, and with any characters before flecha. In that case, a regex utilizing a variable-length negative lookbehind would look like (?<!flecha[0-9]*)\.jpg.
Strings we don't want to match in this example
flecha.jpg
flecha1.jpg
flecha00501275696.jpg
aflecha.jpg
img_flecha9.jpg
abcflecha556677.jpg
The Strategies
Inserting Temporary Markers
Begin by performing a find-and-replace on the instances that you want to avoid working with - in our case, instances of flecha[0-9]*\.jpg. Insert a special marker to form a pattern that doesn't appear anywhere else. For this example, we will insert an extra . before .jpg, assuming that ..jpg doesn't appear elsewhere. So we do:
Find: (flecha[0-9]*)(\.jpg)
Replace with: $1.$2
Now you can search your document for all the other .jpg filenames with a simple regex like \w+\.jpg or (?<!\.)\.jpg and do what you want with them. When you're done, do a final find-and-replace operation where you replace all instances of ..jpg with .jpg, to remove the temporary marker.
Using a negative lookahead assertion
A negative lookahead assertion can be used to make sure that you're not matching the undesired file names:
(?<!\S)(?!\S*flecha\d*\.jpg)\S+\.jpg
Breaking it down:
(?<!\S) ensures that your match begins at the start of a file name, and not in the middle, by asserting that your match is not preceded by a non-whitespace character.
(?!\S*flecha\d*\.jpg) ensures that whatever is matched does not contain the pattern we want to avoid
\S+\.jpg is what actually gets matched -- a string of non-whitespace characters followed by .jpg.
Using multiple fixed-length negative lookbehinds
This is a quick (but not-so-elegant) solution for situations where the pattern you don't want to match has a small number of possible lengths.
For example, if we know that flecha is only followed by up to three digits, our regex could be:
(?<!flecha)(?<!flecha[0-9])(?<!flecha[0-9][0-9])(?<!flecha[0-9][0-9][0-9])\.jpg
Are you aware that you're only matching (in the sense of consuming) the extension (.jpg)? I would think you wanted to match the whole filename, no? And that's much easier to do with a lookahead:
\b(?!flecha1?\b)\w+\.jpg
The first \b anchors the match to the beginning of the name (assuming it's really a filename we're looking at). Then (?!flecha1?\b) asserts that the name is not flecha or flecha1. Once that's done, the \w+ goes ahead and consumes the name. Then \.jpg grabs the extension to finish off the match.

Replacing char in a String with Regular Expression

I got a string like this:
PREFIX-('STRING WITH SPACES TO REPLACE')
and i need this:
PREFIX-('STRING_WITH_SPACES_TO_REPLACE')
I'm using Notepad++ for the Regex Search and Replace, but i'm shure every other Editor capable of regex replacements can do it to.
I'm using:
PREFIX-\('(.*)(\s)(.*)'\)
for search and
PREFIX-('\1_\3')
for replace
but that replaces only one space from the string.
The regex search feature in Notepad++ is very, very weak. The only way I can see to do this in NPP is to manually select the part of the text you want to work on, then do a standard find/replace with the In selection box checked.
Alternatively, you can run the document through an external script, or you can get a better editor. EditPad Pro has the best regex support I've ever seen in an editor. It's not free, but it's worth paying for. In EPP all I had to do was this:
search: ((?:PREFIX-\('|\G)[^\s']+)\s+
replace: $1_
EDIT: \G matches the position where the previous match ended, or the beginning of the input if there was no previous match. In other words, the first time you apply the regex, \G acts like \A. You can prevent that by adding a negative lookahead, like so:
((?:PREFIX-\('|(?!\A)\G)[^\s']+)\s+
If you want to prevent a match at the very beginning of the text no matter what it starts with, you can move the lookahead outside the group:
(?!\A)((?:PREFIX-\('|\G)[^\s']+)\s+
And, just in case you were wondering, a lookbehind will work just as well as a lookahead:
((?:PREFIX-\('|(?<!\A)\G)[^\s']+)\s+
You have to keep matching from the beggining of the string untill you can match no more.
find /(PREFIX-\('[^\s']*)\s([^']*'\))/
replace $1_$2
like: while (/(PREFIX-\('[^\s']*)\s([^']*'\))/$1_$2/) {}
How about using Replace all for about 20 times? Or until you're sure no string contains more spaces
Due to nature of regex, it's not possible to do this in one step by normal regular expression.
But if I be in your place, I do such replaces in several steps:
find such patterns and mark them with special character
(Like replacing STRING WITH SPACES TO REPLACE with #STRING WITH SPACES TO REPLACE#
Replace #([^#\s]*)\s to #\1_ server times.
Remove markers!
I studied a little the regex tool in Notepad++ because I didn't know their possibilities.
I conclude that they aren't powerful enough to do what you want.
Your are obliged to learn and use a programming language having a real regex capability. There are a number of them. Personnaly, I use Python. It would take 1 mn to do what you want with it
You'd have to run the replace several times for each space but this regex will work
/(?<=PREFIX-\(')([^\s]+)\s+/g
Replace with
\1_ or $1_
See it working at http://refiddle.com/10z

Split string of 2 words in separate words [but sometimes there is only word in the string...]

I have strings like "AMS.I-I.D. ver.5" and "AM0011 ver. 2", of which the "ver. *" needs to be stripped.
I have done that with: (^A.*)(ver.\s\d{1,2})
The problem occurs when there is no version number like "AM0003".
How can I make the (ver.\s\d{1,2}) part optional?
The reason why it's not working when you add a question mark is because your first group is matching greedily. Try changing it to a non-greedy match and then making the second group optional:
^(A.*?)(ver\.\s\d{1,2})?$
^ ^
non-greedy optional
Note that in both parts the only change is the addition of a question mark but the question mark has a different meaning in each case.
Also, in one of your examples there is no whitespace between the text ver. and the version number so you should consider making the whitespace optional in your regular expression.
See the regular expression in action on Rubular.
To have an optionnal capture with regex you just have to use the ? operator.
(^A.*)(ver\.\s\d{1,2})?
Resources :
regular-expressions.info - Optional items
Since the examples show a space between the two words (the product ID and the version information), I would expect to design a regex which uses that space to separate the parts. In Perl:
$line =~ s/^(A\S+)\s+ver\.\s?\d{1,2}/$1/;
This removes (without capturing) the version; if the version is not present, then the substitution does nothing.
$line =~ s/^(A\S+)(?:\s+(ver\.\s?\d{1,2}))?/$1/;
This is an almost trivial variation on the idea; it captures the version string if present (as well as substituting, etc. Note the subtlety that the space before the version string is included in the optional material but not captured '(?:...)?', but the version information is captured without leading spaces.
Quoting the regexes in the abstract, without tying them to the Perl context (though they're still using PCRE - Perl Compatible Regular Expression - notation), you could write:
^(A\S+)(?:\s+(ver\.\s?\d{1,2}))?