Find and replace with regular expression in Notepad++ - regex

At the moment, I have a PHP function that gets the contents of a CSV file and puts it into a multi-dimensional array, which contains text that I print out in various places, using the indexes.
An example of use would be:
$localText[index][pageText][conceptQualityText][$lang];
The first index, [index], would be the name of the page. The second index [pageText] would indicate what it is (text for the page). The third index, [conceptQualityText] indicates what the actual text is. The last index, [$lang] gets the text in the desired language.
so:
->page location
->what is it
->the content
->what language it should be displayed in.
This all worked fine in previous PHP versions. However, after upgrading to 7.2, PHP seems to be stricter. I was a bit greener about two years ago when I first wrote this solution, and I now know that since these indexes aren't written as strings, i.e. enclosed in single quotes like ['index'], they are parsed as constants (the kind created with define()). I didn't give it much thought back then, but PHP now interprets them as constants, and so I get a warning that the word is an undefined constant.
My initial thought is to make a search and replace on my example string:
$localText[index][pageText][conceptQualityText][$lang];
using the regular expression functionality in Notepad++.
However, the example is just one of many, the notation of the array indexing is basically:
$localText[index][index2][index3][$lang];
So my question is:
How can I use the Notepad++ search and replace, with a regular expression, so that my index keys become strings instead of being treated as constants?
e.g. make:
$localText[index][index2][index3][$lang];
into:
$localText['index']['index2']['index3'][$lang];
I will need some sort of logic that checks for whatever is inside the brackets and encapsulates them with single quotes, except for the last index, [$lang].
I tried to give as much information as possible, let me know if anything needs to be elaborated.
I tried to refer to these docs without much luck.

I found a solution using this:
find: \b(localText\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)
replace: $1'$2'$3'$4'$5'$6'
and it works like a charm. Thanks to everyone who took the time to help.

You can use the following regex to match:
\[(\w+)\]
The regex matches a word between square brackets unless it is quoted (\w does not match a quote character).
Replace with:
['$1']
The regex will not match the last brackets because it contains a '$' sign.
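As a quick way to check the substitution outside Notepad++, here is a minimal Python sketch. It assumes the pattern \[(\w+)\] (a bare word directly between square brackets) and uses Python's \1 replacement syntax where Notepad++ uses $1. The helper name quote_indexes is made up for illustration.

```python
import re

def quote_indexes(line: str) -> str:
    """Wrap bare-word array indexes in single quotes."""
    # \w matches letters, digits and underscore only, so indexes that are
    # already quoted (['index']) or that are variables ([$lang]) are skipped.
    return re.sub(r"\[(\w+)\]", r"['\1']", line)

before = "$localText[index][pageText][conceptQualityText][$lang];"
after = quote_indexes(before)
print(after)
```

Running the function a second time on its own output should leave the line unchanged, since every remaining index is either quoted or a variable.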

Related

Matching within matches by extending an existing Regex

I'm trying to see if it's possible to extend an existing arbitrary regex by prepending or appending another regex to match within matches.
Take the following example:
The original regex is cat|car|bat so matching output is
cat
car
bat
I want to add to this regex and output only matches that start with 'ca',
cat
car
I specifically don't want to parse the whole regex, which could be quite a long operation, and then change its internal content to produce the output, as in:
^ca[tr]
or run the original regex and then the second one over the results. I'm taking the original regex as an argument in python but want to 'prefilter' the matches by adding the additional code.
This is probably a slight abuse of regex, but I'm still interested if it's possible. I have tried what I know of subgroups and the following examples but they're not giving me what I need.
Things I've tried:
^ca(cat|car|bat)
(?<=ca(cat|car|bat))
(?<=^ca(cat|car|bat))
It may not be possible but I'm interested in what any regex gurus think. I'm also interested if there is some way of doing this positionally if the length of the initial output is known.
A slightly more realistic example of the initial query might be [a-z]{4}, but if I create (?<=^ca([a-z]{4})) it matches against 6-letter strings starting with ca, not 4-letter ones.
Thanks for any solutions and/or opinions on it.
EDIT: See the solution including @Nick's contribution below. The tool I was testing this with (exrex) seems to have a slight bug that, following the examples given, would create matches 6 characters long.
You were not far off with what you tried; you just need a lookahead assertion rather than a lookbehind, and a parenthesis was misplaced. The fix: put the original pattern in parentheses and prepend (?=ca):
(?=ca)(cat|car|bat)
(?=ca)([a-z]{4})
In the second example (without | alternative), the parentheses around the original pattern wouldn't be required.
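Since the question takes the original regex as an argument in Python, the lookahead approach could be sketched like this. The function name prefilter is hypothetical, and the sketch assumes the original pattern does not begin with global inline flags such as (?i), which cannot be wrapped in a group in recent Python versions.

```python
import re

def prefilter(original: str, prefix: str) -> re.Pattern:
    """Wrap `original` so it only matches where the text starts with `prefix`."""
    # (?=...) is a lookahead: it checks the prefix without consuming it,
    # so the original pattern still has to match the same characters.
    # The non-capturing group keeps alternations like a|b|c together and
    # leaves the original pattern's group numbering unchanged.
    return re.compile(f"(?={re.escape(prefix)})(?:{original})")

words = ["cat", "car", "bat"]
pattern = prefilter("cat|car|bat", "ca")
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
```

Because the lookahead consumes nothing, a wrapped [a-z]{4} still matches exactly four letters, not six.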
OK, thanks to @Armali I've come to the conclusion that (?=ca)(^[a-z]{4}$) works (see https://regexr.com/3f4vo). However, I'm trying this with the great exrex tool to attempt to produce matching strings, and it's producing matches that are 6 characters long rather than 4. This may be a limitation of exrex rather than the regex, which seems to work in other cases.
See @Nick's comment.
I've also raised an issue on the exrex GitHub for this.

How to use a regular expression in notepad++ to change a url

I need some help with our migrated site's URLs. We moved our site from Joomla to WordPress, and in our posts we have over 20K internal links.
The structure of these links is like this:
www.mysite.nl/current-post-title/index.php?option=com_content&view=article&id=5259:related-post-title&catid=35:universum&Itemid=48
What we need is this:
www.mysite.nl/related-post-title
So basically we need to remove everything after www.mysite.nl/ up to and including the first colon :, i.e. remove this: current-post-title/index.php?option=com_content&view=article&id=5259:
And then remove everything behind the first ampersand (including the ampersand itself) until the end of the string, i.e. remove &catid=35:universum&Itemid=48
Of course, only URL strings containing index.php?option=com_content must be changed.
I have dumped the table in plain text and opened it in Notepad++ to do a search and replace with regular expression because the content that must be removed from these lines is different every time.
Can someone please help me with the right regular expression?
In the "Find what" box, enter:
(www\.mysite\.nl)\/.*index\.php\?option=com[^:]+:([^&]+)&.*
In the "Replace with" box, enter:
\1/\2
Result
www.mysite.nl/related-post-title
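To try the replacement on a few sample lines before running it over the whole dump, a small Python sketch follows. It uses the pattern above with the literal dots escaped. The trailing .* consumes the rest of the line, so the sketch assumes nothing else follows the URL on that line. The names JOOMLA_LINK and rewrite_link are made up for illustration.

```python
import re

# Group 1 keeps the host, group 2 keeps the slug between the first colon
# and the next ampersand; everything else in the match is dropped.
JOOMLA_LINK = re.compile(
    r"(www\.mysite\.nl)/.*index\.php\?option=com[^:]+:([^&]+)&.*"
)

def rewrite_link(line: str) -> str:
    """Rewrite an old Joomla-style link; other lines are returned unchanged."""
    return JOOMLA_LINK.sub(r"\1/\2", line)

old = ("www.mysite.nl/current-post-title/index.php?option=com_content"
       "&view=article&id=5259:related-post-title&catid=35:universum&Itemid=48")
print(rewrite_link(old))
```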
Go inside-out rather than outside-in: replace \/.+&id=\d+\:(.+?)&.+ with /$1. Also, paste a few URLs into http://www.regexr.com/ and play around, although JavaScript and Notepad++ differ in some of the regex features they implement, e.g. negative lookbehinds.

RegEx SQL, issue escaping quotes

I am trying to use PSQL, specifically AWS Redshift, to parse a line. Sample data follows:
{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}
{"appId":"sx-voice-call","b.level":76,"foreground":9}
I am trying the following regex in order to extract the appId field, but my query is returning empty fields.
'appId\":\"[\w*]\",'
Query
SELECT app_params,
regexp_substr(app_params, 'appId\":\"[\w*]\",')
FROM sample;
You can do that as follows:
(\"appId\":\"[^"]*\")(?:,)
Demo: http://regex101.com/r/xP0hW3
The first extracted group is what you want.
Your regex was not matching because \w does not match the hyphen (-), and because [\w*] is a character class that matches only a single character.
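The difference can be illustrated in Python. This is only an analogy, since Python's re module and Redshift's regex dialect are not identical, but the character-class behaviour shown here is the same idea. The variable names are made up for the example.

```python
import re

line = '{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}'

# [\w*] is a character class: exactly one character that is a word
# character or a literal '*', so it cannot span the value "sx-calllog".
attempt = re.search(r'appId":"[\w*]",', line)

# [^"]* matches everything up to the closing quote, hyphens included.
fixed = re.search(r'"appId":"([^"]*)"', line)
app_id = fixed.group(1) if fixed else None
print(attempt, app_id)
```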
Adding this here despite this being an old question since it may help someone viewing this down the road...
If your lines of data are valid JSON, you can use Redshift's JSON_EXTRACT_PATH_TEXT function to extract the value of a given key. Emphasis on the JSON being valid: the query will fail if even one line cannot be parsed, and Redshift will throw a JSON parsing error.
Example using given data:
select json_extract_path_text('{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}','appId');
returns sx-calllog
This is especially useful since Redshift does not support lookahead/lookbehind (it uses POSIX regex) or extracting groups.
You can try using lookahead and lookbehind to isolate just the text inside the quotes for the appId: (?<=appId\":\")(?=.*\",)[^\"]*. I tested this out a bit using the examples you provided here.
To explain the regex a bit more: (?<=appId\":\")(?=.*\",)[^\"]*
1. (?<=appId\":\"): positive lookbehind for appId":". Since you don't want the appId text itself returned (just the value), you can preface the regex with a lookbehind to say "find me the following regex, but only when it follows the lookbehind text".
2. (?=.*\",): positive lookahead for the ending ",. You don't want quotes returned in your match, but as with step 1 you want your regex to be bounded a bit, and a lookahead does that.
3. [^\"]*: the actual matching portion. You want to find the string of characters that are NOT ". This matches the entire value and stops right before the closing ".
EDIT: Changed the 3rd step a little bit, removed the , from that last piece, it is not needed and would break the match if the value were to actually contain a ,.
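The lookaround pattern can be tried in an engine that supports lookarounds, for example Python's re module. This sketch uses the two sample lines from the question; the names APP_ID and samples are made up for illustration. Because the lookahead requires a following ",, the pattern assumes appId is not the last key in the object.

```python
import re

# Lookbehind anchors the match right after appId":" and the lookahead
# requires a closing ", somewhere later; neither is part of the result.
APP_ID = re.compile(r'(?<=appId":")(?=.*",)[^"]*')

samples = [
    '{"c.1.mcc":"250","appId":"sx-calllog","b.level":59,"c.1.mnc":"01"}',
    '{"appId":"sx-voice-call","b.level":76,"foreground":9}',
]

# group(0) is the whole match, which is just the value without quotes.
values = [m.group(0) for m in map(APP_ID.search, samples) if m]
print(values)
```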

If duplicate within brackets, delete one of the lines

Hi, I have a long list of items (~6k) that comes in this format:
'Entry': ['Entry'],
What I want to do is this: if the first entries (the part before the colon) match, e.g.:
'ACT': ['KOSOV'],
'ACT': ['STIG'],
I want it to leave only one of the entries. It doesn't matter whether it is the first, the second, or any other; I just need one of them left.
If possible, I would like to accomplish this in Sublime Text or Notepad++ using a regexp. If there is no way, then do whatever you think is best to solve this.
UPD: The AWK command did the job indeed, thank you
You can't solve this using just regular expressions. You either need to remember all entries you've seen so far while scanning the text (would require writing a small utility program, probably), or you could sort the entries and then remove any repeated entries.
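The "remember all entries you've seen so far" approach could look like the following small Python sketch. It assumes the key is everything before the first colon on each line and keeps the first occurrence; the function name drop_duplicate_keys is made up for illustration.

```python
def drop_duplicate_keys(lines):
    """Keep only the first line seen for each key (text before the first colon)."""
    seen = set()
    kept = []
    for line in lines:
        key = line.split(":", 1)[0].strip()
        if key in seen:
            continue  # a line with this key was already kept
        seen.add(key)
        kept.append(line)
    return kept

entries = [
    "'ACT': ['KOSOV'],",
    "'FOO': ['BAR'],",
    "'ACT': ['STIG'],",
]
print(drop_duplicate_keys(entries))
```

Unlike the sorting approach, this preserves the original order of the remaining lines.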
If you have a sorted file, then you can solve it using a regular expression, such as this one:
^(([^:]+):.+\n)(?:\2.+\n)+
Replace with \1.
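For reference, the same sorted-file regex can be exercised in Python. This sketch assumes the input is already sorted, that every line contains a colon, and that the last line ends with a newline (the pattern requires \n after each entry). The names DUPLICATES and collapse_sorted are made up for illustration.

```python
import re

# Group 1 is the first line of a run, group 2 is its key; the trailing
# non-capturing group swallows every following line that starts with that key.
DUPLICATES = re.compile(r"^(([^:]+):.+\n)(?:\2.+\n)+", re.MULTILINE)

def collapse_sorted(text: str) -> str:
    """Collapse each run of same-key lines down to its first line."""
    return DUPLICATES.sub(r"\1", text)

text = "'ACT': ['KOSOV'],\n'ACT': ['STIG'],\n'BAR': ['X'],\n"
print(collapse_sorted(text))
```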

Regular Expression Replace String

I have a rather complicated data file with many rows of many different types. For the particular column I'm interested in, I have a pattern that looks like this:
12.6 \pm 0.8
^^ The number of digits before and after the decimal in each of those pieces of the entry may vary.
I'm hoping I can use regular expressions to replace that column entry to:
[12.6,-0.8,+0.8]
What I am requesting help on is how I should go about replacing once I've found entries like what I had earlier. All of the examples I've found so far are for when you want to replace static strings with other static strings, but for each line I'm necessarily going to have different numbers (and different digits perhaps). The regular expression I've attempted so far to find entries like "12.6 \pm 0.8" is the following:
\d*\.\d*\s\\\w{2})\s\d*\.\d*
I would also appreciate if I could get a check on that, too. At the moment I'm just manipulating the datafile in my text editor, but I'm also open to Python solutions, too.
Thanks!
Your expression is close. Are there any conditions where this won't work?
(\d*\.\d*)\s\\\w{2}\s(\d*\.\d*)
with the replace pattern being (for JS)
[$1,-$2,+$2]
or for emacs (according to http://www.emacswiki.org/emacs/RegularExpression)
[\1,-\2,+\2]
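Since the question is also open to Python, here is a minimal sketch of the same replacement with re.sub, producing the bracketed format requested in the question. The names PM_ENTRY and expand_pm are made up for illustration.

```python
import re

# Group 1 is the central value, group 2 the uncertainty; \\ matches the
# literal backslash and \w{2} the "pm" that follows it.
PM_ENTRY = re.compile(r"(\d*\.\d*)\s\\\w{2}\s(\d*\.\d*)")

def expand_pm(text: str) -> str:
    """Turn 'value \\pm error' into '[value,-error,+error]'."""
    return PM_ENTRY.sub(r"[\1,-\2,+\2]", text)

print(expand_pm(r"12.6 \pm 0.8"))
```

Capturing the two numbers in groups is what lets the replacement reuse them, so each line keeps its own digits.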