If duplicate within brackets, delete one of the lines - regex

Hi, I have a long list of items (~6k) that comes in this format:
'Entry': ['Entry'],
What I want to do is this: if the words in the first field match, i.e.:
'ACT': ['KOSOV'],
'ACT': ['STIG'],
I want to keep only one of the entries. It doesn't matter whether it is the first, the second, or any other; I just need one of them to remain.
If possible, I would like to do this in Sublime Text or Notepad++ using a regular expression. If there is no way, then suggest whatever you think is best to solve this.
UPD: The AWK command did the job indeed, thank you

You can't solve this using just regular expressions. You either need to remember all entries you've seen so far while scanning the text (would require writing a small utility program, probably), or you could sort the entries and then remove any repeated entries.
If you have a sorted file, then you can solve it using a regular expression, such as this one:
^(([^:]+):.+\n)(?:\2.+\n)+
Replace with \1.
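The "remember all entries you've seen so far" approach mentioned above can be sketched as a small script. This is a minimal sketch in Python, assuming every entry sits on its own line in the 'KEY': ['VALUE'], format from the question; the names KEY_RE and drop_duplicate_keys are made up for illustration.

```python
import re

# Matches lines such as  'ACT': ['KOSOV'],  and captures the text in the first quotes.
KEY_RE = re.compile(r"^\s*'([^']+)'\s*:")

def drop_duplicate_keys(lines):
    """Keep only the first line seen for each key; pass other lines through."""
    seen = set()
    kept = []
    for line in lines:
        match = KEY_RE.match(line)
        if match is None:
            kept.append(line)  # not an entry line, leave it untouched
            continue
        key = match.group(1)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept

sample = [
    "'ACT': ['KOSOV'],",
    "'ACT': ['STIG'],",
    "'BOB': ['ROSS'],",
]
print("\n".join(drop_duplicate_keys(sample)))
```

Unlike the regex on a sorted file, this keeps the original line order and does not require sorting first.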

Related

How to extract out specific substrings?

I have these long strings that have multiple substrings in them all separated by periods. The good news is I've found out how to extract most of the substrings on the left or right of the strings by using functions like left, mid, right, regexextract, find, len, and substitute, but I just can't figure out this last problem.
The problem with these substrings is sometimes some are there, sometimes none are there (most I've seen at once is, I think, 3). And other than being in all caps, which some of the other substrings that I don't want are also in, I don't think there's any regex pattern one could use except something like string1|string2|string3, etc all the way up to maybe string30.
I first thought it would be best to just have a formula look at the string, compare it to a range on another sheet, and if there was something in the range that was in the string, then show it. But I was lost on how to do that. Then I figured just put the whole range list in a regex and somehow extract any substrings that were in the string.
And that worked, but it would only extract the first substring it found whereas I wanted it to extract all the substrings it found. And while I think I'd prefer the substrings to be put into different columns (not rows) by using the Split function, I'd settle for them all being put in the same cell via the Textjoin function.
The farthest I've gotten is
=split(REGEXextract(A2,"\b(?:string1|string2|string3)\b")," ")
but like I said that only spits out the first substring it finds. And I've seen some people use REGEXreplace with Split and ArrayFormula and sometimes double REGEX functions, but I just can't seem to make those work for my purposes.
I'm doing this in Google Sheets, but even an Excel or LibreOffice answer will probably be helpful, as I can probably turn it into a Google Sheets solution. I realize I could just make a simple regexextract in 30 or so columns, but I'd really rather not do that. Thanks in advance, even if you just give me an idea of what direction to head in.
You could try something like this, which splits the string on periods and keeps only the parts that match your list of wanted substrings. Replace F1:F2 with the range where you store the values you want to find, and A3 with the cell that holds the full string. If you need to, you can apply this to a whole column with MAP or BYROW, for example.
=filter(split(A3,"."),INDEX(REGEXMATCH(SPLIT(A3,"."),JOIN("|",F1:F2))))
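For readers outside a spreadsheet, the same split-then-filter logic can be sketched in Python. This is only an illustration: the function name extract_known and the sample data are hypothetical, and unlike the JOIN in the formula, the sketch escapes each lookup value so that it is matched literally.

```python
import re

def extract_known(text, wanted):
    """Return the period-separated parts of text that contain any wanted value."""
    pattern = re.compile("|".join(re.escape(w) for w in wanted))
    return [part for part in text.split(".") if pattern.search(part)]

# Hypothetical data: the row string and the lookup list are made up for illustration.
row = "abc.STRING1.xyz.STRING3.OTHER"
lookup = ["STRING1", "STRING2", "STRING3"]
print(extract_known(row, lookup))
```

Each matching part comes back as a separate list element, which corresponds to the separate columns the formula produces.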

Find and replace with regular expression in Notepad++

At the moment, I have a PHP function that gets the contents of a CSV file and puts it into a multi-dimensional array, which contains text that I print out in various places, using the indexes.
an example of use would be:
$localText[index][pageText][conceptQualityText][$lang];
The first index, [index], would be the name of the page. The second index [pageText] would indicate what it is (text for the page). The third index, [conceptQualityText] indicates what the actual text is. The last index, [$lang] gets the text in the desired language.
so:
->page location
->what is it
->the content
->what language it should be displayed in.
This all worked fine in previous PHP versions. However, after upgrading to 7.2, PHP seems to be a bit stricter. I was a bit more green ~2 years ago when I first made this solution, and I now know that since these indexes aren't written as strings, i.e. enclosed in single quotes like ['index'], they fit the notation of a constant (as created with define()). I didn't give it much thought back then, but PHP now interprets them as constants, so I get a warning that word x is an undefined constant.
My initial thought is to make a search and replace on my example string:
$localText[index][pageText][conceptQualityText][$lang];
using the regular expression functionality in Notepad++.
However, the example is just one of many, the notation of the array indexing is basically:
$localText[index][index2][index3][$lang];
So my question is:
How can I use the Notepad++ search and replace, with a regular expression, so that my index pointers become strings instead of being treated as constants?
e.g. make:
$localText[index][index2][index3][$lang];
into:
$localText['index']['index2']['index3'][$lang];
I will need some sort of logic that checks for whatever is inside the brackets and encapsulates them with single quotes, except for the last index, [$lang].
I tried to give as much information as possible, let me know if anything needs to be elaborated.
I tried to refer to these docs without much luck.
I found a solution using this:
find: \b(localText\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)(\]\[)([a-zA-Z0-9_\-]+)
replace: $1'$2'$3'$4'$5'$6'
and it works like a charm. Thanks to everyone who took the time to help.
You can use the following regex to match:
\[(\w+)\]
The regex matches a word between square brackets unless it is already quoted, because \w matches neither a quote nor a '$' sign.
Replace with:
['$1']
The regex will not match the last pair of brackets because it contains a '$' sign.
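A bracket-quoting pattern like this is easy to check in a script before running it over real files. The following is a minimal sketch in Python, assuming the pattern \[(\w+)\] and the example line from the question; BARE_INDEX and quote_indices are names made up for illustration. Note that Python writes the backreference as \1 where Notepad++ uses $1.

```python
import re

# A bare word between square brackets; \w excludes quotes and the $ sign.
BARE_INDEX = re.compile(r"\[(\w+)\]")

def quote_indices(code):
    """Wrap every bare bracket index in single quotes."""
    return BARE_INDEX.sub(r"['\1']", code)

line = "$localText[index][pageText][conceptQualityText][$lang];"
print(quote_indices(line))
```

Running the function a second time on its own output changes nothing, since quoted indexes no longer match. Purely numeric indexes such as [0] would be quoted too, which may or may not be wanted.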

Regex: Replace every char in the search string IF they're found in order

I am building a search functionality and I am trying to make it similar to the one in Sublime Text.
Assume "cmd" as the input string and "command" is one of the results.
To search the files, among other things, I split that input into characters and end up with the following regex: c.*?m.*?d. This part is successful in finding files like "command". However, when I use the same regex to replace the found string with some HTML elements, to highlight that the searched string was found in that particular item, it results in something like this:
<span>command</span>
I understand exactly why this is happening, and I'm looking for an alternative that displays something like the following to the user:
<span>c</span>o<span>m</span><span>m</span>an<span>d</span>
Or, maybe just:
<span>c</span>o<span>m</span>man<span>d</span>
I have an idea of how to do this, which is to wrap every single character in parentheses and then replace each one with the <span>$x</span> part, but I'm not sure exactly how to do it.
Any kind of help is immensely appreciated.
Thanks,
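The capture-group idea described in the question can be sketched as follows. This is a minimal sketch in Python rather than the asker's own implementation: the function name highlight is hypothetical, matching is case-sensitive, and both the matched letters and the gaps between them are captured so the string can be rebuilt piece by piece.

```python
import html
import re

def highlight(query, candidate):
    """Wrap each query character in <span> where it matches in order, or return None."""
    # Build e.g. (c)(.*?)(m)(.*?)(d) so matched letters and gaps are both captured.
    pattern = "(.*?)".join("(" + re.escape(ch) + ")" for ch in query)
    match = re.search(pattern, candidate)
    if match is None:
        return None
    pieces = [html.escape(candidate[:match.start()])]
    for i, group in enumerate(match.groups()):
        text = html.escape(group)
        # Even-indexed groups are query characters, odd-indexed groups are the gaps.
        pieces.append("<span>" + text + "</span>" if i % 2 == 0 else text)
    pieces.append(html.escape(candidate[match.end():]))
    return "".join(pieces)

print(highlight("cmd", "command"))
```

Because the gaps are matched lazily, each query character is highlighted at its earliest possible position, which gives the second of the two desired outputs.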

Notepad++ Regex - Finding and replacing multiple different criteria simultaneously

I've just started to get to grips with regex in Notepad++, and I've tasked myself with formatting a chunk of JSON data into something human readable, as well as something that can be read by an algorithm a colleague of mine wrote. I've found a few regular expressions that do this perfectly, but to get to my desired result I have to do it in four separate Find/Replace steps. Is there some way I can create one single find/replace expression that handles all of these tasks for me?
Currently I have Notepad++ doing the following:
Deleting all quotation marks by finding " and replacing it with nothing.
Deleting all commas by finding , and replacing it with nothing.
Changing all underscored numbers that are followed by a colon to the number 0 (the reason behind this is particular to the project) by finding _[0-9]*: and replacing with _0.
Finally, putting each occurrence of a particular expression onto its own line by finding the start of the particular string I'm after and adding \n.
I know that's convoluted, but fortunately it does the job. Is there any way of consolidating all that into a single command, or does that all have to be done step by step?
Thanks guys :)
Notepad++ allows you to consolidate individual search and replace operations into a macro, which you can also save.
Hit the record button in the toolbar (or Macro > Start Recording).
Perform the regex replacements in the required order.
Hit the stop button in the toolbar (or Macro > Stop Recording).
Hit the play button to perform all the required replacement operations again.
Save the macro by going into the Macro menu and choosing 'Save Current Recorded Macro'.
As for the first two replacements, you could combine them using the following expression: (?:"|,)
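If the clean-up ever needs to run outside Notepad++, the same ordered sequence of replacements can be sketched as a short script. This is a minimal sketch in Python under stated assumptions: the marker string "record" stands in for the "particular string" in the question, which is not given, and the sample input is invented.

```python
import re

# Each pair is (pattern, replacement), applied in order like the recorded macro.
STEPS = [
    (r'[",]', ""),             # delete quotation marks and commas in one step
    (r"_[0-9]*:", "_0"),       # underscored number followed by a colon becomes _0
    (r"record", "\nrecord"),   # "record" is a hypothetical marker string
]

def clean(text):
    """Apply every replacement step in sequence and return the result."""
    for pattern, replacement in STEPS:
        text = re.sub(pattern, replacement, text)
    return text

print(clean('"record_12:","a","record_7:","b"'))
```

Keeping the steps in a list preserves their order, which matters here just as it does when recording the macro.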

How do I join two regular expressions into one in Notepad++?

I've been searching a lot in the web and in here but I can't find a solution to this.
I have to make two replacements in all registry paths saved in a text file as follows:
replace all asterisks with: [#42]
replace all single backslashes with two.
I already have two expressions that do this right:
1st case:
Find: (\*) - Replace: \[#42\]
2nd case:
Find: ([^\\])(\\)([^\\]) - Replace: $1$2\\$3
Now, all I want is to join them into a single expression so that I can run it just once.
I'm using Notepad++ 6.5.1 in Windows 7 (64 bits).
Example line in which I want this to work (I include backslashes but i don't know if they will appear right in the html):
HKLM\SOFTWARE\Classes\*\shellex\ContextMenuHandlers\
I already tried separating them with a pipe, like I do in JScript (WSH), but it doesn't work here. I also tried a lot of other things, but none worked.
Any help?
Thanks!
Edit: I have put all the backslashes right, but the page html seem to be "eating" some of them!
Edit2: Someone reedited my text to include an accent that doesn't remove the backslashes, so the expressions went wrong again. But I got it and fixed it. ;-)
Sorry, but this was my first post here. :)
As everyone else already mentioned this is not possible.
But, you can achieve what you want in Notepad++ by using a Macro.
Go to "Macro" > "Start Recording" menu, apply those two search and replace regular expressions, press "Stop Recording", then "Save Current Recorded Macro", there give it a name, assign a shortcut, and you are done. You now can reuse the same replacements whenever you want with one shortcut.
Since your replacement strings are totally different and use data that does not come from any capture (e.g. [#42]), you can't.
Keep in mind that replacement strings are only masks, and can not contain any conditional content.
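Outside Notepad++, a scripting language can make both replacements in one pass, because the replacement can be a function instead of a fixed mask. The following is a minimal sketch in Python, not a Notepad++ feature; TOKEN and escape_path are names made up for illustration. It uses lookarounds to find single backslashes, so unlike the second expression in the question it also doubles a backslash at the very start or end of a line.

```python
import re

# Match either an asterisk or a backslash that has no backslash on either side.
TOKEN = re.compile(r"\*|(?<!\\)\\(?!\\)")

def escape_path(path):
    """Replace each asterisk with [#42] and double each single backslash."""
    def replace(match):
        # The string returned by a replacement function is inserted literally.
        return "[#42]" if match.group() == "*" else "\\\\"
    return TOKEN.sub(replace, path)

# The example line from the question; each \\ in the source is one backslash.
line = "HKLM\\SOFTWARE\\Classes\\*\\shellex\\ContextMenuHandlers\\"
print(escape_path(line))
```

Backslashes that are already doubled are left alone, so running the function on its own output changes nothing.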