How to search and replace 3 jobs at once with regexp - regex

I'm using a program called ASR (Actual Search and Replace), which has some powerful built-in features for searching a text with regexps and replacing it.
I'm using it a lot and I kind of scripted it into my workflow.
The problem is that I need to run three separate replacements to correct a configuration file (strip the "-" from only these three kinds of lines). This is all manual work and very time-consuming.
The following lines occur at random places throughout the config file, and they can occur multiple times with different names and numbers. Each is always on a single line.
<id>filename-33</id>
<source>#filename-33</source>
<url>{filename-33}</url>
The desired output should be:
<id>filename33</id>
<source>#filename33</source>
<url>{filename33}</url>
Both "filename" and the number "33" can be anything (filename is always a lowercase name with no special characters, and the number is always between 0 and 1000).
I know how to find and replace all three lines with:
<source>#(.*)- replace with <source>#$1
<url>{(.*)- replace with <url>{$1
<id>(.*)- replace with <id>$1
But this has to be done in three separate runs.
My question is: is it possible to do the search and replace with just one single find line and one single replace line?
Regards,
Arjan

You can use an alternation (using the | pipe operator) to create a single expression that will match all 3 patterns and create a single replacement.
Replacing this pattern:
(<source>#|<url>{|<id>)([^-]+)-
with $1$2 should result in the correct output.
https://regex101.com/r/mS3mP9/3
Analysis of the expression:
( // begin capturing group
<source># // find the opening <source> tag followed by a #
| <url>{ // ...or find the opening <url> tag followed by a {
| <id> // ...or find the opening <id> tag
) // end capturing group
([^-]+) // capture everything that is not a hyphen
- // match and consume the hyphen
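A minimal Python sketch of the same single-pass replacement, assuming the three sample lines from the question as input. Python writes the back-references as \1\2 rather than $1$2, and the { is escaped here only to be safe. The function name strip_hyphens is hypothetical.

```python
import re

# One alternation covers all three opening tags. Group 1 is the tag plus its
# prefix character (if any); group 2 is everything up to the first hyphen.
PATTERN = re.compile(r"(<source>#|<url>\{|<id>)([^-]+)-")


def strip_hyphens(text: str) -> str:
    """Remove the hyphen after the name in <id>, <source> and <url> lines."""
    # Re-emit both groups and drop the hyphen that was matched after them.
    return PATTERN.sub(r"\1\2", text)


sample = (
    "<id>filename-33</id>\n"
    "<source>#filename-33</source>\n"
    "<url>{filename-33}</url>"
)
print(strip_hyphens(sample))
```

Because [^-]+ also matches line breaks, this sketch relies on every targeted line actually containing a hyphen, as in the question.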

It can be done using ^(<(?:id|source|url)>(#|\{)?\w+)- and replacing it with $1 as shown here.
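As a rough illustration of this second pattern in Python (an assumption, since the question uses ASR): the ^ anchor needs re.MULTILINE to match at the start of every line, and \w+ stops at the hyphen, so lines with other tags are left alone. The <name> line in the sample and the name fix_config are made up for the demonstration.

```python
import re

# ^ anchors at each line start thanks to re.MULTILINE. Group 1 holds the tag,
# the optional # or { prefix, and the name, so only the hyphen is dropped.
LINE_PATTERN = re.compile(r"^(<(?:id|source|url)>(#|\{)?\w+)-", re.MULTILINE)


def fix_config(text: str) -> str:
    """Strip the hyphen from <id>, <source> and <url> lines only."""
    return LINE_PATTERN.sub(r"\1", text)


config = "<id>abc-7</id>\n<name>keep-this</name>\n<url>{abc-7}</url>"
print(fix_config(config))
```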

Related

Finding all XML Files containing specific strings using REGEX

I use VSCode for Salesforce and I have hundreds of fieldsets in the sandbox. I would like to use a regex to find all XML files that contain these 2 words in any order:
LLC_BI__Stage__c
LLC_BI__Status__c
I have tried these regexes but they did not work, I assume because the strings are on different lines:
(?=LLC_BI__Stage__c)(?=LLC_BI__Status__c)
^(?=.*\bLLC_BI__Stage__c\b)(?=.*\bLLC_BI__Status__c\b).*$
(.* LLC_BI__Stage__c.* LLC_BI__Status__c.* )|(.* LLC_BI__Status__c.* LLC_BI__Stage__c.*)
E.g., this XML file contains the 2 strings and should be returned:
<displayedFields>
<field>LLC_BI__Amount__c</field>
<isFieldManaged>false</isFieldManaged>
<isRequired>false</isRequired>
</displayedFields>
<displayedFields>
**<field>LLC_BI__Stage__c</field>**
<isFieldManaged>false</isFieldManaged>
<isRequired>false</isRequired>
</displayedFields>
<displayedFields>
<field>LLC_BI__lookupKey__c</field>
<isFieldManaged>false</isFieldManaged>
<isRequired>false</isRequired>
</displayedFields>
<displayedFields>
**<field>LLC_BI__Status__c</field>**
<isFieldManaged>false</isFieldManaged>
<isRequired>false</isRequired>
</displayedFields>
You could use an alternation to match the two strings in either order and, according to this post, use [\s\S\r] to match any character including newlines.
If there is an issue using [\s\S\r], you might use [\S\r\n\t\f\v ]* instead.
(?:LLC_BI__Stage__c[\S\s\r]*LLC_BI__Status__c|LLC_BI__Status__c[\S\s\r]*LLC_BI__Stage__c)
Explanation
(?: Non capturing group
LLC_BI__Stage__c[\S\s\r]*LLC_BI__Status__c Match first part till second part
| Or
LLC_BI__Status__c[\S\s\r]*LLC_BI__Stage__c Match second part till first part
) Close group
Regex demo 1 and Regex demo 2
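Outside VSCode, the same either-order check could be sketched in Python. This is only an illustration under assumptions: the names contains_both and find_xml_files are hypothetical, the files are assumed to be UTF-8 with an .xml extension, and [\s\S] is used as the "any character including newlines" class.

```python
import re
from pathlib import Path

# Either field name first, then anything (including newlines), then the other.
BOTH = re.compile(
    r"LLC_BI__Stage__c[\s\S]*LLC_BI__Status__c"
    r"|LLC_BI__Status__c[\s\S]*LLC_BI__Stage__c"
)


def contains_both(text: str) -> bool:
    """Return True when both field names occur somewhere in the text."""
    return BOTH.search(text) is not None


def find_xml_files(root: str) -> list[Path]:
    """List the .xml files under root whose content mentions both fields."""
    return [
        path
        for path in sorted(Path(root).rglob("*.xml"))
        if contains_both(path.read_text(encoding="utf-8", errors="replace"))
    ]
```

Calling find_xml_files on the project folder returns the matching paths in sorted order.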

How can I delete this part of the text with regex?

I have a problem that I really hope somebody can help me with. I want to delete some parts of the text from a Notepad++ document using regex. If there's other software I can use to delete this part of the text, please let me know; I am a real beginner with regex.
My document looks like this:
1
00:00:00,859 --> 00:00:03,070
text over here
2
00:00:03,070 --> 00:00:09,589
text over here
3
00:00:09,589 --> 00:00:10,589
some numbers here
4
00:00:10,589 --> 00:00:12,709
Text over here
5
00:00:12,709 --> 00:00:18,610
More text with numbers here
What I want to learn is how to delete the first 2 lines of numbers in every block of the document, so that I get only the text parts (the "text over here" parts).
I would really appreciate any kind of help!
My solution:
^[\s\S]{1,5}\d{1,3}:\d{1,3}:\d{1,3},\d{1,5}\s-->\s*?\d{1,3}:\d{1,3}:\d{1,3},\d{1,5}\s
This solution matches both layouts: either all data on one line, or the numbers on one line and the data on the next.
Demo: https://regex101.com/r/nKD0DQ/1/
Simplest solution:
\d+(\r\n|\r|\n)\d{2}:\d{2}.*(\r\n|\r|\n)
This matches the line with the number, \d+, together with its line break, (\r\n|\r|\n).
It also matches the next line, which starts with two 2-digit numbers separated by a colon, \d{2}:\d{2}, followed by the rest of the line, .*, and its line break. There is no need to match the whole timestamp, since we already know we are on the correct line: a subtitle file has a well-defined, predictable structure.
Put this in the Find what: field under Search -> Replace... in Notepad++, with Search Mode set to Regular expression and the Replace with: field left empty. This gives you the correct result: the lines of expected text, with an empty line between each.
See it in action on regex101.
For subtitles, you can use this for more accuracy:
\d+(\r\n|\n|\r)(\d\d:){2}\d\d,\d{3}\s*-->\s*(\d\d:){2}\d\d,\d{3}(\r\n|\n|\r)
Select Regular expression, put this in Find what, and leave Replace with empty.
Regex demo
SRT subtitles have a regular structure, and it is better to be precise than to lose text.
\d : a single digit.
+ : one or more occurrences of the preceding character or group.
\r\n: carriage return and line feed (a Windows newline).
* : zero or more occurrences of the preceding character or group.
| : or, match either alternative.
{3}: match the preceding character or group exactly three times.
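For anyone who prefers a script to Notepad++, the stricter pattern above can be sketched in Python. The leading ^ with re.MULTILINE is an added assumption (it keeps digits at the end of a text line from being treated as a subtitle index), and the function name strip_srt_headers is hypothetical.

```python
import re

# Subtitle index line followed by the "start --> end" timestamp line.
# \r?\n accepts both Windows and Unix line endings.
SRT_HEADER = re.compile(
    r"^\d+\r?\n"
    r"(?:\d\d:){2}\d\d,\d{3}\s*-->\s*(?:\d\d:){2}\d\d,\d{3}\r?\n",
    re.MULTILINE,
)


def strip_srt_headers(text: str) -> str:
    """Delete the index and timestamp lines, keeping only the subtitle text."""
    return SRT_HEADER.sub("", text)


srt = (
    "1\n00:00:00,859 --> 00:00:03,070\ntext over here\n"
    "2\n00:00:03,070 --> 00:00:09,589\ntext over here\n"
)
print(strip_srt_headers(srt))
```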
I'm going for a less specific regex:
^[0-9]*\n[0-9:,]*\s-->\s[0-9:,]*
Demo on regex101

Remove columns from CSV

I don't know anything about Notepad++ Regex.
This is the data I have in my CSV:
6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|0|0|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389
How can I use a Regex to remove the 5th and 6th column? The numbers in the 5th and 6th column are variable in length.
Another problem is that the User column can also contain a |, which makes it even worse.
I can use a macro to fix this, but the file is a few million lines long.
This is the final result I want to achieve:
6454345|User1-2ds3|62562012032|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1
3305611|User2-42g563dgsdbf|22023001345|c36dedfa12634e33ca8bc0ef4703c92b73d9c433
8749412|User3-9|xgs|f|98906504456|411b0fdf54fe29745897288c6ad699f7be30f389
I am open to suggestions on how to do this with another program or command-line utility, on either Linux or Windows.
Match: \|[^|]+\|[^|]+(\|[^|]+$)
Replace with: $1
Basically, anchor to the end of the line and remove the two columns just before the last one (I assume columns can't be empty; replace + with * if they can).
If you need finer control than that, I'd recommend writing a Java or Python script to manually parse and rewrite the file for you.
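The scripted route mentioned above might look like the following Python sketch. It counts columns from the right, so a | inside the user name does no harm, and it streams the file line by line, which matters for a file with millions of lines. The function names and the UTF-8 encoding are assumptions.

```python
def drop_columns(line: str) -> str:
    """Drop the two |-separated fields just before the last field of a line."""
    # Split from the right so extra | characters in the user name are ignored.
    parts = line.rstrip("\r\n").rsplit("|", 3)
    if len(parts) < 4:
        # Too few fields to remove anything; return the line unchanged.
        return line.rstrip("\r\n")
    head, _fifth, _sixth, last = parts
    return f"{head}|{last}"


def rewrite_file(src: str, dst: str) -> None:
    """Stream src to dst one line at a time, dropping the two columns."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(drop_columns(line) + "\n")


print(drop_columns("8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389"))
```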
I've captured three groups and given them names. If you use a replace utility like sed or Vim's regex, you can replace the remove group with nothing. Or you can use a programming language to concatenate keep_before and keep_after for the desired result.
^(?<keep_before>(?:[^|]+\|){3})(?<remove>(?:[^|]+\|){2})(?<keep_after>.*)$
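You may have to remove the group names and use \1 etc. instead, depending on what environment you use. Note that this pattern counts three columns from the left, so it does not handle lines where the user name itself contains a |.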
Demo
In Notepad++, press Ctrl+H, then enter the following in the dialog:
Find what: \|\d+\|\d+(\|[0-9a-z]+)$
Replace with: $1
Search mode: Regular Expression
Click replace and done.
Regex explanation:
\|\d+ : matches the 1st field, a | followed by a number
\|\d+ : matches the 2nd field, a | followed by a number
(\|[0-9a-z]+): matches and captures the field after the 2nd number.
$ : forces the match to end at the end of the line.
Replacement:
$1 : replaces the matched text with the contents of the capture group, i.e. whatever matched between the parentheses (\|[0-9a-z]+)
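To try this find/replace pair outside Notepad++, a small Python sketch could be used. It assumes Unix line endings (with \r\n the $ would not match before the \r), and re.MULTILINE makes $ match at the end of every line. The name remove_two_columns is hypothetical.

```python
import re

# Two numeric fields followed by a final lowercase-hex field at end of line.
# Only the final field is captured and kept.
TAIL = re.compile(r"\|\d+\|\d+(\|[0-9a-z]+)$", re.MULTILINE)


def remove_two_columns(text: str) -> str:
    """Replace '|num|num|hash' at the end of each line with '|hash'."""
    return TAIL.sub(r"\1", text)


rows = (
    "6454345|User1-2ds3|62562012032|324|148|9c1fe63ccd3ab234892beaf71f022be2e06b6cd1\n"
    "8749412|User3-9|xgs|f|98906504456|1534|51564|411b0fdf54fe29745897288c6ad699f7be30f389"
)
print(remove_two_columns(rows))
```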

RegExp , Notepad++ Replace / remove several values

I have this dataset: (about 10k times)
<Id>HOW2SING</Id>
<PopularityRank>1</PopularityRank>
<Title><![CDATA[Superior Singing Method - Online Singing Course]]></Title>
<Description><![CDATA[High Quality Vocal Improvement Product With High Conversions. Online Singing Lessons Course Converts Like Crazy Using Content Packed Sales Video. You Make 75% On Every Sale Including Front End, Recurring, And 1-click Upsells!]]></Description>
<HasRecurringProducts>true</HasRecurringProducts>
<Gravity>45.9395</Gravity>
<PercentPerSale>74.0</PercentPerSale>
<PercentPerRebill>20.0</PercentPerRebill>
<AverageEarningsPerSale>74.9006</AverageEarningsPerSale>
<InitialEarningsPerSale>70.1943</InitialEarningsPerSale>
<TotalRebillAmt>16.1971</TotalRebillAmt>
<Referred>75.0</Referred>
<Commission>75</Commission>
<ActivateDate>2011-06-23</ActivateDate>
</Site>
I am trying to do the following:
Get the data from within the <Id> tags and use it to create a URL, so in this example it should produce
http://www.reviews.how2sing.domain.com
Also, all other data has to go; I want to perform a regex replacement that will just give me a list of URLs.
I prefer to do it using Notepad++, but I am bad at regex. Any help would be welcome.
To keep the regex relatively simple you can just use:
.*?<id>(.+?)</id>
Replace with:
http://www.reviews.\1.domain.com\n
That will search for and replace every Id tag together with the text preceding it. You can then just remove the leftover text after the last match manually.
Make sure ". matches newline" is selected.
The regex is straightforward; the only slightly tricky part is that it uses +? and *?, which are non-greedy. This prevents the whole file from being matched. The () indicate a capture group that is used in the replacement, i.e. \1.
If you want a regex that also replaces the last part, then use:
.*?(?:(<id>)?(.+?)</id>).+?(?:<id>|\Z)
This is a bit more tricky; it uses:
(?:...) a non-capturing group
| OR
\Z end of file
Basically, the first time it will match everything up to the end of the first </id> and replace up to and including the next <id>. After that it will have replaced the starting <id> so everything before </id> goes in the group. On the last match it will match the end of file \Z.
If you only want the Id values, you can do:
'<Id>([^<]*)<\/Id>'
Then you can get the first captured group \1 which is the Id text value and then create a link from it.
Here is a demo:
http://regex101.com/r/jE9qN8
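If a script is acceptable instead of Notepad++, the "capture the Id, then build a link" idea can be sketched in Python. The lowercase conversion is inferred from the example URL in the question (HOW2SING becomes how2sing), and the function name build_urls is hypothetical.

```python
import re

# Capture whatever sits between <Id> and </Id>.
ID_PATTERN = re.compile(r"<Id>([^<]*)</Id>")


def build_urls(xml_text: str) -> list[str]:
    """Return one review URL per <Id> element, in document order."""
    return [
        f"http://www.reviews.{site_id.lower()}.domain.com"
        for site_id in ID_PATTERN.findall(xml_text)
    ]


snippet = "<Id>HOW2SING</Id>\n<PopularityRank>1</PopularityRank>\n</Site>"
print("\n".join(build_urls(snippet)))
```

Since findall only returns the captured Ids, all other data is discarded without a separate clean-up pass.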
[UPDATE]
To get rid of all other lines, match this regex: '.*<Id>([^<]*)<\/Id>.*' and replace it with the first captured group \1. Note that since the match spans multiple lines, you will need to have the DOTALL or /s flag activated so that . also matches newlines.
Hope that helps.

Remove all lines that don't match regex in Notepad++

I have a range of files of a specific format. I have pasted an example here.
----------------------------------------------
Attempting to factor n = 160000000000110400000000018963... Found a factor: 400000000000129
Time (s): 18.9561
----------------------------------------------
Attempting to factor n = 164025000000137700000000028819... Found a factor: 405000000000179
Time (s): 22.3426
----------------------------------------------
Attempting to factor n = 168100000000155800000000036051... Found a factor: 410000000000197
Time (s): 101.183
I would like a regular expression that I can use to capture the times, e.g. for all the lines with the format "Time (s): X.Y" I want to keep X.Y on a separate line, and throw EVERYTHING ELSE away.
I have the following expression: Time (s):\s+(\d+.\d+), which captures the lines I need, but Notepad++ only seems to have functionality to replace the match with something, not to save what it matches. So I can remove all those lines, which is nearly the opposite of what I want.
Any help?
Well, I don't know Notepad++, but it's likely that you can use the result of capture groups in the replacement field. Either try
\1
or
$1
where 1 means the first capture group. In your expression as written the time is in the second group, so you basically replace the whole line with \2.
Use this on the command line:
for /f "usebackq tokens=3" %a in (`findstr /b "Time" 1.txt`) do @echo %a
Follow next steps (Notepad++ 6.2.3):
Clean and mark
Replace: ^(Time \(s\):)+ ([.\r]*) with: #\2
Remove unmarked lines
Replace: ^[^#]+[.\n]* with: (empty)
Remove mark
Replace: ^#(.*) with: \1
Use the following expression to match the entire line:
.*\(s\)\:\s+(\d+.\d+)
Now you can replace this with
\1
which gives you the matched group number 1 (the only group in the above expression) that matches the time
Adjust your regular expression so it either matches a "Time" line and captures the time within, or matches the whole line. Then replace with the captured text, which will be blank for ignored lines.
Find what: (Time \D+(\d+.\d+)|.*)
Replace with: \2
This leaves you with a sequence of captured times plus blank lines, which can be removed using TextFX's Remove blank lines, or Extended Replace on "\r\n\r\n".
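The "match a Time line or else the whole line" idea can be tried in Python as a rough sketch. Two small liberties are taken: the dot between the digits is escaped, and a replacement function returns an empty string for lines where the inner group did not take part in the match. The name extract_times is hypothetical.

```python
import re

# Either a "Time ..." line whose number is captured in group 2, or any line.
TIME_OR_LINE = re.compile(r"(Time \D+(\d+\.\d+)|.*)")


def extract_times(text: str) -> list[str]:
    """Keep only the numeric time from each 'Time (s): X.Y' line."""
    # Lines without a time are replaced by nothing, leaving blank lines.
    replaced = TIME_OR_LINE.sub(lambda m: m.group(2) or "", text)
    # Dropping the blank lines mirrors the final clean-up step.
    return [line for line in replaced.splitlines() if line]


log = (
    "----------------------------------------------\n"
    "Attempting to factor n = 160000000000110400000000018963... Found a factor: 400000000000129\n"
    "Time (s): 18.9561\n"
    "----------------------------------------------\n"
    "Attempting to factor n = 168100000000155800000000036051... Found a factor: 410000000000197\n"
    "Time (s): 101.183\n"
)
print(extract_times(log))
```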
Similar to MaurizioRam's answer (which led me to figuring out this answer), you can take advantage of the "Mark" tab in the Find window.
As you probably know Ctrl+F opens a window with Find and Replace tabs. It also has tabs Find In Files, Find In Projects, and Mark.
Mark will let you add a special highlight (a mark) to everything your regex matches, by pressing "Mark All".
After pressing "Mark All" you can "Copy Marked Text" which will copy everything that your regex matched into your clipboard.
You can now paste this into a new file, which will give you a file with only the text your regex matched.