Remove texts based on pattern - regex

I have a file with lots of URLs:
domain1.com/blue
domain1.com/blue/
domain2.com/red
domain2.com/red/
...
[etc]
Is there a way for me to use a regex to keep ONLY the "domain1.com/blue" type of text, but DELETE "domain1.com/blue/"?
The pattern is that these URLs come in pairs that are identical except that some have a "/" at the end; basically I want to remove all the URLs that have the "/" at the end but keep the ones without it.
In the end the file should only contain these:
domain1.com/blue
domain2.com/red
...
[etc]
Thank you so much for the help! If anyone has an idea how to do this, it'd be awesome!

There are two things that I can think of that you could do.
1. Match all the lines in the file that you want to delete (the ones ending in "/") and replace each match with an empty string (this leaves a blank line behind unless the line break is matched too).
The regex for this is /^.*\/$/, used with the multiline flag so that ^ and $ match at every line.
2. Match only the lines you want to keep and save them to a new file.
The regex is /^((?!(\/$)).)*$/, again with the multiline flag.
You can paste these into a regex explainer for an in-depth explanation of what they're doing.
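As a rough illustration of the two options above, here is a minimal Python sketch. The question never names a language, so Python, the sample text, and the variable names are all assumptions. The first pattern is extended with an optional \n so that no blank lines are left behind, and the second uses + instead of * so that empty lines are not reported as matches.

```python
import re

# Sample input taken from the question; a real file would be read from disk.
text = "domain1.com/blue\ndomain1.com/blue/\ndomain2.com/red\ndomain2.com/red/\n"

# Option 1: delete every line that ends with "/", together with its line break.
removed = re.sub(r"^.*/$\n?", "", text, flags=re.MULTILINE)

# Option 2: collect only the lines that do not end with "/".
kept = re.findall(r"^(?:(?!/$).)+$", text, flags=re.MULTILINE)

print(removed, end="")
print(kept)
```

Both variants rely on the multiline flag; without it, ^ and $ only match at the start and end of the whole string.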

Unfortunately, you did not specify which language you are using with regex, so I can't give you language-specific details. But a line that ends with / followed by possibly one or more white space characters can be tested for with the following regular expression:
/\/\s*$/
So read in each line and test it against the above regex. If it matches, do not write it out to the new file.
See Regex Demo
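Since no language was specified, the following is only a sketch of that read-and-test loop in Python. The function name and the sample lines are made up for illustration; with a real file you would iterate over the open file object and write the kept lines to a second file.

```python
import re

# A "/" followed only by optional whitespace at the end of the line.
TRAILING_SLASH = re.compile(r"/\s*$")

def keep_lines_without_trailing_slash(lines):
    """Return the lines that do not end in "/" (ignoring trailing whitespace)."""
    return [line for line in lines if not TRAILING_SLASH.search(line)]

sample = [
    "domain1.com/blue\n",
    "domain1.com/blue/ \n",
    "domain2.com/red\n",
    "domain2.com/red/\n",
]
filtered = keep_lines_without_trailing_slash(sample)
print("".join(filtered), end="")
```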

Related

Perl, replace multiple matches in string

So, I'm parsing an XML file and have run into a problem. The XML has objects containing a script, which looks roughly like this:
return [
['measurement' : org.apache.commons.io.FileUtils.readFileToByteArray(new File('tab_2_1.png')),
'kpi' : org.apache.commons.io.FileUtils.readFileToByteArray(new File('tab_2_2.png'))]]
I need to replace all the file names while keeping the file extension, and I need to do it for every match of the regexp, because the string can look like this:
['measurement' : org.apache.commons.io.FileUtils.readFileToByteArray(new File('tab_2_1.png'))('tab_2_1.png'))('tab_2_1.png')),
and I still need to replace every image name before .png
I used this regexp : .*\(\'(.*)\.png\'\),
but it catches only the last match in the line before \n, not every match in the whole string.
Can you help me with correcting this regexp?
The problem is that .* is greedy: it matches everything it can. So .*x matches everything up to the very last x in the string, even if the text before it contains other x's. You need the non-greedy version:
s/\('(.*?)\.png/('$replacement.png/g;
where the ? makes .* match up to the first .png. The \(' are needed to suitably delimit the pattern to the filename. This correctly replaces the filenames in the shown examples.
Another way to do this is \('([^.]*)\.png, where [^.] is the negated character class, matching anything that is not a .. With the * quantifier it again matches all up to the first .png
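To make the difference visible, here is a small sketch that runs the greedy, non-greedy and negated-class patterns on a shortened version of the line from the question. It is written in Python rather than Perl purely for illustration, and the replacement name "image" is a made-up placeholder.

```python
import re

line = "new File('tab_2_1.png'))('tab_2_1.png'))('tab_2_1.png')),"
replacement = "image"  # hypothetical new base name

# Greedy: .* runs to the last ".png", so the whole line yields a single match.
greedy = re.findall(r"\('(.*)\.png", line)

# Non-greedy: .*? stops at the first ".png", so every file name is replaced.
lazy = re.sub(r"\('(.*?)\.png", "('" + replacement + ".png", line)

# Negated character class: [^.]* cannot run past the dot before "png".
negated = re.sub(r"\('([^.]*)\.png", "('" + replacement + ".png", line)

print(len(greedy))
print(lazy)
print(negated)
```

Python's re.sub replaces every match by default, which corresponds to the /g modifier in the Perl substitution.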
The question doesn't say how exactly you are "parsing an XML" but I dearly hope that it is with libraries like XML::LibXML or XML::Twig. Please do not attempt that with regex. The tool is just not fully adequate for the job, and you'll get to know about it. A lot has been written over the years about this; search SO.

Regex to match "Warm Regards"-type email signatures

I am an absolute regex noob and have been banging my head against the wall trying to write a regex to remove email signatures from a string that look like this:
Hi There, this is an email.
Warm Regards,
Joe Bloggs
Thus far, I’ve tried variations on:
/^[\w |][R|r]egards,/
The regex should:
look at the beginning of the line (what I was aiming for with the ^),
cover variations like “Warm Regards”, “Kind Regards”, “Best Regards”, and plain old “Regards” (which I was hoping to accomplish with the [\w |] to match any word or blank and the [R|r] to cover Regards/regards),
be OK with mixed case like “warm regards” or “Warm Regards”, and
only pickup lines that are [word] Regards or just regards, so that we don’t grab email body that has the word “regards” somewhere in it.
This seems elementary, but I just can’t nail it, and I seem to err on broadening my regex too much such that any line that contains “regards” gets picked up. I’m doing this in Node.js combined with the string.search function if that matters.
This seems to fit all your requirements:
^(\w*\s)?[rR]egards,?
Has to start on a new line, then can have any word followed by a space, and the word regards, or just the word regards, with the comma also being optional. (Inside a character class | is a literal character, so [rR] is all that is needed.)
If you want to wipe out everything after the regards line as well you can add in \s*.*
^(\w*\s)?[rR]egards,?\s*.*
If you are trying to remove everything from the Warm Regards line on, this should do it:
^[^<]*?(?=(.*)[Rr]egards)
Try the following regular expression
^\w* ?regards,?
with the case-insensitive and global flags specified.
You can see the regular expression explanation and what it matches here: http://regex101.com/r/vR3zG5
The regular expression that matches signatures as defined in #1-#4 is the following:
/^(\w+ +)?regards,? *$/im
How it works:
"^" at the beginning means the start of a line
"(\w+ +)?" means optional segment that contains exactly one word followed by at least one space
"regards" is just a simple match
",?" optional comma at the end
" *" - the line may contain trailing spaces (it may be useful to put the same match after ^)
"$" - end of line
/.../i - means that the expression is case-insensitive
/.../m - means that ^ and $ match at line breaks
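The question uses Node.js, but the same pattern can be tried out in Python with almost no changes. The sketch below is only an illustration: the function name strip_signature and the decision to cut the text at the first sign-off line are assumptions, not something the answers above specify.

```python
import re

# Same pattern as above; re.IGNORECASE and re.MULTILINE play the role of /i and /m.
SIGNOFF = re.compile(r"^(\w+ +)?regards,? *$", re.IGNORECASE | re.MULTILINE)

def strip_signature(body):
    """Cut the text at the first sign-off line, if one is found."""
    match = SIGNOFF.search(body)
    if match is None:
        return body
    return body[:match.start()].rstrip()

email = "Hi There, this is an email.\nWarm Regards,\nJoe Bloggs"
print(strip_signature(email))
```

A body line such as "With regards to your email, thanks." is left alone, because the pattern only allows an optional comma and spaces after the word "regards".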

Regex Match That doesn't contain some text

I am trying to create a regex that finds a Start Prefix and an End Prefix that have paragraph tags between them. But the one I have created is not working as I expected.
%%%HL_START%%%(.*?)</p><p>(.*?)%%%HL_END%%%
Correctly Matches
<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% SHould Match</p>
This also matches, but I don't want it to match because the </p><p> is not in between the Start and End Prefix
<p>%%%HL_START%%%One%%%HL_END%%% Some More Text %%%HL_START%%%Here%%%HL_END%%%</p><p>Some more text %%%HL_START%%%Here%%%HL_END%%%</p>
I'm not entirely comfortable that regex is the right solution here; if you are getting into nested start and stop markers, you might not have a regular language...
For this specific example, try changing the regex to use [^%] instead of . so that the lazy .*? matching can't go past the %%%HL_END%%% marker:
%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%
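A quick way to check the revised pattern is to run it against the two strings from the question. The sketch below uses Python as an assumed language, since the question does not name one. One caveat: highlighted text that itself contains a % character would no longer match with this approach.

```python
import re

pattern = re.compile(r"%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%")

# Should match: the paragraph break sits between one START/END pair.
good = "<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% SHould Match</p>"

# Should not match: every START/END pair stays inside a single paragraph.
bad = (
    "<p>%%%HL_START%%%One%%%HL_END%%% Some More Text "
    "%%%HL_START%%%Here%%%HL_END%%%</p>"
    "<p>Some more text %%%HL_START%%%Here%%%HL_END%%%</p>"
)

good_match = pattern.search(good)
bad_match = pattern.search(bad)
print(good_match.groups())
print(bad_match)
```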

regular expression for multiple filenames

I have some files like this:
15.58.55.ser 16.22.20.ser 16.36.23.ser 16.40.13.ser 16.59.41.ser 17.05.08.ser 17.14.40.ser 18.14.40.ser 18.20.43.ser
I want to replace these filenames with the following format
image_1.ser image_2.ser ....
I don't know how to achieve this.
Please give me some advice.
The regex is quite simple:
(?:\d{2}\.){3}ser
It matches two digits \d{2} and a dot \. three times {3}, ending in ser.
You can see from RegExr that it matches all of your test cases.
However, in order to know how to do the replacement, you'd have to specify a language that you're working with.
Try this (if you need Java code):
String regex = "\\.ser";
String fileName = "15.58.55.ser";
// Take the part before ".ser" and replace it literally with the new base name.
System.out.println(fileName.replace(fileName.split(regex)[0], "image_1")); // image_1.ser
This handles only one entry. If you want to rename multiple files, do it in a for loop.
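For the multiple-file case, one possible approach is sketched below in Python. The function name plan_renames and the choice to number the files in sorted order are assumptions; the question does not say how the numbers should be assigned. The sketch only computes the new names and does not touch the file system.

```python
import re

# Two digits and a dot, three times, followed by "ser" (the regex from above).
SER_NAME = re.compile(r"(?:\d{2}\.){3}ser")

def plan_renames(names):
    """Map each matching file name to image_N.ser, numbered in sorted order."""
    matching = sorted(name for name in names if SER_NAME.fullmatch(name))
    return {old: f"image_{i}.ser" for i, old in enumerate(matching, start=1)}

files = ["16.22.20.ser", "15.58.55.ser", "notes.txt"]
renames = plan_renames(files)
print(renames)
```

Each pair in the resulting mapping could then be passed to os.rename to perform the actual renaming.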

replacing all open tags with a string

Before somebody points me to that question, I know that one can't parse html with regex :) And this is not what I am trying to do.
What I need is:
Input: a string containing html.
Output: replace all opening tags
***<tag>
So if I get
<a><b><c></a></b></c>, I want
***<a>***<b>***<c></a></b></c>
as output.
I've tried something like:
(<[~/].+>)
and replace it with
***$1
But it doesn't really seem to work the way I want it to. Any pointers?
Clarification: it's guaranteed that there are no self closing tags nor comments in the input.
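You just have two problems: ^ is the character to exclude items from a character class, not ~; and the .+ is greedy, so it will match as many characters as possible before the final >. Change it to a lazy quantifier, using * rather than + so that one-letter tag names such as <a> still match on their own:
(<[^/].*?>)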
You can also probably drop the parentheses and replace with $0 or $&, depending on the language.
Try using: (<[^/].*?>) and replace it with ***$1
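The question does not name a language, so here is a minimal sketch of that replacement in Python, run on the example input. The function name mark_open_tags is made up for illustration, and the code relies on the guarantee from the question that there are no self-closing tags or comments.

```python
import re

# "<", then any character except "/", then as little as possible up to ">".
OPEN_TAG = re.compile(r"<[^/].*?>")

def mark_open_tags(html):
    """Prefix every opening tag with "***"; closing tags are left untouched."""
    return OPEN_TAG.sub(lambda m: "***" + m.group(0), html)

result = mark_open_tags("<a><b><c></a></b></c>")
print(result)
```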