How can I find a regex for separating strings?

I have this file
xorg-fonts-misc-1.0b-1
Xorg-font-bitstream-75dpi-1.0.0-2.i386
Xorg-font-bitstream-100dpi-1.2a-2.arm
Other-Third-Party-1.2.2-1-any
I want to separate the name from the version and get output like this
xorg-fonts-misc- 1.0b-1
Xorg-font-bitstream-75dpi- 1.0.0-2.i386
Xorg-font-bitstream-100dpi- 1.2a-2.arm
Other-Third-Party- 1.2.2-1-any
I tried this
-[^a-zA-Z][0-9\.\w-]+[^a-zA-Z][\w-]*?[\d\w]*\n

This will put your text into two capturing groups. I put a space between the two groups in the replacement, but you can put a tab or whatever else in there if you want
/^(.*?)((\d[a-z]?\.)+.*)$/\1 \2/gmi
regex101 is a great place to test out regexes. The linked demo has the regex and your test input, and gives a full explanation of how the regex works
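A minimal sketch of the same substitution in Python, since the question does not name a language. The choice of Python and the helper name split_name_version are assumptions made for illustration; the pattern and the replacement are the ones from the answer above.

```python
import re

# Pattern from the answer: lazily consume the name, then capture everything
# from the first "digit, optional letter, dot" run to the end of the line.
# MULTILINE makes ^ and $ work per line; IGNORECASE mirrors the /i flag.
PATTERN = re.compile(r'^(.*?)((\d[a-z]?\.)+.*)$', re.MULTILINE | re.IGNORECASE)

def split_name_version(text):
    """Insert a space between the package name and its version on each line."""
    return PATTERN.sub(r'\1 \2', text)

sample = "\n".join([
    "xorg-fonts-misc-1.0b-1",
    "Xorg-font-bitstream-75dpi-1.0.0-2.i386",
    "Xorg-font-bitstream-100dpi-1.2a-2.arm",
    "Other-Third-Party-1.2.2-1-any",
])

print(split_name_version(sample))
```

Note that "75dpi" and "100dpi" are skipped because the digit (plus an optional letter) must be followed directly by a dot, which only happens at the start of the version.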

Related

Regex to remove hashtags but to keep first hashtag

I want to remove all hashtags from a text but it should keep the first hashtag
Example text:
This is an example #DoNotRemoveThis #removethis #removethis #removethis
Expected result:
This is an example #DoNotRemoveThis
I'm using this
\#\w+\s?
but it removes all the hashtags. I want to keep the first hashtag.
This may require further knowledge as to what flavour of regex you are using. For example, in .NET (C#) you can use a variable-length look-behind, and thus the following pattern (replaced with an empty string) will do what you need:
/(?<=#.*)(#\w+)/g
However, this won't work in most other engines.
It sounds to me like you want to match everything up to but not including the second hash symbol, right?
/^[^#]*#[^#]*/
That will match any number of non-hash characters, then a hash character, then more non-hash characters.
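For engines without variable-length look-behind, here is a hedged sketch in Python (the language and both function names are assumptions, not something the question specifies). The first function applies the "everything up to the second hash" pattern; the second is an alternative that reuses the asker's own pattern with a replacement callback, so that text after later hashtags is preserved.

```python
import re

def keep_up_to_second_hash(text):
    """Keep everything before the second '#', as in /^[^#]*#[^#]*/."""
    m = re.match(r'[^#]*#[^#]*', text)
    # No '#' at all means there is nothing to trim.
    return m.group(0).rstrip() if m else text

def drop_later_hashtags(text):
    """Remove every hashtag except the first one, keeping other text."""
    seen_first = False

    def replace(match):
        nonlocal seen_first
        if not seen_first:
            seen_first = True
            return match.group(0)  # keep the first hashtag untouched
        return ''                  # drop every later hashtag

    return re.sub(r'#\w+\s?', replace, text).rstrip()

example = "This is an example #DoNotRemoveThis #removethis #removethis #removethis"
print(keep_up_to_second_hash(example))
print(drop_later_hashtags(example))
```

Both give the expected result for the example text. They differ when ordinary words follow a later hashtag: the first function cuts them off, the second keeps them.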

Remove texts based on pattern

I have a file with lots of URLs:
domain1.com/blue
domain1.com/blue/
domain2.com/red
domain2.com/red/
...
[etc]
Is there a way for me to use a regex to keep ONLY the "domain1.com/blue" type of line, but DELETE "domain1.com/blue/"?
The pattern is that the URLs come in pairs that are identical except that one has a "/" at the end; basically I want to remove all the URLs that end with "/" and keep the ones that don't.
In the end the file should only contain these:
domain1.com/blue
domain2.com/red
...
[etc]
Thank you so much for the help! If anyone has an idea how to do this, it'd be awesome!
There are two things I can think of that you could do.
1. Match all the lines in the file that you do not want to keep (the ones ending in "/") and replace each with a single new line.
The regex for this is /^.*\/$/ (in multiline mode), and then replace with whatever.
2. Match only the lines you want to keep and save them to a new file.
The regex is /^((?!(\/$)).)*$/ (again in multiline mode).
You can paste these into a regex explainer for an in-depth explanation of what they're doing.
Unfortunately, you did not specify which language you are using with regex, so I can't give you language-specific details. But a line that ends with / followed by possibly one or more white space characters can be tested for with the following regular expression:
/\/\s*$/
So read in each line and test it against the above regex. If it matches, do not write it out to the new file.
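A short sketch of that read-and-filter loop in Python, since no language was specified. The function name and the in-memory list of URLs are illustrative assumptions; with a real file you would iterate over the open file object and write the kept lines to a new file.

```python
import re

# A line is dropped when it ends with "/" followed by optional whitespace.
TRAILING_SLASH = re.compile(r'/\s*$')

def drop_trailing_slash_lines(lines):
    """Return only the lines that do not end with a slash."""
    return [line for line in lines if not TRAILING_SLASH.search(line)]

urls = [
    "domain1.com/blue",
    "domain1.com/blue/",
    "domain2.com/red",
    "domain2.com/red/",
]

kept = drop_trailing_slash_lines(urls)
print("\n".join(kept))
```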

Regex Match That doesn't contain some text

I am trying to create a regex that finds a Start Prefix and an End Prefix that have paragraph tags between them. But the one I have created is not working as I expected.
%%%HL_START%%%(.*?)</p><p>(.*?)%%%HL_END%%%
Correctly Matches
<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% SHould Match</p>
This also matches, but I don't want it to match because the </p><p> is not in between the Start and End Prefix
<p>%%%HL_START%%%One%%%HL_END%%% Some More Text %%%HL_START%%%Here%%%HL_END%%%</p><p>Some more text %%%HL_START%%%Here%%%HL_END%%%</p>
I'm not entirely comfortable that regex is the right solution here; if you are getting into nested start and stop markers, you might not have a regular language...
For this specific example, try changing the regex to use [^%] instead of . so that the .*? matching can't go past the %%%HL_END%%%
%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%
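To check the [^%] version against the two sample strings from the question, here is a small sketch in Python (the language and the variable names are assumptions). One caveat that follows from the pattern itself: highlighted text that contains a literal % would no longer match.

```python
import re

# [^%] stops each lazy group at the first '%', so a match cannot run past
# the %%%HL_END%%% that closes the current highlight.
HL = re.compile(r'%%%HL_START%%%([^%]*?)</p><p>([^%]*?)%%%HL_END%%%')

should_match = ('<p>This Should %%%HL_START%%%Work</p><p>This%%%HL_END%%% '
                'SHould Match</p>')
should_not_match = ('<p>%%%HL_START%%%One%%%HL_END%%% Some More Text '
                    '%%%HL_START%%%Here%%%HL_END%%%</p><p>Some more text '
                    '%%%HL_START%%%Here%%%HL_END%%%</p>')

hit = HL.search(should_match)
miss = HL.search(should_not_match)

print(hit.groups() if hit else None)
print(miss)
```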

replacing all open tags with a string

Before somebody points me to that question, I know that one can't parse html with regex :) And this is not what I am trying to do.
What I need is:
Input: a string containing html.
Output: the same string with every opening tag <tag> replaced by ***<tag>
So if I get
<a><b><c></a></b></c>, I want
***<a>***<b>***<c></a></b></c>
as output.
I've tried something like:
(<[~/].+>)
and replace it with
***$1
But it doesn't really seem to work the way I want it to. Any pointers?
Clarification: it's guaranteed that there are no self-closing tags or comments in the input.
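You just have two problems: ^ is the character to exclude items from a character class, not ~; and the .+ is greedy, so will match as many characters as possible before the final >. Make it lazy, and use * rather than + so that a one-letter tag such as <a> is matched on its own (.+? would need at least one more character and would run on to the next >). Change it to:
(<[^/].*?>)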
You can also probably drop the parentheses and replace with $0 or $&, depending on the language.
Try using: (<[^/].*?>) and replace it with ***$1
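A minimal sketch of that replacement in Python (the language and the function name mark_open_tags are assumptions). It uses the lazy (<[^/].*?>) form and Python's \g<0> whole-match reference, which plays the role of $0 or $& in other languages.

```python
import re

# '<', then any character except '/', then as few characters as possible
# up to the next '>'. Closing tags are skipped because they start with "</".
OPEN_TAG = re.compile(r'<[^/].*?>')

def mark_open_tags(html):
    """Prefix every opening tag with '***'."""
    return OPEN_TAG.sub(r'***\g<0>', html)

print(mark_open_tags('<a><b><c></a></b></c>'))
```

This relies on the guarantee from the question that the input has no self-closing tags or comments.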

regex optional part in prefix, but do not include it in matches if it present

The problem is easier to show in code than to describe. I have the following regex
(?<=First(Second)?)\w{5}
and the following sample data
FirstSecondText1
FirstText2
I only want the matches Text1 and Text2. I get 3 though: Secon is also matched, and I don't want that.
I've played around with it but can't seem to get it to work.
You need an additional negative lookahead:
(?<=First(Second)?)(?!Second)\w{5}
If you want to avoid using Second twice, you could do it without lookaround and take the result of the first capturing group:
First(?:Second)?(\w{5})
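The capturing-group version also works in engines that reject variable-length look-behind. As a hedged illustration, here it is in Python, whose built-in re module only accepts fixed-width look-behind; the language choice and variable names are assumptions.

```python
import re

# Consume the optional "Second" outside the capture, then capture five
# word characters as group 1.
PATTERN = re.compile(r'First(?:Second)?(\w{5})')

data = "FirstSecondText1\nFirstText2"

# With exactly one capturing group, findall returns the group contents.
matches = PATTERN.findall(data)
print(matches)
```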
You can try this regex: (?<=First(Second)?)\w{5}$. All you have to do is add a $ at the end so that the regex does not match the text Secon (use multiline mode if each sample is on its own line). This works as long as you are sure of the pattern that comes at the end of the input text; in this case it is \w{5}$.