Regular expression everything after a link - regex

I am a complete regular expression idiot, just keep that in mind :)
I am trying to create a regular expression that will match link:xxxxxx where everything after link: is a wildcard.
Can I just do link:* or am I totally misguided?

link:.* should work correctly.
. matches any character, and you want to repeat it "0 to unlimited" times so you add *.
If you're new to regex, a good way to learn it is by using regex101.
For your problem, you can check out this regex101 example
(Note that I have also added the g modifier, which means that you want to select all matches, not just the first one.)
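To make the difference between link:* and link:.* concrete, here is a minimal Python sketch. The sample text and the variable names are made up for illustration; only the two patterns come from the question and the answer.

```python
import re

# Hypothetical sample text; only the "link:" prefix comes from the question.
text = "see link:abc123 for details\nno match here\nlink:"

# "link:" followed by any characters up to the end of the line.
matches = re.findall(r"link:.*", text)
print(matches)

# Without the dot, * applies to the colon: "link" followed by zero or more ":".
colon_only = re.findall(r"link:*", "link:abc")
no_colon = re.findall(r"link:*", "linked")
print(colon_only, no_colon)
```

re.findall plays the role of the g modifier here: it returns every match rather than only the first.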

Related

Regular expression for duplicate string

Hello, I am trying to write a regular expression that finds a substring and replaces a portion of it. My input has the format
Some_text_beginning_AASHISH_XX_YY_COPY_COPY_COPY_COPY
Every string contains the word AASHISH, and at the end there can be any number of _COPY suffixes. I want to delete all of the _COPY suffixes.
I wrote the regular expression as
(.*)_AASHISH_(.*)_COPY+
This finds all the valid strings. But when I try to replace it with
$1_AASHISH_$2
it removes just the last _COPY. All the _COPY occurrences that come before the last one end up in group 2.
Note that I am not using a programming language. I am using a third-party tool; all it lets me do is search for a string and replace it, and it accepts regular expressions.
To clarify why this question is not the same as ones posted before: the tool I am using does not support every regular expression feature. I don't know how the tool is built; I only have its UI.
Thanks in advance
Here's a regex that will capture the whole portion you want to maintain, resulting in a replacement that's just $1.
(.*_AASHISH_.*?)(?:_COPY)+
A few notes:
.*? - The ? on the end makes the repetition operator * non-greedy. It will match the minimum characters given its context.
(?:_COPY) - The ?: prefix makes this a non-capturing grouping.
+ - The repetition operator will make the entire last group (_COPY) repeat 1 or more times, not just the Y.
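For anyone who wants to try this outside the tool, here is a rough Python sketch comparing the two patterns. Python writes backreferences as \1 instead of $1, and the variable names are just for illustration; the asker's tool may behave differently.

```python
import re

sample = "Some_text_beginning_AASHISH_XX_YY_COPY_COPY_COPY_COPY"

# Pattern from the question: the greedy second group swallows all but the last _COPY.
question_pattern = re.compile(r"(.*)_AASHISH_(.*)_COPY+")
# Pattern from the answer: lazy .*? plus a repeated non-capturing (?:_COPY) group.
answer_pattern = re.compile(r"(.*_AASHISH_.*?)(?:_COPY)+")

unfixed = question_pattern.sub(r"\1_AASHISH_\2", sample)
fixed = answer_pattern.sub(r"\1", sample)

print(unfixed)
print(fixed)
```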

matching in between a long sentence with keywords

target sentence:
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system;$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host;$(SolDir)..\..\ABC\ccc\1234\components\fds\ab_cdef_1.0\host; $(SolDir)..\..\ABC\ccc\1234\somethingelse;
How should I construct my regex to extract the items that contain "..\..\ABC\ccc\1234\ccc_am_system"?
Basically, I want to extract all of these folders, and maybe more; they are all under \ABC\ccc\1234\ccc_am_system:
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host\abc;
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host\123\123\123\123;
$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host;
My current regex doesn't work and I can't figure out why:
\$.*ccc\\1234\.*;
Your problem is most likely that * is a greedy operator. It's greedily matching more than you intend it to. In many regex dialects, *? is the reluctant operator. I would first try using it like this:
\$.*?ccc\\1234.*?;
You can read up a bit more on greedy vs reluctant operators in this question.
If that doesn't work, you can try to be more specific about the characters you match than . (any character). For example, you can match every non-semicolon character with an expression like this: [^;]*. You could use that idea this way:
\$[^;]*ccc\\1234[^;]*;
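Here is a small Python sketch of the negated-class idea, run against the target sentence from the question. The second, narrower pattern is my own assumption about what the asker wants (only entries under ccc_am_system); it is not part of the answer above, and the variable names are illustrative.

```python
import re

# Target sentence from the question, as raw strings so backslashes stay literal.
target = (
    r"$(SolDir)..\..\ABC\ccc\1234\ccc_am_system;"
    r"$(SolDir)..\..\ABC\ccc\1234\ccc_am_system\host;"
    r"$(SolDir)..\..\ABC\ccc\1234\components\fds\ab_cdef_1.0\host; "
    r"$(SolDir)..\..\ABC\ccc\1234\somethingelse;"
)

# Greedy dot: a single match that runs from the first $ to the last ;.
greedy = re.findall(r"\$.*ccc\\1234.*;", target)

# Negated class: [^;]* can never cross a semicolon, so each entry matches on its own.
per_entry = re.findall(r"\$[^;]*ccc\\1234[^;]*;", target)

# Assumed narrowing: additionally require the ccc_am_system folder.
narrowed = re.findall(r"\$[^;]*ccc\\1234\\ccc_am_system[^;]*;", target)

print(len(greedy), len(per_entry), len(narrowed))
for entry in narrowed:
    print(entry)
```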
The below regex would store the captured strings inside group 1.
(\$.*?ccc\\1234\\.*?;)
You need to make the * quantifier do a shortest match by adding ? right after it. Also, \.* matches a literal dot zero or more times, which is not what you want here.
I found this to be the best:
\$(.[^\$;])*ccc\\1234(.[^\$;])*;
It doesn't allow any overmatching whatsoever. If I use ?, it still matches across more than one $ or ; for some reason, but with the expression above that will never be the case. Still, thanks to all those who took the time to answer my question.

A regular expression that matches two long strings and ignores everything in between

I am searching through a 1.5 million line Premiere Pro project for any text that matches one of my audio filters and is set to mono.
The text that I am searching for begins with the <ChannelType> tag and ends with the <FilterMatchName> tag, so it looks like this:
<ChannelType>0</ChannelType>
<FrameRate>5292000</FrameRate>
</AudioComponent>
<FilterPreset>0</FilterPreset>
<OpaqueData Encoding="base64" Checksum="53060659">AAAAAD8L8lo+AUr+Pac1NjwTmoUAAAAAP0uQDD37nIg9ui6MPjwU5j+AAAA+C/JaAAAAAD8qqqsAAAAAP4AAAD92L8w9py8FAAAAAHNvZnQgY29tcHJlc3Npb24AIiBkZWZhdWx0PSIwIiBzdGVwPSIxIiBtaW49IjAiIG1heD0iMSIvPgoJICA8Zmw=</OpaqueData>
<FilterIndex>-1</FilterIndex>
<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
If I were in a Word doc, I would just do a find as
<ChannelType>0</ChannelType>*<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
I am terrible with Regex. I was hoping someone could help me out. Everything I have tried either doesn't match anything, or matches EVERYTHING in the document. I am using Notepad++.
Since you are working in Notepad++, you have access to Perl-compatible (PCRE-style) regular expressions. This one will get all the text between <ChannelType> and </FilterMatchName>:
(?s)<ChannelType>.*?</FilterMatchName>
The (?s) allows the . to match newline characters.
After matching <ChannelType>, the .*? lazily matches all characters up to...
the closing </FilterMatchName>, which we match.
Let me know if you have any questions. :)
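If you want to see the effect of (?s) and the lazy .*? outside Notepad++, here is a minimal Python sketch. Python's re module accepts the same (?s) inline flag. The sample document is a shortened, made-up stand-in for the project file (the <Other> line is invented), and Notepad++ itself may differ in details.

```python
import re

# Shortened, hypothetical stand-in for the project file: two filter blocks.
doc = """<ChannelType>0</ChannelType>
<FrameRate>5292000</FrameRate>
<FilterIndex>-1</FilterIndex>
<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
<Other>unrelated</Other>
<ChannelType>0</ChannelType>
<FilterMatchName>1094998321 Dynamics1</FilterMatchName>
"""

# Lazy: each match stops at the first closing tag it reaches.
lazy = re.findall(r"(?s)<ChannelType>.*?</FilterMatchName>", doc)
# Greedy: one match that runs to the last closing tag in the document.
greedy = re.findall(r"(?s)<ChannelType>.*</FilterMatchName>", doc)
# No (?s): the dot cannot cross line breaks, so nothing matches here.
no_flag = re.findall(r"<ChannelType>.*?</FilterMatchName>", doc)

print(len(lazy), len(greedy), len(no_flag))
```

This mirrors the two failure modes from the question: without (?s) nothing matches, and with a greedy .* nearly everything does.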
What type of regular expressions are you using (which language/library)?
Basically, you can use .* instead of * in regular expressions. If your text is long, though, it's better to use a reluctant quantifier [1], provided your regex implementation supports it.
This is a good site with tutorials and a comparison of different regex implementations:
http://www.regular-expressions.info
[1] http://docs.oracle.com/javase/tutorial/essential/regex/quant.html

capture with if-then-else in php regex

I'm very lost with a regular expression. It's just black magic to me. Here's what I need:
there is a filename: some_file.jpg
it might be in the following format: some_file_p250.jpg
the regex to match the file in simple format: /^([a-zA-Z_\-0-9]+)\.(jpg|jpeg|png)$/
the regex to match the file in advanced format: /^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})\.(jpg|jpeg|png)$/
My question is as follows: how do I make the "(_[a-z]?[0-9]{3,4})" part optional? I've tried adding a question mark to the second group like this:
/^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
Even though the pattern works, it always captures the contents of the second group in the first group and leaves the second empty.
How can I make this work to capture the filename, the advanced part (_p250) and the extension separately? I'm thinking it has something to do with the greediness of the first group, but I might be completely wrong, and even if I'm right, I still don't know how to solve it.
Thanks for your thoughts
Adding a question mark after the first plus will make the first capturing expression non-greedy. This worked for me using your test case:
/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
I tested in JavaScript, not PHP, but here's my test:
"some_file_p250.jpg".match(/^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/)
and my results:
["some_file_p250.jpg", "some_file", "_p250", "jpg"]
In my experience, making a capturing expression non-greedy makes regular expressions a lot more intuitive and will often make them work the way I expect them to work. In your case, it was doing what you suspected; the first expression was capturing everything and never gave the second expression a chance to capture anything.
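The same comparison can be sketched in Python (again not PHP, so treat it as an illustration rather than a PHP test). The pattern strings are the ones from this thread without the / delimiters; the variable names are mine.

```python
import re

# Greedy first group, as in the question.
greedy = re.compile(r"^([a-zA-Z_\-0-9]+)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")
# Lazy first group (+?), as in the answer.
lazy = re.compile(r"^([a-zA-Z_\-0-9]+?)(_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$")

greedy_groups = greedy.match("some_file_p250.jpg").groups()
lazy_groups = lazy.match("some_file_p250.jpg").groups()
simple_groups = lazy.match("some_file.jpg").groups()

print(greedy_groups)  # the first group swallows "_p250"
print(lazy_groups)    # the suffix lands in the second group
print(simple_groups)  # the optional group did not participate (None in Python)
```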
I think this is what you want:
/^([a-zA-Z_\-0-9]+?)(|_[a-z]?[0-9]{3,4})?\.(jpg|jpeg|png)$/
or
/^([\d\w\-]+?)(|_[a-z]?[0-9]{3,4})\.(jpg|jpeg|png)$/

is it the right regular expression

I have the following regular expression, but it's not working properly: it accepts only three characters after the # sign, and I want it to accept any length.
"/^[a-zA-Z0-9_\.\-]+\#([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9]{2,4}$/"
this#thi This is validated
this#this It is not validating this expression
Can you please tell me what's the problem with the expression...
Thanks
If you want your regex to match "any number length" then why are you using {2,4}?
I think a better example of the strings you're trying to match might give others a better idea of what you want, because based on your regex it is a bit confusing what you're looking for.
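To illustrate the {2,4} point, here is a hedged Python sketch. The test strings are hypothetical (the question does not give full examples), and replacing {2,4} with {2,} is only one possible reading of "any number length".

```python
import re

# The asker's pattern (without the PHP delimiters): final part limited to 2-4 characters.
limited = re.compile(r"^[a-zA-Z0-9_\.\-]+\#([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9]{2,4}$")
# Assumed fix: {2,} means "two or more", with no upper bound.
unlimited = re.compile(r"^[a-zA-Z0-9_\.\-]+\#([a-zA-Z0-9\-]+\.)+[a-zA-Z0-9]{2,}$")

short_ok = bool(limited.match("user#example.info"))       # 4 characters after the dot
long_limited = bool(limited.match("user#example.museum"))  # 6 characters after the dot
long_unlimited = bool(unlimited.match("user#example.museum"))

print(short_ok, long_limited, long_unlimited)
```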
Try this:
^[a-zA-Z0-9_.-]+#([a-zA-Z0-9-]+\.)+[a-zA-Z0-9]{2,4}$
The main problem is that you didn't escape the dot (\.). In a regular expression, an unescaped dot matches (almost) any character, which makes your regex quite liberal.