How to regex-replace all characters of a specific type in lines that contain a specific word

I have a multiline text in which I'm interested in specific lines, identified by specific words. For example, I am interested in the lines that contain ".jpg".
I'm trying to use a lookahead:
(?=\.jpg)
In these lines I would like to delete specific characters, for example all occurrences of "_".
Sample input:
https://somewebpage/stuff1_stuff2_stuff3.jpg
Desired output:
https://somewebpage/stuff1stuff2stuff3.jpg
I'm trying to write this regex for the latest Notepad++.
My problem is that I can't seem to combine the positive lookahead with my regex so that it applies to every match on the line:
([^_]*)(_?)
Any help is appreciated.

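[_-](?=.*\.jpg) worked for me. Replace with an empty string to remove the characters, or just do a find. The character class matches both "_" and "-"; you can expand or narrow the list, of course, but I think this covers you. Note that the lookahead only matches characters that appear before ".jpg" on the same line.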
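As a rough illustration outside Notepad++, the same lookahead idea can be sketched in Python. The character class is narrowed to "_" to match the question, and the names PATTERN, strip_underscores and sample are made up for this example. Python's re engine is not the one Notepad++ uses, but this particular pattern behaves the same way in both.

```python
import re

# An underscore that is followed, later on the same line, by ".jpg".
# "." does not match a newline by default, so the lookahead never
# looks past the end of the current line.
PATTERN = re.compile(r"_(?=.*\.jpg)")


def strip_underscores(text: str) -> str:
    """Remove "_" only where ".jpg" appears later on the same line."""
    return PATTERN.sub("", text)


sample = (
    "https://somewebpage/stuff1_stuff2_stuff3.jpg\n"
    "https://somewebpage/other_file.png\n"
)
print(strip_underscores(sample))
```

Lines without ".jpg" are left alone, and so is any "_" that comes after ".jpg" on a matching line.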

Related

Regex for extracting each word between hyphens

I am learning regex and trying to write a pattern that matches each part of the string exactly, without the '-', so that I can iterate over the groups and print the respective strings.
I have a string that looks like "Abcd001-wd2s-vwe1-20180e3103.txt"
I was able to write a regex for extracting Abcd001, wd2s and .txt from above text as shown below
(\A[^-]+) => Abcd001
(-[^-]+-) => wd2s
(\..*) => .txt
However, I was unable to come up with the correct pattern for extracting the exact strings vwe1 and 20180e3103
It will be really helpful if you can guide me on this or if there is a better approach to achieve this?
Please note: [^-.]+ may give me all the words separately, but I am looking for an option where I have a group defined for each of these strings, so that it's a one-to-one mapping.
Thanks!
To get vwe1 or 20180e3103 from the example data, you might use a quantifier {2} or {3} to repeat matching one or more word characters followed by a hyphen: (?:\w+-){2}.
Then you could capture in a group ([^-.]+), which matches one or more characters that are neither a hyphen nor a dot.
(?:\w+-){2}([^-.]+)
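A minimal Python sketch of this approach follows. The helper nth_part and the single-pattern variant ALL_PARTS are illustrative names, not part of the answer; ALL_PARTS additionally assumes the name always has exactly four hyphen-separated parts followed by an extension.

```python
import re
from typing import Optional


def nth_part(name: str, n: int) -> Optional[str]:
    """Return the n-th (0-based) hyphen-separated part, stopping before any dot."""
    # Skip n repetitions of "word characters plus hyphen", then capture the next part.
    match = re.match(r"(?:\w+-){%d}([^-.]+)" % n, name)
    return match.group(1) if match else None


# Alternative: one capture group per part, assuming exactly four parts and an extension.
ALL_PARTS = re.compile(r"^([^-]+)-([^-]+)-([^-]+)-([^-.]+)(\..*)$")

name = "Abcd001-wd2s-vwe1-20180e3103.txt"
print(nth_part(name, 2))                 # third part
print(nth_part(name, 3))                 # fourth part
print(ALL_PARTS.match(name).groups())    # every part in its own group
```

The second pattern gives the one-to-one mapping between groups and parts that the question asks for, at the cost of hard-coding the number of parts.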
Try the below regex
/\-([^\)]+)\-/gmi;
Also check the similar implementation:
https://stackoverflow.com/a/50336050/8179245

How can I extract an exact set of words from a string using a regex?

I've looked everywhere and haven't been able to find a question that answers this specific use case (maybe I've missed it). But basically I'm wanting to extract the following text from a string: Welcome James:
This text must be at the start of the string, e.g:
Welcome James: Now some text follows...blahblah - This would be a match
However
This is some text Welcome James: some more text... - This would not be a match.
So basically I'd hard-code Welcome James: into the regex (I don't need any other variants of Welcome <name>:).
Is this possible? All I've been able to find is regexes that match single words without spaces or characters.
To search at the start of a string, just prefix the regex with the ^ (caret) character:
/^Welcome James:/
Here is the answer :) But @charles gave it too!
^(Welcome James)
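For illustration, here is how the anchored pattern behaves on the two sample strings from the question, sketched in Python. The names GREETING and starts_with_greeting are made up for this example, and the trailing colon is included because the question asks for it.

```python
import re

# "^" anchors the match to the start of the string.
GREETING = re.compile(r"^Welcome James:")


def starts_with_greeting(text: str) -> bool:
    """True only if the text begins with the literal greeting."""
    return GREETING.search(text) is not None


print(starts_with_greeting("Welcome James: Now some text follows...blahblah"))
print(starts_with_greeting("This is some text Welcome James: some more text..."))
```

For a fixed prefix like this one, Python's str.startswith would give the same result without a regex.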

Regular expression to remove lines containing word with exceptions

I am using this regex in PowerGrep (it searches for the strings LAB, RAD, and TRAN)
.*((LAB)|(RAD)|(TRAN)).*\r\n
to search for and remove lines in plain text that contain these strings, even as part of a longer string, and it works great.
Now I need something more. I want to keep the word LABER, but remove every other string containing LAB, such as LABOR, LAB1, ALAB, ALABA, etc.
Is there a way to "protect" a string LABER and remove every other string containing LAB?
Tried to alter the above regex using * but it always includes the word LABER that I need to keep. Any solution?
I think PowerGrep supports lookaround assertions; if so, this should work:
.*((LAB(?!ER\b))|(RAD)|(TRAN)).*\r\n
Although that will keep anything ending with LABER, not just the whole word.
You can add exclusions to the regex by means of a look-ahead:
(?m)^.*(LAB(?!(?:ER|OV)\b)|RAD|TRAN).*$
The (?!(?:ER|OV)\b) lookahead will check if the sequence LAB is not followed by ER or OV and a word boundary.
I am adding the alternation to the look-ahead because you asked to "protect" LABER and LABOV.
Also, since you are looking for whole lines, you can make use of the multiline mode (?m) and ^/$ anchors.
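The effect of the negative look-ahead can be checked with a small Python sketch. This is not PowerGrep: the names LINE_TO_REMOVE and remove_lines are made up for this example, only LABER is protected, and the trailing `\n?` is an assumption added so that the line break is removed together with the line.

```python
import re

# A whole line that contains LAB (unless it is followed by "ER" and a word boundary),
# RAD or TRAN. The optional "\n" removes the line break together with the line.
LINE_TO_REMOVE = re.compile(r"^.*(?:LAB(?!ER\b)|RAD|TRAN).*\n?", re.MULTILINE)


def remove_lines(text: str) -> str:
    """Delete every line that matches one of the unwanted strings."""
    return LINE_TO_REMOVE.sub("", text)


sample = "LABER\nLABOR\nALAB\nRADIO\nhello\nTRANSIT\n"
print(remove_lines(sample))
```

A line that contains both LABER and another unwanted string, such as "LABER LABOR", is still removed, because the pattern only needs one unprotected match on the line.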

How to evaluate a RegExp in an array with match groups?

I need to parse an array-like text with regular expression and get the match groups.
One example of the text I want to parse is this:
['red','green', 'blue']
I want to use match groups, because I want to extract them.
I am using this regular expression, but the groups found by it are not like what I expected:
\[ *('.+?')( *, *('.+?'))* *\]
The idea is to parse in this order:
A square bracket
Any number of spaces
A group with:
Single quote
Any character
Single quote
Zero or more groups of:
Any number of spaces
A comma
Any number of spaces
A group with
Single quote
Any character
Single quote
Any number of spaces
A square bracket
And get one group with each parsed array element.
Can you help me?
Hint: an easy way to test a regexp is the site http://rubular.com
This isn't going to be a complete answer, but I'm not certain that " *" is the right way to check for whitespace; it may depend on the language you're using.
Here's a C# regex example that shows some of the language requirements to check for whitespace: regex check for white space in middle of string
Edit: I see you added Ruby as your language. Unfortunately I'm not versed in Ruby, so I cannot help you with the specifics, sorry.
Edit2: Seeing as you're forcing yourself into Ruby to debug your regex statement, might I suggest: http://www.debuggex.com/ which tries to stay language independent?
Try this regex: '([^']+)'. According to rubular.com, it should give you the match groups red, green, and blue.
You can match an arbitrary number of groups with one regex:
^\[\s*|(?:\G'([^']+)'\s*(?:,\s*|]$))+
or like this (should be more performant):
^\[\s*+|(?>\G'([^']++)'\s*+(?>,\s*+|]$))++
This works in Ruby, as asked; I don't know about Delphi.
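To see why the original pattern does not yield one group per element, and how matching one element at a time gets around that, here is a hedged Python sketch. Python's re module does not support \G, so re.findall with the simpler '([^']+)' pattern is used instead of the \G-based patterns above; the variable names are illustrative only.

```python
import re

text = "['red','green', 'blue']"

# The pattern from the question. A group inside a repeated group keeps only
# the text of its last iteration, so the middle element is lost.
question_pattern = re.compile(r"\[ *('.+?')( *, *('.+?'))* *\]")
match = question_pattern.match(text)
first = match.group(1) if match else None
last_only = match.group(3) if match else None
print(first, last_only)

# One match per element instead of one group per element.
elements = re.findall(r"'([^']+)'", text)
print(elements)
```

Unlike the \G-based patterns, findall does not check that the surrounding brackets and commas are well-formed; if that matters, the whole string can be validated with the question's pattern first and then split with findall.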

Notepad++ Regex: Find all 1 and 2 letter words

I’m working with a text file with 200,000+ lines in Notepad++. Each line has only one word. I need to strip out all words that contain only one letter (e.g. I) and words that contain only two letters (e.g. as).
I thought I could just paste in a regular regex like [a-zA-Z]{1,2}, but it does not recognize anything (I’m trying to Mark them).
I’ve done a manual search and I know that words of that length do exist, so it can only be my regex that’s wrong. Does anyone know how to do this in Notepad++?
Cheers,
- Mestika
If you want to remove only the words but leave the lines empty, this works:
^[a-zA-Z]{1,2}$
Replace this with an empty string. ^ and $ are anchors for the beginning and the end of a line (because Notepad++'s regexes work in multi-line mode).
If you want to remove the lines completely, search for this:
^[a-zA-Z]{1,2}\r\n
And replace with an empty string. However, this won't work before Notepad++ 6, so make sure yours is up-to-date.
Note that you will have to replace \r\n with the specific line-endings of your file!
As Tim Pietzker suggested, a platform independent solution that also removes empty lines would be:
^[a-zA-Z]{1,2}[\r\n]+
A platform-independent solution that does not remove empty lines but only those with one or two letters would be:
^[a-zA-Z]{1,2}(\r\n?|\n)
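The behaviour of the last pattern can be checked outside Notepad++ with a small Python sketch. The names SHORT_LINE and drop_short_words are made up for this example, and re.MULTILINE is set explicitly because Python, unlike Notepad++, does not treat ^ as a line anchor by default.

```python
import re

# A line consisting of one or two ASCII letters, plus its line ending
# (\r\n, a lone \r, or \n).
SHORT_LINE = re.compile(r"^[a-zA-Z]{1,2}(?:\r\n?|\n)", re.MULTILINE)


def drop_short_words(text: str) -> str:
    """Remove lines that hold a one- or two-letter word; keep everything else."""
    return SHORT_LINE.sub("", text)


print(drop_short_words("I\nas\ncat\nto\nhouse\n"))
```

Empty lines are kept, and a short word on the very last line is only removed if that line ends with a line break.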
I don't use Notepad++, but my guess is that it could be because you have too many matches: your expression will match every run of one or two letters, even inside longer words. Try including word boundaries:
\b[a-zA-Z]{1,2}\b
The regex you specified should find one or two characters (even in Notepad++'s Find dialog), but not in the way you'd think. You want the regex to start at the beginning of the line and end at the end of it, using ^ and $ respectively:
^[a-zA-Z]{1,2}$
Notepad++ version 6.0 introduced the PCRE engine, so if this doesn't work in your current version try updating to the most recent.
You seem to use the version of Notepad++ that doesn't support explicit quantifiers: that's why there's no match at all (as { and } are treated as literals, not special symbols).
The solution is to use their somewhat more lengthy replacement:
\w\w?
... but that's only part of the story, as this regex will match one or two word characters anywhere, not just short words. To match only lines that consist of a short word, you need something like this:
^\w\w?$