Little vim regex - regex

I have a bunch of strings that look like this: '../DisplayPhotod6f6.jpg?t=before&tn=1&id=130', and I'd like to take out everything after the question mark, to look like '../DisplayPhotod6f6.jpg'.
s/\(.\.\.\/DisplayPhoto.\{4,}\.jpg\)*'/\1'/g
This regex is capturing some but not all occurrences. Can you see why?

\.\{4,} is trying to match 4 or more . characters. What it looks like you wanted is "match 4 or more of any character" (.\{4,}) but "match 4 or more non-. characters" ([^.]\{4,}) might be more accurate. You'll also need to change the lone * at the end of the pattern to .* since the * is currently applying to the entire \(\) group.

I think the easiest way to go about this is:
s/?.*$/'/g
This says: replace everything from the question mark to the end of the line with a single quote.
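As a rough cross-check outside Vim, the same idea can be sketched in Python. This is only an analogue, not Vim syntax: in Python's re module the question mark is a quantifier and has to be escaped, and the helper name strip_query is made up for illustration.

```python
import re


def strip_query(line: str) -> str:
    # Replace everything from the first "?" to the end of the line with a
    # closing single quote (a Python analogue of the Vim substitution).
    return re.sub(r"\?.*$", "'", line)


print(strip_query("'../DisplayPhotod6f6.jpg?t=before&tn=1&id=130'"))
```

Lines without a question mark are left untouched, because the pattern requires a literal "?" to start the match.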

I would use a macro, which is sometimes simpler than a regexp (and interactive):
qa
/DisplayPhoto<Enter>
f?dt'
n
q
And then run @a a few times, or 20000@a to go through all lines.

The following regexp: /(\.\./DisplayPhoto.*\.jpg)/gi
tested against the following examples:
../DisplayPhotocef3.jpg?t=before&tn=1&id=54
../DisplayPhotod6f6.jpg?t=before&tn=1&id=130
results in:
../DisplayPhotocef3.jpg
../DisplayPhotod6f6.jpg
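A minimal sketch of that test in Python, assuming Python's re module as the engine (the answer does not say which engine was used). The name photo_path is hypothetical, and re.IGNORECASE stands in for the i flag.

```python
import re

# Same pattern as above; the forward slash needs no escaping in Python.
PHOTO = re.compile(r"(\.\./DisplayPhoto.*\.jpg)", re.IGNORECASE)


def photo_path(text: str):
    # Return the captured path, or None when the pattern does not match.
    match = PHOTO.search(text)
    return match.group(1) if match else None


for sample in ("../DisplayPhotocef3.jpg?t=before&tn=1&id=54",
               "../DisplayPhotod6f6.jpg?t=before&tn=1&id=130"):
    print(photo_path(sample))
```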

%s/\('\.\.\/DisplayPhoto\w\{4,}\.jpg\).*'/\1'/g
Some notes:
% will cause the swap to work on all lines.
\w instead of '.', in case there are some malformed file names.
Replace '.' at the start of your matching regex with ' which is exactly what it should be matching.

RegEx that matches characters after semicolon in the same line

I need some help with regular expressions. I need a RegEx that matches characters if they come after a semicolon AND are on the same line as a previous word.
Let me explain that:
I need something like this. I have to write a function that does not allow characters to be entered after a semicolon on the same line, and I think I could do it with this sort of RegEx.
Thank you.
I am not sure I understood your question, but would something like this help?
This regular expression
Well, you've got two ways to do it:
A: Create a regular expression to validate correct input.
B: Create a regular expression to find incorrect input.
I would use option A, but it depends on what you need to do.
A: Regex to validate correct lines
In this case, we'll use the m modifier to set the regex engine to search by line (m = multiline). This means that ^ matches the beginning of a line and $ matches the end of a line.
Then we want to match some characters which are not the semicolon itself. To do this we use the [^ ] character class, meaning "anything which is not in the provided list of characters". So to say any char except the semicolon we use [^;].
Now, there will probably be more than one such character. To express that, we can use either the * or the + operator, which mean "0 or more times" and "1 or more times" respectively. If the data before the semicolon is mandatory, we use +. This leads to [^;]+, meaning any char which is not a semicolon, 1 or more times.
Then we capture this with the () operators. This gives us direct access to the value without having to take the line and strip the semicolon ourselves.
After this capture group comes the semicolon, then possibly some whitespace, and then the end of the line. Whether to allow the whitespace is up to you. It would be \s*, meaning any kind of space, tab or blank char, 0 or more times.
At the end we get this regex: ^([^;]+);\s*$ with the m and g flags
m for multiline and g for global, which means don't stop at the first match but look for all of them.
Test it here: https://regex101.com/r/sT59eu/1/
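A small sketch of option A, assuming Python's re module, where re.MULTILINE plays the role of the m flag and findall plays the role of g. The sample text is invented for illustration.

```python
import re

# ^ and $ anchor to individual lines because of re.MULTILINE.
VALID_LINE = re.compile(r"^([^;]+);\s*$", re.MULTILINE)

# Three sample lines: valid, invalid (text after the semicolon),
# and valid with trailing spaces.
text = "a = 1;\nb = 2; extra\nc = 3;   \n"

# findall returns the content of the capture group for every valid line.
valid = VALID_LINE.findall(text)
print(valid)  # ['a = 1', 'c = 3']
```

The second line is skipped because something other than whitespace follows its semicolon.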
B: Regex to find invalid lines
Well, this could be rather easy too: ;.+$
. means any char. So here we'll find the lines with something after the semicolon.
Test it here: https://regex101.com/r/ocDofm/1/
But you will NOT find lines with missing semicolons!
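Option B can be sketched the same way, again assuming Python's re module. Here each line is checked on its own, so no multiline flag is needed; invalid_lines is a hypothetical helper and the sample text is made up.

```python
import re

# A semicolon followed by at least one more character on the same line.
INVALID = re.compile(r";.+$")


def invalid_lines(text: str):
    # Keep only the lines that have something after a semicolon.
    return [line for line in text.splitlines() if INVALID.search(line)]


sample = "a = 1;\nb = 2; extra\nc = 3"
print(invalid_lines(sample))  # ['b = 2; extra']
```

The last sample line has no semicolon at all and is not reported, which is exactly the limitation noted above. Trailing spaces after a semicolon would also count as "something", since . matches a space.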
If I understand it correctly,
(?<=;)[A-Za-z]+
might do the job.
The python documentation is helpful: https://docs.python.org/3/library/re.html
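For illustration, this is how the lookbehind pattern above behaves in Python's re module. The sample strings are invented; note that the letters must follow the semicolon directly, with no space in between.

```python
import re

# (?<=;) asserts that the match is immediately preceded by a semicolon.
AFTER_SEMICOLON = re.compile(r"(?<=;)[A-Za-z]+")

print(AFTER_SEMICOLON.findall("x = 1;abc"))   # ['abc']
print(AFTER_SEMICOLON.findall("x = 1; abc"))  # []
```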

Regex - Skip characters to match

I'm having an issue with Regex.
I'm trying to match T0000001 (2, 3 and so on).
However, some of the lines it searches have what I can describe as positioners. These are shown as a question mark, followed by 2 digits, such as ?21.
These positioners describe a new position if the document were to be printed off the website.
Example:
T123?214567
T?211234567
I need to disregard ?21 and match T1234567.
From what I can see, this is not possible.
I have looked everywhere and tried numerous attempts.
All we have to work off is the linked image. The creators can't even confirm the flavour of Regex it is. They believe it's Python, but I'm unsure.
Regex Image
Update
Unfortunately none of the patterns below have worked so far. I tested each one live (rather than only as a standalone regex, thinking it might behave differently), but unfortunately they still didn't work.
There is no replace feature, and as mentioned before I'm not sure if it is Python. Appreciate your help.
Do two regex operations
First do the regex replace to replace the positioners with an empty string.
(\?[0-9]{2})
Then do the regex match
T[0-9]{7}
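A minimal sketch of the two-step approach, assuming Python's re module and that a replace step is available (the question suggests it may not be in the asker's tool). The helper name find_ids is hypothetical.

```python
import re

POSITIONER = re.compile(r"\?[0-9]{2}")  # step 1: the ?NN positioners
TARGET = re.compile(r"T[0-9]{7}")       # step 2: the identifier itself


def find_ids(text: str):
    # Remove every positioner first, then look for the identifiers.
    cleaned = POSITIONER.sub("", text)
    return TARGET.findall(cleaned)


print(find_ids("T123?214567\nT?211234567\nT0000001"))
```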
If there's only one occurrence of the 'positioners' in each match, something like this should work: (T.*?)\?\d{2}(.*)
This can be tested here: https://regex101.com/r/XhQXkh/2
Basically, match two capture groups before and after the '?21' sequence. You'll need to concatenate these two matches.
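The concatenation step could look like this in Python, assuming the re module and at most one positioner per line. The function name join_around_positioner is made up for illustration.

```python
import re

# Group 1: everything from T up to the positioner; group 2: the rest.
SPLIT = re.compile(r"(T.*?)\?\d{2}(.*)")


def join_around_positioner(line: str) -> str:
    match = SPLIT.match(line)
    if match is None:
        # No positioner found: return the line unchanged.
        return line
    return match.group(1) + match.group(2)


for line in ("T123?214567", "T?211234567", "T1234567"):
    print(join_around_positioner(line))
```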
First, match the ?21 and replace it with a distinctive character such as #:
\?21
Then you may try this regex to find what you want:
(T(?:\d{7}|[\#\d]{8}))\s
The target string is captured in group 1 (or \1).
Finally, replace # with ?21 or anything else you like.
A Python script might look like this:
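import re

ss = """T123?214567
T?211234567
T1234567
T1234434?21
T5435433"""

# Pattern for the positioner, and for the target once "?21" has become "#"
rexpre = re.compile(r'\?21')
regx = re.compile(r'(T(?:\d{7}|[\#\d]{8}))\s')

# Print the matches with the placeholder still in place
for m in regx.findall(rexpre.sub('#', ss)):
    print(m)
print()
# Print the matches with the placeholder turned back into "?21"
for m in regx.findall(rexpre.sub('#', ss)):
    print(re.sub('#', r'?21', m))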
Output is
T123#4567
T#1234567
T1234567
T1234434#
T123?214567
T?211234567
T1234567
T1234434?21
If using a replace functionality is an option for you then this might be an approach to match T0000001 or T123?214567:
Capture a T followed by zero or more digits before the optional part in group 1 (T\d*)
Make the question mark followed by 2 digits part optional (?:\?\d{2})?
Capture one or more digits after in group 2 (\d+).
Then in the replacement you could use group 1 followed by group 2: \1\2.
Using word boundaries \b (Or use assertions for the start and the end of the line ^ $) this could look like:
\b(T\d*)(?:\?\d{2})?(\d+)\b
Example Python
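A minimal sketch of this in Python, assuming the re module and that substitution is available. The helper name normalise and the sample input are invented for illustration.

```python
import re

# Group 1: T plus any digits before the optional positioner.
# Group 2: the digits after it.
PATTERN = re.compile(r"\b(T\d*)(?:\?\d{2})?(\d+)\b")


def normalise(text: str) -> str:
    # Rebuild each identifier from the two groups, dropping the positioner.
    return PATTERN.sub(r"\1\2", text)


print(normalise("T123?214567 T?211234567 T0000001"))
```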
Is the below what you want?
Use RegExReplace with multiline tag (m) and enable replace all occurrences!
Pattern = (T\d*)\?\d{2}(\d*)
replace = $1$2
Usage Example:

RegEx substract text from inside

I have an example string:
*DataFromAdHoc(cbgv)
I would like to extract by RegEx:
DataFromAdHoc
So far I have figured something like that:
^[^#][^\(]+
But unfortunately it does not give the result I want. Do you have any idea why it's not working?
The regex you tried ^[^#][^\(]+ would match:
From the beginning of the string, it should not be a # ^[^#]
Then match until you encounter a parenthesis (I think you don't have to escape the parenthesis in a character class) [^\(]+
So this would match *DataFromAdHoc, including the *, because it is not a #.
What you could do is capture this part [^\(]+ in a group like ([^(]+)
Then your regex would look like:
^[^#]([^(]+)
And the DataFromAdHoc would be in group 1.
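A quick check of the grouped version, assuming Python's re module (the question does not name a language):

```python
import re

# [^#] consumes the leading "*"; group 1 takes everything up to the "(".
m = re.match(r"^[^#]([^(]+)", "*DataFromAdHoc(cbgv)")
name = m.group(1) if m else None
print(name)  # DataFromAdHoc
```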
Use ^\*(\w+)\(\w+\)$
It just gets everything between the * and the stuff in brackets.
Your answer may depend on which language you're running your regex in, please include that in your question.
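Assuming Python's re module as one possible language, the stricter ^\*(\w+)\(\w+\)$ pattern can be sketched like this. The helper name extract_name is hypothetical.

```python
import re

# Requires a leading "*", a word, and a parenthesised word up to the end.
STRICT = re.compile(r"^\*(\w+)\(\w+\)$")


def extract_name(text: str):
    # Return the word between "*" and "(", or None if the shape differs.
    match = STRICT.match(text)
    return match.group(1) if match else None


print(extract_name("*DataFromAdHoc(cbgv)"))
print(extract_name("#DataFromAdHoc(cbgv)"))
```

Unlike the [^#] version, this one rejects any string that does not start with a literal asterisk.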

regex limiting wildcards for url folders

I'd like to set up a regular expression that matches certain patterns for a URL:
http://www.domain.com/folder1/folder2/anything/anything/index.html
This matches, and gets the job done:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/.*\/.*\/index\.html([\?#].*)?$/.test(location.href)
I'm unsure how to limit the wildcards to one folder each. So how can I prevent the following from matching:
http://www.domain.com/folder1/folder2/folder3/folder4/folder5/index.html
(note: folder 5+ is what I want to prevent)
Thanks!
Try this regular expression:
/^http:\/\/www\.domain\.com\/(?:\w+\/){1,3}index\.html([\?#].*)?$/
Change the number 3 to the maximum depth of folders possible.
. matches any character.
[^/] matches any character except /.
Since the / character marks the beginning and end of regex literals, you may have to escape it like this: [^\/].
So, replacing .* by [^\/]* will do what you want:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/[^\/]*\/[^\/]*\/index\.html([\?#].*)?$/.test(location.href)
/^http:\/\/www\.domain\.com\/folder1\/folder2\/[^/]*\/[^/]*\/index\.html([\?#].*)?$/
I don't remember whether the slashes need to be escaped inside the []. I don't think so.
EDIT: Acknowledging tom's comment, using + instead of *:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/[^/]+\/[^/]+\/index\.html([\?#].*)?$/
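To see the difference between .* and [^/]+, here is a small sketch using Python's re module rather than JavaScript. Python has no regex literals, so the slashes are written unescaped; the names LOOSE and STRICT are made up for illustration.

```python
import re

# Original pattern: .* can swallow extra folders, slashes included.
LOOSE = re.compile(
    r"^http://www\.domain\.com/folder1/folder2/.*/.*/index\.html([?#].*)?$")
# Restricted pattern: each wildcard is limited to one path segment.
STRICT = re.compile(
    r"^http://www\.domain\.com/folder1/folder2/[^/]+/[^/]+/index\.html([?#].*)?$")

ok = "http://www.domain.com/folder1/folder2/anything/anything/index.html"
too_deep = "http://www.domain.com/folder1/folder2/folder3/folder4/folder5/index.html"

for url in (ok, too_deep):
    print(bool(LOOSE.match(url)), bool(STRICT.match(url)))
```

The loose pattern accepts both URLs, while the strict one accepts only the first.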
/^http:\/\/www\.domain\.com\/\([^/]*\/\)\{2\}/
And you can change 2 to whatever number of directories you want to match.
You may use:
/^http:\/\/www\.domain\.com\/folder1\/folder2\/(\w*\/){2}index\.html([\?#].*)?$/.test(location.href)

Pattern matching in Perl

I am doing pattern match for some names below:
ABCD123_HH1
ABCD123_HH1_K
Now, my code to grep above names is below:
($name, $kind) = $dirname =~ /ABCD(\d+)\w*_([\w\d]+)/;
Now, the problem I am facing is that both names, ABCD123_HH1 and ABCD123_HH1_K, reach $dirname. However, my variable $kind does not pick up ABCD123_HH1_K. It does pick up ABCD123_HH1.
I'd appreciate your time. Could you please tell me what can be done to match the name with _K?
You need to add the _K part to the end of your regex and make it optional with ?:
/ABCD(\d+)_([\w\d]+(_K)?)/
I also erased the \w*, which is useless and keeps you from correctly getting the HH1_K.
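The effect of the extra \w* can be sketched in Python, whose re module treats these particular constructs the same way as Perl. The helper name kind is hypothetical.

```python
import re

ORIGINAL = re.compile(r"ABCD(\d+)\w*_([\w\d]+)")
FIXED = re.compile(r"ABCD(\d+)_([\w\d]+(_K)?)")


def kind(pattern, dirname: str):
    # Return capture group 2, which corresponds to $kind in the Perl code.
    match = pattern.search(dirname)
    return match.group(2) if match else None


for dirname in ("ABCD123_HH1", "ABCD123_HH1_K"):
    print(dirname, kind(ORIGINAL, dirname), kind(FIXED, dirname))
```

With the original pattern, the greedy \w* swallows "_HH1" (underscores count as word characters) and leaves only "K" for the second group. Without it, the second group receives "HH1_K".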
You should check for zero or more occurrences of _K.
* in Perl's regexp means zero or more times
+ means one or more times.
Hence in your regexp, append (_K)*.
Finally, your regexp should be this:
/ABCD(\d+)\w*_([\w\d]+(_K)*)/
\w includes letters, numbers as well as underscores.
So you can use something as simple as this:
/ABCD\w+/