Regex to match part of file name

I need to use a regex match expression to find part of a file name.
e.g. the file name is ABC01-001-M-001_0.dwg
I need to match the first digit after the underscore (this is the revision number of the drawing file).
With this example it would match the 0.
Can anyone please advise the regex for this?
Many thanks.

pushpraj's answer is almost correct. I take it you don't want the underscore in the match,
so (?<=_)(\d+) would be the better choice. The (?<=_) is a lookbehind: it says that an underscore has to come directly before your desired pattern, but it is not included in the match.
Demo: https://regex101.com/r/kH1nE2/2
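To illustrate the lookbehind described above, here is a minimal sketch in JavaScript. The helper name revisionOf is hypothetical, and the sketch assumes an engine with lookbehind support (ES2018 or later); the pattern itself is the one from the answer.

```javascript
// Hypothetical helper; the pattern is (?<=_)\d+ from the answer above.
function revisionOf(fileName) {
  // (?<=_) requires an underscore immediately before the digits
  // without making the underscore part of the match.
  const m = fileName.match(/(?<=_)\d+/);
  return m ? m[0] : null;
}

console.log(revisionOf("ABC01-001-M-001_0.dwg")); // "0"
console.log(revisionOf("no-revision.dwg"));       // null
```

Because the underscore is only asserted, the whole match is already the revision number and no capture group is needed.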

I think this should be pretty simple.
_(\d)
online demo https://regex101.com/r/kH1nE2/1
in case you expect more digits you can use the following
_(\d+)
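Note the "+", which will match one or more digits after the underscore. In both patterns the digits are in capture group 1; the full match still includes the underscore.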
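A short sketch of how the two patterns above behave in JavaScript, to show the difference between the full match and the capture group. The second file name (with revision 12) is a made-up example, not one from the question.

```javascript
// _(\d) captures exactly one digit after the underscore.
const single = /_(\d)/.exec("ABC01-001-M-001_0.dwg");
// _(\d+) captures a run of digits; the file name below is hypothetical.
const multi = /_(\d+)/.exec("ABC01-001-M-001_12.dwg");

console.log(single[0]); // "_0"  (full match, underscore included)
console.log(single[1]); // "0"   (capture group 1)
console.log(multi[1]);  // "12"
```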

In this case regex is not even needed.
$n = "ABC01-001-M-001_0.dwg";
echo intval(explode("_", $n)[1]); // 0
https://3v4l.org/fBXc5
It seems you are looking for a javascript solution?
var n = "ABC01-001-M-001_0.dwg";
var revision = parseInt(n.split("_")[1], 10); // 0
I wasn't sure this would work because I don't normally write JavaScript; the above is what I could find by googling the functions.
It seems to work :-)
https://jsfiddle.net/w9x7d40f/

Related

Pattern matching in Perl

I am doing pattern match for some names below:
ABCD123_HH1
ABCD123_HH1_K
Now, my code to grep above names is below:
($name, $kind) = $dirname =~ /ABCD(\d+)\w*_([\w\d]+)/;
Now, the problem I am facing is that both names, ABCD123_HH1 and ABCD123_HH1_K, arrive in $dirname. However, $kind is not set correctly for ABCD123_HH1_K, although it is for ABCD123_HH1.
I appreciate your time. Could you please tell me what can be done to capture the name ending in _K?
You need to add the _K part to the end of your regex and make it optional with ?:
/ABCD(\d+)_([\w\d]+(_K)?)/
I also erased the \w*, which is useless and keeps you from correctly getting the HH1_K.
You should check for zero or more occurrences of _K.
* in Perl's regexp means zero or more times
+ means one or more times.
Hence in your regexp, append (_K)* and drop the \w*, which is greedy and would otherwise swallow the _HH1 part.
Finally, your regexp should be this:
/ABCD(\d+)_([\w\d]+(_K)*)/
\w includes letters, numbers as well as underscores.
So you can use something as simple as this:
/ABCD\w+/
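The effect of the greedy \w* can be checked with a small sketch. It is written in JavaScript rather than Perl, on the assumption that \w behaves the same for these ASCII names; the variable names and the simplified pattern in fixed are illustrative, not taken from the answers.

```javascript
// Pattern from the question: \w* also matches underscores, so it can eat "_HH1".
const original = /ABCD(\d+)\w*_([\w\d]+)/;
// Illustrative simplification: no \w*, and \w+ already covers letters, digits and "_".
const fixed = /ABCD(\d+)_(\w+)/;

for (const dir of ["ABCD123_HH1", "ABCD123_HH1_K"]) {
  console.log(dir, original.exec(dir)[2], fixed.exec(dir)[2]);
}
// ABCD123_HH1 HH1 HH1
// ABCD123_HH1_K K HH1_K
```

For the second name the original pattern backtracks only as far as the last underscore, which is why the second group ends up holding just K.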

Regex string transformation/extraction

Code:
https://aaa.bbb.net/ccc/211099_589944494365122_1446403980_n.jpg
How can I get 589944494365122 out of that string using regex?
The best I can do so far is _(.*), resulting in 589944494365122_1446403980_n.jpg
First, you should generalize your problem description, for example: how can I get the longest non-empty substring of digits after the first _ in the string? The regexp you literally asked for is (589944494365122), but that's not what you expect.
According to my guess about what you want, the answer could be _(\d+).
The rule of extraction I can see in your input is:
211099_   589944494365122   _1446403980
[0-9]+_   part we want      _[0-9]+
so a regex with look-behind and look-ahead will help:
'(?<=\d_)\d+(?=_\d)'
test with grep:
kent$ echo " https://aaa.bbb.net/ccc/211099_589944494365122_1446403980_n.jpg"|grep -Po '(?<=\d_)\d+(?=_\d)'
589944494365122
This works:
var s = "https://aaa.bbb.net/ccc/211099_589944494365122_1446403980_n.jpg";
var m = /_([^_]*)/.exec(s);
console.log( m[1] ); // 589944494365122
I would go with \d+_(\d+)_\d+_n\.jpg, but depending on the exact specification of the URL this may need a little bit of tweaking.
Also, depending on the language, this may need to be altered a little bit. The solution I suggest will work for instance in Ruby (as well as many other regex implementations). Here \d matches any digit and \d+ means one or more digits. I assume the letter before .jpg is always n, but you may change this by replacing n with either . (any character) or \w (any word character).
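As a rough comparison, the three regex suggestions above can be run side by side in JavaScript. The variable names are illustrative, and the lookaround version assumes an engine with lookbehind support (ES2018 or later).

```javascript
const url = "https://aaa.bbb.net/ccc/211099_589944494365122_1446403980_n.jpg";

// Loose: digits right after the first underscore that is followed by digits.
const loose = /_(\d+)/.exec(url)[1];
// Strict: spell out the whole digits_digits_digits_n.jpg shape.
const strict = /\d+_(\d+)_\d+_n\.jpg/.exec(url)[1];
// Lookaround: digits that sit between "digit_" and "_digit".
const lookaround = url.match(/(?<=\d_)\d+(?=_\d)/)[0];

console.log(loose, strict, lookaround);
// 589944494365122 589944494365122 589944494365122
```

All three agree on this input; they would differ on URLs whose path contains other underscores, so which one fits depends on how fixed the URL format really is.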

Regex: match everything before FIRST underscore and everything in between AFTER

I have an expression like
test_abc_HelloWorld_there could be more here.
I'd like a regex that takes the first word before the first underscore. So get "test"
I tried [A-Za-z]{1,}_ but that didn't work.
Then I'd like to get "abc" or anything in between the first 2 underscores.
2 Separate Regular expressions, not combined
Any help is very appreciated!
Example:
for 1) the regex would match the word test
for 2) the regex would match the word abc
so any other match for either case would be wrong. As in, if I were to replace what I matched on then I would get something like this:
for case 1) match "test" and replace "test" with "Goat".
'Goat_abc_HelloWorld_there could be more here'
I don't want a replace, I just want a match on a word.
In both cases you can use lookaround assertions.
^[^_]+(?=_)
will get you everything up to the first underscore of the line, and
(?<=_)[^_]+(?=_)
will match whatever string is located between two underscores.
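A minimal sketch of both assertions applied to the example string from the question, in JavaScript (assuming lookbehind support, ES2018 or later). The variable names are illustrative.

```javascript
const input = "test_abc_HelloWorld_there could be more here";

// Everything from the start of the string up to the first underscore.
const first = input.match(/^[^_]+(?=_)/)[0];
// The first run of non-underscore characters enclosed by underscores.
const between = input.match(/(?<=_)[^_]+(?=_)/)[0];
// With the g flag, every such run is returned.
const allBetween = input.match(/(?<=_)[^_]+(?=_)/g);

console.log(first);      // "test"
console.log(between);    // "abc"
console.log(allBetween); // [ 'abc', 'HelloWorld' ]
```

Note that the second pattern only yields abc alone when the match is not global, so the g flag should be left off if just the part between the first two underscores is wanted.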
Step back and consider that maybe you're overengineering the solution here. Ruby has a split method for this, and other languages probably have their own equivalents.
Given something like "AAPL_annual_i.xls", you could just do this and take advantage of the fact that your data is already structured:
string_object = "AAPL_annual_i.xls"
ary = string_object.split("_")
#=> ["AAPL", "annual", "i.xls"]
extension = ary[2].split(".")[1]
#=> "xls"
filetype = ary[2].split(".")[0] #etc
#=> "i"
'doh!
But seriously, I've found that leaning on the split method is not only easier on me, it's easier on my associates who have to read my code and understand what it does.
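The same split idea, sketched in JavaScript against the string from the question rather than the Ruby example. The variable names are illustrative.

```javascript
const expression = "test_abc_HelloWorld_there could be more here";

// Splitting on "_" gives the pieces directly, no regex needed.
const parts = expression.split("_");

console.log(parts[0]); // "test" (before the first underscore)
console.log(parts[1]); // "abc"  (between the first two underscores)
```

One caveat: if the string contains no underscore at all, parts[0] is the whole string and parts[1] is undefined, so a length check may be needed.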

Regular Expression to find sequences of lowercase letters joined with underscore

I can't seem to make my regular expression work.
I'd like to have some alpha text, no numbers, an underscore and then some more alpha text.
for example: blah_blah
I have a non-working example here
^[a-z][_][a-z]$
Thanks in advance people.
EDIT: I apologize, I'd like to enforce the use of all lower case.
^[a-z]+_[a-z]+$
Try this:
[A-Za-z]+_[A-Za-z]+
Lowercase :
[a-z]+_[a-z]+
You just need:
[a-z]+_[a-z]+
or if it needs to be an entire line:
^[a-z]+_[a-z]+$
Try:
^[a-z]+_[a-z]+$
Depending on which flavor of regex you're using, there are different possibilities:
^[A-Za-z]+_[A-Za-z]+$
^\a+_\a+$
^[[:alpha:]]+_[[:alpha:]]+$
The first form is the most widely accepted; \a is Vim's shorthand for an alphabetic character and [[:alpha:]] is a POSIX character class.
Your example suggests you're looking for things exactly like "blah_foo" and don't want to extract it from strings like "Hey blah_foo you". If this is not the case, you should drop the "^" (match the beginning of the string) and "$" (match the end of the string).
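To make the role of the anchors and the + quantifier concrete, here is a small JavaScript sketch. The constant names are illustrative; the patterns are the ones from the question and answers.

```javascript
const attempt = /^[a-z][_][a-z]$/;   // the non-working pattern from the question
const anchored = /^[a-z]+_[a-z]+$/;  // whole string must be letters_letters
const unanchored = /[a-z]+_[a-z]+/;  // may match anywhere inside the string

console.log(attempt.test("blah_blah"));           // false: one letter allowed per side
console.log(attempt.test("a_b"));                 // true
console.log(anchored.test("blah_blah"));          // true
console.log(anchored.test("Hey blah_foo you"));   // false
console.log(unanchored.test("Hey blah_foo you")); // true
```

The original attempt fails only because each [a-z] without a quantifier matches exactly one character.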

Quick Regex question

Can anybody guide me in the right direction...
I have some folder structures that I want to check for a trailing / slash. Some of the folders don't have a trailing /, and if a folder does not have one I need to add it.
If I use this regex it works, but it replaces the last character of the folder name:
Folder/name/test
folder/name/test1/
folder/name/test2/
replace(/.$/ig,"/");
This regex turns Folder/name/test into Folder/name/tes/: it takes out the t and replaces it with /.
Hope this makes sense...
Try something like this:
replace(/[^/]$/ig, "$0/")
replace(/(.)$/ig,"\1/");
or better
replace(/([^\/])$/ig,"\1/");
if \1 isn't a backreference in your language, then you'll have to figure that out, or tell us the language.
The regex you made basically means this: take any character that is the last one in the string and replace it with /. What you need to do is to group the last character and then insert it again in the replacement, like this:
replace(/([^\/])$/ig,"$1/");
For more information see
http://www.regular-expressions.info/brackets.html
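Assuming the language is JavaScript (the question does not say, although the replace(/.../, "...") syntax suggests it), the grouping approach above could be sketched like this. The function name withTrailingSlash is hypothetical.

```javascript
// Hypothetical helper: append "/" only when the path does not already end in one.
function withTrailingSlash(path) {
  // ([^\/])$ captures the last character if it is not a slash;
  // "$1/" puts that character back and adds the slash after it.
  return path.replace(/([^\/])$/, "$1/");
}

console.log(withTrailingSlash("Folder/name/test"));   // "Folder/name/test/"
console.log(withTrailingSlash("folder/name/test1/")); // "folder/name/test1/"
```

In JavaScript the backreference in the replacement string is written $1; "\1" would not be treated as a reference to the group.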
Without knowing the language it's difficult to post a correct answer and you can't use the code provided in a cut-and-paste fashion. Anyway I might go for this regex:
replace(/(.)\/*$/,"\1/");
This will append the trailing / only if it's not there yet.
I'm not sure which language this is for but this is how you would do it in Perl:
#! /local/bin/perl
while (<DATA>)
{
    # capture the last non-slash character so the replacement keeps it
    s#([^/])\n#$1/\n#;
    print;
}
__DATA__
/foo/bar/
/baz/jazz
/baz/jazz
This then prints out the following:
/foo/bar/
/baz/jazz/
/baz/jazz/
The key to the regex is the "([^/])\n". This matches any character other than / that sits right before the newline and captures it, so the replacement can put it back ahead of the added slash. With your nomenclature, I would assume the syntax would be the following:
replace(/([^\/])\n/ig,"$1/\n");
Or if there is no newline use this:
replace(/([^\/])$/ig,"$1/");
I hope that helps.
I would avoid regular expressions in this case and do something easier like:
$path = rtrim($path, '/').'/';
EDIT:
Woops, assumed it was php...
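If the language turns out to be JavaScript, a regex-free version in the same spirit might look like the sketch below. The function name ensureTrailingSlash is hypothetical, and unlike the rtrim version it does not collapse several trailing slashes into one.

```javascript
// Hypothetical helper: no regex, just a suffix check.
function ensureTrailingSlash(path) {
  return path.endsWith("/") ? path : path + "/";
}

console.log(ensureTrailingSlash("Folder/name/test"));   // "Folder/name/test/"
console.log(ensureTrailingSlash("folder/name/test1/")); // "folder/name/test1/"
```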