can regex be used to index/slice parts of string? - regex

So I have a list of serial numbers in the following format:
Serial Number: CN073GTT74445714892L
I was wondering if regex can be used to extract just the last 6 chars?
So in this case, it is 14892L
forget to mention, there is other unrelated text in the document, so how would i make so the match pattern is always after "serial Number: " ?
EDIT - this worked (?<=\s.{29}).{6}$

You can do it with a regex:
.{6}$
Demo
But you can do it without it, and it's an advisable solution. E.g. in Ruby:
"CN073GTT74445714892L"[-6..-1]
in Python:
In [4]: "CN073GTT74445714892L"[-6:]
Out[4]: '14892L'

Regex is ideally used to identify patterns. If it's only the last 6 digits you're interested in, then a normal string manipulation will work too.
e.g in Python, you could use:
str = "CN073GTT74445714892L"
str[-6:]

Related

Regex to insert space with certain characters but avoid date and time

I made a regex which inserts a space where ever there is any of the characters
-:\*_/;, present for example JET*AIRWAYS\INDIA/858701/IDBI 05/05/05;05:05:05 a/c should beJET* AIRWAYS\ INDIA/ 858701/ IDBI 05/05/05; 05:05:05 a/c
The regex I used is (?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)
I have added some words exceptions like a/c w/d etc. \D conditions given to avoid date/time values getting separated, but this created an issue, the numbers followed by the above mentioned characters never get split.
My requirement is
1. Insert a space after characters -:\*_/;,
2. but date and time should not get split which may have / :
3. need exception on words like a/c w/d
The following is the full code
Private Function formatColon(oldString As String) As String
Dim reg As New RegExp: reg.Global = True: reg.Pattern = "(?!a\/c|w\/d|m\/s|s\/w|m\/o)(\D-|\D:|\D\*|\D_|\D\\|\D\/|\D\;)" '"(\D:|\D/|\D-|^w/d)"
Dim newString As String: newString = reg.Replace(oldString, "$1 ")
formatColon = XtraspaceKill(newString)
End Function
I would use 3 replacements.
Replace all date and time special characters with a special macro that should never be found in your text, e.g. for 05/15/2018 4:06 PM, something based on your name:
05MANUMOHANSLASH15MANUMOHANSLASH2018 4MANUMOHANCOLON06 PM
You can encode exceptions too, like this:
aMANUMOHANSLASHc
Now run your original regex to replace all special characters.
Finally, unreplace the macros MANUMOHANSLASH and MANUMOHANCOLON.
Meanwhile, let me tell you why this is complicated in a single regex.
If trying to do this in a single regex, you have to ask, for each / or :, "Am I a part of a date or time?"
To answer that, you need to use lookahead and lookbehind assertions, the latter of which Microsoft has finally added support for.
But given a /, you don't know if you're between the first and second, or second and third parts of the date. Similar for time.
The number of cases you need to consider will render your regex unmaintainably complex.
So please just use a few separate replacements :-)

Get ISBN number with regex

ISBN numbers come with random dash positions
978-618-81543-7-7
9786-18-81-5437-7
97-86-18-81-5437-7
How could i get them every time without knowing dash positions?
Just delete every - with your language of choice.
With Ruby :
"978-618-81543-7-7".delete('-')
#=> "9786188154377"
If you really want to use a regex :
"978-618-81543-7-7".gsub(/-/,'')
If you have multiple lines with isbns :
isbns = "978-618-81543-7-7
9786-18-81-5437-7
97-86-18-81-5437-7"
p isbns.scan(/\b[-\d]+\b/).map{|number_and_dash| number_and_dash.delete('-')}
#=> ["9786188154377", "9786188154377", "9786188154377"]
Pretty easy use of regex, google it to learn more. In Python:
import re
nums=re.findall('\d+',isbnstring)
This will give a list of the numbers. To join them to a string:
isbn=''.join(nums)
As per comments below, if you're working a file you could work it line by line:
with open(isbnfile) as desc:
for isbnstring in desc:
#Do the above and more.
As one example. There are a ton of ways to do this. I just realized from the command line sed is a good choice as well:
sed 's/-//g' isbnfile > newisbnfile

How to group provided string correctly?

I have the following regex:
^([A-Za-z]{2,3}\d{6}|\d{5}|\d{3})((\d{3})?)(\d{2}|\d{3}|\d{6})(\d{2}|\d{3})$
I use this regex to match different, yet similar strings:
# MOR644-004-007-001
MOR644004007001 # string provided
# VUF00101-050-08-01
VUF001010500801 # string provided
# MF001317-077944-01
MF00131707794401 # string provided
These strings need to match/group as it is at the top of the strings, however my problem is that it is not grouping it correctly
The first string: MOR644004007001 is grouped: (MOR644004) (007) (001) which should be (MOR644) (004) (007) (001)
The second string: VUF001010500801 is grouped (VUF001010) (500) (801) which should be (VUF00101) (050) (08) (01)
How can I change ([A-Za-z]{2,3}\d{6}|\d{5}|\d{3})((\d{3})?) so that it would group the provided string correctly?
I am not sure that you can do what you want to.
Let's consider the first two strings:
# MOR644-004-007-001
MOR644004007001 # string provided
# VUF00101-050-08-01
VUF001010500801 # string provided
Now, both the strings are composed of 3 chars followed by 12 digits. Thus, given a regex R, if R does not depend on particular (sequences of) characters and on particular (sequences of) digits (i.e., it presents [A-Za-z] and \d but does not present, let's say, MO and 0070), then it will match both the string in the same way.
So, if you want to operate a different matching, then you need to look at the particular occurrence of certain characters or digits. We need more data from you in order to give you an aswer.
Finally, I suggest you to take a look at this tool:
http://regex.inginf.units.it/ (demo: http://regex.inginf.units.it/demo.html). It is a research project that automatically generates a regex given (many) examples of extraction. I warmly suggest you to try it, especially if you know that an underlying pattern is present in your case for sure (i.e. strings beginning with VUF must be matched differently from strings beginning with MOR) but you are unable to find it. Again, you will need to provide many examples to the engine. Needles to say, if a generic pattern does not exist, then the tool won't find it ;)
Considering your comment to Serv I'd say the (only?) solution is to have one regex for each possibility, like -
MOR(\d{3})(\d{3})(\d{3})(\d{3})|VUF(\d{5})(\d{3})(\d{2})(\d{2})|MF(\d{6})(\d{6})(\d{2})
and then use the execution environment (JS/php/python - you haven't provided which one) to piece the parts together.
See example on regex101 here. Note that substitution, only as an example, matches only the second string.
Regards
Take a look at this. I have used what's called as a named group. As pointed out earlier by others, it's better to have one regex code for each string. I have shown here for the first string, MOR644004007001. Easily you can expand for other two strings:
import re
# MOR644-004-007-001
MOR = "MOR644004007001" # string provided
# VUF00101-050-08-01
VUF = "VUF001010500801" # string provided
# MF001317-077944-01
MF = "MF00131707794401" # string provided
MORcompile = re.compile(r'(?P<first>\w{,6})(?P<second>\d{,3})(?P<third>\d{,3})(?P<fourth>\d{,3})')
MORsearch = MORcompile.search(MOR.strip())
print MORsearch.group('first')
print MORsearch.group('second')
print MORsearch.group('third')
print MORsearch.group('fourth')
MOR644
004
007
001

Regexp: Keyword followed by value to extract

I had this question a couple of times before, and I still couldn't find a good answer..
In my current problem, I have a console program output (string) that looks like this:
Number of assemblies processed = 1200
Number of assemblies uninstalled = 1197
Number of failures = 3
Now I want to extract those numbers and to check if there were failures. (That's a gacutil.exe output, btw.) In other words, I want to match any number [0-9]+ in the string that is preceded by 'failures = '.
How would I do that? I want to get the number only. Of course I can match the whole thing like /failures = [0-9]+/ .. and then trim the first characters with length("failures = ") or something like that. The point is, I don't want to do that, it's a lame workaround.
Because it's odd; if my pattern-to-match-but-not-into-output ("failures = ") comes after the thing i want to extract ([0-9]+), there is a way to do it:
pattern(?=expression)
To show the absurdity of this, if the whole file was processed backwards, I could use:
[0-9]+(?= = seruliaf)
... so, is there no forward-way? :T
pattern(?=expression) is a regex positive lookahead and what you are looking for is a regex positive lookbehind that goes like this (?<=expression)pattern but this feature is not supported by all flavors of regex. It depends which language you are using.
more infos at regular-expressions.info for comparison of Lookaround feature scroll down 2/3 on this page.
If your console output does actually look like that throughout, try splitting the string on "=" when the word "failure" is found, then get the last element (or the 2nd element). You did not say what your language is, but any decent language with string splitting capability would do the job. For example
gacutil.exe.... | ruby -F"=" -ane "print $F[-1] if /failure/"

Regex: How to match a string that is not only numbers

Is it possible to write a regular expression that matches all strings that does not only contain numbers? If we have these strings:
abc
a4c
4bc
ab4
123
It should match the four first, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and stuff, but I can't seem to figure it out.
(?!^\d+$)^.+$
This says lookahead for lines that do not contain all digits and match the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
#Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non letters (eg, punctuation) even though Svish's examples didn't include one. Svish's primary requirement was: not all be digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee 1 non-digit is in there (more might be in there as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can do away with the \b on either end since these are unnecessary constraints (invoking reference to alphanum and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, those basics which have existed for decades (eg, no need for the look-ahead feature). In English, the question was logically equivalent to simply asking that 1 counter-example character be found within a string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
May be a digit at the beginning, then at least one letter, then letters or digits
Try this:
/^.*\D+.*$/
It returns true if there is any simbol, that is not a number. Works fine with all languages.
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
if you are trying to match worlds that have at least one letter but they are formed by numbers and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If we want to restrict valid characters so that string can be made from a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
/\w+/:
Matches any letter, number or underscore. any word character
.*[^0-9]{1,}.*
Works fine for us.
We want to use the used answer, but it's not working within YANG model.
And the one I provided here is easy to understand and it's clear:
start and end could be any chars, but, but there must be at least one NON NUMERICAL characters, which is greatest.
I am using /^[0-9]*$/gm in my JavaScript code to see if string is only numbers. If yes then it should fail otherwise it will return the string.
Below is working code snippet with test cases:
function isValidURL(string) {
var res = string.match(/^[0-9]*$/gm);
if (res == null)
return string;
else
return "fail";
};
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123"; // fail here
console.log(isValidURL(testCase5));
I had to do something similar in MySQL and the following whilst over simplified seems to have worked for me:
where fieldname regexp ^[a-zA-Z0-9]+$
and fieldname NOT REGEXP ^[0-9]+$
This shows all fields that are alphabetical and alphanumeric but any fields that are just numeric are hidden. This seems to work.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed