Regex perl with letters and numbers - regex

I need to extract a string from lines of a text file that contain both letters and numbers. The lines start like this:
Report filename: ABCL00-67900010079415.rpt ______________________
All I need is the last 8 digits, so in this example that would be 10079415.
while (<DATA>) {
    if (/Report filename/) {
        my ($bagID) = ( m/(\d{8}+)./ );
        print $bagID;
    }
}
Right now this prints the first 8 digits, but I want the last 8.

You just need to escape the dot, so that the pattern matches the 8 digits that come immediately before the dot character.
my ($bagID) = ( m/(\d{8}+)\./ );
. is a special character in regex that matches any character (except a newline, by default). To match a literal dot, you must escape it.

To match the last of anything, just precede it with a wildcard that will match as many characters as possible
my ($bag_id) = / .* (\d{8}) /x;
Note that I have also used the /x modifier so that the regex can contain insignificant whitespace for readability. Also, your \d{8}+ uses what is called a possessive quantifier; it is used for optimising some regex constructions and makes no difference at the end of the pattern.
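For comparison, here is a minimal JavaScript sketch of the three patterns discussed in this thread. The original question uses Perl; the variable names below are illustrative only. JavaScript has neither the /x modifier nor possessive quantifiers, so the patterns are written without them.

```javascript
// Sample line taken from the question.
const line = "Report filename: ABCL00-67900010079415.rpt ______________________";

// Unescaped dot: matches the first run of 8 digits followed by any character.
const firstEight = line.match(/(\d{8})./)[1];

// Escaped dot: the 8 digits must sit immediately before a literal ".".
const beforeDot = line.match(/(\d{8})\./)[1];

// Greedy .* first: the capture lands on the last possible run of 8 digits.
const lastEight = line.match(/.*(\d{8})/)[1];

console.log(firstEight, beforeDot, lastEight); // 67900010 10079415 10079415
```

Both fixes give the same result here; the greedy version does not depend on a dot following the digits.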

Related

Regex ignore special character with greedy

I used the following regex to catch 10 numbers and letters:
/[a-zA-Z0-9]{10}/g
It works fine if the 10 characters are only numbers and letters.
e.g. input: 12345xcdw034342
it catches 12345xcdw0
But in this case with special characters or space, it doesn't catch it.
123}456712234324Zz3 or 123}45 71223AB3
It should catch 10 numbers and letters regardless of the other characters.
Any help would be gratefully appreciated.
You can do it, but not without some extra processing.
As you have not specified what language you're using, I'll use JavaScript as it is quite universal, but the same logic should apply in any language.
Here are the options I can think of,
given testString = "12#34{56A789BDE":
Match everything up to the tenth alphanumeric character, and then remove the special characters from the resulting string
testString.match(/(\w.*?){10}/)[0].replaceAll(/\W/g, '')
// results '123456A789'
// explanation: we match a \w followed by .*? ten times, so it doesn't matter if an alphanumeric has non-alphanumerics right after it; then we clean the result by removing \W, which means non-word characters (note that \w also matches the underscore)
Match only the first ten alphanumeric characters and then join them to make a result string
testString.match(/\w/g).splice(0,10).join('')
// results '123456A789'
// explanation: with the g flag, match returns an array with one element per \w match (note the lowercase); splice(0,10) takes the first ten elements and join('') glues them into a string
Remove the special characters from your string and then take the first ten
testString.replaceAll(/\W/g,'').match(/\w{10}/)[0]
// results '123456A789'
// explanation: we replace \W, which means non-word characters, with '' to delete them, then we match the first ten
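The third option can be wrapped in a small helper. This is only a sketch, assuming that "numbers and letters" means ASCII [a-zA-Z0-9] (\w would also keep underscores); the name firstTenAlnum is made up for illustration.

```javascript
// Strip everything that is not an ASCII letter or digit, then keep the
// first ten characters. Returns null when fewer than ten remain.
function firstTenAlnum(input) {
  const alnumOnly = input.replace(/[^a-zA-Z0-9]/g, "");
  return alnumOnly.length >= 10 ? alnumOnly.slice(0, 10) : null;
}

console.log(firstTenAlnum("12#34{56A789BDE"));     // 123456A789
console.log(firstTenAlnum("123}456712234324Zz3")); // 1234567122
console.log(firstTenAlnum("a_b"));                 // null
```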
You can use
/[a-zA-Z0-9](?:[^a-zA-Z0-9]*[a-zA-Z0-9]){9}/g
Details:
[a-zA-Z0-9] - an alphanumeric
(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){9} - nine occurrences of any zero or more chars other than an alphanumeric char and then an alphanumeric char.
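A short sketch of how this pattern might be used in JavaScript. The matched span still contains the special characters, so they are stripped in a second step; the names TEN_ALNUM and tenAlnumSpan are hypothetical.

```javascript
// One alphanumeric, then nine more, each optionally preceded by
// any number of non-alphanumeric characters.
const TEN_ALNUM = /[a-zA-Z0-9](?:[^a-zA-Z0-9]*[a-zA-Z0-9]){9}/;

function tenAlnumSpan(input) {
  const m = input.match(TEN_ALNUM);
  if (m === null) return null; // fewer than ten alphanumerics in the input
  return {
    span: m[0],                               // raw match, special characters included
    chars: m[0].replace(/[^a-zA-Z0-9]/g, ""), // the ten alphanumerics only
  };
}

console.log(tenAlnumSpan("123}45 71223AB3"));
// { span: '123}45 71223', chars: '1234571223' }
```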

How to check number of different characters using regex?

I'm trying to create a regex to find all inputs containing at most three different characters. It doesn't matter how long the input is.
Example of cases:
"32 32 32 32 34" --> match
"MM" --> match
" " --> match
"1234" --> no match
I've written a regex to find inputs with four or more different characters, but now I need the opposite...
(.).*(?!\1)(.).*(?!\1)(?!\2)(.).*(?!\1)(?!\2)(?!\3)(.)
Main question is: How to check number of different characters?
The following will match a string with a maximum of three different non-space characters
^\s*(\S)?(?:\s|\1)*(\S)?(?:\s|\1|\2)*(\S)?(?:\s|\1|\2|\3)*$
(\S) matches one non-space character and captures it so it can then be referenced later in the regex using a back-reference e.g. \1. The ? in the (\S)? are used so the string can contain zero, one, two or three types of non-space characters.
The ?: makes a group non-capturing.
The first part of the regex captures up to three different non-space characters \1, \2, \3, and then (?:\s|\1|\2|\3)* ensures only those characters or space \s can then appear before the end of the string $.
One way, in Javascript, to count the number of different non-space characters in a string "using regex":
var str = 'ABC ABC';
var chars = '';
str.replace( /\S/g, function ( m ) {
if ( chars.indexOf(m) == -1 ) chars += m;
});
chars.length; // 3
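To try the backreference pattern against the examples from the question, something like the following sketch could be used. The function names are invented here, and the Set-based counter is an alternative to the replace callback above, not part of the original answer.

```javascript
// The pattern from this answer: up to three captured non-space
// characters, after which only those characters or whitespace may appear.
const AT_MOST_THREE =
  /^\s*(\S)?(?:\s|\1)*(\S)?(?:\s|\1|\2)*(\S)?(?:\s|\1|\2|\3)*$/;

function hasAtMostThreeDistinct(input) {
  return AT_MOST_THREE.test(input);
}

// Alternative without backreferences: count distinct non-space characters.
function countDistinct(input) {
  return new Set(input.replace(/\s/g, "")).size;
}

for (const s of ["32 32 32 32 34", "MM", " ", "1234"]) {
  console.log(JSON.stringify(s), hasAtMostThreeDistinct(s), countDistinct(s));
}
// "32 32 32 32 34" true 3
// "MM" true 1
// " " true 0
// "1234" false 4
```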
Good question. Here's the simplest I could come up with:
^\s*([^\s]{1,3}\s+)*[^\s]{0,3}$
Explanation:
^\s* matches any amount of whitespace at the start.
([^\s]{1,3}\s+)* matches repeating groups of between one and three non-whitespace characters followed by at least one whitespace character. Consider putting ?: after ( to make this a non-capturing group.
The final [^\s]{0,3} allows the string to end with up to three non-whitespace characters (so it doesn't have to end with whitespace, as the repeating group would otherwise require). Note that this pattern limits each whitespace-separated group to three characters; it does not check that the groups share the same characters.
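A quick way to see what this pattern does and does not check is to run it on a few inputs. This is an illustrative JavaScript sketch; the last two test strings are made up and do not come from the question.

```javascript
// Pattern from this answer: whitespace-separated groups of 1 to 3 characters.
const TOKEN_LENGTH = /^\s*([^\s]{1,3}\s+)*[^\s]{0,3}$/;

function tokenLengthOk(input) {
  return TOKEN_LENGTH.test(input);
}

console.log(tokenLengthOk("32 32 32 32 34")); // true
console.log(tokenLengthOk("1234"));           // false
console.log(tokenLengthOk("12 34 56"));       // true, although it has six different characters
console.log(tokenLengthOk("MMMM"));           // false, although it has one distinct character
```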

regex: find one-digit number

I need to find all the one-digit numbers in a text.
My code:
$string = 'text 4 78 text 558 my.name#gmail.com 5 text 78998 text';
$pattern = '/ [\d]{1} /';
(result: 4 and 5)
Everything works perfectly; I just wanted to ask whether it is correct to use spaces.
Maybe there is some other way to pick out one-digit numbers.
Thanks
First of all, [\d]{1} is equivalent to \d.
As for your question, it would be better to use a zero width assertion like a lookbehind/lookahead or word boundary (\b). Otherwise you will not match consecutive single digits because the leading space of the second digit will be matched as the trailing space of the first digit (and overlapping matches won't be found).
Here is how I would write this:
(?<!\S)\d(?!\S)
This means "match a digit only if there is not a non-whitespace character before it, and there is not a non-whitespace character after it".
I used the double negative like (?!\S) instead of (?=\s) so that you will also match single digits that are at the beginning or end of the string.
I prefer this over \b\d\b for your example because it looks like you really only want to match when the digit is surrounded by spaces, and \b\d\b would match the 4 and the 5 in a string like 192.168.4.5
To allow punctuation at the end, you could use the following:
(?<!\S)\d(?![^\s.,?!])
Add any additional punctuation characters that you want to allow after the digit to the character class (inside of the square brackets, but make sure it is after the ^).
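The differences between these patterns are easier to see side by side. Below is a small JavaScript sketch (the question uses PHP, but the patterns rely only on syntax that modern JavaScript engines also support, including lookbehind); the helper name find and the extra test strings are assumptions made for illustration.

```javascript
const SPACES = / \d /g;                       // the original pattern
const STANDALONE = /(?<!\S)\d(?!\S)/g;        // whitespace or string edge on both sides
const WORD_BOUNDARY = /\b\d\b/g;
const WITH_PUNCT = /(?<!\S)\d(?![^\s.,?!])/g; // also allows . , ? ! after the digit

// Returns all matches as an array, or null when there are none.
function find(pattern, text) {
  return text.match(pattern);
}

const sample = "text 4 78 text 558 my.name#gmail.com 5 text 78998 text";
console.log(find(STANDALONE, sample));                  // [ '4', '5' ]
console.log(find(SPACES, "1 2 3"));                     // [ ' 2 ' ]
console.log(find(STANDALONE, "1 2 3"));                 // [ '1', '2', '3' ]
console.log(find(WORD_BOUNDARY, "192.168.4.5"));        // [ '4', '5' ]
console.log(find(STANDALONE, "192.168.4.5"));           // null
console.log(find(WITH_PUNCT, "Call agent 7, pronto!")); // [ '7' ]
```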
Use word boundaries. Note that the quantifier {1} is redundant (a single \d already matches exactly one digit), and so is the character class [], because it contains only one item.
\b\d\b
Search around word boundaries:
\b\d\b
As explained by the others, this will extract single digits meaning that some special characters might not be respected like "." in an ip address. To address that, see F.J and Mike Brant's answer(s).
It really depends on where the numbers can appear and whether you care if they are adjacent to other characters (like . at the end of a sentence). At the very least, I would use word boundaries so that you can get numbers at the beginning and end of the input string:
$pattern = '/\b\d\b/';
But you might consider punctuation at the end like:
$pattern = '/\b\d(\b|\.|\?|\!)/';
If one-digit numbers can be preceded or followed by characters other than digits (e.g., "a1 cat" or "Call agent 7, pronto!") use
(?<!\d)\d(?!\d)
The regular expression reads: match a digit (\d) that is neither preceded nor followed by a digit, (?<!\d) being a negative lookbehind and (?!\d) being a negative lookahead.
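As a rough check of this pattern, here is a JavaScript sketch; loneDigits is a hypothetical helper name, and the last input is added only to show that digits inside an IP address are also picked up.

```javascript
// A digit with no digit directly before or after it.
const LONE_DIGIT = /(?<!\d)\d(?!\d)/g;

function loneDigits(text) {
  return text.match(LONE_DIGIT) || [];
}

console.log(loneDigits("a1 cat"));                // [ '1' ]
console.log(loneDigits("Call agent 7, pronto!")); // [ '7' ]
console.log(loneDigits("text 4 78 text 558"));    // [ '4' ]
console.log(loneDigits("192.168.4.5"));           // [ '4', '5' ]
```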

Match a number in a string with letters and numbers

I need to write a Perl regex to match numbers in a word with both letters and numbers.
Example: test123. I want to write a regex that matches only the number part and capture it
I am trying this \S*(\d+)\S* and it captures only the 3 but not 123.
Regex atoms will match as much as they can.
Initially, the first \S* matched "test123", but the regex engine had to backtrack to allow \d+ to match. The result is:
 +------------------- Matches "test12"
 |     +------------- Matches "3"
 |     |     +------- Matches ""
 |     |     |
---  -----  ---
\S*  (\d+)  \S*
All you need is:
my ($num) = "test123" =~ /(\d+)/;
It'll try to match at position 0, then position 1, ... until it finds a digit, then it will match as many digits as it can.
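The backtracking described above can be reproduced in JavaScript, whose regex engine handles these greedy patterns the same way. This is a sketch only; the helper firstGroup is an invented name, and the lazy variant on the last line is shown just for contrast.

```javascript
// Returns the text of capture group 1, or null if the pattern does not match.
function firstGroup(pattern, text) {
  const m = text.match(pattern);
  return m ? m[1] : null;
}

// Greedy \S* takes "test12" and leaves a single digit for (\d+).
console.log(firstGroup(/\S*(\d+)\S*/, "test123"));  // 3

// Without the surrounding \S*, (\d+) starts at the first digit it finds.
console.log(firstGroup(/(\d+)/, "test123"));        // 123

// A lazy \S*? gives up characters as early as possible instead.
console.log(firstGroup(/\S*?(\d+)\S*/, "test123")); // 123
```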
The * quantifiers in your regex are greedy; that's why they also "eat" the digits. As @Marc said, you don't need them.
perl -e '$_ = "qwe123qwe"; s/(\d+)/$numbers=$1/e; print $numbers . "\n";'
"something122320" =~ /(\d+)/ will return 122320; this is probably what you're trying to do ;)
\S matches any non-whitespace characters, including digits. You want \d+:
my ($number) = 'test123' =~ /(\d+)/;
Were it a case where a non-digit was required (say before, per your example), you could use the following non-greedy expressions:
/\w+?(\d+)/ or /\S+?(\d+)/
(The second one is more in tune with your \S* specification.)
Your expression matches any string containing one or more digits, and that may be what you want. It also matches a string of digits surrounded by spaces (" 123 "), because the boundary between the leading space and the first digit satisfies zero-or-more non-space characters, and the same is true of the boundary between the '3' and the following space.
Chances are that you don't need any specification and capturing the first digits in the string is enough. But when it's not, it's good to know how to specify expected patterns.
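A small JavaScript sketch of the non-greedy expressions, to show what "a non-digit was required" means in practice. The helper captureOne and the inputs other than test123 are assumptions for illustration.

```javascript
// Returns the text of capture group 1, or null if the pattern does not match.
function captureOne(pattern, text) {
  const m = text.match(pattern);
  return m ? m[1] : null;
}

// Lazy prefix: consumes as little as possible before the digits.
console.log(captureOne(/\w+?(\d+)/, "test123")); // 123
console.log(captureOne(/\S+?(\d+)/, "abc-42"));  // 42

// The prefix needs at least one character, so it takes the first digit here.
console.log(captureOne(/\w+?(\d+)/, "123"));     // 23

// The original greedy expression on digits surrounded by spaces.
console.log(captureOne(/\S*(\d+)\S*/, " 123 ")); // 3
```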
Parentheses signify capture groups. If you only want to match the digits rather than capture them, remove the parentheses: you're looking for /\d+/ or /[0-9]+/

Regex allow digits and a single dot

What would be the regex to allow digits and a dot? As far as I know, \D only allows digits, but it doesn't allow a dot. I need it to allow digits and one dot, i.e. a float value. It has to be validated in a keyup handler in jQuery, but all I need is the regex that allows only what I described.
It will be used with JavaScript's native replace function to remove non-digits and other symbols (except a dot).
Cheers.
If you want to allow 1 and 1.2:
(?<=^| )\d+(\.\d+)?(?=$| )
If you want to allow 1, 1.2 and .1:
(?<=^| )\d+(\.\d+)?(?=$| )|(?<=^| )\.\d+(?=$| )
If you want to only allow 1.2 (only floats):
(?<=^| )\d+\.\d+(?=$| )
\d allows digits (while \D allows anything but digits).
(?<=^| ) checks that the number is preceded by either a space or the beginning of the string. (?=$| ) makes sure the string is followed by a space or the end of the string. This makes sure the number isn't part of another number or in the middle of words or anything.
Edit: added more options and improved the regexes by adding lookaheads and lookbehinds to make sure the numbers are standalone (i.e. aren't in the middle of words or other numbers).
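The three variants can be compared on one input. This is a JavaScript sketch that assumes an engine with lookbehind support (current browsers and Node.js); the constant names and the sample text are made up.

```javascript
const INT_OR_FLOAT = /(?<=^| )\d+(\.\d+)?(?=$| )/g;                         // 1 and 1.2
const ALSO_LEADING_DOT = /(?<=^| )\d+(\.\d+)?(?=$| )|(?<=^| )\.\d+(?=$| )/g; // 1, 1.2 and .1
const FLOAT_ONLY = /(?<=^| )\d+\.\d+(?=$| )/g;                              // 1.2 only

// Returns all standalone numbers found by the given pattern.
function numbersIn(pattern, text) {
  return text.match(pattern) || [];
}

const text = "1 1.2 .1 abc2 3.";
console.log(numbersIn(INT_OR_FLOAT, text));     // [ '1', '1.2' ]
console.log(numbersIn(ALSO_LEADING_DOT, text)); // [ '1', '1.2', '.1' ]
console.log(numbersIn(FLOAT_ONLY, text));       // [ '1.2' ]
```

Note that "abc2" and "3." are skipped by all three, because the digits there are not delimited by spaces or the string edges on both sides.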
\d*\.\d*
Explanation:
\d* - any number of digits
\. - a dot
\d* - more digits.
This will match 123.456, .123, 123., but not 123 (it will also match a lone dot).
If you want the dot to be optional, in most languages (I don't know about jQuery) you can use
\d*\.?\d*
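When these patterns are used to validate a whole input value, they need anchors; otherwise they match a substring (or the empty string) of almost anything. The sketch below adds ^ and $, which is an assumption on my part and not part of the original answer; the constant names are illustrative.

```javascript
const DOT_REQUIRED = /^\d*\.\d*$/;  // exactly one dot, digits optional on either side
const DOT_OPTIONAL = /^\d*\.?\d*$/; // at most one dot

console.log(DOT_REQUIRED.test("123.456")); // true
console.log(DOT_REQUIRED.test(".123"));    // true
console.log(DOT_REQUIRED.test("123."));    // true
console.log(DOT_REQUIRED.test("123"));     // false
console.log(DOT_REQUIRED.test("."));       // true: a lone dot passes too

console.log(DOT_OPTIONAL.test("123"));     // true
console.log(DOT_OPTIONAL.test("1.2.3"));   // false
console.log(DOT_OPTIONAL.test(""));        // true: the empty string passes too
```

Accepting partial values such as "123." or "" can be useful while the user is still typing, but a stricter check is probably wanted before the value is parsed.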
Try this
boxValue = boxValue.replace(/[^0-9\.]/g,"");
This regular expression removes everything except digits and dots from the value of the text box.
My attempt is a combined solution.
string = string.replace(',', '.').replace(/[^\d\.]/g, "").replace(/\./, "x").replace(/\./g, "").replace(/x/, ".");
string = Math.round( parseFloat(string) * 100) / 100;
The first line is adapted from here: regex replacing multiple periods in floating number. It replaces the first comma "," with a dot ".", strips everything except digits and dots, replaces the first dot with x, removes all remaining dots, and then replaces x back with a dot.
The second line rounds the number to two decimal places.
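The same chain, wrapped in a function with one comment per step, may be easier to follow. This is a sketch; the name toTwoDecimals and the sample inputs are not from the original answer.

```javascript
function toTwoDecimals(raw) {
  const cleaned = raw
    .replace(",", ".")      // string pattern: only the first comma becomes a dot
    .replace(/[^\d.]/g, "") // drop everything except digits and dots
    .replace(/\./, "x")     // protect the first dot
    .replace(/\./g, "")     // remove all remaining dots
    .replace(/x/, ".");     // restore the protected dot
  // Round to two decimal places; yields NaN when no digits are left.
  return Math.round(parseFloat(cleaned) * 100) / 100;
}

console.log(toTwoDecimals("3,14159"));  // 3.14
console.log(toTwoDecimals("12.345.6")); // 12.35
console.log(toTwoDecimals("abc"));      // NaN
```

The temporary "x" is safe because the second step has already removed every letter from the string.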
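Try the following expression, which allows up to two digits before the dot and, optionally, a dot followed by one or two digits (value is a placeholder for the string being checked):
/^\d{0,2}(\.\d{1,2})?$/.test(value)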
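To see which inputs this expression accepts, here is a brief sketch; the function name isShortDecimal and the sample values are hypothetical.

```javascript
// Up to two digits, optionally followed by a dot and one or two digits.
const SHORT_DECIMAL = /^\d{0,2}(\.\d{1,2})?$/;

function isShortDecimal(value) {
  return SHORT_DECIMAL.test(value);
}

console.log(isShortDecimal("12.34")); // true
console.log(isShortDecimal(".5"));    // true
console.log(isShortDecimal("123"));   // false: three digits before the dot
console.log(isShortDecimal("12."));   // false: a dot needs at least one digit after it
console.log(isShortDecimal(""));      // true: the empty string is accepted
```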