Simple REGEX to find ON TAPE Numbers - regex

I need to find the string 'ON TAPE (000012, 000013)'. The numbers, of course, change each time I search. I've been trying to learn regex, but I'm not taking to it very well. Would anyone mind filling in the blank for me with a regex that will locate the string 'ON TAPE (000012, 000013)'?

Welcome :)
(.*?\(\d+\, \d+\))
Check this out: Regex101

This is all dependent on the exact flavor of regex that you are using. Different languages handle regular expressions differently. Assuming that only the number is going to change, you could try
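Perl-style (PCRE-like)
With a Perl-style regex engine, the ( and ) characters represent grouping, so literal parentheses need to be escaped. (Strictly speaking, \d and \s are Perl-style shorthands rather than POSIX syntax.)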
/ON\sTAPE\s\(\d+,\s\d+\)/
\s matches any whitespace character
\( and \) match the parentheses
\d matches any numeric character
+ means that the previous character can be repeated 1 to n times
javascript
For this particular case, JavaScript accepts the same pattern.
php
For this particular case, PHP (with the preg_* functions) accepts the same pattern.
python
For this particular case, Python accepts the same pattern.
grep
With grep's default (basic) syntax, you don't need to escape the parentheses, and it doesn't treat + as a quantifier or understand \d (grep -E adds +, and GNU grep -P adds \d).
ON\sTAPE\s([0-9][0-9]*,\s[0-9][0-9]*)
\s matches any whitespace character
[0-9] matches any numeric character
* means that the previous character can be repeated 0 to n times.
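As a rough illustration of the escaped-parentheses pattern above, here is a minimal Python sketch. The function name find_tape_numbers and the sample sentence are made up for the example, and the two capture groups are an addition so that the numbers can be extracted.

```python
import re

# Same pattern as above, with capture groups added around each number
# (the groups are an assumption; the original only matched the whole string).
TAPE_RE = re.compile(r"ON\sTAPE\s\((\d+),\s(\d+)\)")


def find_tape_numbers(text):
    """Return the two tape numbers as strings, or None if nothing matches."""
    match = TAPE_RE.search(text)
    return match.groups() if match else None


if __name__ == "__main__":
    print(find_tape_numbers("Backup stored ON TAPE (000012, 000013) last night"))
```

The numbers come back as strings, so the leading zeros are preserved.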
PS. The link that Nikolas shared is really useful :)

Related

Regex expression - single quote without comma

I need a regular expression to find single quotes that have no comma right before or right after them. Also, the single quote should not be the first or last character in the string and should have an alphanumeric character on each side.
For example, "Jane's book" would be detected, while "'apples','oranges'" would not.
Can anyone help?
You can use this regex with lookarounds:
(?<=[a-zA-Z0-9])'(?=[a-zA-Z0-9])
RegEx Demo
Something like:
(?<=[A-Za-z0-9])\'(?=[A-Za-z0-9])
should give you matches in languages that support both positive lookahead and positive lookbehind (older JavaScript engines only support lookahead, if I remember correctly; lookbehind arrived with ES2018). I didn't test the above, and I'm not sure you even need to escape the single quote...
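For what it's worth, the lookaround pattern can be tried quickly in Python, which supports both lookahead and lookbehind. This is only a sketch; the helper name quote_positions is hypothetical and the inputs are the two examples from the question.

```python
import re

# A single quote with an ASCII alphanumeric character on each side.
# The lookarounds check the neighbours without including them in the match.
QUOTE_RE = re.compile(r"(?<=[A-Za-z0-9])'(?=[A-Za-z0-9])")


def quote_positions(text):
    """Return the index of every single quote that satisfies the rule."""
    return [m.start() for m in QUOTE_RE.finditer(text)]


if __name__ == "__main__":
    print(quote_positions("Jane's book"))         # the apostrophe at index 4
    print(quote_positions("'apples','oranges'"))  # no qualifying quote
```

Because every quote in the second string touches either a comma or the edge of the string, none of them passes both lookarounds.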
You need your language-appropriate variation of:
.+'[^,]+.*
' finds you a single quote. You generally do not need to escape a single quotation mark.
[^,] allows any character but a comma and + indicates that you require at least one such character
.* says you can have as many of any character as you like, so putting it around what you care about says your expression can occur anywhere in the string. The leading .+ means there must be at least one character (of any kind) before the quote, so the quote cannot be the first character.
Note that I'm making some assumptions, like that you'll only have one ' in the string that you want to find. Also I'm assuming you don't care about , except for right after '. If that's not true, you need to be more specific about your requirements.

how to get match string that start with hyphen and character with regex

I have two strings
100-2000
and
100-X200-2012
I am trying to write a regex that matches both strings, like the one below, by saying that if the segment after the hyphen starts with X, ignore it:
[0-9]+-[a-zA-Z0-9 \-X]+-[0-9]
but it fails to match. I am not sure how to match it with my criteria.
do you mean this:
[0-9]+-[^X][a-zA-Z0-9 \-]*-[0-9]+
It can be something like this
(\d*)(-X\d*|)-(\d{4})
Since you haven't specified a tool, I'll go with the .NET regex flavor (my personal favorite):
\d+(-[ \w-[_X]][ \w-[_]]+)?-\d+
\d is the same as [0-9]
In .NET, you can use character class subtraction to remove particular elements from a character class. In this case, I've included a space and a \w (which, for ASCII text, is the same as [0-9a-zA-Z_]) within the character class and then subtracted the underscore and uppercase X.
Another option is to use negative lookahead:
\d+(-(?!X)[ 0-9a-zA-Z]+)?-\d+
I like this even better, but not all flavors of regex support it.
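Here is a small Python sketch of the negative-lookahead variant. It assumes the goal is to accept an optional middle segment only when that segment does not start with X, and it checks whole strings with fullmatch; the helper name is_accepted and the third sample string are invented for the illustration.

```python
import re

# digits, then an optional "-segment" that must not start with X, then "-digits"
SEGMENT_RE = re.compile(r"\d+(-(?!X)[ 0-9a-zA-Z]+)?-\d+")


def is_accepted(value):
    """True when the whole string fits the pattern (assumed requirement)."""
    return SEGMENT_RE.fullmatch(value) is not None


if __name__ == "__main__":
    for sample in ("100-2000", "100-A200-2012", "100-X200-2012"):
        print(sample, is_accepted(sample))
```

Under this reading, 100-X200-2012 is rejected as a whole string; if the X segment should instead be skipped over, the pattern would need to be adjusted.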

Find Numeric Match, replace with delimited match. Regular Expressions

I'm working on searching for occurrences of 1234567.0 and replacing all matches with 1234567. I'm using Enterprise wizard and can't understand why my regular expressions that work in Visual Studio won't work in the program.
Right now I'm trying (/d{9})(/d{7}). I know I'm way off here and continue to dig into the cryptic world of regex.
Do any regex wizards have two cents on this? Thanks.
How about replacing (\d+)\.\d+ with the first matching group? That trims the decimal part, including the period.
\d+\.\d+ will match one or more digits, followed by a decimal point, followed by one or more digits.
If you want to capture the integer part, put parentheses around it.
(\d+)\.\d+
\d is the special character for digit
+ means "match at least one of these"
\. just matches a period. Since . is a special character, you have to escape it with a \
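The capture-and-replace idea can be sketched in Python as follows. The replacement syntax for the first group is \1 here; other tools may spell it $1, so treat this as an illustration rather than what Enterprise wizard expects. The name strip_decimals is made up.

```python
import re

# Capture the integer part; the dot and the fractional digits are matched
# but left out of the group, so they disappear in the replacement.
DECIMAL_RE = re.compile(r"(\d+)\.\d+")


def strip_decimals(text):
    """Replace every number like 1234567.0 with its integer part."""
    return DECIMAL_RE.sub(r"\1", text)


if __name__ == "__main__":
    print(strip_decimals("1234567.0"))
```

Numbers without a decimal part are left untouched because the pattern requires the period.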

regex negative look-ahead for exactly 3 capital letters around a char

I'm trying to write a regex that finds all the characters that have
exactly 3 capital letters on both sides.
The following regex finds all the characters that have exactly 3 capital letters on the left side of the char, and 3 (or more) on the right:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})'
When trying to limit the right side to no more than 3 capitals using the regex:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3})(?![A-Z])'
I get no results; something seems to fail when adding the (?![A-Z]) to the first regex.
Can someone explain the problem and suggest a way to solve it?
Thanks.
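You need to put the negative lookahead inside the positive one. Lookarounds don't consume any characters, so in your second regex (?=[A-Z]{3}) and (?![A-Z]) are both tested at the same position, right after the (.); one requires a capital letter there and the other forbids it, so the match can never succeed: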
(?<![A-Z])[A-Z]{3}.(?=[A-Z]{3}(?![A-Z]))
You can do that with the lookbehind, too:
(?<=(?<![A-Z])[A-Z]{3}).(?=[A-Z]{3}(?![A-Z]))
It doesn't violate the "fixed-length lookbehind" rule because lookarounds themselves don't consume any characters.
EDIT (about fixed-length lookbehind): Of all the flavors that support lookbehind, Python is the most inflexible. In most flavors (e.g. Perl, PHP, Ruby 1.9+) you could use:
(?<=^[A-Z]{3}|[^A-Z][A-Z]{3}).
...to match a character preceded by exactly three uppercase ASCII letters. The first alternative - ^[A-Z]{3} - starts looking three positions back, while the second - [^A-Z][A-Z]{3} - goes back exactly four positions. In Java, you can reduce that to:
(?<=(^|[^A-Z])[A-Z]{3}).
...because it does a little extra work at compile time to figure out that the maximum lookbehind length will be four positions. And in .NET and JGSoft, anything goes; if it's legal anywhere, it's legal in a lookbehind.
But in Python, a lookbehind subexpression has to match a single, fixed number of characters. If you've butted your head against that limitation a few times, you might not expect something like this to work:
(?<=(?<![A-Z])[A-Z]{3}).
At least I didn't. It's even more concise than the Java version; how can it work in Python? But it does work, in Python and in every other flavor that supports lookbehind.
And no, there are no similar restrictions on lookaheads, in any flavor.
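To see the nested-lookaround version in action, here is a minimal Python sketch. It tests one word at a time, since . would also match a space sitting between two runs of capitals; the helper name middle_char is hypothetical, and the last two sample words are additions to the ones in this thread.

```python
import re

# One character with exactly three capitals on each side:
# - the nested (?<![A-Z]) rules out a fourth capital on the left
# - the nested (?![A-Z]) rules out a fourth capital on the right
EXACT3_RE = re.compile(r"(?<=(?<![A-Z])[A-Z]{3}).(?=[A-Z]{3}(?![A-Z]))")


def middle_char(word):
    """Return the first character flanked by exactly 3 capitals, or None."""
    match = EXACT3_RE.search(word)
    return match.group(0) if match else None


if __name__ == "__main__":
    for word in ("ABCdDEF", "HHHhhhHHHH", "JJJjJJJ", "ABCdDEFG", "XABCdDEF"):
        print(word, middle_char(word))
```

Only the words with exactly three capitals on both sides produce a match; a fourth capital on either side makes the corresponding nested negative lookaround fail.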
Taking out the positive lookahead worked for me.
(?<![A-Z])[A-Z]{3}(.)([A-Z]{3})(?![A-Z])
'ABCdDEF' 'ABCfDEF' 'HHHhhhHHHH' 'jjJJjjJJJ' JJJjJJJ
matches
ABCdDEF
ABCfDEF
JJJjJJJ
I'm not sure how the regexp engines should work with multiple lookahead assertions, but the one you're using may have its own opinion on that.
You could as well use a single assertion as follows:
'(?<![A-Z])[A-Z]{3}(.)(?=[A-Z]{3}[^A-Z])'
The same with lookbehind:
'(?<=[^A-Z][A-Z]{3})(.)(?=[A-Z]{3}[^A-Z])'
This will have a problem matching the pattern at the beginning and at the end of the line, because [^A-Z] needs an actual character to match there.
I can't think of a proper solution, but there can be a dirty trick: for instance, add a space (or something else) in the beginning and the end of the whole line, then perform the matching.
$ echo 'ABCdDEF ABCfDEF HHHhhhHHHH AAAaAAAbAAA jjJJJJjJJJ JJJjJJJ' | sed 's/.*/ & /' | grep -oP '(?<=[^A-Z][A-Z]{3})(\S)(?=[A-Z]{3}[^A-Z])'
d
f
a
b
j
Note that I changed (.) to (\S) in the middle, change it back if you want the space to match.
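The same padding trick can be reproduced in Python, as a rough equivalent of the sed and grep pipeline. The function name padded_matches is invented for the example; the input line is the one from the command above.

```python
import re

# Requires a real non-capital character on both outer edges, which is why
# the line is padded with a space at each end before matching.
PADDED_RE = re.compile(r"(?<=[^A-Z][A-Z]{3})(\S)(?=[A-Z]{3}[^A-Z])")


def padded_matches(line):
    """Pad the line with spaces, then return every matching middle character."""
    return PADDED_RE.findall(f" {line} ")


if __name__ == "__main__":
    line = "ABCdDEF ABCfDEF HHHhhhHHHH AAAaAAAbAAA jjJJJJjJJJ JJJjJJJ"
    print(padded_matches(line))
```

Without the padding, the first and last words of the line would be missed, since nothing precedes ABC or follows the final JJJ.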
P.S. Are you solving The Python Challenge? :)
Since the look ahead pattern is the same as the look behind pattern, you could also use the continue anchor \G:
/(?:[A-Z]{3}|\G[A-Z]*)(.)[A-Z]{3}/
A match is returned if three capitals precede a single character or where the last match left off (optionally followed by other capitals).

Regex to match name1.name2[.name3]

I am trying to validate user id's matching the example:
smith.jack or smith.jack.s
In other words, any number of non-whitespace characters (except dot), followed by exactly one dot, followed by any number of non-whitespace characters (except dot), optionally followed by exactly one dot followed by any number of non-whitespace characters (except dot). I have come up with several variations that work fine except for allowing consecutive dots! For example, the following Regex
^([\S][^.]*[.]{1}[\S][^.]*|[\S][^.]*[.]{1}[\S][^.]*[.]{1}[\S][^.]*)$
matches "smith.jack" and "smith.jack.s" but also matches "smith..jack" "smith..jack.s" ! My gosh, it even likes a dot as a first character. It seems like it would be so simple to code, but it isn't. I am using .NET, btw.
Frustrating.
Does this help?
/^[^\s\.]+(?:\.[^\s\.]+)*$/
or, in extended format, with comments (ruby-style)
/
^ # start of line
[^\s\.]+ # one or more non-space non-dot
(?: # non-capturing group
\. # dot something
[^\s\.]+ # one or more non-space non-dot
)* # zero or more times
$ # end of line
/x
You're not clear on how many times you can have dot-something, but you can replace the * with {1,3} or something similar to specify how many repetitions are allowed.
I should probably make it clear that the slashes are the literal regex delimiters in Ruby (and Perl, JS, etc.).
^([^.\s]+)\.([^.\s]+)(?:\.([^.\s]+))?$
I'm not familiar with .NET's regexes. This will do what you want in Perl.
/^\w+\.\w+(?:\.\w+)?$/
If .NET doesn't support the non-capturing (?:xxx) syntax, use this instead:
/^\w+\.\w+(\.\w+)?$/
Note: I'm assuming that when you say "non-whitespace, non-dot" you really mean "word characters."
You are using the * duplication, which allows for 0 iterations of the given component.
You should be using plus, and putting the final .[^.]+ into a group followed by ? to represent the possibility of an extra set.
Might not have the perfect syntax, but something similar to the following should work.
^[^.\s]+[.][^.\s]+([.][^.\s]+)?$
Or in simple terms, any non-zero number of non-whitespace non-dot characters, followed by a dot, followed by any non-zero number of non-whitespace non-dot characters, optionally followed by a dot, followed by any non-zero number of non-whitespace non-dot characters.
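As a quick check of the "plus instead of star" approach, here is a short Python sketch (the question is about .NET, so this is only an illustration of the pattern, not of the .NET API). The helper name is_valid_user_id is made up, and fullmatch stands in for the ^ and $ anchors.

```python
import re

# name1.name2 with an optional .name3; each name is one or more
# characters that are neither a dot nor whitespace.
USER_ID_RE = re.compile(r"[^.\s]+\.[^.\s]+(?:\.[^.\s]+)?")


def is_valid_user_id(user_id):
    """True when the whole string has two or three dot-separated names."""
    return USER_ID_RE.fullmatch(user_id) is not None


if __name__ == "__main__":
    for candidate in ("smith.jack", "smith.jack.s", "smith..jack", ".smith.jack"):
        print(candidate, is_valid_user_id(candidate))
```

Because every name needs at least one character, consecutive dots and a leading dot are both rejected.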
I realise this has already been solved, but I find Regexpal extremely helpful for prototyping regexes. The site has a load of simple explanations of the basics and lets you see what matches as you adjust the expression.
[^\s.]+\.[^\s.]+(\.[^\s.]+)?
BTW what you asked for allows "." and ".."
I think you'd benefit from using + which means "1 or more", instead of * meaning "any number including zero".
(^.)+|(([^.]+)[.]([^.]+))+
But this would match x.y.z.a.b.c and from your description, I am not sure if this is sufficiently restrictive.
BTW: feel free to modify if I made a silly mistake (I haven't used .NET, but have done plenty of regexes)
[^.\s]+\.[^.\s]+(\.([^\s.]+?)?
has unmatched paren. If corrected to
[^.\s]+\.[^.\s]+(\.([^\s.]+?))?
is still too liberal. Matches a.b. as well as a.b.c.d. and .a.b
If corrected to
[^.\s]+\.[^.\s]+(\.([^\s.]+?)?)
doesn't match a.b
^([^.\W]+)\.?([^.\W]+)\.?([^.\W]+)$
This should capture as described, group the parts of the id and stop duplicate periods
I took a slightly different approach. I figured you really just wanted a string of non-space characters followed by only one dot, but that dot is optional (for the last entry). Then you wanted this repeated.
^([^\s\.]+\.?)+$
Right now, this means you have to have at least one string of characters, e.g. 'smith' to match. You, of course could limit it to only allow one to three repetitions with
^([^\s\.]+\.?){1,3}$
I hope that helps.
RegexBuddy is a good (non-free) tool for regex work.