Find Numeric Match, replace with delimited match. Regular Expressions - regex

I'm trying to find every occurrence of a number like 1234567.0 and replace it with 1234567. I'm using Enterprise wizard and can't understand why regular expressions that work in Visual Studio won't work in this program.
Right now I'm trying (/d{9})(/d{7}). I know I'm way off here, and I continue to dig into the cryptic world of regex.
Do any regex wizards have two cents to offer on this? Thanks.

How about replacing (\d+)\.\d+ with the first matching group? That trims the decimal part, including the period.

\d+\.\d+ will match one or more digits, followed by a decimal point, followed by one or more digits.
If you want to capture the integer part, put parentheses around it.
(\d+)\.\d+
\d is the special character for digit
+ means "match at least one of these"
\. just matches a period. Since . is a special character, you have to escape it with a \
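As a rough illustration of the capture-and-replace idea above, here is a minimal Python sketch. Python's re module is only a stand-in here: the replacement syntax that Enterprise wizard expects is not known from this thread, and the helper name strip_decimal is made up for the example.

```python
import re

# (\d+) captures the integer part; \.\d+ matches the period and the
# fractional digits, which the replacement then drops.
DECIMAL = re.compile(r"(\d+)\.\d+")


def strip_decimal(text: str) -> str:
    """Replace every number such as 1234567.0 with its integer part."""
    # \1 in the replacement refers to the first capturing group.
    return DECIMAL.sub(r"\1", text)


print(strip_decimal("id 1234567.0 and 42.50"))  # id 1234567 and 42
```

Other tools often write the group reference as $1 instead of \1, so that part may need adjusting.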

Related

Simple REGEX to find ON TAPE Numbers

I need to find the string ‘ON TAPE (000012, 000013)’. The number of course changes each time I need to search. I've been trying to learn regex, but I'm not taking to it very well. Anyone mind filling in the blank for me with a regex that will locate the string ‘ON TAPE (000012, 000013)’ ?
Welcome :)
(.*?\(\d+\, \d+\))
Check this out: Regex101
This is all dependent on the exact flavor of regex that you are using. Different languages handle regular expressions differently. Assuming that only the number is going to change, you could try
POSIX
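With a POSIX-compliant regex engine, the () characters represent grouping, so they need to be escaped. (The \s and \d shorthands used here come from Perl-style engines rather than from POSIX itself, which spells them [[:space:]] and [[:digit:]].)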
/ON\sTAPE\s\(\d+,\s\d+\)/
\s matches any whitespace character
\( and \) match the parentheses
\d matches any numeric character
+ means that the previous character can be repeated 1 to n times
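JavaScript
For this particular case, JavaScript behaves like the POSIX example, so the same pattern works.
PHP
For this particular case, PHP behaves like the POSIX example, so the same pattern works.
Python
For this particular case, Python behaves like the POSIX example, so the same pattern works.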
grep
With grep's default (basic) syntax, you don't need to escape the parentheses, and it doesn't support the + quantifier or the \d shorthand.
ON\sTAPE\s([0-9][0-9]*,\s[0-9][0-9]*)
\s matches any whitespace character
[0-9] matches any numeric character
* means that the previous character can be repeated 0 to n times.
PS. The link that Nikolas shared is really useful :)
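For anyone who wants to try the pattern outside an online tester, here is a small Python sketch. It uses the pattern from the answer above with two capturing groups added around the numbers (that addition, and the function name find_tape_numbers, are assumptions for illustration, not part of the original answer).

```python
import re

# Same pattern as in the answer, with capturing groups around each number.
ON_TAPE = re.compile(r"ON\sTAPE\s\((\d+),\s(\d+)\)")


def find_tape_numbers(text):
    """Return the two tape numbers as strings, or None if the text has no match."""
    match = ON_TAPE.search(text)
    return match.groups() if match else None


print(find_tape_numbers("job 7 ON TAPE (000012, 000013) done"))
# ('000012', '000013')
```

The numbers are returned as strings so that leading zeros are preserved.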

How to add formatting commas to a number in Google Refine

Due to what we're using the data for, it's important that long numbers (8+ digits) have commas every 3 digits for formatting and readability.
The issue is I really don't know how to make an expression that does this. Would anyone with some more experience writing these expressions point me in the right direction?
The supported expression languages are GREL (Google Refine Expression Language), Clojure, and Jython.
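Using a substitution, \B(?=(\d{3})+(?!\d)) matches the empty position in front of each trailing group of three digits. Replacing every match with a comma therefore inserts a comma every three digits,
so 12345678 becomes 12,345,678.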
It uses:
\B: a negated word boundary. It is the negated version of \b and matches at every position where \b does not. Effectively, \B matches at any position between two word characters as well as at any position between two non-word characters.
A positive lookahead (?=...), which makes sure the comma is inserted every three digits, counting from the right.
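The substitution can be sketched in plain Python as follows. This is only an illustration: whether it runs unchanged inside Refine's Jython has not been checked here, and the function name add_commas is hypothetical. It also assumes the input is a plain string of digits; a value with a decimal part would get commas in the fraction as well.

```python
import re

# Zero-width match: a non-boundary position followed by one or more
# complete groups of three digits and then no further digit.
THOUSANDS = re.compile(r"\B(?=(\d{3})+(?!\d))")


def add_commas(digits: str) -> str:
    """Insert a comma before every trailing group of three digits."""
    return THOUSANDS.sub(",", digits)


print(add_commas("12345678"))  # 12,345,678
```

Because the match is zero-width, nothing is removed from the input; the comma is simply placed at each matching position.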

What do comma separated numbers in curly braces at the end of a regex mean?

I am trying to understand the following regex, I understand the initial part but I'm not able to figure out what {3,19} is doing here:
/[A-Z][A-Za-z0-9\s]{3,19}$/
That is the custom repetition operation known as the Quantifier.
\d{3} will find exactly three digits.
[a-c]{1,3} will find any occurrence of a, b, or c at least one time, but up to three times.
\w{0,1} means a word character will be optionally found. This is the same as placing a Question Mark, e.g.: \w?
(\d\w){1,} will find any combination of a Digit followed by a Word Character at least One time, but up to infinite times. So it would match 1k1k2k4k1k5j2j9k4h1k5k This is the same as a Plus symbol, e.g.: (\d\w)+
b{0,}\d will optionally find the letter b followed by a digit, but could also match infinite letter b's followed by a digit. So it will match 5, b5, or even bbbbbbb5. This is the same as an Asterisk. e.g.: b*\d
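The equivalences listed above can be checked with a short Python sketch. The sample strings and the helper same_matches are made up for the demonstration, and agreeing on a handful of samples is of course a sanity check rather than a proof.

```python
import re

# Each pair: a curly-brace quantifier and the shorthand it is said to equal.
EQUIVALENT_PAIRS = [
    (r"\w{0,1}", r"\w?"),
    (r"(\d\w){1,}", r"(\d\w)+"),
    (r"b{0,}\d", r"b*\d"),
]

SAMPLES = ["", "a", "5", "b5", "bbbbbbb5", "1k1k2k4k", "xyz"]


def same_matches(first, second, samples):
    """True if both patterns accept or reject every sample identically."""
    return all(
        bool(re.fullmatch(first, s)) == bool(re.fullmatch(second, s))
        for s in samples
    )


for first, second in EQUIVALENT_PAIRS:
    print(first, second, same_matches(first, second, SAMPLES))
```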
Quantifiers
They are 'quantifiers' - it means 'match previous pattern between 3 and 19 times'
When you are learning regular expressions, it's really useful to play with them in an interactive tool which can highlight the matches. I've always liked a tool called Regex Coach, but it is Windows only. There are plenty of online tools though - have a play with your regex here, for example.
{n,m} means "repeat the previous element at least n times and at most m times", so the expression
[A-Za-z0-9\s]{3,19} means "match between 3 and 19 characters that are letters, digits, or whitespace". Note that repetition is greedy by default, so this will try to match as many characters as possible within that range (this doesn't come into play here, since the end of line anchor makes it so that there is really only one possibility for each match).
The regular expression you have there, /[A-Z][A-Za-z0-9\s]{3,19}$/, breaks down as follows:
[A-Z] We are looking for a Capital letter
Followed by
[A-Za-z0-9\s]{3,19} a series of letters, digits, or white space that is between 3 and 19 characters
$ Then the end of the line.
It will have to match [A-Za-z0-9\s] between 3 and 19 times.
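To see the breakdown in action, here is a small Python sketch of the same pattern. The function name ends_with_valid_name is hypothetical. Note that the pattern has no ^ anchor, so it only constrains how the text ends: a capital letter followed by 3 to 19 allowed characters, then the end.

```python
import re

# A capital letter, then 3 to 19 letters, digits, or whitespace, then the end.
PATTERN = re.compile(r"[A-Z][A-Za-z0-9\s]{3,19}$")


def ends_with_valid_name(text: str) -> bool:
    """True if the text ends with a capital plus 3-19 allowed characters."""
    return PATTERN.search(text) is not None


for sample in ["Abcd", "Abc", "abcd", "xxAbcd", "A" + "b" * 25]:
    print(repr(sample), ends_with_valid_name(sample))
```

"Abc" fails because only two characters follow the capital, and the last sample fails because 25 characters follow it, more than the upper bound of 19.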
Here's a good regex reference guide:
http://www.regular-expressions.info/reference.html
What do comma-separated numbers in curly braces at the end of a regex mean?
They denote a quantifier with the range specified in the curly braces.
Curly braces are analogous to a function with arguments: we can specify a single integer, or two integers that act as the lower and upper bounds of a range.
/[A-Z][A-Za-z0-9\s]{3,19}$/
Online regex websites can help explain a pattern like this one:
https://regex101.com/

Unable to use two regular expressions in SAS Content Categorization Studio

I'm working in SAS Content Categorization Studio. I'm trying to get two concepts, consisting of one regular expression each, to return a number of matches. One is supposed to find dates, the other a particularly formatted number.
(0[1-9]|[12][0-9]|3[01])[.](0[1-9]|1[012])[.](?:[0-9]{2})?[0-9]{2}
[1-9](?:(?:[ -.])?\d){10,10}
The regex that is supposed to find the formatted number (latter) doesn't return any hits as long as the regex that is supposed to find dates (former) is active or not commented out. As soon as I comment out the regex for the date, the latter continues to work again. They seem to be mutually exclusive. Can anyone tell me what I'm doing wrong?
If the pattern on the end, (?:(?:[ -.])?\d){10,10}, is the date-matching portion, it seems a little off to me. What it appears to match is 10 iterations of "an optional separator character followed by a digit". (Inside a character class the . is literal, but [ -.] is a range running from space to period, so it also admits characters such as !, + and ,.) First, it seems like you would want 8 iterations, not 10, to match a date. But I think what you really want is something like \d{1,2}([\.-])\d{1,2}\1\d{4}. This would match "one or two digits, followed by a literal . or -, followed by one or two more digits, followed by whatever separator you matched before (. or -), followed by four digits".
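The backreference in the suggested pattern can be tried out with a short Python sketch. This uses Python's re module rather than SAS Content Categorization Studio, whose regex dialect may differ, and the function name looks_like_date is made up for the example.

```python
import re

# ([\.-]) captures the first separator; \1 requires the second to be identical.
DATE = re.compile(r"\d{1,2}([\.-])\d{1,2}\1\d{4}")


def looks_like_date(text: str) -> bool:
    """True if the whole string has the shape d.m.yyyy or d-m-yyyy."""
    return DATE.fullmatch(text) is not None


for sample in ["31.12.2020", "1-2-2020", "31.12-2020", "31.12.20"]:
    print(sample, looks_like_date(sample))
```

The third sample is rejected because its separators differ, and the fourth because the year has only two digits. The pattern checks shape only, so a string such as 99.99.9999 would still pass.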

Regex to match name1.name2[.name3]

I am trying to validate user id's matching the example:
smith.jack or smith.jack.s
In other words, any number of non-whitespace characters (except dot), followed by exactly one dot, followed by any number of non-whitespace characters (except dot), optionally followed by exactly one dot followed by any number of non-whitespace characters (except dot). I have come up with several variations that work fine except for allowing consecutive dots! For example, the following Regex
^([\S][^.]*[.]{1}[\S][^.]*|[\S][^.]*[.]{1}[\S][^.]*[.]{1}[\S][^.]*)$
matches "smith.jack" and "smith.jack.s" but also matches "smith..jack" "smith..jack.s" ! My gosh, it even likes a dot as a first character. It seems like it would be so simple to code, but it isn't. I am using .NET, btw.
Frustrating.
Does this help?
/^[^\s\.]+(?:\.[^\s\.]+)*$/
or, in extended format, with comments (ruby-style)
/
^ # start of line
[^\s\.]+ # one or more non-space non-dot
(?: # non-capturing group
\. # dot something
[^\s\.]+ # one or more non-space non-dot
)* # zero or more times
$ # end of line
/x
You're not clear on how many times you can have dot-something, but you can replace the * with {1,3} or something similar to specify how many repetitions are allowed.
I should probably make it clear that the slashes are the literal regex delimiters in Ruby (and Perl, JavaScript, etc.).
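The commented, extended form has a close equivalent in Python through the re.VERBOSE flag, sketched below. Python is used only for illustration (the question is about .NET, which offers a comparable option, RegexOptions.IgnorePatternWhitespace), and the function name is_dotted_name is hypothetical.

```python
import re

USER_ID = re.compile(
    r"""
    ^            # start of string
    [^\s\.]+     # one or more non-space, non-dot characters
    (?:          # non-capturing group
        \.       #   a literal dot
        [^\s\.]+ #   one or more non-space, non-dot characters
    )*           # zero or more times
    $            # end of string
    """,
    re.VERBOSE,
)


def is_dotted_name(candidate: str) -> bool:
    """True if the string is dot-separated runs with no empty parts."""
    return USER_ID.fullmatch(candidate) is not None


for sample in ["smith.jack", "smith.jack.s", "smith..jack", ".smith", "smith"]:
    print(sample, is_dotted_name(sample))
```

With * on the group, a bare "smith" and a longer "a.b.c.d" are both accepted; swapping * for a bounded quantifier tightens that.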
^([^.\s]+)\.([^.\s]+)(?:\.([^.\s]+))?$
I'm not familiar with .NET's regexes. This will do what you want in Perl.
/^\w+\.\w+(?:\.\w+)?$/
If .NET doesn't support the non-capturing (?:xxx) syntax, use this instead:
/^\w+\.\w+(\.\w+)?$/
Note: I'm assuming that when you say "non-whitespace, non-dot" you really mean "word characters."
You are using the * quantifier, which allows for 0 iterations of the given component.
You should be using plus, and putting the final .[^.]+ into a group followed by ? to represent the possibility of an extra set.
Might not have the perfect syntax, but something similar to the following should work.
^[^.\s]+[.][^.\s]+([.][^.\s]+)?$
Or in simple terms, any non-zero number of non-whitespace non-dot characters, followed by a dot, followed by any non-zero number of non-whitespace non-dot characters, optionally followed by a dot, followed by any non-zero number of non-whitespace non-dot characters.
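A quick way to check this pattern against the examples from the question is a Python sketch like the one below. Python stands in for .NET here, and the function name is_valid_user_id is made up for the example.

```python
import re

# Two required parts and an optional third, each a non-empty run of
# non-dot, non-whitespace characters.
USER_ID = re.compile(r"^[^.\s]+[.][^.\s]+([.][^.\s]+)?$")


def is_valid_user_id(candidate: str) -> bool:
    """True for name1.name2 or name1.name2.name3, with no empty parts."""
    # fullmatch also rejects a trailing newline, which $ alone would allow.
    return USER_ID.fullmatch(candidate) is not None


for sample in ["smith.jack", "smith.jack.s", "smith..jack", ".smith.jack", "a.b.c.d"]:
    print(sample, is_valid_user_id(sample))
```

Only the first two samples pass: consecutive dots, a leading dot, and a fourth part are all rejected.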
I realise this has already been solved, but I find Regexpal extremely helpful for prototyping regex's. The site has a load of simple explanations of the basics and lets you see what matches as you adjust the expression.
[^\s.]+\.[^\s.]+(\.[^\s.]+)?
BTW what you asked for allows "." and ".."
I think you'd benefit from using + which means "1 or more", instead of * meaning "any number including zero".
(^.)+|(([^.]+)[.]([^.]+))+
But this would match x.y.z.a.b.c and from your description, I am not sure if this is sufficiently restrictive.
BTW: feel free to modify if I made a silly mistake (I haven't used .NET, but have done plenty of regexes)
[^.\s]+\.[^.\s]+(\.([^\s.]+?)?
has unmatched paren. If corrected to
[^.\s]+\.[^.\s]+(\.([^\s.]+?))?
is still too liberal. Matches a.b. as well as a.b.c.d. and .a.b
If corrected to
[^.\s]+\.[^.\s]+(\.([^\s.]+?)?)
doesn't match a.b
^([^.\W]+)\.?([^.\W]+)\.?([^.\W]+)$
This should capture as described, group the parts of the id and stop duplicate periods
I took a slightly different approach. I figured you really just wanted a string of non-space characters followed by only one dot, but that dot is optional (for the last entry). Then you wanted this repeated.
^([^\s\.]+\.?)+$
Right now, this means you have to have at least one string of characters, e.g. 'smith' to match. You, of course could limit it to only allow one to three repetitions with
^([^\s\.]+\.?){1,3}$
I hope that helps.
RegexBuddy is a good (non-free) tool for regex work.