re pattern doesn't recognize comma value - regex

I hope you can help me with this one.
I'm running an apply method on a pandas DataFrame to check whether each value has a correct number format (some of them use a comma as the thousands separator). The problem is that, as far as I can see, my regex pattern doesn't recognize the comma. Here's my code:
import re
import pandas as pd

def afloat(x):
    x=str(x)
    pattern=re.compile(r"\d+,\d\d\d")
    return pattern.match(x)

data=["1,000","999","2,580"]
df=pd.DataFrame(data,columns=["data"])
df["status"]=df.apply(lambda x: afloat(df["data"]),axis=1)
What I get is the following, even though some values contain a comma and, as far as I can tell, do match the pattern I'm defining:
    data status
0  1,000   None
1    999   None
2  2,580   None
I just can't identify what I'm doing wrong. Thanks!

I tried this and it worked for me:
import re
def afloat(x):
    x=str(x).replace(".",",")
    pattern=re.compile(r"\d+,\d\d\d")
    return pattern.match(x)
test = 100.123
print(afloat(test))

If you look at what you pass to .apply, you will see where the trouble is: .apply(lambda x: afloat(df["data"]),axis=1) passes the whole df["data"] column and not its current row value.
Instead, you should use .apply(lambda x: afloat(x["data"]),axis=1) where x denotes the current row.
Now, the pattern is used with re.match, so it is only anchored at the start of the string; more text may follow the matched part. To make sure the pattern matches the entire string, add $ at the end of the regex pattern.
However, since all you want to do is to check if a value matches some regex pattern and return a boolean column, you should consider using Series.str.match:
>>> df['data'].str.match(r'\d+,\d{3}$')
0 True
1 False
2 True
Name: data, dtype: bool
Here, \d+,\d{3}$ will match a string that starts with 1+ digits, a comma, and then three digits up to the end of string.
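To see the effect of the trailing $ without pandas, here is a minimal sketch using only the re module. The helper name has_thousands_comma and the extra sample "1,0000" are made up for illustration; they are not from the question.

```python
import re

# Pattern from the question: digits, a comma, then three digits (no end anchor).
UNANCHORED = re.compile(r"\d+,\d\d\d")
# Same idea with $ added, so nothing may follow the three digits.
ANCHORED = re.compile(r"\d+,\d{3}$")


def has_thousands_comma(value):
    """Return True if the whole value looks like '1,000' (hypothetical helper)."""
    return ANCHORED.match(str(value)) is not None


# "1,0000" is an extra sample added here to show the difference.
for sample in ["1,000", "999", "2,580", "1,0000"]:
    unanchored_hit = UNANCHORED.match(sample) is not None
    print(sample, unanchored_hit, has_thousands_comma(sample))
```

The unanchored pattern accepts "1,0000" because re.match only requires a match at the start of the string, while the anchored one rejects it.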

Related

Regex to detect whether a string is x.x.x, where each x is 1-3 digits

I have 1000+ rows with values entered in varying formats, as below:
5.99
5.188.0
v5.33
v.440.0
In another Google Sheets column, I want to perform the following operations:
Remove the 'v' character from the values
If the 2nd '.' is missing, append ".0", so that 5.88 becomes 5.88.0
Can you please help with the regex and replace logic? I tried the following, but I am new to writing regex. Thanks for any help.
=regexmatch(<cellvalue>,"^[0-9]{1}\.[0-9]{1,3}\.[0-9]{1,3}$")
So far I can detect the format: 5.88.0 returns TRUE and 5.99 returns FALSE. I now need to append ".0" so that 5.99 --> 5.99.0, and remove the 'v' if present.
You can use a combination of functions. It may not be pretty, but it does the job.
Replace any instance of v with an empty string using SUBSTITUTE, after converting the cell content to upper case. SUBSTITUTE is case-sensitive, so without UPPER(cell) we would miss either the upper-case V or the lower-case v, depending on which one we search for:
SUBSTITUTE(text_to_search, search_for, replace_with, [occurrence_number])
=SUBSTITUTE(UPPER(A1),"V","")
Next, look for cells that are missing the last .xxx block. To do that, update your regex slightly to specify that the last group is not present:
^([0-9]{1}\.[0-9]{1,3}(\.[0-9]{1,3}){0})$
Using REGEXMATCH and IF we can then CONCATENATE the last group as .0
REGEXMATCH(text, regular_expression)
CONCATENATE(string1, [string2, ...])
=IF(REGEXMATCH(substitute(upper(A2),"V",""),"^([0-9]{1}\.[0-9]{1,3}(\.[0-9]{1,3}){0})$"),concatenate(A2,".0"), A2)
The last A2 will be replaced with something similar to what we have so far. Before that, we need a small change in the regex: we want to match values where the last group is present, which is your original regex. If the value matches, it is put in the cell; otherwise the cell shows INVALID, which you can change to anything you want:
^([0-9]{1}\.[0-9]{1,3}\.[0-9]{1,3})$
This is the piece we put in place of the last A2:
IF(REGEXMATCH(substitute(upper(A2),"V",""),"^([0-9]{1}\.[0-9]{1,3}\.[0-9]{1,3})$"),substitute(upper(A2),"V",""),"INVALID")
With this the final code to put in your cell will be:
=IF(REGEXMATCH(substitute(upper(A2),"V",""),"^([0-9]{1}\.[0-9]{1,3}(\.[0-9]{1,3}){0})$"),concatenate(SUBSTITUTE(UPPER(A2),"V",""),".0"),IF(REGEXMATCH(substitute(upper(A2),"V",""),"^([0-9]{1}\.[0-9]{1,3}\.[0-9]{1,3})$"),substitute(upper(A2),"V",""),"INVALID"))
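For checking the logic outside the spreadsheet, the same three steps (strip the v, append .0 when the last group is missing, otherwise validate) can be sketched in Python. This is only an illustration of the formula's behaviour under the regexes shown above; the function name normalize_version is an assumption and not part of Google Sheets.

```python
import re

# Same patterns as in the formula: one digit, a dot, 1-3 digits,
# and optionally a second dot with 1-3 digits.
TWO_PARTS = re.compile(r"^[0-9]\.[0-9]{1,3}$")
THREE_PARTS = re.compile(r"^[0-9]\.[0-9]{1,3}\.[0-9]{1,3}$")


def normalize_version(raw):
    """Mirror of the nested IF formula (hypothetical helper name)."""
    # SUBSTITUTE(UPPER(A2), "V", "")
    cleaned = raw.upper().replace("V", "")
    if TWO_PARTS.match(cleaned):
        # Last group missing: append ".0".
        return cleaned + ".0"
    if THREE_PARTS.match(cleaned):
        # Already in x.x.x form: keep as is.
        return cleaned
    return "INVALID"


for value in ["5.99", "5.188.0", "v5.33", "v.440.0"]:
    print(value, "->", normalize_version(value))
```

Note that v.440.0 ends up as INVALID here, just as it would with the formula, because nothing is left in front of the first dot once the v is removed.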

Parse comma-separated values in an argument list that is itself separated by commas

So I have this regex:
=([0-9A-Za-z_-]+),?
and I have a string like:
foo=bar,pine=apple,tree,bar=bie
or
foo=bar,pine=apple,tree
or
pine=apple,tree
The regex works for cases where a key has only one value.
But since a key's list of values can itself contain commas, the regex breaks down: my code does half of what I want but doesn't get the 2nd value.
How do I fix my regex to capture both values regardless of where the pair is in the string (alone, between two others, or at the end)?
I tried a few things but couldn't figure it out.
Attempt 1:
=([0-9A-Za-z,_-]+),=?
In this case, it matches when the multi-value pair is in the middle, but it fails on the others because there is no following =.
Attempt 2:
=[0-9A-Za-z_-]+([,]+[0-9A-Za-z_-]*),?
This also matches too much, for example bar,pine and tree,bar.
EDIT:
This seems to work, maybe:
=('[0-9A-Za-z,_-]+'),*|=([0-9A-Za-z_-]+),*
if I use quotes for multiple values.
You can split on variable names - that will leave only the values:
s := regexp.MustCompile("[^,\\s]+=").Split("foo=bar,pine=apple,tree,bar=bie", -1)
fmt.Println(s)
// => [ bar, apple,tree, bie]
// The first element is empty, and each value but the last keeps its trailing comma.
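The split-on-keys idea is not specific to Go. As a rough cross-check, here is a sketch in Python that uses the same pattern and additionally drops the empty leading piece and the trailing commas. The function name values_only is made up for this example.

```python
import re

# Same pattern as above: a run of non-comma, non-space characters ending in "=".
KEY = re.compile(r"[^,\s]+=")


def values_only(arg_list):
    """Split on 'key=' and return the cleaned values (hypothetical helper)."""
    pieces = KEY.split(arg_list)
    # Splitting leaves an empty first piece and trailing commas on the values,
    # so strip the commas and skip anything that ends up empty.
    return [p.strip(",") for p in pieces if p.strip(",")]


for text in [
    "foo=bar,pine=apple,tree,bar=bie",
    "foo=bar,pine=apple,tree",
    "pine=apple,tree",
]:
    print(values_only(text))
```

This assumes that values never contain = themselves; a value with an embedded = would be treated as the start of a new key.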

Find numbers in string using Golang regexp

I want to find all numbers in a string with the following code:
re:=regexp.MustCompile("[0-9]+")
fmt.Println(re.FindAllString("abc123def", 0))
I also tried adding delimiters to the regex, using a positive number as second parameter for FindAllString, using a numbers only string like "123" as first parameter...
But the output is always []
I seem to miss something about how regular expressions work in Go, but cannot wrap my head around it. Is [0-9]+ not a valid expression?
The problem is with your second integer argument. Quoting from the package doc of regexp:
These routines take an extra integer argument, n; if n >= 0, the function returns at most n matches/submatches.
You pass 0 so at most 0 matches will be returned; that is: none (not really useful).
Try passing -1 to indicate you want all.
Example:
re := regexp.MustCompile("[0-9]+")
fmt.Println(re.FindAllString("abc123def987asdf", -1))
Output:
[123 987]
Try it on the Go Playground.
@icza's answer is perfect for fetching positive numbers. But if your string also contains negative numbers, like below
"abc-123def987asdf"
and you are expecting output like below
[-123 987]
replace the regex with the one below:
re := regexp.MustCompile(`-?\d[\d,]*(?:\.\d+)?`)
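To try the negative-number idea quickly outside Go, here is a small Python sketch. The pattern is a simplified form (optional minus sign, digits with optional commas, optional decimal part) and is an assumption for illustration, not necessarily the exact expression from the answer; find_numbers is a made-up name.

```python
import re

# Optional "-", a digit, more digits or commas, then an optional ".digits" part.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def find_numbers(text):
    """Return every number-like substring of text (hypothetical helper)."""
    # findall returns whole matches here because the only group is non-capturing.
    return NUMBER.findall(text)


print(find_numbers("abc-123def987asdf"))
print(find_numbers("abc123def"))
```

Unlike Go's FindAllString, Python's findall takes no limit argument, so there is no counterpart to passing -1.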

Can anyone tell me the regular expression for this? UPDATED

I tried many syntaxes in Visual Studio, and on this site, but nothing helped.
The expression would be _ct(anyDigitsHere)_
like
adkasjflasdjfsdf asdfkl sjfdsf _ct150_ asdfasd // so it would match this _ct150
any thing here doens't matter Random stuff..afd a&r9qwr89 ((
_ct415487_ anything here doesn't matter // this will match _ct415487_
Basically any _ctAndAnyNumberHere_ (underscore at start and end).
A couple I tried: ^*(_ct)(:z)(+_)+*$ and ^*(_ct[0-9]+_)+*$. But neither helps!
EDIT
Thanks for the replies. That did work, but now my problem is replacing those matched elements with a hidden field value. Say:
if the value in the hidden field has 1 digit (any value from 0-9), I have to replace the last digit of the matched expression with the value in the hidden field.
if the value in the hidden field has 2 digits (any value from 0-99), I have to replace the last two digits of the matched expression with the value in the hidden field.
So, basically:
if the value in the hidden field has n digits, I have to replace the last n digits of the matched expression with the value in the hidden field.
How do I do that?
I don't know which Visual Studio language you're talking about, but this should work:
_ct\d+_
or this:
_ct([0-9]+)_
EDIT:
// Capture the digits between "_ct" and the closing "_".
Regex rg = new Regex("_ct([0-9]+)_");
string text = "any thing here doens't matter Random stuff..afd a&r9qwr89 ((_ct415487_ anything here doesn't matter";
var match = rg.Match(text).Groups[1].Value;
// Replace the last n digits, where n is the length of the hidden field value.
int sizeHiddenField = HiddenField1.Value.Length;
var newString = text.Replace(match, match.Substring(0, match.Length - sizeHiddenField) + HiddenField1.Value);
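The "replace the last n digits" rule can also be written with a replacement callback, which touches only the digits inside each _ct..._ token. Below is a minimal sketch in Python rather than C#; replace_tail and hidden_value are names invented for the example, and leaving a token unchanged when the hidden value is empty or longer than the digits is an assumption, since the question does not say what should happen then.

```python
import re

# Digits between "_ct" and the closing underscore.
TOKEN = re.compile(r"_ct([0-9]+)_")


def replace_tail(text, hidden_value):
    """Replace the last len(hidden_value) digits of every _ctNNN_ token."""
    n = len(hidden_value)

    def swap(match):
        digits = match.group(1)
        # Assumption: leave the token alone if there is nothing to insert
        # or the hidden value is longer than the digits it should replace.
        if n == 0 or n > len(digits):
            return match.group(0)
        return "_ct" + digits[: len(digits) - n] + hidden_value + "_"

    return TOKEN.sub(swap, text)


print(replace_tail("stuff ((_ct415487_ anything here", "99"))
print(replace_tail("sjfdsf _ct150_ asdfasd", "7"))
```

Because the callback rebuilds the token itself, digits elsewhere in the text that happen to equal the captured number are not affected.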
/_ct\d*_/
This regular expression should fit your problem. Try it. Note that \d* also matches _ct_ with no digits at all.

Regex: How to match a string that is not only numbers

Is it possible to write a regular expression that matches all strings that do not contain only numbers? If we have these strings:
abc
a4c
4bc
ab4
123
It should match the first four, but not the last one. I have tried fiddling around in RegexBuddy with lookaheads and stuff, but I can't seem to figure it out.
(?!^\d+$)^.+$
This says: look ahead to make sure the line is not made up entirely of digits, then match the entire line.
Unless I am missing something, I think the most concise regex is...
/\D/
...or in other words, is there a not-digit in the string?
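Both ideas (the negative lookahead and the single \D test) can be compared on the question's five strings. A quick sketch in Python, with function names that are made up for the example:

```python
import re


def has_non_digit(s):
    """True if at least one character is not a digit (the \\D approach)."""
    return re.search(r"\D", s) is not None


def not_only_digits(s):
    """True if the lookahead version matches the whole string."""
    return re.match(r"(?!^\d+$)^.+$", s) is not None


for s in ["abc", "a4c", "4bc", "ab4", "123"]:
    print(s, has_non_digit(s), not_only_digits(s))
```

On these inputs the two checks agree. The \D version only tests for a non-digit, while the lookahead version also returns the whole string as the match.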
jjnguy had it correct (if slightly redundant) in an earlier revision.
.*?[^0-9].*
@Chad, your regex,
\b.*[a-zA-Z]+.*\b
should probably allow for non-letters (e.g., punctuation) even though Svish's examples didn't include any. Svish's primary requirement was that the string not be all digits.
\b.*[^0-9]+.*\b
Then, you don't need the + in there since all you need is to guarantee that one non-digit is present (more may be present, as covered by the .* on the ends).
\b.*[^0-9].*\b
Next, you can do away with the \b on either end since these are unnecessary constraints (word boundaries are defined in terms of alphanumerics and _).
.*[^0-9].*
Finally, note that this last regex shows that the problem can be solved with just the basics, which have existed for decades (e.g., no need for the look-ahead feature). In plain English, the question was logically equivalent to asking whether one counter-example character can be found within the string.
We can test this regex in a browser by copying the following into the location bar, replacing the string "6576576i7567" with whatever you want to test.
javascript:alert(new String("6576576i7567").match(".*[^0-9].*"));
/^\d*[a-z][a-z\d]*$/
Or, case insensitive version:
/^\d*[a-z][a-z\d]*$/i
Optional digits at the beginning, then at least one letter, then letters or digits.
Try this:
/^.*\D+.*$/
It returns true if there is any symbol that is not a digit. It should work in most regex flavors.
Since you said "match", not just validate, the following regex will match correctly
\b.*[a-zA-Z]+.*\b
Passing Tests:
abc
a4c
4bc
ab4
1b1
11b
b11
Failing Tests:
123
If you are trying to match words that have at least one letter but are made up of digits and letters (or just letters), this is what I have used:
(\d*[a-zA-Z]+\d*)+
If we also want to restrict the string to a limited set of characters, try this:
(?!^\d+$)^[a-zA-Z0-9_-]{3,}$
or
(?!^\d+$)^[\w-]{3,}$
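As a rough check of the restricted-character variant, the first pattern can be run against a few inputs in Python. The function name is_valid_name and the extra test strings (other than the question's abc, a4c and 123) are assumptions for illustration.

```python
import re

# Not all digits, and at least three characters from letters, digits, "_" or "-".
RESTRICTED = re.compile(r"(?!^\d+$)^[a-zA-Z0-9_-]{3,}$")


def is_valid_name(s):
    """True if s passes the restricted, not-only-digits pattern."""
    return RESTRICTED.match(s) is not None


for s in ["abc", "a4c", "123", "ab", "a b", "a-4_"]:
    print(repr(s), is_valid_name(s))
```

Here "123" fails because of the lookahead, "ab" fails the minimum length of three, and "a b" fails because a space is outside the allowed set.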
/\w+/:
Matches any letter, number or underscore, i.e. any word character.
.*[^0-9]{1,}.*
Works fine for us.
We wanted to use the answer above, but it does not work within a YANG model.
The one I provided here is clear and easy to understand:
the start and end can be any characters, but there must be at least one non-numeric character.
I am using /^[0-9]*$/gm in my JavaScript code to check whether a string contains only digits. If it does, the check fails; otherwise the string is returned.
Below is a working code snippet with test cases:
function isValidURL(string) {
    var res = string.match(/^[0-9]*$/gm);
    if (res == null)
        return string;
    else
        return "fail";
};
var testCase1 = "abc";
console.log(isValidURL(testCase1)); // abc
var testCase2 = "a4c";
console.log(isValidURL(testCase2)); // a4c
var testCase3 = "4bc";
console.log(isValidURL(testCase3)); // 4bc
var testCase4 = "ab4";
console.log(isValidURL(testCase4)); // ab4
var testCase5 = "123"; // fail here
console.log(isValidURL(testCase5));
I had to do something similar in MySQL, and the following, whilst oversimplified, seems to have worked for me:
where fieldname regexp '^[a-zA-Z0-9]+$'
and fieldname NOT REGEXP '^[0-9]+$'
This shows all fields that are alphabetic or alphanumeric, while fields that are purely numeric are hidden.
example:
name1 - Displayed
name - Displayed
name2 - Displayed
name3 - Displayed
name4 - Displayed
n4ame - Displayed
324234234 - Not Displayed
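The same two-condition filter (alphanumeric, but not purely numeric) can be sketched outside SQL. Here is a minimal Python version, using re.fullmatch in place of the ^...$ anchors; the function name visible_names and the "has space" sample are made up for the example.

```python
import re


def visible_names(names):
    """Keep names that are alphanumeric but not made up of digits only."""
    return [
        n
        for n in names
        if re.fullmatch(r"[a-zA-Z0-9]+", n) and not re.fullmatch(r"[0-9]+", n)
    ]


print(visible_names(["name1", "name", "n4ame", "324234234", "has space"]))
```

As with the MySQL version, anything containing characters outside letters and digits is filtered out as well, not just the all-digit values.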