R: grepl select first character of a string - regex

I apologize in advance; this might be a repeat question. However, I have just spent the last two hours on Stack Overflow and can't seem to find a solution.
I want to use grepl to detect rows that begin with a digit. This is what I tried, but it didn't give me the right answer:
grep.numeric=as.data.frame(grepl("^[:digit:]",df_mod$name))
I suspect the problem is the regular expression "^[:digit:]", but I couldn't figure it out.
UPDATE
My dataframe is huge, but here is an example of what it looks like:
ID mark name
1 whatever name product
2 whatever 10 product
3 whatever 250 product
4 another_mark other product
I want to detect products whose names begin with a number.
UPDATE 2
Applying grep.numeric=grepl("^[[:digit:]]",df_mod$name) to the example above gives me the right answer, which is:
grep.numeric
[1] FALSE TRUE TRUE FALSE
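Why the first pattern fails and the second works can be illustrated with a rough Python analogue. This is an assumption for illustration only: Python's re module has no [[:digit:]] syntax, so \d stands in for the POSIX digit class. In POSIX-style bracket syntax, a class name such as [:digit:] is only recognized inside an outer bracket expression; written on its own, "[:digit:]" is typically read as an ordinary set of the literal characters ':', 'd', 'i', 'g', 't', which is what Python does too.

```python
import re

# Python's re has no POSIX classes such as [[:digit:]]; \d plays that role here.
# Without the outer brackets, "[:digit:]" is an ordinary character set that matches
# one of the literal characters ':', 'd', 'i', 'g', 't', so it never matches a digit.
BROKEN = re.compile(r"^[:digit:]")
FIXED = re.compile(r"^\d")

names = ["name product", "10 product", "250 product", "other product"]


def starts_with_digit(pattern, values):
    """Return one boolean per value, similar to R's grepl()."""
    return [bool(pattern.search(v)) for v in values]


print(starts_with_digit(BROKEN, names))      # [False, False, False, False]
print(starts_with_digit(FIXED, names))       # [False, True, True, False]
print(starts_with_digit(BROKEN, ["digit"]))  # [True], because 'd' is in the set
```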
But what drives me crazy is that when I apply this function to my real dataframe:
grep.numeric=grepl("^[[:digit:]]",df_mod[217,]$nom)
it gives me this result:
grep.numeric
[1] FALSE
But what I actually have is this:
df_mod[217,]$nom
[1] 100 lipo 30 gélules
Please help me.

Apparently, some of your values have leading spaces, so you could either modify your regex to allow optional whitespace before the digit (or something similar):
grepl("^\\s*[[:digit:]]", df_mod$name)
Or strip the whitespace first with the built-in trimws function:
grepl("^[[:digit:]]", trimws(df_mod$name))
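Both fixes can be sketched in Python, assuming (as the answer suggests) that the failing value starts with a space; the sample strings are hypothetical, and str.strip() stands in for trimws() here.

```python
import re

# Hypothetical sample: the second value carries a leading space, like the
# "100 lipo 30 gélules" row from the question presumably does.
names = ["name product", " 100 lipo 30 gélules", "250 product"]

STRICT = re.compile(r"^\d")      # a digit must be the very first character
LENIENT = re.compile(r"^\s*\d")  # optional leading whitespace, then a digit

strict = [bool(STRICT.search(n)) for n in names]
lenient = [bool(LENIENT.search(n)) for n in names]
# Analogue of trimws(): strip surrounding whitespace, then use the strict pattern.
trimmed = [bool(STRICT.search(n.strip())) for n in names]

print(strict)   # [False, False, True]
print(lenient)  # [False, True, True]
print(trimmed)  # [False, True, True]
```

Either approach gives the same result here; trimming first keeps the pattern simple, while the lenient pattern avoids modifying the data.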

Related

Retrieving the 12th through 14th characters from a long string using ONLY regex - Grafana variable

I have a small issue: I am trying to extract specific characters from a long string using regex, but I am having trouble.
Workflow
Prometheus --> Grafana --> Variable (using regex)
I can't use anything other than regex to achieve this result.
I am currently using this expression to grab the long string from some JSON output:
.*channel_id="(.*?)".*
FROM THIS
{account_id="XXXXXXX-xxxx-xxxx-xxxx-xxxxxxxxxx",account_name="testalpha",channel_id="s0022110430col0901241usa",channel_abbr="s0022109430col}
This returns a string that's ALWAYS 24 characters long:
s0022110430col0901241usa
PROBLEM:
I need to grab the three letters 'col' and 'usa', as they are the two teams that are playing. Ideally I would pipe the result of the first regex to get these values (position is key: the first value is ALWAYS the 12th-14th characters and the second value is the last 3 characters). If possible, I would also like to output these values in uppercase with the string "vs" in between, to create a string such as:
COL vs USA
or
ARG vs BRA
I am open to any and every suggestion anyone may have
Thank you!
PS: the uppercase part is 'nice to have' but not required.
I'm still learning regex, so this is all I could come up with:
For the col (first team):
(?<=(channel_id=".{11}))\w{3}
For the usa (second team):
(?<=(channel_id=".{21}))\w{3}
Can you define the channel_id?
It begins with 's' followed by a run of digits. If the characters around the team codes are always digits, you can use this regex:
channel_id=".[0-9]+([a-z]+)[0-9]+([a-z]+)
You will get 2 groups, one with "col" and the other with "usa".
Edit:
Or, if you know the string always has the same length, you can use something like:
channel_id=".{11}([a-z]+).{7}([a-z]+)
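To check what the two capture groups contain, here is a small Python sketch applying both answers' patterns to the sample label from the question. Python's re stands in for Grafana's regex engine, which is an assumption for illustration. A regex alone cannot change case or insert text, so the uppercasing and the " vs " join are done in Python; this sketch does not establish whether a Grafana variable can combine two groups this way.

```python
import re

label = ('{account_id="XXXXXXX-xxxx-xxxx-xxxx-xxxxxxxxxx",account_name="testalpha",'
         'channel_id="s0022110430col0901241usa",channel_abbr="s0022109430col}')

# First answer: letters between runs of digits (assumes the filler is always digits).
BY_SHAPE = re.compile(r'channel_id=".[0-9]+([a-z]+)[0-9]+([a-z]+)')
# Second answer: fixed offsets (assumes channel_id is always 24 characters long).
BY_POSITION = re.compile(r'channel_id=".{11}([a-z]+).{7}([a-z]+)')


def matchup(pattern, text):
    """Return 'COL vs USA' style text, or None if the pattern does not match."""
    m = pattern.search(text)
    if m is None:
        return None
    first_team, second_team = m.group(1), m.group(2)
    return f"{first_team.upper()} vs {second_team.upper()}"


print(matchup(BY_SHAPE, label))     # COL vs USA
print(matchup(BY_POSITION, label))  # COL vs USA
```

The positional pattern is closer to the question's stated rule (characters 12-14 and the last three), while the shape-based one keeps working if the digit runs change length.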

Power BI - extract number from text string based on conditions

I have done extensive searching and I don't believe this is a repeat, but it is definitely an extension of previous questions. I am attempting to extract numbers from a text string within a Power BI function. I have successfully extracted the numbers from the string into a single value using the code below:
Text.Combine(
List.RemoveNulls(
List.Transform(
Text.ToList([string_col]),
each if Value.Is(Value.FromText(_), type number)
then _ else null)
)
)
This code works great when the number I am interested in is the only number in the string, for example:
"Bring on the 1234567 comments" results in 1234567
However, I can't work out how to extract my number when several different numbers occur in the string, for example:
"Bring on the 1234567 comments with 50 telling me this is a repeat" results in 123456750
What I need is to pull out only the number in the string that meets certain conditions (just one condition, in my case). For my particular issue, the number I need to extract will always be the only 7-digit number in the string, so I feel like there should be a more straightforward answer.
Is there a way to extract only the 7-digit number using my function or something similar? If I am way off base, can someone please set me on the right path?
As always, the community's help is greatly appreciated.
Diedrich
First, you could use the Text.Select function to extract all the digits.
FirstStep =
Table.AddColumn(Source, "MyNumberColumn", each Text.Select([MyStringColumn], {"0".."9"}))
I found this solution in this blog post by Erik Svensen:
https://eriksvensen.wordpress.com/2018/03/06/extraction-of-number-or-text-from-a-column-with-both-text-and-number-powerquery-powerbi
For your specific requirement, you may need to type the new column as text:
FirstStep =
Table.AddColumn(
Source,
"MyTempNumberColumn",
each Text.Select([MyStringColumn], {"0".."9"}),
type text)
From there, depending on the length of the result, you could test whether each seven-character sequence of the digits-only string appears in the original string, repeating the test as many times as needed until you reach the end of the digits-only sequence:
SecondStep =
Table.AddColumn(
FirstStep,
"My7numbers",
each
// Fewer than 7 digits: nothing to extract (Text.Range would raise an error)
if Text.Length([MyTempNumberColumn]) < 7
then null
else if Text.Length([MyTempNumberColumn]) = 7
then [MyTempNumberColumn]
// Check the first two 7-character windows; add more branches for longer digit strings
else if Text.Contains([MyStringColumn], Text.Range([MyTempNumberColumn], 0, 7))
then Text.Range([MyTempNumberColumn], 0, 7)
else if Text.Contains([MyStringColumn], Text.Range([MyTempNumberColumn], 1, 7))
then Text.Range([MyTempNumberColumn], 1, 7)
else null,
type text)
Depending on how many numbers you can get, it might be worth using List.Generate in a function that returns a list of every seven-character sequence from [MyTempNumberColumn], whatever its length.
https://learn.microsoft.com/en-us/powerquery-m/list-generate
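The window-testing idea has an edge case: any seven-character window taken from a longer number (say an eight-digit one) also appears in the original string. A different technique, sketched here in Python rather than M (a swapped-in illustration, not the answerer's method), splits the text into runs of consecutive digits and keeps the run whose length is exactly seven. Translating it to Power Query would mean splitting on non-digit characters and filtering the resulting list by length; that translation is an untested assumption.

```python
import re


def extract_seven_digit_number(text):
    """Return the first run of exactly seven consecutive digits, or None.

    Keeping separate digit runs apart means "1234567 ... 50" yields the runs
    "1234567" and "50" rather than the concatenation "123456750".
    """
    for run in re.findall(r"\d+", text):
        if len(run) == 7:
            return run
    return None


print(extract_seven_digit_number(
    "Bring on the 1234567 comments with 50 telling me this is a repeat"))  # 1234567
```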

REGEXEXTRACT - Error when trying to get a phone number from a string

I am wondering if someone can help me get this formula right in Google Sheets.
After a two-week event I get a spreadsheet with more than 2000 rows of comments, some of which include phone numbers. I am trying to extract the phone numbers from those strings.
example string: call at 228-219-4241 after
formula: =IFERROR(REGEXEXTRACT(V133,"^(?(?:\d{3}))?[-.]?(?:\d{3})[-.]?(?:\d{4})$"),"NOT FOUND!!!")
and I get "NOT FOUND!!!".
It only works when the cell contains just the number.
Cheers.
Your regex is too complicated, and you're restricting it to a rule that says the number is the first thing in the string. Change it to this:
=iferror(regexextract(A1,"\d{3}\-\d{3}\-\d{4}"))
In your pattern, '^' means the beginning of the text and '$' means the end, so you're saying the string must start with the 3-digit group and end with the 4-digit group, i.e. it must contain nothing but the phone number.
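The effect of the anchors can be reproduced in Python. Python's re is used as a stand-in for Google Sheets' regex engine (RE2); the two should agree on these simple patterns, but that is an assumption. The helper mimics IFERROR(REGEXEXTRACT(...), "NOT FOUND!!!") and is hypothetical.

```python
import re

comment = "call at 228-219-4241 after"

# Anchored: ^ and $ force the whole string to be the phone number.
ANCHORED = re.compile(r"^\d{3}-\d{3}-\d{4}$")
# Unanchored: find the number anywhere in the text.
UNANCHORED = re.compile(r"\d{3}-\d{3}-\d{4}")


def extract_phone(pattern, text, default="NOT FOUND!!!"):
    """Return the first match, or the default (like IFERROR around REGEXEXTRACT)."""
    m = pattern.search(text)
    return m.group(0) if m else default


print(extract_phone(ANCHORED, comment))    # NOT FOUND!!!
print(extract_phone(UNANCHORED, comment))  # 228-219-4241
```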

Regex pattern for a number within a number

Can anyone think of a better way to write this? It works but it is a little ugly.
Input data looks like this: 125100001
The first two digits are the year, the next two are the week number, and the last 5 are the serial. I want to validate that the week number is not over 52 for an Angular input[number] pattern option. Basically, I just want to leverage the $error field :)
So here it is:
^\d\d(0[0-9]|1[0-9]|2[0-9]|3[0-9]|4[0-9]|5[0-2]){1}\d{5}$
Use this:
^(\d{2})([0-4][1-9]|[1-5]0|5[12])(\d{5})$
Notes
The first set of parentheses (\d{2}) captures the year.
The second set of parentheses ([0-4][1-9]|[1-5]0|5[12]) validates the week: 01-52
If you wish, you can retrieve each component with groups 1, 2 and 3 (year, week and serial).
A shorter alternative for just the week part (like your original, it also accepts week 00):
[0-4]\d|5[0-2]
so the entire regex would be:
^\d\d([0-4]\d|5[0-2])\d{5}$
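Both patterns can be checked quickly in Python (its re module handles these constructs the same way as the JavaScript engine behind Angular's pattern validation, as far as I know; the parse_code helper is hypothetical). The first pattern rejects week 00, while the shorter one accepts it.

```python
import re

# The answer's pattern: 2-digit year, week 01-52, 5-digit serial, each captured.
WEEK_CODE = re.compile(r"^(\d{2})([0-4][1-9]|[1-5]0|5[12])(\d{5})$")
# The shorter alternative: week 00-52, with only the week captured.
SHORT = re.compile(r"^\d\d([0-4]\d|5[0-2])\d{5}$")


def parse_code(code):
    """Return (year, week, serial) as strings, or None if the code is invalid."""
    m = WEEK_CODE.match(code)
    return m.groups() if m else None


print(parse_code("125100001"))  # ('12', '51', '00001')
print(parse_code("125300001"))  # None (week 53)
```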

Regular expression prices

I'm trying to write a price-validation regex that fits my needs.
Valid input formats (xxx means any number of digits with no maximum length; 0000 means at most 4 decimal places):
15,0000
15.0000
150.0000
150,0000
xxxxxxxxxxxx.0000
xxxxxxxxxxxx,0000
15,00
15,1
15.00
15.1
Invalid input formats (basically everything that starts with 0):
01.0000
01.00
01
My regular expression so far: ^\$?[1-9][1-9,]*[0-9]\.?[0-9]{0,2}$
Edit 1: I changed my regex to ^\$?[1-9]*[1-9]((\,)|(\.))?[0-9]{0,4}$, but now I need to be able to enter 150000000 and it only accepts 150000.
EDIT: I just saw that you updated the question and added 0 as a valid input. I'll see if I can add that.
How about:
^([1-9].*[,\.][0-9]*)$
This will work on the examples above.
But be careful with input like 15x,001, which it also accepts.
Okay, this one seems fine to me:
^[^0]\d+(\.|\,)?[0-9]{0,4}$
I checked it here: http://rubular.com/r/97Ra9VS9h4
One more thing: if you also want to accept one-digit numbers such as 1, 2, etc.,
just replace the + with *, like this: ^[^0]\d*(\.|\,)?[0-9]{0,4}$
What about this one:
^\$?[1-9][0-9]*(,|\.)[0-9]{1,4}$
After the optional leading $, [1-9] makes sure the price doesn't start with a zero.
Then any digits are allowed, zero or more of them.
Then there must be a comma or a point.
Finally, any digits are allowed: at least one and at most four.
^[1-9][0-9]*([.,][0-9]{1,4})?$
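The last pattern, which makes the decimal part optional, can be checked against the question's examples with a short Python sketch (Python's re is assumed to behave like the Ruby engine on rubular.com for these simple constructs). It also shows why [1-9] is safer than [^0] in the earlier answer: [^0] means "any character except 0", so it accepts a leading letter.

```python
import re

# First digit 1-9, then any digits, then an optional separator
# (',' or '.') followed by 1 to 4 decimals.
PRICE = re.compile(r"^[1-9][0-9]*([.,][0-9]{1,4})?$")
# Earlier answer's pattern, for comparison.
NOT_ZERO = re.compile(r"^[^0]\d*(\.|\,)?[0-9]{0,4}$")


def is_valid_price(text):
    # fullmatch also rejects a trailing newline, which a bare $ tolerates in Python.
    return PRICE.fullmatch(text) is not None


print(is_valid_price("150000000"), is_valid_price("01.00"))  # True False
print(bool(NOT_ZERO.match("a5")), is_valid_price("a5"))      # True False
```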