R grepl to find a pure number - regex

Probably a very basic question, but it's bugging me that I can't easily find a solution... so I thought I should come to the wisdom of the SO wise ones...
I would like to be able to return TRUE or FALSE according to whether a character string is a pure number rather than just containing numbers... The closest I got was
grepl("[0-9]","99393")
grepl("[0-9]","blah")
However this doesn't work since the following is returned as TRUE when it should be FALSE
grepl("[0-9]","993T3")
As ever any help would be appreciated!
EDIT
As joran pointed out it is important to note that the character string will only ever include integers and letters, i.e. will not include decimal points or commas for the number...

You should specify the whole regular expression and specify the beginning (^) and the end of the string ($). For instance :
> grepl("^[[:digit:]]+$","993T3")
[1] FALSE
Look at http://en.wikibooks.org/wiki/R_Programming/Text_Processing#Regular_Expressions if you want to learn more about regexp.
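For comparison, here is a minimal Python sketch of the same anchoring idea. The function names is_pure_number and contains_digit are made up for this illustration; re.fullmatch plays the role of the ^ and $ anchors in the R call.

```python
import re

def contains_digit(s: str) -> bool:
    """Unanchored search: True if a digit appears anywhere in s."""
    return re.search(r"[0-9]", s) is not None

def is_pure_number(s: str) -> bool:
    """Anchored match: True only if s is one or more ASCII digits and nothing else."""
    # fullmatch requires the pattern to cover the whole string,
    # which is what ^ and $ achieve in the grepl call above.
    return re.fullmatch(r"[0-9]+", s) is not None

for s in ["99393", "blah", "993T3"]:
    print(repr(s), contains_digit(s), is_pure_number(s))
```

The unanchored version is the one from the question: it reports "993T3" as a hit because a single digit anywhere is enough.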

Why don't you just use robust internal methods for coercing to either an integer or numeric?
It will return NA if it can't be done. Use is.na if you want a logical result:
is.na( as.integer( "993T3" ) )
# [1] TRUE
is.na( as.integer( "99393" ) )
# [1] FALSE
Remember that if you are dealing with floating-point numbers, use as.numeric; otherwise as.integer will truncate the fractional part of your number.

What about !grepl("[^0-9]","993T3")?
Edit: This returns TRUE for the empty string. To avoid this, use
!grepl("[^0-9]", x) & nzchar(x)
for a vector x of character type.
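The empty-string pitfall of the negated character class can be reproduced in a few lines of Python. This is only a sketch; no_non_digits and only_digits_nonempty are hypothetical helper names, not part of any library.

```python
import re

def no_non_digits(s: str) -> bool:
    """True if s contains no character outside 0-9 (also True for the empty string)."""
    return re.search(r"[^0-9]", s) is None

def only_digits_nonempty(s: str) -> bool:
    """Same check, but additionally requires at least one character, like nzchar(x)."""
    return no_non_digits(s) and len(s) > 0

for s in ["99393", "993T3", ""]:
    print(repr(s), no_non_digits(s), only_digits_nonempty(s))
```

An empty string has no non-digit characters, so the negated test alone accepts it; the length check closes that gap.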

Related

Regex to get values of format Number-Decimal-Number (eg 1.2)

I need to ensure that a textbox is having a specific format entered against it... Number from a variable then a Decimal Point then any other number (1.10, 2.6 etc...) The important bit is that the first number should come from a variable then it must be a decimal followed by another number.
I have not been able to find anything too specific and the REGEX functionality looks to require a bit of investigation of how it all works... If I can get a quick result here would be great though!
I instinctively (although I didn't expect it to work) tried:
If System.Text.RegularExpressions.Regex.IsMatch(txbCriterionNo.Text, OutcomeNo.ToString() + "." + "^[0-9]+$") Then
...
where OutcomeNo is an integer variable - so I hope you can see what I am aiming to get. So, the format MUST be integer variable - decimal point then another integer value.
What should work:
1.5 or 5.42 or 10.5
What shouldn't work:
.14 or a.1 or 1.c
etc...
Thanks!
Chris85 pointed me in the right direction, but I also needed to ensure that the first number matched a variable value so I have arrived at the following which works a treat...
If System.Text.RegularExpressions.Regex.IsMatch(txbCriterionNo.Text, "^\d+\.\d+$") AndAlso txbCriterionNo.Text.Substring(0, Convert.ToInt32(InStr(txbCriterionNo.Text, ".")) - 1) = OutcomeNo Then
Here we first use the regex "^\d+\.\d+$" to make sure the format is correct ([number][decimal point][number]). A second check then finds the position of the decimal point and uses it to extract the leading substring, which is compared against my variable OutcomeNo. InStr is 1-based, so subtracting 1 leaves the point itself out of the substring, and AndAlso short-circuits, so the substring comparison only runs once the format check has passed.
Thanks all!!
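The two checks can also be folded into a single pattern by building the regex from the variable. Below is a minimal Python sketch of that idea, not the VB.NET code used above; matches_outcome and outcome_no are hypothetical names chosen for the illustration.

```python
import re

def matches_outcome(text: str, outcome_no: int) -> bool:
    """True if text is '<outcome_no>.<digits>', e.g. '1.5' for outcome_no == 1."""
    # re.escape keeps any regex metacharacter in the prefix literal
    # (relevant for example for the minus sign of a negative number).
    pattern = re.escape(str(outcome_no)) + r"\.[0-9]+"
    # fullmatch anchors the pattern at both ends of the string.
    return re.fullmatch(pattern, text) is not None

print(matches_outcome("1.5", 1))
print(matches_outcome("10.5", 1))
print(matches_outcome(".14", 1))
```

Note that the decimal point is escaped; with a bare "." the pattern would also accept any other character in that position.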
For a TextBox, this will allow only digits and a dot to be entered, and the text will have to start with a digit.
Private Sub txtValue_KeyPress(ByVal sender As Object, ByVal e As System.Windows.Forms.KeyPressEventArgs) Handles txtValue.KeyPress
Dim txtValue As TextBox = DirectCast(sender, TextBox)
If Not (Char.IsDigit(e.KeyChar) Or Char.IsControl(e.KeyChar) Or (e.KeyChar = "." And txtValue.Text.IndexOf(".") < 0) ) Then
e.Handled = True
If txtValue.Text.StartsWith(".") Then
txtValue.Text = ""
End If
End If
End Sub

Subsetting a string based on pre- and suffix

I have a column with these type of names:
sp_O00168_PLM_HUMAM
sp_Q8N1D5_CA158_HUMAN
sp_Q15818_NPTX1_HUMAN
tr_Q6FGH5_Q6FGH5_HUMAN
sp_Q9UJ99_CAD22_HUMAN
I want to remove everything before, and including, the second _ and everything after, and including, the third _.
I do not wish to remove based on the number of characters, since this is not a fixed number.
The output should be:
PLM
CA158
NPTX1
Q6FGH5
CAD22
I have played around with these, but don't quite get it right..
library(stringr)
str_sub(x,-6,-1)
That’s not really a subset in programming terminology¹, it’s a substring. In order to extract partial strings, you’d usually use regular expressions (pretty much regardless of language); in R, this is accessible via sub and other related functions:
pattern = '^.*_.*_([^_]*)_.*$'
result = sub(pattern, '\\1', strings)
¹ Aside: taking a subset is, as the name says, a set operation; sets are defined by having no duplicate elements and no particular order to their elements. A string, by contrast, is a sequence, which is a very different concept.
Another possible regular expression is this:
sub("^(?:.+_){2}(.+?)_.+", "\\1", vec)
# [1] "PLM" "CA158" "NPTX1" "Q6FGH5" "CAD22"
where vec is your vector of strings.
A visual explanation:
> gsub(".*_.*_(.*)_.*", "\\1", "sp_O00168_PLM_HUMAM")
[1] "PLM"
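As a cross-check of the patterns above, here is a small Python sketch that extracts the third underscore-separated field in two ways: with a capture group and with a plain split. The variable names are illustrative, and the split variant assumes every name has at least three fields.

```python
import re

names = [
    "sp_O00168_PLM_HUMAM",
    "sp_Q8N1D5_CA158_HUMAN",
    "sp_Q15818_NPTX1_HUMAN",
    "tr_Q6FGH5_Q6FGH5_HUMAN",
    "sp_Q9UJ99_CAD22_HUMAN",
]

# Two underscore-free fields, then capture the third, then discard the rest.
pattern = re.compile(r"^[^_]*_[^_]*_([^_]*)_.*$")
by_regex = [pattern.sub(r"\1", name) for name in names]

# Without a regex: split on "_" and take the field at index 2.
by_split = [name.split("_")[2] for name in names]

print(by_regex)
print(by_split)
```

Using [^_]* instead of .* for the leading fields means the match does not depend on backtracking when a name contains more than three underscores.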

Regex for both double precision value or empty string: ^[0-9]\d*(\.\d+)?$

For validation if a text is a double precision value, I use ^[0-9]\d*(\.\d+)?$. However, what regex should I use so that either an empty string or a double precision value would match?
You mean as in empty string?
You could use this:
^([0-9]\d*(.\d+)?|)$
Though to make it work as intended, you probably want:
^([0-9]+(\.[0-9]+)?|)$
or
^(\d+(\.\d+)?|)$
Notice I put an or operator | there and since there's nothing after it, it will match an empty line.
This should work
^\d*(?:\d\.\d+)?$
It will match strings like:
'123'
'123.4'
'0.3'
''
It will not allow strings that start with a decimal point (e.g. .3); if you'd like to allow that as well, use this:
^\d*(?:\.\d+)?$
If you'd also like to allow strings that end with a decimal point (e.g. 3.), use this:
^\d*(?:\.\d*)?$
You could also do this
^\d+([.]\d+)?$|^$
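To see why the escaped dot matters and how the empty alternative behaves, the patterns can be tried out in Python. This is a sketch only; LOOSE, STRICT and is_double_or_empty are names invented for the example, and re.fullmatch stands in for the ^ and $ anchors.

```python
import re

# Pattern from the first suggestion: the dot is unescaped, so it matches any character.
LOOSE = re.compile(r"([0-9]\d*(.\d+)?|)")
# Corrected pattern: digits, optional escaped dot plus digits, or nothing at all.
STRICT = re.compile(r"([0-9]+(\.[0-9]+)?|)")

def is_double_or_empty(s: str) -> bool:
    """True for the empty string or for digits with an optional fractional part."""
    return STRICT.fullmatch(s) is not None

for s in ["", "123", "123.4", "0.3", ".3", "3.", "1x3"]:
    print(repr(s), LOOSE.fullmatch(s) is not None, is_double_or_empty(s))
```

The only input on which the two patterns disagree here is "1x3", which the unescaped dot lets through.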

Regexp for a double

I have got this regexp "^[0-9]+\.?[0-9]*$" to match a double or an integer number in Visual C++, but it doesn't seem to work. Any ideas?
This is how I am applying the code:
if (System::Text::RegularExpressions::Regex::IsMatch(e0, "^[0-9]+\.?[0-9]*$")){
e0_val = System::Convert::ToDouble(e0);
}
The regexp above is not perfect, since it accepts "09", which is not a valid number. A better expression would be:
"^(-?)(0|([1-9][0-9]*))(\\.[0-9]+)?$"
where the capture groups are:
1. an optional negative sign;
2. zero or a valid non-zero integer (group 3 is the non-zero alternative nested inside it);
4. the optional fractional part.
In theory, the fractional part should be written as "(\\.[0-9]*[1-9])?" instead, because a number should not have trailing zeros. In practice, the source string might have been created with a fixed number of digits, e.g.:
printf("%.1f", x);
so it might easily end with a zero character. And, of course, these are all fixed-point representations, not the doubles themselves: a double can also be written as -1.23e-4 instead of -0.000123.
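The group numbering described above can be checked with a short Python sketch. The pattern is the same one, written as a raw string so the backslash needs no doubling; parse_fixed_point is a hypothetical helper name.

```python
import re

# Group 1: sign, group 2: integer part, group 3: non-zero alternative, group 4: fraction.
NUMBER = re.compile(r"(-?)(0|([1-9][0-9]*))(\.[0-9]+)?")

def parse_fixed_point(s: str):
    """Return (sign, integer_part, fraction) or None if s does not match."""
    m = NUMBER.fullmatch(s)
    if m is None:
        return None
    # group(4) is None when the optional fractional part is absent.
    return m.group(1), m.group(2), m.group(4)

for s in ["12", "-0.50", "09", "1.", "-1.23e-4"]:
    print(repr(s), parse_fixed_point(s))
```

Leading zeros, a trailing decimal point and exponent notation are all rejected, which matches the limitations listed in the answer.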
There's nothing wrong with the regex per se; it's your escaping that's at fault. You need to double-escape the \ character, since it is also the escape character in C++ string literals.
Additionally, there is an edge case where this regex would treat 1. as a valid floating-point number. So you might be better off with "^[0-9]+(\\.[0-9]+)?$", which eliminates that possibility.
Maybe not a direct answer, just useful information. The regexp:
std::regex rx(R"(^([+-]?(?:[[:d:]]+\.?|[[:d:]]*\.[[:d:]]+))(?:[Ee][+-]?[[:d:]]+)?$)");
matches strings:
"1", "0", "10",
"1000.1", "+1",
"+10", "-10", "1.",
".1", "1.1", "+1.",
"-1.", "+.1", "-.1",
"009", "+009", "-009",
"-01e0", "+01E0", "+1e-1",
"+1e+1", "+1.e1", "1E1",
"1E+1", "0.001e-12", "0.111111111111111"
and does not match the following strings:
".", "1a", "++1",
"+-1", "+", "-.",
"-", "--1.", "1.e.1",
"1e.1", "0+.e0"
The first ones look like valid values for the double type in C++, e.g. double test = +009.e+10 is OK.
Play it in ideone.com: https://ideone.com/ooF8sG
/^[0-9]+\.[0-9]+$/ : use this for doubles.
accepts 123.123 types.
/^[0-9]*[.]?[0-9]+$/
Regex above works for doubles such as "45.5", "12", ".12"
(\d+)?\.(\d+)?
Regex above works for doubles such as "45.5", "12.", ".12"

Optional characters in a regex

The task is pretty simple, but I've not been able to come up with a good solution yet: a string can contain numbers, dashes and pluses, or only numbers.
^[0-9+-]+$
does most of what I need, except when a user enters garbage like "+-+--+"
I've not had luck with regular lookahead, since the dashes and pluses could potentially be anywhere in the string.
Valid strings:
234654
24-3+-2
-234
25485+
Invalid:
++--+
How about this:
([+-]?\d[+-]?)+
which means "one or more digits, each of which can be preceded or followed by an optional plus or minus".
Here's a Python test script:
import re
TESTS = "234654 24-3+-2 -234 25485+ ++--+".split()
for test in TESTS:
    print(test, ":", re.match(r'([+-]?\d[+-]?)+', test) is not None)
which prints this:
234654 : True
24-3+-2 : True
-234 : True
25485+ : True
++--+ : False
How about:
^[0-9+-]*[0-9][0-9+-]*$
This ensures that there is at least one digit somewhere in the string. (It looks like it might have a lot of backtracking, though. But on the other hand it doesn't have a + or * wrapped inside another + or *, which I don't like either.)
^([+-]*[0-9]+[+-]*)+$
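Another solution uses a positive lookbehind assertion to ensure there is at least one digit (a variable-length lookbehind like this one is not supported by every regex engine; .NET accepts it).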
^[0-9+-]+$(?<=[0-9][+-]*)
Or using a positive look ahead assertion.
(?=[+-]*[0-9])^[0-9+-]+
I like the
^(?=.*\d)[\d+-]+$
solution, myself. It says exactly what you need without requiring any head-scratching.
I'd do it like this:
^[-+]*\d[\d+-]*$
Fast is good!
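Several of the anchored suggestions above can be compared side by side with a small Python harness. This is a sketch: the dictionary labels and the accepts helper are invented for the example, and the extra invalid inputs "" and "12a" are assumptions added to exercise the anchors. The lookbehind variant is left out because Python's re module requires fixed-width lookbehind.

```python
import re

VALID = ["234654", "24-3+-2", "-234", "25485+"]
INVALID = ["++--+", "", "12a"]

PATTERNS = {
    "digit somewhere": r"^[0-9+-]*[0-9][0-9+-]*$",
    "grouped runs": r"^([+-]*[0-9]+[+-]*)+$",
    "lookahead": r"^(?=.*\d)[\d+-]+$",
    "signs then digit": r"^[-+]*\d[\d+-]*$",
}

def accepts(pattern: str, s: str) -> bool:
    """True if the anchored pattern matches s; re.ASCII restricts \\d to 0-9."""
    return re.search(pattern, s, flags=re.ASCII) is not None

for label, pattern in PATTERNS.items():
    ok = all(accepts(pattern, s) for s in VALID)
    bad = any(accepts(pattern, s) for s in INVALID)
    print(label, "accepts all valid:", ok, "accepts any invalid:", bad)
```

One thing worth keeping in mind with the earlier test script: re.match only anchors the start of the string, so the unanchored pattern ([+-]?\d[+-]?)+ would report a match for an input such as "12a" as well.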