Transform syntax in R - regex

Is it possible to transform expressions like this, using R:
IF(expr.bool, expr1, expr2) into if (expr.bool) expr1 else expr2
AND(expr.bool1, expr.bool2) (or &&) into expr.bool1 & expr.bool2
OR(expr.bool1, expr.bool2) (or ||) into expr.bool1 | expr.bool2
NOT(expr.bool) into !expr.bool
TRUE into 1
FALSE into 0
and so on.
I have tried the ast package, and I have tried using substitute to build an expression tree and then adapt it to the new syntax, but neither seems to work.
What I want to do is read an expression string that uses the syntax on the left, parse it, and then use eval to get a float result.
p.s. I am completely new to R.

Everything that does something is a function in R. Just do something like this:
IF <- `if`
IF(FALSE, 1, 2)
#[1] 2
NOT <- `!`
NOT(TRUE)
#[1] FALSE
Then eval/parse your strings.
Coercing logical values to integers can be done with as.integer.
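Putting the pieces together, a minimal sketch of the whole pipeline might look like the following. The AND and OR aliases and the evaluate helper are not part of the answer above; they are hypothetical names that follow the same aliasing idea, and as.numeric is used here so that logical results come back as 1 or 0.

```r
# Alias the upper-case names from the source syntax to R's own functions.
IF  <- `if`
AND <- `&`
OR  <- `|`
NOT <- `!`

# Hypothetical helper: parse the string, evaluate it, coerce the result to numeric.
evaluate <- function(s) as.numeric(eval(parse(text = s)))

evaluate("IF(AND(TRUE, NOT(FALSE)), 1.5, 2.5)")  # 1.5
evaluate("OR(FALSE, FALSE)")                     # 0
```

Since eval(parse(...)) runs whatever code the string contains, this is only reasonable for input you trust.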


Finding permutations using regular expressions

I need to create a regular expression (for a program in Haskell) that matches strings made up of "X" and ".", with exactly four "X" and exactly one ".". It must not match strings with any other ratio of X to dot.
I have thought about something like
[X\.]{5}
But it also matches "XXXXX" or ".....", so it isn't what I need.
That's called permutation parsing, and while "pure" regular expressions can't parse permutations it's possible if your regex engine supports lookahead. (See this answer for an example.)
However I find the regex in the linked answer difficult to understand. It's cleaner in my opinion to use a library designed for permutation parsing, such as megaparsec.
You use the Text.Megaparsec.Perm module by building a PermParser in a quasi-Applicative style using the <||> operator, then converting it into a regular MonadParsec action using makePermParser.
So here's a parser which recognises any combination of four Xs and one .:
import Control.Applicative
import Data.Ord
import Data.List
import Text.Megaparsec
import Text.Megaparsec.Perm
fourXoneDot :: Parsec Dec String String
fourXoneDot = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
    where mkFive a b c d e = [a, b, c, d, e]
          x = char 'X'
          dot = char '.'
I'm applying the mkFive function, which just stuffs its arguments into a five-element list, to four instances of the x parser and one dot, combined with <||>.
ghci> parse fourXoneDot "" "XXXX."
Right "XXXX."
ghci> parse fourXoneDot "" "XX.XX"
Right "XXXX."
ghci> parse fourXoneDot "" "XX.X"
Left {- ... -}
This parser always returns "XXXX." because that's the order I combined the parsers in: I'm mapping mkFive over the five parsers and it doesn't reorder its arguments. If you want the permutation parser to return its input string exactly, the trick is to track the current position within the component parsers, and then sort the output.
fourXoneDotSorted :: Parsec Dec String String
fourXoneDotSorted = makePermParser $ mkFive <$$> x <||> x <||> x <||> x <||> dot
    where mkFive a b c d e = map snd $ sortBy (comparing fst) [a, b, c, d, e]
          x = withPos (char 'X')
          dot = withPos (char '.')
          withPos = liftA2 (,) getPosition
ghci> parse fourXoneDotSorted "" "XX.XX"
Right "XX.XX"
As the megaparsec docs note, the implementation of the Text.Megaparsec.Perm module is based on Parsing Permutation Phrases; the idea is described in detail in the paper and the accompanying slides.
The other answers look quite complicated to me, given that there are only five strings in this language. Here's a perfectly fine and very readable regex for this:
\.XXXX|X\.XXX|XX\.XX|XXX\.X|XXXX\.
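The alternation can be checked quickly in any regex engine. The sketch below uses R rather than Haskell, purely as a test bench; the ^ and $ anchors and the list of candidate strings are additions for the demonstration, not part of the answer above.

```r
# The five alternatives, anchored so that the whole string has to match.
pattern <- "^(\\.XXXX|X\\.XXX|XX\\.XX|XXX\\.X|XXXX\\.)$"

candidates <- c("XXXX.", "XX.XX", ".XXXX", "XXXXX", ".....", "XX.X", "XXX..")
matches <- grepl(pattern, candidates)

# Only the strings with exactly four X and one dot survive.
candidates[matches]
```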
Are you attached to regex, or did you just end up at regex because this was a question you didn't want to try answering with applicative parsers?
Here's the simplest possible attoparsec implementation I can think of:
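-- Assuming Text input; swap in Data.Attoparsec.ByteString.Char8 for ByteString.
import Data.Attoparsec.Text (Parser, count, inClass, satisfy)
import Data.List (sort)

parseDotXs :: Parser ()
parseDotXs = do
    -- Read exactly five characters, each of which must be '.' or 'X'.
    dotXs <- count 5 (satisfy (inClass ".X"))
    -- '.' sorts before 'X', so span splits the sorted list into dots and Xs.
    let (dots, xS) = span (== '.') . sort $ dotXs
    if length dots == 1 && length xS == 4
        then return ()
        else fail "Mismatch between dots and Xs"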
You may need to adjust slightly depending on your input type.
There are tons of fancy ways to do stuff in applicative parsing land, but there is no rule saying you can't just do things the rock-stupid simple way.
Try the following regex:
(?<=^| )(?=[^. ]*\.)(?=(?:[^X ]*X){4}).{5}(?=$| )
If you have one word per string, you can simplify the regex to this one:
^(?=[^. \n]*\.)(?=(?:[^X \n]*X){4}).{5}$
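To see how the two lookaheads work together, here is a small sketch that runs the one-word-per-string version against a few inputs. It uses R with perl = TRUE as a convenient PCRE test bench (an assumption of this example; the question itself targets Haskell), and the test words are made up for illustration.

```r
# First lookahead: a dot occurs somewhere in the word.
# Second lookahead: at least four X occur in the word.
# .{5}$ then forces the word to be exactly five characters long.
lookahead <- "^(?=[^. \\n]*\\.)(?=(?:[^X \\n]*X){4}).{5}$"

words <- c("XX.XX", ".XXXX", "XXXX.", "XXXXX", ".....", "XXX..", "XX.X", "XXXX.X")
ok <- grepl(lookahead, words, perl = TRUE)

words[ok]
```

With five characters, at least four X and at least one dot, the only possible composition is exactly four X and one dot.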

Regular Expression Select Comma But Not In Between Parentheses

I'm looking to create a function in R that loads the defaults of a given function. To do this, I'm calling args() on the function, breaking the result down into the function's default arguments, and loading those into the global environment. This takes a bit of regular expression work, and I've bumped into a problem that I'm having difficulty addressing.
Here is a sample function:
myFunc <- function(a = 1, b = "hello world", c = c("Hello", "World")) {}
I've gotten it down to this point using my own functions:
x <- "a = 1, b = \"hello world\", c = c(\"Hello\", \"World\")"
However, where I am struggling is on splitting the function arguments up. I wanted to split on a comma, but if a function argument has a comma within its default (like the c argument does), then that causes issues. What I'm thinking is that if there is a regular expression that matches a comma, but not a comma that sits between two parentheses, then I could use strsplit with that expression to get what I want.
My attempt to match the case of a comma between two parentheses looks like this:
\\(.*,.*\\)
Now, I've looked into how to do what I described above and it seems like a negative look ahead may be what I need, so I've attempted to do something like this.
splitx <- strsplit(x, "(?!\\(.*,.*\\)(,)")
But R tells me it is an illegal regular expression. If I set perl = TRUE in the argument, it just returns the same string. Any help here would be greatly appreciated and I hope I've been clear!
I'm going to try and answer your underlying question.
The function formals() returns a pairlist of the formal arguments of a function. You can filter the result of formals() by testing each element with is.symbol() and is.null(). Anything that is neither a symbol nor NULL contains a default value.
For example:
get_default_args <- function(fun) {
  x <- formals(fun)
  # keep only the arguments whose default is neither empty nor NULL
  w <- sapply(x, function(x) !is.symbol(x) && !is.null(x))
  x[w]
}
Try it on lm():
get_default_args(lm)
$method
[1] "qr"
$model
[1] TRUE
$x
[1] FALSE
$y
[1] FALSE
$qr
[1] TRUE
$singular.ok
[1] TRUE
Try it on your function:
myFunc <- function(a = 1, b = "hello world", c = c("Hello", "World")) {}
get_default_args(myFunc)
$a
[1] 1
$b
[1] "hello world"
$c
c("Hello", "World")
Note that the comments suggest using match.call(). This may or may not work for you, but match.call() only works from inside the function once it has been called, whereas formals() inspects the function object itself. Therefore you don't need to call the function at all when using formals().
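To get from the list of defaults to actual variables, one possible sketch is shown below. The load_defaults helper and its envir parameter are hypothetical names introduced for this example; it evaluates each default in turn and assigns it into an environment, which defaults to a fresh one here but could be globalenv() if that is really what you want.

```r
get_default_args <- function(fun) {
  args <- formals(fun)
  has_default <- vapply(args, function(a) !is.symbol(a) && !is.null(a), logical(1))
  args[has_default]
}

# Hypothetical helper: evaluate every default and store it under its argument name.
load_defaults <- function(fun, envir = new.env()) {
  defaults <- get_default_args(fun)
  for (nm in names(defaults)) {
    # Defaults are unevaluated language objects, so evaluate before assigning.
    assign(nm, eval(defaults[[nm]], envir), envir = envir)
  }
  envir
}

myFunc <- function(a = 1, b = "hello world", c = c("Hello", "World")) {}
env <- load_defaults(myFunc)
env$c
```

Evaluating in the target environment means a later default can refer to an earlier one, although defaults that depend on arguments without a default value will still fail.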
While I don't think this is the right approach (use match.call() to extract arguments as they were passed), a matching regex is
x <- "a = 1, b = \"hello world\", c = c(\"Hello\", \"World\")"
strsplit(x, ",(?![^()]*\\))", perl=TRUE)
#> [[1]]
#> [1] "a = 1" " b = \"hello world\"" " c = c(\"Hello\", \"World\")"
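The lookahead only checks whether a closing parenthesis follows before any other parenthesis, so it can split in the wrong place once defaults contain nested calls. As a hedged alternative, the sketch below lets R's own parser do the splitting by wrapping the argument string in a throwaway function definition; the variable names are made up for this example.

```r
x <- "a = 1, b = \"hello world\", c = c(\"Hello\", \"World\")"

# Wrap the argument string in a dummy function and let the parser split it.
tmp <- eval(parse(text = paste0("function(", x, ") NULL")))
defaults <- as.list(formals(tmp))

# Turn each default back into text, one element per argument.
pieces <- vapply(defaults, function(d) paste(deparse(d), collapse = " "), character(1))
pieces

# A nested call that trips up the regex: it yields three pieces instead of two.
nested <- strsplit("a = f(1, g(2)), b = 3", ",(?![^()]*\\))", perl = TRUE)[[1]]
length(nested)
```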

R: Substring after finding a character position?

I have seen a few questions about returning the position of a character within a string in R, but I cannot seem to figure it out for my case. I think this is because I'm trying to do it for a whole column rather than a single string, but it could just be my struggles with regex.
Right now, I have a data.frame with a column, df$id that looks something like 13.23-45-6A. The number of digits before the period is variable, but I would like to retain just the part of the string after the period for each row in the column. I would like to do something like:
df$new <- substring(df$id, 1 + indexOf(".", df$id))
So 13.23-45-6A would become 23-45-6A, 0.1B would become 1B, 4.A-A would become A-A and so on for an entire column.
Right now I have:
df$new <- substr(df$id, 1 + regexpr("\\\.", data.count$id),99)
Thanks for any advice.
As @AnandaMahto mentioned in his comment, you would probably be better off simplifying things and using gsub:
> x <- c("13.23-45-6A", "0.1B", "4.A-A")
> gsub("[0-9]*\\.(.*)", "\\1", x, perl = TRUE)
[1] "23-45-6A" "1B" "A-A"
To make this work with your existing data frame you can try:
df$id <- gsub("[0-9]*\\.(.*)", "\\1", df$id, perl = TRUE)
Another way is to use strsplit, using @Tim's example:
x <- c("13.23-45-6A", "0.1B", "4.A-A")
sapply(strsplit(x, "\\."), "[", -1)
#[1] "23-45-6A" "1B" "A-A"
You could remove everything up to and including the first . using
sub('[^.]*\\.', '', x)
#[1] "23-45-6A" "1B" "A-A"
data
x <- c("13.23-45-6A", "0.1B", "4.A-A")
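The position-based approach from the question also works on a whole column, since both regexpr and substring are vectorized. A minimal sketch, assuming a data frame df with a character column id as described in the question (the sample values are the ones used in the answers):

```r
df <- data.frame(id = c("13.23-45-6A", "0.1B", "4.A-A"), stringsAsFactors = FALSE)

# Position of the first literal dot in each string (-1 if there is none).
pos <- as.integer(regexpr(".", df$id, fixed = TRUE))

# Keep everything after that position; substring() runs to the end by default.
df$new <- substring(df$id, pos + 1)
df$new
```

Using fixed = TRUE avoids having to escape the dot at all.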

Sequentially replace multiple places matching single pattern in a string with different replacements

Using the stringr package, it is easy to perform regex replacement in a vectorized manner.
Question: How can I do the following:
Replace every word in
hello,world??your,make|[]world,hello,pos
with different replacements, e.g. increasing numbers
1,2??3,4|[]5,6,7
Note that simple separators cannot be assumed, the practical use case is more complicated.
stringr::str_replace_all does not seem to work here, because
str_replace_all(x, "(\\w+)", 1:7)
produces a vector in which each replacement is applied to all words, and because the input has uncertain and/or duplicate entries, so
str_replace_all(x, c("hello" = "1", "world" = "2", ...))
will not work for the purpose.
Here's another idea using gsubfn. The pre function is run before the substitutions and the fun function is run for each substitution:
library(gsubfn)
x <- "hello,world??your,make|[]world,hello,pos"
p <- proto(pre = function(t) t$v <- 0,        # reset the counter v before matching starts
           fun = function(t, x) t$v <- v + 1) # increment v and return it for each match
gsubfn("\\w+", p, x)
Which gives:
[1] "1,2??3,4|[]5,6,7"
This variation would give the same answer since gsubfn maintains a count variable for use in proto functions:
pp <- proto(fun = function(...) count)
gsubfn("\\w+", pp, x)
See the gsubfn vignette for examples of using count.
I would suggest the "ore" package for something like this. Of particular note would be ore.search and ore.subst, the latter of which can accept a function as the replacement value.
Examples:
library(ore)
x <- "hello,world??your,make|[]world,hello,pos"
## Match all and replace with the sequence in which they are found
ore.subst("(\\w+)", function(i) seq_along(i), x, all = TRUE)
# [1] "1,2??3,4|[]5,6,7"
## Create a cool ore object with details about what was extracted
ore.search("(\\w+)", x, all = TRUE)
# match: hello world your make world hello pos
# context: , ?? , |[] , ,
# number: 1==== 2==== 3=== 4=== 5==== 6==== 7==
Here is a base R solution. It should also be vectorized.
x="hello,world??your,make|[]world,hello,pos"
#split x into single chars
x_split=strsplit(x,"")[[1]]
#find all char positions and replace them with "a"
x_split[gregexpr("\\w", x)[[1]]]="a"
#find all runs of "a"
rle_res=rle(x_split)
#replace run lengths by 1
rle_res$lengths[rle_res$values=="a"]=1
#replace run values by increasing number
rle_res$values[rle_res$values=="a"]=1:sum(rle_res$values=="a")
#use inverse.rle on the modified rle object and collapse string
paste0(inverse.rle(rle_res),collapse="")
#[1] "1,2??3,4|[]5,6,7"
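Another base R route, sketched here as an alternative rather than as a replacement for the answers above, is to combine gregexpr with the replacement form of regmatches. The number_words function name and its pattern argument are made up for this example.

```r
# Replace the n-th match of `pattern` in each string with the number n.
number_words <- function(x, pattern = "\\w+") {
  m <- gregexpr(pattern, x)
  # One character vector of replacements per input string.
  regmatches(x, m) <- lapply(regmatches(x, m), function(w) as.character(seq_along(w)))
  x
}

x <- "hello,world??your,make|[]world,hello,pos"
number_words(x)
# [1] "1,2??3,4|[]5,6,7"
```

Because the replacements are built per element, this also works when x is a vector of several strings, with the numbering restarting for each one.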

regular expression with space

I am using a regular expression in R with the following code:
> temp <- c("Herniorrhaphy, left inguinal", "Herniorrhaphy, right inguinal")
> grep("Herniorrhaphy, [left|right] inguinal",temp)
integer(0)
> grep("Herniorrhaphy, [left inguinal|right inguinal]",temp)
[1] 1 2
I wonder why the two regular expressions give different results, thanks.
According to regexp explanation in the documentation (http://stat.ethz.ch/R-manual/R-devel/library/base/html/regex.html):
Note that alternation does not work
inside character classes, where | has
its literal meaning.
That explains why the first alternative doesn't return any results: the '[' and ']' characters denote a character class. The correct syntax should be:
grep("Herniorrhaphy, (left|right) inguinal",temp)
On my R, the second alternative also returns an empty set:
> temp <- c("Herniorrhaphy, left inguinal", "Herniorrhaphy, right inguinal")
> grep("Herniorrhaphy, [left inguinal|right inguinal] inguinal",temp)
integer(0)
>
Are you sure you are copying directly from the workspace?
I think you want parentheses ( ), not a character class [ ], i.e.
"Herniorrhaphy, (left|right) inguinal"
"Herniorrhaphy, (left inguinal|right inguinal)"
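The difference is easy to see by testing a string that should not match. In the sketch below, the "umbilical" input is a made-up example; the bracketed pattern from the question accepts it, because a character class matches any single one of the listed characters (here the "u") and nothing after the class is required.

```r
temp <- c("Herniorrhaphy, left inguinal", "Herniorrhaphy, right inguinal")

# One character from the class must be followed by " inguinal": no match.
first_hits <- grepl("Herniorrhaphy, [left|right] inguinal", temp)

# One character from the class is enough, so both strings match...
class_hits <- grepl("Herniorrhaphy, [left inguinal|right inguinal]", temp)
# ...but so does an unrelated string whose next character is in the class.
stray_hit <- grepl("Herniorrhaphy, [left inguinal|right inguinal]", "Herniorrhaphy, umbilical")

# Grouping with parentheses gives real alternation.
group_hits <- grepl("Herniorrhaphy, (left|right) inguinal", temp)
group_miss <- grepl("Herniorrhaphy, (left|right) inguinal", "Herniorrhaphy, umbilical")
```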