Find all floats and wrap them in text (find and replace)

This is probably a very simple problem, but I was wondering how to find all floats in a file and surround them with some text, e.g.:
Input file:
input1 & X & Y & Z \\
input2 & ...
Output file:
input1 & float(X) & float(Y) & float(Z) \\
input2 & ...
I was thinking about using sed or awk, but I don't see how the found float can be reused in the replacement string.

Quick and dirty with sed (in the replacement, & stands for the whole matched text):
kent$ echo "input1 & 3.5 & 0.5 & 3.55"|sed 's/[0-9]\+\.[0-9]\+/float(&)/g'
input1 & float(3.5) & float(0.5) & float(3.55)
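For comparison, here is a minimal Python sketch of the same substitution. It is not part of the original answer. In `re.sub` the replacement can refer to the whole match as `\g<0>`, which is Python's counterpart of sed's `&`. Like the sed pattern, it assumes every float has digits on both sides of the decimal point. `wrap_floats` is a hypothetical helper name.

```python
import re

# Digits, a decimal point, then more digits (same assumption as the sed pattern).
FLOAT_RE = re.compile(r"[0-9]+\.[0-9]+")

def wrap_floats(line: str) -> str:
    # \g<0> inserts the entire match, like & in a sed replacement
    return FLOAT_RE.sub(r"float(\g<0>)", line)

print(wrap_floats("input1 & 3.5 & 0.5 & 3.55"))
# input1 & float(3.5) & float(0.5) & float(3.55)
```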

Using awk
awk '{$3="float("$3")";$5="float("$5")";$7="float("$7")"}1' file
input1 & float(X) & float(Y) & float(Z) \\
This just replaces fields based on their position.
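The positional idea can also be sketched in Python. Unlike the awk version, which splits on whitespace, this sketch makes three assumptions:

- cells are separated by " & ";
- the first cell is a row label that stays as-is;
- a row may end with the LaTeX terminator \\.

`wrap_cells` is a hypothetical name.

```python
ROW_END = " \\\\"  # a space followed by the LaTeX row terminator \\

def wrap_cells(line: str, sep: str = " & ") -> str:
    """Wrap every cell except the first (the row label) in float(...)."""
    has_end = line.endswith(ROW_END)
    body = line[: -len(ROW_END)] if has_end else line
    label, *values = body.split(sep)
    cells = [label] + [f"float({v})" for v in values]
    return sep.join(cells) + (ROW_END if has_end else "")

print(wrap_cells("input1 & X & Y & Z \\\\"))
# input1 & float(X) & float(Y) & float(Z) \\
```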

Related

Regular Expressions Finding A Set of Numbers

I am stumped trying to figure out regular expressions, so I thought I would ask the big dogs.
I have a string that can contain from one to four sets, as follows:
1234-abcd, baa74739, maps21342, 6789
I have figured out the regular expressions for 1234-abcd, baa74739, and maps21342, but I am having trouble writing one that pulls out the numbers that stand alone, such as 6789. Does anyone have an idea how to do this?
Example of the regex I used:
dbout.Range("D7").Formula = "=RegexExtract(DH7," & Chr(34) & "([M][A][P][S]\d+)" & Chr(34) & ")"
dbout.Range("D7").AutoFill Destination:=dbout.Range("D7:D2000")
For the standalone digits, replace
dbout.Range("D7").Formula = "=RegexExtract(DH7," & Chr(34) & "([M][A][P][S]\d+)" & Chr(34) & ")"
dbout.Range("D7").AutoFill Destination:=dbout.Range("D7:D2000")
with
dbout.Range("D7").Formula = "=RegexExtract(DH7," & Chr(34) & "(\b\d+\b)" & Chr(34) & ")"
dbout.Range("D7").AutoFill Destination:=dbout.Range("D7:D2000")
Or, with doubled quotes instead of Chr(34):
dbout.Range("D7").Formula = "=RegexExtract(DH7,""(\b\d+\b)"")"
dbout.Range("D7").AutoFill Destination:=dbout.Range("D7:D2000")
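One caveat about \b\d+\b: a hyphen counts as a word boundary. In the sample string, the 1234 of 1234-abcd therefore qualifies as well. If RegexExtract returns the first match (it is a user-defined function, so that is an assumption), you would get 1234 rather than 6789.

The small Python check below illustrates this. It also tries a stricter, hypothetical rule: a standalone number must be bounded by the start or end of the string, a comma, or whitespace. The rule uses a lookahead but no lookbehind, because some engines (VBScript's RegExp, as far as I know, among them) do not support lookbehind.

```python
import re

s = "1234-abcd, baa74739, maps21342, 6789"

# "-" is not a word character, so \b also fires between "1234" and "-abcd".
print(re.findall(r"\b\d+\b", s))  # ['1234', '6789']

# Stricter (assumed) rule: digits preceded by start/comma/whitespace and
# followed by end/comma/whitespace; the number itself is capture group 1.
STANDALONE = re.compile(r"(?:^|[\s,])(\d+)(?=$|[\s,])")
print(STANDALONE.findall(s))  # ['6789']
```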

regex to replace captured group, want to filter by linestart

(notice the newline before Bob)
my_string = "Alice & 1 & a \nBob & 2 & b"
gsub("(?m)(?<=& )(.+?)","(\\1)", my_string, perl=TRUE)
> "Alice & (1) & (a) \nBob & (2) & (b)"
How do I adjust the regex to only parenthesise the entries in the line that starts with Alice?
All the variations of ^A that I've tried either capture Alice itself or only capture the first occurrence of the group after Alice.
Edit: expected output:
"Alice & (1) & (a) \nBob & 2 & b"
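Use the PCRE verbs (*SKIP)(*F). A line that does not start with Alice is matched by the first alternative and then skipped, so only the entries after "& " on the Alice line get replaced: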
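# (?m) makes ^ match at the start of every line, not only at the start of the string
gsub("(?m)^(?!Alice\\b).*(*SKIP)(*F)|(?<=& )(\\S+)", "(\\1)", my_string, perl=TRUE)
# [1] "Alice & (1) & (a) \nBob & 2 & b"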
Not sure how efficient this is, but you can always apply it to a vector containing a separate entry for each line:
l <- strsplit(my_string, "\n")[[1]]
paste(ifelse(substr(l, 1, 5) == "Alice", gsub("(?<=& )(.+?)(?m)","(\\1)", l, perl=TRUE), l), collapse = "\n")
# [1] "Alice & (1) & (a) \nBob & 2 & b"
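Python's built-in re module has no (*SKIP)(*F) verbs. A Python sketch of the same task therefore follows the per-line approach: split on newlines, substitute only on lines that start with Alice, and join the lines again. `paren_entries` is a hypothetical name. As in the first answer, an entry is assumed to be a run of non-space characters after "& ".

```python
import re

# An entry is a run of non-space characters right after "& ".
ENTRY = re.compile(r"(?<=& )(\S+)")

def paren_entries(text: str, prefix: str = "Alice") -> str:
    """Parenthesise the & entries, but only on lines starting with prefix."""
    lines = text.split("\n")
    return "\n".join(
        ENTRY.sub(r"(\1)", line) if line.startswith(prefix) else line
        for line in lines
    )

print(repr(paren_entries("Alice & 1 & a \nBob & 2 & b")))
# 'Alice & (1) & (a) \nBob & 2 & b'
```

Like the substr() test in R, `startswith("Alice")` would also accept a line beginning with, say, "Alicette". Use `re.match(r"Alice\b", line)` if that matters.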

Regular expression doesn't work

I have this simple line of code using regular expressions, where I want to replace pieces of strings with an empty string:
newAddress = myAddress.replace(/^.*?(ramp|arterial|majorroad|street &|highway &|highway|street|street &|street & highway|arterial & street|street & arterial|majorroad &|majorroad & ramp|ramp & majorroad|major road|highway & majorroad)\,/gi, '');
but when the variable contains this:
Highway & Contrada Torremuzza, 95121 Catania CT
why doesn't it remove the "highway &" part?
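It looks to me like you need neither the ^.*? nor the comma. The .*? makes the match include everything that precedes your keyword, so all of it gets replaced. The \, requires a comma directly after the keyword, and your input has no comma after "Highway &".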
Try just this:
(ramp|arterial|majorroad|street &|highway &|highway|street|street &|street & highway|arterial & street|street & arterial|majorroad &|majorroad & ramp|ramp & majorroad|major road|highway & majorroad)
Or, if you're in a mood for fancy optimizations:
(?:majorroad & )?ramp|(?:major r|(?:(?:ramp|highway) & )?majorr)oad|(?:highway|majorroad|street) &|(?:arterial & )?street|(?:street & )?(?:arterial|highway)
Just kidding. In theory this is more efficient, but it's harder to maintain.
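If you do use the optimized version, a small automated check helps with the maintenance problem. This Python sketch takes the keywords from the plain alternation above, with the duplicated street & dropped. It confirms that the optimized pattern accepts every keyword as a whole string.

It does not prove that both patterns pick the same substring when searching inside longer text, because alternation order matters there.

```python
import re

# Keywords from the plain alternation (duplicate "street &" removed).
KEYWORDS = ("ramp|arterial|majorroad|street &|highway &|highway|street|street & highway|"
            "arterial & street|street & arterial|majorroad &|majorroad & ramp|"
            "ramp & majorroad|major road|highway & majorroad").split("|")

FANCY = re.compile(
    r"(?:majorroad & )?ramp|(?:major r|(?:(?:ramp|highway) & )?majorr)oad"
    r"|(?:highway|majorroad|street) &|(?:arterial & )?street"
    r"|(?:street & )?(?:arterial|highway)"
)

# Every keyword must be accepted as a whole string by the optimized pattern.
missing = [kw for kw in KEYWORDS if not FANCY.fullmatch(kw)]
print(missing)  # []
```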
It is trying to match a comma as well; you need to make the comma optional or remove it in this case. Also, unless you want to remove the preceding text too, remove the start-of-string anchor ^ and the .*?:
newAddress = myAddress.replace(/(ramp|arterial|majorroad|street &|highway &|highway|street|street &|street & highway|arterial & street|street & arterial|majorroad &|majorroad & ramp|ramp & majorroad|major road|highway & majorroad)/gi, '');
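The Python sketch below shows both problems on the sample address. In the sketch, `re.IGNORECASE` stands in for the i flag, and `re.sub` replaces all matches by default, like g. Python's backtracking engine should treat this particular pattern the same way JavaScript does.

```python
import re

# The alternation from the question (duplicate "street &" dropped).
ALTS = ("ramp|arterial|majorroad|street &|highway &|highway|street|street & highway|"
        "arterial & street|street & arterial|majorroad &|majorroad & ramp|"
        "ramp & majorroad|major road|highway & majorroad")

original = re.compile(r"^.*?(" + ALTS + r")\,", re.IGNORECASE)
fixed = re.compile("(" + ALTS + ")", re.IGNORECASE)

address = "Highway & Contrada Torremuzza, 95121 Catania CT"
print(original.sub("", address))  # unchanged: no keyword is followed directly by a comma
print(fixed.sub("", address))     # " Contrada Torremuzza, 95121 Catania CT"
```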
I think I just solved it myself with:
newAddress = myAddress.replace(/^.*?ramp|arterial|majorroad|street|highway| &|\,/gi, '');
It's shorter and more efficient, and at least it matches the word plus the &.
Cheers,
Luigi

Changing numbers within string in R?

I have a problem where I would appreciate any ideas on how to solve it with R. I have a LaTeX table stored as strings. The numbers in the table all have three digits after the decimal point, and I want to cut these digits off while leaving the rest of the table unchanged. (Think of the numbers as estimation results measured in dollars: a value of 145.553 does not make much sense, and 145 is enough.) The person who created these tables did not think too much about this, so here I go trying to avoid going through the table by hand. :)
So far I have only found solutions for extracting numbers from strings, not for changing them in place while leaving the rest of the string unaltered.
Example:
strings <- c(
"a.name & $-436.735 $ & $-710.832$ \\\\",
"std(a.name) & $(1403.604)$ & $(1274.283)$ \\\\"
)
The solution should return
strings <- c(
"a.name & $-436 $ & $-710$ \\\\",
"std(a.name) & $(1403)$ & $(1274)$ \\\\"
)
And, of course, if it were possible to round correctly, that would be even better, but it is not of utmost importance.
I tried using gsub with \\.... to find a period followed by three digits, but since each unescaped . matches any character, the pattern also matches inside the variable names (a.name in my example).
Does anyone have an idea how I could accomplish what I would like to do?
Thanks in advance!
This uses base R's gregexpr, regmatches, and regmatches<- to round any number with a decimal part.
It will work correctly even for numbers like .789 (i.e. with no digits before the decimal point) and -0.4 (which should round to a number without a minus sign). The one situation where it might perform less than ideally is a number written with a trailing decimal point and no digits after it, such as "10.", whose trailing point is left in place.
string <- c("a.name & $-436.735 $ & $-710.832$ \\\\",
"std(a.name) & $(1403.604)$ & $(1274.283)$ \\\\")
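f <- function(x) {
  # optional minus sign, optional integer part, decimal point, at least one digit
  pat <- "(-)?[[:digit:]]*\\.[[:digit:]]+"
  m <- gregexpr(pat, x)
  # replace every match in place by its rounded value; the rest of x is untouched
  regmatches(x, m) <- lapply(regmatches(x, m), function(X) round(as.numeric(X)))
  x
}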
f(string)
# [1] "a.name & $-437 $ & $-711$ \\\\"
# [2] "std(a.name) & $(1404)$ & $(1274)$ \\\\"
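For comparison, here is the same in-place idea in Python. `re.sub` accepts a function that receives each match and returns its replacement. This is a sketch rather than a translation of any answer here. The pattern follows the description above: an optional minus sign, an optional integer part, and at least one decimal digit. `round_numbers` is a hypothetical name. Note that Python's `round()` sends exact halves to the even neighbour, e.g. `round(2.5)` is 2.

```python
import re

# Optional minus sign, optional integer part, decimal point, at least one digit.
NUMBER = re.compile(r"-?\d*\.\d+")

def round_numbers(line: str) -> str:
    # Replace each decimal number by its rounded integer; everything else is untouched.
    return NUMBER.sub(lambda m: str(round(float(m.group()))), line)

strings = [
    r"a.name & $-436.735 $ & $-710.832$ \\",
    r"std(a.name) & $(1403.604)$ & $(1274.283)$ \\",
]
for s in strings:
    print(round_numbers(s))
```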
If you only need to cut the digits off (no rounding):
gsub(strings, pattern = "\\.[[:digit:]]{3}", replacement = "")
#[1] "a.name & $-436 $ & $-710$ \\\\" "std(a.name) & $(1403)$ & $(1274)$ \\\\"
To get the rounding, I'd do something along these lines, though the parentheses make it a little ugly:
sapply(
strsplit(
strings,
"\\$|\\$\\(|\\)\\$"
),
function(x)
paste(
x[1],'$',
ifelse(as.numeric(x[2]) <0, round(as.numeric(x[2]),0),paste0("(",round(as.numeric(x[2]),0),")")),'$',
x[3],'$',
ifelse(as.numeric(x[4]) <0, round(as.numeric(x[4]),0),paste0("(",round(as.numeric(x[4]),0),")")),'$',
x[5]
)
)
#[1] "a.name & $ -437 $ & $ -711 $ \\\\" "std(a.name) & $ (1404) $ & $ (1274) $ \\\\"

Is this regex's behaviour guaranteed to meet my specification by the regex definition?

I want to generate LaTeX table code from Unix cal output. It should look like this:
Mo & Tu & We & Th & Fr \\
& & 1 & 2 & 3 \\
6 & 7 & 8 & 9 & 10 \\
13 & 14 & 15 & 16 & 17 \\
20 & 21 & 22 & 23 & 24 \\
27 & 28 & & & \\
I've come up with the following solution:
cal | sed -e '1d; /^$/d; s/^\(...\)\?\(...\)\?\(...\)\?\(...\)\?\(...\)\?\(...\)\?.*/\2 \& \3 \& \4 \& \5 \& \6 \\\\/'
Works like a charm! But I'm not sure whether the result is well-defined. Wouldn't it also be valid behaviour, e.g., for the first group to match the empty string and for the second group to match the first three characters of a line (instead of characters 4-6)? And if not, is there some switch that would make such a variation valid (so I know how to avoid or control that behaviour)?
Well, if you can use awk:
cal | awk 'BEGIN { OFS = " & " }
NR == 1 || $0 ~ "^$" { next }
NR == 2 { for (i=1;i<NF;i++) { printf("%-2s%s",$i,OFS) }
printf("%s %s\n",$NF," \\\\")
next
}
{ for (i=1;i<NF;i++) { printf("% 2i%s",$i,OFS) }
printf("% 2i%s\n",$NF," \\\\")
}'
will do something very similar without much regex.
Anyway, as far as I can see you don't need those \? quantifiers, since the captured groups must always be present.
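My regex does fulfil the spec. The expression is matched greedily from the left, so if there is a possible match in which the first subexpression matches something, the engine takes that one. POSIX spells this out: consistent with the whole match being the longest of the leftmost matches, each subpattern, from left to right, shall match the longest possible string.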
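A quick Python experiment shows the greedy behaviour. Python's re is a backtracking engine rather than a POSIX one, but its greedy ? also tries to match a group before skipping it. Here too, the first group takes the first three characters.

The sample week line is hypothetical: 10 blanks, then the days 1 to 4, as in a month starting on a Wednesday. `to_latex_row` is a hypothetical helper. Like the sed command, it only captures a column when all three of its characters are present.

```python
import re

# Six optional 3-character groups followed by the rest, as in the sed command.
CAL_ROW = re.compile(r"^(...)?(...)?(...)?(...)?(...)?(...)?.*")

def to_latex_row(line: str) -> str:
    """Keep the Mo..Fr columns (groups 2-6) and join them as a LaTeX row."""
    m = CAL_ROW.match(line)
    # A group that did not participate in the match is None.
    cells = [(m.group(i) or "").strip() for i in range(2, 7)]
    return " & ".join(cells) + r" \\"

header = "Su Mo Tu We Th Fr Sa"
print(repr(CAL_ROW.match(header).group(1)))  # 'Su ' (not the empty string)
print(to_latex_row(header))                  # Mo & Tu & We & Th & Fr \\
print(to_latex_row(" " * 10 + "1  2  3  4"))  # empty Mo and Tu cells, then 1, 2, 3
```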