Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
So I have a list of product item descriptions. I have loaded this into R. Most of these descriptions are utter nonsense and we are trying to extract a decent item code from them.
Instead of going through it line by line, can I use a regular expression in R to create a new vector that will only have integer values from the list?
I have most of the code now
JJ <- read.csv2(file.choose(),header= TRUE)
JJ$X <- gsub(pattern = "[0-9]+", replacement = "",
x = JJ$LGY_DHB_ITEM_DESCRIPTION, ignore.case = TRUE)
But I am unsure what to put in the replacement argument.
You can try replacing all non-digit characters ([^[:digit:]], where ^ negates the class) with an empty string:
gsub("[^[:digit:]]*", "", 'PRIVATE CONTRACT INV 710456354')
[1] "710456354"
But this won't work if the string contains more than one number, because the digits of all of them get concatenated:
gsub("[^[:digit:]]*", "", 'PRIVATE 123 CONTRACT INV 710456354')
[1] "123710456354"
You could instead extract the longest run of digits in each string:
JJ <- data.frame(LGY_DHB_ITEM_DESCRIPTION=c('PRIVATE CONTRACT INV 710456354', 'PRIVATE 123 CONTRACT INV 710456354'))
m <- gregexpr("[0-9]*", JJ$LGY_DHB_ITEM_DESCRIPTION)
all_m <- regmatches(JJ$LGY_DHB_ITEM_DESCRIPTION, m)
JJ$X <- mapply(FUN =function(stri,idx) stri[idx],all_m, sapply(lapply(all_m,nchar),which.max))
JJ
            LGY_DHB_ITEM_DESCRIPTION         X
1     PRIVATE CONTRACT INV 710456354 710456354
2 PRIVATE 123 CONTRACT INV 710456354 710456354
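For comparison, the same "keep the longest run of digits" idea can be sketched in Python. This is only an illustration, assuming the item code is always the longest digit run in a description; the helper name longest_digit_run is made up for this example and is not part of the answer above.

```python
import re

def longest_digit_run(text):
    """Return the longest run of digits in text, or "" if there is none."""
    runs = re.findall(r"[0-9]+", text)
    # max() with key=len returns the first of the longest runs;
    # default="" covers descriptions without any digits
    return max(runs, key=len, default="")

descriptions = [
    "PRIVATE CONTRACT INV 710456354",
    "PRIVATE 123 CONTRACT INV 710456354",
]
codes = [longest_digit_run(d) for d in descriptions]
print(codes)
```

If two runs have the same length, this sketch keeps the first one; whether that is the right tie-break depends on the data.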
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Can an element be appended to a list in Scala? For example, given the list
var li = List (1, 2, 3, 4, 5)
can I append 6 to it, and if so, how?
And in the case of a tuple, for example,
var tu = (1, "Hi", 20000)
can I append the string "Hello" to it?
Both List and Tuple are immutable constructs, so to append or prepend an element you create a new List/Tuple.
val oldLst = List(1,2,3,4,5)
val newLst = oldLst :+ 6
Appending to a List is a linear operation, O(n). Prepending is much more efficient: it takes constant time, O(1).
val fromZero = 0 :: newLst
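The cost difference between appending and prepending can be illustrated with a toy singly linked list. The sketch below is written in Python purely as a model; Cons, prepend, append and to_pylist are hypothetical names, and this is not how Scala's List is actually implemented.

```python
from typing import NamedTuple, Optional

class Cons(NamedTuple):
    """One immutable list cell: a value plus a reference to the rest."""
    head: int
    tail: Optional["Cons"]

def prepend(x, lst):
    # O(1): the new cell points at the existing list, nothing is copied
    return Cons(x, lst)

def append(lst, x):
    # O(n): every existing cell has to be rebuilt to reach the end
    if lst is None:
        return Cons(x, None)
    return Cons(lst.head, append(lst.tail, x))

def to_pylist(lst):
    """Collect the cells into an ordinary Python list for display."""
    out = []
    while lst is not None:
        out.append(lst.head)
        lst = lst.tail
    return out

old = None
for v in (5, 4, 3, 2, 1):
    old = prepend(v, old)

new = append(old, 6)
from_zero = prepend(0, new)
print(to_pylist(from_zero))
```

Note that from_zero reuses new as its tail, while append had to copy all five cells of old.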
Scala 3 offers enhanced Tuple abilities not available in earlier Scala releases.
val threeTup = (1, "Hi", 20000)
val fourTup = threeTup ++ "Hello" *: EmptyTuple
Note: Experienced Scala practitioners generally avoid var and use val instead.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a text column with the following example data:
5,5,0.1;6,6,0.15;7,7,0.2;8,8,0.25;9,9,0.3;10,10,0.35;11,11,0.4;12,12,0.45;13,13,0.5;14,14,0.55;15,15,0.6;16,16,0.65;17,17,0.7;18,18,0.75;19,19,0.8;20,20,0.85;
I need to add a fixed value to each of the numeric values that come right before a semicolon (the third value in each group).
So, for example, from:
5,5,0.1;6,6,0.15; I want to add 0.15, so the result would be:
5,5,0.25;6,6,0.3;
I guess I should try something with regexp_replace, but I have no idea where to start.
The correct solution would be to fix your broken data model and not store multiple delimited values in a single column.
I wouldn't do this with a regex. Instead, unnest the elements of the string, add the value to the third element, then aggregate everything back into the broken design:
update badly_designed_table
   set denormalized_column =
       (select string_agg(concat_ws(',', a, b, round(c + 0.15, 2)), ';' order by idx)
        from (
          select split_part(val, ',', 1) as a,
                 split_part(val, ',', 2) as b,
                 split_part(val, ',', 3)::numeric as c,
                 idx
          -- split the column being updated into its ';'-separated elements
          from unnest(string_to_array(denormalized_column, ';')) with ordinality as x(val, idx)
          -- skip the "empty" element generated by the trailing ;
          where nullif(val, '') is not null
        ) t)
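The same split, add, and re-join steps can be sketched outside the database. The Python version below is only an illustration of the logic, not a replacement for the SQL. It assumes every non-empty element has exactly three comma-separated fields and that the trailing semicolon should be kept; add_to_third is a hypothetical helper name.

```python
from decimal import Decimal

def add_to_third(column_value, increment):
    """Add increment to the third field of every 'a,b,c' group separated by ';'."""
    # skip the empty piece produced by the trailing ';'
    groups = [g for g in column_value.split(";") if g]
    out = []
    for g in groups:
        a, b, c = g.split(",")
        # Decimal avoids binary floating-point artifacts such as 0.30000000000000004
        total = Decimal(c) + Decimal(increment)
        # normalize() drops trailing zeros; the "f" format avoids exponent notation
        out.append(f"{a},{b},{total.normalize():f}")
    # assumption: the original trailing ';' is kept
    return ";".join(out) + ";"

print(add_to_third("5,5,0.1;6,6,0.15;", "0.15"))
```

Passing the increment as a string keeps it exact; a float such as 0.15 would bring its binary rounding error into the Decimal.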
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Hi, I have a value which is
Blockquote
1 1:0.0644343 1 1:0.0309334 1 1:0.0261616
Blockquote
I want to split the value on spaces, but only after a certain character, to get a result like the following. Is there a possible solution? I know this can be done with a regex.
Blockquote
"1 1:0.0644343"
"1 1:0.0309334"
"1 1:0.0261616"
Blockquote
I think regex is the perfect tool here:
import Foundation

// Updated to Swift 5 syntax
var str = "Blockquote 1 1:0.0644343 1 1:0.0309334 1 1:0.0261616 Blockquote"
let regex = try! NSRegularExpression(pattern: "(\\d \\d:[\\.0-9]+)", options: [])
let matches = regex.matches(in: str, options: [], range: NSRange(str.startIndex..., in: str))
// Walk the matches from last to first so that earlier ranges stay valid
for m in matches.reversed() {
    guard let range = Range(m.range(at: 1), in: str) else { continue }
    // Wrap the matched text in double quotes
    str.replaceSubrange(range, with: "\"\(str[range])\"")
}
print(str)
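If the goal is to get the pieces as separate strings rather than to quote them in place, collecting the matches is enough. A minimal Python sketch of that variant, using the same pattern and assuming the input always has the form "digit space digit colon number":

```python
import re

text = "1 1:0.0644343 1 1:0.0309334 1 1:0.0261616"

# one digit, a space, one digit, a colon, then a run of digits and dots
pairs = re.findall(r"\d \d:[.0-9]+", text)

for p in pairs:
    print(f'"{p}"')
```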
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
In my R code, I have the following content of x as a result of lda prediction output.
[1] lamb
Levels: lamb cow chicken
I would like to capture the word "lamb" in the first line and not the second line.
I tried the following regular expression, which did not work:
if (regmatches(x,regexec(".*?([a-z]+)",x))[[1]][2]=="lamb"){
cat("It is a lamb")
}
Instead, I got the following error:
Error in regexec(".*?([a-z]+)", x) : invalid 'text' argument
Can anyone help?
Thanks in advance.
mf
Direct Answer:
It is a variable type error. See ?predict.lda to learn why: the object returned by predict() for an object of class lda is a list. You just want the first element of that list, class, which is a factor (stored internally as type integer). Factors in R keep a character label for each level in their levels attribute, which can be accessed with levels() (see ?factor as well). What you want, however, is the label of each element as text, which you get with as.character(). By the way, the second line is not checked by the regex; it is just the standard console output of a factor, see ?print.factor.
Here's an example, based on the predict.lda() help page:
library(MASS)  # provides lda() and predict.lda()
tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
x_lda <- predict(z, test)
# x_lda is a list
typeof(x_lda)
# The first element of the list, called "class", is a factor of type integer.
typeof(x_lda$class)
# Now we create a character vector from the factor:
as.character(x_lda$class)
With an explicit character object, your code works for me:
x <- "lamb"
regmatches(x,regexec(".*?([a-z]+)",x))[[1]][2]=="lamb"
[1] TRUE
So you need to coerce your object to character, and then use it as the "text" argument for the regexec function.
Actual Answer:
There are better ways to do this.
You nest and chain a lot of functions in one line. This is barely readable and makes debugging hard.
If you know that the output will always consist of certain elements (especially, since you know the input of your lda prediction and therefore know the different factor levels beforehand), you can simply check them by == and maybe any() (continuing with the example from before):
levels(cl)
[1] "c" "s" "v"
any(as.character(x_lda$class)=="c")
[1] TRUE
See the help file for ?any, if you don't know what it does.
Finally, if you just want to print "It is a lamb" in the end, and your output will always just have one element, you can simply use paste():
paste("It is a", as.character(x))
[1] "It is a lamb"
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
This is linked to my previous question, regex to add hyphen in dates.
I would now also like to remove the seconds and milliseconds (or set them to zero), again using gsub,
i.e. something like:
x <- c("20130603 00:00:03.102","20130703 00:01:03.103","20130804 00:03:03.104")
y <- gsub([REGEX PATTERN TO MATCH],[REPLACEMENT PATTERN TO INSERT HYPHEN and REMOVE SECONDS] ,x)
> y
[1] "2013-06-03 00:00:00" "2013-07-03 00:01:00" "2013-08-04 00:03:00"
You can use strptime to parse your objects into POSIXlt objects which, when printed, are exactly in the format you expect:
y <- strptime(x, "%Y%m%d %H:%M:%S")
# [1] "2013-06-03 00:00:03" "2013-07-03 00:01:03" "2013-08-04 00:03:03"
To remove seconds, use trunc:
y <- trunc(y, units = "mins")
# [1] "2013-06-03 00:00:00" "2013-07-03 00:01:00" "2013-08-04 00:03:00"
Having your objects as date/time objects will open a lot of doors, but if you really mean to store the output as a character vector, then just use as.character:
y <- as.character(y)
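The parse, truncate, and format-back sequence is not specific to R. A rough Python equivalent is sketched below for comparison; it assumes every timestamp carries a fractional-seconds part, and to_minute is a hypothetical helper name.

```python
from datetime import datetime

x = ["20130603 00:00:03.102", "20130703 00:01:03.103", "20130804 00:03:03.104"]

def to_minute(stamp):
    """Parse a 'YYYYMMDD HH:MM:SS.fff' stamp and zero out everything below the minute."""
    parsed = datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")
    truncated = parsed.replace(second=0, microsecond=0)
    return truncated.strftime("%Y-%m-%d %H:%M:%S")

y = [to_minute(s) for s in x]
print(y)
```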
A lubridate version:
library(lubridate)
dt <- ymd_hms(x)
dt2 <- update(dt, seconds = 0)
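You can try this regex, which extends the one from your previous question. It captures the year, month, day and hour:minute, drops everything after the minutes, and appends :00 as the seconds:
gsub("(\\d{4})(\\d{2})(\\d{2}) (\\d{2}:\\d{2}).*", "\\1-\\2-\\3 \\4:00", x, perl=TRUE)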
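The capture-group substitution can be checked quickly in Python as well. The sketch below uses the same pattern with Python's re module; it is an illustration only and assumes the input always starts with eight date digits followed by a space and HH:MM.

```python
import re

x = ["20130603 00:00:03.102", "20130703 00:01:03.103", "20130804 00:03:03.104"]

# groups: year, month, day, hour:minute; the trailing .* swallows seconds and milliseconds
pattern = re.compile(r"(\d{4})(\d{2})(\d{2}) (\d{2}:\d{2}).*")

y = [pattern.sub(r"\1-\2-\3 \4:00", s) for s in x]
print(y)
```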