regex remove seconds and milliseconds [closed] - regex

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
This is linked to my previous question, regex to add hyphen in dates.
I would now also like to remove the seconds and milliseconds (or change them to zero), again using gsub.
Something like:
x <- c("20130603 00:00:03.102","20130703 00:01:03.103","20130804 00:03:03.104")
y <- gsub([REGEX PATTERN TO MATCH],[REPLACEMENT PATTERN TO INSERT HYPHEN and REMOVE SECONDS] ,x)
> y
[1] "2013-06-03 00:00:00" "2013-07-03 00:01:00" "2013-08-04 00:03:00"

You can use strptime to parse your objects into POSIXlt objects which, when printed, are exactly in the format you expect:
y <- strptime(x, "%Y%m%d %H:%M:%S")
# [1] "2013-06-03 00:00:03" "2013-07-03 00:01:03" "2013-08-04 00:03:03"
To remove seconds, use trunc:
y <- trunc(y, units = "mins")
# [1] "2013-06-03 00:00:00" "2013-07-03 00:01:00" "2013-08-04 00:03:00"
Keeping your values as date/time objects opens a lot of doors (date arithmetic, comparisons, reformatting), but if you really need to store the output as a character vector, just use as.character:
y <- as.character(y)
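For comparison, here is the same parse-then-truncate idea sketched with Python's standard datetime module. It is not part of the R answer; it assumes every timestamp carries a fractional-seconds part like the sample x, and drop_seconds is a hypothetical helper name.

```python
from datetime import datetime

x = ["20130603 00:00:03.102", "20130703 00:01:03.103", "20130804 00:03:03.104"]

def drop_seconds(stamp: str) -> str:
    # %f consumes the fractional part (1 to 6 digits), so nothing is left unparsed
    dt = datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")
    # Truncate to the minute, analogous to trunc(y, units = "mins") in R
    dt = dt.replace(second=0, microsecond=0)
    return dt.strftime("%Y-%m-%d %H:%M:%S")

y = [drop_seconds(s) for s in x]
print(y)  # ['2013-06-03 00:00:00', '2013-07-03 00:01:00', '2013-08-04 00:03:00']
```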

A lubridate version:
library(lubridate)
dt <- ymd_hms(x)
dt2 <- update(dt, seconds = 0)

You can try this regex, which I extended a bit:
gsub("(\\d{4})(\\d{2})(\\d{2}) (\\d{2}:\\d{2}).*", "\\1-\\2-\\3 \\4:00", x, perl=TRUE)
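A sketch of the same capture-and-rebuild pattern with Python's re.sub, assuming every input starts with an 8-digit date followed by a space and HH:MM. The backreferences \1 to \4 play the same role as \\1 to \\4 in the R replacement string.

```python
import re

x = ["20130603 00:00:03.102", "20130703 00:01:03.103", "20130804 00:03:03.104"]

# Capture year, month, day and HH:MM; ".*" swallows the seconds and milliseconds
pattern = r"(\d{4})(\d{2})(\d{2}) (\d{2}:\d{2}).*"
y = [re.sub(pattern, r"\1-\2-\3 \4:00", s) for s in x]
print(y)  # ['2013-06-03 00:00:00', '2013-07-03 00:01:00', '2013-08-04 00:03:00']
```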

Related

Insert a quote in a string [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 4 months ago.
I'm looking to insert quotes into a string while keeping everything else unchanged. An example string:
' "2020-10-10",8000,"Hello" '
I want to put quotes around 8000 (or whatever number is there), so:
' "2020-10-10","8000","Hello" '
How would I do that in regex?
I'm not a regex expert, but you can do it in two passes, because I couldn't figure out a way to match ",char" or "char," directly.
function test() {
try {
let a = ' "2020-10-10",8000,"Hello" ';
a = a.replace(/,/g,'","');
a = a.replace(/""/g,'"');
console.log(a);
}
catch(err) {
console.log(err);
}
}
7:26:23 AM Notice Execution started
7:26:23 AM Info "2020-10-10","8000","Hello"
7:26:23 AM Notice Execution completed
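A single pass is possible with lookarounds, which check the neighbouring commas without consuming them. Below is a sketch using Python's re module rather than the Apps Script environment above; it assumes the field to quote is a run of plain digits with a comma on both sides.

```python
import re

a = ' "2020-10-10",8000,"Hello" '

# (?<=,) and (?=,) require a comma before and after the digits
# but leave the commas themselves untouched.
b = re.sub(r'(?<=,)(\d+)(?=,)', r'"\1"', a)
print(repr(b))  # ' "2020-10-10","8000","Hello" '
```

A number in the first or last field has a comma on only one side and is left alone; the lookarounds would need widening if that case matters.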

Update series of numeric values in long string [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
I have a text column with the following example data:
5,5,0.1;6,6,0.15;7,7,0.2;8,8,0.25;9,9,0.3;10,10,0.35;11,11,0.4;12,12,0.45;13,13,0.5;14,14,0.55;15,15,0.6;16,16,0.65;17,17,0.7;18,18,0.75;19,19,0.8;20,20,0.85;
I need to add a fixed value to each numeric value that comes right before a semicolon.
For example, from:
5,5,0.1;6,6,0.15; if I add 0.15, the result would be:
5,5,0.25;6,6,0.3;
I guess I should try something with regexp_replace, but I have no idea how to start.
The correct solution would be to fix your broken data model and not store multiple delimited values in a single column.
I wouldn't do this with a regex; instead, unnest the elements of the string, add the value to the third element, then aggregate everything back into the broken design:
update badly_designed_table
set denormalized_column =
(select string_agg(concat_ws(',', a, b, round(c + 0.15,2)), ';' order by idx)
from (
select split_part(val, ',', 1) as a,
split_part(val, ',', 2) as b,
split_part(val, ',', 3)::numeric as c,
idx
from unnest(string_to_array(denormalized_column, ';')) with ordinality as x(val,idx)
-- skip the "empty" element generated by the trailing ;
where nullif(val, '') is not null
) t)
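To check the arithmetic outside the database, the same split / add / re-join steps can be sketched in Python. add_to_third_field is a hypothetical helper; Decimal is used so the values are added exactly rather than in binary floating point, and the sketch assumes each group has exactly three fields and ends with ';' as in the sample.

```python
from decimal import Decimal

def add_to_third_field(text: str, delta: str) -> str:
    """Add `delta` to the third comma-separated value of each ';'-terminated group."""
    groups = []
    for group in text.split(";"):
        if not group:  # skip the empty element produced by the trailing ';'
            continue
        a, b, c = group.split(",")
        # normalize() drops trailing zeros (0.30 -> 0.3); 'f' avoids exponent notation
        new_c = (Decimal(c) + Decimal(delta)).normalize()
        groups.append(f"{a},{b},{format(new_c, 'f')}")
    # Assumes every group was ';'-terminated, so re-attach one ';' per group
    return "".join(g + ";" for g in groups)

print(add_to_third_field("5,5,0.1;6,6,0.15;", "0.15"))  # 5,5,0.25;6,6,0.3;
```

Unlike the SQL above, where round(..., 2) keeps two decimals (0.30), normalizing here matches the 0.3 shown in the question.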

R: Find product codes using Regular Expressions [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
So I have a list of product item descriptions. I have loaded this into R. Most of these descriptions are utter nonsense and we are trying to extract a decent item code from them.
Instead of going through it line by line, can I use a regular expression in R to create a new vector containing only the integer values from the list?
I have most of the code now
JJ <- read.csv2(file.choose(),header= TRUE)
JJ$X <- gsub(pattern = "[0-9]+", replacement = "",
x = JJ$LGY_DHB_ITEM_DESCRIPTION, ignore.case = TRUE)
But I am unsure what to put in the replacement argument.
You can try replacing non-numeric characters (^ negates the [:digit:] class) with an empty string:
gsub("[^[:digit:]]*", "", 'PRIVATE CONTRACT INV 710456354')
[1] "710456354"
But this won't work if your string contains more than one number:
gsub("[^[:digit:]]*", "", 'PRIVATE 123 CONTRACT INV 710456354')
[1] "123710456354"
You could instead pick the longest run of digits in each string:
JJ <- data.frame(LGY_DHB_ITEM_DESCRIPTION=c('PRIVATE CONTRACT INV 710456354', 'PRIVATE 123 CONTRACT INV 710456354'))
m <- gregexpr("[0-9]*", JJ$LGY_DHB_ITEM_DESCRIPTION)
all_m <- regmatches(JJ$LGY_DHB_ITEM_DESCRIPTION, m)
JJ$X <- mapply(FUN =function(stri,idx) stri[idx],all_m, sapply(lapply(all_m,nchar),which.max))
JJ
LGY_DHB_ITEM_DESCRIPTION X
1 PRIVATE CONTRACT INV 710456354 710456354
2 PRIVATE 123 CONTRACT INV 710456354 710456354
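The same "longest run of digits" idea, sketched in Python for comparison. longest_number is a hypothetical helper; a description with no digits yields an empty string.

```python
import re

def longest_number(description: str) -> str:
    """Return the longest run of digits in `description`, or "" if there is none."""
    runs = re.findall(r"\d+", description)
    # max() keeps the first of several equally long runs
    return max(runs, key=len) if runs else ""

descriptions = ["PRIVATE CONTRACT INV 710456354",
                "PRIVATE 123 CONTRACT INV 710456354"]
print([longest_number(d) for d in descriptions])  # ['710456354', '710456354']
```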

R : regular expression to match pattern in only the first line [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Want to improve this question? Add details and clarify the problem by editing this post.
Closed 8 years ago.
In my R code, x holds the following output from an lda prediction:
[1] lamb
Levels: lamb cow chicken
I would like to capture the word "lamb" in the first line and not the second line.
I tried the following regular expression, which did not work:
if (regmatches(x,regexec(".*?([a-z]+)",x))[[1]][2]=="lamb"){
cat("It is a lamb")
}
Instead, I got the following error:
Error in regexec(".*?([a-z]+)", x) : invalid 'text' argument
Can anyone help?
Thanks in advance.
mf
Direct Answer:
It is a variable type error. See ?predict.lda to learn why: calling predict() on an object of class lda returns a list. You just want the first element of that list, class, which is a factor (an object of type integer). A factor stores its character labels in its levels component, which you can access with levels() (read ?factor as well). What you want instead is the explicit value your factor displays, which you can get with as.character(). By the way, the second line is never checked by the regex: it is just the standard console output of a factor (see ?print.factor).
Here's an example, based on the predict.lda() help page:
library(MASS)  # provides lda() and its predict() method
tr <- sample(1:50, 25)
train <- rbind(iris3[tr,,1], iris3[tr,,2], iris3[tr,,3])
test <- rbind(iris3[-tr,,1], iris3[-tr,,2], iris3[-tr,,3])
cl <- factor(c(rep("s",25), rep("c",25), rep("v",25)))
z <- lda(train, cl)
x_lda <- predict(z, test)
# x_lda is a list
typeof(x_lda)
# The first element of the list, called "class", is a factor of type integer.
typeof(x_lda$class)
# Now we create a character vector from the factor:
as.character(x_lda$class)
With an explicit character object, your code works for me:
x <- "lamb"
regmatches(x,regexec(".*?([a-z]+)",x))[[1]][2]=="lamb"
[1] TRUE
So you need to coerce your object to character, and then use it as the "text" argument for the regexec function.
Actual Answer:
There are better ways to do this.
You nest and chain a lot of functions in one line. This is barely readable and makes debugging hard.
If you know that the output will always consist of certain elements (especially since you know the input to your lda prediction and therefore the factor levels beforehand), you can simply check them with == and, if needed, any() (continuing with the example from before):
levels(cl)
[1] "c" "s" "v"
any(as.character(x_lda$class)=="c")
[1] TRUE
See the help file ?any if you don't know what it does.
Finally, if you just want to print "It is a lamb" in the end, and your output will always just have one element, you can simply use paste():
paste("It is a", as.character(x))
[1] "It is a lamb"

Different spellings of Chanukah Regex [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 2 years ago.
Hannuka, Chanukah, Hanukkah... Due to transliteration from another language and character set, there are many ways to spell the name of this holiday. How many legitimate spellings can you come up with?
Now, write a regular expression that will recognise all of them.
According to http://www.holidays.net/chanukah/spelling.htm, it can be spelled any of the following ways:
Chanuka
Chanukah
Chanukkah
Channukah
Hanukah
Hannukah
Hanukkah
Hanuka
Hanukka
Hanaka
Haneka
Hanika
Khanukkah
Here is my regex that matches all of them:
/(Ch|H|Kh)ann?[aeiu]kk?ah?/
Edit: Or this, without branches:
/[CHK]h?ann?[aeiu]kk?ah?/
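One way to check that the branching pattern really covers the list above is to test each spelling with Python's re.fullmatch, which requires the whole word to match. This is a sketch: the /.../ delimiters are dropped because Python takes the pattern as a plain string, and the pattern stays case-sensitive as written.

```python
import re

# The branching pattern from this answer; fullmatch anchors it to the whole word
HANUKKAH = re.compile(r"(Ch|H|Kh)ann?[aeiu]kk?ah?")

spellings = ["Chanuka", "Chanukah", "Chanukkah", "Channukah", "Hanukah",
             "Hannukah", "Hanukkah", "Hanuka", "Hanukka", "Hanaka",
             "Haneka", "Hanika", "Khanukkah"]

unmatched = [s for s in spellings if not HANUKKAH.fullmatch(s)]
print(unmatched)  # []
```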
Call me a sucker for readability.
In Python:
import re

def find_hanukkah(s):
    spellings = ['hannukah', 'channukah', 'hanukkah']  # etc...
    # re.I makes the alternation case-insensitive
    for m in re.finditer('|'.join(spellings), s, re.I):
        print(m.group())

find_hanukkah("Hannukah Channukah, Hanukkah")
Something like C?hann?uk?kah? matches most of the common cases. There are also a bunch of weirder spellings; C?hann?uk?kah?|Han[aei]ka|Khanukkah matches almost every spelling I could think of (that had at least half a million hits on Google).
((Ch|H|X|Х|Kh|J)[aа](н|n{1,2})(у|ou|[auei])(к|k|q){1,2}[aа]h?)|(חנו?כה)
This regex is much more inclusive and covers all of the following options:
Channuka
Channukah
Channukka
Channukkah
Chanuka
Chanukah
Chanukah
Chanukka
Chanukkah
Chanuqa
Hanaka
Haneka
Hanika
Hannuka
Hannukah
Hannukka
Hannukkah
Hanoukka
Hanuka
Hanukah
Hanukka
Hanukkah
Januka
Khanukkah
Xanuka
Ханука
Ханука
חנוכה
חנכה
Try this:
/^[ck]?hann?ukk?ah?$/i
I think the only approved spellings in English are Hanukkah and Chanukah, so it's something like
/(Ch|H)anuk?kah/
Or maybe even better:
/(Chanukah|Hanukkah)/
I like Triptych's answer, but I would take it one step further, also in Python:
import re

def valid(spelling):
    regex_spelling = re.compile(r'^[cCkK]{0,1}han{1,2}uk{1,2}ah$')
    # match() returns None when the spelling does not fit the pattern
    if regex_spelling.match(spelling):
        print('Valid spelling')
    else:
        print(spelling, "is not a spelling for the word")

To use it:
valid("hanukkah")