Converting numbers into time in R - regex

My data looks like this:
> str(m)
int [1:8407] 930 1050 1225 1415 1620 1840 820 1020 1215 1410 ...
This is the time in hours and minutes. I'm trying to turn it into something like 9:30, 12:10, 16:40, 8:25, and so on.
> m1 <- strptime(m, "%H%M")
> head(m1)
[1] NA "2015-10-14 10:50:00 VLAT"
[3] "2015-10-14 12:25:00 VLAT" "2015-10-14 14:15:00 VLAT"
[5] "2015-10-14 16:20:00 VLAT" "2015-10-14 18:40:00 VLAT"
> str(m1)
POSIXlt[1:8407], format: NA "2015-10-14 10:50:00" "2015-10-14 12:25:00" ...
How can I convert these digits into times?

Using regex:
sub("(\\d{2})$", ":\\1", x)
#[1] "9:30" "10:50" "12:25" "14:15" "16:20" "18:40" "8:20"
#[8] "10:20" "12:15" "14:10"
This matches the last two digits and inserts a colon before them.
Data
x <- c(930, 1050, 1225, 1415, 1620, 1840, 820, 1020, 1215, 1410)

We pad 3-digit numbers with a leading 0 using sprintf, parse the result with strptime, and then use format to extract the hours and minutes.
format(strptime(sprintf('%04d', v1), format='%H%M'), '%H:%M')
#[1] "09:30" "10:50" "12:25"
Or another option is
sub('(\\d{2})$', ':\\1', v1)
#[1] "9:30" "10:50" "12:25"
data
v1 <- c(930, 1050,1225)

Another way is,
x <- c(645, 1234,2130)
substr(as.POSIXct(sprintf("%04.0f", x), format='%H%M'), 12, 16)
#[1] "06:45" "12:34" "21:30"
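If the colon-separated string is all that is needed, plain integer arithmetic is an alternative to both the regex and the strptime routes. The sketch below is not taken from the answers above: to_hhmm is a hypothetical helper name, and it assumes the inputs are whole numbers in HHMM form (0 to 2359).

```r
# Hypothetical helper: split an HHMM number into hours and minutes arithmetically.
to_hhmm <- function(x) {
  hours <- x %/% 100    # integer division drops the last two digits
  minutes <- x %% 100   # the remainder keeps the last two digits
  sprintf("%02d:%02d", hours, minutes)
}

res <- to_hhmm(c(930, 1050, 1225, 5))
res
#[1] "09:30" "10:50" "12:25" "00:05"
```

Because nothing is parsed as a date-time, short inputs such as 930 or 5 need no padding first, and no time zone or current date is attached to the result.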

Related

Extracting position of pattern in a string using ifelse in R

I have a set of strings x for example:
[1] "0000000000000000000000000000000000000Y" "9000000000D00000000000000000000Y"
[3] "0000000000000D00000000000000000000X" "000000000000000000D00000000000000000000Y"
[5] "000000000000000000D00000000000000000000Y" "000000000000000000D00000000000000000000Y"
[7] "000000000000000000000000D0000000011011D1X"
I want to extract the last position of a particular character like 1. I am running this code:
ifelse(grepl("1",x),rev(gregexpr("1",x)[[1]])[1],50)
But this is returning -1 for all elements. How do I correct this?
We can use stri_locate_last from stringi. If there are no matches, it will return NA.
library(stringi)
r1 <- stri_locate_last(v1, fixed=1)[,1]
r1
#[1] NA NA NA NA NA NA 40
nchar(v1)
#[1] 38 32 35 40 40 40 41
If we need to replace the NA values with the number of characters:
ifelse(is.na(r1), nchar(v1), r1)
data
v1 <- c("0000000000000000000000000000000000000Y",
"9000000000D00000000000000000000Y",
"0000000000000D00000000000000000000X",
"000000000000000000D00000000000000000000Y",
"000000000000000000D00000000000000000000Y",
"000000000000000000D00000000000000000000Y",
"000000000000000000000000D0000000011011D1X")
In base R, the following returns the position of the last matched "1".
# Make some toy data
toydata <- c("001", "007", "00101111Y", "000AAAYY")
# Find last position
last_pos <- sapply(gregexpr("1", toydata), function(m) m[length(m)])
print(last_pos)
#[1] 3 -1 8 -1
It returns -1 whenever the pattern is not matched.
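The problem in the question's code is gregexpr("1", x)[[1]]: the [[1]] keeps only the matches for the first string, so a single value is recycled across the whole vector instead of one position per string. A minimal sketch of a vectorized version, combining the sapply idea above with the question's fallback value of 50 (the short vector here is made up for illustration):

```r
x <- c("0000Y", "0101X", "1D1")

# Last match position per string; -1 where "1" does not occur
last_one <- sapply(gregexpr("1", x, fixed = TRUE), function(m) m[length(m)])

# Substitute the fallback value for strings without a match
result <- ifelse(last_one == -1L, 50L, last_one)
result
#[1] 50  4  3
```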

R + converting an integer to a hh:mm format using regex + gsub

interval is a subset of 5 minute intervals for a 25 hour period
> interval
[1] 45 50 55 100 105 110 115 120 125 130 135 2035 2040 2045 2050 2055 2100 2105 2110 2115 2120 2125
I want to insert a : so that the values are in a format that I can convert to a time.
> gsub('^([0-9]{1,2})([0-9]{2})$', '\\1:\\2', interval)
[1] "45" "50" "55" "1:00" "1:05" "1:10" "1:15" "1:20" "1:25" "1:30" "1:35" "20:35" "20:40" "20:45"
[15] "20:50" "20:55" "21:00" "21:05" "21:10" "21:15" "21:20" "21:25"
I have got it working for nearly all my examples.
How do I get it to work on the numbers "5" ... "45" "50" "55" as well?
Found this duplicate here but this does not use gsub
An easy way to do this would be to make sure all the inputs have at least 4 characters:
gsub('^([0-9]{1,2})([0-9]{2})$', '\\1:\\2', sprintf('%04d',interval))
# "00:45" "00:50" "00:55" "01:00" "01:05" "01:10" "01:15" "01:20" "01:25"
# "01:30" "01:35" "20:35" "20:40" "20:45" "20:50" "20:55" "21:00" "21:05"
# "21:10" "21:15" "21:20" "21:25"
Using sub:
> sub('..\\K', ':', sprintf('%04d',interval), perl=TRUE)
# [1] "00:45" "00:50" "00:55" "01:00" "01:05" "01:10" "01:15" "01:20" "01:25"
# [10] "01:30" "01:35" "20:35" "20:40" "20:45" "20:50" "20:55" "21:00" "21:05"
# [19] "21:10" "21:15" "21:20" "21:25"
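The padding step matters because the pattern needs at least three digits to match; "45" has only two, so gsub leaves it unchanged. Once every value is padded to four digits, the hh:mm strings have a fixed layout and are easy to process further. A small sketch, assuming a shortened interval vector and a made-up minutes-since-midnight conversion as the follow-up step:

```r
interval <- c(5L, 45L, 100L, 2035L)

# Pad to four digits, then split into two groups of two digits
padded <- sprintf("%04d", interval)
hhmm <- sub("^(\\d{2})(\\d{2})$", "\\1:\\2", padded)
hhmm
#[1] "00:05" "00:45" "01:00" "20:35"

# With a fixed "hh:mm" layout, the parts can be read by position
minutes <- as.integer(substr(hhmm, 1, 2)) * 60L + as.integer(substr(hhmm, 4, 5))
minutes
#[1]    5   45   60 1235
```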

Regex to extract number after a certain string

How can I in R extract the number that always comes after the string -{any single letter}, e.g. from the vector:
c("JFSDLKJ-H465", "FJSLKJHSD-Y5FSDLKJ", "DFSJLKJAAA-Z3216FJJ")
one should get:
(465, 5, 3216).
The -{any single letter} pattern occurs only once.
You could use gsub, e.g.:
x <- c("JFSDLKJ-H465", "FJSLKJHSD-Y5FSDLKJ", "DFSJLKJAAA-Z3216FJJ")
as.numeric(gsub("^.*-[A-Z]+([0-9]+).*$", "\\1", x))
# [1] 465 5 3216
Using str_match_all from stringr:
library(stringr)
v <- c("JFSDLKJ-H465", "FJSLKJHSD-Y5FSDLKJ", "DFSJLKJAAA-Z3216FJJ")
as.numeric(sapply(str_match_all(v, "\\-[a-zA-Z]([0-9]+)"),"[")[2,])
## [1] 465 5 3216
Or, if the strings contain no other digits, strip the uppercase letters and the hyphen:
> x <- c("JFSDLKJ-H465", "FJSLKJHSD-Y5FSDLKJ", "DFSJLKJAAA-Z3216FJJ")
> as.numeric(gsub("[A-Z]|-", "", x))
## [1] 465 5 3216
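Another base R option, sketched here as an alternative rather than taken from the answers above, is a Perl-style lookbehind with regexpr and regmatches. It returns only the digits that directly follow the hyphen and a single letter, so it still works when other digits appear elsewhere in the string (the vector y is a made-up example of that case). Note that regmatches silently drops elements that have no match.

```r
x <- c("JFSDLKJ-H465", "FJSLKJHSD-Y5FSDLKJ", "DFSJLKJAAA-Z3216FJJ")

# Digits preceded by a hyphen and exactly one letter
pattern <- "(?<=-[A-Za-z])[0-9]+"
nums <- as.numeric(regmatches(x, regexpr(pattern, x, perl = TRUE)))
nums
# 465 5 3216

# Made-up string with extra digits before and after the target
y <- "AB1-C22D3"
stripped <- gsub("[A-Z]|-", "", y)                           # "1223"
targeted <- regmatches(y, regexpr(pattern, y, perl = TRUE))  # "22"
```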

Replace the first N dots of a string revisited

In January I asked how to replace the first N dots of a string: replace the first N dots of a string
DWin's answer was very helpful. Can it be generalized?
df.1 <- read.table(text = '
my.string other.stuff
1111111111111111 120
..............11 220
11.............. 320
1............... 320
.......1........ 420
................ 820
11111111111111.1 120
', header = TRUE)
nn <- 14
# this works:
df.1$my.string <- sub("^\\.{14}", paste(as.character(rep(0, nn)), collapse = ""),
df.1$my.string)
# this does not work:
df.1$my.string <- sub("^\\.{nn}", paste(as.character(rep(0, nn)), collapse = ""),
df.1$my.string)
You can build the pattern with sprintf to get the desired output:
nn <- 3
sub(sprintf("^\\.{%s}", nn),
paste(rep(0, nn), collapse = ""), df.1$my.string)
## [1] "1111111111111111" "000...........11" "11.............."
## [4] "1..............." "000....1........" "000............."
## [7] "11111111111111.1"
Alternatively, build the pattern by repeating an escaped dot nn times and anchoring it at the start of the string:
nn <- 14
pattstr <- paste0("^", paste0(rep("\\.", nn), collapse = ""))
pattstr
#[1] "^\\.\\.\\.\\.\\.\\.\\.\\.\\.\\.\\.\\.\\.\\."
df.1$my.string <- sub(pattstr,
paste0(rep("0", nn), collapse = ""),
df.1$my.string)
> df.1
my.string other.stuff
1 1111111111111111 120
2 0000000000000011 220
3 11.............. 320
4 1............... 320
5 .......1........ 420
6 00000000000000.. 820
7 11111111111111.1 120
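The "{nn}" version in the question fails because the regex engine sees the literal characters nn inside the braces; R does not substitute variables into string literals, so the number has to be pasted or formatted into the pattern. The sprintf approach can be wrapped in a small function. This is only a sketch: replace_leading_dots and its fill argument are hypothetical names, and strrep requires R 3.3.0 or later.

```r
# Hypothetical helper: replace exactly n leading dots with n fill characters.
replace_leading_dots <- function(x, n, fill = "0") {
  pattern <- sprintf("^\\.{%d}", n)   # e.g. "^\\.{3}" for n = 3
  sub(pattern, strrep(fill, n), x)
}

out <- replace_leading_dots(c("....11", "1.....", ".1...."), 3)
out
#[1] "000.11" "1....." ".1...."
```

Strings with fewer than n leading dots are left unchanged, because the anchored pattern does not match them.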

Find the location of a character in string

I would like to find the location of a character in a string.
Say: string = "the2quickbrownfoxeswere2tired"
I would like the function to return 4 and 24 -- the character location of the 2s in string.
You can use gregexpr
gregexpr(pattern ='2',"the2quickbrownfoxeswere2tired")
[[1]]
[1] 4 24
attr(,"match.length")
[1] 1 1
attr(,"useBytes")
[1] TRUE
or perhaps str_locate_all from package stringr, which is a wrapper for stringi::stri_locate_all (as of stringr version 1.0)
library(stringr)
str_locate_all(pattern ='2', "the2quickbrownfoxeswere2tired")
[[1]]
start end
[1,] 4 4
[2,] 24 24
Note that you could simply use stringi directly:
library(stringi)
stri_locate_all("the2quickbrownfoxeswere2tired", fixed = '2')
Another option in base R, given a character vector x, would be something like
lapply(strsplit(x, ''), function(x) which(x == '2'))
Here's another straightforward alternative.
> which(strsplit(string, "")[[1]]=="2")
[1] 4 24
You can make the output just 4 and 24 using unlist:
unlist(gregexpr(pattern ='2',"the2quickbrownfoxeswere2tired"))
[1] 4 24
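One caveat with gregexpr: the pattern is treated as a regular expression by default, so characters such as . or ( must be escaped or searched for with fixed = TRUE to be matched literally. A short sketch with a made-up string:

```r
s <- "a.b.c"

# As a regex, "." matches every character
regex_pos <- unlist(gregexpr(".", s))
regex_pos
#[1] 1 2 3 4 5

# With fixed = TRUE, only the literal dots are located
literal_pos <- unlist(gregexpr(".", s, fixed = TRUE))
literal_pos
#[1] 2 4
```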
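The following function finds the position of the nth occurrence of str2 in str1 (same parameter order as Oracle SQL INSTR) and returns 0 if it is not found:
instr <- function(str1, str2, startpos = 1, n = 1) {
# Positions of all literal (non-regex) matches of str2 from startpos onward
pos <- gregexpr(str2, substring(str1, startpos), fixed = TRUE)[[1]]
# No match at all (-1), or fewer than n matches: not found
if (pos[1] == -1 || length(pos) < n) return(0)
# Shift back to a position in the full string
pos[n] + startpos - 1
}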
instr('xxabcdefabdddfabx','ab')
[1] 3
instr('xxabcdefabdddfabx','ab',1,3)
[1] 15
instr('xxabcdefabdddfabx','xx',2,1)
[1] 0
To only find the first locations, use lapply() with min():
my_string <- c("test1", "test1test1", "test1test1test1")
unlist(lapply(gregexpr(pattern = '1', my_string), min))
#> [1] 5 5 5
# or the readable tidyverse form (the %>% pipe is provided by magrittr)
library(magrittr)
my_string %>%
gregexpr(pattern = '1') %>%
lapply(min) %>%
unlist()
#> [1] 5 5 5
To only find the last locations, use lapply() with max():
unlist(lapply(gregexpr(pattern = '1', my_string), max))
#> [1] 5 10 15
# or the readable tidyverse form
my_string %>%
gregexpr(pattern = '1') %>%
lapply(max) %>%
unlist()
#> [1] 5 10 15
You could use grep as well:
grep('2', strsplit(string, '')[[1]])
#[1] 4 24