I need to modify output from another PowerShell script - if-statement

I have the following output from another PowerShell script and need it modified slightly:
USabc:                                                 ## only 1 row like this - fixed - 1st group
Pipe Status                                            ## only 1 row like this - fixed
Status: red, TotalMissing : 23, 100Missing : 0         ## only 1 row like this - fixed
queue01:                                               ## this could be any # of "queue" lines, numbered 01-100
48f24729 - 1 missing                                   ## this name "48*" could be anything, with variable counts - could be hundreds
gz34d2c0 - 1 missing                                   ## this name "gz*" could be anything, with variable counts - could be hundreds
USxyz:                                                 ## only 1 row like this - fixed - 2nd group
Pipe Status                                            ## only 1 row like this - fixed
Status: black, TotalMissing : 578000, 100Missing : 1   ## only 1 row like this - fixed
queurx04:                                              ## this could be any "queue" numbered 01-100
77903416 - 578506 missing                              ## this name "77*" could be anything, with variable counts - could be hundreds
queeex01:                                              ## this could be any "queue" numbered 01-100
73903416 - 578124 missing                              ## this name "73*" could be anything, with variable counts - could be hundreds
I have been able to indent / add spacing to some of the lines (USabc, Pipe, Status, queue), but I am unable to indent or add spacing to the others, i.e. the lines containing "... missing ...".
The customer believes indenting the "missing" lines is more readable.
Can you help?

Related

Problem with line breaks (\n) in gtsummary functions

I have a problem trying to include line breaks in arguments of gtsummary functions, such as the statistic argument of tbl_summary() or the update argument of modify_header(). It's a bit strange, because it has always worked until now, and the package documentation indicates that this is the way to do it...
Here is a reproducible example:
## loading packages ##
library(dplyr)
library(gtsummary)
## gtsummary table ##
trial %>%
  tbl_summary(include = c("trt", "stage", "grade"),
              by = "trt",
              statistic = all_categorical() ~ "{p}% \n ({n})",  # \n does not pass "({n})" to the next line...
              missing = "no") %>%
  modify_header(update = list(all_stat_cols() ~ "**{level}** \n ({p}%, \n N = {n})"),  # and here as well...
                text_interpret = "md")
[image: gtsummary cross table]
Does the problem come only from my computer? Could it be due to a recent package update?

Split string, extract and add to another column with regex in BigQuery

I have a table with an Equipment column containing strings. I want to split the string, take a part of it, and add this part to a new column (SerialNumber_Asset). The part of the string I want to extract always has the same pattern: A + 7 digits. Example:
    Equipment                                SerialNumber_Asset
1   AXION 920 - A2302888 - BG-ADM-82 -NK     A2302888
2   Case IH Puma T4B 220 - BG-AEH-87 - NK    null
3   ARION 650 - A7702047 - BG-ADZ-74 - MU    A7702047
4   ARION 650 - A7702039 - BG-ADZ-72 - NK    A7702039
My code:
select x, y, z,
regexp_extract(Equipment, r'([\A][\d]{7})') as SerialNumber_Asset
FROM `aa.bb.cc`
The message I got:
Cannot parse regular expression: invalid escape sequence: \A
Any suggestions on what could be wrong? Thanks
Just use A instead of [\A]; check the example below. Inside a character class, \A is not a valid escape sequence, and a literal A needs no escaping at all:
select regexp_extract('AXION 920 - A2302888 - BG-ADM-82 -NK', r'(A[\d]{7})') as SerialNumber_Asset

Appending data frames that satisfy regular expressions in a loop in R?

I asked a similar question earlier but asked it confusingly. So now I'm trying to do it in a more orderly fashion.
I'm running a loop that imports up to 6 dataframes based on 650 ID variables. I want to append these 6 dataframes for every one of the 650 cases. I import the data like this:
for(i in 1:650){
  try(part1 <- read.csv(file = paste0("Twitter Scrapes/searchTwitter/09July/", MP.ID[i], ".csv")))
  try(part2 <- read.csv(file = paste0("Twitter Scrapes/userTimeline/08July/", MP.ID[i], ".csv")))
  try(part3 <- read.csv(file = paste0("Twitter Scrapes/userTimeline/16July/", MP.ID[i], ".csv")))
  try(part4 <- read.csv(file = paste0("Twitter Scrapes/searchTwitter/17July/", MP.ID[i], ".csv")))
  try(part5 <- read.csv(file = paste0("Twitter Scrapes/userTimeline/24July/", MP.ID[i], ".csv")))
  try(part6 <- read.csv(file = paste0("Twitter Scrapes/searchTwitter/24July/", MP.ID[i], ".csv")))
This all works fine. If any part doesn't exist, the try() wrappers make sure that the loop continues to execute.
So, for some cases, not all 6 datasets exist. This means I can't simply have the next line read
combinedData <- rbind(part1, part2, part3, part4, part5, part6)
as one of these objects may not exist, which would mean the appended dataset can't be produced. This is why I thought it would be good to have the rbind command run over any data frame whose name satisfies a regular expression, i.e. partX. That way, even if, say, part5 doesn't exist, it can simply append the other existing data frames and then move on to the next ID in the loop.
However, I have no idea how to do this. It would be amazing if you could help me with this, and I'm really sorry for posting the confusing question earlier.
I might use the recursive argument in list.files instead and use lists:
(lf <- list.files('~/desktop/test', recursive = TRUE, full.names = TRUE))
# [1] "/Users/rawr/desktop/test/feb/three.csv"
# [2] "/Users/rawr/desktop/test/jan/one.csv"
# [3] "/Users/rawr/desktop/test/jan/three.csv"
# [4] "/Users/rawr/desktop/test/jan/two.csv"
# [5] "/Users/rawr/desktop/test/jul/one.csv"
# [6] "/Users/rawr/desktop/test/jul/two.csv"
You can group by IDs by grepping:
id <- c('one', 'two', 'three')
for (ii in id) {
  print(lf[grepl(ii, lf)])
  cat('\n')
}
# [1] "/Users/rawr/desktop/test/jan/one.csv" "/Users/rawr/desktop/test/jul/one.csv"
#
# [1] "/Users/rawr/desktop/test/jan/two.csv" "/Users/rawr/desktop/test/jul/two.csv"
#
# [1] "/Users/rawr/desktop/test/feb/three.csv" "/Users/rawr/desktop/test/jan/three.csv"
So, using this idea, you can use lapply to read them in, resulting in one object with all the data frames:
ll <- lapply(id, function(ii) {
  files <- lf[grepl(ii, lf)]
  setNames(lapply(files, function(x)
    read.csv(x, header = FALSE)), files)
})
setNames(ll, id)
# $one
# $one$`/Users/rawr/desktop/test/jan/one.csv`
# V1
# 1 one
#
# $one$`/Users/rawr/desktop/test/jul/one.csv`
# V1
# 1 one
# 2 one
# 3 one
#
#
# $two
# $two$`/Users/rawr/desktop/test/jan/two.csv`
# V1
# 1 two
#
# $two$`/Users/rawr/desktop/test/jul/two.csv`
# V1
# 1 two
#
#
# $three
# $three$`/Users/rawr/desktop/test/feb/three.csv`
# V1
# 1 three
#
# $three$`/Users/rawr/desktop/test/jan/three.csv`
# V1
# 1 three
And then rbind them:
lapply(ll, function(x) `rownames<-`(do.call('rbind', x), NULL))
# [[1]]
# V1
# 1 one
# 2 one
# 3 one
# 4 one
#
# [[2]]
# V1
# 1 two
# 2 two
#
# [[3]]
# V1
# 1 three
# 2 three
Or you can rbind them in the step before.
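If you would rather keep the original part1..part6 objects from the question instead of switching to list.files, a minimal sketch would be to bind whichever of them actually exist by name pattern. This assumes the objects live in the global environment, that the CSVs have compatible columns, and that leftover partN objects from the previous iteration are cleared first (the names parts and combinedData are just illustrative):
## inside the loop, before the try(read.csv(...)) calls: clear leftovers from the previous ID
rm(list = ls(pattern = "^part[1-6]$"))
## ... the six try(read.csv(...)) calls from the question go here ...
## collect whichever partN data frames were actually created for this ID and bind them
parts <- mget(ls(pattern = "^part[1-6]$"))
parts <- Filter(is.data.frame, parts)
combinedData <- do.call(rbind, parts)
do.call(rbind, parts) binds however many data frames ended up in parts, whether that is one or all six.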

How to use separate() properly?

I have some difficulty extracting an ID of the form:
27da12ce-85fe-3f28-92f9-e5235a5cf6ac
from a data frame:
a<-c("NAME_27da12ce-85fe-3f28-92f9-e5235a5cf6ac_THOMAS_MYR",
"NAME_94773a8c-b71d-3be6-b57e-db9d8740bb98_THIMO",
"NAME_1ed571b4-1aef-3fe2-8f85-b757da2436ee_ALEX",
"NAME_9fbeda37-0e4f-37aa-86ef-11f907812397_JOHN_TYA",
"NAME_83ef784f-3128-35a1-8ff9-daab1c5f944b_BISHOP",
"NAME_39de28ca-5eca-3e6c-b5ea-5b82784cc6f4_DUE_TO",
"NAME_0a52a024-9305-3bf1-a0a6-84b009cc5af4_WIS_MICHAL",
"NAME_2520ebbb-7900-32c9-9f2d-178cf04f7efc_Sarah_Lu_Van_Gar/Thomas")
Basically, it's the part between the first and the second underscore.
Usually I approach that by:
library(tidyr)
df$a <- as.character(df$a)
df <- df[grep("_", df$a), ]
df <- separate(df, a, c("ID", "Name"), sep = "_")
df$a <- as.numeric(df$ID)
However, this time there are too many underscores... and my approach fails. Is there a way to extract that ID?
I think you should use extract instead of separate. You need to specify the patterns you want to capture. I'm assuming here that the ID always starts with a number, so I'm capturing everything from the first number up to the next _, and then everything after that:
df <- data.frame(a)
df <- df[grep("_", df$a),, drop = FALSE]
extract(df, a, c("ID", "NAME"), "[A-Za-z].*?(\\d.*?)_(.*)")
# ID NAME
# 1 27da12ce-85fe-3f28-92f9-e5235a5cf6ac THOMAS_MYR
# 2 94773a8c-b71d-3be6-b57e-db9d8740bb98 THIMO
# 3 1ed571b4-1aef-3fe2-8f85-b757da2436ee ALEX
# 4 9fbeda37-0e4f-37aa-86ef-11f907812397 JOHN_TYA
# 5 83ef784f-3128-35a1-8ff9-daab1c5f944b BISHOP
# 6 39de28ca-5eca-3e6c-b5ea-5b82784cc6f4 DUE_TO
# 7 0a52a024-9305-3bf1-a0a6-84b009cc5af4 WIS_MICHAL
# 8 2520ebbb-7900-32c9-9f2d-178cf04f7efc Sarah_Lu_Van_Gar/Thomas
Try this (which assumes that the ID is always the part after the first underscore):
sapply(strsplit(a, "_"), function(x) x[[2]])
which gives you "the middle part" which is your ID:
[1] "27da12ce-85fe-3f28-92f9-e5235a5cf6ac" "94773a8c-b71d-3be6-b57e-db9d8740bb98"
[3] "1ed571b4-1aef-3fe2-8f85-b757da2436ee" "9fbeda37-0e4f-37aa-86ef-11f907812397"
[5] "83ef784f-3128-35a1-8ff9-daab1c5f944b" "39de28ca-5eca-3e6c-b5ea-5b82784cc6f4"
[7] "0a52a024-9305-3bf1-a0a6-84b009cc5af4" "2520ebbb-7900-32c9-9f2d-178cf04f7efc"
If you want to get the Name as well, a simple solution would be (assuming that the Name is always after the second underscore):
Names <- sapply(strsplit(a, "_"), function(x) Reduce(paste, x[-c(1,2)]))
which gives you this:
[1] "THOMAS MYR" "THIMO" "ALEX" "JOHN TYA"
[5] "BISHOP" "DUE TO" "WIS MICHAL" "Sarah Lu Van Gar/Thomas"

How would I turn a multivalue string into a usable frequency table in R?

I have a field in a data frame called plugins_Apache_module. It contains strings like:
c("mod_perl/1.99_16,mod_python/3.1.3,mod_ssl/2.0.52",
"mod_auth_passthrough/2.1,mod_bwlimited/1.4,mod_ssl/2.2.23",
"mod_ssl/2.2.9")
I need a frequency table on the modules, and also their versions.
What is the best way to do this in R? Being rather new to R, I've seen strsplit and gsub, and some chatrooms also suggested I use the qdap package.
Ideally I would want the string transformed into a data frame with a column for every mod; if the module is there, then its version goes in that particular field. How would I accomplish such a transform?
What data frame format would be suggested if I want top-level frequencies, say mod_ssl (all versions), as well as relational options (e.g. mod_perl is very often used with mod_ssl)?
I'm not too sure how to handle such variable-length data when pushing it into a data frame for processing. Any advice is welcome.
I consider the right answer to look like:
mod_perl   mod_python   mod_ssl   mod_auth_passthrough   mod_bwlimited
1.99_16    3.1.3        2.0.52
                        2.2.23    2.1                    1.4
                        2.2.9
So basically the first bit becomes a column, and the version(s) that follow become the row entries.
st <- c("mod_perl/1.99_16,mod_python/3.1.3,mod_ssl/2.0.52", "mod_auth_passthrough/2.1,mod_bwlimited/1.4,mod_ssl/2.2.23", "mod_ssl/2.2.9")
scan(text=st, what="", sep=",")
Read 7 items
[1] "mod_perl/1.99_16" "mod_python/3.1.3" "mod_ssl/2.0.52"
[4] "mod_auth_passthrough/2.1" "mod_bwlimited/1.4" "mod_ssl/2.2.23"
[7] "mod_ssl/2.2.9"
strsplit( scan(text=st, what="", sep=","), "/")
Read 7 items
[[1]]
[1] "mod_perl" "1.99_16"
[[2]]
[1] "mod_python" "3.1.3"
[[3]]
[1] "mod_ssl" "2.0.52"
[[4]]
[1] "mod_auth_passthrough" "2.1"
[[5]]
[1] "mod_bwlimited" "1.4"
[[6]]
[1] "mod_ssl" "2.2.23"
[[7]]
[1] "mod_ssl" "2.2.9"
table( sapply(strsplit( scan(text=st, what="", sep=","), "/"), "[",1) )
#----------------
Read 7 items
mod_auth_passthrough mod_bwlimited mod_perl mod_python
                   1             1        1          1
mod_ssl
      3
table( scan(text=st, what="", sep=",") )
#-----------
Read 7 items
mod_auth_passthrough/2.1 mod_bwlimited/1.4 mod_perl/1.99_16
                       1                 1                1
mod_python/3.1.3 mod_ssl/2.0.52 mod_ssl/2.2.23
               1              1              1
mod_ssl/2.2.9
            1
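As a rough follow-up sketch (not part of the answer above), the same pieces can be pushed into the wide module-by-version layout from the question with base reshape(), assuming each string lists a given module at most once (the names pieces and long are just illustrative):
## one row per original string, one (module, version) pair per line
pieces <- strsplit(scan(text = st, what = "", sep = ","), "/")
long <- data.frame(row = rep(seq_along(st), sapply(strsplit(st, ","), length)),
                   mod = sapply(pieces, `[`, 1),
                   ver = sapply(pieces, `[`, 2))
## spread modules into columns, versions as the entries (missing modules become NA)
reshape(long, idvar = "row", timevar = "mod", direction = "wide")
The resulting columns come out named ver.mod_perl, ver.mod_python, and so on.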
You ask for, at minimum, two different things. Adding the desired output greatly helped. I'm not sure if what you ask for is what you really want, but you asked, and it seemed like a fun problem. OK, here's how I would approach this using qdap (this requires qdap version 1.1.0, though):
## load qdap
library(qdap)
## your data
x <- c("mod_perl/1.99_16,mod_python/3.1.3,mod_ssl/2.0.52",
"mod_auth_passthrough/2.1,mod_bwlimited/1.4,mod_ssl/2.2.23",
"mod_ssl/2.2.9")
## strsplit on commas and slashes
dat <- unlist(lapply(x, strsplit, ",|/"), recursive=FALSE)
## make just a list of mods per row
mods <- lapply(dat, "[", c(TRUE, FALSE))
## make a string of versions
ver <- unlist(lapply(dat, "[", c(FALSE, TRUE)))
## make a lookup key and split it into lists
key <- data.frame(mod = unlist(mods), ver,
                  row = rep(seq_along(mods), sapply(mods, length)))
key2 <- split(key[, 1:2], key$row)
## make it into freq. counts
freqs <- mtabulate(mods)
## assign the freq table to vers in case you want freqs, and replace 0 with NA
vers <- freqs
vers[vers==0] <- NA
## loop through and fill the ones in each row using an env. lookup (%l%)
for(i in seq_len(nrow(vers))) {
  x <- vers[i, !is.na(vers[i, ]), drop = FALSE]
  vers[i, !is.na(vers[i, ])] <- colnames(x) %l% key2[[i]]
}
## Don't print the NAs
print(vers, na.print = "")
##   mod_auth_passthrough mod_bwlimited mod_perl mod_python mod_ssl
## 1                                      1.99_16      3.1.3  2.0.52
## 2                  2.1           1.4                       2.2.23
## 3                                                           2.2.9
## the frequency counts per module
freqs
##   mod_auth_passthrough mod_bwlimited mod_perl mod_python mod_ssl
## 1                    0             0        1          1       1
## 2                    1             1        0          0       1
## 3                    0             0        0          0       1
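As a small follow-up (not from the original answer): assuming mtabulate's output is the plain numeric data frame shown above, the top-level counts the question asks about, e.g. mod_ssl across all versions, fall straight out of column sums, and module co-occurrence can be read off a cross-product:
## total occurrences of each module across all rows (mod_ssl appears 3 times here)
colSums(freqs)
## how often modules occur together (e.g. the mod_perl/mod_ssl cell counts rows containing both)
crossprod(as.matrix(freqs))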