Good evening everybody,
a very simple question: I have created a trellis object containing a list of plots, like this:
ls(grafico.PAX)
[1] "HT00027074" "HT00041471" "HT00042977" "HT00044297" "HT00044352" "HT00044735" "HT00046016"
[8] "HT00047780" "HT00049362" "HT00055644" "HT00055649" "HT00058023" "HT00058172" "HT00058650"
[15] "HT00061221" "HT00061283" "HT00061952" "HT00062062" "HT00067896" "HT00068212" "HT00068231"
[22] "HT00068665" "HT00070389" "HT00071625" "HT00071640" "HT00071705" "HT00071768" "HT00071998"
[29] "HT00072343" "HT00078488" "HT00078520" "HT00078735" "HT00078775" "HT00078796" "HT00079322"
[36] "HT00079921" "HT00081229" "HT00081484" "HT00081490" "HT00081519" "HT00081695" "HT00081784"
[43] "HT00081788" "HT00081800" "HT00081897" "HT00081899" "HT00082062" "HT00082426" "HT00082569"
[50] "HT00082589" "HT00082637" "HT00082638" "HT00082885"
and I would like to know how many elements there are in the panel.args list of each of the grafico.PAX elements.
I have tried to get it in several ways, but I don't have a clue...
Thanks in advance,
MZ
Sorry, the last option I tried was the right one (and I didn't check before posting! Bad of me!):
length(grafico.PAX[[1]]$panel.args)
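For the record, the same count can be pulled for all elements at once with sapply. A minimal sketch, with a hypothetical stand-in for grafico.PAX (two lattice plots built from mtcars; the real object's names are the HT… identifiers above):

```r
library(lattice)

# Hypothetical stand-in for grafico.PAX: a named list of trellis objects
grafico.PAX <- list(
  HT00027074 = xyplot(mpg ~ wt, data = mtcars),
  HT00041471 = xyplot(mpg ~ wt | factor(cyl), data = mtcars)
)

# panel.args holds one entry per panel, so this counts the panels of each plot
sapply(grafico.PAX, function(p) length(p$panel.args))
```

The unconditioned plot has a single panel; the plot conditioned on factor(cyl) has one panel per level.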
Thanks again,
MZ
How do I pull out all words that contain the symbol "<-" either at the end of the word or somewhere in the middle, but in the latter case only if the "<-" is followed by a dot?
To put it into context: exercise 6.5.3 a. of Hadley Wickham's Advanced R asks the reader to list all replacement functions in the base package.
Replacement functions that have only one method are indicated by the symbol <-
right at the end of the function name. Generic functions, however, have the
method name attached to the name of the replacement form (with a dot), so that the <- is no longer at the end of the function name, for example split<-.data.frame.
EDIT:
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
This is how you pull out all functions in the base package. Now I want to find only the replacement functions.
If you want all base package replacement functions and their respective S3 methods, you can try
ls(envir = as.environment("package:base"), pattern = "<-")
With no packages loaded, this gives the following result:
[1] "<<-" "<-" "[<-"
[4] "[[<-" "@<-" "$<-"
[7] "attr<-" "attributes<-" "body<-"
[10] "class<-" "colnames<-" "comment<-"
[13] "[<-.data.frame" "[[<-.data.frame" "$<-.data.frame"
[16] "[<-.Date" "diag<-" "dim<-"
[19] "dimnames<-" "dimnames<-.data.frame" "Encoding<-"
[22] "environment<-" "[<-.factor" "[[<-.factor"
[25] "formals<-" "is.na<-" "is.na<-.default"
[28] "is.na<-.factor" "is.na<-.numeric_version" "length<-"
[31] "length<-.factor" "levels<-" "levels<-.factor"
[34] "mode<-" "mostattributes<-" "names<-"
[37] "names<-.POSIXlt" "[<-.numeric_version" "[[<-.numeric_version"
[40] "oldClass<-" "parent.env<-" "[<-.POSIXct"
[43] "[<-.POSIXlt" "regmatches<-" "row.names<-"
[46] "rownames<-" "row.names<-.data.frame" "row.names<-.default"
[49] "split<-" "split<-.data.frame" "split<-.default"
[52] "storage.mode<-" "substr<-" "substring<-"
[55] "units<-" "units<-.difftime"
Thanks to @42 for helping me improve this answer.
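If you then want to separate the plain replacement functions from the S3 methods of replacement generics, a pair of grepl filters on the same listing does it; a minimal sketch:

```r
# All names in base containing "<-"
nms <- ls(envir = as.environment("package:base"), pattern = "<-")

# Plain replacement functions: "<-" at the very end of the name
plain   <- nms[grepl("<-$", nms)]

# S3 methods of replacement generics: "<-" followed by ".<class>"
s3meths <- nms[grepl("<-\\.", nms)]

head(plain)
head(s3meths)
```

The two patterns are mutually exclusive: a name like is.na<-.default matches only the second, names<- only the first.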
We can try
library(stringr)
str_extract(v1, "\\w+<-$|\\w*<-\\.\\S+")
#[1] "split<-.data.frame" NA "splitdata<-"
data
v1 <- c("split<-.data.frame", "split<-data", "splitdata<-")
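The same pattern also works without stringr, using base R's regexpr/regmatches; note that regmatches drops non-matching elements instead of returning NA for them:

```r
v1 <- c("split<-.data.frame", "split<-data", "splitdata<-")

# Same pattern as the stringr answer; non-matches are dropped, not NA
m <- regexpr("\\w+<-$|\\w*<-\\.\\S+", v1)
regmatches(v1, m)
```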
I have C++ code that prints two command-line arguments. One of the arguments is the URL of a Google search. I paste the code below:
#include <iostream>

int main(int argc, char* argv[])
{
    std::cout << argv[1] << argv[2] << "\n";
}
When I pass the URL on the command line after compiling, as below,
./demo 1 https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7
I get the output below,
[1] 8680
[2] 8681
[3] 8682
[4] 8683
[5] 8684
[6] 8685
[7] 8686
[2] Done ion=1
[3] Done espv=2
[4] Done ie=UTF-8
[6]- Done q=size%20of%20unsigned%20char%20array%20c%2B%2B
It looks like there has been some internal splitting of the string. Is there any way I can retrieve the entire string?
Thank you in advance.
You have to quote it. Otherwise & gets interpreted by the shell as "run what's on the left of & in the background".
I took the liberty of replacing your program with echo.
Good:
$ echo "https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7"
https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7
Bad:
$ echo https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7
[1] 21705
[2] 21706
https://www.google.co.in/search?sourceid=chrome-psyapi2
[3] 21707
[4] 21708
[5] 21709
[6] 21710
[7] 21711
[1] Done echo https://www.google.co.in/search?sourceid=chrome-psyapi2
[2] Done ion=1
[3] Done espv=2
[4] Done ie=UTF-8
[5] Done client=ubuntu
[6]- Done q=size%20of%20unsigned%20char%20array%20c%2B%2B
[7]+ Done oq=length%20of%20unsigned%20char*%20arra
You need to quote the argument, and you should use single quotes, ', to stop your shell from attempting to evaluate anything inside them.
What happens is that every ampersand, "&", on your command line launches a background process.
The first process is ./demo 1 https://www.google.co.in/search?sourceid=chrome-psyapi2, and all the following are assignments to variables.
You can see from the output (it looks like you didn't post all of it)
[1] 8680
[2] 8681
[3] 8682
[4] 8683
[5] 8684
[6] 8685
[7] 8686
[2] Done ion=1
[3] Done espv=2
[4] Done ie=UTF-8
[6]- Done q=size%20of%20unsigned%20char%20array%20c%2B%2B
that background process 2 is ion=1 (pid 8681), process 3 (pid 8682) is espv=2, and so on.
I have a character vector in which single elements contain multiple strings separated by commas. I obtained this vector by extracting it from a data frame, and it looks like this:
[1] "Acworth, Crescent Lake, East Acworth, Lynn, South Acworth"
[2] "Ferncroft, Passaconaway, Paugus Mill"
[3] "Alexandria, South Alexandria"
[4] "Allenstown, Blodgett, Kenison Corner, Suncook (part)"
[5] "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow"
[6] "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands"
[7] "Amherst, Baboosic Lake, Cricket Corner, Ponemah"
[8] "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover"
[9] "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch"
[10] "Ashland"
I would like to obtain a new character vector in which every single string is its own element, i.e.:
[1] "Acworth", "Crescent Lake", "East Acworth", "Lynn", "South Acworth"
[6] "Ferncroft", "Passaconaway", "Paugus Mill", "Alexandria", "South Alexandria"
I used the strsplit() function, but this returns a list. When I try to turn it into a character vector, it reverts to the old state.
I'm sure this is a really simple problem - any help would be greatly appreciated! Thanks!
You may get rid of the spaces and split the character vector with the "\\s*,\\s*" regex and then unlist the result:
v <- c("Acworth, Crescent Lake, East Acworth, Lynn, South Acworth", "Ferncroft, Passaconaway, Paugus Mill", "Alexandria, South Alexandria", "Allenstown, Blodgett, Kenison Corner, Suncook (part)", "Alstead, Alstead Center, East Alstead, Forristalls Corner, Mill Hollow", "Alton, Alton Bay, Brookhurst, East Alton, Loon Cove, Mount Major, South Alton, Spring Haven, Stockbridge Corners, West Alton, Woodlands", "Amherst, Baboosic Lake, Cricket Corner, Ponemah", "Andover, Cilleyville, East Andover, Halcyon Station, Potter Place, West Andover", "Antrim, Antrim Center, Clinton Village, Loverens Mill, North Branch", "Ashland" )
s <- unlist(strsplit(v, "\\s*,\\s*"))
The regex matches zero or more whitespace characters (\s*) on both sides of the comma, thus trimming the values. It will also handle cases where there is a stray space before a comma in the initial character vector.
Your post title suggests you want unique strings, so
unique(unlist(strsplit(myvec, split=",")))
or
unique(unlist(strsplit(myvec, split=", ")))
if you always have a space following the comma.
As an alternative, you can also use scan, like this:
unique(scan(what = "", text = v, sep = ",", strip.white = TRUE))
The strip.white = TRUE part takes care of any leading or trailing whitespace you may have.
Note: "v" comes from this other answer.
I'm working with a file generated on several different machines with different locale settings, so I ended up with a data-frame column containing different renderings of the same word:
CÓRDOBA
CÓRDOBA
CÒRDOBA
I'd like to convert all those to CORDOBA. I've tried doing
t<-gsub("Ó|Ó|Ã’|°|°|Ò","O",t,ignore.case = T) # t is the vector of names
Which works until it finds some "invisible" characters:
As you can see, I'm not able to see, in R, the additional character that lies between Ã and \ (if I copy-paste into MS Word, Word shows it as an empty rectangle). I've tried to dput the vector, but it prints exactly as on screen (without the "invisible" character).
I ran Encoding(t), and it returns unknown for all values.
My system configuration follows:
> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Spanish_Colombia.1252 LC_CTYPE=Spanish_Colombia.1252 LC_MONETARY=Spanish_Colombia.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Colombia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] zoo_1.7-12 dplyr_0.4.2 data.table_1.9.4
loaded via a namespace (and not attached):
[1] R6_2.1.0 assertthat_0.1 magrittr_1.5 plyr_1.8.3 parallel_3.2.1 DBI_0.3.1 tools_3.2.1 reshape2_1.4.1 Rcpp_0.11.6 stringi_0.5-5
[11] grid_3.2.1 stringr_1.0.0 chron_2.3-47 lattice_0.20-31
I've used saveRDS to save a file with a data frame of actual and expected toy values, which can be read back with readRDS from here. I'm not absolutely sure it will load with the same problems I have (depending on your locale), but I hope it does, so you can provide some help.
At the end, I'd like to convert all those special characters to unaccented ones (Ó to O, etc.), hopefully without having to manually input each one of the special ones into a regex (in other words, I'd like --if possible-- some sort of gsub("[:weird:]","[:equivalentToWeird:]",t). If not possible, at least I'd like to be able to find (and replace) those "invisible" characters.
Thanks,
############## EDIT TO ADD ###################
If I run the following code:
d <- readRDS("c:/path/to/downloaded/Dropbox/file/inv_char.Rdata")
stri_escape_unicode(d$actual)
This is what I get:
[1] "\\u00c3\\u201cN N\\u00c2\\u00b0 08 \\\"CACIQUE CALARC\\u00c3\\u0081\\\" - ARMENIA"
[2] "\\u00d3N N\\u00b0 08 \\\"CACIQUE CALARC\\u00c1\\\" - ARMENIA"
[3] "\\u00d3N N\\u00b0 08 \\\"CACIQUE CALARC\\u00c1\\\" - ARMENIA(ALTERNO)"
Normal output is:
> d$actual
[1] ÓN N° 08 "CACIQUE CALARCÃ" - ARMENIA ÓN N° 08 "CACIQUE CALARCÁ" - ARMENIA ÓN N° 08 "CACIQUE CALARCÁ" - ARMENIA(ALTERNO)
With the help of @hadley, who pointed me towards stringi, I ended up discovering the offending characters and replacing them. This was my initial attempt:
unweird<-function(t){
t<-stri_escape_unicode(t)
t<-gsub("\\\\u00c3\\\\u0081|\\\\u00c1","A",t)
t<-gsub("\\\\u00c3\\\\u02c6|\\\\u00c3\\\\u2030|\\\\u00c9|\\\\u00c8","E",t)
t<-gsub("\\\\u00c3\\\\u0152|\\\\u00c3\\\\u008d|\\\\u00cd|\\\\u00cc","I",t)
t<-gsub("\\\\u00c3\\\\u2019|\\\\u00c3\\\\u201c|\\\\u00c2\\\\u00b0|\\\\u00d3|\\\\u00b0|\\\\u00d2|\\\\u00ba|\\\\u00c2\\\\u00ba","O",t)
t<-gsub("\\\\u00c3\\\\u2018|\\\\u00d1","N",t)
t<-gsub("\\u00a0|\\u00c2\\u00a0","",t)
t<-gsub("\\\\u00f3","o",t)
t<-stri_unescape_unicode(t)
}
which produced the expected result. I was a little curious about the other stringi functions, so I wondered whether its replacement function could be faster on my 3.3 million rows. I then tried stri_replace_all_regex like this:
stri_unweird<-function(t){
stri_unescape_unicode(stri_replace_all_regex(stri_escape_unicode(t),
c("\\\\u00c3\\\\u0081|\\\\u00c1",
"\\\\u00c3\\\\u02c6|\\\\u00c3\\\\u2030|\\\\u00c9|\\\\u00c8",
"\\\\u00c3\\\\u0152|\\\\u00c3\\\\u008d|\\\\u00cd|\\\\u00cc",
"\\\\u00c3\\\\u2019|\\\\u00c3\\\\u201c|\\\\u00c2\\\\u00b0|\\\\u00d3|\\\\u00b0|\\\\u00d2|\\\\u00ba|\\\\u00c2\\\\u00ba",
"\\\\u00c3\\\\u2018|\\\\u00d1",
"\\u00a0|\\u00c2\\u00a0",
"\\\\u00f3"),
c("A","E","I","O","N","","o"),
vectorize_all = F))
}
As a side note, I ran microbenchmark on both methods; these are the results:
g<-microbenchmark(unweird(t),stri_unweird(t),times = 100L)
summary(g)
             expr      min       lq     mean   median       uq      max neval cld
1      unweird(t) 423.0083 425.6400 431.9609 428.1031 432.6295 490.7658   100   b
2 stri_unweird(t) 118.5831 119.5057 121.2378 120.3550 121.8602 138.3111   100   a
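As a postscript: once the mojibake is fixed, plain accent stripping can often be done without any hand-written regex at all, via iconv with the //TRANSLIT target. A minimal sketch; note that the exact transliterations depend on the platform's iconv implementation, so check the output on your system:

```r
x <- c("CÓRDOBA", "CÒRDOBA", "CALARCÁ")

# Transliterate accented characters to their closest ASCII equivalents;
# results are platform-dependent (glibc and libiconv may differ slightly)
iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT")
```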
If this is the test string:
alt="mass |36 grams\nserving volume | 63 mL (milliliters)\nserving density | 0.57 g\/cm^3 (grams per cubic centimeter)" title="mass | 36 grams.
\btitle="mass| \b.*+\s*+\K.*(?=serving volume\b)
This is my code but it does not return what is required.
How can I extract "36 grams" from this text?
It would be great if someone could share a link from where I can learn regex.
gsub('mass \\|([0-9]* [A-Za-z]*).*', '\\1', alt)
[1] "36 grams"
To exclude the unit:
gsub('mass \\|([0-9]*).*', '\\1', alt)
[1] "36"
Careful with the extra space; it will be captured too. This is not what you want:
gsub('mass \\|([0-9]* ).*', '\\1', alt)
[1] "36 "
For the example you gave this will work, but depending on what you want to do you might need something more general:
alt<-"mass |36 grams\nserving volume | 63 mL (milliliters)\nserving density | 0.57 g/cm^3 (grams per cubic centimeter)"
gsub(".*\\|([0-9]+ gram).*","\\1",alt)
[1] "36 gram"
Did you try with:
/mass \|([a-zA-Z0-9\s]+)\sserving volume/
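Since the original attempt used \K, here is what a PCRE-style extraction looks like in R (perl = TRUE), as an alternative to the gsub answers above:

```r
alt <- "mass |36 grams\nserving volume | 63 mL (milliliters)\nserving density | 0.57 g/cm^3 (grams per cubic centimeter)"

# \K discards everything matched so far, so only "36 grams" is reported
m <- regexpr("mass \\|\\K[0-9]+ [a-z]+", alt, perl = TRUE)
regmatches(alt, m)
```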