I have C++ code that takes two command-line arguments and prints them. One of the arguments is a Google search URL. The code is below:
#include <iostream>

int main(int argc, char* argv[])
{
    // Print the two arguments back to back, followed by a newline.
    std::cout << argv[1] << argv[2] << "\n";
}
After compiling, when I pass the URL on the command line as below,
./demo 1 https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7
I get the output as,
[1] 8680
[2] 8681
[3] 8682
[4] 8683
[5] 8684
[6] 8685
[7] 8686
[2] Done ion=1
[3] Done espv=2
[4] Done ie=UTF-8
[6]- Done q=size%20of%20unsigned%20char%20array%20c%2B%2B
It looks like the string has been split somewhere before it reaches my program. Is there any way I can retrieve the entire string?
Thank you in advance.
You have to quote it. Otherwise the shell interprets each & as "run what's on the left of the & in the background".
I took the liberty of replacing your program with echo.
Good:
$ echo "https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7"
https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7
Bad:
$ echo https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7
[1] 21705
[2] 21706
https://www.google.co.in/search?sourceid=chrome-psyapi2
[3] 21707
[4] 21708
[5] 21709
[6] 21710
[7] 21711
[1] Done echo https://www.google.co.in/search?sourceid=chrome-psyapi2
[2] Done ion=1
[3] Done espv=2
[4] Done ie=UTF-8
[5] Done client=ubuntu
[6]- Done q=size%20of%20unsigned%20char%20array%20c%2B%2B
[7]+ Done oq=length%20of%20unsigned%20char*%20arra
You need to quote the argument, and you should use single quotes, ', in order to stop your shell from attempting to evaluate anything inside it.
What happens is that every ampersand, "&", on your command line launches a background process.
The first process is ./demo 1 https://www.google.co.in/search?sourceid=chrome-psyapi2, and all the following are assignments to variables.
You can see from the output (it looks like you didn't post all of it)
[1] 8680
[2] 8681
[3] 8682
[4] 8683
[5] 8684
[6] 8685
[7] 8686
[2] Done ion=1
[3] Done espv=2
[4] Done ie=UTF-8
[6]- Done q=size%20of%20unsigned%20char%20array%20c%2B%2B
that background process 2 is ion=1 (pid 8681), process 3 (pid 8682) is espv=2, and so on.
How do I pull out all words that contain the symbol "<-" either at the end of the word or somewhere in the middle, but in the latter case only if the "<-" is followed by a dot?
To put it into context: Exercise 6.5.3 a. of Hadley Wickham's Advanced R asks the reader to list all replacement functions in the base package.
Replacement functions that have only one method are indicated by the symbol <- right at the end of the function name. Generic functions, however, have their method name attached to the name of the replacement form (with a dot), so that the <- is no longer at the end of the function name. Example: split<-.data.frame
EDIT:
objs <- mget(ls("package:base"), inherits = TRUE)
funs <- Filter(is.function, objs)
This is how you pull out all functions in the base package. Now I want to find only the replacement functions.
If you want all base package replacement functions and their respective S3 methods, you can try
ls(envir = as.environment("package:base"), pattern = "<-")
With no packages loaded, this gives the following result:
[1] "<<-" "<-" "[<-"
[4] "[[<-" "@<-" "$<-"
[7] "attr<-" "attributes<-" "body<-"
[10] "class<-" "colnames<-" "comment<-"
[13] "[<-.data.frame" "[[<-.data.frame" "$<-.data.frame"
[16] "[<-.Date" "diag<-" "dim<-"
[19] "dimnames<-" "dimnames<-.data.frame" "Encoding<-"
[22] "environment<-" "[<-.factor" "[[<-.factor"
[25] "formals<-" "is.na<-" "is.na<-.default"
[28] "is.na<-.factor" "is.na<-.numeric_version" "length<-"
[31] "length<-.factor" "levels<-" "levels<-.factor"
[34] "mode<-" "mostattributes<-" "names<-"
[37] "names<-.POSIXlt" "[<-.numeric_version" "[[<-.numeric_version"
[40] "oldClass<-" "parent.env<-" "[<-.POSIXct"
[43] "[<-.POSIXlt" "regmatches<-" "row.names<-"
[46] "rownames<-" "row.names<-.data.frame" "row.names<-.default"
[49] "split<-" "split<-.data.frame" "split<-.default"
[52] "storage.mode<-" "substr<-" "substring<-"
[55] "units<-" "units<-.difftime"
Thanks to @42 for helping me improve this answer.
We can try
library(stringr)
str_extract(v1, "\\w+<-$|\\w*<-\\.\\S+")
#[1] "split<-.data.frame" NA "splitdata<-"
data
v1 <- c("split<-.data.frame", "split<-data", "splitdata<-")
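The same filter can also be written in base R without stringr. The following is only a sketch, not the book's solution: the pattern keeps a name when <- ends it or is directly followed by a dot, and the names pattern, base_names, is_fun and replacement_funs are made up here for illustration. Note that this pattern also keeps the assignment operators <- and <<- themselves, because their names end in <-.

```r
# Sample names from the answer above.
v1 <- c("split<-.data.frame", "split<-data", "splitdata<-")

# Keep a name when "<-" is at the end or is immediately followed by a dot.
pattern <- "<-$|<-\\."
v1[grepl(pattern, v1)]

# Apply the same filter to the functions visible in package:base.
base_names <- ls("package:base")
is_fun <- vapply(base_names,
                 function(n) is.function(get(n, envir = baseenv())),
                 logical(1))
replacement_funs <- grep(pattern, base_names[is_fun], value = TRUE)
head(replacement_funs)
```

Using grepl() on the names keeps the whole name intact, whereas str_extract() returns only the matched part of each string.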
I just answered the question "Removing characters after a EURO symbol in R". The R code works for others who are on Ubuntu, but it does not work for me.
This is my code.
x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "\u20AC"
gsub(paste(euro , "(\\S+)|."), "\\1", x)
# ""
I think this is all about changing the locale settings, but I don't know how to do that.
I'm running RStudio on Windows 8.
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] tools_3.2.0
@Anada's answer is good, but we would need to add that encoding argument every time we use Unicode characters in a regex. Is there any way to change the default encoding to UTF-8 on Windows?
Seems to be a problem with encoding.
Consider:
x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "\u20AC"  # as defined in the question
gsub(paste(euro , "(\\S+)|."), "\\1", x)
# [1] ""
gsub(paste(euro , "(\\S+)|."), "\\1", `Encoding<-`(x, "UTF8"))
# [1] "15,896.80"
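One way to investigate a mismatch like this is to look at how R has marked each string. The sketch below is an assumption-based illustration and was not run on the asker's Windows setup: it writes the euro sign with the \u20AC escape in both the text and the pattern, so that R normally marks both strings as UTF-8 whatever the locale, and it uses enc2utf8() from base R as an assumed alternative to resetting the encoding by hand. The names x_utf8 and pattern are introduced here for illustration.

```r
euro <- "\u20AC"
x <- "services as defined in this SOW at a price of \u20AC 15,896.80 (if executed fro"

# Inspect how R has marked each string ("UTF-8", "latin1" or "unknown").
Encoding(c(euro, x))

# Convert the input to UTF-8 explicitly before matching
# (a no-op when the string is already marked as UTF-8).
x_utf8 <- enc2utf8(x)

# paste() joins its arguments with a space, so the pattern reads
# "<euro> (\\S+)|.": keep the run of non-space characters that follows
# "<euro> " and drop every other character.
pattern <- paste(euro, "(\\S+)|.")
gsub(pattern, "\\1", x_utf8)
```

If Encoding() reports different marks for the pattern and the text, that is a hint that the two are not being compared in the same encoding.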
Good evening everybody,
a very simple question: I have created a trellis object with a list of plots like this
ls(grafico.PAX)
[1] "HT00027074" "HT00041471" "HT00042977" "HT00044297" "HT00044352" "HT00044735" "HT00046016"
[8] "HT00047780" "HT00049362" "HT00055644" "HT00055649" "HT00058023" "HT00058172" "HT00058650"
[15] "HT00061221" "HT00061283" "HT00061952" "HT00062062" "HT00067896" "HT00068212" "HT00068231"
[22] "HT00068665" "HT00070389" "HT00071625" "HT00071640" "HT00071705" "HT00071768" "HT00071998"
[29] "HT00072343" "HT00078488" "HT00078520" "HT00078735" "HT00078775" "HT00078796" "HT00079322"
[36] "HT00079921" "HT00081229" "HT00081484" "HT00081490" "HT00081519" "HT00081695" "HT00081784"
[43] "HT00081788" "HT00081800" "HT00081897" "HT00081899" "HT00082062" "HT00082426" "HT00082569"
[50] "HT00082589" "HT00082637" "HT00082638" "HT00082885"
and I would like to know how many elements there are in the panel.args list of each of the grafico.PAX elements.
I have tried to get it in several ways, and I don't have a clue...
Thanks in advance,
MZ
Sorry, the last option I tried was the right one (and I didn't check before posting, my bad!):
length(grafico.PAX[[1]]$panel.args)
Thanks again,
MZ
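To get the count for every element at once rather than only the first, the same expression can be applied over the whole list. This is a minimal sketch that assumes grafico.PAX is a named list; the stand-in data below is hypothetical and only mimics the panel.args component of a trellis object, and n_panels is a name chosen here.

```r
# Hypothetical stand-in for grafico.PAX: a named list whose elements each
# carry a panel.args list, as trellis objects do.
grafico.PAX <- list(
  HT00027074 = list(panel.args = list(list(x = 1:3), list(x = 4:6))),
  HT00041471 = list(panel.args = list(list(x = 1:5)))
)

# One count per element, named after the element.
n_panels <- vapply(grafico.PAX, function(p) length(p$panel.args), integer(1))
n_panels
```

vapply() is used instead of sapply() so that the result is guaranteed to be an integer vector, even if the list happens to be empty.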
I'm trying to create a list of files from a directory containing files with the following patterns:
Name_Surname_12345_noe_xy.xls
Name_Surname_12345_xy.xls
xy can be one or two characters.
Now I want a list of all files which do not contain "noe" in the filename.
I can list only the "noe" files using
fl = list.files(pattern = "noe.+xls$", recursive=T, full.names=T)
but I found no way to exclude them. Any suggestions?
Many thanks
Markus
Get all the files and then use grep to find the noe ones and subset them out:
> all
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"
[5] "Name_Surname_13245_noe_xy.xls"
> all[grep("noe_xy.xls",all,invert=TRUE)]
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_xy.xls"
Always make sure you check the edge cases where all or none of the files match:
> all[grep("xls",all,invert=TRUE)]
character(0)
> all[grep("fnord",all,invert=TRUE)]
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"
[5] "Name_Surname_13245_noe_xy.xls"
Using grep with a negative index works except in these edge cases:
> all
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"
[5] "Name_Surname_13245_noe_xy.xls"
> all[-grep("noe_xy.xls",all)] # strip out the noe_xy.xls files
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_xy.xls"
# works. Now strip out any xls files (should leave nothing)
> all[-grep("xls",all)]
character(0)
# yup, that works too. Now strip out 'fnord' files, shouldn't remove anything:
> all[-grep("fnord",all)]
character(0)
Epic fail! Reason is left as an exercise to the reader.
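For readers who want to check their answer to that exercise: the sketch below shows why the negative index fails when nothing matches, and a logical mask built with grepl() as one alternative that handles all three cases. The vector is named files here (rather than all) purely for illustration.

```r
files <- c("Name_Surname_123425_xy.xls", "Name_Surname_1234445_xy.xls",
           "Name_Surname_12345_noe_xy.xls", "Name_Surname_12345_xy.xls",
           "Name_Surname_13245_noe_xy.xls")

# grep() returns integer(0) when nothing matches, and negating an empty
# vector is still empty, so the subset selects nothing at all.
no_hits <- grep("fnord", files)
length(-no_hits)                 # 0
files[-no_hits]                  # character(0)

# A logical mask from grepl() has one entry per file, so it behaves
# correctly whether all, some, or none of the files match.
files[!grepl("fnord", files)]    # all five files
files[!grepl("_noe_", files)]    # the three files without "noe"
files[!grepl("xls", files)]      # character(0)
```

The mask could be applied directly to the result of list.files() in the same way.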
I need a regexp which does the following:
Here's the name of an HTML input field:
lm[0][ti]
I need to find the base name ("lm"). Only if the name contains brackets do I also need the string in the second pair of brackets ("ti").
To get it in portions is easy with the following regexp:
([a-zA-Z\d_]+)\[?([0-9]*)\]?\[?([a-zA-Z\d_]+)\]?
It matches all the portions I need.
Array
(
[0] => lm[0][ti]
[1] => lm
[2] => 0
[3] => ti
)
But if the HTML input name is just "lm", this regexp gives me no way to tell that the last item in the array (index 3) is not a valid name. The array looks like this:
Array
(
[0] => lm
[1] => l
[2] =>
[3] => m
)
"m" is not valid for me, I'd like to get this array:
Array
(
[0] => lm
[1] =>
[2] =>
[3] =>
)
or this
Array
(
[0] => lm
)
You can test the regexp here:
http://regexp-tester.mediacix.de/exp/regex/
Thanks for any help in finding the right regexp...
Try this:
(\w+)(?:\[(\d+)\])?(?:\[(\w+)\])?
Input:
lm[0][ti]
Output:
Input:
lm
Output:
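The suggested pattern can be tried out on both inputs in one go. The sketch below uses R with perl = TRUE as an assumed stand-in for the asker's environment, so the shape of the result differs from the arrays shown in the question; the names pattern, inputs and matches are chosen here for illustration. Groups that take no part in the match are expected to come back as empty strings.

```r
# The answer's pattern, with backslashes doubled for an R string literal.
pattern <- "(\\w+)(?:\\[(\\d+)\\])?(?:\\[(\\w+)\\])?"
inputs <- c("lm[0][ti]", "lm")

# perl = TRUE selects PCRE, which supports the non-capturing (?:...) groups.
# Each list element holds the full match followed by the three capture groups.
matches <- regmatches(inputs, regexec(pattern, inputs, perl = TRUE))
matches
```

Because the bracketed parts are wrapped in optional non-capturing groups, the base name is always captured whole, so a plain "lm" is no longer split into "l" and "m".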