I am trying to serialize some lists in clojure using pr-str, but any list with over 100 elements is getting cut off. Example:
(pr-str (repeat 200 [2]))
yields
"([2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] ...)"
Presumably you have *print-length* bound to 100. To lift the limit, reset it to nil:
(set! *print-length* nil)
As for where it could be bound / set in the first place, that depends on your setup. For Leiningen, both user-level and project-level settings are relevant (so have a look in ~/.lein/profiles.clj and in project.clj).
I have the following sorted list (lst) of time periods, and I want to split the periods into specific dates and then extract the maximum time period without altering the order of the list.
$`1`
[1] "01.12.2015 - 21.12.2015"
$`2`
[1] "22.12.2015 - 05.01.2016"
$`3`
[1] "14.09.2015 - 12.10.2015" "29.09.2015 - 26.10.2015"
Therefore, after adjustment the list should look like this:
$`1`
[1] "01.12.2015" "21.12.2015"
$`2`
[1] "22.12.2015" "05.01.2016"
$`3`
[1] "14.09.2015" "12.10.2015" "29.09.2015" "26.10.2015"
In order to do so, I began by splitting the list:
lst_split <- str_split(lst, pattern = " - ")
which leads to the following:
[[1]]
[1] "01.12.2015" "21.12.2015"
[[2]]
[1] "22.12.2015" "05.01.2016"
[[3]]
[1] "c(\"14.09.2015" "12.10.2015\", \"29.09.2015" "26.10.2015\")"
Then, I tried to extract the pattern:
lapply(lst_split, function(x) str_extract(pattern = c("\\d+\\.\\d+\\.\\d+"),x))
but my output is missing one date (29.09.2015)
[[1]]
[1] "01.12.2015" "21.12.2015"
[[2]]
[1] "22.12.2015" "05.01.2016"
[[3]]
[1] "14.09.2015" "12.10.2015" "26.10.2015"
Does anyone have an idea how I could make it work, and could maybe propose a more efficient solution? Thank you in advance.
Thanks to the comments of @WiktorStribiżew and @akrun, it is enough to use str_extract_all.
In this example:
> str_extract_all(lst,"\\d+\\.\\d+\\.\\d+")
[[1]]
[1] "01.12.2015" "21.12.2015"
[[2]]
[1] "22.12.2015" "05.01.2016"
[[3]]
[1] "14.09.2015" "12.10.2015" "29.09.2015" "26.10.2015"
1) Use strsplit, flatten each component using unlist, convert the dates to "Date" class and then use range to get the maximum time span. No packages are used.
> lapply(lst, function(x) range(as.Date(unlist(strsplit(x, " - ")), "%d.%m.%Y")))
$`1`
[1] "2015-12-01" "2015-12-21"
$`2`
[1] "2015-12-22" "2016-01-05"
$`3`
[1] "2015-09-14" "2015-10-26"
2) This variation using a magrittr pipeline also works:
library(magrittr)
lapply(lst, function(x)
x %>%
strsplit(" - ") %>%
unlist %>%
as.Date("%d.%m.%Y") %>%
range
)
Note: The input lst in reproducible form is:
lst <- structure(list(`1` = "01.12.2015 - 21.12.2015", `2` = "22.12.2015 - 05.01.2016",
`3` = c("14.09.2015 - 12.10.2015", "29.09.2015 - 26.10.2015"
)), .Names = c("1", "2", "3"))
I have C++ code which reads 2 command-line arguments and prints them. One of the arguments is the URL of a Google search. I paste the code below:
#include <iostream>

int main(int argc, char* argv[])
{
    std::cout << argv[1] << argv[2] << "\n";
}
When I pass the URL on the command line after compilation, as below,
./demo 1 https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7
I get the output as,
[1] 8680
[2] 8681
[3] 8682
[4] 8683
[5] 8684
[6] 8685
[7] 8686
[2] Done ion=1
[3] Done espv=2
[4] Done ie=UTF-8
[6]- Done q=size%20of%20unsigned%20char%20array%20c%2B%2B
It looks like there has been some internal splitting of the string. Is there any way I can retrieve the entire string?
Thank you in advance.
You have to quote it. Otherwise & gets interpreted by the shell as "run what's on the left of & in the background".
I took the liberty of replacing your program with echo.
Good:
$ echo "https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7"
https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7
Bad:
$ echo https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2&ie=UTF-8&client=ubuntu&q=size%20of%20unsigned%20char%20array%20c%2B%2B&oq=length%20of%20unsigned%20char*%20arra&aqs=chrome.4.69i57j0l5.13353j0j7
[1] 21705
[2] 21706
https://www.google.co.in/search?sourceid=chrome-psyapi2
[3] 21707
[4] 21708
[5] 21709
[6] 21710
[7] 21711
[1] Done echo https://www.google.co.in/search?sourceid=chrome-psyapi2
[2] Done ion=1
[3] Done espv=2
[4] Done ie=UTF-8
[5] Done client=ubuntu
[6]- Done q=size%20of%20unsigned%20char%20array%20c%2B%2B
[7]+ Done oq=length%20of%20unsigned%20char*%20arra
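Applying the fix back to the original program, any quoting that hides the metacharacters from the shell works. A minimal sketch, with echo standing in for ./demo and a shortened URL for illustration:

```shell
# Quoted, the URL reaches the program as one intact argument; the shell
# never sees the "&" characters, so no background jobs are spawned.
url='https://www.google.co.in/search?sourceid=chrome-psyapi2&ion=1&espv=2'
echo 1 "$url"
```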
You need to quote the argument, and you should use single quotes, ', in order to stop your shell from attempting to evaluate anything inside them.
What happens is that every ampersand, "&", on your command line launches a background process.
The first process is ./demo 1 https://www.google.co.in/search?sourceid=chrome-psyapi2, and all the following are assignments to variables.
You can see from the output (it looks like you didn't post all of it)
[1] 8680
[2] 8681
[3] 8682
[4] 8683
[5] 8684
[6] 8685
[7] 8686
[2] Done ion=1
[3] Done espv=2
[4] Done ie=UTF-8
[6]- Done q=size%20of%20unsigned%20char%20array%20c%2B%2B
that background process 2 is ion=1 (pid 8681), process 3 (pid 8682) is espv=2, and so on.
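The difference between the two quoting styles can be seen in a small sketch (the example.com URL here is hypothetical):

```shell
# Double quotes make &, ? and * literal but still expand $variables;
# single quotes make everything literal.
q='chrome-psyapi2'
echo "https://example.com/search?sourceid=$q&ie=UTF-8"   # $q is expanded
echo 'https://example.com/search?sourceid=$q&ie=UTF-8'   # $q stays as-is
```

Since a URL contains no text you want the shell to expand, single quotes are the safest default.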
Just now I answered this "Removing characters after a EURO symbol in R" question, but the code is not working for me, even though the same R code works for others who are on Ubuntu.
This is my code.
x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "\u20AC"
gsub(paste(euro , "(\\S+)|."), "\\1", x)
# ""
I think this is all about changing the locale settings, but I don't know how to do that.
I'm running rstudio on Windows 8.
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
loaded via a namespace (and not attached):
[1] tools_3.2.0
@Anada's answer is good, but we need to add that encoding parameter every time we use Unicode in a regex. Is there any way to change the default encoding to UTF-8 on Windows?
Seems to be a problem with encoding.
Consider:
x <- "services as defined in this SOW at a price of € 15,896.80 (if executed fro"
euro <- "\u20AC"
gsub(paste(euro , "(\\S+)|."), "\\1", x)
# [1] ""
gsub(paste(euro , "(\\S+)|."), "\\1", `Encoding<-`(x, "UTF-8"))
# [1] "15,896.80"
I'm trying to match the following ordered and unordered lists and extract the bullet/list point.
library(stringr)
examples <- c(
"* Bullet 1\n* Bullet 2\n* Bullet 3",
"1. Bullet 1\n2. Bullet 2\n3. Bullet 3",
"* This is a test 1\n* This is a test with some *formatting*\n* This is a test with different _formatting_"
)
What I would like to do is:
Recognize that it's a list programmatically
Parse each into just the text of the list item
The result would be
some_str_fun(example,pattern) # or multiples
"Bullet 1" "Bullet 2" "Bullet 3"
"Bullet 1" "Bullet 2" "Bullet 3"
"This is a test 1" "This is a test with some *formatting*"
"This is a test with different _formatting_"
I've been playing with the following patterns, and str_extract/match but can't seem to find something completely functional
[*]+\\s(.*?)[\n]* # for * Bullet X\n
[1-9]+[.]\\s(.*?)[\n]* # for 1. Bullet X\n
I've tried a bunch of different iterations on these patterns but can't quite seem to get what I'm looking for.
You can use strapply from the gsubfn package to match the entire pattern.
library(gsubfn)
examples <- c(
"* Bullet 1\n* Bullet 2\n* Bullet 3",
"1. Bullet 1\n2. Bullet 2\n3. Bullet 3",
"* This is a test 1\n* This is a test with some *formatting*\n* This is a test with different _formatting_"
)
strapply(examples, '(?:\\*|\\d+\\.) *([^\n]+)', c, simplify = c)
# [1] "Bullet 1"
# [2] "Bullet 2"
# [3] "Bullet 3"
# [4] "Bullet 1"
# [5] "Bullet 2"
# [6] "Bullet 3"
# [7] "This is a test 1"
# [8] "This is a test with some *formatting*"
# [9] "This is a test with different _formatting_"
This is a bit of a different approach, but if you render the markdown to HTML you can use some existing extraction methods to do what you want:
library(stringr)
examples <- c(
"* Bullet 1\n* Bullet 2\n* Bullet 3",
"1. Bullet 1\n2. Bullet 2\n3. Bullet 3",
"* This is a test 1\n* This is a test with some *formatting*\n* This is a test with different _formatting_"
)
extract_md_list <- function(md_text) {
require(rvest)
require(rmarkdown)
fil_md <- tempfile()
fil_html <- tempfile()
writeLines(md_text, con=fil_md)
render(fil_md, output_format="html_document", output_file=fil_html, quiet=TRUE)
pg <- read_html(fil_html)
ret <- html_nodes(pg, "li") %>% html_text()
# cleanup
unlink(fil_md)
unlink(fil_html)
return(ret)
}
extract_md_list(examples)
## [1] "Bullet 1"
## [2] "Bullet 2"
## [3] "Bullet 3"
## [4] "Bullet 1"
## [5] "Bullet 2"
## [6] "Bullet 3"
## [7] "This is a test 1"
## [8] "This is a test with some formatting"
## [9] "This is a test with different formatting"
Here is another option. You can wrap in unlist if desired:
str_extract_all(examples, "[^*1-9\n ]\\w+( ?[\\w*]+)*")
# or
#str_extract_all(examples, "[^*1-9\n ]\\w+( ?[a-zA-Z0-9_*]+)*")
#[[1]]
#[1] "Bullet 1" "Bullet 2" "Bullet 3"
#
#[[2]]
#[1] "Bullet 1" "Bullet 2" "Bullet 3"
#
#[[3]]
#[1] "This is a test 1"
#[2] "This is a test with some *formatting*"
#[3] "This is a test with different _formatting_"
There are several other options, particularly if you're not concerned about getting it all in a single regex or single line of code. Here's one more approach. The regex is simpler, but you end up with "", which requires the additional line:
splits <- unlist(str_split(examples, "\n|\\d+\\. |\\* "))
splits[splits != ""]
#[1] "Bullet 1"
#[2] "Bullet 2"
#[3] "Bullet 3"
#[4] "Bullet 1"
#[5] "Bullet 2"
#[6] "Bullet 3"
#[7] "This is a test 1"
#[8] "This is a test with some *formatting*"
#[9] "This is a test with different _formatting_"
I'm trying to create a list of files from a directory containing files with the following patterns:
Name_Surname_12345_noe_xy.xls
Name_Surname_12345_xy.xls
xy can be one or two characters.
Now I want a list of all files which do not contain "noe" in the filename.
I can read in only the "noe" files using
fl = list.files(pattern = "noe.+xls$", recursive=T, full.names=T)
but found no way to exclude them. Any suggestions?
Many thanks
Markus
Get all the files and then use grep to find the noe ones and subset them out:
> all
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"
[5] "Name_Surname_13245_noe_xy.xls"
> all[grep("noe_xy.xls",all,invert=TRUE)]
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_xy.xls"
always make sure you check the edge cases where all or none of the files match:
> all[grep("xls",all,invert=TRUE)]
character(0)
> all[grep("fnord",all,invert=TRUE)]
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"
[5] "Name_Surname_13245_noe_xy.xls"
Using grep with a negative index works except in these edge cases:
> all
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_noe_xy.xls" "Name_Surname_12345_xy.xls"
[5] "Name_Surname_13245_noe_xy.xls"
> all[-grep("noe_xy.xls",all)] # strip out the noe_xy.xls files
[1] "Name_Surname_123425_xy.xls" "Name_Surname_1234445_xy.xls"
[3] "Name_Surname_12345_xy.xls"
# works. Now strip out any xls files (should leave nothing)
> all[-grep("xls",all)]
character(0)
# yup, that works too. Now strip out 'fnord' files, shouldn't remove anything:
> all[-grep("fnord",all)]
character(0)
Epic fail! The reason: when the pattern matches nothing, grep returns integer(0), and all[-integer(0)] behaves like all[integer(0)], selecting zero elements instead of all of them. That is why grep(..., invert=TRUE) is the safer idiom.