I'm writing some code where I need to map format over each value of a record. To save myself some duplicate writing, it would be super handy if I could rely on records having a set order. This is basically what it looks like right now:
(defrecord Pet [health max-health satiety max-satiety])
(let [{:keys [health max-health satiety max-satiety]} pet
      [h mh s ms] (mapv #(format "%.3f" (double %))
                        [health max-health satiety max-satiety])]
  ...)
Ideally, I'd like to write this using vals:
(let [[h mh s ms] (mapv #(format "%.3f" (double %)) (vals pet))]
  ...)
But I can't find any definitive sources on whether records have a guaranteed ordering when seq'd. From my testing, they seem to be ordered. I tried creating a massive record (in case records rely on a sorted collection when small):
(defrecord Order-Test [a b c d e f g h i j k l m n o p q r s t u v w x y z
                       aa bb cc dd ee ff gg hh ii jj kk ll mm nn oo pp qq rr ss tt uu vv ww xx yy zz])
(vals (apply ->Order-Test (range 52)))
=> (0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51)
And they do seem to maintain order.
Can anyone verify this?
For this exact scenario, I suppose I could have reduce-kv'd over the record and reassociated the vals, then destructured. That would have gotten pretty bulky though. I'm also curious now since I wasn't able to find anything.
As with many things in Clojure, there is no guarantee because there is no spec. If it's not in the docstring for records, you assume it at your own risk, even if it happens to be true in the current version of Clojure.
But I'd also say: that's not really what records mean, philosophically. Record fields are supposed to have individual domain semantics, and it looks like in your record they indeed do. It is a big surprise when an operation like "take the N distinctly meaningful fields of this record, and treat them all uniformly" is the right thing to do, and it deserves to be spelled out when you do it.
You can at least do what you want with a bit less redundancy:
(let [[h mh s ms] (for [k [:health :max-health, :satiety :max-satiety]]
                    ;; coerce to double so "%.3f" also accepts integer values
                    (format "%.3f" (double (get pet k))))]
  ...)
Personally I would say that you are modeling your domain wrong: you clearly have a concept of a "resource" (health and satiety) which has both a "current" and a "max" value. Those deserve to be grouped together by resource, e.g.
{:health  {:current 50 :max 80}
 :satiety {:current 3 :max 10}}
and having done that, I'd say that a pet's "set of resources" is really just a single map field, rather than N fields for the N resources it contains. Then this whole question of ordering of record fields doesn't come up at all.
I'm writing a function, inDayRange, which takes two days (a start day and an end day) and a list of event structures and produces a new list where each element in the new list is only the name of all events which occurred between the two days (including start day and end day) but in any month or year.
(struct event (name day month year xlocation ylocation) #:transparent)
(define e1 (event "new years" 1 "Jan" 2021 0 0))
(define e2 (event "valentines" 14 "Feb" 2021 2 2))
(define e3 (event "my birthday" 6 "Mar" 2021 20 20))
(define e4 (event "tyler's birthday" 10 "Sep" 2020 23 23))
(define l1 (list e1 e2 e3 e4))
(define (inDayRange start end events)
  (filter (lambda (e) (>= end (event-day e) start)) events))
I have the function written to filter what events occurred between the two days, but it returns a list of event structures. How do I produce a list of just the names of the events?
I am trying to understand the results I got for a fake dataset. I have two independent variables, hours and type, and a response, pain.
First question: How was 82.46721 calculated as the lsmeans for the first type?
Second question: Why is the standard error exactly the same (8.24003) for both types?
Third question: Why is the degrees of freedom 3 for both types?
library(lsmeans)
data = data.frame(
  type = c("A", "A", "A", "B", "B", "B"),
  hours = c(60, 72, 61, 54, 68, 66),
  # pain = c(85,95,69, 73, 29, 30)
  pain = c(85, 95, 69, 85, 95, 69)
)
model = lm(pain ~ hours + type, data = data)
lsmeans(model, c("type", "hours"))
> data
type hours pain
1 A 60 85
2 A 72 95
3 A 61 69
4 B 54 85
5 B 68 95
6 B 66 69
> lsmeans(model, c("type", "hours"))
 type hours   lsmean      SE df lower.CL upper.CL
 A     63.5 82.46721 8.24003  3 56.24376 108.6907
 B     63.5 83.53279 8.24003  3 57.30933 109.7562
Try this:
newdat <- data.frame(type = c("A", "B"), hours = c(63.5, 63.5))
predict(model, newdata = newdat)
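An important thing to note here is that your model has hours as a continuous predictor, not a factor, so lsmeans evaluates each type at the mean of hours (63.5) instead of averaging over levels of hours. The df of 3 is the model's residual degrees of freedom: 6 observations minus 3 estimated coefficients (intercept, hours, typeB).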
So I'm working on an assignment for my class in which I have to read in data from a file and create a doubly linked list with it. I have all the difficult stuff done, now I'm just running into the problem where my program throws a bunch of random characters and kills itself on the last line.
Here is the function that reads in the data and inserts it into my linked list. My professor wrote this, so to be frank, I don't understand it very well.
void PropogateTheList(ifstream & f, LinkList & ML)
{
    static PieCake_struct * record;
    record = new PieCake_struct;
    f >> record->id >> record->lname >> record->fname >> record->mi >> record->sex >> record->pORc;
    while (!f.eof())
    {
        ML.Insert(record);
        record = new PieCake_struct;
        f >> record->id >> record->lname >> record->fname >> record->mi >> record->sex >> record->pORc;
    }
}
Here is the data that is being propagated:
1 Abay Harege N O C
2 Adcock Leand R F P
3 Anderson Logan B M P
5 Bautista Gloria A F P
10 Beckett Dallas B F C
12 Ambrose Bridget C F C
13 Beekmann Marvin D M P
14 Bacaner Tate D M C
16 Bis Daniel F M P
18 Dale Kaleisa G F C
19 DaCosta Ricardo H M P
23 Adeyemo Oluwanifemi I M C
24 Berger Chelsea J F C
38 Daniels Jazmyn K F P
39 Davis Takaiyh L F C
40 DeJesus Gabriel M M P
51 Castro Floriana N F P
52 Chen Justin O M C
53 Clouden Ariel P F P
54 Conroy Cameron Q M C
61 Contreras Dominic R M P
62 Cooley Kyle S M C
63 Creighton Cara T F P
64 Cullen William U M C
66 Blakey Casey V M C
67 Barbosa Anilda W F P
83 Brecher Benjamin X M P
84 Boulos Alexandre Y F C
85 Barrios Joshua Z M C
85 Bonaventura Nash A M P
86 Bohnsack David B M C
87 Blume Jeffrey C M P
90 Burgman Megan D F C
91 Bursic Gregory E M P
92 Calvo Sajoni F F C
93 Cannan Austin G M P
94 Carballo Nicholas H M C
99 AlbarDiaz Matias I F P
Currently, I sort the data alphabetically by last name, so on about the 5th line, when it tries to print out number 99 (AlbarDiaz), it dies. If I sort the list another way, the program always messes up on whatever the last line of data is. Any help would be great!
UPDATE:
So I've tried implementing an if(!f.eof()) before inserting, but unfortunately it doesn't do anything. I deleted the last line of data, making Carballo the last person. This is what my function prints out:
****** The CheeseCake Survey ******
Id Last Name First Name MI Sex Pie/Cake
-- -------- --------- -- --- --------
2 Adcock
23 Adeyemo
12 Ambrose
3 Anderson
14 Bacaner
67 Barbosa
85 Barrios
5 Bautista
10 Beckett
13 Beekmann
24 Berger
16 Bis
66 Blakey
87 Blume
86 Bohnsack
85 Bonaventura
84 Boulos
83 Brecher
90 Burgman
91 Bursic
92 Calvo
93 Cannan
0 Carballo????NicholasA8?zL`8?zL`A8?zL`8?zL`??
Wouldn't it be better to read from the stream first, check whether the read succeeded, and only then insert the element? Testing eof() before the read tells you nothing about whether the next extraction will work, and an extraction from a stream that has already hit end-of-file simply fails, leaving the record without meaningful values. I'm writing the following code here in the edit box without a compiler, so apologies if I've made a mistake. To read more about this, see the following question:
Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?
void PropogateTheList(ifstream & f, LinkList & ML)
{
    while (true)
    {
        // Allocate a fresh record each pass; a static pointer would be initialized only once.
        PieCake_struct * record = new PieCake_struct;
        // The stream converts to false if any of the extractions failed.
        if (!(f >> record->id >> record->lname >> record->fname
                >> record->mi >> record->sex >> record->pORc))
        {
            delete record; // nothing complete was read, so discard it
            break;
        }
        ML.Insert(record);
    }
}
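For anyone who wants to try the read-then-check pattern in isolation, here is a minimal, self-contained sketch. SurveyRecord, ReadRecords and CheckReadRecords are hypothetical names, and the field types (std::string and char) are assumptions, since the question does not show how PieCake_struct is declared. It stores records in a std::vector rather than the asker's LinkList, so treat it as an illustration of the loop shape and not as a drop-in replacement.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for PieCake_struct. The real field types are not
// shown in the question, so std::string and char are assumptions.
struct SurveyRecord {
    int id = 0;
    std::string lname;
    std::string fname;
    char mi = '\0';
    char sex = '\0';
    char pORc = '\0';
};

// Reads whitespace-separated records until an extraction fails.
// The read itself is the loop condition, so a record is stored only
// when all six fields were parsed successfully.
std::vector<SurveyRecord> ReadRecords(std::istream& in) {
    std::vector<SurveyRecord> records;
    SurveyRecord r;
    while (in >> r.id >> r.lname >> r.fname >> r.mi >> r.sex >> r.pORc) {
        records.push_back(r);
    }
    return records;
}

// Usage sketch: three rows from the question's data, with no newline after
// the last row. All three are stored and nothing extra is appended.
void CheckReadRecords() {
    std::istringstream in("1 Abay Harege N O C\n"
                          "2 Adcock Leand R F P\n"
                          "99 AlbarDiaz Matias I F P");
    const std::vector<SurveyRecord> records = ReadRecords(in);
    assert(records.size() == 3);
    assert(records.back().id == 99);
    assert(records.back().lname == "AlbarDiaz");
}
```

Because the extraction is the loop condition, there is no separate eof() test to get wrong: a failed or partial read ends the loop before anything is stored.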
I'm importing a very complex .xls file that often combines multiple cells together in the variable names. After importing it into Stata, only the first cell has a variable name, and the other 3 are blank. Is it possible to write a loop to rename all the variables (which come in sets of 4)?
For instance, the variables go: Russia, B, C, D but I would like them to be named Russia_A, Russia_B, Russia_C, Russia_D. Is there a way to do this with a loop or command within Stata?
It's impossible to have blank variable names in Stata, as your own example attests. Going by the information given, your variable names come in fours, so you can loop over them. One basic technique is just to cycle over 1, 2, 3, 4 and act accordingly. This example works. If it's not what you want, please post a minimal reproducible example showing how your data differ.
clear
input Russia B C D Germany E F G France H I J
42 42 42 42 42 42 42 42 42 42 42 42
end
tokenize "A B C D"
local i = 0
foreach v of var * {
    local ++i
    if `i' == 1 local stub "`v'"
    rename `v' `stub'_``i''
    if `i' == 4 local i = 0
}
ds
Russia_A Russia_C Germany_A Germany_C France_A France_C
Russia_B Russia_D Germany_B Germany_D France_B France_D
tokenize is possibly the least familiar command here, so see its help if needed.
All that said, it's unlikely that this is a useful data structure. See help reshape.
Here's another way to do it. We set up a counter running over all the variables. This perhaps is more of a finger exercise in macro manipulation.
clear
input Russia B C D Germany E F G France H I J
42 42 42 42 42 42 42 42 42 42 42 42
end
tokenize "A B C D"
forval j = 1/4 {
    local sub`j' "``j''"
}
unab all : *
tokenize "`all'"
local J : word count `all'
forval j = 1/`J' {
    local k = mod(`j', 4)
    if `k' == 0 local k = 4
    if `k' == 1 local stub "``j''"
    * underscore added so the names match the requested Russia_A pattern
    rename ``j'' `stub'_`sub`k''
}
ds
My data are as follows:
> df2
   id calmonth         product
1 101       01           apple
2 102       01 apple&nokia&htc
3 103       01             htc
4 104       01       apple&htc
5 104       02           nokia
Now I want to count the ids whose product contains both 'apple' and 'htc' when calmonth is '01'. I need this not only for 'apple' and 'htc' but also for other pairs, such as 'apple' and 'nokia'.
So I want to do this with a function like this:
xandy=function(a,b) data.frame(product=paste(a,b,sep='&'),
                               csum=length(grep('a.*b',x=df2$product))
)
I also made a parameter list like this:
para=c('apple','htc','nokia')
But here is the problem. When I pass parameters like
xandy(para[1],para[2])
the result is as follows:
product csum
1 apple&htc 0
The result I expect is:
product csum calmonth
1 apple&htc 2 01
2 apple&htc 0 02
So what is wrong with how I pass the parameters?
And how can I add calmonth to the function xandy correctly?
FYI: this question stems from an earlier question of mine:
What's the R statement responding to SQL's 'in' statement
EDIT AFTER COMMENT
My expected result is:
product csum calmonth
1 apple&htc 2 01
2 apple&htc 0 02
My answer is another way to tackle your problem.
library(stringr)
library(plyr)  # for ddply, used in get_data below
The function contains will split up the elements of a string vector according to the split character and evaluate if all target words are contained.
contains <- function(x, target, split="&") {
  l <- str_split(x, split)
  sapply(l, function(x, y) all(y %in% x), y=target)
}
contains(df2$product, c("apple", "htc"))
[1] FALSE TRUE FALSE TRUE FALSE
The rest is just subsetting and summarizing:
get_data <- function(a, b) {
  e <- subset(df2, contains(product, c(a, b)))
  e$product2 <- paste(a, b, sep="&")
  ddply(e, .(calmonth, product2), summarise, csum=length(id))
}
Using the data below, the order of the arguments no longer matters (see comment below).
get_data("apple", "htc")
calmonth product2 csum
1 1 apple&htc 1
2 2 apple&htc 2
get_data("htc", "apple")
calmonth product2 csum
1 1 htc&apple 1
2 2 htc&apple 2
I know this is not a direct answer to your question but I find this approach quite clean.
EDIT AFTER COMMENT
The reason that you get csum=0 is simply that you are searching for the wrong regex pattern: 'a.*b' is a literal string, so it matches the letter a, then anything, then the letter b, not apple ... htc. You need to construct the pattern from the arguments, i.e. paste0(a, ".*", b).
Here is a complete solution. I would not call it beautiful code, but anyway (note that I changed the data to show that it generalizes across months).
library(plyr)
df2 <- read.table(text="
id calmonth product
101 01 apple
102 01 apple&nokia&htc
103 01 htc
104 02 apple&htc
104 02 apple&htc", header=T)
xandy <- function(a, b) {
  pattern <- paste0(a, ".*", b)
  d1 <- df2[grep(pattern, df2$product), ]
  d1$product <- paste0(a, "&", b)
  ddply(d1, .(calmonth), summarise,
        csum=length(calmonth),
        product=unique(product))
}
xandy("apple", "htc")
calmonth csum product
1 1 1 apple&htc
2 2 2 apple&htc