rnd from a local variable - list

I am trying to write code in which the probability of a collaboration being successful increases the more similar the agents are. On each run, a local variable computes the difference between agents and creates a link between them if that difference does not exceed a threshold. The agents are subsequently stored in two separate lists based on whether a link was created or not. Now, I would like to create another list that contains a successful vs. unsuccessful value for each agent with whom a link was created. The probability of that value being 'successful' increases the more similar the agents are (the more their difference value approaches 0).
The closest I've come to implementing this is the rnd extension. However, it seems to me that rnd:weighted-one-of only takes agentsets or lists as input, and I do not have a predefined list for the similarities of agents. It is the whole range from 0 (complete similarity) to 1 (complete dissimilarity) that I would like the local variable to be compared against. Is this possible in the way I'm currently thinking about it?
let difference 0
let initiator one-of turtles
ask initiator [
  let potential one-of other turtles
  if random-float 1.0 <= my-activation [              ;; different probability of initiating collab
    set difference my-profile - [my-profile] of potential
  ]
  ifelse difference <= threshold [                    ;; if threshold is met
    create-link-with potential                        ;; link is initiated
    set collaborators fput potential collaborators    ;; the initiator adds the potential to their
  ] [                                                 ;; list of either collaborators or failures
    set failures fput potential failures
  ]
]
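
What I have in mind would be something like the following sketch (the list name outcomes is a placeholder), but I'm not sure this is the right approach:

let p-success 1 - difference                ;; success probability rises as difference approaches 0
ifelse random-float 1.0 < p-success [
  set outcomes fput "successful" outcomes
] [
  set outcomes fput "unsuccessful" outcomes
]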

I recommend considering logistic functions as a way to relate similarity to probability. A logistic function is an "s"-shaped function that ranges from 0 to 1 as some X value (e.g., similarity) varies over a wide range. You can define the logistic function by assuming, for example, what values of similarity produce probabilities of 10% and 90%.
There is a complete discussion of using and programming logistic functions in NetLogo models in Chapter 16 of Railsback and Grimm (2019), "Agent-Based and Individual-Based Modeling".
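For concreteness, here is a minimal NetLogo sketch of such a logistic reporter (this is not the book's code; the anchor points below, 90% success at a difference of 0.1 and 10% success at a difference of 0.9, are illustrative assumptions):

to-report success-probability [diff]
  ;; logistic curve through two assumed anchor points:
  ;; diff = 0.1 -> 90% success, diff = 0.9 -> 10% success
  let x90 0.1
  let x10 0.9
  let b ((ln (0.9 / 0.1)) - (ln (0.1 / 0.9))) / (x90 - x10)
  let a (ln (0.1 / 0.9)) - (b * x10)
  report (exp (a + b * diff)) / (1 + exp (a + b * diff))
end

The success test then becomes something like ifelse random-float 1.0 < success-probability difference [ ... ] [ ... ].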

Related

Linear Programming - Re-setting a variable based on its cumulative count

Detailed business problem:
I'm trying to solve a production scheduling business problem as below:
I have two plants producing FG A and B respectively.
Both products consume the same Raw Material x.
I need to create a 30-day production schedule based on the Raw Material availability.
FG A and B can be produced if there is sufficient raw material available on the day.
After every 6 days of production the plant has to undergo maintenance and the production on that day will be zero.
The objective is to maximize the margin given the day-level Raw Material availability while adhering to the production constraint (i.e., shutdown after every 6th day).
I need to build a linear program to address the problem below:
Variable y: (binary)
Variable z: cumulative sum of y
When z > 6, then y = 0; I also need to reset the cumulation of z after this point.
Desired output:
How can I express this statement as a MILP constraint? Are there any techniques for solving this problem? Thank you.
I think you can model your maintenance differently. Just forbid any sequences of 7 ones for y. I.e.
y[t-6]+y[t-5]+y[t-4]+y[t-3]+y[t-2]+y[t-1]+y[t] <= 6 for t=1,..,T
This is easier than using your accumulator. Note that the beginning needs some attention: you can use historic data for this. I.e., at t=1, the values for t=0,-1,-2,.. are known.
Your accumulator approach is not inherently wrong. We often use it to model inventory. An inventory capacity is a restriction on how large the accumulated inventory can be.
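To make the boundary handling at the start concrete (a worked instance of the constraint above, assuming the values before t=1 are known historic data, or set to 0 if the plant has no history): at t=1 the constraint reads
y[-5]+y[-4]+y[-3]+y[-2]+y[-1]+y[0]+y[1] <= 6
where y[0], y[-1], ..., y[-5] are fixed constants rather than decision variables, so only y[1] remains free in this row.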

Netlogo - Creating a global mean for a value that is sometimes not existing

I'm trying to measure the mean value of agents that are performing a certain activity. To calculate this value, I tried to use: set mean-powerdemand mean [ powerdemand ] of agents with [ powerdemand > 0 ]
To give some context: my model concerns charging electric cars. I want to measure the power demand of agents that are actually charging their car and thus have a powerdemand > 0. I want to exclude the non-charging agents from this mean calculation, as they would bring down the mean value. However, as there are moments in time (specifically at the start of the run) where no cars are charging, I am getting the error: Can't find mean of a list with no numbers: []
Does someone know a way to work around calculations with agents-own variables and/or agentsets that are sometimes empty?
I started off by calculating this value in a plot using the same code. That does not stop the model from running, but since I want to use it as a reporter in the BehaviorSpace set-up, I want to make it a global value.
Thanks.
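
A common workaround (a minimal sketch, not part of the original thread; the fallback value of 0 for ticks with no charging cars is an assumption you may want to change) is to guard the calculation with any?:

ifelse any? agents with [ powerdemand > 0 ]
  [ set mean-powerdemand mean [ powerdemand ] of agents with [ powerdemand > 0 ] ]
  [ set mean-powerdemand 0 ]   ;; assumed fallback when no cars are charging

This keeps mean-powerdemand defined as a global at every tick, so it can be used directly as a BehaviorSpace reporter.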

NETLOGO: turtles-own lists too big for GUI

In Netlogo, I have turtles-own lists, which means I set a turtle's variable to be a list. Each tick, another value is added to the list. After a few thousand ticks, these lists are quite long... and the problem arises that I can't open the agent monitor in the GUI any more because it takes too long to load the list.
reproducible code:
breed [persons person]
turtles-own [examplelist]

to setup
  clear-all
  reset-ticks
  create-persons 1 [ setxy 0 0 ]
  ask turtles [ set examplelist [] ]
end

to go
  ask turtles [ set examplelist lput ticks examplelist ]
  tick
end
I need the agent monitor to watch another turtles-own variable; I don't need to watch the lists (they are just used to do a calculation every 8760 ticks).
Is there maybe a possibility to, e.g., hide the list from the agent monitor? Or do I need to handle the lists as global variables instead? That would be quite unhandy, as I would need to create and name a separate list for every turtle...
I can see three options:
1/ If you are creating a modelling framework, I assume that your user cannot actually code in NetLogo. This means that you have to predefine the scenarios for them anyway (for example, they could choose the calculation), so you only need to have the possible calculations stored instead of all the input values to those calculations.
2/ It is not clear from your question why any user would open an inspect window or otherwise access the individual turtle. If the user doesn't need it directly, instead of adding all this information to the turtles, you could export it to a file, adding a line each tick. The user would do the analysis of the simulation in R or Excel or whatever.
3/ You could create a shadow turtle for every turtle. This is not something I would recommend, but the idea is that the shadow turtle has a subset of variables (not the lists), and the variable values it does have are identical to those of the turtle it is shadowing. The limited-variable version of the turtle is the one that would be accessible to the monitor.
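
As an illustration of option 2, here is a minimal sketch that writes one line per turtle per tick to a file instead of growing turtles-own lists (the file name "turtle-log.csv" and the logged values are placeholders, not taken from the question's model):

to log-tick-data
  file-open "turtle-log.csv"                            ;; appends if the file already exists
  ask turtles [
    file-print (word ticks "," who "," xcor "," ycor)   ;; one CSV line per turtle per tick
  ]
  file-close
end

You would call log-tick-data once per tick from go and do the every-8760-ticks calculation (or any other analysis) on the file afterwards, in R or Excel or whatever.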

Imbalance between errors in data summary and tree visualization in Weka

I tried to run a simple classification on the iris.arff dataset in Weka, using the J48 algorithm. I used cross-validation with 10 folds and - if I'm not wrong - all the default settings for J48.
The result is a 96% accuracy with 6 incorrectly classified instances.
Here's my question: according to this, the second number in the tree visualization is the number of wrongly classified instances in each leaf, but then why is their sum 3 and not 6?
EDIT: running the algorithm with different test options, I obtain different results in terms of accuracy (and therefore number of errors), but when I visualize the tree I always get the same tree with the same 3 errors. I still can't explain why.
The second number in the tree visualization is not the number of the wrongly classified instances in each leaf - it's the total weight of those wrongly classified instances.
Did you, by any chance, weight some of those instances with 0.5 instead of 1?
Another option is that you are actually running two different models. One uses the full training set to build the classifier (classifier.buildClassifier(instances)) and the other runs cross-validation (eval.crossValidateModel(...)) with 10 train/test folds. The first model produces the visualised tree with fewer errors (larger training set), while the second model from CV produces the output statistics with more errors. This would explain why you get different stats when changing the test options but still the same tree, which is built on the full set.
For the record: if you train (and visualise) the tree with the full dataset, you will appear to have fewer errors, but your model will actually be overfitted and the obtained performance measures will probably not be realistic. As such, your results from CV are much more useful, and you should visualise the tree from that model.

WEKA cross validation discretization

I'm trying to improve the accuracy of my WEKA model by applying an unsupervised discretize filter. I need to decide on the number of bins and whether equal-frequency binning should be used. Normally, I would optimize this using a training set.
However, how do I determine the bin size and whether equal-frequency binning should be used when using cross-validation? My initial idea was to use the accuracy result of the classifier in multiple cross-validation tests to find the optimal bin size. However, isn't it wrong, despite using cross-validation, to use this same set to also test the accuracy of the model, because I then have an overfitted model? What would be a correct way of determining the bin sizes?
I also tried the supervised discretize filter to determine the bin sizes; however, this results in only a single bin. Does this mean that my data is too random and therefore cannot be clustered into multiple bins?
Yes, you are correct in both your idea and your concerns for the first issue.
What you are trying to do is Parameter Optimization. This term is usually used when you try to optimize the parameters of your classifier, e.g., the number of trees for the Random Forest or the C parameter for SVMs. But you can apply it as well to pre-processing steps and filters.
What you have to do in this case is nested cross-validation. (You should check https://stats.stackexchange.com/ for more information.) It is important that the final classifier, including all pre-processing steps like binning and such, has never seen the test set, only the training set. This is the outer cross-validation.
For each fold of the outer cross-validation, you need to do an inner cross-validation on the training set to determine the optimal parameters for your model.
I'll try to "visualize" it with a simple 2-fold cross-validation:
Data set
########################################

Split for outer cross-validation (2-fold)
####################  ####################
     training set           test set

Split for inner cross-validation
##########  ##########
 training      test

Evaluate parameters
##########  ##########
build with    evaluated
bin size  5    acc 70%
bin size 10    acc 80%
bin size 20    acc 75%
...
=> optimal bin size: 10

Outer cross-validation (2-fold)
####################  ####################
     training set           test set
  apply bin size 10
    train model            evaluate model
Parameter optimization can be computationally very expensive. If you have 3 parameters with 10 possible values each, that makes 10 x 10 x 10 = 1000 parameter combinations you need to evaluate for each outer fold.
This is a topic of machine learning in itself, because you can do everything from naive grid search to evolutionary search here. Sometimes you can use heuristics. But you need to do some kind of parameter optimization every time.
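As a rough worked example (the fold counts here are illustrative, not from the question): with a 2-fold outer split, a 10-fold inner cross-validation, and 1000 parameter combinations, the inner loops alone require 2 x 10 x 1000 = 20,000 model fits before the 2 final outer models are trained.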
As for your second question: This is really hard to tell without seeing your data. But you should post that as a separate question anyway.