Can I reset the hidden state of an RNN between input data sets in Keras?

I am training an RNN on a large data set that consists of disparate sources. I do not want the history of one set to spill over into the next, which means I want to reset the hidden state at the end of one set before feeding in the next. How can I do that with Keras? The docs claim you can get into the low-level configuration.
What I am trying to do is reset the LSTM hidden state every time a new data set is fed in, so that no influence from the previous data set is carried forward. See the line
prevh = Hout[t-1] if t > 0 else h0
from Karpathy's simple Python implementation, line 45:
https://gist.github.com/karpathy/587454dc0146a6ae21fc
If I find the LSTM layer and call reset on it, I am worried that it would wipe out the trained weights and biases entirely, not just Hout.
Here is the training loop code:
for iteration in range(1, 10):
    for key in X_dict:
        X = X_dict[key]
        y = y_dict[key]
        history = model.fit(X, y, batch_size=batch_size, callbacks=cbks, nb_epoch=1, verbose=0)
Each turn of the loop feeds in data from a single market. That's where I'd like to reset the Hout in the LSTM.

To reset the states of your model, call .reset_states() on either a specific layer, or on your entire model. (source)
So if you have a list of datasets:
for ds in datasets:
    model.reset_states()
    model.fit(ds['inputs'], ds['targets'], ...)
Is that what you are looking for?
EDIT:
for iteration in range(1, 10):
    for key in X_dict:
        model.reset_states()  # reset the states of all the LSTMs in your network
        # model.layers[lstm_layer_index].reset_states()  # reset the states of this specific LSTM layer
        X = X_dict[key]
        y = y_dict[key]
        history = model.fit(X, y, batch_size=batch_size, callbacks=cbks, nb_epoch=1, verbose=0)
This is how you apply it.
By default, LSTMs are not stateful, which means they won't keep a hidden state after going over a sequence; the initial state when starting a new sequence is set to 0. If you set stateful=True, the layer instead keeps the last hidden state (the output) of the previous sequence to initialize itself for the corresponding sequence in the next batch, as if the sequence were continuing.
Doing model.reset_states() will just reset those last hidden states that were kept in memory to 0, just as if the sequence were starting from scratch.
If you don't trust .reset_states() to do what you expect, feel free to go to the source code.
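For a concrete picture, here is a minimal self-contained sketch (toy shapes and data, written against the modern tf.keras API rather than the older Keras in the question). It uses stateful=True so that reset_states() actually has something to clear; with the default stateful=False, state never survives past a batch in the first place:
import numpy as np
from tensorflow import keras

batch_size, timesteps, features = 4, 10, 3

model = keras.Sequential([
    keras.layers.LSTM(8, stateful=True,
                      batch_input_shape=(batch_size, timesteps, features)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Three "disparate sources" of toy data; 8 samples each (divisible by batch_size).
datasets = [(np.random.rand(8, timesteps, features), np.random.rand(8, 1))
            for _ in range(3)]

for X, y in datasets:
    model.reset_states()  # clears only the hidden/cell state; weights are untouched
    model.fit(X, y, batch_size=batch_size, epochs=1, verbose=0)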

Related

Power Query Excel. Read and store variable from worksheet

I'm sorry for this poor title, but I couldn't formulate a better one at the time.
What is the process:
1. Get the message id, a.k.a. offset, from a sheet.
2. Get the list of updates from Telegram based on this offset.
3. Store the last message id, which will be used as the offset in step 1 of the next run.
4. Process the updates.
The offset allows you to fetch only the updates received after that offset.
E.g. I get 10 messages the first time, and the last message id, a.k.a. offset, is 100500. Before the second run the bot receives another 10 messages, so there are 20 in total. To avoid loading all 20 messages (10 of which I have already processed), I need to specify the offset from the first run, so the API returns only the last 10 messages.
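For illustration only, here is the same handshake as a hedged Python sketch (the token is a placeholder and fetch_updates is an invented helper; getUpdates and its offset parameter are the actual Bot API surface):
import requests

TOKEN = "123456:ABC..."  # hypothetical bot token
API = "https://api.telegram.org/bot" + TOKEN + "/getUpdates"

def fetch_updates(offset):
    # Telegram returns only updates with update_id >= offset.
    resp = requests.get(API, params={"offset": offset})
    return resp.json()["result"]

offset = 10                      # in the PQ version this is read from the worksheet
updates = fetch_updates(offset)
if updates:
    # Persist this value so the next run skips already-processed updates.
    offset = updates[-1]["update_id"] + 1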
PQ is run in Excel.
let
    // Offset is a number read from a table in Excel. Let's say it's 10.
    Offset = 10,
    // Returns the list of messages as JSON objects.
    Updates = GetUpdates(Offset),
    // This update_id will be used as the offset in the next query run.
    LastMessageId = List.Last(Updates)[update_id],
    // Map the Process function over each item in the list of update JSON objects.
    Map = List.Transform(Updates, each Process(_))
in
    Map
The issue is that I need to store/read this offset number each time the query is executed AND process the updates.
Because of lazy evaluation, with the code above I can output either LastMessageId or the result of the Map function, but not both.
The question is: how can I do both things, i.e. store/load LastMessageId from Updates and also process those updates?
Thank you.

TopologyTestDriver with streaming groupByKey.windowedBy.reduce not working like kafka server [duplicate]

I'm trying to play with Kafka Streams to aggregate some attributes of People.
I have a Kafka Streams test like this:
val factory = new ConsumerRecordFactory[Array[Byte], Character](
    "input", new ByteArraySerializer(), new CharacterSerializer())
var i = 0
while (i != 5) {
    testDriver.pipeInput(
        factory.create("input", Character(123, 12), 15 * 10000L))
    i += 1
}
val output = testDriver.readOutput....
I'm trying to group the values by key like this:
streamBuilder.stream[Array[Byte], Character](inputKafkaTopic)
    .filter((key, _) => key == null)
    .mapValues(character => PersonInfos(character.id, character.id2, character.age)) // case class
    .groupBy((_, value) => CharacterInfos(value.id, value.id2)) // case class
    .count().toStream.print(Printed.toSysOut[CharacterInfos, Long])
When I run the code, I get this:
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 1
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 2
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 3
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 4
[KTABLE-TOSTREAM-0000000012]: CharacterInfos(123,12), 5
Why am I getting 5 rows instead of just one line with CharacterInfos and the count?
Doesn't groupBy just change the key?
If you use the TopologyTestDriver, caching is effectively disabled, and thus every input record will always produce an output record. This is by design, because caching implies non-deterministic behavior, which makes it very hard to write an actual unit test.
If you deploy the code in a real application, the behavior will be different and caching will reduce the output load -- which intermediate results you get is not defined (ie, non-deterministic); compare Michael Noll's answer.
For your unit test it should not really matter: you can either test for all output records (ie, all intermediate results), or put all output records into a key-value Map and only test for the last emitted record per key (if you don't care about the intermediate results).
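The "last record per key" idea is language-agnostic; here is a hedged Python sketch of that assertion pattern (the records are invented to mirror the five updates shown above):
# Collapse intermediate results down to the final count per key.
outputs = [("CharacterInfos(123,12)", n) for n in range(1, 6)]  # 5 intermediate updates

last_per_key = {}
for key, value in outputs:
    last_per_key[key] = value  # later records overwrite earlier ones

assert last_per_key == {"CharacterInfos(123,12)": 5}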
Furthermore, you could use the suppress() operator to get fine-grained control over which output messages you get. suppress() -- in contrast to caching -- is fully deterministic, and thus writing a unit test works well. However, note that suppress() is event-time driven, so if you stop sending new records, time does not advance and suppress() does not emit data. For unit testing this is important to consider, because you might need to send some additional "dummy" data to trigger the output you actually want to test for. For more details on suppress(), check out this blog post: https://www.confluent.io/blog/kafka-streams-take-on-watermarks-and-triggers
Update: I didn't spot the line in the example code that refers to the TopologyTestDriver in Kafka Streams. My answer below is for the 'normal' KStreams application behavior, whereas the TopologyTestDriver behaves differently. See the answer by Matthias J. Sax for the latter.
This is expected behavior. Somewhat simplified, Kafka Streams by default emits a new output record as soon as a new input record is received.
When you are aggregating (here: counting) the input data, the aggregation result is updated (and thus a new output record produced) as soon as new input is received for the aggregation.
input record 1 ---> new output record with count=1
input record 2 ---> new output record with count=2
...
input record 5 ---> new output record with count=5
What to do about it: You can reduce the number of 'intermediate' outputs by configuring the size of the so-called record caches and the commit.interval.ms parameter; see Memory Management. However, how much reduction you will see depends not only on these settings but also on the characteristics of your input data, and because of that the extent of the reduction may also vary over time (think: could be 90% in the first hour of data, 76% in the second hour, etc.). That is, the reduction process is deterministic, but the resulting amount of reduction is difficult to predict from the outside.
Note: When doing windowed aggregations (like windowed counts) you can also use the suppress() API so that the number of intermediate updates is not only reduced, but there will only ever be a single output per window. However, in your use case/code the aggregation is not windowed, so you cannot use the suppress() API.
To help you understand why the setup is this way: you must keep in mind that a streaming system generally operates on unbounded streams of data, which means the system doesn't know 'when it has received all the input data'. So even the term 'intermediate outputs' is actually misleading: at the time the second input record was received, for example, the system believes that the result of the (non-windowed) aggregation is '2' -- that is the correct result to the best of its knowledge at this point in time. It cannot predict whether (or when) another input record might arrive.
For windowed aggregations (where suppress() is supported) this is a bit easier, because the window size defines a boundary for the input data of a given window. Here, the suppress() API allows you to trade off better latency but multiple outputs per window (default behavior, suppress disabled) against higher latency but only a single output per window (suppress enabled). In the latter case, if you have 1h windows, you will not see any output for a given window until 1h later, so to speak. For some use cases this is acceptable, for others it is not.

How can I visually see the range of RSU?

Is there a way I can outline the range of an RSU in OMNeT++? I know how to set the range of the RSU, as shown in the code below, but how do I actually see that range? I want to see how the transmission behaves outside the RSU's range.
I tried adding parameters to the RSU class and the .ned file, but that does not work at all. I'm a bit lost as to what to do.
Here is how I set up the range for the RSU:
Thesis.rsu[0].appl.dataROI = 500m
Thesis.rsu[0].appl.minDistance = 0m
Thesis.rsu[0].appl.maxDistance = 500m
I just want to know whether the simulation can show the RSU's range.
You can show the maximum interference distance by setting the following property in your omnetpp.ini
*.connectionManager.drawMaxIntfDist = true
However, this shows only the maximum distance at which the signal is considered by other nodes in the network. Setting maxInterfDist to a very high value doesn't imply that the signal is successfully received at another node within that distance.

RRD Time since last non-change of counter

I have an RRD DCOUNTER, which gets its data from the water meter: a running total of units since the program that reads the meter started.
So the input might be 2,3,4,5,5,5,5,8,12,13,13,14,14,14,14,14
That means the flow is 1,1,1,0,0,0,3,4,1,0,1,0,0,0,0
I want a graph showing minutes since the last rest (zero-flow reading):
0,1,2,0,1,2,3,0,0,0,0,0,0,1,2,3,4,5
If the flow is never zero, there must be a leak.
Hopefully the graph should rise steadily from bedtime to wake-up, and from leaving for work to coming back.
Ideas?
First, set up your input data source as a COUNTER type, so that you will be storing the changes, i.e. the flow.
Now, you can define a calculated data source (for graphs etc.) that counts the time since the last zero, using something like:
IF ( flow == 0 )
THEN
    timesincerest = 0
ELSE
    timesincerest = previous value of timesincerest + 1
END
In RPN, as an rrdtool CDEF, that would be something like:
CDEF:timesincerest=flow,0,GT,PREV,STEPWIDTH,+,0,IF
(PREV is unknown at the very first step, so you may want to replace it with PREV,UN,0,PREV,IF to start the count at zero.)
This will give you a count of the number of seconds since the last reset; divide by 60 for minutes.
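As a hedged end-to-end sketch using the Python rrdtool bindings (the file name water.rrd and the data source name flow are assumptions, not from the question):
# Sketch: graph "time since last rest" via the python rrdtool bindings.
# Assumes water.rrd exists with a COUNTER data source named "flow".
import rrdtool

rrdtool.graph(
    "timesincerest.png",
    "--start", "end-24h",
    "--title", "Seconds since last zero flow",
    "DEF:flow=water.rrd:flow:AVERAGE",
    # If flow > 0, add one step's worth of seconds to the running count;
    # otherwise reset to 0. PREV is unknown on the first step, so treat it as 0.
    "CDEF:timesincerest=flow,0,GT,PREV,UN,0,PREV,IF,STEPWIDTH,+,0,IF",
    "LINE2:timesincerest#0000FF:time since rest",
)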

Stopping or cancelling queued keyboard commands in a program

I have a program written in Python 2.7 which takes photos of a sample from 3 different cameras when the result value is typed into the program.
The USB controller bandwidth can't handle all the cameras firing at the same time, so I have to call each one individually. This causes a delay between entering the value and the preview of the pictures showing up.
During this delay, the program still accepts keyboard commands, which are then processed once the photos have been taken. This is causing issues: sometimes values are entered twice, and the repeated value is then applied to the next sample after the photos for the first one have been taken.
What I'm after is a way to disregard any queued keyboard commands while the program is working on the current command:
def selChange(self):
    # Disable the textbox
    self.valInput.configure(state='disabled')
    # Gather pictures from the cameras and store them in a 2D list with the
    # sample result (this takes a second or two to complete)
    self.gatherPictures()
    if not int(self.SampleList.size()) == 0:
        # Clear textbox
        self.valInput.delete(0, END)
        # Create previews from the 2D list
        self.img1 = ImageTk.PhotoImage(self.dataList[int(self.SampleList.curselection()[0])][2].resize((250, 250), Image.ANTIALIAS))
        self.pic1.configure(image=self.img1)
        self.img2 = ImageTk.PhotoImage(self.dataList[int(self.SampleList.curselection()[0])][3].resize((250, 250), Image.ANTIALIAS))
        self.pic2.configure(image=self.img2)
        self.img3 = ImageTk.PhotoImage(self.dataList[int(self.SampleList.curselection()[0])][4].resize((250, 250), Image.ANTIALIAS))
        self.pic3.configure(image=self.img3)
        self.img4 = ImageTk.PhotoImage(Image.open("Data/" + str(self.dataList[int(self.SampleList.curselection()[0])][1]) + ".jpg").resize((250, 250), Image.ANTIALIAS))
        self.pic4.configure(image=self.img4)
    # Unlock textbox ready for the next sample
    self.valInput.configure(state='normal')
I was hoping that disabling the textbox and re-enabling it afterwards would work, but it doesn't. I wanted to use buttons, but they have insisted that values be typed, to increase speed.
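One likely explanation: a long-running handler blocks Tk's event loop, so the queued key events are only delivered after selChange() returns, by which point the Entry has already been re-enabled. Below is a minimal, hedged Python 2.7 sketch of one workaround: flush the event queue with update() while the widget is still disabled, so the stale keystrokes are delivered to a disabled widget and ignored (the sleep stands in for gatherPictures()):
import time
import Tkinter as tk  # Python 2.7, as in the question

root = tk.Tk()
entry = tk.Entry(root)
entry.pack()

def on_return(event):
    entry.configure(state='disabled')
    time.sleep(2)      # long-running work; key events pile up meanwhile
    # Process the queued events now, while the Entry is still disabled,
    # so stale keystrokes are discarded instead of being typed later.
    entry.update()
    entry.configure(state='normal')

entry.bind('<Return>', on_return)
root.mainloop()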