Power Query (Excel): read and store a variable from a worksheet

Sorry for the poor title; I couldn't formulate a better one at the time.
What is the process:
1. Get the message id (aka offset) from a sheet.
2. Get the list of updates from Telegram based on this offset.
3. Store the last message id, which will be used as the offset in step 1 on the next run.
4. Process the updates.
The offset lets you fetch only updates received after that offset.
E.g., the first run returns 10 messages and the last message id (the offset) is 100500. Before the second run the bot receives 10 more messages, so there are 20 in total. To avoid loading all 20 messages (10 of which I have already processed), I need to pass the offset from the first run, so the API returns only the last 10.
The query runs in Power Query in Excel.
let
    // Offset is a number read from a table in Excel. Let's say it's 10.
    Offset = 10,
    // Returns a list of messages as JSON objects.
    Updates = GetUpdates(Offset),
    // This update_id will be used as the offset in the next query run.
    LastMessageId = List.Last(Updates)[update_id],
    // Map the Process function over each item in the list of update JSON objects.
    Map = List.Transform(Updates, each Process(_))
in
    Map
The issue is that I need to store/read this offset number each time the query is executed AND process the updates. Because of lazy evaluation, with the code above I can output either LastMessageId or the result of the Map step, but not both.
The question is: how can I do both things, store/load LastMessageId from Updates and process those updates?
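One idea I had (a sketch, untested against the real GetUpdates and Process functions) is to make the query return a record holding both values, and then load each field to its own worksheet table through a separate reference query:

let
    Offset = 10,
    Updates = GetUpdates(Offset),
    LastMessageId = List.Last(Updates)[update_id],
    Map = List.Transform(Updates, each Process(_))
in
    // A record keeps both results in one query; a reference query per
    // field (e.g. Source[LastMessageId]) can then be loaded to a sheet.
    [LastMessageId = LastMessageId, Processed = Map]

Each reference query is evaluated independently, though, so GetUpdates may end up being called more than once.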
Thank you.

Related

Select events with a maximum in a sliding window

I have this stream:
define stream locationStream (cell string, device string, power long);
I want to select from this stream, with a sliding window of 10 seconds, for every device, the value of the 'cell' attribute for which 'power' is the largest.
What queries should I use to get this result with Siddhi? Something like:
from locationStream#window.time(10 seconds)
select max(power), device, <cell where power = max(power)>
group by device
insert all events into cellStream
You can use the Siddhi maxByTimeWindow offered through the extrema extension; its usage is covered in the extension's documentation. You will have to use it with a partition to get the per-device max. The suggested query should look like the one below.
partition with ( device of locationStream )
begin
    from locationStream#extrema:maxByTime(power, 10 sec)
    select power, device, cell
    insert events into cellStream
end;

Long lived state with Google Dataflow

I'm just trying to get my head around the programming model here. The scenario: I'm using Pub/Sub + Dataflow to instrument analytics for a web forum. The stream of data coming from Pub/Sub looks like:
ID | TS | EventType
1 | 1 | Create
1 | 2 | Comment
2 | 2 | Create
1 | 4 | Comment
And I want to end up with a stream coming from Dataflow that looks like:
ID | TS | num_comments
1 | 1 | 0
1 | 2 | 1
2 | 2 | 0
1 | 4 | 2
I want the job that does this rollup to run as a streaming process, with new counts being emitted as new events come in. My question is: where is the idiomatic place for the job to store the state for the current topic ids and comment counts, given that topics can live for years? My current ideas are:
Write a 'current' entry for the topic id to Bigtable, and in a DoFn query the current comment count for each incoming topic id. Even as I write this I'm not a fan.
Use side inputs somehow? It seems like maybe this is the answer, but if so I'm not totally understanding how.
Set up a streaming job with a global window, with a trigger that goes off every time it gets a record, and rely on Dataflow to keep the entire pane history somewhere. (unbounded storage requirement?)
EDIT: Just to clarify, I wouldn't have any trouble implementing any of these three strategies, or a million different other ways of doing it; I'm more interested in the best way of doing it with Dataflow. What will be most resilient to failure, to having to re-process history for a backfill, etc.?
EDIT2: There is currently a bug in the Dataflow service where updates fail if they add inputs to a Flatten transform, which means you'll need to discard and rebuild any state accrued in the job if a change to the job adds something to a Flatten operation.
You should be able to use triggers and a combine to accomplish this.
PCollection<ID> comments = /* IDs from the source */;
PCollection<KV<ID, Long>> commentCounts = comments
    // Produce speculative results by triggering as data comes in.
    // Note that this won't trigger after *every* element, but it will
    // trigger relatively quickly (as the system divides incoming data
    // into work units). You could also throttle this with something
    // like:
    //   AfterProcessingTime.pastFirstElementInPane()
    //       .plusDelayOf(Duration.standardMinutes(5))
    // which will produce output every 5 minutes.
    .apply(Window.triggering(
            Repeatedly.forever(AfterPane.elementCountAtLeast(1)))
        .accumulatingFiredPanes())
    // Count the occurrences of each ID.
    .apply(Count.perElement());
// Produce an output String -- in your use case you'd want to produce
// a row and write it to the appropriate sink.
commentCounts.apply(ParDo.of(new DoFn<KV<ID, Long>, String>() {
    @Override
    public void processElement(ProcessContext c) {
        KV<ID, Long> element = c.element();
        // The pane info includes details about the pane of the window
        // being processed, including a strictly increasing index of the
        // number of panes that have been produced for the key.
        PaneInfo pane = c.pane();
        c.output(element.getKey() + " | " + pane.getIndex() + " | " + element.getValue());
    }
}));
Depending on your data, you could also read whole comments from the source, extract the ID, and then use Count.perKey() to get the counts for each ID. If you want a more complicated combination, you could look at defining a custom CombineFn and using Combine.perKey.
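For instance (a sketch, assuming a hypothetical Comment type with a getId() accessor):

// Key each comment by its topic ID, then count the values per key.
PCollection<Comment> fullComments = /* whole comments from the source */;
PCollection<KV<ID, Long>> counts = fullComments
    .apply(ParDo.of(new DoFn<Comment, KV<ID, Comment>>() {
        @Override
        public void processElement(ProcessContext c) {
            c.output(KV.of(c.element().getId(), c.element()));
        }
    }))
    .apply(Count.<ID, Comment>perKey());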
Since BigQuery does not support overwriting rows, one way to go about this is to write the events to BigQuery and query the data using COUNT:
SELECT ID, COUNT(num_comments) FROM Table GROUP BY ID;
You can also do per-window aggregations of num_comments within Dataflow before writing the entries to BigQuery; the query above will continue to work.
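For example, the per-window aggregation and write could look like this (a sketch, assuming the Dataflow SDK's BigQueryIO; the window size, table name, and schema are placeholders):

// Count comments per topic ID within fixed one-minute windows, then
// append one row per (window, ID) pair to BigQuery.
TableSchema schema = new TableSchema().setFields(Arrays.asList(
    new TableFieldSchema().setName("ID").setType("STRING"),
    new TableFieldSchema().setName("num_comments").setType("INTEGER")));
comments
    .apply(Window.<ID>into(FixedWindows.of(Duration.standardMinutes(1))))
    .apply(Count.<ID>perElement())
    .apply(ParDo.of(new DoFn<KV<ID, Long>, TableRow>() {
        @Override
        public void processElement(ProcessContext c) {
            c.output(new TableRow()
                .set("ID", c.element().getKey().toString())
                .set("num_comments", c.element().getValue()));
        }
    }))
    .apply(BigQueryIO.Write.to("project:dataset.comment_counts")
        .withSchema(schema));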

Stopping or cancelling queued keyboard commands in a program

I have a program written in Python 2.7 which takes photos of a sample from 3 different cameras when the result value is typed into the program.
The USB controller bandwidth can't handle all cameras firing at the same time, so I have to call each one individually. This causes a delay between entering the value and the previews of the pictures showing up.
During this delay, the program still accepts keyboard input, which is then processed once the photos have been taken. This is causing issues: sometimes a value is entered twice, and the duplicate is then applied to the next sample once the photos for the first one have been taken.
What I'm after is a way to disregard any queued keyboard commands whilst the program is working on the current command:
def selChange(self):
    # Disable the textbox
    self.valInput.configure(state='disabled')
    # Gather pictures from the cameras and store them in a 2D list with
    # the sample result (this takes a second or two to complete)
    self.gatherPictures()
    if not int(self.SampleList.size()) == 0:
        # Clear the textbox
        self.valInput.delete(0, END)
        # Create previews from the 2D list
        idx = int(self.SampleList.curselection()[0])
        row = self.dataList[idx]
        self.img1 = ImageTk.PhotoImage(row[2].resize((250, 250), Image.ANTIALIAS))
        self.pic1.configure(image=self.img1)
        self.img2 = ImageTk.PhotoImage(row[3].resize((250, 250), Image.ANTIALIAS))
        self.pic2.configure(image=self.img2)
        self.img3 = ImageTk.PhotoImage(row[4].resize((250, 250), Image.ANTIALIAS))
        self.pic3.configure(image=self.img3)
        self.img4 = ImageTk.PhotoImage(Image.open("Data/" + str(row[1]) + ".jpg").resize((250, 250), Image.ANTIALIAS))
        self.pic4.configure(image=self.img4)
    # Unlock the textbox ready for the next sample
    self.valInput.configure(state='normal')
I was hoping that disabling the textbox and re-enabling it afterwards would work, but it doesn't. I wanted to use buttons instead, but the users have insisted on typed input to keep things fast.
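What I'm imagining is something like this (a sketch; the busy flag and the final flush are guesses at what might work):

def selChange(self):
    # Ignore calls triggered by keystrokes queued while the previous
    # capture was still running.
    if getattr(self, 'busy', False):
        return
    self.busy = True
    self.valInput.configure(state='disabled')
    try:
        self.gatherPictures()
        # ... build the previews as above ...
    finally:
        self.valInput.configure(state='normal')
        # Let Tk deliver the queued key events now, then discard
        # whatever they typed before accepting fresh input.
        self.valInput.update()
        self.valInput.delete(0, END)
        self.busy = False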

How to store different timestamps of packets in OMNeT++

I am new to OMNeT++. I am running a simple simulation where a client sends some packets to a server. I want, for instance, to store the timestamp of the first packet sent and, later, the timestamp of the tenth packet sent. I would like to be able to store those two timestamps in two variables, timestamp_of_first_packet and timestamp_of_last_packet, kind of like
int packets_sent = 1;
cPacket *testPacket = new cPacket();
double timestamp_of_first_packet = testPacket->getTimestamp().dbl();
packets_sent++;
...
double timestamp_of_last_packet = testPacket->getTimestamp().dbl();
The aim is to calculate a time interval between the two packets, with this formula:
double time_interval = timestamp_of_last_packet - timestamp_of_first_packet;
I know that this method is wrong, because both variables store the same value.
How can I store both timestamps correctly? Thanks in advance.
You can get the current simulation time by calling simTime(). If you want some time to pass in your simulation, have your module schedule an event to itself (using scheduleAt()). Remember that your module is written in C++, so you can use all of its features (like member variables) to write clean code.
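For example, a minimal sketch (assuming a cSimpleModule subclass; the module and member names are illustrative):

#include <omnetpp.h>

using namespace omnetpp;

class Client : public cSimpleModule
{
  private:
    int packetsSent = 0;
    simtime_t firstTimestamp;  // when the 1st packet was sent
    simtime_t lastTimestamp;   // when the 10th packet was sent

  protected:
    virtual void handleMessage(cMessage *msg) override
    {
        // ... send the packet to the server here ...
        packetsSent++;
        if (packetsSent == 1)
            firstTimestamp = simTime();
        else if (packetsSent == 10) {
            lastTimestamp = simTime();
            // interval between the first and the tenth packet
            simtime_t interval = lastTimestamp - firstTimestamp;
            EV << "time interval: " << interval << "\n";
        }
    }
};

Define_Module(Client);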

Measuring concurrent loop times in Erlang

I create a ring of processes in Erlang and wish to measure both the time it takes for the first message to pass through the network and the time for the entire message series; each time the first node gets the message back, it sends another one.
Right now the first node has the following code:
receive
    stop ->
        io:format("all processes stopped!~n"),
        true;
    start ->
        statistics(runtime),
        Son ! {number, 1},
        msg(PID, Son, M, 1);
    {_, M} ->
        {Time1, _} = statistics(runtime),
        io:format("The last message has arrived after ~p!~n", [Time1 * 1000]),
        Son ! stop
end.
Of course, I start the statistics when sending the first message.
As you can see, I use Time_Since_Last_Call for the first message loop and wish to use Total_Run_Time for the entire run; the problem is that Total_Run_Time accumulates from the first time the statistics were started.
My second thought was to use another process with two receive loops, getting the times for each one, adding them up, and printing, but I'm sure Erlang can do better than that.
I guess the best way to solve this would be to somehow reset Total_Run_Time, but I couldn't find out how that could be done. Any ideas how this can be tackled?
One way to measure round-trip times would be to send a timestamp along with each message. When the first node receives a message back, it can then measure the round-trip time by subtracting that timestamp from the current time.
To calculate the total run time, I would memorize the first timestamp in the process state (or the process dictionary) and calculate the total run time when stopping the test.
Besides, given that you mention the network, are you sure that CPU time (which is what statistics(runtime) measures) is what you're after? Perhaps wall-clock time would be more appropriate.
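A minimal sketch of that idea (the message format and the use of erlang:monotonic_time/1, which measures elapsed wall-clock-style time rather than CPU time, are my own assumptions):

loop(Son, FirstStart) ->
    receive
        stop ->
            io:format("all processes stopped!~n");
        start ->
            Now = erlang:monotonic_time(microsecond),
            Son ! {number, 1, Now},     %% tag the message with its send time
            loop(Son, Now);             %% remember when the run started
        {number, N, SentAt} ->
            Now = erlang:monotonic_time(microsecond),
            RoundTrip = Now - SentAt,   %% time for this trip around the ring
            Total = Now - FirstStart,   %% time since the very first send
            io:format("round trip ~p us, total ~p us~n", [RoundTrip, Total]),
            Son ! {number, N + 1, Now},
            loop(Son, FirstStart)
    end.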