Should I reset running averages of Ray actor in reset_config when reuse_actors=True? - ray

Looking at the pbt_example, I see that the actor's accumulated accuracy is not reset to zero in reset_config; only the new hyperparameters are set (in particular, a new lr). While implementing my own actor, whose state includes a running average of the metric that the hyperparameter sweep should minimize, I was wondering whether or not I should reset this running average to zero in reset_config. The implications of resetting this state versus keeping it are not clear from the documentation or from browsing the source files of ray.tune.
I would appreciate some clarification on reuse_actors and reset_config in particular; the documentation is unfortunately a bit vague in this respect. Otherwise, great library and easy to get started with!

Related

How do I measure GPU time on Metal?

I want to see programmatically how much GPU time a part of my application consumes on macOS and iOS. On OpenGL and D3D I can use GPU timer query objects. I searched and couldn't find anything similar for Metal. How do I measure GPU time on Metal without using Instruments, etc.? I'm using Objective-C.
There are a couple of problems with this kind of timestamp-based measurement:
1) Most of the time, what you really want to know is the GPU-side latency within a command buffer, not the round trip to the CPU. This is better measured as the time difference between running, say, 20 instances of the shader and 10 instances, so that fixed overheads cancel out. However, that approach can add noise, since the error is the sum of the errors of the two measurements.
2) Waiting for completion causes the GPU to clock down when it stops executing. When it starts back up again, the clock is in a low-power state and may take quite a while to ramp back up, skewing your results. This can be a serious problem and may understate your benchmark performance relative to actual performance by a factor of two or more.
3) If you start the clock on scheduled and stop it on completed, but the GPU is busy running other work, then your elapsed time includes time spent on that other workload. If the GPU is not busy, you get the clock-down problems described in (2).
This problem is considerably harder to do right than most benchmarking cases I've worked with, and I have done a lot of performance measurement.
The best way to measure these things is to use on-device performance monitor counters (PMCs), as they are a direct measure of what is going on, using the machine's own notion of time. I favor counters that report cycles over wall-clock time, because that tends to weed out clock slewing, but there is no universal agreement on this. (Not all parts of the hardware run at the same frequency, etc.) I would look to the developer tools for methods to measure based on PMCs and, if you don't find them, ask for them.
You can add scheduled and completed handler blocks to a command buffer. You can take timestamps in each and compare. There's some latency, since the blocks are executed on the CPU, but it should get you close.
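To illustrate, here is a minimal sketch of that handler-based technique, written against Apple's metal-cpp C++ bindings rather than Objective-C (the MTL::HandlerFunction overloads and other names below are my assumptions about those headers; in Objective-C the equivalents are the -addScheduledHandler: and -addCompletedHandler: blocks on MTLCommandBuffer). Note that the timestamps are taken on the CPU, so they include the latency caveats described above:

#include <Metal/Metal.hpp> // metal-cpp; assumes the *_PRIVATE_IMPLEMENTATION defines are set in one translation unit
#include <chrono>
#include <cstdio>

void timeCommandBuffer(MTL::CommandQueue* queue)
{
    using Clock = std::chrono::steady_clock;
    static Clock::time_point scheduled; // static so the handler lambdas can reference it without capture

    MTL::CommandBuffer* cmdBuf = queue->commandBuffer();
    // ... encode your GPU work here ...

    MTL::HandlerFunction onScheduled = [](MTL::CommandBuffer*) {
        scheduled = Clock::now(); // scheduled, but not necessarily executing yet
    };
    MTL::HandlerFunction onCompleted = [](MTL::CommandBuffer*) {
        double ms = std::chrono::duration<double, std::milli>(Clock::now() - scheduled).count();
        std::printf("scheduled -> completed: %.3f ms (CPU-side, includes handler latency)\n", ms);
    };
    cmdBuf->addScheduledHandler(onScheduled);
    cmdBuf->addCompletedHandler(onCompleted);
    cmdBuf->commit();
}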
With Metal 2.1, Metal now provides "events", which are more like fences in other APIs. (The name MTLFence was already taken for synchronizing shared heap resources.) In particular, with MTLSharedEvent, you can encode commands that modify the event's value at particular points in the command buffer(s). Then, you can either wait for the event to reach that value or ask for a block to be executed asynchronously when it does.
That still has problems with latency, etc. (as Ian Ollmann described), but is more fine grained than command buffer scheduling and completion. In particular, as Klaas mentions in a comment, a command buffer being scheduled does not indicate that it has started executing. You could put commands to set an event's value at the beginning and (with a different value) at the end of a sequence of commands, and those would only notify at actual execution time.
Finally, on iOS 10.3+ but not macOS, MTLCommandBuffer has two properties, GPUStartTime and GPUEndTime, with which you can determine how much time a command buffer took to execute on the GPU. This should not be subject to latency in the same way as the other techniques.
As an addition to Ken's answer above, GPUStartTime and GPUEndTime are now available on macOS too (10.15+):
https://developer.apple.com/documentation/metal/mtlcommandbuffer/1639926-gpuendtime?language=objc
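A corresponding sketch, again assuming metal-cpp bindings (the accessor names below simply mirror the Objective-C property names, which is an assumption about the C++ headers):

#include <Metal/Metal.hpp>

// Returns how long the command buffer executed on the GPU, in seconds.
double gpuSeconds(MTL::CommandBuffer* cmdBuf)
{
    cmdBuf->waitUntilCompleted(); // the GPU timestamps are only valid after completion
    return cmdBuf->GPUEndTime() - cmdBuf->GPUStartTime(); // CFTimeInterval values
}

Waiting for completion re-introduces the clock-down caveat from Ian's answer, but the GPU timestamps themselves cover only the actual execution of this command buffer, not scheduling or handler latency.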

What is the proper way to calculate latency in omnet++?

I have written a simulation module. For measuring latency, I am using this:
simTime().dbl() - tempLinkLayerFrame->getCreationTime().dbl();
Is this the proper way? If not, please suggest one; a sample code snippet would be very helpful.
Also, is the latency computed from simTime() the actual latency in terms of microseconds, which I can write in my research paper, or do I need to scale it up?
Also, I found that the channel data rate and channel delay have no impact on the link latency; instead, the latency varies only when I vary the trigger duration. For example:
timer = new cMessage("SelfTimer");
scheduleAt(simTime() + 0.000000000249, timer);
If this is not the proper way to trigger a simple module recursively, then please suggest one.
Assuming both simTime and getCreationTime use the OMNeT++ class for representing time, you can operate on them directly, because that class overloads the relevant operators. Going with what the manual says, I'd recommend using a signal for the measurements (e.g., emit(latencySignal, simTime() - tempLinkLayerFrame->getCreationTime());).
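For illustration, a minimal sketch of such a module (the names LatencySink and "latency" are made up for this example, and the signal also needs a matching declaration in the module's NED file, e.g. @signal[latency](type=simtime_t);):

#include <omnetpp.h>
using namespace omnetpp;

class LatencySink : public cSimpleModule
{
    simsignal_t latencySignal;
protected:
    virtual void initialize() override {
        latencySignal = registerSignal("latency");
    }
    virtual void handleMessage(cMessage *msg) override {
        // Creation-to-delivery latency, kept as simtime_t (in seconds).
        emit(latencySignal, simTime() - msg->getCreationTime());
        delete msg;
    }
};

Define_Module(LatencySink);

Recorded this way, the value stays a simtime_t until you explicitly convert it, avoiding the premature .dbl() conversions in the original snippet.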
simTime() is in seconds, not microseconds.
Regarding your last question, this code will have problems if you use it for all nodes, and you start all those nodes at the same time in the simulation. In that case you'll have perfect synchronization of all nodes, meaning you'll only see collisions in the first transmission. Therefore, it's probably a good idea to add a random jitter to every newly scheduled message at the start of your simulation.
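Following the snippet from the question, a sketch of that jitter might look like this (uniform() is the RNG helper every module inherits; the 1 ms bound and the interval variable are placeholders for your own values):

// In initialize(): stagger each node's first self-message randomly
timer = new cMessage("SelfTimer");
scheduleAt(simTime() + uniform(0, 0.001), timer);

// In handleMessage(): reschedule at the fixed interval afterwards
scheduleAt(simTime() + interval, timer);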

Does the Zookeeper Watches system have a bug, or is this a limitation of the CAP theorem?

The Zookeeper Watches documentation states:
"A client will see a watch event for a znode it is watching before seeing the new data that corresponds to that znode." Furthermore, "Because watches are one time triggers and there is latency between getting the event and sending a new request to get a watch you cannot reliably see every change that happens to a node in ZooKeeper."
The point is, there is no guarantee you'll get a watch notification.
This is important because, in a system like Clojure's Avout, you're trying to mimic Clojure's software transactional memory over the network using ZooKeeper, and this relies on there being a watch notification for every change.
Now I'm trying to work out whether this is a coding flaw or a fundamental computer-science limitation (i.e., the CAP theorem).
My question is: Does the Zookeeper Watches system have a bug, or is this a limitation of the CAP theorem?
This seems to be a limitation in the way ZooKeeper implements watches, not a limitation of the CAP theorem. There is an open feature request to add continuous watch to ZooKeeper: https://issues.apache.org/jira/browse/ZOOKEEPER-1416.
etcd has a watch function that uses long polling. The limitation you need to account for is that multiple events may happen between receiving the first long-poll result and re-polling. This is roughly analogous to the issue with ZooKeeper. However, they have a solution:
However, the watch command can do more than this. Using the index [passing the last index we've seen], we can watch for commands that have happened in the past. This is useful for ensuring you don't miss events between watch commands.
curl -L 'http://127.0.0.1:4001/v2/keys/foo?wait=true&waitIndex=7'
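To make the pattern concrete, here is a rough C++/libcurl sketch of a watch loop that avoids missing events by passing waitIndex (everything here is illustrative: it targets the old etcd v2 HTTP API from the quote above, and the string-based index extraction stands in for a real JSON parser):

#include <curl/curl.h>
#include <cstdlib>
#include <iostream>
#include <string>

static size_t collect(char* data, size_t size, size_t nmemb, void* out)
{
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

int main()
{
    long long nextIndex = 0; // 0 = no index yet: just wait for the next change
    for (;;) {
        std::string url = "http://127.0.0.1:4001/v2/keys/foo?wait=true";
        if (nextIndex > 0)
            url += "&waitIndex=" + std::to_string(nextIndex);

        std::string body;
        CURL* curl = curl_easy_init();
        curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
        curl_easy_perform(curl); // long poll: blocks until the watch fires
        curl_easy_cleanup(curl);

        std::cout << body << std::endl;

        // Resume from the event after the one we just saw, so nothing is missed.
        size_t pos = body.find("\"modifiedIndex\":");
        if (pos != std::string::npos)
            nextIndex = std::atoll(body.c_str() + pos + 16) + 1;
    }
}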

Restart C++ Concurrency Agent?

Can I restart a Concurrency Runtime agent object after it has done its work?
Short answer is No.
If you look at the life cycle described here, you'll see the following:
Agents have a set life cycle. The concurrency::agent_status enumeration defines the various states of an agent. The following illustration is a state diagram that shows how agents progress from one state to another. In this illustration, solid lines represent methods that you call from your application; dotted lines represent methods that are called from the runtime.
This shows clearly that once your agent has entered the done or cancelled state, there's no way back.
Also, if you look at the agent::start documentation, you see this:
Moves an agent from the agent_created state to the agent_runnable state, and schedules it for execution.
and this:
An agent that has been canceled cannot be started.
Although this doesn't mention the done state, I've found from experience that once it's done, it's done. The state sequence diagram shows a one-way trip for all paths.
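In practice, the workaround is simply to construct a fresh agent instance whenever you need to run the work again. A minimal sketch (my_agent and its body are made up for illustration):

#include <agents.h>
#include <iostream>

class my_agent : public concurrency::agent
{
protected:
    void run() override
    {
        std::cout << "doing the work" << std::endl;
        done(); // enters the done state; there is no way back from here
    }
};

int main()
{
    my_agent first;
    first.start();
    concurrency::agent::wait(&first); // first is finished for good

    my_agent second; // "restarting" = constructing a new instance
    second.start();
    concurrency::agent::wait(&second);
}

Since all the state you care about lives in your derived class, constructing a new instance is cheap and sidesteps the one-way life cycle entirely.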

Methodology for debugging serial poll

I'm reading values from a sensor via serial (accelerometer values), in a loop similar to this:
while ( 1 ) {
    vector values = getAccelerometerValues();
    // Calculate velocity
    // Calculate total displacement
    if ( displacement == 0 )
        print("Back at origin");
}
I know the time it takes per sample, which is taken care of inside getAccelerometerValues(), so I have a time period from which to calculate velocity, displacement, etc. I sample at approximately 120 samples/second.
This works, but there are issues (imprecise accelerometer values, floating-point errors, etc.), and calibrating and compensating to get reasonably accurate displacement values is proving difficult.
I'm having a great amount of trouble finding a process for debugging the loop. If I use a debugger (my code happens to be written in C++, and I'm slowly learning to use gdb rather than print statements), I have trouble stepping through while pushing the sensor around so that it produces a meaningful accelerometer reading at exactly the point in time that the debugger executes the line. That is, it's very difficult to get the timing of "continue to next line" and "push the sensor so it's accelerating" right.
I can use lots of print statements, which tend to fly past on-screen, but given the number of samples this gets tedious, and it is difficult to work out where the problems are, particularly if there's more than one print statement per loop tick.
I can decrease the number of samples, which improves the readability of the program's output, but drastically decreases the reliability of the acceleration values I poll from the sensor; if I poll at 1 Hz, the chances of polling the accelerometer while it's actually undergoing acceleration drop considerably.
Overall, I'm having trouble stepping through the code with realistic data: I can step through it with non-useful data, or I can run it with better data but not really step through it.
I'm assuming print statements aren't the best debugging method for this scenario. What's a better option? Are there any kinds of resources that I would find useful (am I missing something with gdb, or are there other tools that I could use)? I'm struggling to develop a methodology to debug this.
A sensible approach would be to use some interface for getAccelerometerValues() that you can substitute at either runtime or build-time, such as passing in a base-class pointer with a virtual method to override in the concrete derived class.
I can describe the mechanism in more detail if you need, but the ideal is to be able to run the same loop (see the sketch after this list) against:
real live data
real live data (and save it to file)
canned real data saved from a previous run
fake data you cooked up as a test case
Note in particular that the 'replay' versions should be easy to debug, since each call just returns the next data from the file.
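A minimal sketch of that setup (every name here is made up for illustration; the live serial-port implementation and error handling are left out):

#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

struct AccelSample { double x, y, z; };

class AccelerometerSource {
public:
    virtual ~AccelerometerSource() {}
    virtual AccelSample next() = 0; // one sample per call, paced like the real sensor
};

// Wraps the live source and logs every sample for later replay.
class RecordingSource : public AccelerometerSource {
public:
    RecordingSource(AccelerometerSource& real, const std::string& path)
        : real_(real), log_(path) {}
    AccelSample next() override {
        AccelSample s = real_.next();
        log_ << s.x << ' ' << s.y << ' ' << s.z << '\n';
        return s;
    }
private:
    AccelerometerSource& real_;
    std::ofstream log_;
};

// Replays a recorded run; fully deterministic, so easy to step through in gdb.
class ReplaySource : public AccelerometerSource {
public:
    explicit ReplaySource(const std::string& path) : in_(path) {}
    AccelSample next() override {
        AccelSample s{};
        in_ >> s.x >> s.y >> s.z; // yields zeros once the file runs out
        return s;
    }
private:
    std::ifstream in_;
};

// Scripted data for hand-built test cases.
class FakeSource : public AccelerometerSource {
public:
    explicit FakeSource(std::vector<AccelSample> samples)
        : samples_(std::move(samples)) {}
    AccelSample next() override {
        return i_ < samples_.size() ? samples_[i_++] : AccelSample{};
    }
private:
    std::vector<AccelSample> samples_;
    std::size_t i_ = 0;
};

The main loop then takes an AccelerometerSource& instead of calling getAccelerometerValues() directly, so live, recorded, replayed, and hand-made data all exercise exactly the same code path.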
Create if blocks for the exact conditions you want to debug. For example, if you only care about when the accelerometer reads that you are moving left:
if (movingLeft(values)) {
    print("left");
}
The usual solution to this problem is recording. You capture sample sequences from your sensor in real time and store them in files. Then you train your system, debug your code, etc., using the recorded data. Finally, you connect the working code to the real data stream flowing directly from the sensor.
I would debug your code with fake (i.e., random) values before anything else.
If the computations work as expected, then I would use the values read from the port.
Also, isn't there a way to read those values in a callback/push fashion, that is, to have your function called only when there's new, reliable data?
edit: I don't know what libraries you are using, but in the .NET framework you can use the SerialPort class with its DataReceived event. That way you are sure to be using the most recent, reliable data.