How to log images in middle of `train`

How to log images in middle of `train` - ray

It seems like you can only log data by return values from train. In many workflows, it might make more sense to directly save images in the middle of a train function (e.g. save images sampled by a generative model or from a vision-based MDP).
Is there a simple way to do this? One idea would be to try to find the log-directory and write to it directly, but would this have issues?

I'm guessing you're asking about using logging images in Trainable:_train().
If you're not in local mode, within the trainable, you can access the self.logdir attribute to write images to. This should automatically be sync'ed back to your head node (if you're running remotely).

Related

Django: Making sure a complex object is accessible throughout multiple view calls

for a project, I am trying to create a web-app that, among other things, allows training of machine learning agents using python libraries such as Dedupe or TensorFlow. In cases such as Dedupe, I need to provide an interface for active learning, which I currently realize through jquery based ajax calls to a view that takes and sends the necessary training data.
The problem is that I need this agent object to stay alive throughout multiple view calls and be accessible by each individual call. I have tried realizing this via the built-in cache system using Memcached, but the serialization does not seem to keep all the info intact, and while I am technically able to restore the object from the cache, this appears to break the training algorithm.
Essentially, I want to keep the object alive within the application itself (rather than an external memory store) and be able to access it from another view, but I am at a bit of a loss of how to realize this.
If someone knows the proper technique to achieve this, I would be very grateful.
Thanks in advance!

To follow up with this question, I have since realized that the behavior shown seemed to have been an effect of trying to use the result of a method call from the object loaded from cache directly in the return properties of a view. Specifically, my code looked as follows:
#model is the object loaded from cache
#this returns the wrong object (same object as on an earlier call)
return JsonResponse({"pairs": model.uncertain_pairs()})
and was changed to the following
#model is the object loaded from cache
#this returns the correct object (calls and returns the model.uncertain_pairs() method properly)
uncertain = model.uncertain_pairs()
return JsonResponse({"pairs": uncertain})
I am unsure if this specifically happens due to an implementation from Dedupe or Django side or due to Python, but this has undoubtedly fixed the issue.
To return back to the question, Django does seem to be able to properly (de-)serialize objects and their properties in cache, as long as the cache is set up properly (see Apparent bug storing large keys in django memcached which I also had to deal with)

Repository pattern: isn't getting the entire domain object bad behavior (read method)?

A repository pattern is there to abstract away the actual data source and I do see a lot of benefits in that, but a repository should not use IQueryable to prevent leaking DB information and it should always return domain objects, not DTO's or POCO's, and it is this last thing I have trouble with getting my head around.
If a repository pattern always has to return a domain object, doesn't that mean it fetches way too much data most of the times? Lets say it returns an employee domain object with forty properties and in the service and view layers consuming that object only five of those properties are actually used.
It means the database has fetched a lot of unnecessary data a pumped that across the network. Doing that with one object is hardly noticeable, but if millions of records are pushed across that way and a lot of of the data is thrown away every time, is that not considered bad behavior?
Yes, when adding or editing or deleting the object, you will use the entire object, but reading the entire object and pushing it to another layer which uses only a fraction of it is not utilizing the underline database and network in the most optimal way. What am I missing here?

There's nothing preventing you from having a separate read model (which could a separately stored projection of the domain or a query-time projection) and separating out the command and query concerns - CQRS.
If you then put something like GraphQL in front of your read side then the consumer can decide exactly what data they want from the full model down to individual field/property level.
Your commands still interact with the full domain model as before (except where it's a performance no-brainer to use set based operations).

Tensorboard logging non-tensor (numpy) information (AUC)

I would like to record in tensorboard some per-run information calculated by some python-blackbox function.
Specifically, I'm envisioning using sklearn.metrics.auc after having run sess.run().
If "auc" was actually a tensor node, life would be simple. However, the setup is more like:
stuff=sess.run()
auc=auc(stuff)
If there is a more tensorflow-onic way of doing this I am interested in that. My current setup involves creating separate train&test graphs.
If there is a way to complete the task as stated above, I am interested in that as well.

You can make a custom summary with your own data using this code:
tf.Summary(value=[tf.Summary.Value(tag="auc", simple_value=auc)]))
Then you can add that summary to the summary writer yourself. (Don't forget to add a step).

Sharing data across Sitecore pipelines

I´m trying to perform some actions in the pipeline "httpRequestBegin" only when necessary.
My processor is executed after Sitecore resolves the user (processor type="Sitecore.Pipelines.HttpRequest.UserResolver, Sitecore.Kernel" ), as i´m resolving the user too if Sitecore is not able to resolve it first.
Later, i want to add some rendering in the pipeline "insertRenderings", only if actions in the previous pipeline were executed (If i resolved the user, show a message), so i´m trying to save some "flag" in the first step, to check in the second.
My question is, where can I store that flag? I´m trying to find some kind of "per request" cache...
So far, I've tried:
The session: Wrong, it's too early, session doesn't exists yet.
Items (HttpContext.Current.Items): It doesn't work either, my item is not there on the seconds step.
So far i'm using the application cache (HttpContext.Current.Cache) with some unique key, but I don´t like this solution.
Anybody body knows a better approach to share this "flag"?

You could add a flag to the request header and then check it's existence in the latter pipelines, e.g.
// in HttpRequest pipeline
HttpContext.Current.Request.Headers.Add("CustomUserResolve", "true");
// in InsertRenderings pipeline
var customUserResolve = HttpContext.Current.Request.Headers["CustomUserResolve"];
if (Sitecore.MainUtil.GetBool(customUserResolve, false))
{
// custom logic goes here
}
This feels a little dirty, I think adding to Request.QueryString or Request.Params would been nicer but those are readonly. However, if you only need this for a one time deal (i.e. only the first time it is resolved) then it will work since in the next request the Headers are back to default without your custom header added.

HttpContext.Current.Cache or HttpRuntime.Cache could be the fastest solution here. Though this approach would not preserve data when the AppPool gets recycled.
If you add only a few keys to the cache and then maintain them, this solution might work for you. If each request puts an entry into the cache, it may eventually overflow the memory used by worker process in a long run.
As alternative to this you may try to use Sitecore.Context.ClientData property. It uses ClientDataStore that employs a database (look for clientDataStore section in the web.config file) to store data. These entries can survive the AppPool recycle.
Though if you use them a lot, it may become a bottleneck under the load when you need to write to and/or read from the entries.
If you do know that there could be a lot of entries created for sharing purposes, I'd create a scheduled task to clean up the data store from obsolete entries.

I know this is a very old question, but I just want post solution I worked around
Below will hold data per http request basis.
HttpContext.Current.Items["ModuleInfo"] = "Custom Module Info"
we can store data to httpcontext in one sitecore pipeline and retrieve in another...
https://www.codeproject.com/Articles/146455/When-Can-We-Use-HttpContext-Current-Items-to-Store

Repository Pattern

I've got a quick question regarding the use of repositories. But the best way to ask is to show a bit of pseudocode and you guys tell me what the result should be
Get a record from the repository with ID of 1 (assume it exists)
Edit a couple of properties
Query the repository again for an item with ID of 1
Result = ??
Should I get the object with updated values or the object without (original state), bearing in mind that since updating the values of properties (step 2) I have not told the repository to update this record.
I think I should get a copy of the original item and not a reference to the edited version.
Please tell me what is correct.
Cheers

The repository pattern is suppose to act like a collection of your objects, so ideally I think it should return the same object instance which would have the updates in it.
Generally there is an identity map somewhere so your repositories can keep track of what has already been loaded. With an identity map, when you fetch an object with the same Id you should always get the already loaded object back regardless of how many times. This is how all more sophisticated ORMs work and is generally a good practice. An identity map helps keep things in sync while you are in the same transaction and saves you some data access.
NHibernate's session has an identity map it keeps track of so you don't have to worry about trying to implement your own in your repositories. Also I believe you can use NHibernate's stateless session if you want to load another instance without change tracking, but I'm not positive on that.

Judging from your past questions I'm assuming you are using LINQ/C#?
If you are using a DataContext and you haven't called SubmitChanges() then you should get back the original unchanged object.
Just tested it. I was wrong, you get back the changed object.
If you set ObjectTrackingEnabled = false on the DataContext you will get the unchanged object.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to log images in middle of `train` - ray

I'm guessing you're asking about using logging images in Trainable:_train(). If you're not in local mode, within the trainable, you can access the self.logdir attribute to write images to. This should automatically be sync'ed back to your head node (if you're running remotely).

Related

Django: Making sure a complex object is accessible throughout multiple view calls

Repository pattern: isn't getting the entire domain object bad behavior (read method)?

Tensorboard logging non-tensor (numpy) information (AUC)

Sharing data across Sitecore pipelines

Repository Pattern

Categories

Resources