Ray: restore checkpoint in RLlib

Ray saves a bunch of checkpoints during a call to agent.train(). How do I know which checkpoint holds the best agent, so I can load that one?
Is there any function like tune-analysis-output.get_best_checkpoint(path, mode="max") to explore the different loading possibilities across the checkpoints?

As answered in https://discuss.ray.io/t/ray-restore-checkpoint-in-rllib/3186/2, you can use:
analysis = tune.Analysis(experiment_path) # can also be the result of `tune.run()`
trial_logdir = analysis.get_best_logdir(metric="metric", mode="max") # Can also just specify trial dir directly
checkpoints = analysis.get_trial_checkpoints_paths(trial_logdir) # Returns tuples of (logdir, metric)
best_checkpoint = analysis.get_best_checkpoint(trial_logdir, metric="metric", mode="max")
See https://docs.ray.io/en/master/tune/api_docs/analysis.html#id1

analysis = tune.run(
    "A2C",
    name=model_name,
    config=config,
    ...,
    checkpoint_freq=5,
    checkpoint_at_end=True,
    restore=best_checkpoint  # optional: resume from the best checkpoint of a previous run
)
trial_logdir = analysis.get_best_logdir(metric="episode_reward_mean", mode="max")
best_checkpoint = analysis.get_best_checkpoint(trial_logdir, metric="episode_reward_mean", mode="max")
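To then evaluate the best agent outside of Tune, a minimal sketch (assuming the older Ray 1.x RLlib API, where A2C lives under ray.rllib.agents.a3c, and that config contains an "env" key as in the tune.run call above):
import ray
from ray.rllib.agents.a3c import A2CTrainer

ray.init(ignore_reinit_error=True)
agent = A2CTrainer(config=config)  # same config as passed to tune.run
agent.restore(best_checkpoint)     # load weights from the best checkpoint path
# the restored agent can now be queried for actions, e.g.:
# action = agent.compute_action(observation)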

Related

How to update GCP Scheduler Jobs with Python

I'm working on a project to automate updates to Cloud Scheduler jobs with Python.
I already wrote the logic in Python, but I'm facing one problem: it looks like updating a Cloud Scheduler job with Python is similar to creating one, in that you have to pass most of the job's properties in the code. That is the problem; I only want to update the retry_config, nothing else. I want to leave the schedule and the target as they are, so I don't have to pass those again every time.
Of course I could fetch the job's current schedule and target using another class, such as GetJobRequest, so that wouldn't be a problem, but I wish I didn't have to, since I don't want to update those fields.
Help?
from google.cloud import scheduler_v1
from google.protobuf import duration_pb2
client = scheduler_v1.CloudSchedulerClient()
retry_config = scheduler_v1.RetryConfig()
retry_config.retry_count = 4
retry_config.max_doublings = 4
retry_config.min_backoff_duration = duration_pb2.Duration(seconds=5)
retry_config.max_backoff_duration = duration_pb2.Duration(seconds=60)
job = scheduler_v1.Job()
job.name = f"projects/{PROJECT_ID}/locations/{DATAFLOW_REGION}/jobs/test"
job.retry_config = retry_config
job.schedule = "* * * * 1"
method = scheduler_v1.HttpMethod(2)
target = scheduler_v1.HttpTarget()
target.uri = "https://xxxx"
target.http_method = method
job.http_target = target
request = scheduler_v1.UpdateJobRequest(
    job=job
)
response = client.update_job(request=request)
print(response)
It is possible to specify the properties that need to be changed using the update_mask parameter.
The final code will be as follows:
from google.cloud import scheduler_v1
from google.protobuf import duration_pb2, field_mask_pb2
client = scheduler_v1.CloudSchedulerClient()
retry_config = scheduler_v1.RetryConfig()
retry_config.retry_count = 4
retry_config.max_doublings = 4
retry_config.min_backoff_duration = duration_pb2.Duration(seconds=5)
retry_config.max_backoff_duration = duration_pb2.Duration(seconds=60)
job = scheduler_v1.Job()
job.name = f"projects/{PROJECT_ID}/locations/{DATAFLOW_REGION}/jobs/test"
job.retry_config = retry_config
update_mask = field_mask_pb2.FieldMask(paths=['retry_config'])
request = scheduler_v1.UpdateJobRequest(
    job=job,
    update_mask=update_mask
)
response = client.update_job(request=request)
print(response)
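If you later need to update more than one field, the same pattern extends by listing additional paths in the mask. A small sketch reusing the objects above (the new schedule value is just an example):
# update both the retry policy and the schedule in one call
job.schedule = "*/10 * * * *"
update_mask = field_mask_pb2.FieldMask(paths=['retry_config', 'schedule'])
request = scheduler_v1.UpdateJobRequest(job=job, update_mask=update_mask)
response = client.update_job(request=request)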

AWS LEX: Slot update, intent update and then new publishing bot through a Lambda function

I am writing a Lambda function that has an array of words that I want to put into a slotType, updating it every time. Here is how it goes: initially, the slotType has the values ['car', 'bus']. The next time I run the Lambda function, the values get updated to ['car', 'bus', 'train', 'flight'], which is the result of appending a new array to the old one.
I want to know how to publish the bot every time the Lambda function is invoked, so that the next time I hit the Lex bot from the front end it uses the latest slotType in the intent and a newly published bot alias. Yes, the alias too!
I know for a fact that put_slot_type() is working, because the slot is getting updated in the bot.
Here is the function, which takes the new labels as a parameter.
import boto3

def lex_extend_slots(new_labels):
    print('entering lex model...')
    lex = boto3.client('lex-models')
    slot_name = 'keysDb'
    intent_name = 'searchKeys'
    bot_name = 'photosBot'
    res = lex.get_slot_type(
        name=slot_name,
        version='$LATEST'
    )
    current_labels = res['enumerationValues']
    latest_checksum = res['checksum']
    arr = [x['value'] for x in current_labels]
    labels = arr + new_labels
    print('arr: ', arr)
    print('new_labels: ', new_labels)
    print('labels in lex: ', labels)
    labels = list(set(labels))
    enumerationList = [{'value': label, 'synonyms': []} for label in labels]
    print('getting ready to push enum..: ', enumerationList)
    res_slot = lex.put_slot_type(
        name=slot_name,
        description='updated slots...',
        enumerationValues=enumerationList,
        valueSelectionStrategy='TOP_RESOLUTION',
    )
    res_build_intent = lex.create_intent_version(
        name=intent_name
    )
    res_build_bot = lex.create_bot_version(
        name=bot_name,
        checksum=latest_checksum
    )
    return current_labels
It looks like you're using Version 1 of the Lex Models API on Boto3.
You can use the put_bot method of the lex-models client to create or update your Lex bot. put_bot expects the full list of intents to be used for building the bot.
It is worth mentioning that you will first need to use put_intent to update your intents so that they use the latest version of your updated slotType. Here's the documentation for put_intent.
The methods for creating and updating aliases are covered in the same documentation linked above.
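A rough sketch of that flow with the boto3 lex-models (V1) client, reusing the intent, bot, and slot names from the question (the 'prod' alias is hypothetical, and note that put_bot builds asynchronously, so in practice you may need to poll get_bot until the status is READY before creating a version):
import boto3

lex = boto3.client('lex-models')

# 1. Re-save the intent so it picks up the latest slotType, then version it.
intent = lex.get_intent(name='searchKeys', version='$LATEST')
lex.put_intent(
    name='searchKeys',
    checksum=intent['checksum'],
    slots=intent['slots'],
    sampleUtterances=intent['sampleUtterances'],
    fulfillmentActivity=intent['fulfillmentActivity'],
)
intent_version = lex.create_intent_version(
    name='searchKeys',
    checksum=lex.get_intent(name='searchKeys', version='$LATEST')['checksum'],
)['version']

# 2. Rebuild the bot against the new intent version, then version the bot.
bot = lex.get_bot(name='photosBot', versionOrAlias='$LATEST')
lex.put_bot(
    name='photosBot',
    checksum=bot['checksum'],
    intents=[{'intentName': 'searchKeys', 'intentVersion': intent_version}],
    locale=bot['locale'],
    childDirected=bot['childDirected'],
    abortStatement=bot['abortStatement'],
    processBehavior='BUILD',  # build rather than just save
)
bot_version = lex.create_bot_version(
    name='photosBot',
    checksum=lex.get_bot(name='photosBot', versionOrAlias='$LATEST')['checksum'],
)['version']

# 3. Repoint the alias the front end calls.
alias = lex.get_bot_alias(name='prod', botName='photosBot')
lex.put_bot_alias(
    name='prod',
    botName='photosBot',
    botVersion=bot_version,
    checksum=alias['checksum'],
)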

How to delete Feature Group from SageMaker Feature Store, by name

The way to delete a feature group using the SageMaker Python SDK is as follows:
my_feature_group.delete()
But this only deletes the feature group you are currently working on. How can one delete feature groups from prior sessions? I tried deleting them out of the S3 bucket directly, but they still appear in the Feature Store UI.
It would be great if feature groups could be deleted through the UI. But if not, is there a way to delete a feature group using its full name, i.e. the one that was created with:
"my-feature-group-" + strftime("%d-%H-%M-%S", gmtime())
You can create a FeatureGroup object by name and call delete(), or do the same via the CLI or the SageMakerFeatureStoreRuntime client.
Source: AWS
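For example, a minimal sketch of deleting by name (the feature group name here is hypothetical):
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
# reconstruct a feature group from a prior session purely by its name
fg = FeatureGroup(name='my-feature-group-07-14-22-39', sagemaker_session=session)
fg.delete()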
You can loop over list_feature_groups as follows:
import boto3
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

def extract_feature_groups(feature_groups):
    list_feature_groups = []
    list_feature_groups.extend([x['FeatureGroupName'] for x in feature_groups['FeatureGroupSummaries']])
    next_token = feature_groups.get('NextToken', '')
    while next_token != '':
        page_feature_groups = boto_client.list_feature_groups(NextToken=next_token)
        list_feature_groups.extend([x['FeatureGroupName'] for x in page_feature_groups['FeatureGroupSummaries']])
        next_token = page_feature_groups.get('NextToken', '')
    return list_feature_groups

region_name = <your_region_name>
boto_client = boto3.client('sagemaker', region_name=region_name)
boto_session = boto3.session.Session(region_name=region_name)
fs_sagemaker_session = sagemaker.Session(boto_session=boto_session)
feature_groups = boto_client.list_feature_groups()
list_features_groups = extract_feature_groups(feature_groups)
for fg in list_features_groups:
    # make sure to include an appropriate name filter and/or confirmation prompt here
    feature_group = FeatureGroup(name=fg, sagemaker_session=fs_sagemaker_session)
    feature_group.delete()
Feature groups take time to delete; you might want to add a check that deletion has completed successfully.
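A possible sketch of such a check, reusing boto_client from above (describe_feature_group raises ResourceNotFound once the group is fully gone):
import time

def wait_for_deletion(feature_group_name, timeout=300, poll=10):
    # poll describe_feature_group until the feature group disappears
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            desc = boto_client.describe_feature_group(FeatureGroupName=feature_group_name)
            if desc.get('FeatureGroupStatus') == 'DeleteFailed':
                raise RuntimeError(f'deletion failed for {feature_group_name}')
        except boto_client.exceptions.ResourceNotFound:
            return  # fully deleted
        time.sleep(poll)
    raise TimeoutError(f'{feature_group_name} still present after {timeout}s')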

Generating a data table to iterate through in a Django Template

I have a function that uses PrettyTable to gather information about the virtual machines owned by a user. Right now it only shows information, and it works well. My new idea is to add a button in an extra column that lets the user reboot the virtual machine. I already know how to restart the virtual machines; what I'm struggling with is the best way to create a dataset that I can iterate through to build an HTML table. I've done similar things with PHP/SQL in the past and it was straightforward. I don't think I can iterate through PrettyTable, so I'm wondering what my best option is. PrettyTable does a very good job of keeping table creation simple (as you can see below), and I'm hoping for another method that stays just as simple: basically something relational and easy to iterate through. Any other suggestions are welcome. Thanks!
Here is my current code:
x = PrettyTable()
x.field_names = ["VM Name", "OS", "IP", "Power State"]
for uuid in virtual_machines:
    vm = search_index.FindByUuid(None, uuid, True, False)
    if vm.summary.guest.ipAddress is None:
        ip = "Unavailable"
    else:
        ip = vm.summary.guest.ipAddress
    if vm.summary.runtime.powerState == "poweredOff":
        power_state = "OFF"
    else:
        power_state = "ON"
    if vm.summary.guest.guestFullName is None:
        os = "Unavailable"
    else:
        os = vm.summary.guest.guestFullName
    x.add_row([vm.summary.config.name, os, ip, power_state])
table = x.get_html_string(attributes={"class": "table table-striped"})
return table
Here is a sample of what it looks like and also what I plan to do with the button. http://prntscr.com/nki3ci
Figured out how to query the PrettyTable. It was a minor addition, without having to redo it all.
html = '<table class="table"><tr><th>VM Name</th><th>OS</th><th>IP</th><th>Power State</th></tr>'
htmlend = '</table>'
body = ''
for vmm in x:
    vmm.border = False
    vmm.header = False
    vm_name = vmm.get_string(fields=["VM Name"])
    operating_system = vmm.get_string(fields=["OS"])
    ip_addr = vmm.get_string(fields=["IP"])
    body += '<tr><td>' + vm_name + '</td><td>' + operating_system + '</td><td>' + ip_addr + '</td><td>ON</td></tr>'
html += body
html += htmlend
print(html)
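As an alternative that answers the original question more directly: since the target is a Django template, you can skip building HTML in Python entirely and pass a plain list of dicts to the template instead. A sketch (the view, template, and reboot_vm URL names are hypothetical):
# views.py: build a list of dicts instead of an HTML string
vms = []
for uuid in virtual_machines:
    vm = search_index.FindByUuid(None, uuid, True, False)
    vms.append({
        'name': vm.summary.config.name,
        'os': vm.summary.guest.guestFullName or 'Unavailable',
        'ip': vm.summary.guest.ipAddress or 'Unavailable',
        'power': 'OFF' if vm.summary.runtime.powerState == 'poweredOff' else 'ON',
    })
# return render(request, 'vms.html', {'vms': vms})
The template then iterates with {% for vm in vms %} and can attach a reboot button per row:
<table class="table table-striped">
  <tr><th>VM Name</th><th>OS</th><th>IP</th><th>Power State</th><th></th></tr>
  {% for vm in vms %}
  <tr>
    <td>{{ vm.name }}</td><td>{{ vm.os }}</td><td>{{ vm.ip }}</td><td>{{ vm.power }}</td>
    <td><form method="post" action="{% url 'reboot_vm' vm.name %}">{% csrf_token %}<button>Reboot</button></form></td>
  </tr>
  {% endfor %}
</table>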

filter query results without launching a new sql query

With this code I expected that a SQL query would run only for 'all_user_videos', and that the subsequent filters would operate on the results already stored in 'all_user_videos'.
# Get all user videos
all_user_videos = Video.objects.filter(author=user)
# Get user pending videos
pending_videos = all_user_videos.filter(status='pending_review')
# Get user videos that need information
info_required_videos = all_user_videos.filter(status='info_required')
# Get user videos on sale
on_sale_videos = all_user_videos.filter(status='on_sale')
context = {
    'user_member_profile': instance,
    'pending_videos': pending_videos,
    'info_required': info_required_videos,
    'on_sale': on_sale_videos
}
That seems not to be the case: each filter() call launches its own query.
How can I make pending_videos, info_required_videos and on_sale_videos reuse the result of the query in 'all_user_videos' instead of launching new queries?
You can use a regex in a single query, force it to execute once by wrapping it in list(), and then split the results in Python:
all_user_videos = list(Video.objects.filter(author=user, status__regex=r'^(pending_review|info_required|on_sale)$'))
pending_videos = [i for i in all_user_videos if i.status == 'pending_review']
or
pending_videos = []
info_required_videos = []
on_sale_videos = []
for i in all_user_videos:
    if i.status == 'pending_review':
        pending_videos.append(i)
    elif i.status == 'info_required':
        info_required_videos.append(i)
    elif i.status == 'on_sale':
        on_sale_videos.append(i)
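As a side note (my suggestion, not part of the answer above): the same single-query idea reads a bit more idiomatically with Django's __in lookup instead of a regex:
statuses = ['pending_review', 'info_required', 'on_sale']
all_user_videos = list(Video.objects.filter(author=user, status__in=statuses))
videos_by_status = {s: [v for v in all_user_videos if v.status == s] for s in statuses}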