Input states for Deep Q Learning

Input states for Deep Q Learning - state

I am using the DQN for a resource allocation where the agent should assign the arrival requests to the best Virtual Machine.
I am modifying a Cartpole code as follow:
import random
import gym
import numpy as np
from collections import deque
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam
import os
class DQNAgent:
def __init__(self, state_size, action_size):
self.state_size = state_size
self.action_size = action_size
self.memory = deque(maxlen=2000)
self.gamma = 0.95
self.epsilon = 1.0
self.epsilon_decay = 0.995
self.epsilon_min = 0.01
self.learning_rate = 0.001
self.model = self._build_model()
def _build_model(self):
model = Sequential()
model.add(Dense(24, input_dim=self.state_size, activation='relu'))
model.add(Dense(24, activation='relu'))
model.add(Dense(self.action_size, activation='linear'))
model.compile(loss='mse', optimizer=Adam(lr=self.learning_rate))
return model
def remember(self, state, action, reward, next_state, done):
self.memory.append((state, action, reward, next_state, done))
def act(self, state):
if np.random.rand() <= self.epsilon:
return random.randrange(self.action_size)
act_values = self.model.predict(state)
return np.argmax(act_values[0])
def replay(self, batch_size):
minibatch = random.sample(self.memory, batch_size)
for state, action, reward, next_state, done in minibatch:
target = reward
if not done:
target = (reward + self.gamma * np.amax(self.model.predict(next_state)[0]))
target_f = self.model.predict(state)
target_f[0][action] = target
self.model.fit(state, target_f, epochs=1, verbose=0)
if self.epsilon > self.epsilon_min:
self.epsilon *= self.epsilon_decay
def load(self, name):
self.model.load_weights(name)
def save(self, name):
self.model.save_weights(name)
The Cartpole states as the inputs of the Q network are given by the environment.
0 Cart Position
1 Cart Velocity -Inf Inf
2 Pole Angle ~ -41.8° ~ 41.8°
3 Pole Velocity At Tip
The question is that in my code what are the inputs of the Q network?
Since the agent should take the best possible action based on the size of the arrival request but this is not given by the environment. Shall I feed the Q network by this input value, the size?

The inputs of the Deep Q-Network architecture is fed by the replay memory, in the following part of the code:
def remember(self, state, action, reward, next_state, done):
self.memory.append((state, action, reward, next_state, done))
The dynamic of this system as shown in the original paper Deepmind paper, is that you interact with the system, store the transition in the replay memory, and then use it for the training step. In the lines above you're storing these experiences.
Basically, the input of the network is the states and outputs the Q-values. In your code, there's no interaction with the environment, that's when you can get these transitions (experiences) to feed the replay memory. So, if you can't extract some information in the environment to be represented as states, you're not able to make assumptions about that.

Related

use global variables in AWS Sagemaker script

After having correctly deployed our model, I need to invoke it via lambda function. The script features two cleaning function, the first one (cleaning()) gives us 5 variables: the cleaned dataset and 4 other variables (scaler, monthdummies, compadummies, parceldummies) that we need to use in the second cleaning function (cleaning_test()).
The reason behind this is that in the use case I'll have only one instance at a time to perform predictions on, not an entire dataset. This means that I pass the row to the first cleaning() function since some commands won't work. I can't also use a scaler and neither create dummy variables, so the aim is to import the scaler and some dummies used in the cleaning() function, since they come from the whole dataset, that I used to train the model.
Hence, in the input_fn() function, the input needs to be cleaned using the cleaning_test() function, that requires the scaler and the three lists of dummies from the cleaning() one.
When I train the model, the cleaning() function works fine, but after the deployment, if we invoke the endpoint, it raises the error that variable "scaler" is not defined.
Below is the script.py:
Note that the test is # since I've already tested it, so now I'm training on the whole dataset and I want to predict completely new instances
def cleaning(data):
some cleaning on data stored in s3
return cleaned_data, scaler, monthdummies, compadummies, parceldummies
def cleaning_test(data, scaler, monthdummies, compadummies, parceldummies):
cleaning on data without labels
return cleaned_data
def model_fn(model_dir):
clf = joblib.load(os.path.join(model_dir, "model.joblib"))
return clf
def input_fn(request_body, request_content_type):
if request_content_type == "application/json":
data = json.loads(request_body)
df = pd.DataFrame(data, index = [0])
input_data = cleaning_test(df, scaler, monthdummies, compadummies, parceldummies)
else:
pass
return input_data
def predict_fn(input_data, model):
return model.predict_proba(input_data)
if __name__ =='__main__':
print('extracting arguments')
parser = argparse.ArgumentParser()
# hyperparameters sent by the client are passed as command-line arguments to the script.
parser.add_argument('--n_estimators', type=int, default=10)
parser.add_argument('--min-samples-leaf', type=int, default=3)
# Data, model, and output directories
parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
#parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
parser.add_argument('--train-file', type=str, default='fp_train.csv')
#parser.add_argument('--test-file', type=str, default='fp_test.csv')
args, _ = parser.parse_known_args()
print('reading data')
train_df = pd.read_csv(os.path.join(args.train, args.train_file))
#test_df = pd.read_csv(os.path.join(args.test, args.test_file))
print("cleaning")
train_df, scaler, monthdummies, compadummies, parceldummies = cleaning(train_df)
#test_df, scaler1, monthdummies1, compadummies1, parceldummies1 = cleaning(test_df)
print("splitting")
y = train_df.loc[:,"event"]
X = train_df.loc[:, train_df.columns != 'event']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
"""print('building training and testing datasets')
X_train = train_df.loc[:, train_df.columns != 'event']
X_test = test_df.loc[:, test_df.columns != 'event']
y_train = train_df.loc[:,"event"]
y_test = test_df.loc[:,"event"]"""
print(X_train.columns)
print(X_test.columns)
# train
print('training model')
model = RandomForestClassifier(
n_estimators=args.n_estimators,
min_samples_leaf=args.min_samples_leaf,
n_jobs=-1)
model.fit(X_train, y_train)
# print abs error
print('validating model')
proba = model.predict_proba(X_test)
# persist model
path = os.path.join(args.model_dir, "model.joblib")
joblib.dump(model, path)
print('model persisted at ' + path)
That I run through:
sklearn_estimator = SKLearn(
entry_point='script.py',
role = get_execution_role(),
train_instance_count=1,
train_instance_type='ml.c5.xlarge',
framework_version='0.20.0',
base_job_name='rf-scikit',
hyperparameters = {'n_estimators': 15})
sklearn_estimator.fit({'train':trainpath})
sklearn_estimator.latest_training_job.wait(logs='None')
artifact = sm_boto3.describe_training_job(
TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']
predictor = sklearn_estimator.deploy(
instance_type='ml.c5.large',
initial_instance_count=1)
The question is, how can I "store" the variables given by the cleaning() function during the training process, in order to use them in the input_fn() function, making cleaning_test() work fine?
Thanks!

Why layer.get_weights() is returning list of length 1 and 4?

I am trying to get the weights and biases of all convolutional layers of resnet50 model for one of my assignments. I learned that we can use the function layer.get_weights() to get the weight and bias. This will return a list of which contains two elements weight of the layer stored at layer.get_weights()[0] and the bias is stored at layer.get_weights()[1]. Here is the code which I used.
import tensorflow as to
import source
from source import models
from source.utils.image import read_image_bgr, preprocess_image, resize_image
from source.utils.visualization import draw_box, draw_caption
from source.utils.colors import label_color
from source.models import retinanet
import warnings
warnings.filterwarnings("ignore")
from tensorflow import ConfigProto
import numpy as np
import os
import argparse
import keras
from keras.layers import Input,Conv2D,MaxPooling2D,UpSampling2D, Activation, Dropout
from keras.models import Model
ap = argparse.ArgumentParser()
ap.add_argument("-weight", "--weight_file", type=str,default="trained_model.h5",help="Path to the weights file")
ap.add_argument("-backbone", "--backbone", type=str, default="resnet50",help="Backbone model name")
args = vars(ap.parse_args())
#fetching a tensorflow session
def get_session():
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
return tf.Session(config=config)
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
tf.keras.backend.set_session(tf.Session(config=config))
keras.backend.tensorflow_backend.set_session(get_session())
model = str(args.get("weight_file", False))
backbone = str(args.get("backbone", False))
model = models.load_model(str(model), backbone_name=str(backbone))
#model is the resnet50 model
for layer in model.layers:
print('layer name', layer.name)
we = layer.get_weights()
print('len(we)',len(we))
But in my case, I am getting length 1 for some of the cases and length 4 for other cases which is different from what it is expected. I am really confused at this point. If anybody has any idea and suggestions will be really helpful.
Thanks in advance.

The get_weights() function returns both trainable and not trainable parameters of a layer. The BatchNormalization layer has 4 parameters, which explains the 4 length outputs (since Resnet blocks have batchnorm). As far as I am aware, ResNet models do not use the bias term in the convolutional layers because of the batchnorm, which would explain the length 1 outputs.

how to get ticking timer with dynamic label?

What im trying to do is that whenever cursor is on label it must show the time elapsed since when it is created it does well by subtracting (def on_enter(i)) the value but i want it to be ticking while cursor is still on label.
I tried using after function as newbie i do not understand it well to use on dynamic labels.
any help will be appreciated thx
code:
from Tkinter import *
import datetime
date = datetime.datetime
now = date.now()
master=Tk()
list_label=[]
k=[]
time_var=[]
result=[]
names=[]
def delete(i):
k[i]=max(k)+1
time_var[i]='<deleted>'
list_label[i].pack_forget()
def create():#new func
i=k.index(max(k))
for j in range(i+1,len(k)):
if k[j]==0:
list_label[j].pack_forget()
list_label[i].pack(anchor='w')
time_var[i]=time_now()
for j in range(i+1,len(k)):
if k[j]==0:
list_label[j].pack(anchor='w')
k[i]=0
###########################
def on_enter(i):
list_label[i].configure(text=time_now()-time_var[i])
def on_leave(i):
list_label[i].configure(text=names[i])
def time_now():
now = date.now()
return date(now.year,now.month,now.day,now.hour,now.minute,now.second)
############################
for i in range(11):
lb=Label(text=str(i),anchor=W)
list_label.append(lb)
lb.pack(anchor='w')
lb.bind("<Button-3>",lambda event,i=i:delete(i))
k.append(0)
names.append(str(i))
lb.bind("<Enter>",lambda event,i=i: on_enter(i))
lb.bind("<Leave>",lambda event,i=i: on_leave(i))
time_var.append(time_now())
master.bind("<Control-Key-z>",lambda event: create())
mainloop()

You would use after like this:
###########################
def on_enter(i):
list_label[i].configure(text=time_now()-time_var[i])
list_label[i].timer = list_label[i].after(1000, on_enter, i)
def on_leave(i):
list_label[i].configure(text=names[i])
list_label[i].after_cancel(list_label[i].timer)
However, your approach here is all wrong. You currently have some functions and a list of data. What you should do is make a single object that contains the functions and data together and make a list of those. That way you can write your code for a single Label and just duplicate that. It makes your code a lot simpler partly because you no longer need to keep track of "i". Like this:
import Tkinter as tk
from datetime import datetime
def time_now():
now = datetime.now()
return datetime(now.year,now.month,now.day,now.hour,now.minute,now.second)
class Kiran(tk.Label):
"""A new type of Label that shows the time since creation when the mouse hovers"""
hidden = []
def __init__(self, master=None, **kwargs):
tk.Label.__init__(self, master, **kwargs)
self.name = self['text']
self.time_var = time_now()
self.bind("<Enter>", self.on_enter)
self.bind("<Leave>", self.on_leave)
self.bind("<Button-3>", self.hide)
def on_enter(self, event=None):
self.configure(text=time_now()-self.time_var)
self.timer = self.after(1000, self.on_enter)
def on_leave(self, event=None):
self.after_cancel(self.timer) # cancel the timer
self.configure(text=self.name)
def hide(self, event=None):
self.pack_forget()
self.hidden.append(self) # add this instance to the list of hidden instances
def show(self):
self.time_var = time_now() # reset time
self.pack(anchor='w')
def undo(event=None):
'''if there's any hidden Labels, show one'''
if Kiran.hidden:
Kiran.hidden.pop().show()
def main():
root = tk.Tk()
root.geometry('200x200')
for i in range(11):
lb=Kiran(text=i)
lb.pack(anchor='w')
root.bind("<Control-Key-z>",undo)
root.mainloop()
if __name__ == '__main__':
main()
More notes:
Don't use lambda unless you are forced to; it's known to cause bugs.
Don't use wildcard imports (from module import *), they cause bugs and are against PEP8.
Put everything in functions.
Use long, descriptive names. Single letter names just waste time. Think of names as tiny comments.
Add a lot more comments to your code so that other people don't have to guess what the code is supposed to do.
Try a more beginner oriented forum for questions like this, like learnpython.reddit.com

How to order NDB query by the key?

I try to use task queues on Google App Engine. I want to utilize the Mapper class shown in the App Engine documentation "Background work with the deferred library".
I get an exception on the ordering of the query result by the key
def get_query(self):
...
q = q.order("__key__")
...
Exception:
File "C:... mapper.py", line 41, in get_query
q = q.order("__key__")
File "C:\Program Files (x86)\Google\google_appengine\google\appengine\ext\ndb\query.py", line 1124, in order
'received %r' % arg)
TypeError: order() expects a Property or query Order; received '__key__'
INFO 2017-03-09 11:56:32,448 module.py:806] default: "POST /_ah/queue/deferred HTTP/1.1" 500 114
The article is from 2009, so I guess something might have changed.
My environment: Windows 7, Python 2.7.9, Google App Engine SDK 1.9.50
There are somewhat similar questions about ordering in NDB on SO.
What bugs me this code is from the official doc, presumably updated in Feb 2017 (recently) and posted by someone within top 0.1 % of SO users by reputation.
So I must be doing something wrong. What is the solution?

Bingo.
Avinash Raj is correct. If it were an answer I'd accept it.
Here is the full class code
#!/usr/bin/python2.7
# -*- coding: utf-8 -*-
from google.appengine.ext import deferred
from google.appengine.ext import ndb
from google.appengine.runtime import DeadlineExceededError
import logging
class Mapper(object):
"""
from https://cloud.google.com/appengine/docs/standard/python/ndb/queries
corrected with suggestions from Stack Overflow
http://stackoverflow.com/questions/42692319/how-to-order-ndb-query-by-the-key
"""
# Subclasses should replace this with a model class (eg, model.Person).
KIND = None
# Subclasses can replace this with a list of (property, value) tuples to filter by.
FILTERS = []
def __init__(self):
logging.info("Mapper.__init__: {}")
self.to_put = []
self.to_delete = []
def map(self, entity):
"""Updates a single entity.
Implementers should return a tuple containing two iterables (to_update, to_delete).
"""
return ([], [])
def finish(self):
"""Called when the mapper has finished, to allow for any final work to be done."""
pass
def get_query(self):
"""Returns a query over the specified kind, with any appropriate filters applied."""
q = self.KIND.query()
for prop, value in self.FILTERS:
q = q.filter(prop == value)
if __name__ == '__main__':
q = q.order(self.KIND.key) # the fixed version. The original q.order('__key__') failed
# see http://stackoverflow.com/questions/42692319/how-to-order-ndb-query-by-the-key
return q
def run(self, batch_size=100):
"""Starts the mapper running."""
logging.info("Mapper.run: batch_size: {}".format(batch_size))
self._continue(None, batch_size)
def _batch_write(self):
"""Writes updates and deletes entities in a batch."""
if self.to_put:
ndb.put_multi(self.to_put)
self.to_put = []
if self.to_delete:
ndb.delete_multi(self.to_delete)
self.to_delete = []
def _continue(self, start_key, batch_size):
q = self.get_query()
# If we're resuming, pick up where we left off last time.
if start_key:
key_prop = getattr(self.KIND, '_key')
q = q.filter(key_prop > start_key)
# Keep updating records until we run out of time.
try:
# Steps over the results, returning each entity and its index.
for i, entity in enumerate(q):
map_updates, map_deletes = self.map(entity)
self.to_put.extend(map_updates)
self.to_delete.extend(map_deletes)
# Do updates and deletes in batches.
if (i + 1) % batch_size == 0:
self._batch_write()
# Record the last entity we processed.
start_key = entity.key
self._batch_write()
except DeadlineExceededError:
# Write any unfinished updates to the datastore.
self._batch_write()
# Queue a new task to pick up where we left off.
deferred.defer(self._continue, start_key, batch_size)
return
self.finish()

Using NUTS sampler for likelihood with matrix exponential

I am trying to infer the generator of a continuous markov process observed at discrete intervals. If the generator of the markov process is $T$, then the stochastic matrix for the discrete time intervals is given by $ P = \exp(T \Delta t)$. To implement this using pymc, I wrote the custom distribution class
import pymc3
from pymc3.distributions import Discrete
from pymc3.distributions.dist_math import bound
class ContinuousMarkovChain(Discrete):
def __init__(self, t10=None, t01=None, dt=None, *args, **kwargs):
super(ContinuousMarkovChain, self).__init__(*args, **kwargs)
# self.p = p
# self.q = q
self.p = tt.slicetype
self.gt0 = (t01 >0) & (t10> 0)
T = tt.stacklists([[-t01, t01], [t10,-t10]])
self.p = ts.expm(T*dt)
def logp(self, x):
return bound(tt.log(self.p[x[:-1],x[1:]]).sum(), self.gt0)
I can use find_MAP and the Slice sampler with this class, but it fails with NUTS. The error message is:
AttributeError: 'ExpmGrad' object has no attribute 'grad'
I thought that NUTS only needed information about the gradient, so why is it trying to take the Hessian of expm?

I thought Pymc3 needs Hessian in parameter space to set the step-size and directionality for the parameters when using NUTS algorithm. Maybe you can define the grad of ExpmGrad yourself.
A relative discussing is here https://github.com/pymc-devs/pymc3/issues/1226

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Input states for Deep Q Learning - state

Related

use global variables in AWS Sagemaker script

Why layer.get_weights() is returning list of length 1 and 4?

how to get ticking timer with dynamic label?

How to order NDB query by the key?

Using NUTS sampler for likelihood with matrix exponential

Categories

Resources