Can I add images to Tensorboard through Keras? - python-2.7

I have set up Tensorboard with Keras as a callback, like so:
callback_tb = keras.callbacks.TensorBoard(log_dir=tb_dir, histogram_freq=2,
                                          write_graph=True, write_images=True)
callbacks_list = [callback_save, callback_view, callback_tb]
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          callbacks=callbacks_list,
          verbose=1,
          validation_data=(x_test, y_test),
          shuffle='batch')
This works fine and I can see loss and accuracy graphs on Tensorboard.
I am generating and saving model predictions in another file, but I want to know if it is possible to view these images on Tensorboard with Keras?
I have found the tf.summary.image function at https://github.com/tensorflow/tensorboard, but I don't understand how this relates to Keras.
Any help would be appreciated.

I fixed my problem by creating my own Keras callback based on the TensorBoard callback, where I could use the tf.summary.image feature.
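For reference, here is a minimal sketch of one way such a callback can work, assuming TF 1.x-style summaries (the Python 2.7 era this question targets). It writes the image as a tf.Summary.Image protobuf instead of running a tf.summary.image op, to avoid session plumbing; make_image_grid() is a hypothetical helper that returns the saved predictions for the current epoch as a uint8 numpy array of shape (height, width, channels):

import io
import tensorflow as tf
import keras
from PIL import Image

class TensorBoardImage(keras.callbacks.Callback):
    def __init__(self, log_dir, tag='predictions'):
        super(TensorBoardImage, self).__init__()
        self.writer = tf.summary.FileWriter(log_dir)
        self.tag = tag

    def on_epoch_end(self, epoch, logs=None):
        img = make_image_grid()  # hypothetical helper: uint8 array (H, W, C)
        # Encode the array as PNG and wrap it in an image summary protobuf.
        buf = io.BytesIO()
        Image.fromarray(img).save(buf, format='PNG')
        summary = tf.Summary(value=[tf.Summary.Value(
            tag=self.tag,
            image=tf.Summary.Image(encoded_image_string=buf.getvalue(),
                                   height=img.shape[0], width=img.shape[1]))])
        self.writer.add_summary(summary, epoch)
        self.writer.flush()

    def on_train_end(self, logs=None):
        self.writer.close()

The callback can then simply be appended to callbacks_list next to the stock TensorBoard callback.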

How to pass the experiment configuration to a SagemakerTrainingOperator while training?

Idea:
To use Experiments and Trials to log the training parameters and artifacts in SageMaker, while using MWAA as the pipeline orchestrator.
I am using training_config to create the dict that passes the training configuration to the TensorFlow estimator, but there is no parameter to pass the experiment configuration.
tf_estimator = TensorFlow(entry_point='train_model.py',
                          source_dir=source,
                          role=sagemaker.get_execution_role(),
                          instance_count=1,
                          framework_version='2.3.0',
                          instance_type=instance_type,
                          py_version='py37',
                          script_mode=True,
                          enable_sagemaker_metrics=True,
                          metric_definitions=metric_definitions,
                          output_path=output)
model_training_config = training_config(
    estimator=tf_estimator,
    inputs=input,
    job_name=training_jobname,
)
training_task = SageMakerTrainingOperator(
    task_id=test_id,
    config=model_training_config,
    aws_conn_id="airflow-sagemaker",
    print_log=True,
    wait_for_completion=True,
    check_interval=60
)
You can use the experiment_config argument in estimator.fit(). A more detailed example can be found here.
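A minimal sketch of that approach (the experiment and trial names below are placeholders, not something from the question):

tf_estimator.fit(
    inputs=input,
    experiment_config={
        'ExperimentName': 'my-experiment',        # placeholder name
        'TrialName': 'my-trial',                  # placeholder name
        'TrialComponentDisplayName': 'training',
    },
)

Note that this calls fit() directly, so it suits a notebook or script workflow rather than the training_config/SageMakerTrainingOperator path used above.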
The only way that I have found right now is to use the CreateTrainingJob API (https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#sagemaker-CreateTrainingJob-request-RoleArn). The following points apply:
I am not sure if this will work with the bring-your-own-script method, e.g. with a TensorFlow estimator.
It works with a build-your-own-container approach.
Using the CreateTrainingJob API I created the config, which in turn includes all the needed configs (training, experiment, algorithm, etc.), and passed that to SageMakerTrainingOperator.
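As a rough sketch of that idea, assuming the dict returned by training_config() can be extended with the ExperimentConfig structure from the CreateTrainingJob request before it is handed to the operator (the experiment and trial names are placeholders):

model_training_config['ExperimentConfig'] = {
    'ExperimentName': 'my-experiment',        # placeholder name
    'TrialName': 'my-trial',                  # placeholder name
    'TrialComponentDisplayName': 'training',
}
training_task = SageMakerTrainingOperator(
    task_id=test_id,
    config=model_training_config,
    aws_conn_id="airflow-sagemaker",
)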

Pytorch Lightning Tensorboard Logger Across Multiple Models

I'm relatively new to Lightning and Loggers vs manually tracking metrics. I am trying to train two distinct models and have their accuracy and loss plotted on the same charts in tensorboard (or any other logger) within Colab.
What I have right now is basically:
trainer1 = pl.Trainer(gpus=n_gpus, max_epochs=n_epochs, progress_bar_refresh_rate=20, num_sanity_val_steps=0)
trainer2 = pl.Trainer(gpus=n_gpus, max_epochs=n_epochs, progress_bar_refresh_rate=20, num_sanity_val_steps=0)
trainer1.fit(Model1, train_loader, val_loader)
trainer2.fit(Model2, train_loader, val_loader)
#Then later:
%load_ext tensorboard
%tensorboard --logdir lightning_logs/
What I'd like to see at this point are those logged metrics charted together on the same chart, any help would be appreciated. I've spent some time trying to toy with this but I'm a bit out of my depth on this, thank you!
The exact chart used for logging a specific metric depends on the key name you provide in the .log() call (it's a feature that Lightning inherits from TensorBoard itself).
def validation_step(self, batch, _):
# This string decides which chart to use in the TB web interface
# vvvvvvvvv
self.log('valid_acc', acc)
Just use the same string for both .log() calls and have both runs saved in the same parent directory, with one logger per model:
logger1 = TensorBoardLogger(save_dir='lightning_logs/', name='model1')
logger2 = TensorBoardLogger(save_dir='lightning_logs/', name='model2')
If you run tensorboard --logdir ./lightning_logs, pointing at the parent directory, you should be able to see both metrics in the same chart with the key named valid_acc.
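Putting the pieces together, a short sketch of the full wiring, assuming the Model1/Model2 instances and loaders from the question:

import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger

logger1 = TensorBoardLogger(save_dir='lightning_logs/', name='model1')
logger2 = TensorBoardLogger(save_dir='lightning_logs/', name='model2')

# Each trainer writes its run under lightning_logs/, in its own subdirectory.
trainer1 = pl.Trainer(gpus=n_gpus, max_epochs=n_epochs, logger=logger1)
trainer2 = pl.Trainer(gpus=n_gpus, max_epochs=n_epochs, logger=logger2)

trainer1.fit(Model1, train_loader, val_loader)
trainer2.fit(Model2, train_loader, val_loader)

# Both runs then show up in the same 'valid_acc' chart because both models
# call self.log('valid_acc', ...) with the same key.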

Use TensorBoard with Keras Tuner

I ran into an apparent circular dependency trying to use log data for TensorBoard during a hyper-parameter search done with Keras Tuner, for a model built with TF2. The typical setup for the latter needs to set up the Tensorboard callback in the tuner's search() method, which wraps the model's fit() method.
from kerastuner.tuners import RandomSearch
tuner = RandomSearch(build_model,  # this method builds the model
                     hyperparameters=hp,
                     objective='val_accuracy')
tuner.search(x=train_x, y=train_y,
             validation_data=(val_x, val_y),
             callbacks=[tensorboard_cb])
In practice, the tensorboard_cb callback needs to set up the directory where data will be logged, and this directory has to be unique to each trial. A common way to do this is to name the directory based on the current timestamp, with code like below.
log_dir = time.strftime('trial_%Y_%m_%d-%H_%M_%S')
tensorboard_cb = TensorBoard(log_dir)
This works when training a model with known hyper-parameters. However, when doing a hyper-parameter search, I have to define and specify the TensorBoard callback before invoking tuner.search(). This is the problem: tuner.search() will invoke build_model() multiple times, and each of these trials should have its own TensorBoard directory. Ideally, defining log_dir would be done inside build_model(), but the Keras Tuner search API forces the TensorBoard callback to be defined outside of that function.
TL;DR: TensorBoard gets data through a callback and requires one log directory per trial, but Keras Tuner requires defining the callback once for the entire search, before performing it, not per trial. How can unique directories per trial be defined in this case?
The Keras Tuner creates a subdirectory for each run (this statement is probably version dependent).
I guess finding the right version mix is important.
Here is how it works for me, in JupyterLab.
Prerequisites:
pip requirements:
keras-tuner==1.0.1
tensorboard==2.1.1
tensorflow==2.1.0
Keras==2.2.4
jupyterlab==1.1.4
JupyterLab installed, built and running [standard compile arguments: production:minimize]
Here is the actual code. First I define the log folder and the callback:
# run parameter
log_dir = "logs/" + datetime.datetime.now().strftime("%m%d-%H%M")
# training meta
stop_callback = EarlyStopping(
    monitor='loss', patience=1, verbose=0, mode='auto')
hist_callback = tf.keras.callbacks.TensorBoard(
    log_dir=log_dir,
    histogram_freq=1,
    embeddings_freq=1,
    write_graph=True,
    update_freq='batch')
print("log_dir", log_dir)
Then I define my hypermodel, which I do not want to disclose. Afterwards I set up the hyper-parameter search:
from kerastuner.tuners import Hyperband
hypermodel = get_my_hypermodel()
tuner = Hyperband(
    hypermodel,
    max_epochs=40,
    objective='loss',
    executions_per_trial=5,
    directory=log_dir,
    project_name='test'
)
which I then execute:
tuner.search(
    train_data,
    labels,
    epochs=10,
    validation_data=(val_data, val_labels),
    callbacks=[hist_callback],
    use_multiprocessing=True)
tuner.search_space_summary()
While the notebook with this code searches for adequate hyper-parameters, I monitor the loss in another notebook. Since TF v2, TensorBoard can be invoked via a magic function:
Cell 1
import tensorboard
Cell 2
%load_ext tensorboard
Cell 3
%tensorboard --logdir 'logs/'
Side note: since I run JupyterLab in a Docker container, I have to specify the appropriate address and port for TensorBoard and also forward this in the Dockerfile.
The result is not really predictable for me... I do not yet understand when I can expect histograms and distributions in TensorBoard.
For some runs the loading time seems really excessive... so have patience.
Under Scalars I find a list of the runs as follows:
"logdir"/"model_hash"/execution[iter]/[train/validation]
E.g.
0101-1010/bb7981e03d05b05106d8a35923353ec46570e4b6/execution0/train
0101-1010/bb7981e03d05b05106d8a35923353ec46570e4b6/execution0/validation

Tensorflow, missing checkpoint files. Does saver only allow for keeping 5 checkpoints?

I am working with TensorFlow and have been training some models, saving them after each epoch using tf.train.Saver(). I am able to save and load models just fine, and I am doing this in the usual way.
with tf.Graph().as_default(), tf.Session() as session:
    initialiser = tf.random_normal_initializer(config.mean, config.std)
    with tf.variable_scope("model", reuse=None, initializer=initialiser):
        m = a2p(session, config, training=True)
        saver = tf.train.Saver()
        ckpt = tf.train.get_checkpoint_state(model_dir)
        if ckpt and tf.gfile.Exists(ckpt.model_checkpoint_path):
            saver.restore(session, ckpt.model_checkpoint_path)
        ...
        for i in range(epochs):
            runepoch()
            save_path = saver.save(session, '%s.ckpt' % i)
My code is set up to save a model for each epoch, which should be labelled accordingly. However, I have noticed that after fifteen epochs of training I only have checkpoint files for the last five epochs (10, 11, 12, 13, 14). The documentation doesn't say anything about this, so I am at a loss as to why it is happening.
Does the saver only allow for keeping five checkpoints or have I done something wrong?
Is there a way to make sure that all of the checkpoints are kept?
You can choose how many checkpoints to keep when you create your Saver object by setting the max_to_keep argument, which defaults to 5.
saver = tf.train.Saver(max_to_keep=10000)
Setting max_to_keep=None actually makes the Saver keep all checkpoints.
For example:
saver = tf.train.Saver(max_to_keep=None)
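As a usage note, with the saving pattern from the question ('%s.ckpt' % i), any earlier epoch can then be restored directly, for example:

saver = tf.train.Saver(max_to_keep=None)
...
saver.restore(session, '7.ckpt')  # reload the checkpoint saved after epoch 7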

Django share unpicklable objects between requests

I am currently working on a Django project that will hopefully apply some transformations to video files through the web. To perform the transformations I am using OpenCV's Python API, and I am also using Dajax to perform AJAX requests.
In the AJAX requests file I have the following code:
@dajaxice_register
def transform_and_show(request, filename, folder, frame_count, img_number):
    detector = Detector(filename)  # object which uses the OpenCV API
    dajax = Dajax()
    generated_file = detector.detect_people_by_frame(folder, str(img_number))
    dajax.assign('#video', 'src', '/media/generated' + folder + generated_file)
    return dajax.json()
The idea is to transform videos frame by frame and to display each transformed frame in the browser in an img tag, giving the user the impression of watching the transformed video, so this method is called in a JavaScript loop.
The problem with this approach is that the "detector" object is reinitialized on every call, so it only ever generates the image corresponding to the first frame of the video. My idea was to work around this issue by making "detector" persistent between requests, so that the pointer to the next frame of the video wouldn't be reset to 0 on every request.
The problem is that the Detector object is not picklable, meaning that it cannot be cached or saved to a session object.
Is there anything I can do to make it persistent between requests?
NOTE: I have considered using HTTP push approaches like APE or Orbit but since this is just an investigation project there is no real concern about performance.
Have you tried a module level variable to store the object?
Make "detector" a global at the file level.
detector = None

def transform(filename):
    global detector
    if detector is None:
        # Only create the Detector on the first request; later requests reuse it.
        detector = Detector(filename)
    file = detector.detect(....)
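If more than one video can be in flight at a time, a variant of the same idea (not from the original answer, just a sketch) is to key the module-level cache by filename:

# Module-level cache, shared by all requests served by this worker process.
_detectors = {}

def get_detector(filename):
    # Reuse the existing Detector for this file, or create one on first use.
    if filename not in _detectors:
        _detectors[filename] = Detector(filename)
    return _detectors[filename]

Keep in mind that either variant only persists within a single worker process; with several Django workers, each process holds its own copy of the object.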