Training on Sagemaker GPU is too slow - computer-vision

I've launched a training run on the CelebA dataset for binary classification with PyTorch, in SageMaker Studio.
I've made sure that everything, model and tensors, is sent to cuda().
My image dataset is in S3, and I'm accessing it via this import and code:
from PIL import Image
import s3fs
fs = s3fs.S3FileSystem()
# example
f = fs.open(f's3://aoha-bucket/img_celeba/dataset/000001.jpg')
And here is my PyTorch Dataset class, which uses s3fs to load data into DataLoaders:
import os
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset

class myDataset(Dataset):
    def __init__(self, csv_file, root_dir, target, length, adv=None, transform=None):
        self.annotations = pd.read_csv(csv_file).iloc[:length, :]
        self.root_dir = root_dir
        self.transform = transform
        self.target = target
        self.length = length
        self.adv = adv

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        # every call opens the image from S3 via the s3fs handle defined above
        img_path = fs.open(os.path.join(self.root_dir, self.annotations.loc[index, 'image_id']))
        image = Image.open(img_path)
        image = np.array(image)
        if self.transform:
            image = self.transform(image=image)["image"]
        image = np.transpose(image, (2, 0, 1)).astype(np.float32)
        image = torch.Tensor(image)
        y_label = torch.tensor(int(self.annotations.loc[index, str(self.target)]))
        if self.adv is None:
            return image, y_label
        if self.adv:
            z_label = torch.tensor(int(self.annotations.loc[index, 'origin']))
            return image, y_label, z_label
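For completeness, a minimal sketch of how a Dataset like this is typically wrapped in a DataLoader; the CSV name, target column, batch size, and worker count below are illustrative, not from the original post. num_workers > 0 matters here because __getitem__ does network I/O against S3:

from torch.utils.data import DataLoader

dataset = myDataset(csv_file='attr_celeba.csv',              # hypothetical CSV of labels
                    root_dir='aoha-bucket/img_celeba/dataset',
                    target='Male', length=10000)             # illustrative target/length
# several worker processes can fetch images from S3 in parallel
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=4)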
When I run this check, I get True:
next(model.parameters()).is_cuda
My issue is that the training is very slow, even slower than on my local CPU (which is not that powerful). It reports, for example, that one epoch needs 1h45min, which is way too much.
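To make the slowness concrete, here is a minimal, illustrative way to time the data pipeline alone and check whether the S3 reads are the bottleneck (loader is the DataLoader sketched above):

import time

n_batches = 10
start = time.time()
it = iter(loader)
for _ in range(n_batches):
    next(it)  # fetching a batch triggers the per-image S3 reads
print(f"{(time.time() - start) / n_batches:.2f} s per batch (data loading only)")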
I'm using a GPU-optimized PyTorch instance of Studio.
Have you ever launched a training run on a GPU in SageMaker using PyTorch?
Could you please help?
Thank you very much,
Habib

Related

How to save pillow processed image in already existing Django object

I have created an object model as below:
from django.db import models
# Create your models here.
class ImageModel(models.Model):
    image = models.ImageField(upload_to='images/')
    editedImg = models.ImageField(upload_to='images/')

    def delete(self, *args, **kwargs):
        self.image.delete()
        self.editedImg.delete()
        super().delete(*args, **kwargs)
And here is what I am trying to do in a function:
from django.shortcuts import render
from EditorApp.forms import ImageForm
from EditorApp.models import ImageModel
from django.http import HttpResponseRedirect
from PIL import Image
def edit_column(request):
    codArr = request.POST.getlist('codArr[]')
    imgs = ImageModel.objects.first()
    orgImage = ImageModel.objects.first().image
    orgImage = Image.open(orgImage)
    croppedImg = orgImage.crop((int(codArr[0]), int(codArr[1]), int(codArr[2]), int(codArr[3])))
    # croppedImg.show()
    # imgs.editedImg = croppedImg
    # imgs.save()
    return HttpResponseRedirect("/editing/")
codArr consists of the coordinates of the top (x, y) and bottom (x, y) corners in array form (this part is not an issue; it is tested and handled, and croppedImg.show() showed the desired cropped image). The crop itself works fine. What I am trying to do is save the cropped image into editedImg of the model above. The commented lines are what I tried, but they throw AttributeError: _committed.
I have not used any name for the image in the model, as it's not required.
Kindly help please, I would be very thankful.
You should do it like this:
from io import BytesIO
from django.core import files

codArr = request.POST.getlist('codArr[]')
img_obj = ImageModel.objects.first()
orgImage = img_obj.image
orgImage = Image.open(orgImage)
croppedImg = orgImage.crop((int(codArr[0]), int(codArr[1]), int(codArr[2]), int(codArr[3])))

thumb_io = BytesIO()              # in-memory buffer for the cropped image
croppedImg.save(thumb_io, 'png')
file_name = 'cropped.png'         # pick any name for the edited image
editedImg = files.File(thumb_io, name=file_name)
img_obj.editedImg = editedImg
img_obj.save()
You can use Python's context manager to open the image and save it to the desired storage; in this case I'm using the images dir.
Pillow will crop the image, and image.save() will save it to the filesystem; after that you can add it to Django's ImageField and save it to the DB.
The context manager takes care of the file opening and closing, Pillow takes care of the image, and Django takes care of the DB.
from PIL import Image

with Image.open(orgImage) as image:
    file_name = image.filename  # can be replaced by orgImage's filename
    cropped_path = f"images/cropped-{file_name}"
    # The crop method from the Image module takes four coordinates as input.
    # The right can also be represented as (left + width)
    # and lower can be represented as (upper + height).
    (left, upper, right, lower) = (20, 20, 100, 100)
    # Here the image "image" is cropped and assigned to the new variable im_crop
    im_crop = image.crop((left, upper, right, lower))
    im_crop.save(cropped_path)

imgs.editedImg = cropped_path
imgs.save()
Pillow's reference

Deploy trained pytorch model in C++

I have trained a PyTorch model and I am trying to import it in C++. I have followed the steps mentioned on the PyTorch website for this, but I am unable to do so. Can anyone please tell me what I should do? I am using this piece of neural network:
class MLP(nn.Module):
    def __init__(self, layers):
        super(MLP, self).__init__()
        # activation function
        self.activation = nn.Tanh()
        # loss function
        self.loss_function = nn.MSELoss(reduction='mean')
        # initialise neural network as a list using nn.ModuleList
        self.linears = nn.ModuleList([nn.Linear(layers[i], layers[i+1]) for i in range(len(layers)-1)])
        # Xavier normal initialization
        for i in range(len(layers)-1):
            nn.init.xavier_normal_(self.linears[i].weight.data, gain=1.0)
            # set biases to zero
            nn.init.zeros_(self.linears[i].bias.data)

    # forward pass; note that l_b, u_b and layers are module-level globals here
    def forward(self, x):
        x = (x - l_b) / (u_b - l_b)
        x = x.float()
        for i in range(len(layers) - 2):
            z = self.linears[i](x)
            x = self.activation(z)
        x = self.linears[-1](x)
        return x

    def loss_bc_init(self, x, y):
        loss_u = self.loss_function(self.forward(x), y)
        return loss_u
Please help.
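A minimal sketch of the standard TorchScript route (my addition, not the original poster's code; the layer sizes and bounds are illustrative): trace the trained model with a dummy input and serialize it, then load the file from C++ with torch::jit::load. Note that tracing runs forward() once, so the globals l_b, u_b and layers it references must be defined; torch.jit.script, by contrast, cannot resolve such globals, so they would have to become module attributes first.

import torch

l_b, u_b = torch.tensor([0.0, 0.0]), torch.tensor([1.0, 1.0])  # assumed input bounds
layers = [2, 20, 20, 1]                                        # illustrative layer sizes
model = MLP(layers)                                            # the class defined above
model.eval()

example = torch.rand(1, 2)                # dummy input with the real input shape
traced = torch.jit.trace(model, example)  # records one forward pass
traced.save("mlp_traced.pt")
# in C++: torch::jit::script::Module m = torch::jit::load("mlp_traced.pt");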

use global variables in AWS Sagemaker script

After having correctly deployed our model, I need to invoke it via a lambda function. The script features two cleaning functions: the first one (cleaning()) gives us 5 variables, the cleaned dataset and 4 other variables (scaler, monthdummies, compadummies, parceldummies) that we need to use in the second cleaning function (cleaning_test()).
The reason behind this is that in the use case I'll have only one instance at a time to perform predictions on, not an entire dataset. This means I can't pass the row through the first cleaning() function, since some of its commands won't work on a single row; I also can't fit a scaler or create dummy variables there. So the aim is to import the scaler and the dummies produced by the cleaning() function, since they come from the whole dataset that I used to train the model.
Hence, in the input_fn() function, the input needs to be cleaned using the cleaning_test() function, which requires the scaler and the three lists of dummies from cleaning().
When I train the model, the cleaning() function works fine, but after deployment, if we invoke the endpoint, it raises an error that the variable "scaler" is not defined.
Below is the script.py. Note that the test parts are commented out since I've already tested them; now I'm training on the whole dataset and I want to predict completely new instances:
def cleaning(data):
    # some cleaning on data stored in S3
    return cleaned_data, scaler, monthdummies, compadummies, parceldummies

def cleaning_test(data, scaler, monthdummies, compadummies, parceldummies):
    # cleaning on data without labels
    return cleaned_data
def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    return clf

def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        data = json.loads(request_body)
        df = pd.DataFrame(data, index=[0])
        input_data = cleaning_test(df, scaler, monthdummies, compadummies, parceldummies)
    else:
        pass
    return input_data

def predict_fn(input_data, model):
    return model.predict_proba(input_data)
if __name__ == '__main__':
    print('extracting arguments')
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script
    parser.add_argument('--n_estimators', type=int, default=10)
    parser.add_argument('--min-samples-leaf', type=int, default=3)

    # data, model, and output directories
    parser.add_argument('--model-dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    #parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))
    parser.add_argument('--train-file', type=str, default='fp_train.csv')
    #parser.add_argument('--test-file', type=str, default='fp_test.csv')
    args, _ = parser.parse_known_args()

    print('reading data')
    train_df = pd.read_csv(os.path.join(args.train, args.train_file))
    #test_df = pd.read_csv(os.path.join(args.test, args.test_file))

    print("cleaning")
    train_df, scaler, monthdummies, compadummies, parceldummies = cleaning(train_df)
    #test_df, scaler1, monthdummies1, compadummies1, parceldummies1 = cleaning(test_df)

    print("splitting")
    y = train_df.loc[:, "event"]
    X = train_df.loc[:, train_df.columns != 'event']
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
    """print('building training and testing datasets')
    X_train = train_df.loc[:, train_df.columns != 'event']
    X_test = test_df.loc[:, test_df.columns != 'event']
    y_train = train_df.loc[:,"event"]
    y_test = test_df.loc[:,"event"]"""
    print(X_train.columns)
    print(X_test.columns)

    # train
    print('training model')
    model = RandomForestClassifier(
        n_estimators=args.n_estimators,
        min_samples_leaf=args.min_samples_leaf,
        n_jobs=-1)
    model.fit(X_train, y_train)

    # print abs error
    print('validating model')
    proba = model.predict_proba(X_test)

    # persist model
    path = os.path.join(args.model_dir, "model.joblib")
    joblib.dump(model, path)
    print('model persisted at ' + path)
I run this through:
sklearn_estimator = SKLearn(
    entry_point='script.py',
    role=get_execution_role(),
    train_instance_count=1,
    train_instance_type='ml.c5.xlarge',
    framework_version='0.20.0',
    base_job_name='rf-scikit',
    hyperparameters={'n_estimators': 15})

sklearn_estimator.fit({'train': trainpath})
sklearn_estimator.latest_training_job.wait(logs='None')

artifact = sm_boto3.describe_training_job(
    TrainingJobName=sklearn_estimator.latest_training_job.name)['ModelArtifacts']['S3ModelArtifacts']

predictor = sklearn_estimator.deploy(
    instance_type='ml.c5.large',
    initial_instance_count=1)
The question is: how can I "store" the variables produced by the cleaning() function during the training process, in order to use them in the input_fn() function so that cleaning_test() works fine?
Thanks!
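A minimal sketch of one possible approach (my addition, not from the original post): persist the preprocessing artifacts next to the model at the end of the training block, reload them in model_fn, and apply cleaning_test() in predict_fn, which receives both the input and the model. input_fn then just parses the JSON into a DataFrame and leaves the cleaning to predict_fn.

# at the end of the __main__ training block, alongside model.joblib
joblib.dump(
    {"scaler": scaler,
     "monthdummies": monthdummies,
     "compadummies": compadummies,
     "parceldummies": parceldummies},
    os.path.join(args.model_dir, "preprocess.joblib"))

def model_fn(model_dir):
    clf = joblib.load(os.path.join(model_dir, "model.joblib"))
    # attach the artifacts to the model object so predict_fn can reach them
    clf.preprocess = joblib.load(os.path.join(model_dir, "preprocess.joblib"))
    return clf

def predict_fn(input_data, model):
    p = model.preprocess
    cleaned = cleaning_test(input_data, p["scaler"], p["monthdummies"],
                            p["compadummies"], p["parceldummies"])
    return model.predict_proba(cleaned)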

Pillow png compressing

I'm making a simple app that can compress images in JPEG and PNG format using the Pillow library, Python 3 and Django. I made a simple view that identifies the format, saves the compressed image, and gives some statistics about the compression. With images in JPEG format it works really well, I get compression close to 70-80% of the original size, and it works really fast, but if I upload a PNG it works much worse: compression takes a long time, and it shaves only 3-5% off the original size. I'm trying to find ways to improve the compression script, and I'm stuck.
Right now I've got this script in my compress Django view:
from django.shortcuts import render, redirect, get_object_or_404, reverse
from django.contrib.auth import login, authenticate, logout
from django.contrib.auth.models import User
from django.http import HttpResponse, HttpResponseRedirect
from django.http import JsonResponse
from django.contrib import auth
from .forms import InputForm, SignUpForm, LoginForm, FTPForm
import os
import sys
from PIL import Image
from .models import image, imagenew, FTPinput
from django.views import View
import datetime
from django.utils import timezone
import piexif
class BasicUploadView(View):
    def get(self, request):
        return render(self.request, 'main/index.html', {})

    def post(self, request):
        form = InputForm(self.request.POST, self.request.FILES)
        if form.is_valid():
            photo = form.save(commit=False)
            photo.name = photo.image.name
            photo.delete_time = timezone.now() + datetime.timedelta(hours=1)
            photo.user = request.user
            photo.size = photo.image.size
            photo = form.save()
            name = (photo.name).replace(' ', '_')
            picture = Image.open(photo.image)
            if picture.mode == 'RGB':
                piexif.remove('/home/andrey/sjimalka' + photo.image.url)
                picture.save('media/new/' + name, "JPEG", optimize=True, quality=75)
                newpic = 'new/' + name
                new = imagenew.objects.create(
                    name=name,
                    image=newpic,
                    delete_time=timezone.now() + datetime.timedelta(hours=1),
                    user=request.user,
                )
                if new.image.size < photo.image.size:
                    diff = round((new.image.size - photo.image.size) / float(photo.image.size) * 100, 2)
                else:
                    diff = str(round((new.image.size - photo.image.size) / float(photo.image.size) * 100, 2)) + ' Failed to compress the file'
                oldsize = round(photo.image.size / 1000000, 2)
                newsize = round(new.image.size / 1000000, 2)
                id = new.pk
                imagenew.objects.filter(pk=id).update(size=new.image.size)
            elif picture.mode != 'RGB':
                picture.save('media/new/' + name, "PNG", optimize=True, quality=75)
                newpic = 'new/' + name
                new = imagenew.objects.create(
                    name=name,
                    image=newpic,
                    delete_time=timezone.now() + datetime.timedelta(hours=1),
                    user=request.user,
                )
                if new.image.size < photo.image.size:
                    diff = round((new.image.size - photo.image.size) / float(photo.image.size) * 100, 2)
                else:
                    diff = str(round((new.image.size - photo.image.size) / float(photo.image.size) * 100, 2)) + ' Failed to compress the file'
                oldsize = round(photo.image.size / 1000000, 2)
                newsize = round(new.image.size / 1000000, 2)
                id = new.pk
                imagenew.objects.filter(pk=id).update(size=new.image.size)
                data = {'is_valid': True, 'name': new.image.name, 'url': new.image.url, 'diff': diff,
                        'oldsize': oldsize, 'newsize': newsize}
            else:
                alert = 'This format is not supported. Please upload images in png or jpg (jpeg) format'
                data = {'is_valid': False, 'name': alert}
            return JsonResponse(data)
The question: is there any way to make the script work faster with PNG uploads, and (much more importantly) make the PNG compression closer to JPEG's? Maybe I should use another Python library?
How does tinypng work then? They compress the same png files by 50-60%.
They probably reduce the colour palette from 24-bit to 8-bit. Here's a detailed answer about that - https://stackoverflow.com/a/12146901/1925257
Basic method
You can try that in Pillow like this:
picture_8bit = picture.convert(
    mode='P',  # use mode='PA' for transparency
    palette=Image.ADAPTIVE
)
picture_8bit.save(...)  # do as usual
This should work similar to what tinypng does.
If you don't want transparency, it's better to first convert RGBA to RGB and then to P mode:
picture_rgb = picture.convert(mode='RGB') # convert RGBA to RGB
picture_8bit = picture_rgb.convert(mode='P', ...)
Getting better results
Calling convert() as shown above will actually call quantize() in the background and Median Cut algorithm will be used by default for reducing the colour palette.
In some cases, you'll get better results with other algorithms such as MAXCOVERAGE. To use a different algorithm, you can just call the quantize() method directly:
picture_rgb = picture.convert(mode='RGB')  # convert RGBA to RGB
picture_8bit = picture_rgb.quantize(colors=256, method=Image.MAXCOVERAGE)
You have to understand that downsizing the colour palette means that if the image has lots of colours, you will be losing most of them because 8-bit can only contain 256 colours.
The documentation of Pillow's Image.quantize shows a more convenient way to compress png files. In a personal experiment, the following code achieved about 70% compression of the original size, which is also close to the result produced by ImageMagick.
# Image.quantize(colors=256, method=None, kmeans=0, palette=None)
# method: 0 = median cut; 1 = maximum coverage; 2 = fast octree
img = img.quantize(method=2)
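Putting the answers together, a minimal end-to-end sketch (the file names are illustrative, and dropping the alpha channel is an assumption for images that don't need transparency):

from PIL import Image

img = Image.open("input.png")
if img.mode == "RGBA":
    img = img.convert("RGB")      # drop alpha if transparency isn't needed
img = img.quantize(method=2)      # fast octree, as in the answer above
img.save("output.png", optimize=True)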

Resize thumbnails django Heroku, 'backend doesn't support absolute paths'

I've got an app deployed on Heroku using Django, and so far it seems to be working but I'm having a problem uploading new thumbnails. I have installed Pillow to allow me to resize images when they're uploaded and save the resized thumbnail, not the original image. However, every time I upload, I get the following error: "This backend doesn't support absolute paths." When I reload the page, the new image is there, but it is not resized. I am using Amazon AWS to store the images.
I suspect it has something to do with my models.py. Here is my resize code:
from PIL import Image

class Projects(models.Model):
    project_thumbnail = models.FileField(upload_to=get_upload_file_name, null=True, blank=True)

    def __unicode__(self):
        return self.project_name

    def save(self):
        if not self.id and not self.project_description:
            return
        super(Projects, self).save()
        if self.project_thumbnail:
            image = Image.open(self.project_thumbnail)
            (width, height) = image.size
            image.thumbnail((200, 200), Image.ANTIALIAS)
            image.save(self.project_thumbnail.path)
Is there something that I'm missing? Do I need to tell it something else?
Working with Heroku and AWS, you just need to change the FileField/ImageField attribute from 'path' to 'name'. So in your case it would be:
image.save(self.project_thumbnail.name)
instead of
image.save(self.project_thumbnail.path)
I found the answer with the help of others' googling as well, since my searches didn't pull up the answers I wanted. It was a problem with how Pillow uses absolute paths to save, so I switched to using the storages module as a temporary save space, and it's working now. Here's the code:
from django.core.files.storage import default_storage as storage
...
def save(self):
    if not self.id and not self.project_description:
        return
    super(Projects, self).save()
    if self.project_thumbnail:
        size = 200, 200
        image = Image.open(self.project_thumbnail)
        image.thumbnail(size, Image.ANTIALIAS)
        fh = storage.open(self.project_thumbnail.name, "wb")  # binary mode for image data
        format = 'png'  # you need to set the correct image format here
        image.save(fh, format)
        fh.close()
NotImplementedError: This backend doesn't support absolute paths - can be fixed by replacing file.path with file.name
How it looks in the console:
>>> c = ContactImport.objects.last()
>>> c.json_file
<FieldFile: protected/json_files/data_SbLN1MpVGetUiN_uodPnd9yE2prgeTVTYKZ.json>
>>> c.json_file.name
'protected/json_files/data_SbLN1MpVGetUiN_uodPnd9yE2prgeTVTYKZ.json'