torchvision.datasets.mnist RuntimeError on JupyterLab - google-cloud-platform

I'm trying to run the following sample code on JupyterLab (through GCP Vertex AI):
import torch
from torchvision import transforms
from torchvision import datasets
train_data = datasets.MNIST(root='data', train=True, download=True, transform=None)
print(train_data)
with versions:
torch-1.12.1+cu113
torchvision-0.13.1+cu113
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_10081/229378695.py in <module>
11 from torchvision import datasets
12
---> 13 train_data = datasets.MNIST(root='data', train=True, download=True, transform=None)
14 print(train_data)
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py in __init__(self, root, train, transform, target_transform, download)
102 raise RuntimeError("Dataset not found. You can use download=True to download it")
103
--> 104 self.data, self.targets = self._load_data()
105
106 def _check_legacy_exist(self):
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py in _load_data(self)
121 def _load_data(self):
122 image_file = f"{'train' if self.train else 't10k'}-images-idx3-ubyte"
--> 123 data = read_image_file(os.path.join(self.raw_folder, image_file))
124
125 label_file = f"{'train' if self.train else 't10k'}-labels-idx1-ubyte"
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py in read_image_file(path)
542
543 def read_image_file(path: str) -> torch.Tensor:
--> 544 x = read_sn3_pascalvincent_tensor(path, strict=False)
545 if x.dtype != torch.uint8:
546 raise TypeError(f"x should be of dtype torch.uint8 instead of {x.dtype}")
/opt/conda/lib/python3.7/site-packages/torchvision/datasets/mnist.py in read_sn3_pascalvincent_tensor(path, strict)
529
530 assert parsed.shape[0] == np.prod(s) or not strict
--> 531 return parsed.view(*s)
532
533
RuntimeError: shape '[60000, 28, 28]' is invalid for input of size 9437168
____________________
and I'm getting this strange error when trying to load MNIST.
I tried reproducing it in other environments but couldn't; it works fine locally and on Colab.
I tried many other versions of torch and torchvision, but none of them works.

This error is often caused by a problem with the MNIST dataset files downloaded onto your system. The message says the reader found 9437168 values where the file header promises 60000 * 28 * 28 = 47040000, which is consistent with an incomplete or corrupted file. Try deleting the MNIST dataset files in the data directory, then run the code again to download fresh copies. For example:
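import os
import shutil
from torchvision import datasets

# Remove any previously downloaded (possibly corrupted) copy.
mnist_folder = 'data/MNIST'
if os.path.exists(mnist_folder):
    shutil.rmtree(mnist_folder)
# Download fresh copies of the dataset files.
train_data = datasets.MNIST(root='data', train=True, download=True, transform=None)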
If this method doesn't work, download the dataset files manually from this website and place them in the data/MNIST folder.
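To check whether a downloaded file really is incomplete before deleting anything, you can compare its size on disk with the size its own header announces. The following is a minimal sketch, assuming the uncompressed IDX layout (a 4-byte magic number whose third byte is the type code and whose fourth byte is the number of dimensions, followed by one big-endian 32-bit integer per dimension). The helper name `idx_sizes` is hypothetical, and the path assumes the raw files sit under data/MNIST/raw.

```python
import os
import struct

# Bytes per element for each IDX type code (third byte of the magic number).
IDX_ITEM_SIZES = {0x08: 1, 0x09: 1, 0x0B: 2, 0x0C: 4, 0x0D: 4, 0x0E: 8}


def idx_sizes(path):
    """Return (expected_bytes, actual_bytes) for an uncompressed IDX file."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if len(magic) < 4:
            raise ValueError("file is too short to contain an IDX header")
        type_code, ndim = magic[2], magic[3]
        # Each dimension is stored as a big-endian unsigned 32-bit integer.
        dims = struct.unpack(">" + "I" * ndim, f.read(4 * ndim))
    n_items = 1
    for d in dims:
        n_items *= d
    expected = 4 + 4 * ndim + n_items * IDX_ITEM_SIZES[type_code]
    return expected, os.path.getsize(path)


# Assumed location of the training images; adjust to your own root folder.
path = "data/MNIST/raw/train-images-idx3-ubyte"
if os.path.exists(path):
    expected, actual = idx_sizes(path)
    print("expected", expected, "bytes, found", actual, "bytes")
```

If the two numbers differ, the file on disk does not match its header, and deleting it and downloading again is the reasonable next step.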

Related

Accessing a bz2 file in S3 from Sagemaker notebook

I can read and write CSV files to and from an S3 bucket from a SageMaker notebook, but when I try to read a bz2 file using the same kind of path that works for the CSV files, I get a "No such file or directory" error:
IOErrorTraceback (most recent call last)
<ipython-input-19-d14d47a702e1> in <module>()
2 # Create corpus
3 #%time wiki = WikiCorpus("resources/articles1.xml.bz2", tokenizer_func=spacy_tokenize)
----> 4 wiki = WikiCorpus("s3://sagemakerq/enwiki.xml.bz2", tokenizer_func=spacy_tokenize)
/home/ec2-user/anaconda3/envs/amazonei_mxnet_p27/lib/python2.7/site-packages/gensim/corpora/wikicorpus.pyc in __init__(self, fname, processes, lemmatize, dictionary, filter_namespaces, tokenizer_func, article_min_tokens, token_min_len, token_max_len, lower, filter_articles)
634
635 if dictionary is None:
--> 636 self.dictionary = Dictionary(self.get_texts())
637 else:
638 self.dictionary = dictionary
/home/ec2-user/anaconda3/envs/amazonei_mxnet_p27/lib/python2.7/site-packages/gensim/corpora/dictionary.pyc in __init__(self, documents, prune_at)
82
83 if documents is not None:
---> 84 self.add_documents(documents, prune_at=prune_at)
85
86 def __getitem__(self, tokenid):
/home/ec2-user/anaconda3/envs/amazonei_mxnet_p27/lib/python2.7/site-packages/gensim/corpora/dictionary.pyc in add_documents(self, documents, prune_at)
195
196 """
--> 197 for docno, document in enumerate(documents):
198 # log progress & run a regular check for pruning, once every 10k docs
199 if docno % 10000 == 0:
/home/ec2-user/anaconda3/envs/amazonei_mxnet_p27/lib/python2.7/site-packages/gensim/corpora/wikicorpus.pyc in get_texts(self)
676 ((text, self.lemmatize, title, pageid, tokenization_params)
677 for title, text, pageid
--> 678 in extract_pages(bz2.BZ2File(self.fname), self.filter_namespaces, self.filter_articles))
679 pool = multiprocessing.Pool(self.processes, init_to_ignore_interrupt)
680
IOError: [Errno 2] No such file or directory: 's3://sagemakerq/enwiki.xml.bz2'
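It looks like you are using the Python gensim package to construct a corpus from a wiki-based database dump stored in S3. The package does not support reading directly from S3: as the traceback shows, WikiCorpus passes the file name to bz2.BZ2File, which expects a local file. Instead, download the file first and work with the local copy: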
import boto3
from gensim.corpora.wikicorpus import WikiCorpus
s3 = boto3.client('s3')
s3.download_file('BUCKET_NAME', 'OBJECT_NAME', 'FILE_NAME')
wiki = WikiCorpus('FILE_NAME')
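Since download_file takes the bucket and the object key separately, an s3:// path like the one in the question has to be split first. A minimal sketch using only the standard library follows; the helper name `split_s3_uri` is hypothetical, and the boto3 call is shown only as a comment so the block runs without AWS credentials.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError("not a valid s3:// URI: {!r}".format(uri))
    return parsed.netloc, key


bucket, key = split_s3_uri("s3://sagemakerq/enwiki.xml.bz2")
print(bucket, key)

# With boto3 available, the pieces would then be used like this:
#   s3 = boto3.client('s3')
#   s3.download_file(bucket, key, 'enwiki.xml.bz2')
#   wiki = WikiCorpus('enwiki.xml.bz2')
```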

Keep Getting Permission denied when using fastai library on AWS setting

I'm learning deep learning through a course that uses fastai. I'm running the fastai library on an AWS p2.xlarge instance. When I run a function from the fastai library, I get this error:
Traceback (most recent call last)
<ipython-input-12-1d86fc0ece07> in <module>()
1 arch = resnet34
2 data = ImageClassifierData.from_paths(PATH, tfms=tfms_from_model(arch,sz ))
----> 3 learn = ConvLearner.pretrained(arch, data, precompute = True)
4 learn.fit(0.01, 2)
~/fastai/fastai/conv_learner.py in pretrained(cls, f, data, ps, xtra_fc, xtra_cut, custom_head, precompute, pretrained, **kwargs)
112 models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg,
113 ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut, custom_head=custom_head, pretrained=pretrained)
--> 114 return cls(data, models, precompute, **kwargs)
115
116 #classmethod
~/fastai/fastai/conv_learner.py in __init__(self, data, models, precompute, **kwargs)
95 def __init__(self, data, models, precompute=False, **kwargs):
96 self.precompute = False
---> 97 super().__init__(data, models, **kwargs)
98 if hasattr(data, 'is_multi') and not data.is_reg and self.metrics is None:
99 self.metrics = [accuracy_thresh(0.5)] if self.data.is_multi else [accuracy]
~/fastai/fastai/learner.py in __init__(self, data, models, opt_fn, tmp_name, models_name, metrics, clip, crit)
35 self.tmp_path = tmp_name if os.path.isabs(tmp_name) else os.path.join(self.data.path, tmp_name)
36 self.models_path = models_name if os.path.isabs(models_name) else os.path.join(self.data.path, models_name)
---> 37 os.makedirs(self.tmp_path, exist_ok=True)
38 os.makedirs(self.models_path, exist_ok=True)
39 self.crit = crit if crit else self._get_crit(data)
~/anaconda3/envs/fastai/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
218 return
219 try:
--> 220 mkdir(name, mode)
221 except OSError:
222 # Cannot rely on checking for EEXIST, since the operating system
PermissionError: [Errno 13] Permission denied: 'data/dogscats/tmp'
I think the AWS console has no permission to make the directory.
I did sudo mkdir tmp data/dogscats/ but I get another error that I couldn't understand.
I think I have to give AWS some permission but I have no clue how to do that.
I hope you guys can give me some clear idea on how to solve this kind of problem.
Fastai saves data such as the current loss in a folder that it creates. By default this folder is created in the working directory, but you can pass a path argument that points to a location where you have permission to create a folder.
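One way to pick such a location is to try the preferred folder and fall back to a temporary directory when it cannot be created. This is a minimal sketch using only the standard library; the helper name `writable_dir` and the `fastai_` prefix are hypothetical, and how the resulting path is handed to fastai depends on the fastai version, so that step is left out.

```python
import os
import tempfile


def writable_dir(preferred):
    """Return `preferred` if it can be created and written to, else a new temp directory."""
    try:
        os.makedirs(preferred, exist_ok=True)
        if os.access(preferred, os.W_OK):
            return preferred
    except OSError:
        # PermissionError (Errno 13) is a subclass of OSError.
        pass
    return tempfile.mkdtemp(prefix="fastai_")


# Example (creates the folder if it is allowed to):
#   tmp_path = writable_dir("data/dogscats/tmp")
```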

"RuntimeError: Could not create write struct" with pyplot

UPDATE:
I get this message no matter what I attempt to plot: even this
plt.plot([1,2,3,4])
plt.ylabel('some numbers')
plt.show()
returns the error RuntimeError: Could not create write struct
I am trying to plot a raw image inline. My Jupyter notebook is up on an AWS instance with port forwarding.
My code is as follows:
see above update
When I try this, I get the error message below, which culminates in the message RuntimeError: Could not create write struct.
The weird thing is, the exact same code runs fine locally. I can view images all day long.
So as an experiment I pulled the image down off AWS and ran it locally and I could see it displayed just fine.
I'm thinking, there must be some problem with either my Matplotlib or even jupyter notebook.
I've removed / reinstalled both multiple times, in multiple configurations. I made sure the local and AMI versions of the packages are the exact same.
I have no idea what is going on.
The error itself, naturally, isn't useful. And when googling the error, there are few exact-match results, which is always scary.
Other random stuff:
I'm using Python 2.7
Both libraries are managed within Conda
Jupyter: 4.4.0
Matplotlib: 2.1.2
<matplotlib.image.AxesImage at 0x7f261c1f2b50>
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/home/ubuntu/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/IPython/core/formatters.pyc in __call__(self, obj)
332 pass
333 else:
--> 334 return printer(obj)
335 # Finally look for special method names
336 method = get_real_method(obj, self.print_method)
/home/ubuntu/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/IPython/core/pylabtools.pyc in <lambda>(fig)
238
239 if 'png' in formats:
--> 240 png_formatter.for_type(Figure, lambda fig: print_figure(fig, 'png', **kwargs))
241 if 'retina' in formats or 'png2x' in formats:
242 png_formatter.for_type(Figure, lambda fig: retina_figure(fig, **kwargs))
/home/ubuntu/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/IPython/core/pylabtools.pyc in print_figure(fig, fmt, bbox_inches, **kwargs)
122
123 bytes_io = BytesIO()
--> 124 fig.canvas.print_figure(bytes_io, **kw)
125 data = bytes_io.getvalue()
126 if fmt == 'svg':
/home/ubuntu/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/matplotlib/backend_bases.pyc in print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, **kwargs)
2214 orientation=orientation,
2215 dryrun=True,
-> 2216 **kwargs)
2217 renderer = self.figure._cachedRenderer
2218 bbox_inches = self.figure.get_tightbbox(renderer)
/home/ubuntu/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/matplotlib/backends/backend_agg.pyc in print_png(self, filename_or_obj, *args, **kwargs)
524 try:
525 _png.write_png(renderer._renderer, filename_or_obj,
--> 526 self.figure.dpi, metadata=metadata)
527 finally:
528 if close:
RuntimeError: Could not create write struct
<matplotlib.figure.Figure at 0x7f2624b94950>
I have no idea why this works, but I removed the conda installation of matplotlib and then reinstalled matplotlib with pip.
Now everything works fine.
¯\_(ツ)_/¯

Python 2: Type error "only integer scalar arrays can be converted to a scalar index" using pd.read() with neo.Spike2IO

I have code to load in Spike2 .smr files and read them in Jupyter. My code was working fine 2 days ago and now, with absolutely no change on either the file that is loaded in or the code that loads it in, it is not working. The problem code is as follows...
Cell 1 Input (to show the versions of my packages):
import sys
print("Python version: {}\n\nPackages versions: ".format(sys.version))
# which package versions are installed?
import pip
all_packages = pip.get_installed_distributions()
used_packages = ["matplotlib", "neo", "numpy", "OpenElectrophy", "os", "pandas",
                 "pylab", "scipy"]
for entry in used_packages:
    for p in all_packages:
        if entry in str(p):
            print(str(p))
Cell 1 Output:
Python version: 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Packages versions:
matplotlib 1.4.3
matplotlib-venn 0.11.3
neo 0.3.3
numpy 1.12.0
pycosat 0.6.1
nose 1.3.7
backports.ssl-match-hostname 3.5.0.1
pandas 0.19.2
scipy 0.15.1
Cell 2 Input (load in my modules):
import pylab
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as st
import os
import tables
import neo
import scipy.signal as sg
from scipy import interpolate as inter
import h5py as h
import quantities as q
plt.style.use('ggplot')
pd.options.display.max_rows = 999
%matplotlib inline
Now, I load in the Spike2 .smr file with:
r = neo.Spike2IO("Rawdata/143-16/nerve.smr").read()[0]
and get the following type error:
TypeError Traceback (most recent call last)
<ipython-input-3-f81fd520a4c5> in <module>()
----> 1 r = neo.Spike2IO("Rawdata/143-16/nerve.smr").read()[0]
/home/wolverine/anaconda/lib/python2.7/site-packages/neo/io/baseio.pyc in read(self, lazy, cascade, **kargs)
107 if not cascade:
108 return bl
--> 109 seg = self.read_segment(lazy=lazy, cascade=cascade, **kargs)
110 bl.segments.append(seg)
111 create_many_to_one_relationship(bl)
/home/wolverine/anaconda/lib/python2.7/site-packages/neo/io/spike2io.pyc in read_segment(self, take_ideal_sampling_rate, lazy, cascade)
120 if channelHeader.kind in [1, 9]:
121 #~ print 'analogChanel'
--> 122 anaSigs = self.readOneChannelContinuous( fid, i, header, take_ideal_sampling_rate, lazy = lazy)
123 #~ print 'nb sigs', len(anaSigs) , ' sizes : ',
124 for anaSig in anaSigs :
/home/wolverine/anaconda/lib/python2.7/site-packages/neo/io/spike2io.pyc in readOneChannelContinuous(self, fid, channel_num, header, take_ideal_sampling_rate, lazy)
240
241 anaSigs = [ ]
--> 242 if channelHeader.unit in unit_convert:
243 unit = pq.Quantity(1, unit_convert[channelHeader.unit] )
244 else:
/home/wolverine/anaconda/lib/python2.7/site-packages/neo/io/spike2io.pyc in __getattr__(self, name)
444 else:
445 l = np.fromstring(self.array[name][0], 'u1')
--> 446 return self.array[name][1:l+1]
447 else:
448 return self.array[name]
TypeError: only integer scalar arrays can be converted to a scalar index
The neo.Spike2IO("filename.smr") call works fine, but as soon as I add the read()[0] part I get the TypeError. I read up on this type error, and the only answers I saw suggested that the file could be corrupt. I deleted my local file and re-downloaded it, and also downloaded another similar file in case the master copy of the first one was corrupt. I retried my code on these two new files and got the TypeError for both. As stated before, the code was working flawlessly just two days ago, and now it won't load any .smr file. I also updated all of my modules, pip, and Anaconda, but none of this helped.
Here is a link to a short sample .smr file (only 3.1 MB) that I cut for sharing purposes. It also gives the Type Error. Any ideas? Thank you.
I solved this issue by further updating my modules and Anaconda itself (and all of its respective modules). Something must have reverted to an older version.
The code to update every package in Anaconda is:
conda update --all
Further help can be found here at the Conda homepage. Shutting down and then restarting your computer can also help to ensure that all of these updates take effect.
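After updating, it can help to confirm which versions the notebook kernel actually sees, since the suspicion here is that something reverted to an older version. The question's own check relies on pip.get_installed_distributions(), which newer pip releases no longer expose. The following is a minimal sketch of an alternative, assuming Python 3.8 or later (not the Python 2.7 used in the question); the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata  # available from Python 3.8


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(
    ["matplotlib", "neo", "numpy", "pandas", "scipy"]
).items():
    print(name, version)
```

Running this inside the notebook, rather than in a separate terminal, reports the environment of the kernel that executes your code.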

Issue starting out with xlwings - AttributeError: Excel.Application.Workbooks

I was trying to use the package xlwings and ran into a simple error right from the start. I was able to run the example files they provided here without any major issues (except for multiple Excel books opening up upon running the code) but as soon as I tried to execute code via IPython I got the error AttributeError: Excel.Application.Workbooks. Specifically I ran:
from xlwings import Workbook, Sheet, Range, Chart
wb = Workbook()
Range('A1').value = 'Foo 1'
and got
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-7-7436ba97d05d> in <module>()
1 from xlwings import Workbook, Sheet, Range, Chart
----> 2 wb = Workbook()
3 Range('A1').value = 'Foo 1'
PATH\xlwings\main.pyc in __init__(self, fullname, xl_workbook, app_visible)
139 else:
140 # Open Excel if necessary and create a new workbook
--> 141 self.xl_app, self.xl_workbook = xlplatform.new_workbook()
142
143 self.name = xlplatform.get_workbook_name(self.xl_workbook)
PATH\xlwings\_xlwindows.pyc in new_workbook()
103 def new_workbook():
104 xl_app = _get_latest_app()
--> 105 xl_workbook = xl_app.Workbooks.Add()
106 return xl_app, xl_workbook
107
PATH\win32com\client\dynamic.pyc in __getattr__(self, attr)
520
521 # no where else to look.
--> 522 raise AttributeError("%s.%s" % (self._username_, attr))
523
524 def __setattr__(self, attr, value):
AttributeError: Excel.Application.Workbooks
I noticed the examples have a .xlsm file already present in the folder with the Python code. Does the Python code only work if it's in the same location as an existing Excel file? Does this mean it can't create Excel files automatically? Apologies if this is basic.