Reaching limitations creating a gallery of plots - python-2.7

In a script I'm using, the code generates a figure containing a number of subplots. Usually it creates a rectangular grid of plots, but for its current use the horizontal parameter has only 1 value, while the vertical parameter has considerably more values than it has had previously. This is causing my program to crash while running, because (presumably) the vertical dimension is too large. The code that's causing the issue is:
#can't get past the first line here
self.fig1 = plt.figure('Title',figsize=(4.6*numXparams,2.3*numYparams))
self.gs = gridspec.GridSpec(numYparams,numXparams)
self.gs.update(left=0.03, right=0.97, top=0.9, bottom=0.1, wspace=0.5, hspace=0.5)
and then later in a nested for loop running over both params:
ax = plt.subplot(self.gs[par0, par1])
The error I'm getting is:
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 53 (X_CreatePixmap)
Serial number of failed request: 295
Current serial number in output stream: 296
My vertical parameter currently has 251 values in it, so I can see how 251*2.3 inches could lead to trouble. I added in the 2.3*numYparams because the plots were overlapping, but I don't know how to create the figure any smaller without changing how the plots are arranged in the figure. It is important for these plots to stay in a vertically oriented column.

There are a couple of errors in your code. Fixing them allowed me to generate the figure you are asking for.
# I needed the figsize keyword here
self.fig1 = plt.figure(figsize=(4.6*numXparams,2.3*numYparams))
# GridSpec takes (nrows, ncols), so the vertical count must come first
self.gs = gridspec.GridSpec(numYparams,numXparams)
self.gs.update(left=0.03, right=0.97, top=0.9, bottom=0.1, wspace=0.5, hspace=0.5)
# I ran this loop
for i in range(numYparams):
    ax = self.fig1.add_subplot(self.gs[i, 0])  # note the y coord in the gridspec comes first
    ax.text(0.5, 0.5, i)  # just an identifier
self.fig1.savefig('column.png', dpi=50)  # had to drop the dpi, because you can't have a png that tall!
and this is the top and bottom of the output figure:
Admittedly, there was a lot of space above the first and below the last subplot, but you can fix that by playing with the figure dimensions or with gs.update.
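If the figure never needs to appear on screen, another option is to render it off-screen with the Agg backend, which avoids the X pixmap allocation that triggers the BadAlloc error. This is only a sketch of that idea as a standalone script (numXparams and numYparams are assumed to be defined as in the question), not part of the original answer:
# Sketch: render off-screen with the Agg backend so no X pixmap is created.
import matplotlib
matplotlib.use('Agg')  # must be called before importing pyplot
import matplotlib.pyplot as plt
from matplotlib import gridspec

fig1 = plt.figure(figsize=(4.6 * numXparams, 2.3 * numYparams))
gs = gridspec.GridSpec(numYparams, numXparams)
gs.update(left=0.03, right=0.97, top=0.9, bottom=0.1, wspace=0.5, hspace=0.5)
for i in range(numYparams):
    ax = fig1.add_subplot(gs[i, 0])
fig1.savefig('column.png', dpi=50)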


Extract page sizes from large PDFs

I need to extract the number of pages and their sizes in px/mm/cm/some-unit from PDF files using Python (sadly, 2.7, because it's a legacy project). The problem is that the files can be truly huge (hundreds of MiBs) because they'll contain large images.
I do not care about that content; I really just want a list of page sizes from the file, with as little RAM consumption as possible.
I found quite a few libraries that can do this (including, but not limited to, the ones in the answers here), but none of them say anything about memory usage, and I suspect that most of them - if not all - read the whole file into memory before doing anything with it, which doesn't fit my purpose.
Are there any libraries that extract only structure and give me the data that I need without clogging my RAM?
pyvips can do this. It loads the file structure when you open the PDF and only renders each page when you ask for pixels.
For example:
#!/usr/bin/python
import sys
import pyvips

i = 0
while True:
    try:
        x = pyvips.Image.new_from_file(sys.argv[1], dpi=300, page=i)
        print("page =", i)
        print("width =", x.width)
        print("height =", x.height)
    except:
        break
    i += 1
libvips 8.7, due in another week or so, adds a new metadata item called n-pages you can use to get the length of the document. Until that is released though you need to just keep incrementing the page number until you get an error.
Using this PDF, when I run the program I see:
$ /usr/bin/time -f %M:%e ./sizes.py ~/pics/r8.pdf
page = 0
width = 2480
height = 2480
page = 1
width = 2480
height = 2480
page = 2
width = 4960
height = 4960
...
page = 49
width = 2480
height = 2480
55400:0.19
So it opened 50 pages in 0.2s real time, with a total peak memory use of 55mb. That's with py3, but it works fine with py2 as well. The dimensions are in pixels at 300 DPI.
If you set page to -1, it'll load all the pages in the document as a single very tall image. All the pages need to be the same size for this though, sadly.
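Once libvips 8.7 is available, the probing loop could also be replaced by reading the n-pages metadata item mentioned above. A sketch of what that might look like (not tested against a released 8.7 build):
# Sketch assuming libvips >= 8.7, which exposes an "n-pages" metadata item.
import sys
import pyvips

first = pyvips.Image.new_from_file(sys.argv[1], dpi=300)
n_pages = first.get("n-pages")  # number of pages in the document
for i in range(n_pages):
    page = pyvips.Image.new_from_file(sys.argv[1], dpi=300, page=i)
    print("page =", i, "width =", page.width, "height =", page.height)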
Inspired by the other answer, I found that libvips, which is suggested there, uses poppler (it can fall back to some other library if it cannot find poppler).
So, instead of using the superpowerful pyvips, which seems great for multiple types of documents, I went with just poppler, which has multiple Python libraries. I picked pdflib and came up with this solution:
from sys import argv
from pdflib import Document
doc = Document(argv[1])
for num, page in enumerate(doc, start=1):
    print(num, tuple(2.54 * x / 72 for x in page.size))
The 2.54 * x / 72 part just converts from points (1/72 of an inch, the unit PDF uses for page sizes) to centimetres, nothing more.
Speed and memory test on a 264MiB file with one huge image per page:
$ /usr/bin/time -f %M\ %e python t2.py big.pdf
1 (27.99926666666667, 20.997333333333337)
2 (27.99926666666667, 20.997333333333337)
...
56 (27.99926666666667, 20.997333333333337)
21856 0.09
Just for reference, if anyone is looking for a pure Python solution, I made a crude one which is available here. It is not thoroughly tested and is much, much slower than this (around 30 seconds for the file above).

can I reset the hidden state of an RNN between input data sets in Keras?

I am training an RNN on a large data set which consists of disparate sources. I do not want the history of one set to spill over to the next. This means I want to reset the hidden state at the end of one set, before sending in the next. How can I do that with Keras? The doc claims you can get into the low level configurations.
What I am trying to do is reset the LSTM hidden state every time a new data set is fed in, so that no influence from the previous dataset is carried forward. See the line
prevh = Hout[t-1] if t > 0 else h0
(line 45 of Karpathy's simple Python implementation: https://gist.github.com/karpathy/587454dc0146a6ae21fc)
If I find the LSTM layer and call reset on it, I am worried that it would wipe out the trained weights and biases, not just Hout.
Here is the training loop code
for iteration in range(1, 10):
    for key in X_dict:
        X = X_dict[key]
        y = y_dict[key]
        history = model.fit(X, y, batch_size=batch_size, callbacks=cbks, nb_epoch=1, verbose=0)
Each turn of the loop feeds in data from a single market. That's where I'd like to reset the Hout in the LSTM.
To reset the states of your model, call .reset_states() on either a specific layer or on your entire model (from the Keras documentation).
So if you have a list of datasets:
for ds in datasets:
    model.reset_states()
    model.fit(ds['inputs'], ds['targets'], ...)
Is that what you are looking for?
EDIT:
for iteration in range(1, 10):
    for key in X_dict:
        model.reset_states()  # reset the states of all the LSTMs in your network
        # model.layers[lstm_layer_index].reset_states()  # or reset the states of one specific LSTM layer
        X = X_dict[key]
        y = y_dict[key]
        history = model.fit(X, y, batch_size=batch_size, callbacks=cbks, nb_epoch=1, verbose=0)
This is how you apply it.
By default, LSTMs are not stateful, which means that they won't keep a hidden state after going over a sequence; the initial state when starting a new sequence is set to 0. If you set stateful=True, then the layer keeps the last hidden state (the output) of the previous sequence and uses it to initialize itself for the next sequence, as if the sequence were continuing.
Calling model.reset_states() just resets those stored hidden states to 0, as if the sequence were starting from scratch.
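To make that concrete, here is a minimal sketch of a stateful LSTM; the layer sizes and data shapes are invented for illustration, and the old Keras 1.x keyword nb_epoch is used to match the code above:
# Minimal stateful-LSTM sketch. Sizes and data are made up for illustration.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, timesteps, features = 32, 10, 4
model = Sequential()
# stateful=True: the last hidden state of one batch initializes the next one
model.add(LSTM(16, batch_input_shape=(batch_size, timesteps, features), stateful=True))
model.add(Dense(1))
model.compile(optimizer='rmsprop', loss='mse')

X = np.random.rand(batch_size * 4, timesteps, features)
y = np.random.rand(batch_size * 4, 1)
model.fit(X, y, batch_size=batch_size, nb_epoch=1, shuffle=False, verbose=0)
model.reset_states()  # wipe the carried-over hidden state before the next dataset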
If you don't trust .reset_states() to do what you expect, feel free to go and read the source code.

Tensorflow RNN slice error

I am attempting to create a multilayered RNN using LSTMs in tensorflow. I am using Tensorflow version 0.9.0 and python 2.7 on Ubuntu 14.04.
However, I keep getting the following error:
tensorflow.python.framework.errors.InvalidArgumentError: Expected begin[1] in [0, 2000], but got 4000
when I use
rnn_cell.MultiRNNCell([cell]*num_layers)
if num_layers is greater than 1.
My code:
size = 1000
config.forget_bias = 1
config.num_layers = 3
cell = rnn_cell.LSTMCell(size, forget_bias=config.forget_bias)
cell_layers = rnn_cell.MultiRNNCell([cell] * config.num_layers)
I would also like to be able to switch to using GRU cells but this gives me the same error:
Expected begin[1] in [0, 1000], but got 2000
I have tried explicitly setting
num_proj = 1000
which also did not help.
Is this something to do with my use of concatenated states? As I have attempted to set
state_is_tuple=True
which gives:
`ValueError: Some cells return tuples of states, but the flag state_is_tuple is not set. State sizes are: [LSTMStateTuple(c=1000, h=1000), LSTMStateTuple(c=1000, h=1000), LSTMStateTuple(c=1000, h=1000)]`
Any help would be much appreciated!
I'm not sure why this worked, but I added in a dropout wrapper, i.e.
if Training:
    cell = rnn_cell.DropoutWrapper(cell, output_keep_prob=config.keep_prob)
And now it works.
This works for both LSTM and GRU cells.
This problem occurs because you have increased the number of layers of your GRU cell, but your initial state vector has not grown to match. If your initial_vector size is [batch_size, 50], then:
initial_vector = tf.concat(1, [initial_vector] * num_layers)
Now pass this to the decoder as the initial state.
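Another thing worth checking: the ValueError quoted in the question suggests state_is_tuple was set on the LSTM cells but not on the MultiRNNCell wrapper. In the TF 0.9-era API both usually need the flag; this is only a sketch and is not verified against that exact version:
# Sketch: pass state_is_tuple=True to both the cell and the wrapper, which is
# what the ValueError above complains about. Variable names follow the question.
cell = rnn_cell.LSTMCell(size, forget_bias=config.forget_bias, state_is_tuple=True)
cell_layers = rnn_cell.MultiRNNCell([cell] * config.num_layers, state_is_tuple=True)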

Stopping or cancelling queued keyboard commands in a program

I have a program written in python 2.7 which takes photos of a sample from 3 different cameras when the result value is typed into the program.
The USB controller bandwidth can't handle all cameras firing at the same time, so I have to call each one individually. This causes a delay between pressing the value and the preview of the pictures showing up.
During this delay, the program is still able to accept keyboard commands, which are then processed once the photos have been taken. This is causing issues: sometimes a value is entered twice, which means that the value is then applied to the next sample after the photos for the first one have been taken.
What I'm after is a way to disregard any queued keyboard commands whilst the program is working on the current command:
def selChange(self):
    # Disable the textbox
    self.valInput.configure(state='disabled')
    # Gather pictures from cameras and store them in a 2D list with the sample result
    # (this takes a second or two to complete)
    self.gatherPictures()
    if not int(self.SampleList.size()) == 0:
        # Clear textbox
        self.valInput.delete(0, END)
        # Create previews from the 2D list
        self.img1 = ImageTk.PhotoImage(self.dataList[int(self.SampleList.curselection()[0])][2].resize((250, 250), Image.ANTIALIAS))
        self.pic1.configure(image=self.img1)
        self.img2 = ImageTk.PhotoImage(self.dataList[int(self.SampleList.curselection()[0])][3].resize((250, 250), Image.ANTIALIAS))
        self.pic2.configure(image=self.img2)
        self.img3 = ImageTk.PhotoImage(self.dataList[int(self.SampleList.curselection()[0])][4].resize((250, 250), Image.ANTIALIAS))
        self.pic3.configure(image=self.img3)
        self.img4 = ImageTk.PhotoImage(Image.open("Data/" + str(self.dataList[int(self.SampleList.curselection()[0])][1]) + ".jpg").resize((250, 250), Image.ANTIALIAS))
        self.pic4.configure(image=self.img4)
    # Unlock textbox ready for next sample
    self.valInput.configure(state='normal')
I was hoping that disabling the textbox and re-enabling it afterwards would work, but it doesn't. I wanted to use buttons, but the users have insisted that values be typed, to keep data entry fast.
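One common way to make a handler like this ignore keystrokes that queue up while it is busy is to guard it with a flag and let Tk deliver (and discard) the queued events before the entry is cleared and re-enabled. This is only a sketch of that idea, not code from the question; the busy attribute and the explicit update() call are assumptions:
# Sketch of a re-entrancy guard around the slow handler. Imports and widget
# names follow the question's code; self.busy is an invented attribute.
def selChange(self):
    if getattr(self, 'busy', False):
        return  # ignore calls triggered by keys pressed while we were working
    self.busy = True
    self.valInput.configure(state='disabled')
    try:
        self.gatherPictures()  # the slow part
    finally:
        self.update()  # deliver the queued key events now, while still guarded
        self.valInput.configure(state='normal')
        self.valInput.delete(0, END)  # drop anything that still reached the entry
        self.busy = False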

pygame clock managed process accelerates and shouldn't

I'm making an RPG in pygame and just added portals to get from one map to another. The problem is that when I get back to the first map, the movement and animation of my player character somehow accelerate a lot, and the acceleration increases each time I go back and forth.
Time is managed with a pygame clock object at 32 ticks per second:
time_passed = clock.tick(32)
self.worlds[self.currentWorld].process(time_passed)
general process method:
def process(self, time_passed):
    tps = time_passed / 1000.0
    for entity in self.entities.itervalues():
        entity.process(tps)
process method for an entity:
def process(self, tps):
    if self.location != self.destination and self.animseq != None:
        self.tps += tps
        if self.tps > 0.25:
            self.tps -= 0.25
            self.image_to_render1 += 1
            if self.image_to_render1 > self.animn:
                self.image_to_render1 = 0
"teleport" method
def changeWorld(self, target):
    self.currentWorld = target
    self.worlds[self.currentWorld].addEntity(self.player)
    self.player.world = self.worlds[self.currentWorld]
    self.player.location.x = 200
    self.player.location.y = 200
    self.player.reset()
The reset() call was my first attempt at solving the problem: it resets the animations and the player's associated time, but it didn't change anything. I wonder if I just got something wrong with the clock or if I should recreate one on teleport. I hope someone can give me a clue; thanks in advance.
It's a simple logic error: you forget to remove your player from the previous "world" when teleporting:
def changeWorld(self, target):
    self.currentWorld = target
    self.worlds[self.currentWorld].addEntity(self.player)
    # Where's deleteEntity on the old world?
so when he comes back, there are two copies of the same player in the world's entities collection; the player then gets processed twice per tick and moves twice as fast.
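A corrected version might look like the sketch below; deleteEntity is a placeholder name taken from the comment above, so use whatever your World class actually provides for removing an entity:
# Sketch of the fix: leave the old world before entering the new one.
# deleteEntity is a hypothetical method name.
def changeWorld(self, target):
    self.worlds[self.currentWorld].deleteEntity(self.player)  # leave the old world
    self.currentWorld = target
    self.worlds[self.currentWorld].addEntity(self.player)     # enter the new one
    self.player.world = self.worlds[self.currentWorld]
    self.player.location.x = 200
    self.player.location.y = 200
    self.player.reset()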
This kind of error is easy to catch with basic debugging: if you put logging in your main loop and in the player's process method, you would clearly see something like this in the output:
starting tick
processing player
processing player <-- there are two of them where there should be only one!
starting tick
processing player
processing player
starting tick
processing player
processing player
Next time, try to use debugging (it is harder in visual applications, but not impossible) or logging to make sure every entity in your state is exactly what you expect it to be at every step; when you find a discrepancy, it will be much easier to track down its source. Good luck!
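For reference, a trace like the one above can be produced with the standard logging module; this is only a sketch, and the entity label is chosen purely for illustration:
# Sketch: log every tick and every entity processed, to spot duplicates.
import logging
logging.basicConfig(level=logging.DEBUG, format="%(message)s")

def process(self, time_passed):
    logging.debug("starting tick")
    tps = time_passed / 1000.0
    for entity in self.entities.itervalues():
        logging.debug("processing %s", entity.__class__.__name__)
        entity.process(tps)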