Can't import custom Python module using the multiprocessing library - python-2.7

I'm just getting started with the multiprocessing library, using it to parallelise a simple for loop in my code base. Previously, in a serial for loop, I would import a custom configuration .py file and pass it to a function to be run.
However, I'm having issues passing the configuration module into the parallelised processes.
NB. There are multiple custom configuration .py files which I want to pass into the different processes.
Example:
def get_custom_config():
    config_list = []
    for project_config in configs:
        config = importlib.import_module("config.%s.%s" % (prefix, project_config))
        config_list.append(config)
    return config_list

def print_config(config):
    print config.something_in_config_file

if __name__ == "__main__":
    config_list = get_custom_config()
    pool = mp.Pool(processes=2)
    pool.map(print_config, config_list)
Returns:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 251, in map
return self.map_async(func, iterable, chunksize).get()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/pool.py", line 567, in get
raise self._value
cPickle.PicklingError: Can't pickle <type 'module'>: attribute lookup __builtin__.module failed
What is the best way of passing a module to a parallel process?

I do have a possible solution for you, but I don't like the approach you have.
config = importlib.import_module("config.%s.%s" % (prefix, project_config))
You should try to have config as a dictionary of key-value pairs instead of a module, or import it that way.
The issue is that functions and modules are not picklable by default in Python 2.7. Functions are picklable by default in Python 3.X and modules are still not.
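As a minimal sketch of that dictionary approach (the helper name config_as_dict and the leading-underscore filter are my illustration, not from the original answer), you could import each config once in the parent process and ship only its public attributes as a plain dict:

import importlib

def config_as_dict(module_name):
    # Import the config module once in the parent process, then keep
    # only its public attributes in a plain dict, which pickles cleanly
    # (as long as the attribute *values* are themselves picklable).
    module = importlib.import_module(module_name)
    return {k: v for k, v in vars(module).items() if not k.startswith("_")}

# These dicts can be passed to pool.map() without pickling errors.
config_dicts = [config_as_dict("config.%s" % name) for name in ("abc", "def")]

If you nevertheless want to pass module objects themselves, the following workaround registers a custom pickler for modules: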
import importlib
import multiprocessing as mp
import copy_reg
import types

configs = ["abc", "def"]

def _pickle_module(module):
    module_name = module.__name__
    print("pickling " + module_name)
    path = getattr(module, "__file__", None)
    return _unpickle_module, (module_name, path)

def _unpickle_module(module_name, path):
    # Re-import the module by name in the receiving process.
    return importlib.import_module(module_name)

# Register the reducer so that module objects become picklable.
copy_reg.pickle(types.ModuleType, _pickle_module, _unpickle_module)

def get_custom_config():
    config_list = []
    for project_config in configs:
        config = importlib.import_module("config.%s" % project_config)
        config_list.append(config)
    return config_list

def print_config(config):
    print(vars(config))

if __name__ == "__main__":
    config_list = get_custom_config()
    pool = mp.Pool(processes=2)
    pool.map(print_config, config_list)
This basically re-imports the module in the other process, so remember that you are not sharing data between them; this is fine for read-only variables.
But as I mentioned, passing modules to a different process makes little sense. Try to fix your approach instead of using the code I posted.
PS: Solution inspired from Can't pickle <type 'cv2.BRISK'>: attribute lookup cv2.BRISK failed


Get access to class attributes

import yaml

class Import_Yaml_Setting():
    def __init__(self, path):
        self.read_yaml(path)

    def read_yaml(self, path):
        stream = open(path, 'r')
        self.settings = yaml.load(stream)
        stream.close()

class MasterDef(Import_Yaml_Setting):
    def __init__(self, path):
        Import_Yaml_Setting.__init__(self, path)

def function_1():
    path = 'path_to_settings\\yaml_file.yaml'
    MasterDef(path)

def function_2():
    MasterDef.settings

if __name__ == '__main__':
    function_1()
    function_2()
My plan is to have a class Import_Yaml_Setting which imports settings from a YAML file. The class MasterDef inherits from Import_Yaml_Setting.
function_1 calls MasterDef in order to import the settings. I want to do this once in my program. Afterwards, I just want to access the imported settings without importing them again; that is what function_2 should do.
My problem
I don't know how I should call MasterDef in the first place. If I created an instance of MasterDef, I wouldn't have access to that instance in function_2.
Also, I get an error that says MasterDef has no attribute settings.
What would be the right way to do this?
There are a few things incorrect, so let's start with the most obvious.
If you have a class MasterDef, calling MasterDef() creates an instance
of that class. If you don't assign that to a variable, the instance will
immediately disappear.
Doing MasterDef.settings later on could work if the class had a
class attribute or method called settings, but in that case you are not accessing
the settings attribute on an instance.
Typically such global settings are passed around, implemented as a function object that
does the loading only once, or made into a global variable (as
shown in the following example). Simplified, you would do:
from __future__ import print_function, absolute_import, division, unicode_literals

class MasterDef(object):
    def __init__(self):
        self.settings = dict(some='setting')

master_def = None

def function_1():
    global master_def
    if master_def is None:
        master_def = MasterDef()

def function_2():
    print('master_def:', master_def.settings)

if __name__ == '__main__':
    function_1()
    function_2()
which gives:
master_def: {'some': 'setting'}
A few notes to the above:

- If, for whatever reason, you are doing anything new in Python 2.7, make things more Python 3 compatible by including the from __future__ import as indicated, even if you are just using the print function (instead of the outdated print statement). It will make transitioning easier (2.7 goes EOL in 2020).
- Again, in 2.7 make your base classes a subclass of object; that e.g. makes it possible to have properties.
- By testing that master_def is None you can invoke function_1 multiple times.
- You should also be aware that PyYAML's load(), as is written in its documentation, can be unsafe when you don't have full control over your input. There is seldom a need to use load(), so use safe_load() (a minimal sketch follows this list) or upgrade to my ruamel.yaml package, which implements the newer YAML 1.2 standard (released in 2009, so there is no excuse for PyYAML still not supporting it).
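If you stay with PyYAML rather than switching libraries, the minimal change is the one-line swap to safe_load() (my sketch, not part of the answer's example):

import yaml

def read_yaml(path):
    # safe_load() only builds plain Python objects (dict, list, str, ...),
    # so an untrusted YAML document cannot trigger arbitrary code execution.
    with open(path, 'r') as stream:
        return yaml.safe_load(stream)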
As you also seem to be on Windows (assumed from your use of \\), consider using raw strings, in which you don't need to escape the backslash, or os.path.join(). I am leaving out
your path part in my full example as I am not on Windows:
from __future__ import print_function, absolute_import, division, unicode_literals

import ruamel.yaml

class Import_Yaml_Setting(object):
    def __init__(self, path):
        self._path = path  # stored in case you want to write out the configuration
        self.settings = self.read_yaml(path)

    def read_yaml(self, path):
        yaml = ruamel.yaml.YAML(typ='safe')
        with open(path, 'r') as stream:
            return yaml.load(stream)

class MasterDef(Import_Yaml_Setting):
    def __init__(self, path):
        Import_Yaml_Setting.__init__(self, path)

master_def = None

def function_1():
    global master_def
    path = 'yaml_file.yaml'
    if master_def is None:
        master_def = MasterDef(path)

def function_2():
    print('master_def:', master_def.settings)

if __name__ == '__main__':
    function_1()
    function_2()
If your YAML file looks like:
example: file
very: simple
the output of the above program will be:
master_def: {'example': 'file', 'very': 'simple'}

__init__ variable not found in test class?

I recently changed from using nose to nose2, but a lot of my testing code seems to have broken in the process. One thing in particular is the init variable I put in my test class, self.mir_axis, which is giving this error:
mirror_index = mirror_matrix.index(self.mir_axis)
AttributeError: 'TestConvert' object has no attribute 'mir_axis'
This used to work with nose; however, with nose2 my init variable for some reason is no longer registering. Am I missing something here? I'm using Python 2.7.3 and Eclipse as an IDE, by the way.
from nose2.compat import unittest
from nose2.tools import params
from nose2 import session
from nose2.events import ReportTestEvent
from nose2.plugins import testid
from nose2.tests._common import (FakeStartTestEvent, FakeLoadFromNameEvent,
                                 FakeLoadFromNamesEvent, TestCase)

# Import maya modules
import maya.cmds as mc

# Absolute imports of other modules
from neo_autorig.scripts.basic import name
from neo_autorig.scripts.basic import utils

# Test class for converting strings
class TestConvert(TestCase):
    counter = 0  # counter to cycle through mir_axes

    def _init__(self):
        mir_axes = ['xy', '-xy', 'yz', '-yz']  # different axes to be applied
        self.mir_axis = mir_axes[self.__class__.counter]
        self.__class__.counter += 1  # increase counter when run
        if self.__class__.counter > 3:
            self.__class__.counter = 0  # if counter reaches max, reset
        self.utils = utils.Utils(self.mir_axis, False)  # pass module variables

    def setUp(self):  # set up maya scene
        side_indicator_l = mc.spaceLocator(n='side_indicator_left')[0]
        side_indicator_r = mc.spaceLocator(n='side_indicator_right')[0]
        mirror_matrix = ['xy', '-xy', 'yz', '-yz']
        trans_matrix = ['tz', 'tz', 'tx', 'tx']
        side_matrix = [1, -1, 1, -1]
        mirror_index = mirror_matrix.index(self.mir_axis)
        mc.setAttr(side_indicator_l+'.'+trans_matrix[mirror_index], side_matrix[mirror_index])
        mc.setAttr(side_indicator_r+'.'+trans_matrix[mirror_index], side_matrix[mirror_index]*-1)

    def tearDown(self):  # delete everything after
        mc.delete('side_indicator_left', 'side_indicator_right')

    def test_prefix_name_side_type(self):  # test string
        nc = name.Name('prefix_name_side_type')
        existing = nc.get_scenenames('transform')
        self.assertEqual(nc.convert('test', 'empty', self.utils.find_side('side_indicator_left'),
                                    'object', existing), 'test_empty_l_object')
        self.assertEqual(nc.convert('test', 'empty', self.utils.find_side('side_indicator_right'),
                                    'object', existing), 'test_empty_r_object')

# run if script is run from inside module
if __name__ == '__main__':
    import nose2
    nose2.main()
I see two problems with the snippet you posted:
The first one is that def _init__(self): is missing an underscore; it should be def __init__(self):
The second one (and it seems to be the reason for the error) is that the first line in _init__, mir_axes = ['xy', '-xy', ..., should be self.mir_axes = ...
Edit
You should use setUp instead of __init__ regardless, according to Ned Batchelder of Coverage.py fame. :)
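A minimal sketch of that suggestion, folding the axis-cycling logic from the question's _init__ into setUp (my rearrangement of the question's code, reusing its names):

class TestConvert(TestCase):
    counter = 0  # class-level counter to cycle through the axes

    def setUp(self):  # nose2 calls setUp before every test, unlike a hand-rolled __init__
        mir_axes = ['xy', '-xy', 'yz', '-yz']
        self.mir_axis = mir_axes[self.__class__.counter]
        self.__class__.counter = (self.__class__.counter + 1) % len(mir_axes)
        self.utils = utils.Utils(self.mir_axis, False)
        # ... the rest of the original setUp (the maya scene setup) continues here ...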

Does pdb offer watchpoints? [duplicate]

There is a large Python project where one attribute of one class just has the wrong value in some place.
It should be sqlalchemy.orm.attributes.InstrumentedAttribute, but when I run the tests it is a constant value, let's say a string.
Is there some way to run a Python program in debug mode, and run some check (whether a variable changed type) after each line of code automatically?
P.S. I know how to log changes to an attribute of a class instance with the help of inspect and the property decorator. Possibly I could use this method with metaclasses...
But sometimes I need a more general and powerful solution...
Thank you.
P.P.S. I need something like this: https://stackoverflow.com/a/7669165/816449, but perhaps with more explanation of what is going on in that code.
Well, here is a sort of slow approach. It can be modified to watch for a local variable change (just by name). Here is how it works: we call sys.settrace and analyse the value of obj.attr at each step. The tricky part is that we receive 'line' events (that some line was executed) before the line is executed. So, when we notice that obj.attr has changed, we are already on the next line and we can't get the previous line's frame (because frames aren't copied for each line, they are modified in place). So on each line event I save traceback.format_stack to watcher.prev_st, and if on the next call of trace_command the value has changed, we print the saved stack trace to a file. Saving the traceback on each line is quite an expensive operation, so you'd have to set the include keyword to a list of your project's directories (or just the root of your project) in order not to watch how other libraries are doing their stuff and waste CPU.
watcher.py
import traceback

class Watcher(object):
    def __init__(self, obj=None, attr=None, log_file='log.txt', include=[], enabled=False):
        """
        Debugger that watches for changes in object attributes
        obj - object to be watched
        attr - string, name of attribute
        log_file - string, where to write output
        include - list of strings, debug files only in these directories.
            Set it to path of your project otherwise it will take long time
            to run on big libraries import and usage.
        """
        self.log_file = log_file
        with open(self.log_file, 'wb'):
            pass
        self.prev_st = None
        self.include = [incl.replace('\\', '/') for incl in include]
        if obj:
            self.value = getattr(obj, attr)
        self.obj = obj
        self.attr = attr
        self.enabled = enabled  # Important, must be last line on __init__.

    def __call__(self, *args, **kwargs):
        kwargs['enabled'] = True
        self.__init__(*args, **kwargs)

    def check_condition(self):
        tmp = getattr(self.obj, self.attr)
        result = tmp != self.value
        self.value = tmp
        return result

    def trace_command(self, frame, event, arg):
        if event != 'line' or not self.enabled:
            return self.trace_command
        if self.check_condition():
            if self.prev_st:
                with open(self.log_file, 'ab') as f:
                    print >>f, "Value of", self.obj, ".", self.attr, "changed!"
                    print >>f, "###### Line:"
                    print >>f, ''.join(self.prev_st)
        if self.include:
            fname = frame.f_code.co_filename.replace('\\', '/')
            to_include = False
            for incl in self.include:
                if fname.startswith(incl):
                    to_include = True
                    break
            if not to_include:
                return self.trace_command
        self.prev_st = traceback.format_stack(frame)
        return self.trace_command

import sys
watcher = Watcher()
sys.settrace(watcher.trace_command)
testwatcher.py
from watcher import watcher
import numpy as np
import urllib2

class X(object):
    def __init__(self, foo):
        self.foo = foo

class Y(object):
    def __init__(self, x):
        self.xoo = x

    def boom(self):
        self.xoo.foo = "xoo foo!"

def main():
    x = X(50)
    watcher(x, 'foo', log_file='log.txt', include=['C:/Users/j/PycharmProjects/hello'])
    x.foo = 500
    x.goo = 300
    y = Y(x)
    y.boom()
    arr = np.arange(0, 100, 0.1)
    arr = arr**2
    for i in xrange(3):
        print 'a'
        x.foo = i
    for i in xrange(1):
        i = i + 1

main()
There's a very simple way to do this: use watchpoints.
Basically you only need to do
from watchpoints import watch
watch(your_object.attr)
That's it. Whenever the attribute is changed, it will print out the line that changed it and how it changed. Super easy to use.
It also has more advanced features; for example, you can call pdb when the variable is changed, or use your own callback functions instead of printing to stdout.
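For instance (a sketch based on the watchpoints README; note the library requires Python 3, and Config and my_callback are illustrative names):

from watchpoints import watch

class Config:
    def __init__(self):
        self.attr = 1

def my_callback(frame, elem, exec_info):
    # Invoked instead of the default printer whenever the watched value changes.
    print("watched attribute changed:", exec_info)

obj = Config()
watch(obj.attr, callback=my_callback)  # custom callback instead of stdout
# watch.config(pdb=True)               # alternatively, break into pdb on change
obj.attr = 2  # triggers my_callback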
A simpler way to watch for an object's attribute change (which can also be a module-level variable or anything accessible with getattr) is to leverage the hunter library, a flexible code tracing toolkit. To detect state changes we need a predicate, which can look like the following:
import traceback

class MutationWatcher:
    def __init__(self, target, attrs):
        self.target = target
        self.state = {k: getattr(target, k) for k in attrs}

    def __call__(self, event):
        result = False
        for k, v in self.state.items():
            current_value = getattr(self.target, k)
            if v != current_value:
                result = True
                self.state[k] = current_value
                print('Value of attribute {} has changed from {!r} to {!r}'.format(
                    k, v, current_value))
        if result:
            traceback.print_stack(event.frame)
        return result
Then, given this sample code:

class TargetThatChangesWeirdly:
    attr_name = 1

def some_nested_function_that_does_the_nasty_mutation(obj):
    obj.attr_name = 2

def some_public_api(obj):
    some_nested_function_that_does_the_nasty_mutation(obj)
We can instrument it with hunter like:
# or any other entry point that calls the public API of interest
if __name__ == '__main__':
    obj = TargetThatChangesWeirdly()

    import hunter
    watcher = MutationWatcher(obj, ['attr_name'])
    hunter.trace(watcher, stdlib=False, action=hunter.CodePrinter)

    some_public_api(obj)
Running the module produces:
Value of attribute attr_name has changed from 1 to 2
File "test.py", line 44, in <module>
some_public_api(obj)
File "test.py", line 10, in some_public_api
some_nested_function_that_does_the_nasty_mutation(obj)
File "test.py", line 6, in some_nested_function_that_does_the_nasty_mutation
obj.attr_name = 2
test.py:6 return obj.attr_name = 2
... return value: None
You can also use other actions that hunter supports, for instance Debugger, which breaks into pdb when an attribute change is detected.
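A sketch of that variant, reusing the MutationWatcher predicate from above (hunter.Debugger is one of hunter's built-in actions):

import hunter

# Same predicate as before, but drop into pdb instead of printing code lines
# whenever the watcher reports a mutation.
hunter.trace(watcher, stdlib=False, action=hunter.Debugger)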
Try using __setattr__ to override the function that is called when an attribute assignment is attempted. Documentation for __setattr__
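A minimal sketch of that idea (the class and attribute names here are illustrative, not from the question):

import traceback

class Watched(object):
    def __setattr__(self, name, value):
        if name == 'attr_of_interest':
            # Print the stack of whoever performs the assignment.
            traceback.print_stack()
        super(Watched, self).__setattr__(name, value)

w = Watched()
w.attr_of_interest = 42  # prints the call stack, then assigns normally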
You can use the python debugger module (part of the standard library)
To use, just import pdb at the top of your source file:
import pdb
and then set a trace wherever you want to start inspecting the code:
pdb.set_trace()
You can then step through the code with n, and investigate the current state by running python commands.
def __setattr__(self, name, value):
    if name == "xxx":
        util.output_stack('xxxxx')
    super(XXX, self).__setattr__(name, value)

This sample code helped me.

How to continuously display Python output in a Webpage?

I want to be able to visit a webpage that runs a Python function and displays the progress in the webpage.
So when you visit the webpage you can see the output of the script as if you ran it from the command line.
Based on the answer here:
How to continuously display python output in a webpage?
I am trying to display output from Python, using Markus Unterwaditzer's code with a Python function.
import flask
import subprocess

app = flask.Flask(__name__)

def test():
    print "Test"

@app.route('/yield')
def index():
    def inner():
        proc = subprocess.Popen(
            test(),
            shell=True,
            stdout=subprocess.PIPE
        )
        while proc.poll() is None:
            yield proc.stdout.readline() + '<br/>\n'
    return flask.Response(inner(), mimetype='text/html')  # text/html is required for most browsers to show the partial page immediately

app.run(debug=True, port=5005)
And it runs but I don't see anything in the browser.
Hi, it looks like you don't want to call a test function, but an actual command-line process which provides output, and to create an iterable from proc.stdout.readline. Also, since you said the output should come from Python: just pull any Python code you want into a separate file and run that file as the subprocess.
import flask
import subprocess
import time  # You don't need this. Just included it so you can see the output stream.

app = flask.Flask(__name__)

@app.route('/yield')
def index():
    def inner():
        proc = subprocess.Popen(
            ['dmesg'],  # call something with a lot of output so we can see it
            shell=True,
            stdout=subprocess.PIPE
        )
        for line in iter(proc.stdout.readline, ''):
            time.sleep(1)  # Don't need this, just shows the text streaming
            yield line.rstrip() + '<br/>\n'
    return flask.Response(inner(), mimetype='text/html')  # text/html is required for most browsers to show the partial page immediately

app.run(debug=True, port=5000, host='0.0.0.0')
Here's a solution that allows you to stream the subprocess output and load it statically after the fact, using the same template (assuming that your subprocess records its own output to a file; if it doesn't, then recording the process output to a log file is left as an exercise for the reader).
from flask import Response, escape, render_template, abort
from yourapp import app
from subprocess import Popen, PIPE, STDOUT

SENTINEL = '------------SPLIT----------HERE---------'
VALID_ACTIONS = ('what', 'ever')

def logview(logdata):
    """Render the template used for viewing logs."""
    # Probably a lot of other parameters here; this is simplified
    return render_template('logview.html', logdata=logdata)

def stream(first, generator, last):
    """Preprocess output prior to streaming."""
    yield first
    for line in generator:
        yield escape(line.decode('utf-8'))  # Don't let subproc break our HTML
    yield last

@app.route('/subprocess/<action>', methods=['POST'])
def perform_action(action):
    """Call subprocess and stream output directly to clients."""
    if action not in VALID_ACTIONS:
        abort(400)
    first, _, last = logview(SENTINEL).partition(SENTINEL)
    path = '/path/to/your/script.py'
    proc = Popen((path,), stdout=PIPE, stderr=STDOUT)
    generator = stream(first, iter(proc.stdout.readline, b''), last)
    return Response(generator, mimetype='text/html')

@app.route('/subprocess/<action>', methods=['GET'])
def show_log(action):
    """Show one full log."""
    if action not in VALID_ACTIONS:
        abort(400)
    path = '/path/to/your/logfile'
    with open(path, encoding='utf-8') as data:
        return logview(logdata=data.read())
This way you get a consistent template used both during the initial running of the command (via POST) and during static serving of the saved logfile after the fact.

Django: is there a way to count SQL queries from an unit test?

I am trying to find out the number of queries executed by a utility function. I have written a unit test for this function and the function is working well. What I would like to do is track the number of SQL queries executed by the function so that I can see if there is any improvement after some refactoring.
def do_something_in_the_database():
    # Does something in the database
    # return result

class DoSomethingTests(django.test.TestCase):
    def test_function_returns_correct_values(self):
        self.assertEqual(n, <number of SQL queries executed>)
EDIT: I found out that there is a pending Django feature request for this. However, the ticket is still open. In the meantime, is there another way to go about this?
Since Django 1.3 there is assertNumQueries, available exactly for this purpose.
One way to use it (as of Django 3.2) is as a context manager:
# measure queries of some_func and some_func2
with self.assertNumQueries(2):
    result = some_func()
    result2 = some_func2()
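assertNumQueries can also be called directly with a callable and its arguments instead of being used as a context manager (a small sketch; some_func is the same placeholder as above):

# Runs some_func(3, 4) and asserts that exactly 2 queries were executed.
self.assertNumQueries(2, some_func, 3, 4)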
Vinay's response is correct, with one minor addition.
Django's unit test framework actually sets DEBUG to False when it runs, so no matter what you have in settings.py, you will not have anything populated in connection.queries in your unit test unless you re-enable debug mode. The Django docs explain the rationale for this as:
Regardless of the value of the DEBUG setting in your configuration file, all Django tests run with DEBUG=False. This is to ensure that the observed output of your code matches what will be seen in a production setting.
If you're certain that enabling debug will not affect your tests (such as if you're specifically testing DB hits, as it sounds like you are), the solution is to temporarily re-enable debug in your unit test, then set it back afterward:
def test_myself(self):
    from django.conf import settings
    from django.db import connection

    settings.DEBUG = True
    connection.queries = []
    # Test code as normal
    self.assert_(connection.queries)
    settings.DEBUG = False
If you are using pytest, pytest-django has the django_assert_num_queries fixture for this purpose:
def test_queries(django_assert_num_queries):
    with django_assert_num_queries(3):
        Item.objects.create('foo')
        Item.objects.create('bar')
        Item.objects.create('baz')
If you don't want to use TestCase (with assertNumQueries) or to change settings to DEBUG=True, you can use the context manager CaptureQueriesContext (which assertNumQueries itself uses).
from django.db import connections
from django.test.utils import CaptureQueriesContext

DB_NAME = "default"  # name of the db configured in settings you want to use - "default" is standard
connection = connections[DB_NAME]  # reuse the configured connection your code runs on

with CaptureQueriesContext(connection) as context:
    ...  # do your thing

num_queries = context.final_queries - context.initial_queries
assert num_queries == expected_num_queries
In modern Django (>=1.8) this is well documented (it is also documented for 1.7); you have the method reset_queries instead of assigning connection.queries = [], which raises an error. Something like this works on Django >= 1.8:
class QueriesTests(django.test.TestCase):
    def test_queries(self):
        from django.conf import settings
        from django.db import connection, reset_queries

        try:
            settings.DEBUG = True
            # [... your ORM code ...]
            self.assertEqual(len(connection.queries), num_of_expected_queries)
        finally:
            settings.DEBUG = False
            reset_queries()
You may also consider resetting queries in setUp/tearDown to ensure queries are reset for each test, instead of doing it in the finally clause; that way is less explicit (although less verbose). Alternatively, you can call reset_queries in the try clause as many times as you need to count queries from 0.
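A sketch of that setUp/tearDown variant (my arrangement; num_of_expected_queries is the same placeholder as above):

from django.conf import settings
from django.db import connection, reset_queries
import django.test

class QueriesTests(django.test.TestCase):
    def setUp(self):
        settings.DEBUG = True  # connection.queries is only populated when DEBUG is on
        reset_queries()        # start each test with an empty query log

    def tearDown(self):
        settings.DEBUG = False
        reset_queries()

    def test_queries(self):
        # [... your ORM code ...]
        self.assertEqual(len(connection.queries), num_of_expected_queries)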
Here is a working prototype of the context manager withAssertNumQueriesLessThan:
import json
from contextlib import contextmanager
from django.test.utils import CaptureQueriesContext
from django.db import connections

@contextmanager
def withAssertNumQueriesLessThan(self, value, using='default', verbose=False):
    with CaptureQueriesContext(connections[using]) as context:
        yield  # your test will be run here
    if verbose:
        msg = "\r\n%s" % json.dumps(context.captured_queries, indent=4)
    else:
        msg = None
    self.assertLess(len(context.captured_queries), value, msg=msg)
It can simply be used in your unit tests, for example to check the number of queries per Django REST API call:
with self.withAssertNumQueriesLessThan(10):
    response = self.client.get('contacts/')
    self.assertEqual(response.status_code, 200)
You can also provide the exact DB alias via using, and pass verbose=True if you want to pretty-print the list of captured queries to stdout.
If you have DEBUG set to True in your settings.py (presumably so in your test environment) then you can count queries executed in your test as follows:
from django.db import connection

class DoSomethingTests(django.test.TestCase):
    def test_something_or_other(self):
        num_queries_old = len(connection.queries)
        do_something_in_the_database()
        num_queries_new = len(connection.queries)
        self.assertEqual(n, num_queries_new - num_queries_old)
If you want to use a decorator for that there is a nice gist:
import functools
import sys
import re

from django.conf import settings
from django.db import connection

def shrink_select(sql):
    return re.sub("^SELECT(.+)FROM", "SELECT .. FROM", sql)

def shrink_update(sql):
    return re.sub("SET(.+)WHERE", "SET .. WHERE", sql)

def shrink_insert(sql):
    return re.sub("\((.+)\)", "(..)", sql)

def shrink_sql(sql):
    return shrink_update(shrink_insert(shrink_select(sql)))

def _err_msg(num, expected_num, verbose, func=None):
    func_name = "%s:" % func.__name__ if func else ""
    msg = "%s Expected number of queries is %d, actual number is %d.\n" % (func_name, expected_num, num,)
    if verbose > 0:
        queries = [query['sql'] for query in connection.queries[-num:]]
        if verbose == 1:
            queries = [shrink_sql(sql) for sql in queries]
        msg += "== Queries == \n" + "\n".join(queries)
    return msg

def assertNumQueries(expected_num, verbose=1):
    class DecoratorOrContextManager(object):
        def __call__(self, func):  # decorator
            @functools.wraps(func)
            def inner(*args, **kwargs):
                handled = False
                try:
                    self.__enter__()
                    return func(*args, **kwargs)
                except:
                    self.__exit__(*sys.exc_info())
                    handled = True
                    raise
                finally:
                    if not handled:
                        self.__exit__(None, None, None)
            return inner

        def __enter__(self):
            self.old_debug = settings.DEBUG
            self.old_query_count = len(connection.queries)
            settings.DEBUG = True

        def __exit__(self, type, value, traceback):
            if not type:
                num = len(connection.queries) - self.old_query_count
                assert expected_num == num, _err_msg(num, expected_num, verbose)
            settings.DEBUG = self.old_debug

    return DecoratorOrContextManager()