Batch SQL INSERT in Django App - python-2.7

I am following this example to batch insert records into a table but modifying it to fit my specific example as such
sql='INSERT INTO CypressApp_grammatrix (name, row_num, col_num, gram_amount) VALUES {}'.format(', '.join(['(%s, %s, %s, %s)']*len(gram_matrix)),)
#print sql
params=[]
for gram in gram_matrix:
col_num=1
for g in gram:
params.extend([(matrix_name, row_num, col_num, g)])
col_num += 1
row_num += 1
print params
with closing(connection.cursor()) as cursor:
cursor.execute(sql, params)
However, upon doing so, I receive this error
return cursor._last_executed.decode('utf-8')
File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 150, in __getattr__
return getattr(self.cursor, attr)
AttributeError: 'Cursor' object has no attribute '_last_executed'
I would like to know why I received this error and what I can do to fix it, although I feel the problem could be with this code that works with MySQL that I did not write
def last_executed_query(self, cursor, sql, params):
# With MySQLdb, cursor objects have an (undocumented) "_last_executed"
# attribute where the exact query sent to the database is saved.
# See MySQLdb/cursors.py in the source distribution.
return cursor._last_executed.decode('utf-8')

So I don't know if I simply have an old copy of MySQLdb or what, but the problem appear to be with cursors.py. The only spot in that file where you can find _last_executed is here
def _do_query(self, q):
db = self._get_db()
self._last_executed = q
db.query(q)
self._do_get_result()
return self.rowcount
However, the __init__ does not set up this variable as an instance attribute. It's missing completely. So I took the liberty of adding it myself and initializing it to some query string. I assumed any would do, so I just added
class BaseCursor(object):
"""A base for Cursor classes. Useful attributes:
description
A tuple of DB API 7-tuples describing the columns in
the last executed query; see PEP-249 for details.
description_flags
Tuple of column flags for last query, one entry per column
in the result set. Values correspond to those in
MySQLdb.constants.FLAG. See MySQL documentation (C API)
for more information. Non-standard extension.
arraysize
default number of rows fetchmany() will fetch
"""
from _mysql_exceptions import MySQLError, Warning, Error, InterfaceError, \
DatabaseError, DataError, OperationalError, IntegrityError, \
InternalError, ProgrammingError, NotSupportedError
def __init__(self, connection):
from weakref import ref
...
self._last_executed ="SELECT * FROM T"
...
Now the cursor object does have the attribute _last_executed and when this function
def last_executed_query(self, cursor, sql, params):
# With MySQLdb, cursor objects have an (undocumented) "_last_executed"
# attribute where the exact query sent to the database is saved.
# See MySQLdb/cursors.py in the source distribution.
return cursor._last_executed.decode('utf-8')
in base.py is called, the attribute does exist and so this error
return cursor._last_executed.decode('utf-8')
File "/usr/local/lib/python2.7/dist-
packages/django/db/backends/mysql/base.py", line 150, in __getattr__
return getattr(self.cursor, attr)
AttributeError: 'Cursor' object has no attribute '_last_executed'
will not be encountered. At least that is how I believe it works. In any case, it fixed the situation for me.

Related

How to extend Django's test database with custom SQL view?

I'm using Django 2.1 with MySQL.
I have one custom SQL view, which is bound with a model with Meta managed = False. Django's TestCase has no idea how the view is created, so I'd like to provide SQL command to create this view. The best option would be to do this on database create, but I have no idea how to do that.
What I've done so far was to override TestCase's setUp method. It looks like that:
class TaskDoneViewTest(TestCase):
def setUp(self):
"""
Create custom SQL view
"""
cursor = connection.cursor()
file_handle = open('app/tests/create_sql_view.sql', 'r+')
sql_file = File(file_handle)
sql = sql_file.read()
cursor.execute(sql)
cursor.close()
def test_repeatable_task_done(self):
# ...
def test_one_time_task_done(self):
# ...
I've got this solution from similar SO post: How to use database view in test cases. It would be a nice temporary solution, but the problem is with all those 2 test cases active I'm getting following error:
$ python manage.py test app.tests
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
...E..
======================================================================
ERROR: test_repeatable_task_done (app.tests.test_views.TaskDoneViewTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/asmox/AppDev/Python/bubblechecklist/project_root/app/tests/test_views.py", line 80, in setUp
cursor.execute(sql)
File "/home/asmox/AppDev/Python/bubblechecklist/env/lib/python3.6/site-packages/django/db/backends/utils.py", line 68, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/home/asmox/AppDev/Python/bubblechecklist/env/lib/python3.6/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/home/asmox/AppDev/Python/bubblechecklist/env/lib/python3.6/site-packages/django/db/backends/utils.py", line 80, in _execute
self.db.validate_no_broken_transaction()
File "/home/asmox/AppDev/Python/bubblechecklist/env/lib/python3.6/site-packages/django/db/backends/base/base.py", line 437, in validate_no_broken_transaction
"An error occurred in the current transaction. You can't "
django.db.transaction.TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.
For some reason this error doesn't happen when I have only one test case active (why?).This error remains until I change my test's base class from TestCase to TransactionTestCase.
Well, I would ask why this happen and if there is any solution to get it okay with simple TestCase class, because my test for now has nothing to do with transactions and I find this working solutions a bit too dirty, but...
I would more likely stick to the main issue, that is, to globally (for all my test cases) do the following thing:
When testing database is created, do one more custom SQL from provided file. It is going to create required view
Can you please help me how to do that?
If you read the documentation for TestCase, you'll see that it wraps each test in a double transaction, one at the class level and one at the test level. The setUp() method runs for each test and is thus inside this double wrapping.
As shown in the above mentioned docs, it is suggested you use setUpTestData() to set up your db at the class level. This is also where you'd add initial data to your db for all your tests to use.

create and insert into database tables using luigi

I am trying to understand the correct way to drop and recreate a table and insert data into the newly created table using luigi. I have multiple CSV files passed from the prior task to the task which should be inserted into the database.
My current code looks like this.
class CreateTables(sqla.CopyToTable):
connection_string = DatabaseConfig().data_mart_connection_string
table = DatabaseConfig().table_name
def requires(self):
return CustomerJourneyToCSV()
def output(self):
return SQLAlchemyTarget(
connection_string=self.connection_string,
target_table="customerJourney_1",
update_id=self.update_id(),
connect_args=self.connect_args,
echo=self.echo)
def create_table(self, engine):
base = automap_base()
Session = sessionmaker(bind=engine)
session = Session()
metadata = MetaData(engine)
base.prepare(engine, reflect=True)
# Drop existing tables
for i in range(1, len(self.input())+1):
for t in base.metadata.sorted_tables:
if t.name in "{0}_{1}".format(self.table, i):
t.drop(engine)
# Create new tables and insert data
i = 1
for f in self.input():
df = pd.read_csv(f.path, sep="|")
df.fillna(value="", inplace=True)
ts = define_table_schema(df)
t = Table("{0}_{1}".format(self.table, i), metadata, *[Column(*c[0], **c[1]) for c in ts])
t.create(engine)
# TODO: Need to remove head and figure out how to stop the connection from timing out
my_insert = t.insert().values(df.head(500).to_dict(orient="records"))
session.execute(my_insert)
i +=1
session.commit()
The code works creates the tables and inserts the data but falls over with the following error.
C:\Users\simon\AdviceDataMart\lib\site-packages\luigi\worker.py:191:
DtypeWarning: Columns (150) have mixed types. Specify dtype option on import
or set low_memory=False.
new_deps = self._run_get_new_deps()
File "C:\Users\simon\AdviceDataMart\lib\site-packages\luigi\worker.py", line
191, in run
new_deps = self._run_get_new_deps()
File "C:\Users\simon\AdviceDataMart\lib\site-packages\luigi\worker.py", line
129, in _run_get_new_deps
task_gen = self.task.run()
File "C:\Users\simon\AdviceDataMart\lib\site-
packages\luigi\contrib\sqla.py", line 375, in run
for row in itertools.islice(rows, self.chunk_size)]
File "C:\Users\simon\AdviceDataMart\lib\site-
packages\luigi\contrib\sqla.py", line 363, in rows
with self.input().open('r') as fobj:
AttributeError: 'list' object has no attribute 'open'
I am not sure what is causing this and am not able to easily debug a luigi pipeline. I am not sure if this has to do with implementation of the run method or the output method?

Python sqlite3 OperationalError: near "?": syntax error

Trying to let users update column values on existing records for a specific table named "Scenario." The record being updated is identified by an index column called "Scenario_Key", unique to each instance of this class. The code I already have produces a dictionary of key, value pairs where key is the name of the column being updated and value is the value being inserted into it. To update the sqlite database I'm trying the following:
cursor.execute("""UPDATE Scenario SET ?=? WHERE Scenario_Key=?;""", (key, new_val, self.scenario_key))
But when I try to execute by clicking the "Save and Close" button, I get the following:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-tk/Tkinter.py", line 1536, in __call__
return self.func(*args)
File "/Users/xxx/Documents/Consulting/DCA/Damage Control Assistant/EditScenarioWindow.py", line 91, in <lambda>
SaveAndCloseButton = Button(ButtonFrame, text="Save and Close", command=lambda: self.SaveAndCloseWindow())
File "/Users/xxx/Documents/Consulting/DCA/Damage Control Assistant/EditScenarioWindow.py", line 119, in SaveAndCloseWindow
cursor.execute(cmd_string, (key, new_val, self.scenario_key))
OperationalError: near "?": syntax error
I've read over sqlite3.OperationalError: near "?": syntax error, but I'm trying to do a single sqlite query where all the variables have already been calculated, not get values from the database and build a query from there. I'm supplying the positional arguments as a tuple. So why doesn't sqlite3 like the query I'm submitting?
You cannot parametrize column names. While being cognisant of the possibility of SQL Injection attacks, you could instead do:
cursor.execute("""UPDATE Scenario
SET {}=?
WHERE Scenario_Key=?;""".format(key),
(new_val, self.scenario_key))

Why this dynamodb query failed with 'Requested resource not found'?

I have doubled checked that the item exists in the dynamodb table. id is the default hash key.
I want to retrieve the content by using the main function in this code:
import boto.dynamodb2
from boto.dynamodb2 import table
table='doc'
region='us-west-2'
aws_access_key_id='YYY'
aws_secret_access_key='XXX'
def get_db_conn():
return boto.dynamodb2.connect_to_region(
region,
aws_access_key_id=aws_access_key_id,
aws_secret_access_key=aws_secret_access_key)
def get_table():
return table.Table(table, get_db_conn())
def main():
tbl = get_table()
doc = tbl.get_item(id='4d7a73b6-2121-46c8-8fc2-54cd4ceb2a30')
print doc.keys()
However I get this exception instead:
File "scripts/support/find_doc.py", line 31, in <module>
main()
File "scripts/support/find_doc.py", line 33, in main
doc = tbl.get_item(id='4d7a73b6-2121-46c8-8fc2-54cd4ceb2a30')
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/table.py", line 504, in get_item
consistent_read=consistent
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 1065, in get_item
body=json.dumps(params))
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2731, in make_request
retry_handler=self._retry_handler)
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/connection.py", line 953, in _mexe
status = retry_handler(response, i, next_sleep)
File "/Users/antkong/project-ve/lib/python2.7/site-packages/boto/dynamodb2/layer1.py", line 2774, in _retry_handler
data)
boto.exception.JSONResponseError: JSONResponseError: 400 Bad Request
{u'message': u'Requested resource not found', u'__type': u'com.amazonaws.dynamodb.v20120810#ResourceNotFoundException'}
Why I am getting this error message?
I am using boto version 2.34
The problem is in this code:
def get_table():
return table.Table(table, get_db_conn())
It should be
def get_table():
return table.Table(table, connection=get_db_conn())
Note the connection named parameter
If you have a range key you have to specify in the get_item, like so:
get_item(timestamp=Decimal('1444232509'), id='HASH_SHA1')
Here on my table Packages I have an index (id) and a range key (timestamp).
I was getting this error because I was connecting to the wrong region.
To check your table region, go to overview tab of your table and scroll down to Amazon Resource Name (ARN) field.
My ARN starts with arn:aws:dynamodb:us-east-2:. Here 'us-east-2' is the region I need to pass while initiating the boto3 client.

Django character encoding

I'm working on a new Django site, and, after migrating in a pile of data, have started running into a deeply frustrating DjangoUnicodeDecodeError. The bad character in question is a \xe8 (e-grave).
There's two sticky issues:
It only happens in the production server, running an Apache-fronted fcgi process (running the same code with the same database on the Django dev server has no issues)
The stack trace in question is entirely within Django code. It occurs in the admin site (elsewhere too) when retrieving an item to display, though the field that contains the bad character is not actually ever rendered.
I'm not even entirely sure where to begin debugging this, short of trying to remove the offending characters manually. My guess is that it's a configuration issue, since it's environment-specific, but I'm not sure where to start there either.
EDIT:
As Daniel Roseman pointed out, the error is almost certainly in the unicode method--or, more precisely, another method that it calls. Note that the offending characters are in a field not referenced at all in the code here. I suppose that the exception is raised in a method that builds the object from the db result--if the queryset is never evaluated (e.g. if not self.enabled) there's no error. Here's the code:
def get_blocking_events(self):
return Event.objects.filter(<get a set of events>)
def get_blocking_reason(self):
blockers = self.get_blocking_events()
label = u''
if not self.enabled:
label = u'Sponsor disabled'
elif len(blockers) > 0:
label = u'Pending follow-up: "{0}" ({1})'.format(blockers[0],blockers[0].creator.email)
if len(blockers) > 1:
label += u" and {0} other event".format(len(blockers)-1)
if len(blockers) > 2:
label += u"s"
return label
def __unicode__(self):
label = self.name
blocking_msg = self.get_blocking_reason()
if len(blocking_msg):
label += u" ({0})".format(blocking_msg)
return label
Here's the tail of the stack trace, for fun:
File "/opt/opt.LOCAL/Django-1.2.1/django/template/__init__.py", line 954, in render
dict = func(*args)
File "/opt/opt.LOCAL/Django-1.2.1/django/contrib/admin/templatetags/admin_list.py", line 209, in result_list
'results': list(results(cl))}
File "/opt/opt.LOCAL/Django-1.2.1/django/contrib/admin/templatetags/admin_list.py", line 201, in results
yield list(items_for_result(cl, res, None))
File "/opt/opt.LOCAL/Django-1.2.1/django/contrib/admin/templatetags/admin_list.py", line 138, in items_for_result
f, attr, value = lookup_field(field_name, result, cl.model_admin)
File "/opt/opt.LOCAL/Django-1.2.1/django/contrib/admin/util.py", line 270, in lookup_field
value = attr()
File "/opt/opt.LOCAL/Django-1.2.1/django/db/models/base.py", line 352, in __str__
return force_unicode(self).encode('utf-8')
File "/opt/opt.LOCAL/Django-1.2.1/django/utils/encoding.py", line 88, in force_unicode
raise DjangoUnicodeDecodeError(s, *e.args)
DjangoUnicodeDecodeError: 'utf8' codec can't decode bytes in position 956-958: invalid data. You passed in <Sponsor: [Bad Unicode data]> (<class 'SJP.alcohol.models.Sponsor'>)
The issue here is that in unicode you use the following line:
label += " ({0})".format(blocking_msg)
And unfortunately in python 2.x this is trying to format blocking_msg as an ascii string. What you meant to type was:
label += u" ({0})".format(blocking_msg)
Turns out this is likely due to the FreeTDS layer that connects to the SQL Server. While FreeTDS provides some support for automatically converting encodings, my setup is either misconfigured or otherwise not working quite right.
Rather than fighting this battle, I've migrated to MySQL for now.