python shell not able to find sqlite table - python-2.7

I have created an SQLite table using SQLite Browser and added one row. When I connect to the database through the terminal, I can find the table as well as the row.
However, I want to connect to the table through Python, since I have lots of rows to insert. Although I wrote a Python program, I also tried to connect and insert the row through the Python shell.
import sqlite3
import os
home=os.environ['HOME']
conn=sqlite3.connect(home+'/AndroidStudioProjects/TableTopicPractice/database/dbTableTopic')
cur=conn.cursor()
cur.execute("SELECT name FROM sqlite_master WHERE type='table'")
print cur.rowcount
The last statement printed -1. When I tried to query the specific table that I created, the result was the same. I tried both queries in SQLite Browser and they work there.
Please note that sqlite3 is installed on my system. I used the following tutorial as a guide.
http://zetcode.com/db/sqlitepythontutorial/
Where am I going wrong? What do I need to correct?
Any pointers will be highly appreciated.
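For what it's worth, Python's sqlite3 module does not populate cursor.rowcount for SELECT statements (the DB-API allows it to stay -1 there), so the -1 does not prove the table is missing; fetching the results is the reliable check. A minimal sketch reusing the path from the question:
import sqlite3
import os
home = os.environ['HOME']
conn = sqlite3.connect(home + '/AndroidStudioProjects/TableTopicPractice/database/dbTableTopic')
cur = conn.cursor()
cur.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cur.fetchall()   # actually pull the result rows
print tables              # e.g. [(u'mytable',)] when the table is visible
print len(tables)         # 0 means the file really contains no tables
conn.close()
If the list comes back empty, a likely cause is that the path passed to connect() points at a different (newly created, empty) file than the one SQLite Browser opened, since sqlite3.connect silently creates a new database when the file does not exist.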

Related

Query hive table with Spark

I am a newbie to Apache Hive and Spark. I have some existing Hive tables sitting on my Hadoop server against which I can run HQL commands and get what I want out of the table using hive or beeline, e.g., selecting the first 5 rows of my table. Instead of that I want to use Spark to achieve the same goal. My Spark version on the server is 1.6.3.
Using the code below (I have replaced my actual database and table names with database and table):
sc = SparkContext(conf = config)
sqlContext = HiveContext(sc)
query = sqlContext.createDataFrame(sqlContext.sql("SELECT * from database.table LIMIT 5").collect())
df = query.toPandas()
df.show()
I get this error:
ValueError: Some of types cannot be determined after inferring.
Error:root: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
However, I can use beeline with same query and see the results.
After a day of googling and searching I modified the code as:
table_ccx = sqlContext.table("database.table")
table_ccx.registerTempTable("temp")
sqlContext.sql("SELECT * FROM temp LIMIT 5").show()
Now the error is gone but all the row values are null except one or two dates and column names.
I also tried
table_ccx.refreshTable("database.table")
and it did not help. Is there a setting or configuration that I need to ask my IT team to do? I appreciate any help.
EDIT: Having said that, my Python code is working for some of the tables on Hadoop. I do not know whether the problem is caused by some of the entries in the table or not. If it is, then how come the corresponding beeline/Hive command works?
As it came out in the comments, straightening up the code a little bit makes the thing work.
The problem lies on this line of code:
query = sqlContext.createDataFrame(sqlContext.sql("SELECT * from database.table LIMIT 5").collect())
What you are doing here is:
asking Spark to query the data source (which creates a DataFrame)
collecting everything on the driver as a local collection
parallelizing the local collection back onto Spark with createDataFrame
In general the approach should work, although it's evidently unnecessarily convoluted.
The following will do:
query = sqlContext.sql("SELECT * from database.table LIMIT 5")
I'm not entirely sure why the roundtrip breaks your code, but it does (as it came out in the comments), and dropping it is an improvement in any case.
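For completeness, a minimal end-to-end sketch of the simpler approach under the question's setup (Spark 1.6 with a HiveContext; database.table and the app name are placeholders):
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
config = SparkConf().setAppName("hive-query-example")
sc = SparkContext(conf=config)
sqlContext = HiveContext(sc)
# Build the DataFrame lazily; no collect()/createDataFrame() roundtrip needed
query = sqlContext.sql("SELECT * FROM database.table LIMIT 5")
query.show()            # print the rows from the driver
pdf = query.toPandas()  # only convert to pandas if you actually need the data locally
print(pdf)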

Mass data insert into SQL Server?

I've got 8 worksheets within an Excel workbook that I'd like to import into separate tables within a SQL Server DB.
I'd like to import each of the 8 worksheets into a separate table, ideally, with table names coinciding with worksheet tab names, but initially, I just want to get the data into the tables, so arbitrary table names work for the time being too.
The format of the data in each of the worksheets (and tables by extension) is the same (and will be identical), so I'm thinking some kind of loop could be used to do this.
Data looks like this:
Universe Date Symbol Shares MktValue Currency
SMALLCAP 6/30/2011 000360206 27763 606361.92 USD
SMALLCAP 6/30/2011 000361105 99643 2699407.52 USD
SMALLCAP 6/30/2011 00081T108 103305 810926.73 USD
SMALLCAP 6/30/2011 000957100 57374 1339094.76 USD
And table format in SQL would/should be consistent with the following:
CREATE TABLE dbo.[market1] (
[Universe_ID] char(20),
[AsOfDate] smalldatetime,
[Symbol] nvarchar(20),
[Shares] decimal(20,0),
[MktValue] decimal(20,2),
[Currency] char(3)
)
I'm open to doing this using either SQL/VBA/C++ or some combination (as these are the languages I know and have access to). Any thoughts on how to best go about this?
You could use SSIS or DTS packages to import them. Here are a couple of references to get you going.
Creating a DTS Package - pre 2005
Creating a SSIS Package - 2005 forward
For Excel files (2007 or 2010) with an xlsx extension, I have renamed them to .zip, extracted their contents into a directory, and used SQL XML Bulk Load to import the sheets and reference tables. When I have all the data in SQL Server, I use basic SQL queries to extract/transform the data needed into designated worksheets. -- This keeps the "digestion" logic in SQL and uses minimal external VB script or C# development.
Link to SQL Bulk Load of XML data: http://support.microsoft.com/kb/316005
In SQL Management Studio, right click on a database, then click Tasks, then Import Data. This will take you through some screens and create an SSIS package to import the file. At some point in the process it will ask you if you want to save the package (I would run it a few times as well to make sure it imports your data the way you want it). Save it and then you can schedule the package to be run as a Job via the SQL Server Agent. (The job type will be Sql Server Integration Services).
You can use the following script:
SELECT * INTO XLImport3 FROM OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0',
'Data Source=C:\test\xltest.xls;Extended Properties=Excel 8.0')...[Customers$]
SELECT * INTO XLImport4 FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\test\xltest.xls', [Customers$])
SELECT * INTO XLImport5 FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
'Excel 8.0;Database=C:\test\xltest.xls', 'SELECT * FROM [Customers$]')
Or you can use the following:
SELECT * INTO XLImport2 FROM OPENQUERY(EXCELLINK,
'SELECT * FROM [Customers$]')
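If a scripted route is acceptable alongside the SQL/VBA/C++ options the question lists, the worksheet-per-table loop can also be sketched in Python with pandas and SQLAlchemy. This is only an illustrative sketch: the file path, connection string, and the idea of naming each table after its worksheet tab are assumptions, not the asker's setup.
import pandas as pd
from sqlalchemy import create_engine
# Assumed connection details -- adjust server, database and ODBC driver to your environment
engine = create_engine(
    "mssql+pyodbc://user:password@MYSERVER/MyDatabase?driver=ODBC+Driver+17+for+SQL+Server"
)
# sheet_name=None loads every worksheet into a dict of {tab name: DataFrame}
sheets = pd.read_excel(r"C:\test\xltest.xlsx", sheet_name=None)
for tab_name, df in sheets.items():
    # One table per worksheet, named after the tab; append if it already exists
    df.to_sql(tab_name, engine, if_exists="append", index=False)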

Verify the structure of a database? (SQLite in C++ / Qt)

I was wondering what the "best" way to verify the structure of my database is with SQLite in Qt / C++. I'm using SQLite so there is a file which contains my database, and I want to make sure that, when launching the program, the database is structured the way it should be - i.e., it has X tables each with their own Y columns, appropriately named, etc. Could someone point me in the right direction? Thanks so much!
You can get a list of all the tables in the database with this query:
select tbl_name from sqlite_master;
And then for each table returned, run this query to get column information
pragma table_info(my_table);
For the pragma, each row of the result set will contain: a column index, the column name, the column's type affinity, whether the column may be NULL, and the column's default value.
(I'm assuming here that you know how to run SQL queries against your database in the SQLite C interface.)
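To make that concrete, here is a minimal sketch of the same two queries using Python's sqlite3 module; the expected-schema dictionary and file name are invented examples, and the identical statements can be issued from Qt/C++ through QSqlQuery.
import sqlite3
# Invented example: table name -> expected column names, in order
expected = {"users": ["id", "name", "email"]}
conn = sqlite3.connect("mydb.sqlite")
cur = conn.cursor()
cur.execute("select tbl_name from sqlite_master where type='table'")
tables = [row[0] for row in cur.fetchall()]
for table, columns in expected.items():
    if table not in tables:
        print("missing table: %s" % table)
        continue
    cur.execute("pragma table_info(%s)" % table)
    # Each row: (index, name, type, notnull, default value, primary-key flag)
    found = [row[1] for row in cur.fetchall()]
    if found != columns:
        print("unexpected columns in %s: %s" % (table, found))
conn.close()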
If you have Qt and thus QtSql at hand, you can also use the QSqlDatabase::tables() (API doc) method to get the tables and QSqlDatabase::record(tablename) to get the field names. It can also give you the primary key(s), but for further details you will have to follow pkh's advice to use the table_info pragma.

Django ORM misreading PostgreSQL sequences?

Background: Running a PostgreSQL database for a Django app (Django 1.1.1, Python 2.4, psycopg2 and Postgres 8.1), I've restored the database from a SQL dump several times. Each time I do that and then try to add a new row, whether from the shell, the admin, or the site front end, I get this error:
IntegrityError: duplicate key violates unique constraint "app_model_pkey"
The data dump is fine and is resetting the sequences. But if I try adding the row again, it's successful! So I can just try jamming a new row into every table and then everything seems to be copacetic.
Question: Given that (1) the SQL dump is good and Postgres is reading it in correctly (per an earlier question), and (2) Django's ORM does not seem to be systematically failing to fetch the next sequence values, what is going on in this specific instance?
Django doesn't hold or directly read the sequence values in any way. I've explained this, for example, in this question: 2088210/django-object-creation-and-postgres-sequences.
PostgreSQL does increment the sequence when you try to add a row; even if the operation is not successful (it raises a duplicate key error), the sequence increment doesn't roll back. That's the reason why it works the second time you try adding a row.
I don't know why your sequences are not set properly. Could you check what the sequence value is before the dump and after the restore, and do the same with the max() pk of the table? Maybe it's an 8.1 bug with the restore? I don't know. What I'm sure of is that it's not Django's fault.
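For reference, both checks can be run directly in psql; the names below follow the app_model convention used in the question:
-- current position of the sequence behind the pk column
SELECT last_value FROM app_model_id_seq;
-- highest primary key actually present in the table
SELECT max(id) FROM app_model;
If last_value is lower than max(id), the next insert can collide exactly as described.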
I am guessing that your sequence is out of date.
You can fix that like this:
select setval('app_model_id_seq', max(id)) from app_model;
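One caveat: if the table happens to be empty, max(id) is NULL and the call above won't set anything useful. A guarded variant, in the same form Django's sqlsequencereset command emits, handles that case:
select setval('app_model_id_seq', coalesce(max(id), 1), max(id) IS NOT null) from app_model;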

Loading from pickled data causes database error with new saves

In order to save time moving data, I pickled some models and dumped them to a file. I then reloaded them into another database using the exact same model. The save worked fine and the objects kept their old id, which is what I wanted. However, when saving new objects I run into nextval errors.
Not being very adept with postgres, I'm not sure how to fix this so I can keep old records with their existing ID while being able to continue adding new data.
Thanks,
Thomas
There is actually a Django command, sqlsequencereset, that prints out the sequence reset SQL.
$ python manage.py sqlsequencereset issues
BEGIN;
SELECT setval('"issues_project_id_seq"', coalesce(max("id"), 1), max("id") IS NOT null) FROM "issues_project";
COMMIT;
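As a usage note, the generated SQL can be piped straight into the database via Django's dbshell command instead of being copied by hand (assuming the default database connection is the one you want to reset):
$ python manage.py sqlsequencereset issues | python manage.py dbshell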
I think that you are talking about the sequence that is being used for autoincrementing your id fields.
The easiest solution here would be, in a "psql" shell:
select max(id)+1 from YOURAPP_YOURMODEL;
and use the value in this command:
alter sequence YOURAPP_YOURMODEL_id_seq restart with MAX_ID_FROM_PREV_STATEMENT;
That should do the trick.