I have python 3.6.8 on GNU/Linux 3.10 on GCP and I'm trying to load data from Hive to CloudSQL.
gc_cmd_import_csv_p1 = subprocess.Popen(['gcloud', 'sql', 'import', 'csv',
'{}'.format(quote(cloudsql_instance)),
'{}'.format(quote(load_csv_files)),
'--database={}'.format(quote(cloudsql_db)),
'--table={}'.format(quote(cloudsql_table_name)),
'--user={}'.format(quote(db_user_name)),
'--quiet'],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
universal_newlines=True)
import_cmd_op, import_cmd_error = gc_cmd_import_csv_p1.communicate()
import_cmd_return_code = gc_cmd_import_csv_p1.returncode
if import_cmd_return_code:
print("""[ERROR] Unable to import data from Hive to CloudSQL.
Error description: {}
Error Code(s): {}
Issue file name: {}
""".format(import_cmd_error, import_cmd_return_code, load_csv_files))
sys.exit(9)
print("[INFO] Data Import completed from HIVE to CloudSQL.")
In case of any error above, I'm getting message like:
Error description: ERROR: (gcloud.sql.import.csv) HTTPError 403: The client is not authorized to make this request.Error Code(s): 1
But when I actually run the same import command directly as shown below:
gcloud sql import csv test-cloud-sql-instance gs://test-server-12345/app1/data/lookup_table/000000_0 --database=test_db --table=name_lookup --user=test_user --quiet
I'm getting the actual error like below:
ERROR: (gcloud.sql.import.csv) [ERROR_RDBMS] ERROR: extra data after last expected column CONTEXT: COPY name_lookup, line 16902:
I want this message
( Extra data after last expected column... line 16902:)
to be shown in python script instead of
HTTPError 403:
error. How to capture that?
Please note: There is no authentication issue as suggested by HTTP Error.
So after long discussion with GCP Admin, we have found the issue.
We tried to execute the same import command using os.system() and then again we got the HTTP error. Admin then revisited the GCP IAM documentation and created a role for P-SQL user. Issue is resolved now.
The problem:
I'm trying to get it so I can use Cassandra to work with Python properly. I've been using a toy dataset to practice uploading a csv file into Cassandra with no luck.
Cassandra seems to work fine when I am not using COPY FROM for csv files.
My intention is to use this dataset as a test to make sure that I can load a csv file's information into Cassandra, so I can then load 5 csv files totaling 2 GB into it for my originally intended project.
Note: Whenever I use CREATE TABLE and then run SELECT * FROM tvshow_data, the columns don't appear in the order that I set them, is this going to affect anything, or does it not matter?
Info about my installations and usage:
I've tried running both cqlsh and cassandra with an admin powershell.
I have Python 2.7 installed inside of the apache-cassandra-3.11.6 folder.
I have Cassandra version 3.11.6 installed.
I have cassandra-driver 3.18.0 installed, with conda.
I use Python 3.7 installed for everything other than Cassandra's directory.
I have tried both CREATE TABLE tvshow and CREATE TABLE tvshow.tvshow_data.
My Python script:
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect()
create_and_add_file_to_tvshow = [
"DROP KEYSPACE tvshow;",
"CREATE KEYSPACE tvshow WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};",
"USE tvshow;",
"CREATE TABLE tvshow.tvshow_data (id int PRIMARY KEY, title text, year int, age int, imdb decimal, rotten_tomatoes int, netflix int, hulu int, prime_video int, disney_plus int, is_tvshow int);",
"COPY tvshow_data (id, title, year, age, imdb, rotten_tomatoes, netflix, hulu, prime_video, disney_plus, is_tvshow) FROM 'C:tvshows.csv' WITH HEADER = true;"
]
print('\n')
for query in create_and_add_file_to_tvshow:
session.execute(query)
print(query, "\nsuccessful\n")
Resulting python error:
This is the error I get when I run my code in the powershell with the following command, python cassandra_test.py.
cassandra.protocol.SyntaxException: <Error from server: code=2000 [Syntax error in
CQL query] message="line 1:0 no viable alternative at input 'COPY' ([
Resulting cqlsh error:
Running the previously stated cqlsh code in the create_and_add_file_to_tvshow variable in powershell after running cqlsh in the apache-cassandra-3.1.3/bin/ directory, creates the following error.
Note: The following error is only the first few lines to the code as well as the last new lines, I choose not to include it since it was several hundred lines long. If necessary I will include it.
Starting copy of tvshow.tvshow_data with columns [id, title, year, age, imdb, rotten_tomatoes, netflix, hulu, prime_video, disney_plus, is_tvshow].
Failed to import 0 rows: IOError - Can't open 'C:tvshows.csv' for reading: no matching file found, given up after 1 attempts
Process ImportProcess-44:
PTrocess ImportProcess-41:
raceback (most recent call last):
PTPProcess ImportProcess-42:
...
...
...
AA cls._loop.add_timer(timer)
AAttributeError: 'NoneType' object has no attribute 'add_timer'
ttributeError: 'NoneType' object has no attribute 'add_timer'
AttributeError: 'NoneType' object has no attribute 'add_timer'
ttributeError: 'NoneType' object has no attribute 'add_timer'
ttributeError: 'NoneType' object has no attribute 'add_timer'
Processed: 0 rows; Rate: 0 rows/s; Avg. rate: 0 rows/s
0 rows imported from 0 files in 1.974 seconds (0 skipped).
A sample of the first 10 lines of the csv file used to import
I have tried creating a csv file with just these first two lines, for a toy's toy test, since I couldn't get anything else to work.
id,title,year,age,imdb,rotten_tomatoes,netflix,hulu,prime_video,disney_plus,is_tvshow
0,Breaking Bad,2008,18+,9.5,96%,1,0,0,0,1
1,Stranger Things,2016,16+,8.8,93%,1,0,0,0,1
2,Money Heist,2017,18+,8.4,91%,1,0,0,0,1
3,Sherlock,2010,16+,9.1,78%,1,0,0,0,1
4,Better Call Saul,2015,18+,8.7,97%,1,0,0,0,1
5,The Office,2005,16+,8.9,81%,1,0,0,0,1
6,Black Mirror,2011,18+,8.8,83%,1,0,0,0,1
7,Supernatural,2005,16+,8.4,93%,1,0,0,0,1
8,Peaky Blinders,2013,18+,8.8,92%,1,0,0,0,1
I want to upload a string in unicode to my sql server. I'm using python 2.7.6, sqlalchemy-migrate 0.7.2, pymssql 2.1.2.
But when I saved my object I got an OperationalError from sqlalchemy
OperationalError: (OperationalError) (105, "Unclosed quotation mark
after the character string '\xd8\xa3\xd8\xb3\xd8\xb1\xd8\xa7\xd8\xb1
\xd8\xaa\xd8\xad\xd8\xaf\xd9\x8a\xd8\xaf\xd8\xa7\xd9\x84\xd9\x88\xd8
\xac\xd9\x87 - \xd9\x81\xd9\x82\xd8\xb7 \xd9\x84\xd8\xaf\xd9\x89\xd9
\x85\xd8\xad\xd9\x84\xd8\xa7\xd8\xaa \xd9\x88\xd8\xac\xd9\x88\xd9\x87
\xe2\x9c\x8e '.DB-Lib error message 105, severity 15:\nGeneral SQL
Server error: Check messages from the SQL Server\n") 'INSERT INTO...
With more detail I'm guesssing is from my Description value :
...\u0648\u062c\u0648\u0647 \u270e \U0001f38'}
I saw a big U and not u, the character is the gift unicode just before, the "\u270e" works well and show a pencil. I strongly thing is because of the 8 values versus 4 for others.
But how to prevent this error ?
The column description inside of my DB is a nvarchar(2000)
I'm using Using flask restful reqparse to parse the argument create sync the object from the DB and save it :
parser_edit.add_argument('Name',
type=unicode,
required=True,
location='json')
parser_edit.add_argument('Description',
type=unicode,
required=False,
location='json')
When I try to install the unaccent Postgres extension (through the postgresql-contrib package), everything works as per the below:
# psql -U postgres -W -h localhost
Password for user postgres:
psql (9.3.9)
SSL connection (cipher: DHE-RSA-AES256-GCM-SHA384, bits: 256)
Type "help" for help.
postgres=# CREATE EXTENSION unaccent;
CREATE EXTENSION
postgres=# SELECT unaccent('Hélène');
unaccent
----------
Helene
(1 row)
However, when I try to use with Django 1.8, I get following error:
ProgrammingError: function unaccent(character varying) does not exist
LINE 1: ...able" WHERE ("my_table"."live" = true AND UNACCENT("...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Using Postgresql 9.3 and Django 1.8.
A migration file needs to be manually made and applied.
First, create an empty migration:
./manage.py makemigrations myapp --empty
Then open the file and add UnaccentExtension to operations:
from django.contrib.postgres.operations import UnaccentExtension
class Migration(migrations.Migration):
dependencies = [
(<snip>)
]
operations = [
UnaccentExtension()
]
Now apply the migration using ./manage.py migrate.
If you'd get following error during that last step:
django.db.utils.ProgrammingError: permission denied to create extension "unaccent"
HINT: Must be superuser to create this extension.
... then temporarily allow superuser rights to your user by performing postgres# ALTER ROLE <user_name> SUPERUSER; and its NOSUPERUSER counterpart. pgAdminIII can do this, too.
Now enjoy the unaccent functionality using Django:
>>> Person.objects.filter(first_name__unaccent=u"Helène")
[<Person: Michels Hélène>]