I am a newbie to Apache Hive and Spark. I have some existing Hive tables sitting on my Hadoop server that I can query with HQL commands, using hive or beeline, to get what I want out of a table, e.g. selecting the first 5 rows. Instead of that, I want to use Spark to achieve the same goal. The Spark version on the server is 1.6.3.
Using the code below (where I have replaced my actual database and table names with database and table):
sc = SparkContext(conf = config)
sqlContext = HiveContext(sc)
query = sqlContext.createDataFrame(sqlContext.sql("SELECT * from database.table LIMIT 5").collect())
df = query.toPandas()
df.show()
I get this error:
ValueError: Some of types cannot be determined after inferring.
Error:root: An unexpected error occurred while tokenizing input
The following traceback may be corrupted or invalid
The error message is: ('EOF in multi-line string', (1, 0))
However, I can run the same query with beeline and see the results.
After a day of googling and searching, I modified the code as follows:
table_ccx = sqlContext.table("database.table")
table_ccx.registerTempTable("temp")
sqlContext.sql("SELECT * FROM temp LIMIT 5").show()
Now the error is gone, but all the row values are null except for one or two date values and the column names.
I also tried
table_ccx.refreshTable("database.table")
and it did not help. Is there a setting or configuration that I need to ask my IT team to change? I appreciate any help.
EDIT: Having said that, my Python code works for some of the tables on Hadoop. I do not know whether the problem is caused by certain entries in the table or not. If it is, then how come the corresponding beeline/Hive command works?
As it came out in the comments, straightening up the code a little bit makes things work.
The problem lies on this line of code:
query = sqlContext.createDataFrame(sqlContext.sql("SELECT * from database.table LIMIT 5").collect())
What you are doing here is:
asking Spark to query the data source (which creates a DataFrame),
collecting everything on the driver as a local collection,
and re-parallelizing the local collection on Spark with createDataFrame.
In general the approach should work, although it's evidently unnecessarily convoluted.
The following will do:
query = sqlContext.sql("SELECT * from database.table LIMIT 5")
I'm not entirely sure why the original version breaks your code, but it does (as it came out in the comments), and the simpler version also improves it.
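For reference, a minimal sketch of the simplified approach on Spark 1.6, assuming a hypothetical SparkConf and keeping the database.table placeholder; converting to pandas is only needed if you want a local copy on the driver:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# hypothetical application name; adjust to your environment
config = SparkConf().setAppName("hive_sample")
sc = SparkContext(conf=config)
sqlContext = HiveContext(sc)

# run the query in Spark and keep the result as a distributed DataFrame
query = sqlContext.sql("SELECT * FROM database.table LIMIT 5")
query.show()

# optional: pull the 5 rows to the driver as a pandas DataFrame
pdf = query.toPandas()
print(pdf)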
While using Django 1.7 migrations, I came across a migration that worked in development, but not in production:
ValueError: Found wrong number (0) of constraints for table_name(a, b, c, d)
This is caused by an AlterUniqueTogether rule:
migrations.AlterUniqueTogether(
    name='table_name',
    unique_together=set([('a', 'b')]),
)
Reading up on bugs and similar reports in the Django bug tracker, the problem seems to be that the existing unique_together in the db does not match the migration history.
How can I work around this error and finish my migrations?
(Postgres and MySQL Answer)
If you look at your actual table (use \d table_name in psql) and inspect the indexes, you'll find an entry for your unique constraint. This is what Django is trying to find and drop, but it can't find an exact match.
For example,
"table_name_...6cf2a9c6e98cbd0d_uniq" UNIQUE CONSTRAINT, btree (d, a, b, c)
In my case, the order of the keys (d, a, b, c) did not match the constraint it was looking to drop (a, b, c, d).
I went back into my migration history and changed the original AlterUniqueTogether to match the actual order in the database.
The migration then completed successfully.
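For illustration, a minimal sketch of that edit, assuming a hypothetical app name myapp and the column names from the example above; the only change is reordering the tuple inside unique_together to match the (d, a, b, c) index that actually exists in the database:
from django.db import migrations


class Migration(migrations.Migration):

    # hypothetical dependency; in practice you edit the original migration file in place
    dependencies = [
        ('myapp', '0001_initial'),
    ]

    operations = [
        # column order edited to match the index found with \d table_name
        migrations.AlterUniqueTogether(
            name='table_name',
            unique_together=set([('d', 'a', 'b', 'c')]),
        ),
    ]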
I had a similar issue come up while I was switching over a CharField to become a ForeignKey. Everything worked with that process, but I was left with Django thinking it still needed to update the unique_together in a new migration. (Even though everything looked correct from inside postgres.) Unfortunately applying this new migration would then give a similar error:
ValueError: Found wrong number (0) of constraints for program(name, funder, payee, payer, location, category)
The fix that ultimately worked for me was to comment out all the previous AlterUniqueTogether operations for that model. The manage.py migrate worked without error after that.
"unique_together in the db not matching the migration history" - Every time an index is altered on a table it checks its previous index and drops it. In your case it is not able to fetch the previous index.
Solution-
1.Either you can generate it manually
2.Or revert to code where previous index is used and migrate.Then finally change to new index in your code and run migration.(django_migration files to be taken care of)
Also worth checking that you only have the expected number of unique indexes on the table in question.
For example, if your table has multiple unique indexes, you should delete the extra ones so that only the expected number of pre-migration unique indexes (usually 1) is present.
To check how many unique indexes there are for a given table in PostgreSQL:
SELECT *
FROM information_schema.table_constraints AS c
WHERE c.table_name = '<table_name>'
  AND c.constraint_type = 'UNIQUE';
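If you prefer to run that check from Django itself, here is a minimal sketch, assuming a Postgres backend and a hypothetical table name myapp_table_name; it simply counts the UNIQUE constraints the migration would have to locate:
from django.db import connection

def count_unique_constraints(table_name):
    # count the UNIQUE constraints recorded for the table in information_schema
    with connection.cursor() as cursor:
        cursor.execute(
            """
            SELECT COUNT(*)
            FROM information_schema.table_constraints AS c
            WHERE c.table_name = %s
              AND c.constraint_type = 'UNIQUE'
            """,
            [table_name],
        )
        return cursor.fetchone()[0]

print(count_unique_constraints('myapp_table_name'))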
Just in case someone runs into this and the previous answers haven't solved it: in my case the issue was that when I modified the unique_together constraint, the migration was attempted but the data didn't allow it (because the new unique_together constraint was more restrictive). However, the migration still managed to delete the old unique_together constraint from the table, leaving it in an inconsistent state. I had to migrate back to zero and re-apply the migration without data; then it went through without problems.
In summary, make sure your data will be able to satisfy the new constraint before you apply the migration.
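A minimal sketch of that pre-check, assuming a hypothetical model MyModel and a new unique_together of ('a', 'b'); it lists the value combinations that would violate the tighter constraint so they can be cleaned up first:
from django.db.models import Count

from myapp.models import MyModel  # hypothetical app and model

# group by the columns of the new constraint and keep only duplicated combinations
duplicates = (
    MyModel.objects
    .values('a', 'b')
    .annotate(n=Count('id'))
    .filter(n__gt=1)
)

for row in duplicates:
    print(row)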
1. Find the latest migration file of the respective table, locate the unique_together entry, and replace the current unique constraint's fields.
2. Migrate the database using ./manage.py migrate your_app_name.
3. Revert or undo the changes to the previous migration file.
In my case the problem was that the previous migration was not present in the django_migrations table. I added the missing entry and then the new migration worked.
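In case it helps, a minimal sketch of recording such a missing entry, assuming a hypothetical app myapp and migration name 0007_alter_unique_together; it inserts the row into django_migrations so the next migrate starts from the expected state (run it from manage.py shell):
from django.db import connection
from django.utils import timezone

with connection.cursor() as cursor:
    # mark the hypothetical migration as already applied
    cursor.execute(
        "INSERT INTO django_migrations (app, name, applied) VALUES (%s, %s, %s)",
        ['myapp', '0007_alter_unique_together', timezone.now()],
    )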
Someone may get this issue while modifying unique_together. Basically, the table state is not consistent with the migrations. You may need to add the previous constraints manually using the MySQL shell.
In case you are using migrate with Django and there is no data in the database, you can drop the database and then run python manage.py migrate again.
UPDATE from 2021
This question is no longer relevant for me.
I worked with DB2 only for a short period, and I don't know how things are in recent versions.
The problem was that I could not test the effect of an MQT without rebuilding it, which was not practical when dealing with multi-GB data.
I did not find a solution earlier, and I don't know why the question was downvoted.
SO recommends not deleting questions with answers, and who knows: maybe somebody will finally answer it.
I have an MQT in DB2 10.5 LUW:
CREATE TABLE MyMQT AS (
    SELECT * FROM MyTable
    WHERE ServerName = 'COL'
      AND LASTOCCURRENCE > TIMESTAMP '2015-12-21 00:00:00'
)
DATA INITIALLY DEFERRED REFRESH IMMEDIATE
ENABLE QUERY OPTIMIZATION
MAINTAINED BY SYSTEM;
I want to DISABLE QUERY OPTIMIZATION without DROP/CREATE.
I found "Altering materialized query table properties" https://www-01.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.admin/src/tpc/db2z_changemqtableattribs.html
but this is for z/OS.
If I try:
ALTER TABLE MyMQT DISABLE QUERY OPTIMIZATION;
I get:
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0104N An unexpected token "TABLE" was found following "ALTER ". Expected
tokens may include: "VIEW". SQLSTATE=42601
The documentation for LUW explains how to change an MQT into a regular table and vice versa.
Can I alter MQT options in DB2 LUW without recreating it?
Edit
It's quite strange, but it looks like this is impossible to achieve in DB2 LUW.
As data_henrik mentioned, it's possible to disable/enable optimization for all MQTs.
I accept his answer although it's not quite what I was looking for.
No personal experience with it, but you could:
SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION = NONE
This would tell DB2 to not consider any MQT. Later on you would enable query optimization by setting that variable to "system" (the default) or something else. That statement is documented here.
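A minimal sketch of toggling that special register from Python, assuming the ibm_db driver and hypothetical connection details; it is only meant to show the disable/enable round trip described above:
import ibm_db

# hypothetical connection string; adjust host, port and credentials
conn = ibm_db.connect(
    "DATABASE=MYDB;HOSTNAME=db2host;PORT=50000;PROTOCOL=TCPIP;UID=db2user;PWD=secret;",
    "", "")

# tell the optimizer not to consider any MQT
ibm_db.exec_immediate(
    conn, "SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION = NONE")

# ... run the queries whose plans you want to test without MQT routing ...

# restore the default behaviour
ibm_db.exec_immediate(
    conn, "SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION = SYSTEM")

ibm_db.close(conn)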
Try this:
refreshable-table-options

|--●--DATA INITIALLY DEFERRED--●--REFRESH--+-DEFERRED--+--●----->
                                           '-IMMEDIATE-'

   .-ENABLE QUERY OPTIMIZATION--.
>--+----------------------------+--●---------------------------->
   '-DISABLE QUERY OPTIMIZATION-'
We are working to upgrade our application to a more current version of Ruby & Rails. Our app integrates with a legacy database (SQL Server 2008 R2) that has a table with a column of the image data type (we are unable to change this column to varbinary(max)). Previously we were able to save binary data into the image column. However, now we are getting conversion errors.
We are working to upgrade to the following (among others):
Rails 4.2.1
ActiveRecord_SQLServer_Adapter (4.2.4)
tiny_tds (0.6.3.rc1)
freeTDS (v0.91.112)
When we now attempt to save into the image column, we get errors similar to:
TinyTds::Error: Unclosed quotation mark after the character string
Researching various issues within tiny_tds & activerecord_sqlserver_adapter, we decided to create a second table that matched the first but changed the data type from image to varbinary(max). We can save a binary into that column.
The code causing the challenge is in a background job where we grab images from s3, store them locally and then push the image into the database. Again, we don't control the legacy database and thus can't change the data type (or confront the issue of why we are storing the image in the db in the first place).
...
@d = Doc.new
...
open("#{Rails.root}/cache/pictures/image.png", "wb") do |file|
  file << open(r.image.url).read
end
@d.document = File.binread("#{Rails.root}/cache/pictures/image.png")
@d.save!
Given the upgrade has broken our saving images, we are trying to figure out how best to determine a fix. We could obviously roll back until we find a version that works. However we hope to find a fix. Anyone have any ideas?
Update:
We added the following configuration because we have triggers on the table being inserted into:
ActiveRecord::ConnectionAdapters::SQLServerAdapter.use_output_inserted = true
When we remove this configuration we get the following error:
TinyTds::Error: The target table 'doc' of the DML statement cannot have any enabled triggers if the statement contains an OUTPUT clause without INTO clause.
Note: We are unable to make any modifications to the triggers.
Per feedback on the ActiveRecord_SQLServer_Adapter site, we rolled back to 4.1.11 and we are now able to save into the image column.
We also had to add this snippet to overcome the issue with the triggers.
I need your kind support.
I have a big project in my Redmine with a lot of subprojects in it.
More than 300 issues were moved from this project to other subprojects by mistake, and I have no way to rescue them by hand directly from Redmine. However, I have a database dump that was made before this accident.
So, my question is: can I compare the "issue" table from the correct database with the damaged database and move the issues back? Or maybe there are tools or methods to move the issues back to the right project?
Redmine version is 2.0.4. Database: PostgreSQL.
Thank you in advance.
Plan A:
You can try to analyze the issues table and find all issues which were moved wrongly.
You know the new project_id and you know the approximate timestamps of the changes. Then write an SQL query (or use the Rails console) to undo the action.
For example (code NOT tested!):
new_project_id = Project.find(ID).id # note that ID is the project identifier, not the id of the record!
timestamp = DateTime.parse('2013-10-30 12:20:45')
issues = Issue.where(project_id: new_project_id).where('updated_at > ? AND updated_at < ?', timestamp - 1.minute, timestamp + 1.minute)
# check that all selected issues really should be updated!!!
issues.update_all(project_id: old_project_id) # note that old_project_id is the correct id (integer value) of the record in the DB
Plan B:
You can find all issue ids which have the given project_id in the correct DB, and then apply an SQL query on the corrupted DB to update project_id to the correct value for all issues where id IN (issue_ids).
# load the correct DATABASE and start the rails console
project = Project.find(OLD_ID) # note that OLD_ID is the project identifier, not the id of the record!
issue_ids = project.issue_ids
# save the issue_ids somewhere
# load the corrupted database and start the rails console
issue_ids = [saved_array_of_ids_from_previous_step]
Issue.where(id: issue_ids).update_all(project_id: correct_project_id) # note that correct_project_id is the correct id (integer value) of the record in the DB