How to version seeds in Knex? - database-migration

I have a SQL file where I write statements to run on each release; the file contains statements like:
-- =======================2019-02-01=======================
UPDATE rating set stars = 3 where id = 6;
UPDATE users SET status = 'A' where last_login >= '2019-01-01';
INSERT INTO....
-- =======================2019-02-15=======================
UPDATE rating set stars = 3 where id = 6;
UPDATE users SET status = 'A' where last_login >= '2019-01-01';
INSERT INTO....
I run specific statements on each release date, but I believe this is bad practice and not a scalable approach.
I'm trying to change this method to Knex seeds or migrations. What would be the best practice for doing that?
Seeds are a problem because Knex executes them every time I run knex seed:run, which produces errors.

Knex stores the filenames and signatures of what it has executed so that it does not need to run them again.
https://knexjs.org/#Installation-migrations
Programmatically you can execute migrations like this:
knex({..config..}).migrate.latest({
  directory: 'migrations',      // where the migration files are stored
  tableName: 'knex_migrations'  // where knex saves its records
});
Example migration file
exports.up = function(knex) {
  return knex.raw(`
    UPDATE rating set stars = 3 where id = 6;
    UPDATE users SET status = 'A' where last_login >= '2019-01-01';
    INSERT INTO....
  `)
};
The files are executed in sorted (alphabetical) order and will not be re-executed against the same database.

Related

How do I update a SQL value with a sum comparison

I am trying to update a value to show the picked status of an order, based on the picked quantity against the order quantity. The data is in the same table but I cannot figure out the correct syntax. I tried:
Update Orders set Status = 'FULL' where Sum(Qty_Order) = sum(Qty_Picked)
How can I apply this logic using an aggregate query?
Thanks in advance for any help.
One approach uses an update join:
UPDATE Orders o1
INNER JOIN
(
    SELECT id
    FROM Orders
    GROUP BY id
    HAVING SUM(Qty_Order) = SUM(Qty_Picked)
) o2
    ON o2.id = o1.id
SET
    Status = 'FULL';
This assumes that your Orders table has a column id which uniquely identifies each order.

Django Postgres migration: Fastest way to backfill a column in a table with 100 Million rows

I have a table Thing in Postgres that has 100 million rows.
I have a column that was populated over time that stores some keys. The keys were prefixed before storing. Let's call it prefixed_keys.
My task is to use the values of this column to populate another column with the same values but with the prefixes trimmed off. Let's call it simple_keys.
I tried the following migration:
from django.db import migrations
import time


def backfill_simple_keys(apps, schema_editor):
    Thing = apps.get_model('thing', 'Thing')
    batch_size = 100000
    number_of_batches_completed = 0
    while Thing.objects.filter(simple_key__isnull=True).exists():
        things = Thing.objects.filter(simple_key__isnull=True)[:batch_size]
        for tng in things:
            prefixed_key = tng.prefixed_key
            if prefixed_key.startswith("prefix_A"):
                simple_key = prefixed_key[len("prefix_A"):]
            elif prefixed_key.startswith("prefix_BBB"):
                simple_key = prefixed_key[len("prefix_BBB"):]
            tng.simple_key = simple_key
        Thing.objects.bulk_update(
            things,
            ['simple_key'],
            batch_size=batch_size
        )
        number_of_batches_completed += 1
        print("Number of batches updated: ", number_of_batches_completed)
        sleep_seconds = 3
        time.sleep(sleep_seconds)


class Migration(migrations.Migration):

    dependencies = [
        ('thing', '0030_add_index_to_simple_key'),
    ]

    operations = [
        migrations.RunPython(
            backfill_simple_keys,
        ),
    ]
Each batch took about 7 minutes to complete, which means the whole backfill would take days! It also increased the latency of the DB, which is being used in production.
Since you're going to go through every record in that table anyway, it makes sense to traverse it in one go using a server-side cursor.
Calling
Thing.objects.filter(simple_key__isnull=True)[:batch_size]
is going to be expensive, especially as the index starts to grow.
Also, the call above retrieves ALL fields from that table even though you are only going to use two or three of them.
import time

import psycopg2
from psycopg2.extras import RealDictCursor, execute_values

update_query = """UPDATE table SET simple_key = data.key
                  FROM (VALUES %s) AS data (id, key) WHERE table.id = data.id"""

conn = psycopg2.connect(DSN, cursor_factory=RealDictCursor)
cursor = conn.cursor(name="key_server_side_crs")  # having a name makes it a server-side cursor
update_cursor = conn.cursor()  # regular cursor
cursor.itersize = 5000  # how many records to retrieve at a time
cursor.execute("SELECT id, prefixed_key, simple_key FROM table")

count = 0
batch = []
for row in cursor:
    if not row["simple_key"]:
        simple_key = calculate_simple_key(row["prefixed_key"])
        batch.append((row["id"], simple_key))
    if len(batch) >= 1000:  # how many records to update at once
        execute_values(update_cursor, update_query, batch, page_size=1000)
        batch = []
        time.sleep(0.1)  # allow the DB to "breathe"
    count += 1
    if count % 100000 == 0:  # print progress every 100K rows
        print("processed %d rows" % count)

# flush the final partial batch and make the updates permanent
if batch:
    execute_values(update_cursor, update_query, batch, page_size=1000)
conn.commit()
The above is NOT tested, so it's advisable to create a copy of a few million rows of the table and test against that copy first.
You can also test various batch size settings (both for retrieve and update).
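The calculate_simple_key helper used above isn't defined in the answer. A minimal sketch, assuming the only prefixes that occur are the two mentioned in the question:

def calculate_simple_key(prefixed_key):
    # Hypothetical helper: strips a known prefix. Assumes only the two
    # prefixes from the question ("prefix_A" and "prefix_BBB") ever occur.
    for prefix in ("prefix_A", "prefix_BBB"):
        if prefixed_key.startswith(prefix):
            return prefixed_key[len(prefix):]
    return prefixed_key  # no known prefix: leave the value unchanged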

Rails Update More Efficient

I need to update all my entries every 5 minutes. I am using Rails version 4.2.5 and Ruby version 2.3.0. My code below has worked fine with a small number of entries. I have about 800 entries now and it is taking up to 2 minutes to update. Is there a more efficient way?
@players = Entry.all
for player in @players
  sort = 0
  @player_selection = Selection.includes(:golfer).where("entry_id = ?", player.id).order('golfers.score asc').all
  for selection in @player_selection
    sort += 1
    score_sort = Selection.where("id = ?", selection.id).first
    score_sort.sort = sort
    score_sort.save
    player = Entry.where("id = ?", selection.entry_id).first
    player.score = Selection.includes(:golfer).where("entry_id = ? and selections.sort < 6", selection.entry_id).sum('golfers.score')
    player.save
  end
end
Thank you.
Seems to me that you could offload some of your work to when records are created and/or updated (instead of aggregating them on a schedule).
e.g. something like:
# I assume your relationship is something like this:
class Golfer
  belongs_to :entry

  after_create :update_scores
  after_update :update_scores

  private

  def update_scores
    entry.update(score: entry.golfers.sum(:score))
  end
end
This will reduce the workload when you run your sort update process.
Then your sort update process could be streamlined:
@entries = Entry.all
@entries.each do |entry|
  sort = 0
  @selections = Selection.includes(:golfer).where("entry_id = ?", entry.id).order('golfers.score asc')
  @selections.each do |selection|
    sort += 1
    selection.update(sort: sort)
  end
end
By removing the extraneous data requests, this will slightly improve the operation. However, it is still a loop that issues one or more queries per record (an N+1 pattern), so the runtime will be linear at best. You would likely have to drop into SQL for a significantly faster computation:
-- SQL; depends a bit on the DB (PostgreSQL-style UPDATE ... FROM shown)
UPDATE selections s
SET sort = ns.new_sort
FROM
  (SELECT id, ROW_NUMBER() OVER (ORDER BY score ASC) AS new_sort
   FROM selections) AS ns
WHERE ns.id = s.id;
You could run a raw SQL command from Rails like this:
# Be careful, this is dangerous
sql = "UPDATE ... your sql query here"
ActiveRecord::Base.connection.execute(sql)

Doctrine 2.5 getReference and merge update not working

I have a strange problem with Doctrine 2.5 while trying to update my table UserProfile, where the column BusinessActivity is a foreign key.
CASE 1) USING getReference()
The update works, but not on the BusinessActivity column.
$myid = 6;
$businessActivity = $entityManager->getReference('BusinessActivity', $myid);
//$businessActivity proxy object was created correctly with id 6
$userDetails->setBusinessActivity($businessActivity);
$entityManager->merge($userDetails);
//FLUSH AND COMMIT
CASE 2) CREATING OBJECT FROM DB WITH REPOSITORY WORKS
$rep = $entityManager->getRepository('BusinessActivity');
$businessActivity = $rep->findOneBy(array('idActivity' => 6));
$userDetails->setBusinessActivity($businessActivity);
//FLUSH AND COMMIT
Naturally, I already have the id and didn't want to execute an extra query with findOneBy.
Why does this happen?

Django Advanced DB Query

I'm running a very complicated query against Oracle to return some data. I'm running it manually because, from what I have read, Django can't natively do what I am doing.
I run the query but Django gets no values in return. I recently switched from my dev database to production to run the query, because production has live data and the required data doesn't exist in dev yet (it's a new feature).
The query executes but doesn't return anything, while another very similar query I am running does return data. If I output the exact statement Django sends to Oracle for the non-returning query and then execute that statement manually in a terminal against the database server, it returns the data.
I also output the fetchall results to the page as a debugging tool: the working query shows all the info, but the non-working one outputs an empty list.
At first I thought it was the query, but I have proven that the query Django uses works, yet Django gets no data. The only logical answer is that Django is using the dev database, or something is off with the models/tables. However, I ran syncdb and sql and no errors came up. If I run the query against dev it doesn't return data, but it does against production.
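A quick way to confirm which database the view is actually hitting is to print the settings of the active connection. This is only a minimal sketch and assumes the view uses Django's default connection alias:

# Minimal sketch: show which database the default Django connection points at.
from django.db import connection

print(connection.settings_dict.get('NAME'), connection.settings_dict.get('HOST'))

If that prints the production settings, the wrong-database theory can be ruled out.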
Here is the query I am running.
if 'customer_licenses_all' in request.POST:
    return_query_type = 'customer_licenses_all'
    cursor.execute("alter session set time_zone='UTC'")
    selected_customer_id = int(selected_customer['selected_customer'])
    rundate = selected_customer['compute_date']
    date2 = datetime.strptime(rundate, "%d-%b-%Y")
    date_list = ()
    date_list = (date2.day, date2.month, date2.year)
    last_date = get_last_day(date_list)
    last_date = last_date.strftime("%d-%b-%Y")
    sql = ('''select customer_name, count(*) license_count, sum(cpu_ghz_hours) CPU_Core_Hours, sum(ram_gb_hours) RAM_GB_Hours, licensing_authority, product
        from customers a, vm_groups b, vms c, vm_license_histories d, licenses e, vm_compute_usage_histories f, license_authorities g
        where a.customer_id = b.customer_id
        and b.vm_group_id = c.vm_group_id
        and c.vm_id = d.vm_id
        and d.license_id = e.license_id
        and f.vm_id = c.vm_id
        and e.license_authority_id = g.license_authority_id
        and trunc(f.datetime) = to_date(%s,'DD-MON-YYYY')
        and inactive = 'N'
        and (deassignment_date is null or trunc(deassignment_date) between to_date(%s,'DD-MON-YYYY') and last_day(to_date(%s,'DD-MON-YYYY')))
        and cpu_ghz_hours > 0
        and g.license_authority_id not in (28,27,31)
        group by customer_name, licensing_authority, product order by 1,5,6''' % ("'"+rundate+"'", "'"+rundate+"'", "'"+last_date+"'"))
    cursor.execute(sql)
    return_query = cursor.fetchall()

    context = Context({'customers': customers,
                       'return_query': return_query,
                       'return_query_type': return_query_type,
                       'rundate': rundate,
                       'last_date': last_date,
                       'sql': sql,
                       })
The Django output:
select customer_name, count(*) license_count, sum(cpu_ghz_hours)
CPU_Core_Hours, sum(ram_gb_hours) RAM_GB_Hours, licensing_authority,
product from customers a, vm_groups b, vms c, vm_license_histories d,
licenses e, vm_compute_usage_histories f, license_authorities g where
a.customer_id = b.customer_id and b.vm_group_id = c.vm_group_id and
c.vm_id = d.vm_id and d.license_id = e.license_id and f.vm_id =
c.vm_id and e.license_authority_id = g.license_authority_id and
trunc(f.datetime) = to_date('01-Jun-2014','DD-MON-YYYY') and inactive
= 'N' and (deassignment_date is null or trunc(deassignment_date) between to_date('01-Jun-2014','DD-MON-YYYY') and
last_day(to_date('30-Jun-2014','DD-MON-YYYY'))) and cpu_ghz_hours > 0
and g.license_authority_id not in (28,27,31) group by customer_name,
licensing_authority, product order by 1,5,6
[]
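As an aside, the view above builds the SQL string with Python string formatting and hand-added quotes. Below is a minimal sketch of the same date filter using bind parameters instead; it assumes Django's default connection, and the table and date value are taken from the query above:

from django.db import connection

# Sketch only: Django's cursor accepts %s placeholders even on the Oracle
# backend and converts them to bind variables, so dates do not need to be
# quoted and interpolated by hand.
rundate = '01-Jun-2014'
cursor = connection.cursor()
cursor.execute(
    """select count(*)
         from vm_compute_usage_histories f
        where trunc(f.datetime) = to_date(%s, 'DD-MON-YYYY')""",
    [rundate],
)
print(cursor.fetchone())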