I need to update all my entries every 5 minutes. I am using Rails version 4.2.5 and Ruby version 2.3.0. My code below has worked fine with a small number of entries. I have about 800 entries now and it is taking up to 2 minutes to update. Is there a more efficient way?
@players = Entry.all
for player in @players
  sort = 0
  @player_selection = Selection.includes(:golfer).where("entry_id = ?", player.id).order('golfers.score asc').all
  for selection in @player_selection
    sort += 1
    score_sort = Selection.where("id = ?", selection.id).first
    score_sort.sort = sort
    score_sort.save
    player = Entry.where("id = ?", selection.entry_id).first
    player.score = Selection.includes(:golfer).where("entry_id = ? and selections.sort < 6", selection.entry_id).sum('golfers.score')
    player.save
  end
end
Thank you.
It seems to me that you could offload some of your work to the moment records are created and/or updated (instead of aggregating them on a schedule).
e.g. something like:
# I assume your relationship is something like this:
class Golfer < ActiveRecord::Base
  belongs_to :entry

  after_create :update_scores
  after_update :update_scores

  private

  def update_scores
    entry.update(score: entry.golfers.sum(:score))
  end
end
This will reduce the workload when you run your sort update process.
Then your sort update process could be streamlined:
@entries = Entry.all
@entries.each do |entry|
  sort = 0
  @selections = Selection.includes(:golfer).where("entry_id = ?", entry.id).order('golfers.score asc')
  @selections.each do |selection|
    sort += 1
    selection.update(sort: sort)
  end
end
By removing the extraneous data requests, this will slightly improve the operation. However, your current approach is going to be linear at best because you are running it as a loop, issuing O(n) queries (the classic N+1 pattern). You would likely have to drop into SQL to get a significantly faster computation:
-- -> SQL; exact syntax depends on your DB (PostgreSQL-style shown;
-- assumes selections has a golfer_id column referencing golfers)
UPDATE selections s
SET sort = ns.new_sort
FROM (SELECT selections.id,
             ROW_NUMBER() OVER (PARTITION BY selections.entry_id
                                ORDER BY golfers.score ASC) AS new_sort
      FROM selections
      JOIN golfers ON golfers.id = selections.golfer_id) AS ns
WHERE s.id = ns.id;
You could run a pure SQL command from Rails, like so:
# Be careful, this is dangerous
sql = "UPDATE ... your sql query here"
ActiveRecord::Base.connection.execute(sql)
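For example, wiring the window-function UPDATE from above into your 5-minute job might look like this (a sketch; the PostgreSQL syntax and the golfer_id column are assumptions, as noted above):

sql = <<~SQL
  UPDATE selections s
  SET sort = ns.new_sort
  FROM (SELECT selections.id,
               ROW_NUMBER() OVER (PARTITION BY selections.entry_id
                                  ORDER BY golfers.score ASC) AS new_sort
        FROM selections
        JOIN golfers ON golfers.id = selections.golfer_id) AS ns
  WHERE s.id = ns.id
SQL
ActiveRecord::Base.connection.execute(sql)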
I have a table Thing in Postgres that has 100 million rows.
I have a column that was populated over time that stores some keys. The keys were prefixed before storing. Let's call it prefixed_keys.
My task is to use the values of this column to populate another column with the same values but with the prefixes trimmed off. Let's call it simple_keys.
I tried the following migration:
from django.db import migrations
import time

def backfill_simple_keys(apps, schema_editor):
    Thing = apps.get_model('thing', 'Thing')
    batch_size = 100000
    number_of_batches_completed = 0
    while Thing.objects.filter(simple_key__isnull=True).exists():
        things = Thing.objects.filter(simple_key__isnull=True)[:batch_size]
        for tng in things:
            prefixed_key = tng.prefixed_key
            if prefixed_key.startswith("prefix_A"):
                simple_key = prefixed_key[len("prefix_A"):]
            elif prefixed_key.startswith("prefix_BBB"):
                simple_key = prefixed_key[len("prefix_BBB"):]
            tng.simple_key = simple_key
        Thing.objects.bulk_update(
            things,
            ['simple_key'],
            batch_size=batch_size
        )
        number_of_batches_completed += 1
        print("Number of batches updated: ", number_of_batches_completed)
        sleep_seconds = 3
        time.sleep(sleep_seconds)

class Migration(migrations.Migration):
    dependencies = [
        ('thing', '0030_add_index_to_simple_key'),
    ]
    operations = [
        migrations.RunPython(
            backfill_simple_keys,
        ),
    ]
Each batch took about 7 minutes to complete, which means the whole backfill would take days!
It also increased the latency of the DB, which is being used in production.
Since you're going to go through every record in that table anyway, it makes sense to traverse it in one go using a server-side cursor.
Calling
Thing.objects.filter(simple_key__isnull=True)[:batch_size]
is going to be expensive, especially as the index starts to grow.
Also, the call above retrieves ALL fields from the table even though you are only going to use 2-3 of them.
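If you do stay in the ORM for the lookup, you can at least restrict the fetched columns with only(), e.g. (a sketch, using the model from your migration):

things = (Thing.objects
          .filter(simple_key__isnull=True)
          .only('id', 'prefixed_key')[:batch_size])

But to avoid the repeated filtered queries entirely, a raw psycopg2 loop over a server-side cursor could look like this: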
import time

import psycopg2
from psycopg2.extras import RealDictCursor, execute_values

def calculate_simple_key(prefixed_key):
    # helper implied by your migration: trim the known prefixes (longest first)
    for prefix in ("prefix_BBB", "prefix_A"):
        if prefixed_key.startswith(prefix):
            return prefixed_key[len(prefix):]
    return prefixed_key

update_query = """UPDATE table SET simple_key = data.key
FROM (VALUES %s) AS data (id, key) WHERE table.id = data.id"""

conn = psycopg2.connect(DSN, cursor_factory=RealDictCursor)
cursor = conn.cursor(name="key_server_side_crs")  # having a name makes it a server-side cursor
update_cursor = conn.cursor()  # regular cursor
cursor.itersize = 5000  # how many records to retrieve at a time
cursor.execute("SELECT id, prefixed_key, simple_key FROM table")

count = 0
batch = []
for row in cursor:
    if not row["simple_key"]:
        simple_key = calculate_simple_key(row["prefixed_key"])
        batch.append((row["id"], simple_key))
        if len(batch) >= 1000:  # how many records to update at once
            execute_values(update_cursor, update_query, batch, page_size=1000)
            batch = []
            time.sleep(0.1)  # allow the DB to "breathe"
    count += 1
    if count % 100000 == 0:  # print progress every 100K rows
        print("processed %d rows" % count)

if batch:  # flush the final partial batch
    execute_values(update_cursor, update_query, batch, page_size=1000)
conn.commit()
The above is NOT tested, so it's advisable to create a copy of a few million rows of the table and test against that copy first.
You can also test various batch size settings (both for retrieve and update).
I have a SQL file where I write statements to run on each release; this file contains statements like:
-- =======================2019-02-01=======================
UPDATE rating set stars = 3 where id = 6;
UPDATE users SET status = 'A' where last_login >= '2019-01-01';
INSERT INTO....
-- =======================2019-02-15=======================
UPDATE rating set stars = 3 where id = 6;
UPDATE users SET status = 'A' where last_login >= '2019-01-01';
INSERT INTO....
I run the specific statements for each release date, but I believe that is bad practice and not a scalable method.
I'm trying to change this approach to Knex seeds or migrations. What would be the best practice for doing this?
Seeds are a problem because Knex executes the seeds every time I run the command knex seed:run, and that produces errors.
Knex stores the filenames and signatures of what it has executed so that it does not need to run them again.
https://knexjs.org/#Installation-migrations
Programmatically you can execute migrations like this:
knex({..config..}).migrate.latest({
  directory: 'migrations',     // where the migration files are stored
  tableName: 'knex_migrations' // where knex saves its records
});
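If you use the Knex CLI instead (an assumption about your setup), each release's statements would go into their own migration file, created and run like this:

npx knex migrate:make release_2019_02_15
npx knex migrate:latest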
Example migration file
exports.up = function(knex) {
  return knex.raw(`
    UPDATE rating set stars = 3 where id = 6;
    UPDATE users SET status = 'A' where last_login >= '2019-01-01';
    INSERT INTO....
  `)
};
The files will be executed alphabetically/sorted, and will not be re-executed against the same database.
I am using Symfony 2.8.39 and Doctrine 2.4.8 and have problems with paged results. The underlying database is a MySQL 5.7 server.
The documentation on doctrine paging says:
Paginating Doctrine queries is not as simple as you might think in the
beginning. If you have complex fetch-join scenarios with one-to-many
or many-to-many associations using the "default" LIMIT functionality
of database vendors is not sufficient to get the correct results.
https://www.doctrine-project.org/projects/doctrine-orm/en/latest/tutorials/pagination.html
This is exactly the situation I have. My statement in SQL translation looks like this:
SELECT sc.id, sc.name, scc.prio, sd.description
FROM sang_contents sc
JOIN sang_categories_contents scc
JOIN sang_descriptions sd
JOIN sang_languages sl
WHERE
sc.id = scc.content_id AND
scc.category_id = 20 AND
scc.is_enabled = 1 AND
sc.id = sd.content_id AND
sd.language_id = sl.id AND
sd.description != "" AND
sl.name = "DE"
ORDER BY scc.prio ASC, sc.id DESC
As the ORM is at version 3.0 and this problem has existed since the beginning, I don't think the ORM will fix it anytime soon.
So what can I do to achieve proper results for paging?
My idea to solve this so far is to paginate over simplified data that the paginator can handle correctly:
create a table containing the result for all categories and languages and access it with an extra entity.
The disadvantage is that I would have to update this table every time a change is made in the four connected tables.
Would you suggest another solution to this problem?
I guess 3rd party software like
https://github.com/KnpLabs/KnpPaginatorBundle/releases
or
https://github.com/whiteoctober/WhiteOctoberPagerfantaBundle/releases
are just sitting on top of the ORM pagination and would not fix the underlying problem.
Correct?
This is my code at the moment:
$page = max(0, $request->query->getInt('page', 0));
$pageRequest = new PageRequest($itemsPerPage, $page);
$query = $this->em->createQuery(
'SELECT sc, sd
FROM NamiApiCoreBundle:Content sc
JOIN sc.categoryContents scc
JOIN sc.descriptions sd
JOIN sd.language sl
WHERE
sc.id = scc.content AND
scc.category = :id AND
scc.enabled = 1 AND
sc.id = sd.content AND
sd.language = sl.id AND
sd.description != \'\' AND
sl.iso = :lang
ORDER BY scc.priority ASC, sc.id DESC'
)
->setFirstResult($pageRequest->getOffset())
->setParameter('lang', $lang)
->setParameter('id', $categoryId)
->useResultCache(true, $this->cache_lifetime);
if ($itemsPerPage > 0) {
$query->setMaxResults($pageRequest->getSize());
}
$paginator = new Paginator($query);
My select statement finds the records I want to update. Now I want to invert (multiply by -1) the adjusted_sentiment score for only those records. Here is the select statement:
select players.name, fbposts.company, fbposts.post_id, reactions.reaction,
       fbposts_comments.adjusted_sentiment, fbposts_comments.message, fbposts.message
from fbposts
join reactions on reactions.post_id = fbposts.post_id
join players on players.id = reactions.id
join fbposts_comments on fbposts_comments.post_id = fbposts.post_id
where adjusted_sentiment > 0 and reactions.reaction like "ANGRY"
  and reactions.id = fbposts_comments.comment_id
group by fbposts.post_id
This returns records like:
Baktiyar Romanov,WorldOfWanderlust,387990201279405_994067924004960,ANGRY,0.5965,probably fed very ill-mannered rich man
Sheridan Toms,australiapost,96085205666_10153485650690667,ANGRY,0.04676666666666666,Seriously? You can't even get an express post parcel from victoria to wa in under 2 weeks!!!! Super annoyed
Robert Smissen,australiapost,96085205666_10153487649895667,ANGRY,0.8555,Looks like Australia Post is using Diggers' letters to gain some reflected glory
Eve Ismaiel,australiapost,96085205666_10153500759100667,ANGRY,0.1133333333333333,"Ha ha... Present $20, postage $30!!!"
What I want to do is invert the adjusted_sentiment score. For example, in the first record the adjusted_sentiment score is 0.5965 and I want to update it to -0.5965.
BTW, my queries and updates will be done via Python 2.7... One thought I am working on now is to create a list from the query above and then use that list to create a series of update statements.
The query below will give you the expected output:
update fbposts_comments
set adjusted_sentiment = -adjusted_sentiment
where post_id in (select fbposts_comments.post_id
                  from fbposts
                  join reactions on reactions.post_id = fbposts.post_id
                  join players on players.id = reactions.id
                  join fbposts_comments on fbposts_comments.post_id = fbposts.post_id
                  where adjusted_sentiment > 0 and reactions.reaction like "ANGRY"
                    and reactions.id = fbposts_comments.comment_id
                  group by fbposts.post_id)
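Since you mentioned you'll be driving this from Python 2.7, you could run that update directly with the standard sqlite3 module; for example (a sketch, the database filename is a placeholder):

import sqlite3

conn = sqlite3.connect('fbposts.db')  # hypothetical database file
conn.execute("""
    update fbposts_comments
    set adjusted_sentiment = -adjusted_sentiment
    where post_id in (select fbposts_comments.post_id
                      from fbposts
                      join reactions on reactions.post_id = fbposts.post_id
                      join players on players.id = reactions.id
                      join fbposts_comments on fbposts_comments.post_id = fbposts.post_id
                      where adjusted_sentiment > 0 and reactions.reaction like 'ANGRY'
                        and reactions.id = fbposts_comments.comment_id
                      group by fbposts.post_id)
""")
conn.commit()
conn.close()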
I'm converting one of my apps from PHP to Rails 4 and I'm stuck on making my index view default to a constraint on the date, and also accept a start and end date from the params for the constraint.
The query in my PHP reads like so:
$query = " SELECT * FROM EVENTS WHERE EVENT_DATE >= '$start' AND EVENT_DATE <= '$end' ORDER BY EVENT_DATE ASC "
So that is probably similar to what Active Record needs to give me in the end.
Event.where('event_date >= ? and event_date <= ?', start_date, end_date).order('event_date ASC')
where start_date and end_date are your dates, and Event is your event model.
You can put a scope in your model, most probably in event.rb:
scope :between_dates, ->(start_date, end_date) { where("event_date >= ? AND event_date <= ?", start_date, end_date).order("event_date ASC") }
And then you can call something like this in your index controller
def index
  @events = Event.between_dates(2.days.ago, 1.day.ago)
end
PS: the above method will generate the desired query.
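To take the dates from the request params with a sensible default, the index action could look something like this (a sketch; the param names and default range are assumptions):

def index
  start_date = params[:start_date] || 1.month.ago.to_date
  end_date = params[:end_date] || Date.today
  @events = Event.between_dates(start_date, end_date)
end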