I'm using Symfony 3.4 and Doctrine.
I need to update a large number of entities (300k+) using Doctrine.
I've read the batch processing article in the Doctrine docs and the related topics on Stack Overflow, but regardless of the batch size (20, 100, 200, 500), I still get an 'out of memory' error when I approach 20k processed entities.
Here is my function.
Can someone, please, give me a hint/suggestion how to avoid this?
protected function execute(InputInterface $input, OutputInterface $output): void
{
    $io = new SymfonyStyle($input, $output);
    $em = $this->getContainer()->get('doctrine.orm.entity_manager');
    $em->getConfiguration()->setSQLLogger(null);
    $repository = $em->getRepository('AppBundle:Order');
    $qb = $repository->createQueryBuilder('o');
    $totalCount = (int) $qb->select($qb->expr()->count('o'))
        ->where($qb->expr()->eq('o.amountOut', 0))
        ->getQuery()
        ->getSingleScalarResult();
    $progressBar = $io->createProgressBar($totalCount);
    $query = $qb->select('o')
        ->where($qb->expr()->eq('o.amountOut', 0))
        ->getQuery();
    $iterableResult = $query->iterate();
    $batchSize = 100;
    $i = 0;
    foreach ($iterableResult as $row) {
        /** @var Order $order */
        $order = $row[0];
        $commissionsArr = $this->calcCommissionInOutFromOrder($order);
        $amountOut = $order->getTransferAmount();
        $order->setAmountOut($amountOut);
        $order->setCommissionIn($commissionsArr['commission_in']);
        $order->setCommissionOut($commissionsArr['commission_out']);
        $em->persist($order);
        $progressBar->advance();
        ++$i;
        // Count first, so the first flush happens after a full batch, not after the first entity
        if (0 === ($i % $batchSize)) {
            $em->flush();
            $em->clear();
        }
    }
    // Flush the remaining, partial batch
    $em->flush();
    $io->success('Success');
}
I found the actual answer in the question "Memory leak when executing Doctrine query in loop".
Quoting: "I resolved this by adding --no-debug to my command. It turns out that in debug mode, the profiler was storing information about every single query in memory."
It worked; I confirmed it by checking memory_get_usage().
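When checking a fix like this, it helps to log memory only at batch boundaries rather than on every row. Below is a minimal plain-PHP sketch; `formatBytes()` and `isBatchBoundary()` are hypothetical helpers (not Symfony or Doctrine APIs), and the loop only simulates where the flush points would fall. Counting processed items from 1 means the first boundary lands after a full batch.

```php
<?php
// Hypothetical helpers for a batch loop; not part of Symfony or Doctrine.

/** Formats a byte count (e.g. from memory_get_usage()) as MiB with two decimals. */
function formatBytes(int $bytes): string
{
    return sprintf('%.2f MiB', $bytes / 1048576);
}

/** True after every $batchSize items; $processed counts from 1, not 0. */
function isBatchBoundary(int $processed, int $batchSize): bool
{
    return $processed % $batchSize === 0;
}

// Simulated loop over 250 items: in the real command, flush() and clear()
// would run at each boundary, plus one final flush() for the remainder.
$batchSize = 100;
$log = [];
for ($processed = 1; $processed <= 250; $processed++) {
    if (isBatchBoundary($processed, $batchSize)) {
        $log[] = sprintf('%d processed, memory %s', $processed, formatBytes(memory_get_usage()));
    }
}
echo implode(PHP_EOL, $log), PHP_EOL;
```

If memory keeps climbing between logged boundaries even with flush() and clear() in place, something outside the unit of work (such as a debug-mode profiler) is likely holding references.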
Related
I am refactoring a legacy code base with a lot of raw SQL; the whole thing is spaghetti code.
Unfortunately, Doctrine has no REPLACE INTO functionality (I know the reasons why).
I found some workarounds, such as merge(), but that is deprecated.
I have spent many hours learning Doctrine, because it is a widely used ORM, and many more building the entities.
Is there any "legal" solution to achieve this REPLACE INTO?
You can handle duplicates by catching UniqueConstraintViolationException.
$entity = new PossibleDuplicatedEntity();
try {
    $em->persist($entity);
    $em->flush();
} catch (\Doctrine\DBAL\Exception\UniqueConstraintViolationException $e) {
    // handle duplicated values
}
But beware: Doctrine uses an implicit transaction. When the exception is thrown, the transaction is rolled back and the EntityManager is closed (its entities are detached). See the documentation.
It would be better to handle the cause of the duplication. For example, if it is caused by concurrency, try table locking or similar.
If you just want to 'upsert' rows into the database and you don't care about change tracking, you could use the piece of code below (note that REPLACE INTO is a MySQL extension to standard SQL). It's probably more performant than relying on UniqueConstraintViolationExceptions.
I use it for batch-uploading rows into a table.
/**
 * @throws Exception
 */
public static function replaceInto(EntityManagerInterface $em, iterable $entities): void
{
    $i = 0;
    $values = [];
    $metadata = $fieldNames = $sql = null;
    foreach ($entities as $entity) {
        if (!isset($metadata)) {
            // Assumes all entities share one class and that the table's columns are
            // exactly its mapped fields, in mapping order (no association columns)
            $metadata = $em->getClassMetadata(get_class($entity));
            $fieldNames = $metadata->getFieldNames();
            $tableName = $metadata->getTableName();
            $sql = "REPLACE INTO `$tableName` VALUES ";
        }
        $params = [];
        foreach ($fieldNames as $fieldName) {
            // One uniquely named placeholder per column per row, e.g. :name_0, :name_1
            $paramName = ':' . $metadata->getColumnName($fieldName) . '_' . $i;
            $params[] = $paramName;
            $values[] = [$paramName, $metadata->getFieldValue($entity, $fieldName), $metadata->getTypeOfField($fieldName)];
        }
        if ($i > 0) {
            $sql .= ",";
        }
        $sql .= "\n(" . implode(', ', $params) . ")";
        $i++;
    }
    if ($i === 0) {
        return; // Nothing to write; avoids preparing an empty statement
    }
    $stmt = $em->getConnection()->prepare($sql);
    foreach ($values as $value) {
        $stmt->bindValue(...$value);
    }
    // REPLACE returns no rows, so executeStatement() fits better than executeQuery()
    $stmt->executeStatement();
}
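The kind of SQL such a function assembles can be illustrated without Doctrine. The sketch below works on plain arrays (column => value) and is hypothetical: `buildReplaceInto()` is not a Doctrine API, and the `animal` table is sample data. It names the columns explicitly, so the statement does not depend on the table's physical column order.

```php
<?php
/**
 * Hypothetical helper: builds a multi-row MySQL REPLACE INTO statement with
 * named placeholders. Returns [sql, params]. Table and column names are
 * assumed to be trusted; every row must have the same keys as the first.
 *
 * @param string  $table Table name
 * @param array[] $rows  List of rows, each an associative array column => value
 */
function buildReplaceInto(string $table, array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('At least one row is required');
    }
    $columns = array_keys(reset($rows));
    $tuples = [];
    $params = [];
    foreach (array_values($rows) as $i => $row) {
        $placeholders = [];
        foreach ($columns as $column) {
            // Suffix with the row index so each placeholder name is unique
            $name = $column . '_' . $i;
            $placeholders[] = ':' . $name;
            $params[$name] = $row[$column];
        }
        $tuples[] = '(' . implode(', ', $placeholders) . ')';
    }
    $quoted = array_map(function (string $c): string {
        return '`' . $c . '`';
    }, $columns);
    $sql = sprintf(
        'REPLACE INTO `%s` (%s) VALUES %s',
        $table,
        implode(', ', $quoted),
        implode(', ', $tuples)
    );
    return [$sql, $params];
}

[$sql, $params] = buildReplaceInto('animal', [
    ['id' => 1, 'name' => 'koala'],
    ['id' => 2, 'name' => 'wolf'],
]);
```

With DBAL, the pair could then be passed to something like `$connection->executeStatement($sql, $params)`. Keep in mind that REPLACE deletes the old row before inserting the new one, so columns you leave out are reset to their defaults.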
I currently have a fairly complex native SQL query which is used for reporting purposes. Given the amount of data it processes, native SQL is the only efficient way to handle it.
This works fine and returns an array of arrays from the scalar results.
To keep the results consistent with every other result set in the project, I'd like to use Data Transfer Objects (DTOs) and return an array of simple DTO objects.
These work really well with DQL, but I can't see any way of using them with native SQL. Is this at all possible?
Doctrine can map the results of a raw SQL query to an entity, as shown here:
http://doctrine-orm.readthedocs.org/projects/doctrine-orm/en/latest/reference/native-sql.html
I cannot see support for DTOs unless you are willing to use DQL as well, so a direct solution does not exist. I tried my hand at a simple workaround that works well enough, so here are the DQL and non-DQL ways to achieve your goal.
The examples were built using Laravel and the Laravel Doctrine extension.
The DTO
The DTO below supports both DQL binding and custom mapping, so the constructor must work both with and without parameters.
<?php namespace App\Dto;
/**
 * Date with corresponding statistics for the date.
 */
class DateTotal
{
    public $taskLogDate;
    public $totalHours;
    /**
     * DateTotal constructor.
     *
     * @param $taskLogDate The date for which to return totals
     * @param $totalHours  The total hours worked on the given date
     */
    public function __construct($taskLogDate = null, $totalHours = null)
    {
        $this->taskLogDate = $taskLogDate;
        $this->totalHours = $totalHours;
    }
}
Using DQL to fetch results
Here is the standard version, using DQL.
public function findRecentDateTotals($taskId)
{
    $fromDate = new \DateTime('6 days ago');
    $fromDate->setTime(0, 0, 0);
    $queryBuilder = $this->getQueryBuilder();
    // NEW App\Dto\DateTotal(...) makes Doctrine call the DTO constructor for each result row
    $queryBuilder->select('NEW App\Dto\DateTotal(taskLog.taskLogDate, SUM(taskLog.taskLogHours))')
        ->from('App\Entities\TaskLog', 'taskLog')
        ->where($queryBuilder->expr()->orX(
            $queryBuilder->expr()->eq('taskLog.taskLogTask', ':taskId'),
            $queryBuilder->expr()->eq(0, ':taskId')
        ))
        ->andWhere(
            $queryBuilder->expr()->gt('taskLog.taskLogDate', ':fromDate')
        )
        ->groupBy('taskLog.taskLogDate')
        ->orderBy('taskLog.taskLogDate', 'DESC')
        ->setParameter('fromDate', $fromDate)
        ->setParameter('taskId', $taskId);
    $result = $queryBuilder->getQuery()->getResult();
    return $result;
}
Support for DTOs with native SQL
Here is a simple helper that can marshal the array results of a raw SQL query into objects. It can be extended to do other stuff as well, perhaps custom updates and so on.
<?php namespace App\Dto;
use Doctrine\ORM\EntityManager;
/**
 * Helper class to run raw SQL.
 *
 * @package App\Dto
 */
class RawSql
{
    /**
     * Run a raw SQL query.
     *
     * @param string $sql        The raw SQL
     * @param array  $parameters Array of parameter names mapped to values
     * @param string $className  The class to pack the results into
     * @return object[] Array of objects mapped from the array results
     * @throws \Doctrine\DBAL\DBALException
     */
    public static function query($sql, $parameters, $className)
    {
        /** @var EntityManager $em */
        $em = app('em');
        $statement = $em->getConnection()->prepare($sql);
        $statement->execute($parameters);
        $results = $statement->fetchAll();
        $return = array();
        foreach ($results as $result) {
            // Copy each column onto the property of the same name, so SQL aliases must match the DTO's properties
            $resultObject = new $className();
            foreach ($result as $key => $value) {
                $resultObject->$key = $value;
            }
            $return[] = $resultObject;
        }
        return $return;
    }
}
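One possible refinement to the marshalling loop is to copy only columns that match a declared property, so a typo in a SQL alias fails loudly instead of silently creating a dynamic property (which PHP 8.2 deprecates). The function below is a hypothetical, Doctrine-free sketch of that idea; `hydrateRows()` is an illustrative name, and the local `DateTotal` copy only mirrors the DTO above so the example runs on its own.

```php
<?php
/**
 * Hypothetical helper: maps associative rows onto new instances of $className.
 * Assumes public properties (as in DateTotal) and a constructor that works
 * without arguments. Throws if a column has no matching declared property.
 */
function hydrateRows(array $rows, string $className): array
{
    $objects = [];
    foreach ($rows as $row) {
        $object = new $className();
        foreach ($row as $column => $value) {
            if (!property_exists($object, $column)) {
                throw new UnexpectedValueException(
                    sprintf('%s has no property "%s"', $className, $column)
                );
            }
            $object->$column = $value;
        }
        $objects[] = $object;
    }
    return $objects;
}

// Same shape as the DateTotal DTO: constructor arguments default to null.
class DateTotal
{
    public $taskLogDate;
    public $totalHours;

    public function __construct($taskLogDate = null, $totalHours = null)
    {
        $this->taskLogDate = $taskLogDate;
        $this->totalHours = $totalHours;
    }
}

$totals = hydrateRows([['taskLogDate' => '2015-01-02', 'totalHours' => '7.5']], DateTotal::class);
```

Values arrive as the database driver returns them (often strings), so any casting would still be up to the DTO or the caller.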
Running the raw SQL version
The function is called in the same way as other repository methods and simply uses the helper above to convert the data to objects.
public function findRecentDateTotals2($taskId)
{
    $fromDate = new \DateTime('6 days ago');
    // Column aliases must match the DTO property names (taskLogDate, totalHours)
    $sql = "
        SELECT
            task_log.task_log_date AS taskLogDate,
            SUM(task_log.task_log_hours) AS totalHours
        FROM task_log task_log
        WHERE (task_log.task_log_task = :taskId OR :taskId = 0) AND task_log.task_log_date > :fromDate
        GROUP BY task_log_date
        ORDER BY task_log_date DESC
    ";
    $return = RawSql::query(
        $sql,
        array(
            'taskId' => $taskId,
            'fromDate' => $fromDate->format('Y-m-d')
        ),
        DateTotal::class
    );
    return $return;
}
Notes
I would not dismiss DQL too quickly, as it can express most kinds of SQL queries. However, I have also recently been involved in building management reports, and in the world of management information the SQL queries can be as large as whole PHP files. In that case I would join you and abandon Doctrine (or any other ORM) as well.
Summary: which is quicker, updating and flushing a list of entities, or running a query builder UPDATE for each one?
We have the following situation in Doctrine ORM (version 2.3).
We have a table that looks like this:
cow
wolf
elephant
koala
We would like to use this table to sort a report of a fictional farm. The problem is that the user wants a custom ordering of the animals (e.g. Koala, Elephant, Wolf, Cow). There are ways to add a weight to the DQL using CONCAT or CASE (e.g. 0002wolf, 0001elephant), but in my experience they are tricky to build, and when I did get one working, the result set was an array and not a collection.
So, to solve this, we added a "weight" field to each record and, before running the select, we assign each one a weight:
$animals = $em->getRepository('AcmeDemoBundle:Animal')->findAll();
foreach ($animals as $animal) {
    if ($animal->getName() == 'koala') {
        $animal->setWeight(1);
    } elseif ($animal->getName() == 'elephant') {
        $animal->setWeight(2);
    }
    // etc
    $em->persist($animal);
}
$em->flush();
$query = $em->createQuery(
    'SELECT c FROM AcmeDemoBundle:Animal c ORDER BY c.weight'
);
This works perfectly. To avoid race conditions we added this inside a transaction block:
$em->getConnection()->beginTransaction();
// code from above
$em->getConnection()->rollback();
This is a lot more robust as it handles multiple users generating the same report. Alternatively the entities can be weighted like this:
$em->getConnection()->beginTransaction();
$qb = $em->createQueryBuilder();
$q = $qb->update('AcmeDemoBundle:Animal', 'c')
    ->set('c.weight', $qb->expr()->literal(1))
    ->where('c.name = ?1')
    ->setParameter(1, 'koala')
    ->getQuery();
$p = $q->execute();
$qb = $em->createQueryBuilder();
$q = $qb->update('AcmeDemoBundle:Animal', 'c')
    ->set('c.weight', $qb->expr()->literal(2))
    ->where('c.name = ?1')
    ->setParameter(1, 'elephant')
    ->getQuery();
$p = $q->execute();
// etc
$query = $em->createQuery(
    'SELECT c FROM AcmeDemoBundle:Animal c ORDER BY c.weight'
);
$em->getConnection()->rollback();
Questions:
1) Which of the two examples would have better performance?
2) Is there a third or better way to do this, bearing in mind that we need a collection as the result?
Please remember that this is just an example. Sorting the result set in memory is not an option; it must be done at the database level, because the real statement is a 10-table join with 5 ORDER BY clauses.
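For question 2, one option that may avoid writing weights at all is to select a CASE expression AS HIDDEN and order by it. As far as I know, DQL supports both CASE and HIDDEN from Doctrine 2.2 on, and HIDDEN keeps the computed column out of the result, so the query should still return entities rather than arrays; this is untested against 2.3. The sketch below only builds the DQL string; `buildWeightCase()` and the `weight0`... parameter names are hypothetical.

```php
<?php
/**
 * Hypothetical helper: builds a DQL CASE expression that gives each listed
 * value a weight (1, 2, 3, ...) and every other value the next weight up.
 * Returns [caseExpression, parameters]; values are bound as parameters.
 */
function buildWeightCase(string $field, array $orderedValues): array
{
    if ($orderedValues === []) {
        throw new InvalidArgumentException('At least one value is required');
    }
    $whens = [];
    $params = [];
    foreach (array_values($orderedValues) as $index => $value) {
        $param = 'weight' . $index;
        $whens[] = sprintf('WHEN %s = :%s THEN %d', $field, $param, $index + 1);
        $params[$param] = $value;
    }
    $case = sprintf('CASE %s ELSE %d END', implode(' ', $whens), count($whens) + 1);
    return [$case, $params];
}

[$case, $params] = buildWeightCase('c.name', ['koala', 'elephant', 'wolf', 'cow']);

// HIDDEN keeps sortWeight out of the result, so the query should still return Animal entities
$dql = "SELECT c, $case AS HIDDEN sortWeight FROM AcmeDemoBundle:Animal c ORDER BY sortWeight";
```

The query would then be run with something like `$em->createQuery($dql)->setParameters($params)->getResult()`. Because nothing is written, no transaction or rollback is needed.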
To start with, you could implement Doctrine's SQL logging interface (Doctrine\DBAL\Logging\SQLLogger) with a small profiler class. I know it is not the best answer, but at least you can use it to measure which of your examples performs better.
namespace App\Logging; // your own namespace; App\Logging is only an example
use Doctrine\DBAL\Logging\SQLLogger;
class Profiler implements SQLLogger
{
    public $start = null;
    /**
     * {@inheritdoc}
     */
    public function startQuery($sql, array $params = null, array $types = null)
    {
        $this->start = microtime(true);
    }
    /**
     * {@inheritdoc}
     */
    public function stopQuery()
    {
        // Parentheses are needed: before PHP 8, "." and "-" had equal precedence
        echo "execution time: " . (microtime(true) - $this->start) . PHP_EOL;
    }
}
In your main Doctrine configuration, you can enable it like this:
$logger = new \App\Logging\Profiler();
$config->setSQLLogger($logger);
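The profiler above echoes each timing; to compare the two strategies, a total per run is more useful. A minimal sketch of a variant that records durations follows. `QueryTimer` is a hypothetical class, written without `implements` so it runs standalone, and its clock is injectable only so it can be tested deterministically.

```php
<?php
/**
 * Hypothetical standalone timer with the same startQuery()/stopQuery() shape
 * as Doctrine's SQLLogger. It records durations instead of echoing them.
 */
class QueryTimer
{
    /** @var array<int, array{sql: string, seconds: float}> */
    public $queries = [];
    private $currentSql;
    private $start;
    private $clock;

    public function __construct(?callable $clock = null)
    {
        // Defaults to microtime(true); a fake clock can be passed in for tests
        $this->clock = $clock ?? function (): float {
            return microtime(true);
        };
    }

    public function startQuery($sql, ?array $params = null, ?array $types = null): void
    {
        $this->currentSql = $sql;
        $this->start = ($this->clock)();
    }

    public function stopQuery(): void
    {
        $this->queries[] = [
            'sql' => $this->currentSql,
            'seconds' => ($this->clock)() - $this->start,
        ];
    }

    /** Sum of all recorded query durations, in seconds. */
    public function totalSeconds(): float
    {
        return array_sum(array_column($this->queries, 'seconds'));
    }
}
```

In a DBAL 2.x project, adding `implements \Doctrine\DBAL\Logging\SQLLogger` should let it be passed to `$config->setSQLLogger()`. Running each strategy and comparing `totalSeconds()` then gives the database time for each. As far as I know, SQLLogger is deprecated in DBAL 3 in favour of logging middleware.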
I am using Doctrine 1.2 with Zend Framework and trying to save a Doctrine Collection.
I am retrieving my collection from my table class with the following code.
public function getAll()
{
    return $this->createQuery('e')
        ->orderBy('e.order ASC, e.eventType ASC')
        ->execute();
}
I also have the following class to reorder the above event records.
class Admin_Model_Event_Sort extends Model_Abstract
{
    /**
     * Events collection
     * @var Doctrine_Collection
     */
    protected $_collection = null;
    public function __construct()
    {
        $this->_collection = Model_Doctrine_EventTypesTable::getInstance()->getAll();
    }
    public function save($eventIds)
    {
        if ($this->_collection instanceof Doctrine_Collection) {
            foreach ($this->_collection as $record) {
                $key = array_search($record->eventTypeId, $eventIds);
                if ($key !== false) {
                    $record->order = (string)$key;
                }
            }
            return $this->_saveCollection($this->_collection);
        } else {
            return false;
        }
    }
}
The _saveCollection method used above is as follows:
/**
 * Attempts to save a Doctrine Collection
 * Sets the error message property on error
 * @param Doctrine_Collection $collection
 * @return boolean
 */
protected function _saveCollection(Doctrine_Collection $collection)
{
    try {
        $collection->save();
        return true;
    } catch (Exception $e) {
        $this->_errorMessage = $e->getMessage();
        OpenMeetings_Logger_ErrorLogger::write('Unable to save Doctrine Collection');
        OpenMeetings_Logger_ErrorLogger::vardump($this->_errorMessage);
        return false;
    }
}
The $eventIds passed to the save method above is simply an enumerated array of event IDs; I use the array keys to set the sort order of the events via the order field. If I var_dump the collection as an array ($this->_collection->toArray()), I get the correct data. However, when I attempt to save the collection, I get the following error:
"SQLSTATE[42000]: Syntax error or access violation: 1064 You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'order = '0' WHERE eventtypeid = '3'' at line 1"
Is there any way I can get Doctrine to expand on this error? The full SQL statement would be a start. Also, if anyone knows why this error is occurring, that would be very helpful.
Many thanks in advance
Garry
EDIT
I have modified my above code to try to work one record at a time but I still get the same problem.
public function save($eventIds)
{
    foreach ($eventIds as $key => $eventId) {
        $event = Model_Doctrine_EventTypesTable::getInstance()->getOne($eventId);
        $event->order = (string)$key;
        $event->save();
    }
}
OK, I have found the problem. I was using the MySQL reserved word order as a field name, hence the error. I changed it to sortOrder and the problem went away.
Hope this helps someone with a similar issue.
Garry
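Renaming the column is the simplest fix. For context, MySQL also accepts a reserved word as an identifier when it is quoted with backticks, which is what the failing statement lacked. The helper below is a hypothetical plain-PHP illustration of MySQL identifier quoting, not Doctrine 1.2 code, and the `event_types` table name is an assumption.

```php
<?php
/**
 * Hypothetical helper: quotes a MySQL identifier with backticks, doubling any
 * embedded backtick, so reserved words such as `order` can be used as names.
 */
function quoteMysqlIdentifier(string $name): string
{
    return '`' . str_replace('`', '``', $name) . '`';
}

// Unquoted, "SET order = ..." triggers error 1064; quoted, it is valid MySQL.
$sql = sprintf(
    'UPDATE %s SET %s = ? WHERE %s = ?',
    quoteMysqlIdentifier('event_types'),
    quoteMysqlIdentifier('order'),
    quoteMysqlIdentifier('eventtypeid')
);
```

Quoting every identifier by hand is easy to forget, which is why a non-reserved name like sortOrder is usually the more robust choice.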
$q = $this->_em->createQuery("SELECT s FROM app\models\Quest s
LEFT JOIN s.que c
WHERE s.type = '$sub'
AND c.id = '$id'");
Given a query like the one above, how would I retrieve the number of results?
Alternatively, you can look at what Doctrine's Paginator class does to a Query object to get a count (this approach is most likely overkill, but it answers your question):
public function count()
{
    if ($this->count === null) {
        /* @var $countQuery Query */
        $countQuery = $this->cloneQuery($this->query);
        if ( ! $countQuery->getHint(CountWalker::HINT_DISTINCT)) {
            $countQuery->setHint(CountWalker::HINT_DISTINCT, true);
        }
        if ($this->useOutputWalker($countQuery)) {
            $platform = $countQuery->getEntityManager()->getConnection()->getDatabasePlatform(); // law of demeter win
            $rsm = new ResultSetMapping();
            $rsm->addScalarResult($platform->getSQLResultCasing('dctrn_count'), 'count');
            $countQuery->setHint(Query::HINT_CUSTOM_OUTPUT_WALKER, 'Doctrine\ORM\Tools\Pagination\CountOutputWalker');
            $countQuery->setResultSetMapping($rsm);
        } else {
            $countQuery->setHint(Query::HINT_CUSTOM_TREE_WALKERS, array('Doctrine\ORM\Tools\Pagination\CountWalker'));
        }
        $countQuery->setFirstResult(null)->setMaxResults(null);
        try {
            $data = $countQuery->getScalarResult();
            $data = array_map('current', $data);
            $this->count = array_sum($data);
        } catch (NoResultException $e) {
            $this->count = 0;
        }
    }
    return $this->count;
}
You can either perform a count query beforehand:
// Parameters are bound rather than interpolated into the DQL string
$count = $em->createQuery('SELECT COUNT(s) FROM app\models\Quest s
        LEFT JOIN s.que c
        WHERE s.type = :type
        AND c.id = :id')
    ->setParameter('type', $sub)
    ->setParameter('id', $id)
    ->getSingleScalarResult();
Or you can just execute your query and get the size of the results array:
$quests = $q->getResult();
$count = count($quests);
Use the first method if you need the count so that you can make a decision before actually retrieving the objects.