I am running a console command that fetches all entities of one type (and reads some data from them).
$q = $this->em->createQuery(/** @lang DQL */ 'select u from App\Entity\DataSample u order by u.creationDate ASC');
Both iteration strategies:
foreach ($q->getResult() as $d) {
}
and
$iterableResult = $q->iterate();
foreach ($iterableResult as $onerow) {
/** @var DataSample $d */
$d = $onerow[0];
}
lead to a segmentation fault!
Note that I am actually doing nothing inside the loop!
The second loop runs for a few tens of thousands of iterations and then segfaults; the first one segfaults during getResult().
There is also enough memory available: the program breaks at roughly 200 MB of memory usage.
My Xdebug trace is not really helpful to me; it ends as follows:
240.2365 489614864 -> str_pad() /var/www/rrr/vendor/ramsey/uuid/src/Codec/StringCodec.php:167
240.2365 489614904 -> Ramsey\Uuid\Builder\DefaultUuidBuilder->build() /var/www/rrr/vendor/ramsey/uuid/src/Codec/StringCodec.php:84
240.2365 489615000 -> Ramsey\Uuid\Uuid->__construct() /var/www/rrr/vendor/ramsey/uuid/src/Builder/DefaultUuidBuilder.php:52
240.2365 489614448 -> Doctrine\ORM\Internal\Hydration\ObjectHydrator->hydrateColumnInfo() /var/www/rrr/vendor/doctrine/orm/lib/Doctrine/ORM/Internal/Hydration/AbstractHydrator.php:270
240.2365 489614448 -> Ramsey\Uuid\Doctrine\UuidType->convertToPHPValue() /var/www/rrr/vendor/doctrine/orm/lib/Doctrine/ORM/Internal/Hydration/AbstractHydrator.php:315
I checked all (ramsey/uuid) IDs in the corresponding table, and all of them return true on Uuid::isValid().
Fun fact: when outputting all IDs, I found out that it always happens at the same DataSample!?
OK, I found a "solution":
I inserted a call to $em->clear() as well as gc_collect_cycles() every 100 iterations; now it terminates (and is in fact way faster!).
$i = 0;
$iterableResult = $q->iterate();
foreach ($iterableResult as $onerow) {
    /** @var DataSample $d */
    $d = $onerow[0];
    if ($i % 100 === 0) {
        $this->em->clear();
        gc_collect_cycles();
    }
    $i++;
}
But obviously this only works for the iterated result; getResult() should theoretically work just as well, shouldn't it?
Related
I am refactoring a code base because it is legacy code with a lot of raw SQL, and the whole thing is spaghetti code.
Sadly I am facing the fact that Doctrine has no REPLACE INTO functionality; I know the reasons why.
I found some workarounds, like merge(), but that is deprecated.
I have spent a lot of hours learning Doctrine, because it is a widely used ORM, and a lot of hours building the entities.
Is there any "legal" solution to achieve this REPLACE INTO?
You can handle duplicates by catching UniqueConstraintViolationException.
$entity = new PossibleDuplicatedEntity();

try {
    $em->persist($entity);
    $em->flush();
} catch (\Doctrine\DBAL\Exception\UniqueConstraintViolationException $e) {
    // handle duplicated values
}
But beware – Doctrine uses an implicit transaction. When the exception is thrown, the transaction is rolled back and the EntityManager is closed (entities are detached). See the documentation.
Better would be handling the cause of the duplication, i.e. if it is caused by concurrency, try using table locking, etc.
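For illustration, a minimal sketch of recovering after the exception, assuming a Symfony ManagerRegistry is available as $doctrine (its resetManager() replaces the closed EntityManager with a fresh one); the uniqueField column and its getter are hypothetical:

try {
    $em->persist($entity);
    $em->flush();
} catch (\Doctrine\DBAL\Exception\UniqueConstraintViolationException $e) {
    // the old EntityManager is closed at this point, so get a fresh one
    $em = $doctrine->resetManager();
    // e.g. load the already-existing row and update it instead of inserting
    $existing = $em->getRepository(PossibleDuplicatedEntity::class)
        ->findOneBy(['uniqueField' => $entity->getUniqueField()]);
    // ...copy the new values onto $existing, then $em->flush() again
}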
If you just want to 'upsert' rows into the database and you don't care about change tracking, you could use the piece of code below. It's probably more performant than relying on UniqueConstraintViolationExceptions.
I use it for batch-uploading rows into a table; a usage example follows after the code.
/**
 * @throws Exception
 */
public static function replaceInto(EntityManagerInterface $em, iterable $entities): void
{
    $i = 0;
    $values = [];
    $metadata = $fieldNames = $sql = null;
    foreach ($entities as $entity) {
        // Read the class metadata once, from the first entity.
        if (!isset($metadata)) {
            $metadata = $em->getClassMetadata(get_class($entity));
            $fieldNames = $metadata->getFieldNames();
            $tableName = $metadata->getTableName();
            $sql = "REPLACE INTO `$tableName` VALUES ";
        }
        // One named placeholder per column, suffixed with the row index.
        $params = [];
        foreach ($fieldNames as $fieldName) {
            $paramName = ':' . $metadata->getColumnName($fieldName) . '_' . $i;
            $params[] = $paramName;
            $values[] = [$paramName, $metadata->getFieldValue($entity, $fieldName), $metadata->getTypeOfField($fieldName)];
        }
        if ($i > 0) {
            $sql .= ",";
        }
        $sql .= "\n(" . join(', ', $params) . ")";
        $i++;
    }
    if ($sql === null) {
        return; // nothing to do for an empty iterable
    }
    $stmt = $em->getConnection()->prepare($sql);
    foreach ($values as $value) {
        $stmt->bindValue(...$value);
    }
    $stmt->executeQuery();
}
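For reference, a hypothetical usage for a batch upload (MyEntity, its constructor, and $records are stand-ins for your own entity and data source):

$rows = [];
foreach ($records as $record) {
    $rows[] = new MyEntity($record['id'], $record['name']); // hypothetical entity
}
self::replaceInto($em, $rows);

Note that REPLACE INTO is MySQL-specific and works as a delete-then-insert, and since this runs plain SQL it bypasses the unit of work and ORM lifecycle events entirely.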
This is a simplified version of a program which has two channels to carry out some operation:
use v6;
my $length = 512;
my Channel $channel-one .= new;
my Channel $to-mix .= new;
my Channel $mixer = $to-mix.Supply.batch( elems => 2).Channel;
my @promises;
for ^4 {
    $channel-one.send( 1.rand xx $length );
    my $promise = start react whenever $channel-one -> @crew {
        my @new-crew = @crew;
        @new-crew[@new-crew.elems.rand] = 1;
        if ( sum(@new-crew) < $length ) {
            say "Emitting in thread ", $*THREAD.id, " Best ", sum(@crew);
            $to-mix.send( @new-crew );
        } else {
            say "Found: closing";
            $channel-one.close;
            say "Closed";
        };
    }
    @promises.push: $promise;
}
my $pairs = start react whenever $mixer -> @pair {
    $to-mix.send( @pair.pick ); # To avoid getting it hung up
    my @new-population = crossover-frequencies( @pair[0], @pair[1] );
    $channel-one.send( @new-population );
    say "Mixing in ", $*THREAD.id;
};
await @promises;
say "Finished";

# Cross over frequencies
sub crossover-frequencies( @frequencies, @frequencies-prime --> Array ) is export {
    my @pairs = @frequencies Z @frequencies-prime;
    my @new-population = gather {
        for @pairs -> @pair {
            take @pair.pick;
        }
    };
    return @new-population;
}
It uses one channel for performing some operation (here simplified to setting a random element to one) and another for mixing elements taken in pairs. It works and finishes after a while, but for the indicated sizes it starts to grow in memory usage until it reaches almost 1 GB before ending.
It might be the $to-mix channel that is growing, but I don't see it as the source of the leak, since it gets one element from one block and one from the other; there might be a few left on the channel before finishing, but not so many as to justify the memory hogging. Any other ideas?
An additional problem is that it seems to always use the same thread for "processing", despite the fact that 4 different threads have been started. I don't know if this is related or not.
I'm using Symfony 3.4 and Doctrine.
I need to update a large number of entities (300k+) using Doctrine.
I've read the batch processing article in the Doctrine docs and topics on Stack Overflow, but the problem is that regardless of the batch size (20, 100, 200, 500) I get an 'out of memory' error anyway when I approach 20k processed entities.
Here is my function.
Can someone, please, give me a hint/suggestion how to avoid this?
protected function execute(InputInterface $input, OutputInterface $output): void
{
$io = new SymfonyStyle($input, $output);
$em = $this->getContainer()->get('doctrine.orm.entity_manager');
$em->getConfiguration()->setSQLLogger(null);
$repository = $em->getRepository('AppBundle:Order');
$qb = $repository->createQueryBuilder('o');
$totalCount = (int) $qb->select($qb->expr()->count('o'))
->where($qb->expr()->eq('o.amountOut', 0))
->getQuery()
->getSingleScalarResult();
$progressBar = $io->createProgressBar($totalCount);
$query = $qb->select('o')
->where($qb->expr()->eq('o.amountOut', 0))
->getQuery();
$iterableResult = $query->iterate();
$batchSize = 100;
$i = 0;
foreach ($iterableResult as $row) {
/** @var Order $order */
$order = $row[0];
$commissionsArr = $this->calcCommissionInOutFromOrder($order);
$amountOut = $order->getTransferAmount();
$order->setAmountOut($amountOut);
$order->setCommissionIn($commissionsArr['commission_in']);
$order->setCommissionOut($commissionsArr['commission_out']);
$em->persist($order);
$progressBar->advance();
if (0 === ($i % $batchSize)) {
$em->flush();
$em->clear();
}
++$i;
}
$em->flush();
$io->success('Success');
}
Found the actual answer in Memory leak when executing Doctrine query in loop.
Quoting: "I resolved this by adding --no-debug to my command. It turns out that in debug mode, the profiler was storing information about every single query in memory."
It actually worked; I verified it using memory_get_usage().
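For reference, assuming the command is registered as app:recalculate-orders (a hypothetical name), that means running it like this:

php bin/console app:recalculate-orders --no-debug

In debug mode Symfony's profiler keeps every executed query in memory, which is why the usage grew with every batch regardless of the batch size.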
Got myself into trouble today trying to create a stored procedure from AX.
Here is a simple example:
static void testProcedureCreation(Args _args)
{
MyParamsTable myParams;
SqlStatementExecutePermission perm;
str sqlStatement;
LogInProperty Lp = new LogInProperty();
OdbcConnection myConnection;
Statement myStatement;
ResultSet myResult;
str temp;
;
select myParams;
LP.setServer(myParams.Server);
LP.setDatabase(myParams.Database);
//Lp.setUsername("sa");
//Lp.setPassword("sa");
sqlStatement = #"create procedure testproc
as begin
print 'a'
end";
//sqlStatement = strFmt(sqlStatement, myStr);
info(sqlStatement);
perm = new SqlStatementExecutePermission(sqlStatement);
perm.assert();
try
{
myConnection = new OdbcConnection(LP);
}
catch
{
info("Check username/password.");
return;
}
myStatement = myConnection.createStatement();
myResult = myStatement.executeQuery(sqlStatement);
while (myResult.next())
{
temp = myResult.getString(1);
info(temp);
if (strScan(temp, 'Error', 1, strLen(temp)) > 0)
throw error(temp);
}
myStatement.close();
CodeAccessPermission::revertAssert();
}
To be honest, in my real example I am using BCP and some string concatenation with a lot of |, ', and "" characters.
Anyway, here is what I got:
For a couple of hours I kept changing and retrying a lot of things, and then a good thought came to mind:
"Let's try a much easier example and check the results!"
OK, no luck; the results were the same, as you can see in the pic above.
But then, for no particular reason, I tried:
exec testproc
in my SSMS instance, and to my surprise it worked. My small procedure was there.
It would be so nice if someone could explain this behavior and maybe suggest the correct approach.
This Q/A should provide an answer:
How to get the results of a direct SQL call to a stored procedure?
In short, it comes down to executeQuery vs executeUpdate: executeQuery() is meant for statements that return a result set, while DDL statements such as CREATE PROCEDURE should be run with executeUpdate().
Summary: which is quicker: updating / flushing a list of entities, or running a query builder update on each?
We have the following situation in Doctrine ORM (version 2.3).
We have a table that looks like this
cow
wolf
elephant
koala
and we would like to use this table to sort a report for a fictional farm. The problem is that the user wishes to have a custom ordering of the animals (e.g. koala, elephant, wolf, cow). Now there exist possibilities using CONCAT or CASE to add a weight in the DQL (for example 0002wolf, 0001elephant). In my experience this is tricky to build, and when I got it working the result set was an array and not a collection.
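For illustration, roughly what that CASE weighting can look like in DQL; this is a sketch assuming Doctrine 2.2+, where HIDDEN select expressions are available and keep the weight out of the hydrated result, so getResult() still returns entities rather than a mixed array:

$query = $em->createQuery(
    'SELECT c,
        CASE
            WHEN c.name = :koala THEN 1
            WHEN c.name = :elephant THEN 2
            ELSE 99
        END AS HIDDEN sortWeight
    FROM AcmeDemoBundle:Animal c
    ORDER BY sortWeight ASC'
);
$query->setParameter('koala', 'koala');
$query->setParameter('elephant', 'elephant');
$animals = $query->getResult(); // a plain list of Animal entities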
So, to solve this we added a "weight" field to each record and, before running the select, we assign each one a weight:
$animals = $em->getRepository('AcmeDemoBundle:Animal')->findAll();
foreach ($animals as $animal) {
if ($animal->getName() == 'koala') {
$animal->setWeight(1);
} else if ($animal->getName() == 'elephant') {
$animal->setWeight(2);
}
// etc
$em->persist($animal);
}
$em->flush();
$query = $em->createQuery(
'SELECT c FROM AcmeDemoBundle:Animal c ORDER BY c.weight'
);
This works perfectly. To avoid race conditions we added this inside a transaction block:
$em->getConnection()->beginTransaction();
// code from above
$em->getConnection()->rollback();
This is a lot more robust as it handles multiple users generating the same report. Alternatively the entities can be weighted like this:
$em->getConnection()->beginTransaction();
$qb = $em->createQueryBuilder();
$q = $qb->update('AcmeDemoBundle:Animal', 'c')
->set('c.weight', $qb->expr()->literal(1))
->where('c.name = ?1')
->setParameter(1, 'koala')
->getQuery();
$p = $q->execute();
$qb = $em->createQueryBuilder();
$q = $qb->update('AcmeDemoBundle:Animal', 'c')
->set('c.weight', $qb->expr()->literal(2))
->where('c.name = ?1')
->setParameter(1, 'elephant')
->getQuery();
$p = $q->execute();
// etc
$query = $em->createQuery(
'SELECT c FROM AcmeDemoBundle:Animal c ORDER BY c.weight'
);
$em->getConnection()->rollback();
Questions:
1) Which of the two examples would have better performance?
2) Is there a third or better way to do this, bearing in mind we need a collection as a result?
Please remember that this is just an example; sorting the result set in memory is not an option, it must be done at the database level. The real statement is a 10-table join with 5 ORDER BYs.
Initially you could make use of a Doctrine logging implementation (a custom \Doctrine\DBAL\Logging\Profiler implementing SQLLogger, shown below). I know that it is not the best answer, but at least you can implement it in order to measure which of your two examples performs best.
namespace Doctrine\DBAL\Logging;

class Profiler implements SQLLogger
{
    public $start = null;

    public function __construct()
    {
    }

    /**
     * {@inheritdoc}
     */
    public function startQuery($sql, array $params = null, array $types = null)
    {
        $this->start = microtime(true);
    }

    /**
     * {@inheritdoc}
     */
    public function stopQuery()
    {
        // parentheses needed: otherwise the string is concatenated before the subtraction
        echo "execution time: " . (microtime(true) - $this->start);
    }
}
In your main Doctrine configuration you can enable it like this:
$logger = new \Doctrine\DBAL\Logging\Profiler;
$config->setSQLLogger($logger);
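To actually compare the two approaches it helps to accumulate totals instead of echoing per query; a minimal sketch (the totalTime and queryCount properties are my own additions, not part of the SQLLogger interface):

class TotalTimeProfiler implements \Doctrine\DBAL\Logging\SQLLogger
{
    private $start = null;
    public $totalTime = 0.0;
    public $queryCount = 0;

    public function startQuery($sql, array $params = null, array $types = null)
    {
        $this->start = microtime(true);
    }

    public function stopQuery()
    {
        $this->totalTime += microtime(true) - $this->start;
        $this->queryCount++;
    }
}

Run the flush-based version and the query-builder version each with a fresh logger attached, then compare totalTime and queryCount.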