$queryBuilder = $this->getEntityManager()->createQueryBuilder()
->from('ShopHqCartBundle:PromotionalDiscount', 'pd')
->leftJoin('pd.requiredSkus', 'pdrs')
->andWhere('pd.id = :id')
->select('pd', 'pdrs');
$query = $queryBuilder->getQuery();
$cache = $this->getEntityManager()->getConfiguration()->getResultCacheImpl();
$hydrationCacheProfile = new QueryCacheProfile(60 * 60 * 12, $cacheKeyHydrated, $cache);
$result = $query
->useQueryCache(true)
->useResultCache(true, 60 * 60 * 12, $cacheKey)
->setHydrationCacheProfile($hydrationCacheProfile)
->getResult();
The first time this runs, I get back a managed entity and can lazy-load other relations. When the result comes back from the hydration cache, it is not a managed entity, so lazy loading no longer works. Is there anything I can do to keep the hydration cache and still have lazy loading work on a result served from it?
I want to download images from Google using icrawler. I set the maximum number of downloads to 1000, but I only get 92 images before it stops. Moreover, the result is different every time I run it, and it is always fewer than 100.
import os

from icrawler.builtin import GoogleImageCrawler

for var in ['car front bumper damage']:
    var_folder = var.replace(" ", "_")
    image_folder = '/content/drive/MyDrive/DataStor/Crawler-datasets/'
    path = image_folder + var_folder

    # create the target directory if it does not exist yet
    try:
        os.makedirs(path)
    except FileExistsError:
        print("File already exists")

    print(f'Collecting images for {var}......')
    google_Crawler = GoogleImageCrawler(downloader_threads=4, storage={'root_dir': path})
    google_Crawler.crawl(keyword=var, max_num=1000)
    print(google_Crawler.feeder.in_queue.qsize())
I don't know whether I am setting the parameters incorrectly.
This happens because when you crawl Google Images, only the first results page is processed, so you cannot get all 1000 images. A solution is to crawl several times over different date ranges:
google_Crawler.crawl(keyword=var, max_num=350, date_min=date(2019, 1, 1), date_max=date(2019, 12, 31))
google_Crawler.crawl(keyword=var, max_num=350, date_min=date(2020, 1, 1), date_max=date(2020, 12, 31), file_idx_offset='auto')
google_Crawler.crawl(keyword=var, max_num=350, date_min=date(2021, 1, 1), date_max=date(2021, 12, 31), file_idx_offset='auto')
You can crawl more; you just have to specify a different date range for each call if you don't want duplicate images.
Don't forget to add:
from datetime import date
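To keep the separate calls in one place, here is a minimal sketch of the same idea written as a loop, reusing the path and var variables from the question; the year list and the per-call max_num are just illustrative.

from datetime import date

from icrawler.builtin import GoogleImageCrawler

google_Crawler = GoogleImageCrawler(downloader_threads=4, storage={'root_dir': path})  # path from the question

# Crawl one year at a time; file_idx_offset='auto' keeps later batches
# from overwriting the images downloaded in earlier batches.
for i, year in enumerate([2019, 2020, 2021]):
    google_Crawler.crawl(keyword=var,  # var from the question
                         max_num=350,
                         date_min=date(year, 1, 1),
                         date_max=date(year, 12, 31),
                         file_idx_offset=0 if i == 0 else 'auto')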
I have a table with 10k rows.
I'm trying to parse them to change a small thing inside an attribute of each row with Python, so I'm using client.scan(), taking batches of 10 rows and passing the LastEvaluatedKey from each response as the ExclusiveStartKey of the next scan().
The problem is that after 40 rows, scan() no longer returns a LastEvaluatedKey, as if the table were only 40 rows long.
I've noticed that running the same script against another table that is 3x bigger, the scan stops at 120 rows (also 3x bigger).
The table has On-Demand capacity.
Any idea about this?
import boto3

client = boto3.client('dynamodb')
resource = boto3.resource('dynamodb')
table = resource.Table(table_name)

remaining = 3961
iteration = 0
limit = 10

while remaining > 0:
    # retrieve Limit
    if iteration == 0:
        response = client.scan(
            TableName=table_name,
            Limit=limit,
            Select='ALL_ATTRIBUTES',
            ReturnConsumedCapacity='TOTAL',
            TotalSegments=123,
            Segment=122,
        )
        key = response["LastEvaluatedKey"]
    else:
        response = client.scan(
            TableName=table_name,
            Limit=limit,
            Select='ALL_ATTRIBUTES',
            ExclusiveStartKey=key,
            ReturnConsumedCapacity='TOTAL',
            TotalSegments=123,
            Segment=122,
        )
        key = response["LastEvaluatedKey"]

    iteration += 1
    for el in response["Items"]:
        print(el)
I think there are two problems:
you seem to be scanning with a limit: try removing that
you are running a parallel scan and always scanning only the last segment:
TotalSegments=123
Segment=122
I'm not sure how big your tables are, but 123 segments is quite a lot, and I don't see you scanning any of the other segments, from 0 to 121.
Try this:
iteration = 0
response = client.scan(
    TableName=table_name,
    Select='ALL_ATTRIBUTES',
    ReturnConsumedCapacity='TOTAL'
)
while True:
    iteration += 1
    for el in response["Items"]:
        print(el)
    # LastEvaluatedKey is absent from the last page, so use .get() to avoid a KeyError
    last_key = response.get("LastEvaluatedKey")
    if not last_key:
        break
    response = client.scan(
        TableName=table_name,
        Select='ALL_ATTRIBUTES',
        ExclusiveStartKey=last_key,
        ReturnConsumedCapacity='TOTAL'
    )
I expect the above should retrieve all the items in your table. If you still want to run a parallel scan after that, you can, but you'll have to handle splitting the work into segments yourself, and for it to be efficient you'll also have to run those segments concurrently, which is more complicated than a sequential scan.
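For completeness, here is a minimal, hedged sketch of such a parallel scan using a thread pool; table_name is assumed to be defined as in the question, and the segment and worker counts are only illustrative.

from concurrent.futures import ThreadPoolExecutor

import boto3

client = boto3.client('dynamodb')
TOTAL_SEGMENTS = 4  # illustrative; tune to table size and worker count

def scan_segment(segment):
    """Scan one segment to completion and return its items."""
    items = []
    kwargs = {
        'TableName': table_name,  # table_name as in the question
        'Select': 'ALL_ATTRIBUTES',
        'TotalSegments': TOTAL_SEGMENTS,
        'Segment': segment,
    }
    while True:
        response = client.scan(**kwargs)
        items.extend(response['Items'])
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            return items
        kwargs['ExclusiveStartKey'] = last_key

# Run all segments concurrently; each worker owns one segment.
with ThreadPoolExecutor(max_workers=TOTAL_SEGMENTS) as pool:
    all_items = [item
                 for segment_items in pool.map(scan_segment, range(TOTAL_SEGMENTS))
                 for item in segment_items]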
I am currently working with Memcache and Django to cache data requested from an external API, so I don't overwhelm their servers. Currently my code looks like this:
# CACHE CURRENT PRICE
cache_key_price = str(stock.id)+'_price' # needs to be unique
cache_key_change = str(stock.id)+'_change'
cache_keychange_pct = str(stock.id)+'_changePct'
cache_time = 60 * 5 # time in seconds for cache to be valid
price_data = cache.get(cache_key_price) # returns None if no key-value pair
change_data = cache.get(cache_key_change) # returns None if no key-value pair
changePct_data = cache.get(cache_keychange_pct) # returns None if no key-value pair
if not price_data:
    delayed_price, change, changePct = get_quote(stock.ticker)
    price_data = delayed_price
    change_data = change
    changePct_data = changePct

cache.set(cache_key_price, price_data, cache_time)
cache.set(cache_key_change, change_data, cache_time)
cache.set(cache_keychange_pct, changePct_data, cache_time)
context_dict['delayed_price'] = cache.get(cache_key_price)
context_dict['change'] = cache.get(cache_key_change)
context_dict['changePct'] = cache.get(cache_keychange_pct)
I'm a bit new to caching, and I'm curious whether, after 5 minutes, the cache entry expires so that cache.get() returns None, triggering the if not price_data: branch to fetch updated data.
Thanks in advance for any help!
Here is a simplified version of your code (with just one key, not all three); you can extend this to suit your needs.
I made two changes. First, the cache.set(..) call needs to be inside the if not price_data: block, so that it only runs when the cache is empty (or expired).
Second, you should use the variable price_data to load into the context, so you don't need to call cache.get(..) a second time.
cache_key_price = str(stock.id)+'_price' # needs to be unique
cache_time = 60 * 5 # time in seconds for cache to be valid
price_data = cache.get(cache_key_price) # returns None if no key-value pair
if not price_data:
    delayed_price, change, changePct = get_quote(stock.ticker)
    price_data = delayed_price
    cache.set(cache_key_price, price_data, cache_time)
context_dict['delayed_price'] = price_data
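If you want to apply the same pattern to all three values at once, a hedged sketch using Django's cache.get_many()/cache.set_many() could look like this; the key names follow the question, and it is meant as a starting point rather than a drop-in replacement.

# Sketch: cache price, change and changePct together; all three are
# refreshed whenever any one of them is missing or expired.
keys = {
    'delayed_price': str(stock.id) + '_price',
    'change': str(stock.id) + '_change',
    'changePct': str(stock.id) + '_changePct',
}
cache_time = 60 * 5  # seconds

cached = cache.get_many(keys.values())
if len(cached) < len(keys):  # at least one value missing or expired
    delayed_price, change, changePct = get_quote(stock.ticker)
    cached = {
        keys['delayed_price']: delayed_price,
        keys['change']: change,
        keys['changePct']: changePct,
    }
    cache.set_many(cached, cache_time)

context_dict['delayed_price'] = cached[keys['delayed_price']]
context_dict['change'] = cached[keys['change']]
context_dict['changePct'] = cached[keys['changePct']]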
Initially I tried using pd.read_sql(). Then I tried SQLAlchemy and query objects, but none of these methods helped: the SQL keeps executing for a very long time and never finishes. I also tried using hints.
I guess the problem is the following: pandas creates a cursor object in the background, and with cx_Oracle we cannot influence the arraysize parameter that this cursor uses, i.e. the default value of 100 is always used, which is far too small.
CODE:
import pandas as pd
import Configuration.Settings as CS
import DataAccess.Databases as SDB
import sqlalchemy
import cx_Oracle
dfs = []
DBM = SDB.Database(CS.DB_PRM,PrintDebugMessages=False,ClientInfo="Loader")
sql = '''
WITH
l AS
(
SELECT DISTINCT /*+ materialize */
hcz.hcz_lwzv_id AS lwzv_id
FROM
pm_mbt_materialbasictypes mbt
INNER JOIN pm_mpt_materialproducttypes mpt ON mpt.mpt_mbt_id = mbt.mbt_id
INNER JOIN pm_msl_materialsublots msl ON msl.msl_mpt_id = mpt.mpt_id
INNER JOIN pm_historycompattributes hca ON hca.hca_msl_id = msl.msl_id AND hca.hca_ignoreflag = 0
INNER JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_id = hca.hca_tpm_id
inner join pm_tin_testdefinsertions tin on tin.tin_id = tpm.tpm_tin_id
INNER JOIN pm_hcz_history_comp_zones hcz ON hcz.hcz_hcp_id = hca.hca_hcp_id
WHERE
mbt.mbt_name = :input1 and tin.tin_name = 'x1' and
hca.hca_testendday < '2018-5-31' and hca.hca_testendday > '2018-05-30'
),
TPL as
(
select /*+ materialize */
*
from
(
select
ut.ut_id,
ut.ut_basic_type,
ut.ut_insertion,
ut.ut_testprogram_name,
ut.ut_revision
from
pm_updated_testprogram ut
where
ut.ut_basic_type = :input1 and ut.ut_insertion = :input2
order by
ut.ut_revision desc
) where rownum = 1
)
SELECT /*+ FIRST_ROWS */
rcl.rcl_lotidentifier AS LOT,
lwzv.lwzv_wafer_id AS WAFER,
pzd.pzd_zone_name AS ZONE,
tte.tte_tpm_id||'~'||tte.tte_testnumber||'~'||tte.tte_testname AS Test_Identifier,
case when ppd.ppd_measurement_result > 1e15 then NULL else SFROUND(ppd.ppd_measurement_result,6) END AS Test_Results
FROM
TPL
left JOIN pm_pcm_details pcm on pcm.pcm_ut_id = TPL.ut_id
left JOIN pm_tin_testdefinsertions tin ON tin.tin_name = TPL.ut_insertion
left JOIN pm_tpr_testdefprograms tpr ON tpr.tpr_name = TPL.ut_testprogram_name and tpr.tpr_revision = TPL.ut_revision
left JOIN pm_tpm_testdefprogrammodes tpm ON tpm.tpm_tpr_id = tpr.tpr_id and tpm.tpm_tin_id = tin.tin_id
left JOIN pm_tte_testdeftests tte on tte.tte_tpm_id = tpm.tpm_id and tte.tte_testnumber = pcm.pcm_testnumber
cross join l
left JOIN pm_lwzv_info lwzv ON lwzv.lwzv_id = l.lwzv_id
left JOIN pm_rcl_resultschipidlots rcl ON rcl.rcl_id = lwzv.lwzv_rcl_id
left JOIN pm_pcm_zone_def pzd ON pzd.pzd_basic_type = TPL.ut_basic_type and pzd.pzd_pcm_x = lwzv.lwzv_pcm_x and pzd.pzd_pcm_y = lwzv.lwzv_pcm_y
left JOIN pm_pcm_par_data ppd ON ppd.ppd_lwzv_id = l.lwzv_id and ppd.ppd_tte_id = tte.tte_id
'''
# method 1: using query objects
Q = DBM.getQueryObject(sql)
Q.execute({"input1": 'xxxx', "input2": 'yyyy'})
while not Q.AtEndOfResultset:
    print(Q)

# method 2: using sqlalchemy
connectstring = ("oracle+cx_oracle://username:Password@(description="
                 "(address_list=(address=(protocol=tcp)(host=tnsconnect string)"
                 "(port=pertnumber)))(connect_data=(sid=xxxx)))")
engine = sqlalchemy.create_engine(connectstring, arraysize=10000)
df_p = pd.read_sql(sql, params={"input1": 'xxxx', "input2": 'yyyy'}, con=engine)

# method 3: using pd.read_sql_query()
df_p = pd.read_sql_query(SQL_PCM, params={"input1": 'xxxx', "input2": 'yyyy'},
                         coerce_float=True, con=DBM.Connection)
It would be great if someone could help me out with this. Thanks in advance.
Here is yet another possibility for adjusting the array size, without needing to create the oraaccess.xml file suggested by Chris. This may not work with the rest of your code as is, but it should give you an idea of how to proceed if you wish to try this approach!
import cx_Oracle
import pandas
import sqlalchemy

class Connection(cx_Oracle.Connection):
    def __init__(self):
        super(Connection, self).__init__("user/pw@dsn")
    def cursor(self):
        # every cursor created from this connection fetches 5000 rows per round trip
        c = super(Connection, self).cursor()
        c.arraysize = 5000
        return c

engine = sqlalchemy.create_engine("oracle+cx_oracle://", creator=Connection)
pandas.read_sql(sql, engine)
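A simpler variation on the same idea, if you just want to verify that arraysize is the bottleneck, is to skip pandas' cursor handling entirely and fetch with a plain cx_Oracle cursor. This is only a sketch; the connect string and bind values are placeholders, and sql refers to the question's query text.

import cx_Oracle
import pandas as pd

# Placeholder connect string; reuse the question's SQL text in `sql`.
conn = cx_Oracle.connect("user/pw@dsn")
cursor = conn.cursor()
cursor.arraysize = 10000  # rows fetched per round trip; the default is 100

cursor.execute(sql, {"input1": 'xxxx', "input2": 'yyyy'})
columns = [col[0] for col in cursor.description]
df_p = pd.DataFrame(cursor.fetchall(), columns=columns)
cursor.close()
conn.close()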
Here's another alternative to experiment with.
Set a prefetch size by using the external configuration available to Oracle Call Interface programs like cx_Oracle. This overrides internal settings used by OCI programs. Create an oraaccess.xml file:
<?xml version="1.0"?>
<oraaccess xmlns="http://xmlns.oracle.com/oci/oraaccess"
           xmlns:oci="http://xmlns.oracle.com/oci/oraaccess"
           schemaLocation="http://xmlns.oracle.com/oci/oraaccess
                           http://xmlns.oracle.com/oci/oraaccess.xsd">
  <default_parameters>
    <prefetch>
      <rows>1000</rows>
    </prefetch>
  </default_parameters>
</oraaccess>
If you use tnsnames.ora or sqlnet.ora for cx_Oracle, then put the oraaccess.xml file in the same directory. Otherwise, create a new directory and set the environment variable TNS_ADMIN to that directory name.
cx_Oracle needs to be using Oracle Client 12c, or later, libraries.
Experiment with different sizes.
See OCI Client-Side Deployment Parameters Using oraaccess.xml.
I'm searching for a better way to get the CPU load in percent with WMI from multiple systems (meaning different CPUs, etc.).
My code is working, but I think there should be a better way to get the overall CPU usage in percent.
Any ideas?
Thank you in advance!
int iCPU = 0;
int calcCPU = 0;
int perCPU = 0;

SelectQuery queryCpuUsage = new SelectQuery("SELECT * FROM Win32_Processor");
ManagementObjectSearcher cpuUsage = new ManagementObjectSearcher(scope, queryCpuUsage);
ManagementObjectCollection cpuUsageCollection = cpuUsage.Get();

// Average the LoadPercentage over all processors found on the target system
foreach (ManagementObject queryObj in cpuUsageCollection)
{
    iCPU++;
    calcCPU = Convert.ToInt32(queryObj["LoadPercentage"]);
    perCPU = perCPU + calcCPU;
}
perCPU = perCPU / iCPU;
cpuUsageCollection.Dispose();

Console.WriteLine("LoadPercentage CPU: {0}", perCPU);
Personally I'd go for the Win32_PerfRawData_PerfOS_Processor class because it is much more precise. You will need to query both PercentProcessorTime and TimeStamp_Sys100NS. Here you can find the exact formula.
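For reference, PercentProcessorTime in that class is a raw counter of time spent idle (a 100ns inverted timer), so, if I recall the counter type correctly, you take two samples a short interval apart and compute roughly:

CPU load % = (1 - (PercentProcessorTime2 - PercentProcessorTime1) / (TimeStamp_Sys100NS2 - TimeStamp_Sys100NS1)) * 100

evaluated either on the "_Total" instance for the overall load or per core instance if you want a breakdown.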