Import taxonomy terms based on URL alias from CSV (around 6,000 records) - Drupal 8

// Resolve the CSV's URL alias to its internal path, then to route parameters
// (e.g. ['node' => 123]) to obtain the node ID.
$alias = \Drupal::service('path.alias_manager')->getPathByAlias($url);
$params = Url::fromUri("internal:" . $alias)->getRouteParameters();
I get the node id from the above code.
// Look up the taxonomy term by name; loadByProperties() returns an array of
// term entities keyed by term ID.
$terms = \Drupal::entityTypeManager()
  ->getStorage('taxonomy_term')
  ->loadByProperties(['name' => $term]);
I get the tag id from the above code.
function updateTaxonomy($url_tag_array) {
  foreach ($url_tag_array as $node_id => $tag) {
    $node = \Drupal\node\Entity\Node::load($node_id);
    // Skip entries whose node could not be loaded.
    if ($node !== NULL) {
      $node->field_ga_tag->target_id = $tag;
      $node->save();
    }
  }
}
Finally I update the taxonomy with the above code.
It all works, but it takes a huge amount of time and exceeds the maximum execution time (more than 15 minutes).
Is there a way to run fewer update queries? Is it possible to use raw batch queries?

You can try running this with cron. Also, validate the contents so that they are not duplicated.
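If cron alone doesn't get you under the time limit, one option is Drupal 8's Batch API, which splits the updates into small chunks across multiple requests. The sketch below is only an outline: the module name mymodule, the callback names, and the chunk size of 50 are made-up, while Node::load() and field_ga_tag come from the question.

<?php

use Drupal\node\Entity\Node;

/**
 * Splits the URL/tag map into chunks and queues one batch operation per chunk.
 */
function mymodule_update_taxonomy_batch(array $url_tag_array) {
  $batch = [
    'title' => t('Updating taxonomy references'),
    'operations' => [],
    'finished' => 'mymodule_update_taxonomy_finished',
  ];
  foreach (array_chunk($url_tag_array, 50, TRUE) as $chunk) {
    $batch['operations'][] = ['mymodule_update_taxonomy_process', [$chunk]];
  }
  batch_set($batch);
}

/**
 * Batch operation callback: updates up to 50 nodes per request.
 */
function mymodule_update_taxonomy_process(array $chunk, array &$context) {
  foreach ($chunk as $node_id => $tag) {
    $node = Node::load($node_id);
    if ($node !== NULL) {
      $node->field_ga_tag->target_id = $tag;
      $node->save();
    }
  }
  $context['results'][] = count($chunk);
}

/**
 * Batch finished callback.
 */
function mymodule_update_taxonomy_finished($success, array $results, array $operations) {
  \Drupal::messenger()->addStatus(t('@count chunks processed.', ['@count' => count($results)]));
}

Each operation saves at most 50 nodes per request, so no single request comes close to the PHP execution limit.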

How to update local data after mutation?

I want to find a better way to update local component state after executing a mutation. I'm using svelte-apollo, but my question is about basic principles. I have a watchQuery that gets a list of items and returns an ObservableQuery in the component.
query GetItems($sort: String, $search: String!) {
  items(sort: $sort, where: { name_contains: $search }) {
    id
    name
    item_picture {
      pictures {
        url
        previewUrl
      }
    }
    description
    created_at
  }
}
In the component I call it:
<script>
  $: query = GetItems({
    variables: {
      sort: 'created_at:DESC',
      search
    }
  });
</script>
...
{#each $query.data?.items || [] as item, key (item.id)}
  <div>
    <Item
      deleteItem={dropItem}
      item={item}
      setActiveItem={setActiveItem}
    />
  </div>
{/each}
...
And I have an addItem mutation.
mutation addItem($name: String!, $description: String) {
  createItem(input: { data: { name: $name, description: $description } }) {
    item {
      name
      description
    }
  }
}
I simply want to update the local state and add the new item to the observable query result after the addItem mutation, without using refetchQueries (because I don't want to fetch the whole list over the network when I've only added one item).
I can see the item in the cache, but my view is not updated.
P.S. If you have run into similar problems and found ways to solve them, I'd be glad to see some examples from you.
I believe in this case, you could use the cache.modify function to modify the cache directly if you’re looking to skip the network request from refetchQueries. Would that work for your use case? https://www.apollographql.com/docs/react/data/mutations/#making-all-other-cache-updates
If you don’t mind the network request, I personally like using cache.evict to evict the data in the cache that I know changed. I prefer that to refetchQueries in most cases because it refetches all queries that used that piece of data, not just the queries I specify.
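As a minimal sketch of the cache.modify approach (assuming Apollo Client 3; client and ADD_ITEM are placeholders for your ApolloClient instance and the addItem mutation document, and "Item" is a guess at the GraphQL type name):

import { gql } from '@apollo/client/core';

client.mutate({
  mutation: ADD_ITEM,
  variables: { name, description },
  update(cache, { data: { createItem } }) {
    // Write the new item into the cache and keep a reference to it.
    const newItemRef = cache.writeFragment({
      data: createItem.item,
      fragment: gql`
        fragment NewItem on Item {
          id
          name
          description
        }
      `,
    });
    // Prepend the new reference to every cached "items" list field,
    // whatever sort/search arguments it was stored under.
    cache.modify({
      fields: {
        items(existingRefs = []) {
          return [newItemRef, ...existingRefs];
        },
      },
    });
  },
});

With svelte-apollo the same update option can be passed through its mutation helper. Note that the mutation result would need to include id (the mutation in the question only returns name and description), otherwise the cache cannot identify the new item.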

Strange behavior trying to perform a data migration in DynamoDB

We're trying to make a simple data migration in one of our tables in DDB.
Basically we're adding a new field and we need to backfill all the Documents in one of our tables.
This table has around 700K documents.
The process we follow is quite simple:
Manually trigger a Lambda that scans the table and, for each document, updates it, continuing until it gets close to the 15-minute limit; at that point it
puts the LastEvaluatedKey into SQS to trigger a new Lambda execution that uses that key to continue scanning.
The process goes on spawning Lambdas sequentially as needed until there are no more documents.
The problem we found is as follows...
Once the migration is done, we notice that the number of documents updated is way lower than the total number of documents in the table. The shortfall is random, not always the same, and ranges from tens of thousands to hundreds of thousands (the worst case we've seen was a 300K difference).
This is obviously a problem, because if we scan the documents again it is clear that some documents were not migrated. At first we thought this was caused by clients updating/inserting documents, but the throughput on that table is nowhere near large enough to justify such a big difference, so it is not that new documents are being added while we run the migration.
We tried a second approach: if we only scan (without updating), the number of scanned documents equals the count of documents in the table, so we dumped the IDs of the documents into another table, then scanned that table and updated the original items from it. Funny thing: the same problem happens with this new table of IDs, there are far fewer entries than the count in the table we want to update, so we're back to square one.
We thought about using parallel scans, but I don't see how that would help, and I don't want to compromise the table's read capacity while running the migration.
Anybody with experience in data migrations in DDB can shed some light here? We're not able to figure out what we're doing wrong.
UPDATE: Sharing the function that is triggered and actually scans and updates
@Override
public Map<String, AttributeValue> migrateDocuments(String lastEvaluatedKey, String typeKey) {
    LOG.info("Migrate Documents started {} ", lastEvaluatedKey);
    int noOfDocumentsMigrated = 0;
    Map<String, AttributeValue> docLastEvaluatedKey = null;
    DynamoDBMapperConfig documentConfig = new DynamoDBMapperConfig.TableNameOverride("KnowledgeDocumentMigration").config();
    // Rebuild the exclusive start key when resuming from a previous lambda run.
    if (lastEvaluatedKey != null) {
        docLastEvaluatedKey = new HashMap<String, AttributeValue>();
        docLastEvaluatedKey.put("base_id", new AttributeValue().withS(lastEvaluatedKey));
        docLastEvaluatedKey.put("type_key", new AttributeValue().withS(typeKey));
    }
    // Stop roughly one minute before the 15-minute lambda limit.
    Instant endTime = Instant.now().plusSeconds(840);
    LOG.info("Migrate Documents endTime:{}", endTime);
    try {
        do {
            ScanResultPage<Document> docScanList = documentDao.scanDocuments(docLastEvaluatedKey, documentConfig);
            docLastEvaluatedKey = docScanList.getLastEvaluatedKey();
            LOG.info("Migrate Docs- docScanList Size: {}", docScanList.getScannedCount());
            LOG.info("lastEvaluatedKey:{}", docLastEvaluatedKey);
            // Split the page of results into chunks of 25 (the transaction write limit).
            final int chunkSize = 25;
            final AtomicInteger counter = new AtomicInteger();
            final Collection<List<Document>> docChunkList = docScanList.getResults().stream()
                    .collect(Collectors.groupingBy(it -> counter.getAndIncrement() / chunkSize)).values();
            List<List<Document>> docListSplit = docChunkList.stream().collect(Collectors.toList());
            docListSplit.forEach(docList -> {
                TransactionWriteRequest documentTx = new TransactionWriteRequest();
                for (Document document : docList) {
                    LOG.info("Migrate Documents- docList Size: {}", docList.size());
                    LOG.info("Migrate Documents- Doc Id: {}", document.getId());
                    if (!StringUtils.isNullOrEmpty(document.getType()) && document.getType().equalsIgnoreCase("Faq")) {
                        if (docIdsList.contains(document.getId())) {
                            LOG.info("this doc already migrated:{}", document);
                        } else {
                            docIdsList.add(document.getId());
                        }
                        if ((!StringUtils.isNullOrEmpty(document.getFaq().getQuestion()))) {
                            LOG.info("doc FAQ {}", document.getFaq().getQuestion());
                            document.setTitle(document.getFaq().getQuestion());
                            document.setTitleSearch(document.getFaq().getQuestion().toLowerCase());
                            documentTx.addUpdate(document);
                        }
                    } else if (StringUtils.isNullOrEmpty(document.getType())) {
                        if (!StringUtils.isNullOrEmpty(document.getTitle())) {
                            if (!StringUtils.isNullOrEmpty(document.getQuestion())) {
                                document.setTitle(document.getQuestion());
                                document.setQuestion(null);
                            }
                            LOG.info("title {}", document.getTitle());
                            document.setTitleSearch(document.getTitle().toLowerCase());
                            documentTx.addUpdate(document);
                        }
                    }
                }
                if (documentTx.getTransactionWriteOperations() != null
                        && !documentTx.getTransactionWriteOperations().isEmpty() && docList.size() > 0) {
                    LOG.info("DocumentTx size {}", documentTx.getTransactionWriteOperations().size());
                    documentDao.executeTransaction(documentTx, null);
                }
            });
            noOfDocumentsMigrated = noOfDocumentsMigrated + docScanList.getScannedCount();
        } while (docLastEvaluatedKey != null && (endTime.compareTo(Instant.now()) > 0));
        LOG.info("Migrate Documents execution finished at:{}", Instant.now());
        // Hand the remaining work over to the next lambda via SQS.
        if (docLastEvaluatedKey != null && docLastEvaluatedKey.get("base_id") != null)
            sqsAdapter.get().sendMessage(docLastEvaluatedKey.get("base_id").toString(), docLastEvaluatedKey.get("type_key").toString(),
                    MIGRATE, MIGRATE_DOCUMENT_QUEUE_NAME);
        LOG.info("No Of Documents Migrated:{}", noOfDocumentsMigrated);
    } catch (Exception e) {
        LOG.error("Exception", e);
    }
    return docLastEvaluatedKey;
}
Note: I would've added this speculation as a comment, but my reputation does not allow it.
I think the issue you're seeing here could be caused by the Scans not being ordered. As long as your Scan is executed in a single lambda, I'd expect you to see that everything is handled fine. However, as soon as you hit the runtime limit of the lambda and start a new one, your Scan essentially gets a new "ScanID", which might come in a different order. Based on the different order, you're now skipping a certain set of entries.
I haven't tried to replicate this behavior, and sadly there is no clear indication in the AWS documentation whether a Scan request can be resumed in a new session/application.
I think @Charles' suggestion might help you in this case, as you can simply run the entire migration in one process.
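As a rough illustration of the single-process idea, the sketch below pages through the whole table with ExclusiveStartKey in one loop, using the AWS SDK for Java v1 low-level client rather than the DynamoDBMapper from the question; the table name is taken from the question's override and the actual backfill logic is left as a placeholder.

import java.util.Map;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.ScanRequest;
import com.amazonaws.services.dynamodbv2.model.ScanResult;

public class SingleProcessMigration {
    public static void main(String[] args) {
        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.defaultClient();
        Map<String, AttributeValue> exclusiveStartKey = null;
        int scanned = 0;
        do {
            // Resume the scan exactly where the previous page ended.
            ScanRequest request = new ScanRequest()
                    .withTableName("KnowledgeDocumentMigration")
                    .withExclusiveStartKey(exclusiveStartKey);
            ScanResult result = client.scan(request);
            scanned += result.getScannedCount();
            for (Map<String, AttributeValue> item : result.getItems()) {
                // ... update/backfill the document here (e.g. via UpdateItem) ...
            }
            exclusiveStartKey = result.getLastEvaluatedKey();
        } while (exclusiveStartKey != null && !exclusiveStartKey.isEmpty());
        System.out.println("Scanned " + scanned + " documents");
    }
}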

SuiteScript 2.0 UserEvent Script to Call Map Reduce

Good afternoon.
I am trying to get a User Event script to call or use a Map/Reduce script. I am really new to the concept of a Map/Reduce script and am not having much luck finding resources. Essentially, what I want to do is call a Map/Reduce script that finds open transactions with the same Item Name and sets the Class on those transactions to the new Class set by the user. The Map/Reduce script would need the Item Name and Class from the current record.
Here is my User Event:
/**
 * @NApiVersion 2.0
 * @NScriptType UserEventScript
 */
define(['N/record', 'N/log'],
    function (record, log) {
        function setFieldInRecord(scriptContext) {
            log.debug({
                'title': 'TESTING',
                'details': 'WE ARE IN THE FUNCTION!'
            });
            if (scriptContext.type === scriptContext.UserEventType.EDIT) {
                var old_Record = scriptContext.oldRecord;
                var cur_Record = scriptContext.newRecord;
                var oldClassId = old_Record.getValue({ fieldId: 'class' });
                var curClassId = cur_Record.getValue({ fieldId: 'class' });
                if (oldClassId != curClassId) {
                    // CALL MAP REDUCE HERE
                }
            }
        }
        return {
            beforeSubmit: setFieldInRecord
        };
    }
);
Is the Map Reduce Script a separate file or is it embedded in the User Event script? I think I can get the Map Reduce to work if I know how to call it from the User Event. I appreciate any input with this question. Thank you!
Here is how we handled this situation.
We made sure to add 'N/task' to the define statement in the above code for the User Event. Then, in the User Event, when the conditions to call the Map/Reduce script were met, we did this:
var scriptTask = task.create({
    taskType: task.TaskType.MAP_REDUCE
});
scriptTask.scriptId = 'customscript_id';
scriptTask.deploymentId = 'customdeploy_id';
var scriptTaskId = scriptTask.submit();
This successfully called the Map Reduce Script from the User Event.
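On the separate-file part of the question: the Map/Reduce is its own script file with its own script record and deployment, which is what the scriptId and deploymentId above refer to. If the Map/Reduce also needs the Item Name and Class from the current record, one option (a sketch, not something we tested here) is to pass them as script parameters via the task's params property; the parameter IDs custscript_item_name and custscript_class_id below are made-up and would have to be defined on the Map/Reduce script record, and 'itemid' is assumed to be the field holding the item name.

var scriptTask = task.create({
    taskType: task.TaskType.MAP_REDUCE,
    scriptId: 'customscript_id',
    deploymentId: 'customdeploy_id',
    params: {
        // Hypothetical script parameters defined on the Map/Reduce script record.
        custscript_item_name: cur_Record.getValue({ fieldId: 'itemid' }),
        custscript_class_id: curClassId
    }
});
var scriptTaskId = scriptTask.submit();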
I hope this helps out someone in the future.
Thank you.

Meteor regex find() far slower than in MongoDB console

I've been researching A LOT for the past 2 weeks and can't pinpoint the exact reason my Meteor app returns results so slowly.
Currently I have only a single collection in my Mongo database, with around 200,000 documents. To search it I am using Meteor subscriptions based on a given keyword. Here is my query:
db.collection.find({$or:[
{title:{$regex:".*java.*", $options:"i"}},
{company:{$regex:".*java.*", $options:"i"}}
]})
When I run above query in mongo shell, the results are returned instantly. But when I use it in Meteor client, the results take almost 40 seconds to return from server. Here is my meteor client code:
Template.testing.onCreated(function () {
  var instance = this;
  // initialize the reactive variables
  instance.loaded = new ReactiveVar(0);
  instance.limit = new ReactiveVar(20);
  instance.autorun(function () {
    // get the limit
    var limit = instance.limit.get();
    var keyword = Router.current().params.query.k;
    var searchByLocation = Router.current().params.query.l;
    var startDate = Session.get("startDate");
    var endDate = Session.get("endDate");
    // subscribe to the posts publication
    var subscription = instance.subscribe('sub_testing', limit, keyword, searchByLocation, startDate, endDate);
    // if subscription is ready, set limit to newLimit
    $('#searchbutton').val('Searching');
    if (subscription.ready()) {
      $('#searchbutton').val('Search');
      instance.loaded.set(limit);
    } else {
      console.log("> Subscription is not ready yet. \n\n");
    }
  });
  instance.testing = function () {
    return Collection.find({}, { sort: { id: -1 }, limit: instance.loaded.get() });
  };
});
And here is my meteor server code:
Meteor.publish('sub_testing', function (limit, keyword, searchByLocation, startDate, endDate) {
  Meteor._sleepForMs(200);
  var pat = ".*" + keyword + ".*";
  var pat2 = ".*" + searchByLocation + ".*";
  return Jobstesting.find({
    $or: [
      { title: { $regex: pat, $options: "i" } },
      { company: { $regex: pat, $options: "i" } },
      { description: { $regex: pat, $options: "i" } },
      { location: { $regex: pat2, $options: "i" } },
      { country: { $regex: pat2, $options: "i" } }
    ],
    $and: [{ date_posted: { $gte: endDate, $lt: startDate } }]
  }, { sort: { date_posted: -1 }, limit: limit, skip: limit });
});
One point I'd also like to mention: I use "Load More" pagination, and by default the limit parameter is 20 records. On each "Load More" click I increment the limit by 20, so on the first click it is 20, on the second 40, and so on.
Any help where I'm going wrong would be appreciated.
But when I use it in Meteor client, the results take almost 40 seconds to return from server.
You may be misunderstanding how Meteor is accessing your data.
Queries run on the client are processed on the client.
Meteor.publish - Makes data available on the server
Meteor.subscribe - Downloads that data from the server to the client.
Collection.find - Looks through the data on the client.
If you think the Meteor side is slow, you should time it server side (print time before/after) and file a bug.
If you're implementing a pager, you might try a Meteor method instead (see the sketch below), or a pager package.
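A minimal sketch of the Meteor-method route (the method name searchJobs is made-up; the collection, fields, and regex pattern mirror the question's publication):

// Server: a plain method that runs the regex search and returns the matching
// documents once, instead of publishing a reactive cursor.
Meteor.methods({
  searchJobs: function (limit, keyword) {
    var pat = ".*" + keyword + ".*";
    return Jobstesting.find(
      { $or: [
          { title: { $regex: pat, $options: "i" } },
          { company: { $regex: pat, $options: "i" } }
        ] },
      { sort: { date_posted: -1 }, limit: limit }
    ).fetch();
  }
});

// Client: call the method and stash the results in a ReactiveVar
// (instance.results is assumed to be created in onCreated).
Meteor.call('searchJobs', 20, 'java', function (error, results) {
  if (!error) {
    instance.results.set(results);
  }
});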

Query Facebook Opengraph next page parameters

I am unable to implement pagination with Facebook OpenGraph. I have exhausted every option I have found.
My hope is to query for 500 listens repeatedly until there are none left. However, I only receive a response from my first query. Below is my current code; I have also tried setting the parameters to different amounts rather than having the fields from [paging][next] dictate them.
$q_param['limit'] = 500;
$music_data = array(); // initialize before merging results
$next_exists = true;
while ($next_exists) {
    $music = $facebook->api('/me/music.listens', 'GET', $q_param);
    $music_data = array_merge($music_data, $music['data']);
    if ($music["paging"]["next"] == null || $music["paging"]["next"] == "") {
        $next_exists = false;
    } else {
        $url = $music["paging"]["next"];
        parse_str(parse_url($url, PHP_URL_QUERY), $array);
        foreach ($array as $key => $value) {
            $q_param[$key] = $value;
        }
    }
}
a - Can you please share what you get after the first call?
b - Also, if possible, can you share the whole file?
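For example, one quick way to check (a) yourself (a rough sketch reusing the question's $facebook->api() call) is to make a single request and log its paging block, so you can see whether a "next" URL is actually being returned:

// Debugging sketch (assumes the same $facebook SDK object as above):
// fetch one page and log its "paging" block to see if a "next" URL exists.
$music = $facebook->api('/me/music.listens', 'GET', array('limit' => 500));
error_log(print_r(isset($music['paging']) ? $music['paging'] : 'no paging key in response', true));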
I think your script is timing out. Try adding the following at the top of your file:
set_time_limit(0);
Can you check apache log files?
sudo tail -f /var/log/apache2/error.log