How to RESTfully support the creation of a resource that is a collection of other resources while avoiding HTTP timeouts due to DB creation? - web-services

In my application I have the concept of a Draw, and that Draw has to always be contained within an Order.
A Draw has a set of attributes: background_color, font_size, ...
Quoting the famous REST thesis:
Any information that can be named can be a resource: a document or
image, a temporal service (e.g. "today's weather in Los Angeles"), a
collection of other resources, a non-virtual object (e.g. a person),
and so on.
So, my collection of other resources here would be an Order. An Order is a set of Draws (usually thousands of them). I want to let the User create an Order with several Draws, and here is my first approach:
{
  "order": {
    "background_color": "rgb(255,255,255)", "font_size": 10,
    "draws_attributes": [{
      "background_color": "rgb(0,0,0)", "font_size": 14
    }, {
      "other_attribute": "value"
    }]
  }
}
A response to this would look like this:
{
  "order": {
    "id": 30,
    "draws": [{
      "id": 4
    }, {
      "id": 5
    }]
  }
}
So the User would know which resources have been created in the DB. However, when there are many draws in the request, the response takes a while, since all those draws are inserted into the DB. Imagine doing 10,000 inserts if an Order has 10,000 draws.
I need to give the User the IDs of the draws that were just created so they can fetch them later. (By the way, they are created but not finished, because the Draw is actually built with some image manipulation libraries when the Order is processed.) I fail to see how to deal with this in a RESTful way that keeps the HTTP request from taking a long time while still giving the User some kind of IDs for the draws.
How do you deal with this kind of situation?

Accept the request wholesale, queue the processing, and return a status URL that represents the state of the request. When the request has finished processing, have the status resource present a URL that represents the results of the request. The client then polls the status URL.
POST /submitOrder
202 Accepted
Location: http://host.com/orderstatus/1234
GET /orderstatus/1234
200
{ status:"PROCESSING", msg: "Request still processing"}
...
GET /orderstatus/1234
200
{ status: "COMPLETED", msg: "Request completed", rel: "http://host.com/orderresults/3456" }
Addenda:
Well, there are a few options.
1) They can wait for the result to process and get the IDs when it's done, just like now. The difference with what I suggested is that the state of the network connection is not tied to the success or failure of the transaction.
2) You can pre-assign the order ids before hitting the database, and return those to the caller. But be aware that those resources do not exist yet (and they won't until the processing is completed).
3) Speed up your system to where the timeout is simply not an issue.
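The accept, queue and poll flow described above can be sketched in a few lines. This is a minimal in-memory sketch, not a definitive implementation: the `jobs` dict, the `work_queue`, and the function names are assumptions standing in for a real job table, message queue, and HTTP handlers. It answers the POST with 202 Accepted, the status conventionally used for work that has been accepted but not yet completed.

```python
import itertools
import queue
import threading
import uuid

# In-memory stand-ins for a job table and a work queue (assumptions for this sketch).
jobs = {}
work_queue = queue.Queue()
draw_ids = itertools.count(1)


def submit_order(order):
    """POST handler: accept the order at once, without doing the slow inserts."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "PROCESSING", "draws": []}
    work_queue.put((job_id, order))
    return 202, {"Location": f"/orderstatus/{job_id}"}


def get_status(job_id):
    """GET handler for the status URL returned by submit_order."""
    job = jobs[job_id]
    body = {"status": job["status"]}
    if job["status"] == "COMPLETED":
        body["rel"] = f"/orderresults/{job_id}"
    return 200, body


def worker():
    """Background worker: performs the inserts outside the request cycle."""
    while True:
        job_id, order = work_queue.get()
        # Stand-in for the real per-draw INSERT statements.
        jobs[job_id]["draws"] = [{"id": next(draw_ids)} for _ in order["draws_attributes"]]
        jobs[job_id]["status"] = "COMPLETED"
        work_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

status, headers = submit_order({"draws_attributes": [{"font_size": 14}, {"font_size": 10}]})
work_queue.join()  # a real client would poll the status URL instead
print(status, get_status(headers["Location"].rsplit("/", 1)[-1]))
```

The point of the design is that the slow inserts happen in `worker`, so the duration of the HTTP request no longer depends on the number of draws.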

I think your exposed granularity is too fine. Does the user need to be able to modify each Draw separately? If not, then present a document that represents an Order and that naturally contains the Draws.
Will you need to query specific Draws from the database based on specific criteria that are unrelated to the Order? If not, then represent all the Draws as a single blob that is part of a row that represents the Order.

Related

Linking groundtruth worker metadata back to the actual task?

As far as I can tell there's no identifier being passed with the GT worker metadata (see below from documentation https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-output.html)? How would I link this information back to the actual labeling task?
I believe sub is a Cognito reference to the worker, so it is not a unique identifier for the submission. As of right now, I just know that one of the tasks took a certain amount of time for a particular worker, but I can't tell which one. I also guess I have to jump through a few hoops via Cognito to get the GT worker ID from the sub?
I am looking for a way to summarize the original data shown (from the input manifest file), the label given, and the time it took to complete. As of right now, I have to make one table that has the data with their human-submitted labels, and a separate table with the time it took to complete each task, but I have no way to link the two. Am I missing something?
Here's the worker metadata JSON:
"submissionTime": "2020-12-28T18:59:58.321Z",
"acceptanceTime": "2020-12-28T18:59:15.191Z",
"timeSpentInSeconds": 40.543,
"workerId": "a12b3cdefg4h5i67",
"workerMetadata": {
"identityData": {
"identityProviderType": "Cognito",
"issuer": "https://cognito-idp.aws-region.amazonaws.com/aws-region_123456789",
"sub": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
}
}

Handle 3 Pub/Sub messages arriving at the same time, combine all 3 and store the result in Firestore

I have 3 Pub/Sub-triggered Cloud Functions which receive 3 different messages. These are published at the same time.
Main Data.
Sub Data 1.
Sub Data 2.
These messages have to be written into Firestore based on some logic.
Intended goal:
Sub data 1 and sub data 2 have to be combined, and the outcome should be inserted into the main data document. After sub data 1 and 2 are combined, the result knows the main data path (the Firestore document path) where it needs to attach itself.
Issues:
The Cloud Functions have to store their respective message data in Firestore before it gets attached to the main data. Also, the main data has to be inserted before sub data 1 and 2 are combined, so that the combined sub data can be attached to the main data as an extra.
What I have tried:
I tried an orphan/parent storage approach. It works like this:
Sub data 1 arrives -> looks for sub data 2 in the orphaned path and combines with it -> now knows the main data path -> attaches itself to the main data document; if the main data document is not available yet, the combined data is stored in the orphaned collection
or
Sub data 1 arrives -> looks for sub data 2, which is not available -> sub data 1 is stored in the orphaned collection.
When sub data 2 arrives:
Sub data 2 arrives -> looks for sub data 1 and combines with it -> now knows the main data path -> attaches itself to the main data document; if the main data document is not available yet, the combined data is stored in the orphaned collection
When the main data is inserted into Firestore, it looks for this sub data in the orphaned collection and, if it is available, attaches it to itself.
Because these messages arrive at the same time, only milliseconds apart, this logic does not work as expected.
There are two approaches worth considering here. If you know how long you need to wait, you can start functions that need to wait with a "sleep" function and just have them tread water for a bit (this is a bit hacky and could be costly at scale, but it works):
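// Resolves after the given delay; the caller must await it to actually pause
async function sleep(milliseconds) {
  return new Promise((resolve) => setTimeout(resolve, milliseconds));
}
// inside an async function:
await sleep(5000);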
Likely a better solution would be to ditch the pub/sub messages for the subsequent functions and instead switch them to http functions and have the first function call them via a simple fetch request.
From whichever function you want to call first:
// ... do some stuff, then fetch when ready
let res = await fetch(
"https://us-central1-projectName.cloudfunctions.net/secondFunction",
{
method: "post",
body: JSON.stringify(body),
headers: {
Authorization: `bearer ${token}`,
"Content-Type": "application/json",
},
}
);
You can secure these functions with a bearer token, as documented here.
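Separately from the two approaches above, the orphan/parent idea from the question can be sketched as a single merge step in which whichever handler completes the set writes the final document. This is only an in-memory sketch under stated assumptions: every message is assumed to carry a shared key (here the main data path), the `pending` and `final_documents` dicts stand in for Firestore collections, and the lock stands in for a database transaction. None of these names come from the original post.

```python
import threading

REQUIRED_PARTS = {"main", "sub1", "sub2"}

pending = {}          # stand-in for a "pending" collection, keyed by main data path
final_documents = {}  # stand-in for the final documents
lock = threading.Lock()  # stand-in for a database transaction


def handle_message(doc_path, part_name, payload):
    """Record one part; the handler that completes the set writes the final document."""
    with lock:
        parts = pending.setdefault(doc_path, {})
        parts[part_name] = payload
        if REQUIRED_PARTS.issubset(parts):
            final_documents[doc_path] = {
                **parts["main"],
                "extra": {**parts["sub1"], **parts["sub2"]},
            }
            del pending[doc_path]
            return True
        return False


# Deliver the three messages concurrently, in arbitrary order.
messages = [("sub2", {"b": 2}), ("main", {"title": "t"}), ("sub1", {"a": 1})]
threads = [
    threading.Thread(target=handle_message, args=("orders/42", name, data))
    for name, data in messages
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(final_documents)
```

Because the check and the write happen inside one critical section, arrival order does not matter: exactly one handler sees all three parts and performs the merge.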

How to transition from a Choice state in Step Functions?

The input that is being sent from previous state is in this form:
[
{
"bucketName": "test-heimdall-employee-data",
"executionId": "ca9f1e5e-4d3a-4237-8a10-8860bb9d58be_1586771571368",
"feedType": "lenel_badge",
"chunkFileKeys": "chunkFileLocation/lenel_badge/68ac7180-69a0-401a-b30c-8f809acf3a1c_1586771581154.csv",
"sanityPassFileKeys": "chunkFileLocation/lenel_badge/0098b86b-fe3c-45ca-a067-4d4a826ee2c1_1586771588882.json"
},
{
"bucketName": "test-heimdall-employee-data",
"executionId": "ca9f1e5e-4d3a-4237-8a10-8860bb9d58be_1586771571368",
"feedType": "lenel_badge",
"errorFilePath": "error/lenel_badge/2a899128-339d-4262-bb2f-a70cc60e5d4e/1586771589234_2e06e043-ad63-4217-9b53-66405ac9a0fc_1586771581493.csv",
"chunkFileKeys": "chunkFileLocation/lenel_badge/2e06e043-ad63-4217-9b53-66405ac9a0fc_1586771581493.csv",
"sanityPassFileKeys": "chunkFileLocation/lenel_badge/f6957aa7-6e22-496a-a6b8-4964da92cb73_1586771588793.json"
},
{
"bucketName": "test-heimdall-employee-data",
"executionId": "ca9f1e5e-4d3a-4237-8a10-8860bb9d58be_1586771571368",
"feedType": "lenel_badge",
"errorFilePath": "error/lenel_badge/8050eb12-c5e6-4ae9-8c4b-0ac539f5c189/1586771589293_1bb32e6c-03fc-4679-9c2f-5a4bca46c8aa_1586771581569.csv",
"chunkFileKeys": "chunkFileLocation/lenel_badge/1bb32e6c-03fc-4679-9c2f-5a4bca46c8aa_1586771581569.csv",
"sanityPassFileKeys": "chunkFileLocation/lenel_badge/48960b7c-04e0-4cce-a77a-44d8834289df_1586771588870.json"
}
]
state machine workflow design:
How do I extract the "feedType" value from the above input, transition to the next state based on it, and also pass the entire input to the next state?
Thanks
You can access the input JSON that the state machine execution was started with through the context object, for example $$.Execution.Input.todo for an input field named todo. Other than that, a state cannot directly access the output of any state before the one immediately preceding it.
As an example, let's say you have A->B->C.
Say A adds a new field, a : 1, and then B returns b : 2. When you get to C you will only have b : 2. But if B also returns a : 1, you will have {a : 1, b : 2} at C. This is typically how you pass state along from a step a couple of steps earlier.
There are other things people do, such as storing data in an S3 bucket and reading it in later stages. You can also query the Step Functions execution, but that can be messy.
Other hacks include adding a Pass step in a Parallel block, but these hacks are not good; the correct way is to pass the data along between your steps, or ideally to have what you need in your execution input.
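The A->B->C example can be mimicked locally to make the data flow concrete. This is a plain-Python analogy, not Step Functions code: the function names are hypothetical, and `run` only models the rule that each state's output becomes the next state's entire input.

```python
def state_a(data):
    return {"a": 1}


def state_b_drops(data):
    # Returns only its own result, so "a" is lost
    return {"b": 2}


def state_b_forwards(data):
    # Copies its input forward and adds its own result
    return {**data, "b": 2}


def run(states, execution_input):
    data = execution_input
    for state in states:
        data = state(data)  # each state's output is the next state's whole input
    return data


print(run([state_a, state_b_drops], {}))     # {'b': 2}
print(run([state_a, state_b_forwards], {}))  # {'a': 1, 'b': 2}
```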
Looking at your previous state's input, feedType appears to be the same in every element. Assuming the whole array is stored under a key named "input", so that the input is a dictionary like {"input":[{...},{...}]}, you can access the value with $.input[0].feedType. If the array itself is the top-level input, the equivalent path should be $[0].feedType.
A Choice state by default passes its entire input on to the next state. So whichever state it transitions to receives the same input that was passed to the Choice state.
To understand this better, or as a proof of concept, consider a Step Function in which the Hello state is a Choice state and the other two states are simple Pass states.
If you compare the input and output of the Choice state, they are the same.
Hope it helps.
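For illustration, a Choice state that branches on feedType could look roughly like the definition built below. This is a sketch under assumptions: the state names are hypothetical, the array from the question is assumed to be the top-level input (hence the path $[0].feedType), and `next_state` is only a tiny local stand-in for how the service evaluates this one rule shape.

```python
import json

# Hypothetical state names; only the Choice/Pass structure is the point.
definition = {
    "StartAt": "CheckFeedType",
    "States": {
        "CheckFeedType": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$[0].feedType",
                    "StringEquals": "lenel_badge",
                    "Next": "ProcessBadgeFeed",
                }
            ],
            "Default": "ProcessOtherFeed",
        },
        "ProcessBadgeFeed": {"Type": "Pass", "End": True},
        "ProcessOtherFeed": {"Type": "Pass", "End": True},
    },
}


def next_state(choice_state, state_input):
    """Local stand-in for Choice evaluation; handles only the rule shape above."""
    for rule in choice_state["Choices"]:
        if state_input[0]["feedType"] == rule["StringEquals"]:
            return rule["Next"], state_input  # the input is forwarded untouched
    return choice_state["Default"], state_input


state_input = [{"feedType": "lenel_badge", "bucketName": "test-heimdall-employee-data"}]
target, forwarded = next_state(definition["States"]["CheckFeedType"], state_input)
print(target)
print(json.dumps(definition, indent=2))
```

The JSON printed at the end is the form that would be supplied as the state machine definition.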

How to make ArrayController ignore DS.Store's transient record

I have a list of clients displayed through a ClientsController, its content is set to the Client.find() i.e. a RecordArray. User creates a new client through a ClientController whose content is set to Client.createRecord() in the route handler.
All works fine, however, while the user fills up the client's creation form, the clients list gets updated with the new client record, the one created in the route handler.
What's the best way to make the RecordArray/Store aware of the new record only once the record has been saved?
UPDATE:
I ended up filtering the list based on the object status
{{#unless item.isNew}} Display the list {{/unless}}
UPDATE - 2
Here's an alternative way using filter, however the store has to be loaded first through the find method, App.Client.find().filter() doesn't seem to behave the way the two methods behave when called separately.
// Load the store first
App.Client.find();
var clients = App.Client.filter(function(client){
console.info(client.get('name') + ' ' + client.get('isNew'));
return !client.get('isNew');
});
controller.set('content',clients);
There are a few ways to go about this:
First, it's very messy for a route/state that deals with a list of clients to have to go out of its way to filter out junk left over from another unrelated state (i.e. the newClient state). I think it'd be way better for you to delete the junk record before leaving the newClient state, a la
if(client.get("isNew")) {
client.deleteRecord();
}
This will make sure it doesn't creep into the clientIndex route, or any other client list route that shouldn't have to put in extra work to filter out junk records. This code would ideally sit in the exit function of your newClient route so it can delete the record before the router transitions to another state that'll call Client.find().
But there's an even better, idiomatic solution: https://gist.github.com/4512271
(not sure which version of the router you're using but this is applicable to both)
The solution is to use transactions: instead of calling createRecord() directly on Client, call createRecord() on the transaction, so that the new client record is associated with that transaction, and then all you need to do is call transaction.rollback() in exit -- you don't even need to call isNew on anything, if the client record was saved, it obviously won't be rolled back.
This is also a useful pattern for editing records: create a transaction when entering the state and add the record to it, e.g.
enter: function(router, client) {
this.tx = router.get("store").transaction();
this.tx.add(client);
},
then the same sort of thing on the exit state:
exit: function(router, client) {
this.tx.rollback();
},
This way, if the user completes the form and submits to the server, rollback will correctly and conveniently do nothing. And if the user edits some of the form fields but then backs out halfway through, your exit callback will revert the unsaved changes, so that you don't end up with some dirty zombie client popping up in your clientIndex routes and displaying its unsaved changes.
Not 100% sure, could you try to set the content of ClientsController with
Client.filter(function(client){
return !client.get('isNew');
});
EDIT: In order to make this work, you have to first load the store with Client.find().

How should I do post persist/update actions in doctrine 2.1, that involves re-saving to the db?

Using Doctrine 2.1 (and Zend Framework 1.11, not that it matters here), how can I do post-persist and post-update actions that involve re-saving to the db?
For example, creating a unique token based on the just-generated primary key id, or generating a thumbnail for an uploaded image (which actually doesn't require re-saving to the db, but still)?
EDIT - let's explain, shall we ?
The above is actually a question regarding two scenarios. Both scenarios relate to the following state:
Let's say I have a User entity. When the object is flushed after it has been marked to be persisted, it'll have the normal auto-generated id of mysql - meaning running numbers normally beginning at 1, 2, 3, etc..
Each user can upload an image - which he will be able to use in the application - which will have a record in the db as well. So I have another entity called Image. Each Image entity also has an auto-generated id - same methodology as the user id.
Now, here are the scenarios:
When a user uploads an image, I want to generate a thumbnail for that image right after it is saved to the db. This should happen for every new or updated image.
Since we're trying to stay smart, I don't want the code to generate the thumbnail to be written like this:
$image = new Image();
...
$entityManager->persist($image);
$entityManager->flush();
callToFunctionThatGeneratesThumbnailOnImage($image);
but rather I want it to occur automatically on the persisting of the object (well, flush of the persisted object), like the prePersist or preUpdate methods.
Since the user uploaded an image, he gets a link to it. It will probably look something like: http://www.mysite.com/showImage?id=[IMAGEID].
This allows anyone to just change the image id in this link and see other users' images.
So in order to prevent such a thing, I want to generate a unique token for every image. Since it doesn't really need to be sophisticated, I thought about using the md5 value of the image id, with some salt.
But for that, I need to have the id of that image, which I'll only have after flushing the persisted object. Then I can generate the md5 and save it to the db again.
Understand that the links for the images are supposed to be publicly accessible so I can't just allow an authenticated user to view them by some kind of permission rules.
You probably know already about Doctrine events. What you could do:
Use the postPersist event handler. That one occurs after the DB insert, so the auto generated ids are available.
The EventManager class can help you with this:
use Doctrine\ORM\Event\LifecycleEventArgs;
use Doctrine\ORM\Events;
class MyEventListener
{
    public function postPersist(LifecycleEventArgs $eventArgs)
    {
        // in a listener you have the entity instance and the
        // EntityManager available via the event arguments
        $entity = $eventArgs->getEntity();
        $em = $eventArgs->getEntityManager();
        if ($entity instanceof User) {
            // do some stuff
        }
    }
}
// register the listener for the postPersist event
$eventManager = $em->getEventManager();
$eventManager->addEventListener(Events::postPersist, new MyEventListener());
Be sure to check e. g. if the User already has an Image, otherwise if you call flush in the event listener, you might be caught in an endless loop.
Of course you could also make your User class aware of that image creation operation with an inline postPersist event handler, add @HasLifecycleCallbacks to your mapping, and then always flush at the end of the request, e. g. in a shutdown function, but in my opinion this kind of stuff belongs in a separate listener. YMMV.
If you need the entity id before flushing, just after creating the object, another approach is to generate the ids for the entities within your application, e. g. using uuids.
Now you can do something like:
class Entity {
public function __construct()
{
$this->id = uuid_create();
}
}
Now you have an id already set when you just do:
$e = new Entity();
And you only need to call EntityManager::flush at the end of the request
In the end, I listened to @Arms who commented on the question.
I started using a service layer for doing such things.
So now, I have a method in the service layer which creates the Image entity. After it calls the persist and flush, it calls the method that generates the thumbnail.
The Service Layer pattern is a good solution for such things.
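The service-layer flow described above (save, obtain the generated id, then derive the token and thumbnail and save again) can be sketched language-neutrally as follows. This is a hedged sketch, not Doctrine code: `InMemoryImageStore` stands in for persist() plus flush(), `SALT`, `make_token` and `generate_thumbnail` are hypothetical names, and md5 is used only because the question proposes it.

```python
import hashlib
import itertools

SALT = "change-me"  # hypothetical secret value; keep the real one out of source control


class InMemoryImageStore:
    """Stand-in for persist() + flush(): assigns an auto-increment id on first save."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.rows = {}

    def save(self, image):
        if image.get("id") is None:
            image["id"] = next(self._ids)
        self.rows[image["id"]] = dict(image)
        return image


def make_token(image_id):
    # md5 of salt + id, as proposed in the question
    return hashlib.md5(f"{SALT}:{image_id}".encode()).hexdigest()


def generate_thumbnail(image):
    # Placeholder for the real image processing; returns a file name only
    return f"thumb_{image['id']}.jpg"


class ImageService:
    def __init__(self, store):
        self.store = store

    def create_image(self, path):
        image = self.store.save({"id": None, "path": path})  # first save yields the id
        image["token"] = make_token(image["id"])
        image["thumbnail"] = generate_thumbnail(image)
        return self.store.save(image)  # second save stores the derived fields


service = ImageService(InMemoryImageStore())
image = service.create_image("uploads/cat.png")
print(image["id"], image["thumbnail"], image["token"])
```

Keeping both saves inside `create_image` means callers never have to remember the follow-up step, which is the benefit the service layer brings over calling the thumbnail function by hand after every flush.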