Make an API request every minute and then process the results - c++

I have a task to access an API every minute and get data. I call the API using the cpr library (which is based on libcurl)
cpr::Get(cpr::Url{ URL })
And store the data in a vector:
std::vector<std::string> myData;
How can I make these requests every minute (or whatever time interval I set with chrono)?
Note that I would like to work with this data later, for example to exclude duplicates.
In JavaScript I would write something like setInterval.
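For reference, one simple way to get setInterval-like behaviour in C++ is a loop that blocks between requests with std::this_thread::sleep_for. The sketch below is only a sketch: the URL, the fixed iteration count, and the sort/unique deduplication at the end are placeholders for the real processing.
// Poll the API once per interval, collect the responses, deduplicate later.
// A real program would add error handling and a way to stop the loop.
#include <algorithm>
#include <chrono>
#include <iostream>
#include <string>
#include <thread>
#include <vector>
#include <cpr/cpr.h>

int main() {
    const std::string url = "https://example.com/api";   // placeholder URL
    const auto interval = std::chrono::minutes(1);        // any chrono duration works here

    std::vector<std::string> myData;
    for (int i = 0; i < 5; ++i) {                         // or while (running) in a real app
        cpr::Response r = cpr::Get(cpr::Url{url});
        if (r.status_code == 200) {
            myData.push_back(r.text);                     // store the raw response body
        }
        std::this_thread::sleep_for(interval);            // wait before the next request
    }

    // Work with the data later, e.g. drop duplicates:
    std::sort(myData.begin(), myData.end());
    myData.erase(std::unique(myData.begin(), myData.end()), myData.end());
    std::cout << "unique responses: " << myData.size() << "\n";
}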

Related

How do I send a request multiple times while only changing the parameters in POSTMAN?

I'm very new to Postman so please bear with me. Basically, I am trying to get data from the clinicaltrials.gov API, which can only give me 1000 studies at a time. Since the data I need covers about 25000 studies, I'm querying it based on dates. So, is there any way in Postman to send multiple GET requests at once where I only change one parameter?
Here is my URL: ClinicalTrials.gov/api/query/study_fields??expr=AREA[LocationCountry]United States AND AREA[StudyFirstPostDate]RANGE[MIN,01/01/2017] AND AREA[OverallStatus]Recruiting
I will only be changing the RANGE field in each request, but I do not want to change it manually every time. So, is there any other way in which I could maybe add a list of dates and have Postman go through them all?
There are several ways to do this.
So, is there any way in Postman that I can GET multiple requests at one time wherein I am only changing one parameter?
I'm going to assume you don't mind whether the requests run sequentially or in parallel; the latter is less trivial and doesn't seem to add much value for you. So I'll focus on the following problem statement.
We want to retrieve multiple pages of a resource, where the cursor is StudyFirstPostDate. On each page retrieved, the cursor should advance to the latest date from the previous page. The following is just one way to code this, but the building blocks are:
You have a collection with a single request, the GET described above.
We will have a pre-request script that reads a collection variable holding the next StudyFirstPostDate.
We will have a test script (post-request) that resets StudyFirstPostDate to the next value for the pagination.
In the test script you save the data the same way you're doing now.
You set the next request (postman.setNextRequest("NAMEOFREQUEST")) to the same GET request we're dealing with, to effectively create a loop. When you've retrieved all pages you kill the loop with postman.setNextRequest(null), although not calling the function at all should also stop it. Execution then goes back to step (2) and loops.
This flow will only work in a collection run. Even if you code all of this, triggering the request by itself will not start a loop; setNextRequest only has an effect within a collection run.
Setting an initial value for the variable in the pre-request script
// Read the cursor from the collection variables (globals or environment variables work too).
// If it hasn't been set yet, seed it with the initial date for the first request.
let startDate = pm.collectionVariables.get("startDate");
if (!startDate) {
    pm.collectionVariables.set("startDate", "01/01/2017");
}
Resetting the value in the Tests script
// Loop through the results, save the data, and pull the next start date for the request.
// Once you have it, store it as the new cursor (variableWithDate is whatever you extracted):
pm.collectionVariables.set("startDate", variableWithDate);
// If you've reached the end you stop; if not, you call the same request again to loop.
// nextPage is an example of a boolean that you've computed above.
if (nextPage) {
    postman.setNextRequest("NAMEOFREQUEST");
} else {
    postman.setNextRequest(null);
}

Firestore in Datastore mode does not seem to be strongly consistent

I am using Cloud Endpoints with Objectify and Firestore in Datastore mode. Although the documentation says that all queries are strongly consistent, I have found that they are not, as the following examples show:
Example 1
I made an endpoint that queries for an entity by a property, adds 1 to a count property on it, and saves it back to the Datastore. I then have 50 different clients all execute that method at the same time. I would expect the count property to end up at 50; however, it usually ends up somewhere between 25 and 30.
Example 2
I have an endpoint that queries for an entity by a property. If the entity does not exist, I create the entity and save it to the datastore. If it exists, I just return it. Again, I hit this endpoint with 50 different clients at the same time. I would expect there to only be one entity in the Datastore. However, I will have maybe 5-10 of the same entity.
It seems to me this is not strongly consistent. If I take the code in the above endpoints and put it in a transaction with retries, everything works as intended. I looked around in Objectify to see whether a ReadOptions is set somewhere, but from what I can see there is not, so it should be using the default of read_consistency=STRONG.
For example 1, you need to use transactions to ensure that writes do not stomp on each other.
For example 2, again you need to use a transaction to get consistency across clients.
Strong consistency means that once a client's write succeeds, it can read or query that value back. It does not mean that if one client reads a value, another client reads the same value, each applies a transformation, and both write the result, the blind writes from the two clients will somehow be merged together.
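To make that failure mode concrete, here is a minimal local C++ sketch (an analogy only, no Datastore code): each simulated client does a perfectly consistent read and a perfectly consistent write, yet increments are still lost, because the read-modify-write is not one atomic unit. Serializing the whole read-modify-write, which is the job the transaction does, is what fixes it.
// 50 "clients" each read a shared counter, add 1, and write the result back.
// Every individual read and write is consistent (guarded by a mutex), but
// the combination is not atomic, so most of the increments are lost.
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    int counter = 0;
    std::mutex m;
    auto read  = [&] { std::lock_guard<std::mutex> lk(m); return counter; };
    auto write = [&](int v) { std::lock_guard<std::mutex> lk(m); counter = v; };

    std::vector<std::thread> clients;
    for (int i = 0; i < 50; ++i) {
        clients.emplace_back([&] {
            int v = read();                                              // consistent read
            std::this_thread::sleep_for(std::chrono::milliseconds(10));  // simulated work
            write(v + 1);                                                // consistent write of a stale value
        });
    }
    for (auto& t : clients) t.join();

    std::cout << "counter = " << counter << " (expected 50)\n";
}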

AWS Amplify shows much less data than exists in DynamoDB

I have the following problem. In my Amplify Studio I see 10k data points.
But if I take a closer look at the corresponding database, I see this:
I have over 200k records, but only 10k show up inside Amplify Studio. Why is that?
When I try this code in my frontend:
let p = await DataStore.query(Datapoint, Predicates.ALL, {
  limit: 1000000000,
});
console.log(p.length);
I get 10000 back, the same number as in Amplify Studio.
Other question: what's the best way to store this many data points? I need them for chart visualization.
A DynamoDB Query or Scan request does not return all items in one huge list. Instead, it returns just a single "page" of results, whose size is capped at 1 MB by default. A client library like Amplify could call Query or Scan repeatedly to collect all pages into one huge array in memory, but that stops making sense once the data grows very big. So applications usually want to iterate over the results rather than collect them into one huge in-memory array.
That is why most DynamoDB APIs provide a pagination interface for the Query and Scan operations, which gives you one page of results and a way to get the next page. Alternatively, some APIs give you an iterator over the results and do the pagination internally. I'm not familiar with Amplify, so I don't know which API to recommend, but an API that returns all results as one big array must have its limits, and apparently you've found them.
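For illustration only (this is the plain DynamoDB API rather than Amplify), the pagination loop looks roughly like this with the AWS SDK for C++; the table name Datapoint is a placeholder, and each page would normally be processed as it arrives instead of just being counted.
// Page through a DynamoDB table: each Scan returns one page plus a
// LastEvaluatedKey, which is fed back in as ExclusiveStartKey until it is empty.
#include <aws/core/Aws.h>
#include <aws/dynamodb/DynamoDBClient.h>
#include <aws/dynamodb/model/AttributeValue.h>
#include <aws/dynamodb/model/ScanRequest.h>
#include <cstddef>
#include <iostream>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::DynamoDB::DynamoDBClient client;
        Aws::DynamoDB::Model::ScanRequest request;
        request.SetTableName("Datapoint");                    // placeholder table name

        std::size_t total = 0;
        Aws::Map<Aws::String, Aws::DynamoDB::Model::AttributeValue> lastKey;
        do {
            if (!lastKey.empty()) {
                request.SetExclusiveStartKey(lastKey);        // resume after the previous page
            }
            const auto outcome = client.Scan(request);
            if (!outcome.IsSuccess()) {
                std::cerr << outcome.GetError().GetMessage() << "\n";
                break;
            }
            const auto& result = outcome.GetResult();
            total += result.GetItems().size();                // process this page here
            lastKey = result.GetLastEvaluatedKey();           // empty when no pages remain
        } while (!lastKey.empty());

        std::cout << "scanned " << total << " items\n";
    }
    Aws::ShutdownAPI(options);
}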

Reprocess batches of items over and over again - and the batch might change any time

I am just looking for ideas on how to solve one specific thing I'd like to build.
Say I have two sets of items. Each item is just a couple of lines of JSON. Any time an item is added to one set, I (almost) immediately want to process it against the full other set. So an item is added to set A: process it against each item in set B. And vice versa.
Items come in through API Gateway + Lambda. Match processing in Lambda from a queue/stream.
What AWS technology would be a good fit? I have no idea and no clear pattern on when or how often the sets change. Also, I want it to be as strongly consistent as possible. And of course, I want it to be as serverless and cost-effective as possible. :)
Options could be:
sets stored in Aurora, match processing for a new item in A would need to query the full set B from the database each time
sets stored in DynamoDB, maybe with DynamoDB stream in the background; match processing for a new item in A would need to query the full set B from Dynamo; but spiky load, not a good fit because of unclear read/write provisioning
have each set in its own "static" Kinesis stream where match processing reads through items but doesn't trim. Streams to be replaced with fresh sets regularly
My pain point is: While processing items from A there might be thousands of items in B to be matched. And I want to avoid having to load the full set B from some database every time I process an item from A. I was thinking about some caching of sets but then would need a good option to invalidate that cache whenever something changes.

general Java design of consuming an async web service

I need to consume a batch web service where I send a unique ID to identify myself; the service then sends back a unique response ID that I am to use some minutes later to get the info I need.
In general, what is a good way to keep track of the response ID and call the service again at some later time to get the real response?
The easy solution is to stick the ID and a timestamp in a map* or list, then have a loop in a separate thread that wakes up and processes all IDs older than a certain age. (Make sure the map or list is thread-safe.) However, if your app goes down and gets relaunched, it will lose track of pending requests. If you must handle that case, use a database.
*
One specific solution is to use a SortedMap keyed by timestamp. Every key must be unique, so you should not expect to put more than one element per millisecond into the map. To put an ID into the map, take System.currentTimeMillis() as the timestamp and, while that timestamp is already a key in the map, increment it; then put the (timestamp, ID) pair into the SortedMap. This is convenient because the loop thread can read elements from the beginning of the SortedMap until they are too new and then stop, since all the oldest elements are at the beginning of the map.
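To illustrate that bookkeeping, here is a small sketch, written in C++ with std::map standing in for Java's SortedMap/TreeMap; locking and the background thread that would call processDue periodically are left out, and the names are illustrative.
// pending maps a millisecond timestamp to the response ID received at that time.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

std::map<std::int64_t, std::string> pending;

std::int64_t nowMillis() {
    return std::chrono::duration_cast<std::chrono::milliseconds>(
               std::chrono::system_clock::now().time_since_epoch()).count();
}

// Record a response ID; bump the key until it is unique, as described above.
void track(const std::string& responseId) {
    std::int64_t ts = nowMillis();
    while (pending.count(ts)) ++ts;
    pending.emplace(ts, responseId);
}

// Handle every ID older than maxAgeMillis. The oldest entries sit at the front
// of the ordered map, so we stop at the first entry that is still too new.
void processDue(std::int64_t maxAgeMillis) {
    const std::int64_t cutoff = nowMillis() - maxAgeMillis;
    const auto end = pending.upper_bound(cutoff);
    for (auto it = pending.begin(); it != end; ++it) {
        std::cout << "fetching result for " << it->second << "\n";  // call the service here
    }
    pending.erase(pending.begin(), end);
}

int main() {
    track("resp-123");   // hypothetical response ID
    processDue(0);       // with a max age of 0 ms, it is due immediately
}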