S3Stream is getting closed before processing the entire payload - amazon-web-services

I am processing a bulk JSON payload from S3. The code is as follows:
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParseException;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.amazonaws.services.s3.model.S3Object;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public boolean sync(Job job) throws IOException {
    // Validate the JSON payload from S3.
    try (InputStream s3Stream = readStreamFromS3()) {
        validationService.validate(s3Stream);
    } catch (S3SdkInteractionException e) {
        logger.error(e.getLocalizedMessage());
    }
    // Process the JSON payload from S3 (a fresh stream is opened).
    try (InputStream s3Stream = readStreamFromS3()) {
        return syncService.process(s3Stream);
    } catch (S3SdkInteractionException e) {
        logger.error(e.getLocalizedMessage());
        return false;
    }
}

public InputStream readStreamFromS3() {
    // s3Object is the S3Object instance returned by the AmazonS3 client.
    return s3Object.getObjectContent();
}
// process() will sync the user data in the S3 stream.
// I am not closing the stream until the entire stream is processed; I
// need to handle this as stream processing.
// I don't want to keep the contents in memory for processing; that is not
// feasible for my use case.
public boolean process(InputStream s3Stream) throws IOException {
    JsonFactory jsonFactory = objectMapper.getFactory();
    try (JsonParser jsonParser = jsonFactory.createParser(s3Stream)) {
        JsonToken jsonToken = jsonParser.nextToken();
        boolean endOfStream = false;
        while (!endOfStream) {
            // Collect a batch of up to 20 in-flight requests.
            List<ListenableFuture<UserResponse>> userFutures = new ArrayList<>(20);
            for (int i = 0; i < 20; i++) {
                try {
                    // Stream is processed fully.
                    if (jsonToken == null) {
                        endOfStream = true;
                        break;
                    }
                    // Skip ahead to the start of the next user record.
                    while (jsonToken != null && !jsonToken.isStructStart()) {
                        jsonToken = jsonParser.nextToken();
                    }
                    // Fetch the user record from the stream.
                    if (jsonToken != null && jsonToken.isStructStart()) {
                        Map<String, Object> userNode = jsonParser.readValueAs(Map.class);
                        // Call an external service and keep the future response.
                        userFutures.add(executeAsync(httpClient, userNode));
                        // Move to the next user record.
                        jsonToken = jsonParser.nextToken();
                    }
                } catch (JsonParseException jpe) {
                    logger.error(jpe.getLocalizedMessage());
                    endOfStream = true;
                    break;
                }
            }
            // Wait for the current batch before reading the next one.
            for (ListenableFuture<UserResponse> responseFuture : Futures.inCompletionOrder(userFutures)) {
                try {
                    UserResponse response = responseFuture.get();
                } catch (InterruptedException | ExecutionException e) {
                    logger.error(e.getLocalizedMessage());
                }
            }
        }
    }
    return false;
}
There is a serviceA through which we ingest data (JSON payload) into S3.
Another serviceB (the pseudocode shown above) processes the S3 data and calls another serviceC to sync the data (JSON payload) into the underlying store.
Problem:
I am seeing a repeated S3 warning in our logs: com.amazonaws.services.s3.internal.S3AbortableInputStream Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
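For context, the remedies that warning suggests look roughly like this (a minimal sketch against the SDK v1 API; the readAndRelease helper and the bucket/key parameters are placeholders of mine, not part of our code):

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.model.S3ObjectInputStream;
import com.amazonaws.util.IOUtils;
import java.io.IOException;

// Hypothetical helper: release the connection cleanly instead of closing a half-read stream.
public void readAndRelease(AmazonS3 s3Client, String bucket, String key) throws IOException {
    S3Object object = s3Client.getObject(bucket, key);
    try (S3ObjectInputStream content = object.getObjectContent()) {
        // ... consume as much of the stream as you need ...
        // Option 1: drain the remainder so the HTTP connection can be reused.
        IOUtils.drainInputStream(content);
        // Option 2 (cheaper when a lot remains): abort the connection instead.
        // content.abort();
    }
    // Option 3: request only the bytes you need via a ranged GET.
    // S3Object firstKb = s3Client.getObject(new GetObjectRequest(bucket, key).withRange(0, 1023));
}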
The validation phase executes as expected without any issues.
However, when syncing the data (i.e. syncService.process()), the s3Stream is getting closed before the entire payload is processed.
Since the stream is closed before I can process it fully, I am left in an inconsistent state.
Dependency information is as follows:
aws-java-sdk-s3:1.11.411
guava:guava-25.0-jre
jackson-core:2.9.6
The JSON payload can vary from a few MB to 2 GB.
Any help would be appreciated.

Related

How to upload a large file with multipart using the TemporaryFile from 1.3.0

I'm trying to create an endpoint and a corresponding swagger ENDPOINT_INFO which uploads a file via multipart. I was hoping to use the TemporaryFile to write the parts and then iterate the parts from getAllParts() to dump them into my final file. I can't seem to get the endpoint_info to create the right boundary in my request; I'm getting an exception when I try to create the PartList: Error. No 'boundary' value found in headers.
ENDPOINT_INFO(upload) {
  info->summary = "Upload";
  info->addResponse<Object<CommandResponseDto>>(Status::CODE_200,
                                                "application/json");
  info->addConsumes<oatpp::String>("multipart/form-data");
}
ENDPOINT("POST", "upload",
         upload,
         REQUEST(std::shared_ptr<IncomingRequest>, request)) {
  namespace mp = oatpp::web::mime::multipart;
  try {
    mp::PartList multipart(request->getHeaders());
  } catch (const std::exception& e) {
    logger_->error("Error creating multipart object: {}", e.what());
    return create_error_response(e.what());
  }
  mp::PartList multipart(request->getHeaders());
  mp::Reader multipartReader(&multipart);
  multipartReader.setDefaultPartReader(
      mp::createTemporaryFilePartReader("/tmp" /* /tmp directory */));
  request->transferBody(&multipartReader);
  auto parts = multipart.getAllParts();
  for (auto& p : parts) {
    /* print part name and filename */
    logger_->error("Multipart", "Part name={}, filename={}",
                   p->getName()->c_str(), p->getFilename()->c_str());
    /* append all files into one large file */
  }
  return createResponse("OK");
}
I've tried searching around the docs and using both synchronous and asynchronous endpoints.

Multiple StreamBuilder using same data source

Directly connecting to the websocket using StreamBuilder works seamlessly, but I tried to make the stream part of the provider so that I can access the stream data in multiple widgets without encountering "Bad State: Stream has already been listened to".
Is this the best way of handling multiple streams over the same data source? If not, what are my options?
The websocket server is part of Django.
The code for the provider is below:
import 'dart:convert';
import 'package:flutter/foundation.dart';
import 'package:web_socket_channel/io.dart';
import 'package:web_socket_channel/web_socket_channel.dart';

// Class declaration restored from the constructor below; imports added for completeness.
class WebSocketStreamProvider with ChangeNotifier {
  late final WebSocketChannel _fpdSockets;
  Map _webSocketMessages = {};

  Map get webSocketMessages {
    return _webSocketMessages;
  }

  WebSocketStreamProvider()
      : _fpdSockets = IOWebSocketChannel.connect(
          Uri.parse('ws://127.0.0.1:8000/ws/socket-server/'),
        );

  Stream<Map<String, dynamic>> get dataStream => _fpdSockets.stream
      .asBroadcastStream()
      .map<Map<String, dynamic>>((value) => (jsonDecode(value)));

  void sendDataToServer(dataToServer) {
    print("Sending Data");
    _fpdSockets.sink.add(
      jsonEncode(dataToServer),
    );
  }

  void closeConnection() {
    _fpdSockets.sink.close();
  }

  handleMessages(data) {
    print(data);
    _webSocketMessages = data;
    // notifyListeners();
  }
}

How to enter data from Spring Boot Application into Amazon Kinesis?

I want to add data into Kinesis using a Spring Boot application and React. I am a complete beginner when it comes to Kinesis, AWS, etc., so a beginner-friendly guide would be appreciated.
To add data records to an Amazon Kinesis data stream from a Spring Boot app, you can use the AWS SDK for Java V2, specifically the Amazon Kinesis Java API via software.amazon.awssdk.services.kinesis.KinesisClient.
Because you are a beginner, I recommend that you read the AWS SDK for Java V2 Developer Guide to become familiar with how to work with this Java API. See Developer guide - AWS SDK for Java 2.x.
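Since you are calling this from Spring Boot, one common wiring (a minimal sketch; the KinesisConfig class and bean name are my own, not part of the official example) is to register the client as a bean so it can be injected into the controller or service your React front end calls:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.kinesis.KinesisClient;

// Hypothetical Spring configuration: one KinesisClient shared by the whole app.
@Configuration
public class KinesisConfig {

    // destroyMethod = "close" lets Spring close the client on shutdown.
    @Bean(destroyMethod = "close")
    public KinesisClient kinesisClient() {
        return KinesisClient.builder()
            .region(Region.US_EAST_1) // use your stream's region
            .build();
    }
}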
Here is a code example that shows how to add data records using this service client. See the GitHub repository for the other required classes (such as StockTrade and StockTradeGenerator).
package com.example.kinesis;

//snippet-start:[kinesis.java2.putrecord.import]
import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.kinesis.KinesisClient;
import software.amazon.awssdk.services.kinesis.model.PutRecordRequest;
import software.amazon.awssdk.services.kinesis.model.KinesisException;
import software.amazon.awssdk.services.kinesis.model.DescribeStreamRequest;
import software.amazon.awssdk.services.kinesis.model.DescribeStreamResponse;
//snippet-end:[kinesis.java2.putrecord.import]

/**
 * Before running this Java V2 code example, set up your development environment, including your credentials.
 *
 * For more information, see the following documentation topic:
 *
 * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
 */
public class StockTradesWriter {

    public static void main(String[] args) {
        final String usage = "\n" +
            "Usage:\n" +
            "    <streamName>\n\n" +
            "Where:\n" +
            "    streamName - The Amazon Kinesis data stream to which records are written (for example, StockTradeStream)\n\n";

        if (args.length != 1) {
            System.out.println(usage);
            System.exit(1);
        }

        String streamName = args[0];
        Region region = Region.US_EAST_1;
        KinesisClient kinesisClient = KinesisClient.builder()
            .region(region)
            .credentialsProvider(ProfileCredentialsProvider.create())
            .build();

        // Ensure that the Kinesis stream is valid.
        validateStream(kinesisClient, streamName);
        setStockData(kinesisClient, streamName);
        kinesisClient.close();
    }

    // snippet-start:[kinesis.java2.putrecord.main]
    public static void setStockData(KinesisClient kinesisClient, String streamName) {
        try {
            // Repeatedly send stock trades with a 100 millisecond wait in between.
            StockTradeGenerator stockTradeGenerator = new StockTradeGenerator();

            // Put in 50 records for this example.
            int index = 50;
            for (int x = 0; x < index; x++) {
                StockTrade trade = stockTradeGenerator.getRandomTrade();
                sendStockTrade(trade, kinesisClient, streamName);
                Thread.sleep(100);
            }
        } catch (KinesisException | InterruptedException e) {
            System.err.println(e.getMessage());
            System.exit(1);
        }
        System.out.println("Done");
    }

    private static void sendStockTrade(StockTrade trade, KinesisClient kinesisClient,
                                       String streamName) {
        byte[] bytes = trade.toJsonAsBytes();

        // The bytes could be null if there is an issue with the JSON serialization by the Jackson JSON library.
        if (bytes == null) {
            System.out.println("Could not get JSON bytes for stock trade");
            return;
        }

        System.out.println("Putting trade: " + trade);
        PutRecordRequest request = PutRecordRequest.builder()
            .partitionKey(trade.getTickerSymbol()) // We use the ticker symbol as the partition key.
            .streamName(streamName)
            .data(SdkBytes.fromByteArray(bytes))
            .build();
        try {
            kinesisClient.putRecord(request);
        } catch (KinesisException e) {
            System.err.println(e.getMessage());
        }
    }

    private static void validateStream(KinesisClient kinesisClient, String streamName) {
        try {
            DescribeStreamRequest describeStreamRequest = DescribeStreamRequest.builder()
                .streamName(streamName)
                .build();
            DescribeStreamResponse describeStreamResponse = kinesisClient.describeStream(describeStreamRequest);
            if (!describeStreamResponse.streamDescription().streamStatus().toString().equals("ACTIVE")) {
                System.err.println("Stream " + streamName + " is not active. Please wait a few moments and try again.");
                System.exit(1);
            }
        } catch (KinesisException e) {
            System.err.println("Error found while describing the stream " + streamName);
            System.err.println(e);
            System.exit(1);
        }
    }
    // snippet-end:[kinesis.java2.putrecord.main]
}

How to use sync token on Google People API

I cannot really find an example of how to use this.
Right now, I'm doing it like this:
// Request 10 connections.
ListConnectionsResponse response = peopleService.people().connections()
    .list("people/me")
    .setRequestSyncToken(true)
    .setPageSize(10)
    .setPersonFields("names,emailAddresses")
    .execute();
I make some changes to my contacts (adding, removing, updating), then I do this:
// Request 10 connections.
ListConnectionsResponse response2 = peopleService.people().connections()
    .list("people/me")
    .setSyncToken(response.getNextSyncToken())
    .setPageSize(10)
    .setPersonFields("names,emailAddresses")
    .execute();
But it seems like I cannot get the changes I made earlier, not even if I make them directly from the UI. I'm pretty sure I'm using the sync token the wrong way.
Update (19/02/2020): In this example I call the API requesting the sync token in the first request (I successfully get the contacts), pause the execution (with a breakpoint), delete one contact and update another (from the web page), resume the execution, and then call the API again with the sync token extracted from the previous call. The result is that no change is reported, for some reason:
// Build a new authorized API client service.
final NetHttpTransport HTTP_TRANSPORT = GoogleNetHttpTransport.newTrustedTransport();
PeopleService peopleService = new PeopleService.Builder(HTTP_TRANSPORT, JSON_FACTORY, getCredentials(HTTP_TRANSPORT))
    .setApplicationName(APPLICATION_NAME)
    .build();

// Request 10 connections.
ListConnectionsResponse response = peopleService.people().connections()
    .list("people/me")
    .setPageSize(10)
    .setPersonFields("names,emailAddresses")
    .setRequestSyncToken(true)
    .execute();

// Print the display name of connections if available.
List<Person> connections = response.getConnections();
if (connections != null && connections.size() > 0) {
    for (Person person : connections) {
        List<Name> names = person.getNames();
        if (names != null && names.size() > 0) {
            System.out.println("Name: " + person.getNames().get(0).getDisplayName());
        } else {
            System.out.println("No names available for connection.");
        }
    }
} else {
    System.out.println("No connections found.");
}
// CORRECT: 2 CONTACTS PRINTED
// CORRECT: THE SYNC TOKEN IS THERE
String syncToken = response.getNextSyncToken();
System.out.println("syncToken = " + syncToken);

// I SET A BREAKPOINT BELOW, DELETE ONE CONTACT, EDIT ANOTHER, AND THEN RESUME THE EXECUTION.

// Request 10 connections.
response = peopleService.people().connections()
    .list("people/me")
    .setPageSize(10)
    .setPersonFields("names,emailAddresses")
    .setSyncToken(syncToken)
    .execute();

// Print the display name of connections if available.
connections = response.getConnections();
if (connections != null && connections.size() > 0) {
    for (Person person : connections) {
        List<Name> names = person.getNames();
        if (names != null && names.size() > 0) {
            System.out.println("Name: " + person.getNames().get(0).getDisplayName());
        } else {
            System.out.println("No names available for connection.");
        }
    }
} else {
    System.out.println("No connections found.");
}
// WRONG: I GET "NO CONNECTIONS FOUND"
Something I've found out is that, when requesting or setting a sync token, you must iterate through the entirety of the contacts for the nextSyncToken to be populated.
That means that as long as there is a nextPageToken (wink wink, setPageSize(10)), the sync token will not be populated.
You could either:
A) Loop over all the contacts using your current pagination, doing whatever you need to do at each iteration, and after the last call retrieve the populated sync token.
B) Iterate over all the contacts in one go, using the max page size of 2000 and a single personField, retrieve the token, and then do whatever you need to do. Note that if you expect a user to have more than 2000 contacts, you will still need to fetch the next pages using the nextPageToken.
Here is an example of a sync loop, adapted from Synchronize Resources Efficiently. Note that I usually use the Python client, so this Java code might not be 100% error free:
private static void run() throws IOException {
    // The list request type is the generated PeopleService.People.Connections.List.
    PeopleService.People.Connections.List request = peopleService.people().connections()
        .list("people/me")
        .setPageSize(10)
        .setPersonFields("names,emailAddresses");

    // Load the sync token stored from the last execution, if any.
    // The syncSettingsDataStore is whatever you use for storage.
    String syncToken = syncSettingsDataStore.get(SYNC_TOKEN_KEY);
    String syncType = null;

    // Perform the appropriate sync.
    if (syncToken == null) {
        // Perform a full sync.
        request.setRequestSyncToken(true);
        syncType = "FULL";
    } else {
        // Try to perform an incremental sync.
        request.setSyncToken(syncToken);
        syncType = "INCREMENTAL";
    }

    String pageToken = null;
    ListConnectionsResponse response = null;
    List<Person> contacts = null;

    // Iterate over all the contacts, page by page.
    do {
        request.setPageToken(pageToken);
        try {
            response = request.execute();
        } catch (GoogleJsonResponseException e) {
            if (e.getStatusCode() == 410) {
                // A 410 status code, "Gone", indicates that the sync token is
                // invalid/expired.
                // WARNING: The code is 400 in the Python client. I think the
                // Java client uses the correct code, but be on the lookout.
                // Clear the sync token.
                syncSettingsDataStore.delete(SYNC_TOKEN_KEY);
                // And anything else you need before re-syncing.
                dataStore.clear();
                // Restart with a full sync, then stop this run.
                run();
                return;
            } else {
                throw e;
            }
        }
        contacts = response.getConnections();
        if (contacts == null || contacts.isEmpty()) {
            System.out.println("No contacts to sync.");
        } else if ("FULL".equals(syncType)) {
            // Do the full sync for this page.
        } else if ("INCREMENTAL".equals(syncType)) {
            // Do the incremental sync for this page.
        } else {
            // What are you doing here???
        }
        pageToken = response.getNextPageToken();
    } while (pageToken != null);

    // Store the sync token from the last request for use at the next execution.
    syncSettingsDataStore.set(SYNC_TOKEN_KEY, response.getNextSyncToken());
    System.out.println("Sync complete.");
}

cpprestsdk: handle chunked response

How should I handle a chunked response using cpprestsdk? How do I request the next chunk? Does the required functionality exist at all?
Here is how we are performing HTTP requests:
web::http::http_request request(web::http::methods::GET);
request.headers().add(LR"(User-Agent)", LR"(ExchangeServicesClient/15.00.0847.030)");
request.headers().add(LR"(Accept)", LR"(text/xml)");
request.set_body(L"request body", L"text/xml");

web::http::client::http_client_config clientConfig;
clientConfig.set_credentials(web::credentials(L"username", L"pass"));
clientConfig.set_validate_certificates(true);

web::http::client::http_client client(L"serviceurl", clientConfig);
auto bodyTask = client.request(request)
    .then([](web::http::http_response response) {
        auto str = response.extract_string().get();
        return str;
    });
auto body = bodyTask.get();
If I naively try to perform another request just after this one, I get an error:
WinHttpSendRequest: 5023: The group or resource is not in the correct state to perform the
requested operation.
In order to read the received data in chunks, one needs to get the input stream from the server response:
concurrency::streams::istream bodyStream = response.body();
and then read continuously from that stream until a given character is found or the specified number of bytes has been read:
pplx::task<void> repeat(Concurrency::streams::istream bodyStream)
{
    Concurrency::streams::container_buffer<std::string> buffer;
    return pplx::create_task([=] {
        auto t = bodyStream.read_to_delim(buffer, '\n').get();
        std::cout << buffer.collection() << std::endl;
        return t;
    }).then([=](size_t /*bytesRead*/) {
        if (bodyStream.is_eof()) {
            return pplx::create_task([] {});
        }
        return repeat(bodyStream);
    });
}
Here is the full sample: https://github.com/cristeab/oanda_stream