Why does an assert(s_monitors) in MonitoringManager::OnRequestSucceeded() fail? - c++

I upload a file to S3. Directly after the request I get an assertion failure from the MonitoringManager and I don't know what I am doing wrong. We are using multiple threads in our application.
Exception: Assertion failed. Program: ... Monitor...ger.cpp Line 55
Expression: s_monitors
The cpp file: https://github.com/aws/aws-sdk-cpp/blob/master/aws-cpp-sdk-core/source/monitoring/MonitoringManager.cpp Line 55
uploadFileToS3(...);
method 'uploadFileToS3':
bool result = false;
const Aws::SDKOptions options;
Aws::InitAPI(options);
{
    std::shared_ptr<Aws::Utils::Threading::Executor> m_executor =
        Aws::MakeShared<Aws::Utils::Threading::PooledThreadExecutor>("TransferTests", 4);
    Aws::Transfer::TransferManagerConfiguration config(m_executor.get());
    config.s3Client = client;
    auto transmanager = Aws::Transfer::TransferManager::Create(config);
    std::shared_ptr<Aws::Transfer::TransferHandle> handle = transmanager->UploadFile(
        fileDestination, Aws::String(S3_BUCKET_NAME),
        Aws::String(s3key), Aws::String("multipart/form-data"), metadata);
    handle->WaitUntilFinished();
    result = isAwsActionSuccessful(handle) && boost::filesystem::remove(fileDestination);
}
Aws::ShutdownAPI(options);
return result;

The issue was that my application used multiple threads, so the API was initialized and shut down multiple times (every call to uploadFileToS3 ran Aws::InitAPI and Aws::ShutdownAPI). The problem was solved by initializing and shutting down the API exactly once in my application.
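For illustration, a minimal sketch of the structure that fixed it (assuming a standard main() entry point, and that uploadFileToS3 no longer calls InitAPI/ShutdownAPI itself):

#include <aws/core/Aws.h>

int main(int argc, char** argv)
{
    Aws::SDKOptions options;
    Aws::InitAPI(options);      // called exactly once, before any worker threads start
    {
        // spawn worker threads here; each may call uploadFileToS3(...) concurrently,
        // but none of them touches InitAPI/ShutdownAPI anymore
    }
    Aws::ShutdownAPI(options);  // called exactly once, after all SDK work has finished
    return 0;
}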

Related

Electron: UI and backend processes accessing the same log file on Windows

Goal
My Electron-based app uses a C++ backend, which keeps a log file. I'd love to show the file content on a page of my Electron frontend.
The macOS version works as expected. I simply use the node.js fs and readline libraries to read the file on the fly, and then insert the parsed text into innerHTML.
Problem
However, on Windows the log file seems to be locked by the backend, even though the CRT fopen calls use append mode "a". So node.js keeps getting the exception
EBUSY: resource busy or locked open '/path/to/my.log'
To make it worse, I use a third-party library for logging, and its internals are not that easy to hack.
Code
Here is the Electron-side of code
function OnLoad() {
  let logFile = Path.join(__dirname, 'logs', platformDirs[process.platform], 'my.log');
  let logElem = document.querySelector('.log');
  processLineByLine(logFile, logElem);
}
//
// helpers
//
async function processLineByLine(txtFile, outElement) {
  const fileStream = fs.createReadStream(txtFile);
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity
  });
  // Note: we use the crlfDelay option to recognize all instances of CR LF
  // ('\r\n') in input.txt as a single line break.
  for await (const line of rl) {
    // Each line in input.txt will be successively available here as `line`.
    console.log(`Line from file: ${line}`);
    outElement.innerHTML += line + '<br>';
  }
}
Here is the backend side of code
inline bool OpenLogFile(FILE** ppLogFile) {
    TCHAR logPath[MAX_PATH];
    DWORD length = GetModuleFileName(NULL, logPath, MAX_PATH);
    bool isPathValid = false;
#if (NTDDI_VERSION >= NTDDI_WIN8)
    PathCchRemoveFileSpec(logPath, MAX_PATH);
    HRESULT resPath = PathCchCombine(logPath, MAX_PATH, logPath, TEXT("my.log"));
    isPathValid = (resPath == S_OK);
#else
    PathRemoveFileSpec(logPath);
    LPWSTR resPath = PathCombine(logPath, logPath, TEXT("my.log"));
    isPathValid = (resPath != NULL);
#endif
    if (!isPathValid)
        return false;
    errno_t res = _wfopen_s(ppLogFile, logPath, L"a");
    if (res != 0) {
        wprintf(TEXT("Error: Failed to open log file: %s"), GetOSErrStr().c_str());
    }
    return res == 0;
}
Question
Is this an inherent problem with my architecture?
Should I forget about accessing the log file from frontend/backend processes at the same time?
I thought about using a message queue for sharing logs between the frontend and backend processes, but that would make logging more complex and bug-prone.
Is there an easy way to have the same logging experience as with macOS?
Solved it myself.
I had to use a different CRT function, _wfsopen, which provides file-sharing options.
In my case, the following change is sufficient:
*ppLogFile = _wfsopen(logPath, L"a+", _SH_DENYWR);
This answer helped.
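For context, a sketch of how the change fits into OpenLogFile above (only the open call changes; the _wfsopen mode and sharing flag shown are the ones from the answer):

#include <share.h>   // for _SH_DENYWR

// Replacement for the _wfopen_s call inside OpenLogFile: open the log in append
// mode but deny only other writers, so the Electron frontend can still read it.
*ppLogFile = _wfsopen(logPath, L"a+", _SH_DENYWR);
if (*ppLogFile == NULL) {
    wprintf(TEXT("Error: Failed to open log file: %s"), GetOSErrStr().c_str());
    return false;
}
return true;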

gRPC: What are the best practices for long-running streaming?

We've implemented a Java gRPC service that runs in the cloud, with a unidirectional (client-to-server) streaming RPC that looks like:
rpc PushUpdates(stream Update) returns (Ack);
A C++ client (a mobile device) calls this RPC as soon as it boots up and then continuously sends an update every 30 seconds or so, perpetually, as long as the device is up and running.
ChannelArguments chan_args;
// this will be a secure channel eventually
auto channel_p = CreateCustomChannel(remote_addr, InsecureChannelCredentials(), chan_args);
auto stub_p = DialTcc::NewStub(channel_p);
// ...
Ack ack;
auto strm_ctxt_p = make_unique<ClientContext>();
auto strm_p = stub_p->PushUpdates(strm_ctxt_p.get(), &ack);
// ...
while (true) {
    // wait until we are ready to send a new update
    Update updt;
    // populate updt;
    if (!strm_p->Write(updt)) {
        // stream is not kosher, create a new one and restart
        break;
    }
}
Now different kinds of network interruptions happen while this is happening:
the gRPC service running in the cloud may go down (for maintenance) or may simply become unreachable.
the device's own ip address keeps changing as it is a mobile device.
We've seen that on such events, neither the channel nor the Write() API is able to detect the network disconnection reliably. At times the client keeps calling Write() (which doesn't return false) but the server doesn't receive any data (Wireshark doesn't show any activity at the outgoing port of the client device).
What are the best practices to recover in such cases, so that the server starts receiving the updates within X seconds from the time when such an event occurs? It is understandable that there would be a loss of X seconds' worth of data whenever such an event happens, but we want to recover reliably within X seconds.
gRPC version: 1.30.2, Client: C++14/Linux, Server: Java/Linux
Here's how we've hacked around this. I want to check whether this can be made any better, or whether anyone from gRPC can guide me to a better solution.
The protobuf for our service looks like this. It has an RPC for pinging the service, which is used frequently to test connectivity.
// Message used in IsAlive RPC
message Empty {}
// Acknowledgement sent by the service for updates received
message UpdateAck {}
// Messages streamed to the service by the client
message Update {
...
...
}
service GrpcService {
  // for checking if we're able to connect
  rpc Ping(Empty) returns (Empty);
  // streaming RPC for pushing updates by client
  rpc PushUpdate(stream Update) returns (UpdateAck);
}
Here is how the C++ client looks; it does the following:
Connect():
Create the stub for calling the RPCs, if the stub is nullptr.
Call Ping() in regular intervals until it is successful.
On success call PushUpdate(...) RPC to create a new stream.
On failure reset the stream to nullptr.
Stream(): Do the following in a while(true) loop:
Get the update to be pushed.
Call Write(...) on the stream with the update to be pushed.
If Write(...) fails for any reason, break, and control goes back to Connect().
Once every 30 minutes (or some regular interval), reset everything (stub, channel, stream) to nullptr to start afresh. This is required because at times Write(...) does not fail even if there is no connection between the client and the service. Write(...) calls are successful, but the outgoing port on the client does not show any activity in Wireshark!
Here is the code:
constexpr int GRPC_TIMEOUT_S = 10;
constexpr int RESTART_INTERVAL_M = 15;
constexpr int GRPC_KEEPALIVE_TIME_MS = 10000;
string root_ca, tls_key, tls_cert; // for SSL
string remote_addr = "https://remote.com:5445";
...
...
void ResetStreaming() {
    if (stub_p) {
        if (strm_p) { // graceful restart/stop; this pair of APIs is called together, in this order
            if (!strm_p->WritesDone()) {
                // Log a message
            }
            strm_p->Finish(); // Log if the return value of this is NOT grpc::OK
        }
        strm_p = nullptr;
        strm_ctxt_p = nullptr;
        stub_p = nullptr;
        channel_p = nullptr;
    }
}
void CreateStub() {
    if (!stub_p) {
        ChannelArguments chan_args;
        chan_args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, GRPC_KEEPALIVE_TIME_MS);
        channel_p = CreateCustomChannel(
            remote_addr,
            SslCredentials(SslCredentialsOptions{root_ca, tls_key, tls_cert}),
            chan_args);
        stub_p = GrpcService::NewStub(channel_p);
    }
}
void Stream() {
    const auto restart_time = steady_clock::now() + minutes(RESTART_INTERVAL_M);
    while (!stop) {
        // restart every RESTART_INTERVAL_M (15m) even if ALL IS WELL!!
        if (steady_clock::now() > restart_time) {
            break;
        }
        Update updt = GetUpdate(); // get the update to be sent
        if (!stop) {
            if (channel_p->GetState(true) == GRPC_CHANNEL_SHUTDOWN ||
                !strm_p->Write(updt)) {
                // could not write!!
                return; // we will Connect() again
            }
        }
    }
    // stopped due to stop = true or the interval to create a new stream has expired
    ResetStreaming(); // channel, stub, stream are recreated once every 15m
}
bool PingRemote() {
    ClientContext ctxt;
    ctxt.set_deadline(system_clock::now() + seconds(GRPC_TIMEOUT_S));
    Empty req, resp;
    CreateStub();
    if (stub_p->Ping(&ctxt, req, &resp).ok()) {
        static UpdateAck ack;
        strm_ctxt_p = make_unique<ClientContext>(); // need a new context
        strm_p = stub_p->PushUpdate(strm_ctxt_p.get(), &ack);
        return true;
    }
    if (strm_p) {
        strm_p = nullptr;
        strm_ctxt_p = nullptr;
    }
    return false;
}
void Connect() {
    while (!stop) {
        if (PingRemote() || stop) {
            break;
        }
        sleep_for(seconds(5)); // wait before retrying
    }
}
// set to true from another thread when we want to stop
atomic<bool> stop = false;
void StreamUntilStopped() {
    if (stop) {
        return;
    }
    strm_thread_p = make_unique<thread>([&] {
        while (!stop) {
            Connect();
            Stream();
        }
    });
}
// called by the thread that sets stop = true
void Finish() {
    strm_thread_p->join();
}
With this we are seeing that the streaming recovers within 15 minutes (or RESTART_INTERVAL_M) whenever there is a disruption for any reason. This code runs in a fast path, so I am curious to know if this can be made any better.
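One refinement worth trying (this is a suggestion, not part of the original workaround): in addition to GRPC_ARG_KEEPALIVE_TIME_MS, setting a keepalive timeout and permitting pings without active calls can make a dead connection surface as a Write() failure much sooner, which may let you shrink or even drop the forced periodic restart. A minimal sketch on top of the CreateStub() above:

void CreateStub() {
    if (!stub_p) {
        ChannelArguments chan_args;
        chan_args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, GRPC_KEEPALIVE_TIME_MS);  // send a keepalive ping every 10s
        chan_args.SetInt(GRPC_ARG_KEEPALIVE_TIMEOUT_MS, 5000);                 // consider the connection dead if no ack within 5s
        chan_args.SetInt(GRPC_ARG_KEEPALIVE_PERMIT_WITHOUT_CALLS, 1);          // keep pinging even when no RPC is active
        chan_args.SetInt(GRPC_ARG_HTTP2_MAX_PINGS_WITHOUT_DATA, 0);            // do not limit pings between data frames
        channel_p = CreateCustomChannel(
            remote_addr,
            SslCredentials(SslCredentialsOptions{root_ca, tls_key, tls_cert}),
            chan_args);
        stub_p = GrpcService::NewStub(channel_p);
    }
}

Note that the server may also need to be configured to tolerate this ping frequency; otherwise it can close the connection with a too_many_pings GOAWAY.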

AWS C++ SDK UploadPart times out

I am trying to upload a file to Amazon S3 using the AWS C++ SDK.
The call to CreateMultipartUpload returns successfully but the following call to UploadPart times out with the following error:
(Aws::String) m_message = "Unable to parse ExceptionName: RequestTimeout Message: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed."
I don't understand why the initiate call works but not the part upload call. There clearly isn't any network issue.
This is my code:
bool FileUploader::uploadChunk() {
    Aws::S3::Model::UploadPartRequest request;
    request.SetBucket("video");
    request.SetKey(_key);
    request.SetUploadId(_file->uploadId);
    request.SetPartNumber(_file->chunksUploaded + 1);

    long file_pos = _file->chunksUploaded * CHUNK_SIZE;
    _input_file.seekg(file_pos, std::ios::beg);
    _input_file.read(_file_buf, CHUNK_SIZE);
    long n_bytes = _input_file.gcount();

    if (n_bytes > 0) {
        request.SetContentLength(n_bytes);
        char_array_buffer buf2(_file_buf, _file_buf + n_bytes);
        std::iostream *chunk_stream = new std::iostream(&buf2);
        request.SetBody(std::shared_ptr<std::iostream>(chunk_stream));
        Aws::S3::Model::UploadPartOutcome response = _client->UploadPart(request);
        if (response.IsSuccess()) {
            _file->chunksUploaded++;
            _uploader->updateUploadStatus(_file);
        }
        return response.IsSuccess();
    }
    else {
        return false;
    }
}
The problem was my method of obtaining a stream for SetBody. I switched to using the Boost library instead of a homegrown approach.
typedef boost::iostreams::basic_array_source<char> Device;
boost::iostreams::stream_buffer<Device> stmbuf(_file_buf, n_bytes);
std::iostream *stm = new std::iostream(&stmbuf);
request.SetBody(std::shared_ptr<Aws::IOStream>(stm));
This works well.
I also needed to keep track of the parts I was uploading for the call to CompleteMultipartUpload as follows:
Aws::S3::Model::CompletedPart part;
part.SetPartNumber(request.GetPartNumber());
part.SetETag(response.GetResult().GetETag());
_uploadedParts.AddParts(part);
Alternatively, you can use the TransferManager interface, which will do this for you. It has an IOStream interface. In addition, we provide a preallocated stream buffer implementation for iostream:
https://github.com/aws/aws-sdk-cpp/blob/master/aws-cpp-sdk-core/include/aws/core/utils/stream/PreallocatedStreamBuf.h
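A minimal sketch of how PreallocatedStreamBuf could be used for SetBody, assuming the constructor that takes a raw buffer pointer and a length (as in the header linked above), and the same _file_buf / n_bytes variables as in the question:

#include <aws/core/utils/stream/PreallocatedStreamBuf.h>

// The stream buffer must stay alive for the duration of the request,
// so in real code keep it as a member rather than a local.
Aws::Utils::Stream::PreallocatedStreamBuf stream_buf(
    reinterpret_cast<unsigned char*>(_file_buf),   // buffer already filled from the input file
    static_cast<uint64_t>(n_bytes));               // number of valid bytes in the buffer

auto chunk_stream = Aws::MakeShared<Aws::IOStream>("UploadPartStream", &stream_buf);
request.SetBody(chunk_stream);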

Libcurl returns timeout error when used from multiple threads, but not from a single thread

I'm using libcurl with C++. I have made a thread-safe class for downloading webpages. Each call to the static download method creates an "easy" handle, performs the job and frees the handle. When I use it from a single thread, everything's fine. But when I spawn several threads to download several pages in parallel, I sometimes (not for every download, but quite often) get an error saying "timeout". I have a reasonably high timeout configured (5 sec connection timeout and 25 sec global timeout).
Any ideas as to what might be the cause and how to fix it?
P. S. It happens on both Windows and Linux.
Here's the code of the method in question:
void CHttpDownloaderLibcurl::downloaderThread(const CUrl& url, CThreadSafeQueue<CHtmlPage>& q)
{
    CHtmlPage page(url);
    CURL* handle = curl_easy_init();
    if (!handle)
    {
        assert(handle);
        return;
    }
    int curlErr = setCurlOptions(handle, url, (void*)onCurlDownloadCallback, (void*)&page.byteArray());
    if (CURLE_OK != curlErr)
    {
        assert("Error setting options" == (char*)curlErr);
        return;
    }
    curlErr = curl_easy_perform(handle);
    page._info = getInfo(handle);
    curl_easy_cleanup(handle);
    if (CURLE_OK != curlErr)
    {
        if (curlErr == CURLE_OPERATION_TIMEDOUT)
        {
            CLogger() << "Curl timeout!";
        }
        else
            CLogger() << url.urlString() << ": Error performing download = " << curlErr;
        return;
    }
    q.push(page);
}

Client application crash causes Server to crash? (C++)

I'm not sure if this is a known issue that I am running into, but I couldn't find a good search string that would give me any useful results.
Anyway, here's the basic rundown:
We've got a relatively simple application that takes data from a source (DB or file) and streams that data over TCP to connected clients as new data comes in. It's a relatively low number of clients; I would say at most 10 clients per server. So we have the following rough design:
Client: connect to the server and block on read (with the timeout set higher than the server heartbeat message frequency).
Server: one listening thread that accepts connections and then spawns a writer thread to read from the data source and write to the client. The writer thread is also detached (using boost::thread, so we just call the .detach() function). It blocks on writes indefinitely, but does check errno for errors before writing. We start the servers using a single Perl script and calling "fork" for each server process.
The problem(s):
At seemingly random times, the client will shut down with a "connection terminated (SUCCESFUL)" message, indicating that the remote server shut down the socket on purpose. However, when this happens the SERVER application ALSO closes, without any errors or anything. It just crashes.
Now, to compound the problem, we have multiple instances of the server app being started by a startup script, running different files and different ports. When ONE of the servers crashes like this, ALL the servers crash out.
Both the server and the client use the same "Connection" library created in-house. It's mostly a C++ wrapper for the C socket calls.
Here's some rough code for the write and read functions in the Connection library:
int connectionTimeout_read = 60 * 60 * 1000;
int Socket::readUntil(char* buf, int amount) const
{
    int readyFds = epoll_wait(epfd, epEvents, 1, connectionTimeout_read);
    if (readyFds < 0)
    {
        status = convertFlagToStatus(errno);
        return 0;
    }
    if (readyFds == 0)
    {
        status = CONNECTION_TIMEOUT;
        return 0;
    }
    int fd = epEvents[0].data.fd;
    if (fd != socket)
    {
        status = CONNECTION_INCORRECT_SOCKET;
        return 0;
    }
    int rec = recv(fd, buf, amount, MSG_WAITALL);
    if (rec == 0)
        status = CONNECTION_CLOSED;
    else if (rec < 0)
        status = convertFlagToStatus(errno);
    else
        status = CONNECTION_NORMAL;
    lastReadBytes = rec;
    return rec;
}
int Socket::write(const void* buf, int size) const
{
    int readyFds = epoll_wait(epfd, epEvents, 1, -1);
    if (readyFds < 0)
    {
        status = convertFlagToStatus(errno);
        return 0;
    }
    if (readyFds == 0)
    {
        status = CONNECTION_TERMINATED;
        return 0;
    }
    int fd = epEvents[0].data.fd;
    if (fd != socket)
    {
        status = CONNECTION_INCORRECT_SOCKET;
        return 0;
    }
    if (epEvents[0].events != EPOLLOUT)
    {
        status = CONNECTION_CLOSED;
        return 0;
    }
    int bytesWrote = ::send(socket, buf, size, 0);
    if (bytesWrote < 0)
        status = convertFlagToStatus(errno);
    lastWriteBytes = bytesWrote;
    return bytesWrote;
}
Any help solving this mystery bug would be great! At the VERY least, I would like it to NOT crash out the server even if the client crashes (which is really strange to me, since there is no two-way communication).
Also, for reference, here is the server listening code:
while (server.getStatus() == connection::CONNECTION_NORMAL)
{
    connection::Socket s = server.listen();
    if (s.getStatus() != connection::CONNECTION_NORMAL)
    {
        fprintf(stdout, "failed to accept a socket. error: %s\n", connection::getStatusString(s.getStatus()));
    }
    DATASOURCE* dataSource;
    dataSource = open_datasource(XXXX); /* edited */
    if (dataSource == NULL)
    {
        fprintf(stdout, "FATAL ERROR. DATASOURCE NOT FOUND\n");
        return;
    }
    boost::thread fileSender(Sender(s, dataSource));
    fileSender.detach();
}
...And also here is the spawned child sending thread:
::signal(SIGPIPE, SIG_IGN);
//const int headerNeeds = 29;
const int BUFFERSIZE = 2000;
char buf[BUFFERSIZE];
bool running = true;
while (running)
{
    memset(buf, '\0', BUFFERSIZE * sizeof(char));
    unsigned int readBytes = 0;
    while ((readBytes = read_datasource(buf, sizeof(unsigned char), BUFFERSIZE, dataSource)) == 0)
    {
        boost::this_thread::sleep(boost::posix_time::milliseconds(1000));
    }
    socket.write(buf, readBytes);
    if (socket.getStatus() != connection::CONNECTION_NORMAL)
        running = false;
}
fprintf(stdout, "socket error: %s\n", connection::getStatusString(socket.getStatus()));
socket.close();
fprintf(stdout, "sender exiting...\n");
Any insights would be welcome! Thanks in advance.
You've probably got everything backwards... when the server crashes, the OS will close all its sockets. So the server crash happens first and causes the client to get the disconnect message (a FIN flag in a TCP segment, actually); the crash is not a result of the socket closing.
Since you have multiple server processes crashing at the same time, I'd look at resources they share, and also any scheduled tasks that all servers would try to execute at the same time.
EDIT: You don't have a single client connecting to multiple servers, do you? Note that TCP connections are always bidirectional, so the server process does get feedback if a client disconnects. Some internet providers have even been caught generating RST packets on connections that fail some test for suspicious traffic.
Write a signal handler. Make sure it uses only raw I/O functions to log problems (open, write, close, not fwrite, not printf).
Check return values. Check for a negative return value from write on a socket, but check all return values.
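A minimal sketch of what such a handler could look like (the names and the log file descriptor are hypothetical; only async-signal-safe calls such as write are used inside the handler):

#include <csignal>
#include <cstring>
#include <unistd.h>

// Hypothetical: a file descriptor opened with open() at startup, kept for crash logging.
static int g_crashLogFd = -1;

static void CrashHandler(int signum)
{
    // Only async-signal-safe functions here: write(), _exit(). No printf/fprintf/fwrite.
    const char msg[] = "fatal signal caught, shutting down\n";
    if (g_crashLogFd >= 0)
        write(g_crashLogFd, msg, sizeof(msg) - 1);
    _exit(128 + signum);
}

void InstallCrashHandlers()
{
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_handler = CrashHandler;
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGABRT, &sa, nullptr);
    sigaction(SIGPIPE, &sa, nullptr); // or keep SIG_IGN for SIGPIPE, as in the sender thread
}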
Thanks for all the comments and suggestions.
After looking through the code and adding the signal handling as Ben suggested, the applications themselves are far more stable. Thank you for all your input.
The original problem, however, was due to a rogue script that one of the admins was running as root, which would randomly kill certain processes on the server-side machine (I won't get into what it was trying to do in reality; safe to say it was buggy).
Lesson learned: check the environment.
Thank you all for the advice.