OK, so I'm new to Kafka and to the cppkafka library in particular. I'm trying to write my own convenience wrapper on top of cppkafka. The producer side is quite straightforward and looks like it does what it is supposed to do:
cppkafka::Producer producer(cppkafka::Configuration{{"metadata.broker.list", "localhost:9092"}});
producer.produce(cppkafka::MessageBuilder(topic[0])
.partition((message_counter++) % partitions)
.payload(buffer.str()));
producer.flush();
Error handling and retries, here and below, were removed for brevity.
This code produces messages that I can see in any Kafka UI.
The consumer side is somewhat more complicated:
cppkafka::Consumer consumer(cppkafka::Configuration{
{"metadata.broker.list", "localhost:9092"},
{"group.id", "MyGroup"},
{"client.id", "MyClient"},
// {"auto.offset.reset", "smallest"},
// {"auto.offset.reset", "earliest"},
// {"enable.auto.offset.store", "false"},
{"enable.partition.eof", "false"},
{"enable.auto.commit", "false"}});
cppkafka::TopicPartitionList assignment;
consumer.set_assignment_callback([&, this](cppkafka::TopicPartitionList& partitions_) {
assignment = partitions_;
LOG_INFO(log, "Partitions assigned");
});
consumer.set_rebalance_error_callback([this](cppkafka::Error error) {
LOG_ERROR(log, "Rebalancing error. Reason: {}", error.to_string());
});
consumer.set_revocation_callback([&, this](const cppkafka::TopicPartitionList&) {
assignment.clear();
LOG_INFO(log, "Partitions revoked");
});
consumer.subscribe({topic});
auto start = std::chrono::high_resolution_clock::now();
while (some_condition)
{
auto subscriptions = consumer.get_subscription();
if (subscriptions.empty())
{
consumer.subscribe(topic);
}
cppkafka::Message msg = consumer.poll(100ms);
if (!msg)
{
if (!assignment.empty())
{
auto committed_offset = consumer.get_offsets_committed(consumer.get_assignment());
consumer.assign(committed_offset);
}
continue;
}
try
{
std::string_view payload_view(reinterpret_cast<const char *>(msg.get_payload().get_data()), msg.get_payload().get_size());
consumer.commit(msg);
}
....
}
It looks like this code is not picking up old uncommitted messages; poll always returns a null message. I have no idea what's going on here or why it leaves old messages in the queue. As you can see at the beginning of the last code snippet, I've tried various consumer settings, to no avail. What am I missing here? Maybe I have to configure something on the Kafka side, like some topic settings? Or is the problem in the consumer configuration?
One important (I think) detail: messages may be added before any consumer connects to Kafka. Maybe that matters?
In my C++ project, I need to create Subjects that hold an initial value and that may be updated. On each subscription/update, subscribers may then trigger data processing... In a previous Angular (RxJS) project, this kind of behavior was handled with ReplaySubject(1).
I'm not able to reproduce this using the C++ rxcpp library.
I've looked for documentation, snippets, and tutorials, but without success.
Expected pseudocode (typescript):
private dataSub: ReplaySubject<Data> = new ReplaySubject<Data>(1);
private init = false;
// public Observable, immediately share last published value
get currentData$(): Observable<Data> {
if (!this.init) {
return this.initData().pipe(switchMap(
() => this.dataSub.asObservable()
));
}
return this.dataSub.asObservable();
}
I think you are looking for rxcpp::subjects::replay.
Please try this:
auto coordination = rxcpp::observe_on_new_thread();
rxcpp::subjects::replay<int, decltype(coordination)> test(1, coordination);
// to emit data
test.get_observer().on_next(1);
test.get_observer().on_next(2);
test.get_observer().on_next(3);
// to subscribe
test.get_observable().subscribe([](auto && v) {
printf("%d\n", v); // this will print '3'
});
Why is AWS SQS not a default connector for Apache Flink? Is there some technical limitation to doing this, or was it just something that didn't get done? I want to implement this; any pointers would be appreciated.
Probably too late for an answer to the original question... I wrote an SQS consumer as a SourceFunction, using the Java Message Service (JMS) library for SQS:
public class SQSConsumer extends RichParallelSourceFunction<String> {
private volatile boolean isRunning;
private transient AmazonSQS sqs;
private transient SQSConnectionFactory connectionFactory;
private transient ExecutorService consumerExecutor;
@Override
public void open(Configuration parameters) throws Exception {
String region = ...
AWSCredentialsProvider credsProvider = ...
// maybe use a blocking array-backed thread pool to handle surges?
consumerExecutor = Executors.newCachedThreadPool();
ClientConfiguration clientConfig = PredefinedClientConfigurations.defaultConfig();
this.sqs = AmazonSQSAsyncClientBuilder.standard().withRegion(region).withCredentials(credsProvider)
.withClientConfiguration(clientConfig)
.withExecutorFactory(()->consumerExecutor).build();
this.connectionFactory = new SQSConnectionFactory(new ProviderConfiguration(), sqs);
this.isRunning = true;
}
@Override
public void run(SourceContext<String> ctx) throws Exception {
SQSConnection connection = connectionFactory.createConnection();
// ack each msg explicitly
Session session = connection.createSession(false, SQSSession.UNORDERED_ACKNOWLEDGE);
Queue queue = session.createQueue(<queueName>);
MessageConsumer msgConsumer = session.createConsumer(queue);
msgConsumer.setMessageListener(msg -> {
try {
String msgId = msg.getJMSMessageID();
String evt = ((TextMessage) msg).getText();
ctx.collect(evt);
msg.acknowledge();
} catch (JMSException e) {
// log and move on to the next msg, or bail with an exception
// have a dead-letter queue configured so this message is not lost
// msg is not acknowledged, so it may be picked up again by another consumer instance
}
});
// check if we were canceled
if (!isRunning) {
return;
}
connection.start();
while (!consumerExecutor.awaitTermination(1, TimeUnit.MINUTES)) {
// keep waiting
}
}
@Override
public void cancel() {
isRunning = false;
// this method might be called before the task actually starts running
if (sqs != null) {
sqs.shutdown();
}
if(consumerExecutor != null) {
consumerExecutor.shutdown();
try {
consumerExecutor.awaitTermination(1, TimeUnit.MINUTES);
} catch (Exception e) {
//log e
}
}
}
@Override
public void close() throws Exception {
cancel();
super.close();
}
}
Note that if you are using a standard SQS queue, you may have to de-duplicate the messages, depending on whether exactly-once guarantees are required.
Reference:
Working with JMS and Amazon SQS
At the moment, there is no connector for AWS SQS in Apache Flink. Have a look at the already existing connectors. I assume you already know about this, so I would just like to give some pointers. I was also looking for an SQS connector recently and found this mail thread.
The Apache Kinesis connector is somewhat similar to what you would need to implement here. See whether you can use that connector as a starting point.
In regard to Indy 10's TIdHTTP, many things have been running perfectly, but there are a few things that don't work so well here. That is why, once again, I need your help.
The Download button has been working perfectly. I'm using the following code:
void __fastcall TForm1::DownloadClick(TObject *Sender)
{
MyFile = SaveDialog->FileName;
TFileStream* Fist = new TFileStream(MyFile, fmCreate | fmShareDenyNone);
Download->Enabled = false;
Urlz = Edit1->Text;
Url->Caption = Urlz;
try
{
IdHTTP->Get(Edit1->Text, Fist);
IdHTTP->Connected();
IdHTTP->Response->ResponseCode = 200;
IdHTTP->ReadTimeout = 70000;
IdHTTP->ConnectTimeout = 70000;
IdHTTP->ReuseSocket;
Fist->Position = 0;
}
__finally
{
delete Fist;
Form1->Updated();
}
}
However, a "Cancel Resume" button is still can't resume interrupted downloads. Meant, it is always sending back the entire file every time I call Get() though I've used IdHTTP->Request->Ranges property.
I use the following code:
void __fastcall TForm1::CancelResumeClick(TObject *Sender)
{
MyFile = SaveDialog->FileName;;
TFileStream* TFist = new TFileStream(MyFile, fmCreate | fmShareDenyNone);
if (IdHTTP->Connected() == true)
{
IdHTTP->Disconnect();
CancelResume->Caption = "RESUME";
IdHTTP->Response->AcceptRanges = "Bytes";
}
else
{
try {
CancelResume->Caption = "CANCEL";
// IdHTTP->Request->Ranges == "0-100";
// IdHTTP->Request->Range = Format("bytes=%d-",ARRAYOFCONST((TFist->Position)));
IdHTTP->Request->Ranges->Add()->StartPos = TFist->Position;
IdHTTP->Get(Edit1->Text, TFist);
IdHTTP->Request->Referer = Edit1->Text;
IdHTTP->ConnectTimeout = 70000;
IdHTTP->ReadTimeout = 70000;
}
__finally {
delete TFist;
}
}
Meanwhile, by using the FormatBytes function found here, I have been able to show the size of the downloaded file, but I am still unable to determine the download/transfer speed.
I'm using the following code:
void __fastcall TForm1::IdHTTPWork(TObject *ASender, TWorkMode AWorkMode, __int64 AWorkCount)
{
__int64 Romeo = 0;
Romeo = IdHTTP->Response->ContentStream->Position;
// Romeo = AWorkCount;
Download->Caption = FormatBytes(Romeo) + " (" + IntToStr(Romeo) + " Bytes)";
ForSpeed->Caption = FormatBytes(Romeo);
ProgressBar->Position = AWorkCount;
ProgressBar->Update();
Form1->Updated();
}
Please advise and give an example. Any help would surely be appreciated!
In your DownloadClick() method:
Calling Connected() is useless, since you don't do anything with the result. Nor is there any guarantee that the connection will remain connected, as the server could send a Connection: close response header. I don't see anything in your code that is asking for HTTP keep-alives. Let TIdHTTP manage the connection for you.
You are forcing the Response->ResponseCode to 200. Don't do that. Respect the response code that the server actually sent. The fact that no exception was raised means the response was successful whether it is 200 or 206.
You are reading the ReuseSocket property value and ignoring it.
There is no need to reset the Fist->Position property to 0 before closing the file.
Now, with that said, your CancelResumeClick() method has many issues.
You are using the fmCreate flag when opening the file. If the file already exists, you will overwrite it from scratch, thus TFist->Position will ALWAYS be 0. Use fmOpenReadWrite instead so an existing file will open as-is. And then you have to seek to the end of the file to provide the correct Position to the Ranges header.
You are relying on the socket's Connected() state to make decisions. DO NOT do that. The connection may be gone after the previous response, or may have timed out and been closed before the new request is made. The file can still be resumed either way. HTTP is stateless. It does not matter if the socket remains open between requests, or is closed in between. Every request is self-contained. Use information provided in the previous response to govern the next request. Not the socket state.
You are modifying the value of the Response->AcceptRanges property, instead of using the value provided by the previous response. The server tells you whether the file supports resuming, so you have to remember that value, or query it again, before attempting to resume a download.
When you actually call Get(), the server may or may not respect the requested Range, depending on whether the requested file supports byte ranges or not. If the server responds with a response code of 206, the requested range is accepted, and the server sends ONLY the requested bytes, so you need to APPEND them to your existing file. However, if the server responds with a response code of 200, the server is sending the entire file from scratch, so you need to REPLACE your existing file with the new bytes. You are not taking that into account.
In your IdHTTPWork() method, in order to calculate the download/transfer speed, you have to keep track of how many bytes are actually transferred between each firing of the event. When the event is fired, save the current AWorkCount and tick count, and then the next time the event is fired, compare the new AWorkCount and the current ticks to know how much time has elapsed and how many bytes were transferred. From those values, you can calculate the speed, and even the estimated time remaining.
As for your progress bar, you can't use AWorkCount alone to calculate a new position. That only works if you set the progress bar's Max to AWorkCountMax in the OnWorkBegin event, and that value is not always known before a download begins. You need to take into account the size of the file being downloaded, whether it is being downloaded fresh or being resumed, how many bytes are being requested during a resume, etc. So there is a lot more work involved in displaying a progress bar for an HTTP download.
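To make that concrete, here is a rough sketch of that bookkeeping (an illustration only, not taken from the original code). It assumes two extra members added to TForm1, FLastWorkCount (__int64) and FLastTick (DWORD), uses GetTickCount() for timing, and reuses the FormatBytes() helper you already found; adapt the names to your own code:
void __fastcall TForm1::IdHTTPWorkBegin(TObject *ASender, TWorkMode AWorkMode, __int64 AWorkCountMax)
{
    // FLastWorkCount and FLastTick are assumed form members, reset at the start of each transfer
    FLastWorkCount = 0;
    FLastTick = GetTickCount();
    // AWorkCountMax is 0 when the total size is not known in advance
    ProgressBar->Max = (AWorkCountMax > 0) ? static_cast<int>(AWorkCountMax) : 0;
    ProgressBar->Position = 0;
}

void __fastcall TForm1::IdHTTPWork(TObject *ASender, TWorkMode AWorkMode, __int64 AWorkCount)
{
    DWORD now = GetTickCount();
    DWORD elapsedMs = now - FLastTick;
    if (elapsedMs >= 500) // update roughly twice per second
    {
        __int64 bytes = AWorkCount - FLastWorkCount; // bytes moved since the last update
        double bytesPerSec = (bytes * 1000.0) / elapsedMs;
        ForSpeed->Caption = FormatBytes(static_cast<__int64>(bytesPerSec)) + "/s";
        FLastWorkCount = AWorkCount;
        FLastTick = now;
    }
    if (ProgressBar->Max > 0) // only meaningful if Max was set from AWorkCountMax
        ProgressBar->Position = static_cast<int>(AWorkCount);
}
This only covers the fresh-download case; for a resumed download you would also add the existing file size when updating the bar, as described above.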
Now, to answer your two questions:
How to retrieve and save the download file to a disk by using its original name?
It is provided by the server in the filename parameter of the Content-Disposition header, and/or in the name parameter of the Content-Type header. If neither value is provided by the server, you can use the filename that is in the URL you are requesting. TIdHTTP has a URL property that provides the parsed version of the last requested URL.
However, since you are creating the file locally before sending your download request, you will have to create the local file using a temp filename, and then rename the local file after the download is complete. Alternatively, use TIdHTTP.Head() to determine the real filename (you can also use it to determine whether resuming is supported) before creating the local file with that filename, then use TIdHTTP.Get() to download to that local file. Or download the file to memory using TMemoryStream instead of TFileStream, and then save it with the desired filename when complete.
when I click http://get.videolan.org/vlc/2.2.1/win32/vlc-2.2.1-win32.exe, the server redirects the request to its actual URL, http://mirror.vodien.com/videolan/vlc/2.2.1/win32/vlc-2.2.1-win32.exe. The problem is that IdHTTP will not automatically follow it.
That is because VideoLan is not using an HTTP redirect to send clients to the real URL (TIdHTTP supports HTTP redirects). VideoLan is using an HTML redirect instead (TIdHTTP does not support HTML redirects). When a web browser downloads the first URL, a 5-second countdown timer is displayed before the real download begins. As such, you will have to manually detect that the server is sending you an HTML page instead of the real file (look at the TIdHTTP.Response.ContentType property for that), parse the HTML to determine the real URL, and then download it. This also means that you cannot download the first URL directly into your target local file, otherwise you will corrupt it, especially during a resume. You have to cache the server's response first, either to a temp file or to memory, so you can analyze it before deciding how to act on it. It also means you have to remember the real URL for resuming; you cannot resume the download using the original countdown URL.
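As a rough sketch of that detection step (an illustration only, not part of the original answer): DownloadWithHtmlRedirectCheck() is a hypothetical method, ExtractRealUrlFromHtml() is a hypothetical helper you would have to write yourself, and Urlz/MyFile are the variables from the question's code:
void __fastcall TForm1::DownloadWithHtmlRedirectCheck()
{
    TMemoryStream* Buffer = new TMemoryStream;
    try
    {
        IdHTTP->Get(Urlz, Buffer);
        String RealUrl = Urlz;
        // if the server sent the countdown page instead of the file, fetch the real URL
        if (IdHTTP->Response->ContentType.Pos("text/html") == 1)
        {
            RealUrl = ExtractRealUrlFromHtml(Buffer); // hypothetical: parse the HTML yourself
            Buffer->Clear();
            IdHTTP->Get(RealUrl, Buffer);
        }
        // remember RealUrl for any later resume request, then save the data to disk
        Buffer->SaveToFile(MyFile);
    }
    __finally
    {
        delete Buffer;
    }
}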
Try something more like the following instead. It does not take into account everything mentioned above (particularly speed/progress tracking, HTML redirects, etc.), but it should get you a little closer:
void __fastcall TForm1::DownloadClick(TObject *Sender)
{
Urlz = Edit1->Text;
Url->Caption = Urlz;
IdHTTP->Head(Urlz);
String FileName = IdHTTP->Response->RawHeaders->Params["Content-Disposition"]["filename"];
if (FileName.IsEmpty())
{
FileName = IdHTTP->Response->RawHeaders->Params["Content-Type"]["name"];
if (FileName.IsEmpty())
FileName = IdHTTP->URL->Document;
}
SaveDialog->FileName = FileName;
if (!SaveDialog->Execute()) return;
MyFile = SaveDialog->FileName;
TFileStream* Fist = new TFileStream(MyFile, fmCreate | fmShareDenyWrite);
try
{
try
{
Download->Enabled = false;
Resume->Enabled = false;
IdHTTP->Request->Clear();
//...
IdHTTP->ReadTimeout = 70000;
IdHTTP->ConnectTimeout = 70000;
IdHTTP->Get(Urlz, Fist);
}
__finally
{
delete Fist;
Download->Enabled = true;
Updated();
}
}
catch (const EIdHTTPProtocolException &)
{
DeleteFile(MyFile);
throw;
}
}
void __fastcall TForm1::ResumeClick(TObject *Sender)
{
TFileStream* Fist = new TFileStream(MyFile, fmOpenReadWrite | fmShareDenyWrite);
try
{
Download->Enabled = false;
Resume->Enabled = false;
IdHTTP->Request->Clear();
//...
Fist->Seek(0, soEnd);
IdHTTP->Request->Ranges->Add()->StartPos = Fist->Position;
IdHTTP->Request->Referer = Edit1->Text;
IdHTTP->ConnectTimeout = 70000;
IdHTTP->ReadTimeout = 70000;
IdHTTP->Get(Urlz, Fist);
}
__finally
{
delete Fist;
Download->Enabled = true;
Updated();
}
}
void __fastcall TForm1::IdHTTPHeadersAvailable(TObject*Sender, TIdHeaderList *AHeaders, bool &VContinue)
{
Resume->Enabled = ( ((IdHTTP->Response->ResponseCode == 200) || (IdHTTP->Response->ResponseCode == 206)) && TextIsSame(AHeaders->Values["Accept-Ranges"], "bytes") );
if ((IdHTTP->Response->ContentStream) && (IdHTTP->Request->Ranges->Count > 0) && (IdHTTP->Response->ResponseCode == 200))
IdHTTP->Response->ContentStream->Size = 0;
}
@Romeo:
Also, you can try the following function to determine the real download filename.
I've translated this to C++ based on RRUZ's function. So far so good; I'm using it in my simple IdHTTP download program, too.
But this translation could of course still use improvement from Remy Lebeau, RRUZ, or any other master here.
String __fastcall GetRemoteFileName(const String URI)
{
String result;
try
{
TIdHTTP* HTTP = new TIdHTTP(NULL);
try
{
HTTP->Head(URI);
result = HTTP->Response->RawHeaders->Params["Content-Disposition"]["filename"];
if (result.IsEmpty())
{
result = HTTP->Response->RawHeaders->Params["Content-Type"]["name"];
if (result.IsEmpty())
result = HTTP->URL->Document;
}
}
__finally
{
delete HTTP;
}
}
catch(const Exception &ex)
{
ShowMessage(const_cast<Exception&>(ex).ToString());
}
return result;
}
I'm basically facing a blocking problem.
I have my server coded in C++ with Boost.Asio, using 8 threads since the server has 8 logical cores.
My problem is that a thread may block for 0.2~1.5 seconds on a MySQL query, and I honestly don't know how to get around that, since the MySQL C++ Connector does not support asynchronous queries and I don't know how to design the server "correctly" to use multiple threads for the queries.
This is where I'm asking for opinions of what to do in this case.
Create 100 threads for asynchronous SQL queries?
Could I have an opinion from experts about this?
Okay, the proper solution to this would be to extend Asio and write a mysql_service implementation to integrate it. I was almost going to find out how this is done right away, but I wanted to get started quickly using an "emulation".
The idea is to have
your business processes using an io_service (as you are already doing)
a database "facade" interface that dispatches async queries into a different queue (io_service) and posts the completion handler back onto the business_process io_service
A subtle tweak is needed here: you need to keep the io_service on the business-process side from shutting down as soon as its job queue is empty, since it might still be awaiting a response from the database layer.
So, modeling this into a quick demo:
namespace database
{
// data types
struct sql_statement { std::string dml; };
struct sql_response { std::string echo_dml; }; // TODO cover response codes, resultset data etc.
I hope you will forgive my gross simplifications :/
struct service
{
service(unsigned max_concurrent_requests = 10)
: work(io_service::work(service_)),
latency(mt19937(), uniform_int<int>(200, 1500)) // random 0.2 ~ 1.5s
{
for (unsigned i = 0; i < max_concurrent_requests; ++i)
svc_threads.create_thread(boost::bind(&io_service::run, &service_));
}
friend struct connection;
private:
void async_query(io_service& external, sql_statement query, boost::function<void(sql_response response)> completion_handler)
{
service_.post(bind(&service::do_async_query, this, ref(external), std::move(query), completion_handler));
}
void do_async_query(io_service& external, sql_statement q, boost::function<void(sql_response response)> completion_handler)
{
this_thread::sleep_for(chrono::milliseconds(latency())); // simulate the latency of a db-roundtrip
external.post(bind(completion_handler, sql_response { q.dml }));
}
io_service service_;
thread_group svc_threads; // note the order of declaration
optional<io_service::work> work;
// for random delay
random::variate_generator<mt19937, uniform_int<int> > latency;
};
The service is what coordinates a maximum number of concurrent requests (on the "database io_service" side) and ping/pongs the completion back onto another io_service (the async_query/do_async_query combo). This stub implementation emulates latencies of 0.2~1.5s in the obvious way :)
Now comes the client "facade"
struct connection
{
connection(int connection_id, io_service& external, service& svc)
: connection_id(connection_id),
external_(external),
db_service_(svc)
{ }
void async_query(sql_statement query, boost::function<void(sql_response response)> completion_handler)
{
db_service_.async_query(external_, std::move(query), completion_handler);
}
private:
int connection_id;
io_service& external_;
service& db_service_;
};
connection is really only a convenience, so we don't have to explicitly deal with the various queues at the call site.
Now, let's implement a demo business process in good old Asio style:
namespace domain
{
struct business_process : id_generator
{
business_process(io_service& app_service, database::service& db_service_)
: id(generate_id()), phase(0),
in_progress(io_service::work(app_service)),
db(id, app_service, db_service_)
{
app_service.post([=] { start_select(); });
}
private:
int id, phase;
optional<io_service::work> in_progress;
database::connection db;
void start_select() {
db.async_query({ "select * from tasks where completed = false" }, [=] (database::sql_response r) { handle_db_response(r); });
}
void handle_db_response(database::sql_response r) {
if (phase++ < 4)
{
if ((id + phase) % 3 == 0) // vary the behaviour slightly
{
db.async_query({ "insert into tasks (text, completed) values ('hello', false)" }, [=] (database::sql_response r) { handle_db_response(r); });
} else
{
db.async_query({ "update * tasks set text = 'update' where id = 123" }, [=] (database::sql_response r) { handle_db_response(r); });
}
} else
{
in_progress.reset();
lock_guard<mutex> lk(console_mx);
std::cout << "business_process " << id << " has completed its work\n";
}
}
};
}
This business process starts by posting itself on the app service. It then does a number of db queries in succession, and eventually exits (by doing in_progress.reset() the app service is made aware of this).
A demonstration main, starting 10 business processes on a single thread:
int main()
{
io_service app;
database::service db;
ptr_vector<domain::business_process> bps;
for (int i = 0; i < 10; ++i)
{
bps.push_back(new domain::business_process(app, db));
}
app.run();
}
In my sample, business_processes don't do any CPU-intensive work, so there's no use in scheduling them across CPUs, but if you wanted to you could easily achieve this by replacing the app.run() line with:
thread_group g;
for (unsigned i = 0; i < thread::hardware_concurrency(); ++i)
g.create_thread(boost::bind(&io_service::run, &app));
g.join_all();
See the demo running Live On Coliru
I'm not a MySQL guru, but the following is generic multithreading advice.
Having NumberOfThreads == NumberOfCores is appropriate when none of the threads ever block and you are just splitting the load over all CPUs.
A common pattern is to have multiple threads per CPU, so one is executing while another is waiting on something.
In your case, I'd be inclined to set NumberOfThreads = n * NumberOfCores, where 'n' is read from a config file, a registry entry, or some other user-settable value. You can test the system with different values of 'n' to find the optimum. I'd suggest somewhere around 3 for a first guess; a minimal sketch of sizing a pool this way is below.
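For illustration only, here is a minimal Boost.Asio-flavoured sketch of a pool sized that way. The THREADS_PER_CORE environment variable is just a stand-in for whatever config file or registry entry you actually use:
#include <boost/asio.hpp>
#include <boost/bind.hpp>
#include <boost/thread.hpp>
#include <cstdlib>

int main()
{
    boost::asio::io_service io;
    boost::asio::io_service::work keep_alive(io); // keeps run() from returning while the queue is idle

    // 'n' threads per core; read from the environment here as a stand-in for your config mechanism
    const char* env = std::getenv("THREADS_PER_CORE");
    unsigned n = env ? static_cast<unsigned>(std::atoi(env)) : 3; // ~3 is a reasonable first guess
    unsigned thread_count = n * boost::thread::hardware_concurrency();

    boost::thread_group pool;
    for (unsigned i = 0; i < thread_count; ++i)
        pool.create_thread(boost::bind(&boost::asio::io_service::run, &io));

    // post blocking work (e.g. MySQL queries) onto 'io' from here...

    io.stop(); // for this demo, stop immediately
    pool.join_all();
}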
So the MongoDB C++ driver documentation says:
On a failover situation, expect at least one operation to return an
error (throw an exception) before the failover is complete. Operations
are not retried
Kind of annoying, but that leaves it up to me to handle a failed operation. Ideally I would just like the application to sleep for a few seconds (the app is single-threaded) and retry, in the hope that a new primary mongod has been established. In the case of a second failure, well, I take it the connection is truly messed up and I just want to throw an exception.
Within my MongoManager class, this means all operations have this kind of double try/catch block set up. I was wondering whether there is a more elegant solution?
Example method:
template <typename T>
std::string
MongoManager::insert(std::string ns, T object)
{
mongo::BSONObj oo = convertToBson(object);
std::string result;
try {
connection_->insert(ns, oo); //connection_ = shared_ptr<DBClientReplicaSet>
result = connection_->getLastError();
lastOpSucceeded_ = true;
}
catch (mongo::SocketException& ex)
{
lastOpSucceeded_ = false;
boost::this_thread::sleep( boost::posix_time::seconds(5) );
}
// try again?
if (!lastOpSucceeded_) {
try {
connection_->insert(ns, oo);
result = connection_->getLastError();
lastOpSucceeded_ = true;
}
catch (mongo::SocketException& ex)
{
//do some clean up, throw exception
}
}
return result;
}
That's indeed sort of how you need to handle it. Perhaps instead of having two try/catch blocks I would use the following strategy:
keep a count of how many times you have tried,
create a while loop with (count < 5 && !lastOpSucceeded) as its condition,
and then sleep for pow(2, count) seconds so that you back off longer on every iteration.
And then, when all else fails, bail out; a rough sketch of that loop is below.
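For illustration, here is a sketch of what that could look like inside the insert() method from the question (connection_, lastOpSucceeded_ and convertToBson() come from the question's code; the 5-try limit and the pow(2, count) backoff are just the strategy described above, not a prescribed implementation):
template <typename T>
std::string
MongoManager::insert(std::string ns, T object)
{
    mongo::BSONObj oo = convertToBson(object);
    std::string result;
    int count = 0;
    lastOpSucceeded_ = false;
    while (count < 5 && !lastOpSucceeded_)
    {
        try {
            connection_->insert(ns, oo);
            result = connection_->getLastError();
            lastOpSucceeded_ = true;
        }
        catch (mongo::SocketException& ex)
        {
            ++count;
            // back off for pow(2, count) seconds: 2, 4, 8, 16, 32
            boost::this_thread::sleep(boost::posix_time::seconds(1L << count));
        }
    }
    if (!lastOpSucceeded_) {
        // do some clean-up, then give up for good
        throw std::runtime_error("insert failed after repeated retries");
    }
    return result;
}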