Some questions about protobuf - C++

We are building an RTB (real-time bidding) platform, using nginx as the HTTP server, a bidder written in Lua, Google Protocol Buffers for serializing data, and zlog for logging. After test runs, we got three error messages in the nginx error log:
"[libprotobuf Error, google/protobuf/wire_format.cc:1053]
String field contains invalid UTF-8 data when parsing a protocol buffer.
Use the 'bytes' type if you intend to send raw bytes."
So we went back to the protocol buffer source code and found that this check is controlled by a macro (-DNDEBUG, which apparently means "not debug mode", according to the comments), and that -DNDEBUG disables GOOGLE_PROTOBUF_UTF8_VALIDATION (I think?). So we enabled this macro (-DNDEBUG) in the configuration. However, after testing, we still got the same error message. We then changed every "string" field to "bytes" in XXX.proto, but after testing the same error message showed up again.
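To illustrate the change we made (the message and field names below are made up; our real XXX.proto is different): a bytes field is supposed to accept raw, non-UTF-8 data, while a string field is expected to hold valid UTF-8. In the generated C++ code both map to std::string, so the change only matters once the .pb.h/.pb.cc files are regenerated and both the serializing side and the parsing side are rebuilt against them.

    // bid.proto (hypothetical, proto2 syntax):
    //   message BidRequest {
    //     optional bytes user_agent = 1;  // raw bytes, no UTF-8 check on parse
    //   }
    #include <string>
    #include "bid.pb.h"  // generated by protoc from the hypothetical bid.proto

    int main() {
        BidRequest req;
        // Not valid UTF-8: a 'string' field would trigger the wire_format warning
        // when parsed, a 'bytes' field should accept it as-is.
        req.set_user_agent(std::string("\xFF\xFE raw bytes", 12));

        std::string wire;
        req.SerializeToString(&wire);

        BidRequest parsed;
        bool ok = parsed.ParseFromString(wire);  // expected: no UTF-8 warning
        return ok ? 0 : 1;
    }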
The other two errors were:
worker process 53574 exited on signal 11 (core dumped), then the process died.
lua entry thread aborted: runtime error: /home/bilin/rtb/src/lua/shared/log.lua:34: 'short' is not callable
We hope somebody can help us solve these problems.
Thank you.

Related

Poco::Data::MySQL 'Got packets out of order' error

I got an ER_NET_PACKETS_OUT_OF_ORDER error when running a multithreaded C++ app using Poco::Data::MySQL and Poco::Data::SessionPool. The error message looks like this:
MySQL: [MySQL]: [Comment]: mysql_stmt_prepare error [mysql_stmt_error]: Got packets out of order [mysql_stmt_errno]: 1156 [mysql_stmt_sqlstate]: 08S01 [statemnt]: ...
The app is making queries from multiple threads every 100ms. The connections are provided by a common SessionPool.
I got around this problem by adding reset=true to the connection string. However, as stated in the official docs, adding this option may result in problems with encoding.
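For reference, this is roughly how the pool is set up (host, credentials and database name are placeholders, not my real configuration):

    #include <Poco/Data/MySQL/Connector.h>
    #include <Poco/Data/Session.h>
    #include <Poco/Data/SessionPool.h>

    int main() {
        Poco::Data::MySQL::Connector::registerConnector();

        // 'reset=true' makes the pool reset each session before it is reused,
        // which avoided the out-of-order packets for me, but as the docs note
        // it may cause problems with encoding.
        Poco::Data::SessionPool pool(
            Poco::Data::MySQL::Connector::KEY,
            "host=127.0.0.1;port=3306;user=app;password=secret;db=appdb;reset=true");

        Poco::Data::Session session = pool.get();
        // ... each worker thread takes its own Session from the shared pool ...
        return 0;
    }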

Twisted SSH - session execCommand implementation

Good day. I apologize for asking about obvious things: I write in PHP, and I know Python at the level of "I started learning it yesterday". I've already spent a few days on this, but to no avail.
I downloaded the Twisted SSH server example for version 20.3 from here: https://docs.twistedmatrix.com/en/twisted-20.3.0/conch/examples/. Line 162 has an execCommand method that I need to implement to make it work. Then I noticed a comment in this method: "We don't support command execution sessions". Hence the question: does this comment apply only to the example, or to the Twisted library entirely? I.e., is it possible to implement this method so that the example server works the way I need?
More information (I don't think it is required to answer the question above): why do I need this? I'm trying to put together an environment for writing functional (!) tests (there would be no such problems with unit tests, I guess). Our API uses the SSH client (phpseclib/SSH2) in 30%+ of its endpoints. Whatever I did, I only ever got 3 kinds of results, depending on how I implemented this method: (result: success, response: "" - empty; result: success, response: "1"; result: failed, response: "Unable to fulfill channel request at… SSH2.php:3853"). Those were for the SSH2 client. If the error occurs (3rd case), the server logs this in the terminal:
[SSHServerTransport, 0,127.0.0.1] Got remote error, code 11 reason: ""
[SSHServerTransport, 0,127.0.0.1] connection lost
I just found that this works:

def execCommand(self, protocol, cmd):
    # write some output back to the client, then signal end-of-stream
    protocol.write('Some text to return')
    protocol.session.conn.sendEOF(protocol.session)

If I don't send EOF, the client throws a timeout error.

Streaming MutationGroups into Spanner

I'm trying to stream MutationGroups into Spanner with SpannerIO.
The goal is to write new MutationGroups every 10 seconds, as we will use Spanner to query near-real-time KPIs.
When I don't use any windows, I get the following error:
Exception in thread "main" java.lang.IllegalStateException: GroupByKey cannot be applied to non-bounded PCollection in the GlobalWindow without a trigger. Use a Window.into or Window.triggering transform prior to GroupByKey.
at org.apache.beam.sdk.transforms.GroupByKey.applicableTo(GroupByKey.java:173)
at org.apache.beam.sdk.transforms.GroupByKey.expand(GroupByKey.java:204)
at org.apache.beam.sdk.transforms.GroupByKey.expand(GroupByKey.java:120)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
at org.apache.beam.sdk.transforms.Combine$PerKey.expand(Combine.java:1585)
at org.apache.beam.sdk.transforms.Combine$PerKey.expand(Combine.java:1470)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$WriteGrouped.expand(SpannerIO.java:868)
at org.apache.beam.sdk.io.gcp.spanner.SpannerIO$WriteGrouped.expand(SpannerIO.java:823)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:472)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:286)
at quantum.base.transform.entity.spanner.SpannerProtoWrite.expand(SpannerProtoWrite.java:52)
at quantum.base.transform.entity.spanner.SpannerProtoWrite.expand(SpannerProtoWrite.java:20)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
at quantum.entitybuilder.pipeline.EntityBuilderPipeline$Write$SpannerWrite.expand(EntityBuilderPipeline.java:388)
at quantum.entitybuilder.pipeline.EntityBuilderPipeline$Write$SpannerWrite.expand(EntityBuilderPipeline.java:372)
at org.apache.beam.sdk.Pipeline.applyInternal(Pipeline.java:537)
at org.apache.beam.sdk.Pipeline.applyTransform(Pipeline.java:491)
at org.apache.beam.sdk.values.PCollection.apply(PCollection.java:299)
at quantum.entitybuilder.pipeline.EntityBuilderPipeline.main(EntityBuilderPipeline.java:122)
:entityBuilder FAILED
Because of the error above I assume the input collection needs to be windowed and triggered, as SpannerIO uses a GroupByKey (this is also what I need for my use case):
...
.apply("1-minute windows", Window.<MutationGroup>into(FixedWindows.of(Duration.standardMinutes(1)))
    .triggering(Repeatedly.forever(AfterProcessingTime
        .pastFirstElementInPane()
        .plusDelayOf(Duration.standardSeconds(10))
    ).orFinally(AfterWatermark.pastEndOfWindow()))
    .discardingFiredPanes()
    .withAllowedLateness(Duration.ZERO))
.apply(SpannerIO.write()
    .withProjectId(entityConfig.getSpannerProject())
    .withInstanceId(entityConfig.getSpannerInstance())
    .withDatabaseId(entityConfig.getSpannerDb())
    .grouped());
When I do this, I get the following exceptions during runtime:
java.lang.IllegalArgumentException: Attempted to get side input window for GlobalWindow from non-global WindowFn
org.apache.beam.sdk.transforms.windowing.PartitioningWindowFn$1.getSideInputWindow(PartitioningWindowFn.java:49)
com.google.cloud.dataflow.worker.StreamingModeExecutionContext$StepContext.issueSideInputFetch(StreamingModeExecutionContext.java:631)
com.google.cloud.dataflow.worker.StreamingModeExecutionContext$UserStepContext.issueSideInputFetch(StreamingModeExecutionContext.java:683)
com.google.cloud.dataflow.worker.StreamingSideInputFetcher.storeIfBlocked(StreamingSideInputFetcher.java:182)
com.google.cloud.dataflow.worker.StreamingSideInputDoFnRunner.processElement(StreamingSideInputDoFnRunner.java:71)
com.google.cloud.dataflow.worker.SimpleParDoFn.processElement(SimpleParDoFn.java:323)
com.google.cloud.dataflow.worker.util.common.worker.ParDoOperation.process(ParDoOperation.java:43)
com.google.cloud.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:48)
com.google.cloud.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:271)
org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:219)
org.apache.beam.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:69)
org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:517)
org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:505)
org.apache.beam.sdk.values.ValueWithRecordId$StripIdsDoFn.processElement(ValueWithRecordId.java:145)
After investigating further it appears to be due to the .apply(Wait.on(input)) in SpannerIO: It has a global side input which does not seem to work with my fixed windows, as the docs of Wait.java state:
If signal is globally windowed, main input must also be. This typically would be useful only in a batch pipeline, because the global window of an infinite PCollection never closes, so the wait signal will never be ready.
As a temporary workaround I tried the following:
1. Add a GlobalWindow with triggers instead of fixed windows:
.apply("globalwindow", Window.<MutationGroup>into(new GlobalWindows())
.triggering(Repeatedly.forever(AfterProcessingTime
.pastFirstElementInPane()
.plusDelayOf(Duration.standardSeconds(10))
).orFinally(AfterWatermark.pastEndOfWindow()))
.discardingFiredPanes()
.withAllowedLateness(Duration.ZERO))
This results in writes to Spanner only when I drain my pipeline. I have the impression the Wait.on() signal is only triggered when the global window closes, and doesn't work with triggers.
2. Disable the .apply(Wait.on(input)) in SpannerIO:
This results in the pipeline getting stuck on the view creation, which is described in this SO post: SpannerIO Dataflow 2.3.0 stuck in CreateDataflowView.
When I check the worker logs for clues, I do get the following warnings:
logger: "org.apache.beam.sdk.coders.SerializableCoder"
message: "Can't verify serialized elements of type SpannerSchema have well defined equals method. This may produce incorrect results on some PipelineRunner
logger: "org.apache.beam.sdk.coders.SerializableCoder"
message: "Can't verify serialized elements of type BoundedSource have well defined equals method. This may produce incorrect results on some PipelineRunner"
Note that everything works with the DirectRunner and that I'm trying to use the DataflowRunner.
Does anyone have any other suggestions for things I can try to get this running? I can hardly imagine that I'm the only one trying to stream MutationGroups into Spanner.
Thanks in advance!
Currently, the SpannerIO connector is not supported with Beam streaming. Please follow this pull request, which adds streaming support for the Spanner IO connector.

Calling WSAGetLastError() from an IOCP thread returns incorrect result

I called WSARecv(), which returned WSA_IO_PENDING. I then sent an RST packet from the other end. The GetQueuedCompletionStatus() call, which lives in another thread, returned FALSE as expected, but when I called WSAGetLastError() I got 64 instead of WSAECONNRESET.
So why did WSAGetLastError() not return WSAECONNRESET?
Edit:
I forgot to mention that when I call WSAGetLastError() directly after a failing WSARecv() (because of an RST packet being received), the error code returned is WSAECONNRESET and not 64.
So it looks like the error code returned depends on whether WSARecv() has failed directly after calling it, or has failed later when retrieving a completion packet.
This is a generic issue with IOCP: you are making a low-level call to the TCP/IP driver stack which, as all drivers in Windows do, reports failure with NTSTATUS error codes. The expected error here is STATUS_CONNECTION_RESET.
These native error codes need to be translated to a winapi error code. This translation is normally context-sensitive: it depends on which winapi library issued the driver command. In other words, you can only ever get a WSAECONNRESET error back if it was the Winsock library that did the translation. But that's not what happened in your program; it was GetQueuedCompletionStatus() that handled the error.
That is a generic helper function that handles IOCP for any device driver. There is no context; the OVERLAPPED structure is not nearly enough to indicate how the I/O request got started. Turn to this KB article: it documents the default mapping from NTSTATUS error codes to winapi error codes, i.e. the mapping that GetQueuedCompletionStatus() uses. Relevant entries in the list are:
STATUS_NETWORK_NAME_DELETED     -> ERROR_NETNAME_DELETED
STATUS_LOCAL_DISCONNECT         -> ERROR_NETNAME_DELETED
STATUS_REMOTE_DISCONNECT        -> ERROR_NETNAME_DELETED
STATUS_ADDRESS_CLOSED           -> ERROR_NETNAME_DELETED
STATUS_CONNECTION_DISCONNECTED  -> ERROR_NETNAME_DELETED
STATUS_CONNECTION_RESET         -> ERROR_NETNAME_DELETED
These were, ahem, not fantastic choices. It probably goes back to very early Windows, back when Lanman was the network layer of choice. WSAGetLastError() is pretty powerless to map ERROR_NETNAME_DELETED back to a WSA-specific error; the NTSTATUS code was lost when GetQueuedCompletionStatus() set the "last error" code for the thread. So it doesn't; it just returns what it can.
What you'd expect is a WSAGetQueuedCompletionStatus() function so this error translation could happen correctly, using Winsock rules. There isn't one. These days I prefer to use the ultimate authority on how to write Windows code properly: the .NET Framework source, as available from the Reference Source. I linked to the source for the SocketAsyncEventArgs.CompletionCallback() method, which contains the key:
// The Async IO completed with a failure.
// here we need to call WSAGetOverlappedResult() just so Marshal.GetLastWin32Error() will return the correct error.
bool success = UnsafeNclNativeMethods.OSSOCK.WSAGetOverlappedResult(
    m_CurrentSocket.SafeHandle,
    m_PtrNativeOverlapped,
    out numBytes,
    false,
    out socketFlags);
socketError = (SocketError)Marshal.GetLastWin32Error();
Or in other words, you have to make an extra call to WSAGetOverlappedResult() to get the proper return value from GetLastError(). This is not very intuitive :)
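In plain Winsock terms, here is a minimal sketch of what that extra call looks like in the completion thread (assuming, purely for illustration, that the socket handle was registered as the completion key; adapt to however your per-handle data stores it):

    #include <winsock2.h>
    #include <windows.h>
    #include <stdio.h>

    void drain_one_completion(HANDLE iocp)
    {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* pOverlapped = NULL;

        BOOL ok = GetQueuedCompletionStatus(iocp, &bytes, &key, &pOverlapped, INFINITE);
        if (!ok && pOverlapped != NULL) {
            // GetLastError() currently holds the generic mapping,
            // e.g. 64 (ERROR_NETNAME_DELETED) after an RST.
            SOCKET s = (SOCKET)key;  // assumption: socket was used as the completion key

            DWORD transferred = 0, flags = 0;
            WSAGetOverlappedResult(s, (LPWSAOVERLAPPED)pOverlapped,
                                   &transferred, FALSE, &flags);

            int err = WSAGetLastError();  // now WSAECONNRESET (10054) for a reset connection
            printf("overlapped I/O failed, Winsock error %d\n", err);
        }
    }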

How to debug "could not receive data from client: Connection reset by peer"

I'm running a django-celery application on Ubuntu 12.04.
When I run a Celery task from my web interface, I get the following error, taken from the PostgreSQL 9.3 logfile (maximum log level):
2013-11-12 13:57:01 GMT tss_usr 8113 LOG: could not receive data from client: Connection reset by peer
tss_usr is the PostgreSQL user of the Django application database and (in this example) 8113 is the PID of the process that killed the connection, I guess.
Do you have any idea why this happens, or at least how to debug this issue?
To make things work again I need to restart PostgreSQL, which is extremely inconvenient.
I know this is an older post, but I just found it because I had the same error today in my postgres logs. I narrowed it down to a PDO select statement. I'm using Zend Framework 1.10.3 on Ubuntu Precise.
The following PDO statement generated an error if $opinion is a long text string. The column opinion is of type text in my Postgres table. The query succeeds if $opinion is under a certain number of characters: 1000 characters works fine; 2000 characters fails with "could not receive data from client: Connection reset by peer".
$select = $this->db->select()
    ->from('datauserstopics')
    ->where("opinion = ?", trim($opinion))
    ->where("datatopicsid = ?", trim($tid))
    ->where("datausersid = ?", $datausersid);
$stmt = $this->db->query($select);
I circumvented the problem by using:
->where("substr(opinion,1,100) = ?",trim(substr($opinion,1,100)))
This is not a perfect solution, but for my purposes, the select statement using substr() suffices.
Note that I have no problem inserting long strings into the same table/column. The disconnect problem only appears for me on the PDO select with relatively long text strings.
I'm getting it in 2017 with 9.4. I have no text fields and don't know what PDO is. My select statement is about 50 bytes long, and I'm trying to fetch an int4 and a double precision. I suspect the error message can mean multiple things.
I've since found https://dba.stackexchange.com/questions/142350/postgres-could-not-receive-data-from-client-connection-reset-by-peer which indicates it could be a problem with the client configuration. My client is libpq and PQconnectdb() is giving me a CONNECTION_OK return. It works, at least partly.
For me, restarting the hypervisor hosting both Postgres and the application using it helped. I had seen stack traces in dmesg before, though.