Check if data arrived OK

I am using boost::asio (TCP) to send and receive fixed-size (100-byte) data from one PC to another. What's the best way to check that everything arrived OK without impacting performance?
One idea is to take the first and last characters and put them at the front, so "hello my...battle in the end" would become "hd hello my...battle in the end". The final string would be 102 characters, and the receiver could also perform a size check.
Another idea is to use a hash, but I guess this would be very CPU-intensive.
Do you have any good ideas?
NOTE: Please keep in mind that I will use this millions of times; every microsecond counts.
The data are words separated by spaces.

TCP is designed to be a reliable transmission protocol. Since you say you're using TCP, you can simply assume that if the data arrived and is of the full length, it arrived correctly.
If you're worried about data being corrupted in transmission beyond what TCP's 16-bit checksum can detect, you might add a 32-bit CRC to the end of your data.
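A minimal sketch of the CRC-32 trailer idea, assuming the common reflected polynomial 0xEDB88320 (the variant zlib uses) and a hypothetical frame layout of the 100 payload bytes followed by the CRC in little-endian order. It is computed bit by bit for brevity; a table-driven version is faster if those microseconds matter. `make_frame` and `frame_ok` are illustrative names, not Boost.Asio API; the resulting 104-byte frame would be sent and received with the usual `boost::asio::write`/`boost::asio::read` calls.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib).
// A lookup-table version is faster; this one keeps the sketch short.
std::uint32_t crc32(const unsigned char* data, std::size_t len) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

constexpr std::size_t kPayload = 100;         // fixed message size from the question
constexpr std::size_t kFrame = kPayload + 4;  // payload + 4-byte CRC trailer (assumed layout)

// Sender: copy the payload and append its CRC in little-endian byte order.
std::array<unsigned char, kFrame> make_frame(const std::array<unsigned char, kPayload>& payload) {
    std::array<unsigned char, kFrame> frame{};
    for (std::size_t i = 0; i < kPayload; ++i) frame[i] = payload[i];
    const std::uint32_t crc = crc32(payload.data(), kPayload);
    for (int b = 0; b < 4; ++b) frame[kPayload + b] = static_cast<unsigned char>(crc >> (8 * b));
    return frame;
}

// Receiver: recompute the CRC over the payload and compare it with the trailer.
bool frame_ok(const std::array<unsigned char, kFrame>& frame) {
    std::uint32_t stored = 0;
    for (int b = 0; b < 4; ++b) stored |= std::uint32_t(frame[kPayload + b]) << (8 * b);
    return crc32(frame.data(), kPayload) == stored;
}
```

The receiver reads exactly 104 bytes and discards any frame for which `frame_ok` returns false.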

Related

Most efficient way to use AWS SQS (with Golang)

When using AWS SQS (Simple Queue Service) you pay for each request you make to the service (push, pull, ...). Each message you send to a queue can be at most 256 KB.
To save money, I'd like to buffer messages sent to my Go application before sending them out to SQS, until I have enough data to use the 256 KB limit efficiently.
Since my Go application is a web server, my current idea is to use a mutex-protected string and append messages until the next one would exceed the 256 KB limit, and then issue the SQS push. To save even more space, I could gzip every single message before appending it to the string.
I wonder if there is some kind of gzip stream that I could use for this. My assumption is that gzipping all concatenated messages together will result in a smaller size than gzipping every message before appending it to the string. One way would be to gzip the string after every append to check its size, but that might be very slow.
Is there a better way? Or is there an altogether better approach involving channels? I'm still new to Go, I have to admit.

I'd take the following approach:
1. Use a channel to pass incoming "internal" messages to a goroutine.
2. In that goroutine, keep the messages in a "raw" format, so 10 messages are 10 raw, uncompressed items.
3. Each time a new raw item arrives, compress all the raw messages into one. If the size with the new message is > 256 KB, compress the messages EXCEPT the last one and push them to SQS.
This is computationally expensive: each individual message causes a full compression of all pending messages. However, it uses SQS efficiently.
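A C++ sketch of the recompress-everything approach above (the question is about Go, but the structure carries over to a goroutine fed by a channel). `Compressor` is a placeholder for whatever gzip/zlib wrapper you use; it is an assumption, not a specific library call. As in the answer, every arrival recompresses the whole pending batch.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Batches raw messages and recompresses the whole pending batch on each arrival.
class RecompressingBatcher {
public:
    // Turns the raw batch into one payload (hypothetical: e.g. a gzip wrapper).
    using Compressor = std::function<std::string(const std::vector<std::string>&)>;

    RecompressingBatcher(std::size_t limit, Compressor compress)
        : limit_(limit), compress_(std::move(compress)) {}

    // Adds a message; returns a payload that is ready to push, or an empty string.
    std::string add(std::string msg) {
        pending_.push_back(std::move(msg));
        // A single message is never split, so only check once there are two or more.
        if (pending_.size() < 2 || compress_(pending_).size() <= limit_) return {};
        // Too big with the new message: compress everything EXCEPT the last one.
        std::string last = std::move(pending_.back());
        pending_.pop_back();
        std::string ready = compress_(pending_);
        pending_.clear();
        pending_.push_back(std::move(last));
        return ready;
    }

    // Compresses whatever is left (e.g. on shutdown or on a timer).
    std::string flush() {
        if (pending_.empty()) return {};
        std::string ready = compress_(pending_);
        pending_.clear();
        return ready;
    }

private:
    std::size_t limit_;
    Compressor compress_;
    std::vector<std::string> pending_;
};
```

A single message that already exceeds the limit on its own is not split here; handling that case is left to the caller.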

You could guesstimate the size of the gzipped messages and calculate whether you've reached the maximum size threshold. Keep a message size counter, and for every new message increment the counter by its expected compressed size. Do the actual compression and send to SQS only when the counter would exceed 256 KB. That way you avoid compressing every time a new message comes in.
For a use case like this, running a few tests on a sample set of messages should give you the rough compression ratio to expect.
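A sketch of the estimate-instead-of-compress approach, again in C++. `ratio` is the compressed/raw size ratio you would measure on sample messages as suggested; the class name and its methods are hypothetical. Because the counter is only an estimate, it is sensible to leave some headroom below 256 KB.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Tracks an *estimated* compressed size instead of compressing on every message.
class EstimatingBatcher {
public:
    // ratio: measured compressed/raw size ratio (an assumption you calibrate).
    EstimatingBatcher(std::size_t limit, double ratio) : limit_(limit), ratio_(ratio) {}

    // True when the pending batch should be compressed and sent *before* adding msg.
    bool would_overflow(const std::string& msg) const {
        return !pending_.empty() &&
               estimate_ + msg.size() * ratio_ > static_cast<double>(limit_);
    }

    void add(std::string msg) {
        estimate_ += msg.size() * ratio_;
        pending_.push_back(std::move(msg));
    }

    // Hands over the pending batch (compress it for real, then send) and resets the counter.
    std::vector<std::string> take() {
        std::vector<std::string> out;
        out.swap(pending_);
        estimate_ = 0.0;
        return out;
    }

    double estimate() const { return estimate_; }

private:
    std::size_t limit_;
    double ratio_;
    double estimate_ = 0.0;
    std::vector<std::string> pending_;
};
```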

Before you get focused on compression, eliminate redundant data that is known on both sides. This is what encodings like msgpack, protobuf, AVRO, and so on do.
Let's say all of your messages are a struct like this:
type Foo struct {
	bar string
	qux int
}
and you were thinking of encoding it as JSON. Then the most efficient you could do is:
{"bar":"whatever","qux":123}
If you wanted to just append all of those together in memory, you might get something like this:
{"bar":"whatever","qux":123}{"bar":"hello world","qux":0}{"bar":"answer to life, the universe, and everything","qux":42}{"bar":"nice","qux":69}
A really good compression algorithm might look at hundreds of those messages and identify the repetitiveness of {"bar":" and ","qux":.
But compression has to do work to figure that out from your data each time.
If the receiving code already knows what "schema" (the {"bar": some_string, "qux": some_int} "shape" of your data) each message has, then you can just serialize the messages like this:
"whatever"123"hello world"0"answer to life, the universe, and everything"42"nice"69
Note that in this example encoding, you can't just start in the middle of the data and unambiguously find your place. If you have a bunch of messages such as {"bar":"1","qux":2}, {"bar":"2","qux":3}, {"bar":"3","qux":4}, then the encoding will produce: "1"2"2"3"3"4, and you can't just start in the middle and know for sure if you're looking at a number or a string - you have to count from the ends. Whether or not this matters will depend on your use case.
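A small C++ sketch of the schema-implied encoding from the example, with a C++ mirror of `Foo` and a hypothetical helper `encode`. It assumes `bar` never contains a double quote, which is exactly the kind of escaping question the following paragraph raises.

```cpp
#include <string>
#include <vector>

// C++ mirror of the Foo struct from the answer.
struct Foo {
    std::string bar;
    int qux;
};

// Schema-implied encoding: each message becomes "<bar>"<qux>, with no field names.
// Assumes bar never contains a double quote (no escaping in this sketch).
std::string encode(const std::vector<Foo>& messages) {
    std::string out;
    for (const Foo& m : messages) {
        out += '"';
        out += m.bar;
        out += '"';
        out += std::to_string(m.qux);
    }
    return out;
}
```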
You can come up with other simple schemes that are more unambiguous or make the code for writing or reading messages easier or simpler, like using a field separator or message separator character which is escaped in your encoding of the other data (just like how \ and " would be escaped in quoted JSON strings).
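One possible scheme along those lines, sketched in C++: a tab between fields, a newline between messages, and a backslash in front of any of those three characters when they occur inside a field. The separator choices are assumptions; tab and newline happen to be in the character range SQS accepts (quoted below). Fields are kept as strings to keep the sketch short.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical separators: tab between fields, newline between records, backslash escapes.
constexpr char kField = '\t', kRecord = '\n', kEscape = '\\';

// Copies a field, putting a backslash in front of any separator or backslash.
void append_escaped(std::string& out, const std::string& field) {
    for (char c : field) {
        if (c == kField || c == kRecord || c == kEscape) out += kEscape;
        out += c;
    }
}

std::string encode_records(const std::vector<std::vector<std::string>>& records) {
    std::string out;
    for (const auto& rec : records) {
        for (std::size_t i = 0; i < rec.size(); ++i) {
            if (i != 0) out += kField;
            append_escaped(out, rec[i]);
        }
        out += kRecord;
    }
    return out;
}

// Decodes the format above; the character after a backslash is taken literally.
std::vector<std::vector<std::string>> decode_records(const std::string& in) {
    std::vector<std::vector<std::string>> records;
    std::vector<std::string> rec(1);
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        if (c == kEscape && i + 1 < in.size()) {
            rec.back() += in[++i];
        } else if (c == kField) {
            rec.emplace_back();
        } else if (c == kRecord) {
            records.push_back(std::move(rec));
            rec.assign(1, std::string());
        } else {
            rec.back() += c;
        }
    }
    return records;
}
```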
If you can't have the receiver just know/hardcode the expected message schema - if you need the full flexibility of something like JSON and you always unmarshal into something like a map[string]interface{} or whatever - then you should consider using something like BSON.
Of course, you can't use msgpack, protobuf, AVRO, or BSON directly - they need a medium that allows arbitrary bytes like 0x0. And according to the AWS SQS FAQ:
Q: What kind of data can I include in a message?
Amazon SQS messages can contain up to 256 KB of text data, including XML, JSON and unformatted text. The following Unicode characters are accepted:
#x9 | #xA | #xD | [#x20 to #xD7FF] | [#xE000 to #xFFFD] | [#x10000 to #x10FFFF]
So if you want to aim for maximum space efficiency for your exact use case, you'd have to write your own code that uses the techniques from those encoding schemes but only emits bytes that are allowed in SQS messages.
Relatedly, if you have a lot of integers, and you know most of them are small (or clump around a certain spot of the number line, so that by adding a constant offset to all of them you can make most of them small), you can use one of the variable length quantity techniques to encode all of those integers. In fact several of those common encoding schemes mentioned above use variable length quantities in their encoding of integers. If you use a "piece size" of six (6) bits (instead of the standard implicitly assumed piece size of eight (8) bits to match a byte) then you can use base64. Not full base64 encoding, because the padding will completely defeat the purpose - just map from the 64 possible values that fit in six bits to the 64 distinct ASCII characters that base64 uses.
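A sketch of the six-bit variable-length quantity idea in C++: each base64-alphabet character carries five data bits plus a continuation bit, least significant piece first. The bit layout (continuation flag in the value-32 bit) is one possible choice, not a standard format; values 0 to 31 take a single character.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 alphabet; each character stands for one 6-bit piece (0..63).
const std::string kAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// 5 data bits per piece; bit 5 (value 32) set means "more pieces follow".
void encode_vlq6(std::uint64_t value, std::string& out) {
    do {
        unsigned piece = static_cast<unsigned>(value & 31u);
        value >>= 5;
        if (value != 0) piece |= 32u;  // continuation flag
        out += kAlphabet[piece];
    } while (value != 0);
}

// Decodes all values packed back to back in `in` (assumes well-formed input).
std::vector<std::uint64_t> decode_vlq6(const std::string& in) {
    std::vector<std::uint64_t> values;
    std::uint64_t value = 0;
    unsigned shift = 0;
    for (char c : in) {
        const auto piece = static_cast<std::uint64_t>(kAlphabet.find(c));
        value |= (piece & 31u) << shift;
        shift += 5;
        if ((piece & 32u) == 0) {  // last piece of this value
            values.push_back(value);
            value = 0;
            shift = 0;
        }
    }
    return values;
}
```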
Anyway, unless you know your data has a lot of repetition (but not the kind that you can simply not send, like the same field names in every message), I would start with all of that, and only then would I look at compression.
Even so, if you want minimal size, I would aim for LZMA, and if you want minimal computing overhead, I would use LZ4. Gzip is not bad per se (if it's much easier to use gzip, then just use it), but if you're optimizing for either size or speed, there are better options. I don't know if gzip is even a good "middle ground" of speed, output size and working-memory size; it's pretty old, and there may be compression algorithms by now that are strictly superior in speed, output size and memory use. I think gzip, depending on the implementation, also includes headers and framing information (like version metadata, size, checksums, and so on), which you probably don't want if you really need to minimize size, and which you probably don't need in the context of SQS messages.

how to get txPower to calculate distance from RSSI

I got this code from Google Code:
void QBluetoothDeviceDiscoveryAgent::deviceDiscovered(const QBluetoothDeviceInfo &info)
QBluetoothDeviceInfo::rssi().
But how do I get the RSSI (and from it the distance) via QBluetoothServiceDiscoveryAgent?
I tried with:
QBluetoothServiceInfo serviceInfo;
qint16 i = serviceInfo.device().rssi();
here i = -43
How do I convert it to distance?
I found this link:
Understanding ibeacon distancing
but how do I get the transmitter power needed to calculate the distance with that formula?

Make sure you understand the implications of QBluetoothDeviceInfo::rssi(). Calling this function returns immediately with the last value stored when the device was last scanned. If you receive only one advertisement packet, which happens to be at e.g. -90 dBm, and then immediately connect, this function will keep returning -90 until you disconnect from the device and scan it again. Connected devices usually don't send advertisement packets, so the RSSI you can read via Qt won't be updated during the connection.
As for proximity, it's not so easy to get good values. To accurately convert from RSSI to geometric distance, you must know the sender's original/intended signal strength (or TX power level == RSSI at 1 m distance). This value differs between devices. To make things worse, in practice it can also vary by a huge margin depending on things like the sender's battery level, the physical orientation of sender and receiver relative to each other, the quality of individual parts, random interference from other RF devices, and so on.
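For reference, the formula usually used for this is the log-distance path-loss model, d = 10^((txPower - rssi) / (10 n)), where txPower is the RSSI at 1 m as described above and n is an environment-dependent exponent (about 2 in free space, usually larger indoors). This is a sketch under those assumptions; both parameters have to be calibrated, and the caveats above still apply.

```cpp
#include <cmath>

// Rough distance in metres from the log-distance path-loss model.
// txPowerAt1m: RSSI (dBm) measured 1 m from this particular sender.
// n: path-loss exponent, ~2 in free space, typically larger indoors (calibrate it).
double estimate_distance_m(double rssi, double txPowerAt1m, double n = 2.0) {
    return std::pow(10.0, (txPowerAt1m - rssi) / (10.0 * n));
}
```

With the question's reading of -43 and a hypothetical txPower of -59 dBm, the model gives about 0.16 m; shifting txPower by a few dB changes the estimate noticeably, which is why calibration matters.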
The BLE folks have a blog explaining how you should do it. You can read it here. The linked article doesn't read or assume the theoretical maximum RSSI of the sender; instead it proposes gathering multiple RSSI values over time (plus some mean/mode filtering) and comparing the current mean value with the previous one to determine whether you are approaching or moving away from the sender. Paired with some fine-tuning using real-world data you have to collect, plus documentation reading and common sense, you could probably develop a proximity calculation for many or even most sender devices that would be accurate to about one meter or even less at close proximity. In the end it's a tradeoff between how many devices you wish to 'calibrate' for and how many you are okay with having shifted values due to higher or lower TX power levels.
The downside is that you can't test every possible device on the market, and as I said earlier, different devices have different TX power levels. With this approach you can develop an algorithm that gets pretty good measurements for devices with approximately equal signal configurations, but others will seem far off. The article's author talks about creating different profiles for different vendors, but that's not really going to help: consider two otherwise identical beacons ("big" and "small"), one for large and one for small indoor locations. With RSSI alone you can't reliably determine whether you're close to the small beacon or at medium range from the big one, unless they identify themselves via GAP or otherwise (forget MAC addresses if you plan to deploy on macOS or iOS).
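A minimal C++ sketch of the averaging approach the article proposes: collect a block of RSSI samples, average it, and compare the mean with the previous block's mean. Only mean filtering is shown (no mode filtering), and the block size and dB threshold are hypothetical tuning parameters you would fit to real-world data.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Averages RSSI in blocks of `window` samples and compares each block mean
// with the previous one to guess whether we are approaching the sender.
class RssiTrend {
public:
    enum class Trend { Unknown, Approaching, MovingAway, Steady };

    RssiTrend(std::size_t window, double threshold_db)
        : window_(window), threshold_(threshold_db) {}

    Trend add(double rssi) {
        samples_.push_back(rssi);
        if (samples_.size() < window_) return Trend::Unknown;  // block not complete yet

        const double mean =
            std::accumulate(samples_.begin(), samples_.end(), 0.0) / samples_.size();
        samples_.clear();

        Trend t = Trend::Unknown;  // first complete block has nothing to compare with
        if (have_prev_) {
            if (mean > prev_mean_ + threshold_) t = Trend::Approaching;       // stronger signal
            else if (mean < prev_mean_ - threshold_) t = Trend::MovingAway;   // weaker signal
            else t = Trend::Steady;
        }
        prev_mean_ = mean;
        have_prev_ = true;
        return t;
    }

private:
    std::size_t window_;
    double threshold_;
    std::vector<double> samples_;
    double prev_mean_ = 0.0;
    bool have_prev_ = false;
};
```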
Also, prepare yourself for the joyride that is Android BLE development. Some vendors know their BLE implementation is so terribly bad and broken that they even disabled the HCI logging feature on all their ROMs to hide it. Others can be BLE-nuked, like Win98 over Ethernet back in the day.

Fastest way to transfer large arrays via sockets

I am trying to transfer a large amount of data (long int arrays) from multiple (8) remote computers to a single computer (the main process). All of them are connected via a 100 Mbps LAN and are identical machines (so no worries about endianness).
Each remote machine has an 8 GB array of long ints that I have to transmit to the single computer for processing. My question is: what is the best way to transfer these arrays quickly to the main process? I tried using plain TCP for this job, and transferring the data takes a lot of time (about 28 minutes). Is there any way to speed this up? Will switching to UDP help? Will using multiple ports/sockets help with buffering? What's the best approach to such problems?
I probably cannot compress the data (as most of it is unique), and I need to send everything (as I carry out important operations in the main process).

First, upgrade your hardware. With a 1 Gbit NIC (or 10 Gbit if you have the budget) and a decent switch, you get a 10x boost with no coding: transferring 8 GB of data takes only about one minute. Push it further with NIC bonding and you double the speed again, to about 30 seconds (or roughly 60 times faster than yours).
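The arithmetic behind those figures, assuming decimal gigabytes and ignoring protocol overhead (real TCP throughput will be somewhat lower):

```latex
t_{1\,\mathrm{Gbit/s}} = \frac{8\,\mathrm{GB} \times 8\,\mathrm{bit/B}}{1\,\mathrm{Gbit/s}}
  = \frac{64\,\mathrm{Gbit}}{1\,\mathrm{Gbit/s}} = 64\,\mathrm{s},
\qquad
t_{2 \times 1\,\mathrm{Gbit/s}} = \frac{64\,\mathrm{Gbit}}{2\,\mathrm{Gbit/s}} = 32\,\mathrm{s},
\qquad
t_{100\,\mathrm{Mbit/s}} = \frac{64\,\mathrm{Gbit}}{0.1\,\mathrm{Gbit/s}} = 640\,\mathrm{s} \approx 10.7\,\mathrm{min}
```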
Next, adjust your algorithm: do you need to send the whole 8 GB of data frequently? Can you pipeline it, do it in a streaming way, or send only diffs (replication), so that you get good data-processing throughput?
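A sketch of the "send only diffs" idea for `long` arrays: the sender compares the new array with the copy the receiver already has and ships only (index, value) pairs. The function names are hypothetical. Each pair costs more than a raw element, so this pays off only when a small fraction of the array changes between transfers.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Change = std::pair<std::size_t, long>;  // (index, new value)

// Sender: list the elements that differ from the receiver's copy.
std::vector<Change> diff(const std::vector<long>& old_data, const std::vector<long>& new_data) {
    std::vector<Change> changes;
    for (std::size_t i = 0; i < new_data.size(); ++i)
        if (i >= old_data.size() || old_data[i] != new_data[i])
            changes.emplace_back(i, new_data[i]);
    return changes;
}

// Receiver: resize to the new length and patch the changed elements.
void apply_diff(std::vector<long>& data, std::size_t new_size, const std::vector<Change>& changes) {
    data.resize(new_size);
    for (const auto& [index, value] : changes) data[index] = value;
}
```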
The last thing you can do is compression, and it's better to do it in chunks so that you don't compress the whole 8 GB at once.

You can try to compress your array. There are several algorithms you can find, and this post may help you. It explains the three best-known lossless algorithms:
1. Huffman coding: a tree-based algorithm with many applications and specializations
2. RLE (run-length encoding): well suited to compressing icons
3. LZ77: uses a dictionary and is the basis of many other algorithms
A lossless algorithm is what you need, because you don't want to lose any data in your array. That's also why I wouldn't recommend UDP: it does not guarantee that the data is received.
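A C++ sketch of run-length encoding (item 2 above) for a `long` array, storing (value, run length) pairs. The pair layout and 32-bit run counter are assumptions. Since the question says most values are unique, RLE would likely make this particular data larger rather than smaller; it only helps when equal values sit next to each other.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Run = std::pair<long, std::uint32_t>;  // (value, how many times it repeats)

std::vector<Run> rle_encode(const std::vector<long>& in) {
    std::vector<Run> runs;
    for (long v : in) {
        // Extend the current run if the value repeats and the counter has room.
        if (!runs.empty() && runs.back().first == v && runs.back().second < UINT32_MAX)
            ++runs.back().second;
        else
            runs.emplace_back(v, 1u);
    }
    return runs;
}

std::vector<long> rle_decode(const std::vector<Run>& runs) {
    std::vector<long> out;
    for (const auto& [value, count] : runs) out.insert(out.end(), count, value);
    return out;
}
```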

Merging multiple video files into a single mpeg-ts file "on the fly"

First of all, sorry for my poor English. I am writing a video streaming server in C++. I have multiple MPEG-2 TS files (movies and advertisements) which I need to stream via HTTP as one single TS file. The problem is that every MPEG-TS file has its own timestamps (PCR, PTS, DTS). And, as I understand it, to make a continuous stream, every new PCR (PTS, DTS) value should continue from the last PCR (PTS, DTS) value.
Here is a picture to better illustrate what I mean: http://i.stack.imgur.com/vL1m6.png (I can't include my picture directly in the message. Sorry.)
I need to replace the pcr`1, pcr`2, pcr`3 timestamps with new ones. For example, I sent a TS packet containing the pcr3 timestamp, and after a few more TS packets (not containing any PCR value) I want to insert my advertisement. My question is: how do I calculate the new values for pcr`1, pcr`2, pcr`3 and so on?
Is it correct to calculate the bitrate of the current video and then divide the number of bits that the program has sent since the last PCR timestamp (in our case, pcr3) by this bitrate? I mean the following: (new timestamp) = (previous timestamp) + (number of bits) / (bitrate). Or is there a more efficient way to do it?
As for PTS and DTS timestamps, I read here that these timestamps can be non-linear. Would it be correct to calculate them relative to the last original PCR that I received? I mean:
pts_new = (original_pts - last_original_pcr) + pcr_new.
dts_new = (original_dts - last_original_pcr) + pcr_new.
(original_pts - last_original_pcr) is the difference between pts and pcr values
pcr_new is the last modified pcr value
My program can read and edit these timestamps in mpeg-ts stream. Fortunately, there's a lot of literature on how to do it. But how do I calculate the new values for these timestamps?
I have just started learning the MPEG-2 TS specification, so please correct me if I am wrong about anything.
Thanks in advance. Hope you understood me.

MPEG-2 "splicing" is an art form, and is much more complicated than concatenating two streams. It requires manipulations which many companies have patented (http://www.google.com/patents/US6380991, http://www.google.com/patents/US6806909, http://www.google.com/patents/US6993081).
To answer some of your questions:
Your calculation of the next PCR looks OK, although you need to take many compliance issues into account (ETR 290, for example).
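A sketch of the question's PCR formula in the units the PCR field actually uses: a 27 MHz count, stored as a 33-bit base in 90 kHz units times 300 plus a 9-bit extension (0 to 299), wrapping at 2^33 * 300. The function names are hypothetical, the integer division truncates, and none of the ETR 290 compliance checks mentioned above are covered.

```cpp
#include <cstdint>

// PCR in 27 MHz ticks = base (33 bits, 90 kHz) * 300 + extension (0..299).
constexpr std::uint64_t kPcrWrap = (std::uint64_t{1} << 33) * 300;

// new_pcr = last_pcr + bits_sent / bitrate, converted to 27 MHz ticks and wrapped.
// bits_sent: bits transmitted since the PCR-carrying packet; bitrate in bit/s.
std::uint64_t next_pcr(std::uint64_t last_pcr, std::uint64_t bits_sent, std::uint64_t bitrate) {
    const std::uint64_t delta = bits_sent * 27'000'000ULL / bitrate;
    return (last_pcr + delta) % kPcrWrap;
}

// Split a PCR value into the two header fields.
constexpr std::uint64_t pcr_base(std::uint64_t pcr) { return pcr / 300; }
constexpr std::uint32_t pcr_ext(std::uint64_t pcr) { return static_cast<std::uint32_t>(pcr % 300); }
```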
For DTS/PTS you have much more work to do. The most basic splice will just restamp the ad's PTS/DTS in such a way that they continue from the last timestamp of the first TS:
ad first timestamp = last timestamp + frame interval
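The basic restamp can be sketched as a single offset added to every PTS and DTS of the ad. PTS/DTS are 33-bit values in 90 kHz units, so the arithmetic wraps modulo 2^33; the frame interval is 90000 / fps (3600 at 25 fps). The function names are hypothetical, and this only covers the "most basic splice"; it does nothing about the buffer-model issues described next.

```cpp
#include <cstdint>

// PTS/DTS are 33-bit counters in 90 kHz units: all arithmetic is modulo 2^33.
constexpr std::uint64_t kTsMask = (std::uint64_t{1} << 33) - 1;

// Offset so that: ad first timestamp (new) = last timestamp + frame interval.
// Unsigned 64-bit wraparound is harmless because 2^33 divides 2^64.
std::uint64_t restamp_offset(std::uint64_t last_ts, std::uint64_t frame_interval,
                             std::uint64_t ad_first_ts) {
    return (last_ts + frame_interval - ad_first_ts) & kTsMask;
}

// Apply the same offset to every PTS and DTS of the ad.
std::uint64_t restamp(std::uint64_t ts, std::uint64_t offset) {
    return (ts + offset) & kTsMask;
}
```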
The trick lies in making sure that you have no "holes" in either the presentation timestamps or the decoding timestamps. This is the difficult part and requires a deep understanding of the MPEG-2 buffer model (T-STD, EB, MB).
Good luck.

Real time plotting/data logging

I'm going to write a program that plots data from a sensor connected to the computer. The sensor value is going to be plotted as a function of the time (sensor value on the y-axis, time on the x-axis). I want to be able to add new values to the plot in real time. What would be best to do this with in C++?
Edit: And by the way, the program will be running on a Linux machine.

Are you particularly concerned about the C++ aspect? I've handled data at around 10 Hz without breaking a sweat, either by putting gnuplot into a read/plot/refresh loop or with LiveGraph.
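A sketch of the gnuplot read/plot/refresh loop driven from C++: each refresh sends a `plot '-'` command followed by the data points and a terminating `e`, written to a pipe opened with POSIX `popen("gnuplot", "w")`. The helper names are hypothetical.

```cpp
#include <cstdio>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Builds one gnuplot frame using inline data ('-'), terminated by "e".
std::string gnuplot_frame(const std::vector<std::pair<double, double>>& points) {
    std::ostringstream cmd;
    cmd << "plot '-' using 1:2 with lines title 'sensor'\n";
    for (const auto& [t, v] : points) cmd << t << ' ' << v << '\n';
    cmd << "e\n";
    return cmd.str();
}

// Writes a frame to a gnuplot process (e.g. a FILE* returned by popen("gnuplot", "w")).
void send_frame(std::FILE* gp, const std::vector<std::pair<double, double>>& points) {
    const std::string frame = gnuplot_frame(points);
    std::fputs(frame.c_str(), gp);
    std::fflush(gp);  // redraw now rather than when the stdio buffer fills
}
```

In the acquisition loop you would read a sample, append it to your data, call `send_frame`, and `pclose` the pipe on exit.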

Write a function that can plot a std::deque in a way you like, then .push_back() values from the sensor onto the queue as they come available, and .pop_front() values from the queue if it becomes too long for nice plotting.
The exact nature of your plotting function depends on your platform, needs, sense of esthetics, etc.
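A minimal version of that in C++, using a `std::deque` capped at a fixed number of points; `PlotWindow` is a hypothetical name, and the actual drawing is left to whatever plotting function you choose.

```cpp
#include <cstddef>
#include <deque>

// Keeps at most `max_points` of the most recent samples for plotting.
template <typename T>
class PlotWindow {
public:
    explicit PlotWindow(std::size_t max_points) : max_points_(max_points) {}

    void add(const T& sample) {
        data_.push_back(sample);                               // newest at the back
        while (data_.size() > max_points_) data_.pop_front();  // drop the oldest
    }

    const std::deque<T>& data() const { return data_; }

private:
    std::size_t max_points_;
    std::deque<T> data_;
};
```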

You can use a ring buffer. Such a buffer has a read position and a write position, so one thread can write to the buffer while another reads from it and plots a graph. For efficiency you usually end up writing your own framework.
The size of such a buffer can be estimated from, e.g., the data delivery rate of the sensor (40 kHz?), the size of one sample, and the time span you would like to keep for plotting.
It also depends on whether you would like to store the data uncompressed, or store the rendered plot, for later offline analysis. In a non-RTOS environment your "real time" depends on processing speed: how fast you can retrieve, store, process and plot the data. Usually the result is near-real-time.
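A sketch of a single-producer/single-consumer ring buffer in C++ with `std::atomic` read and write positions, plus the sizing estimate from the answer. The 40 kHz rate and 2-second span are hypothetical numbers; one slot is kept empty to distinguish a full buffer from an empty one.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// SPSC ring buffer: the sensor thread calls push(), the plotting thread calls pop().
// Holds N - 1 items; the spare slot tells "full" apart from "empty".
template <typename T, std::size_t N>
class SpscRing {
public:
    bool push(const T& value) {
        const std::size_t w = write_.load(std::memory_order_relaxed);
        const std::size_t next = (w + 1) % N;
        if (next == read_.load(std::memory_order_acquire)) return false;  // full
        buf_[w] = value;
        write_.store(next, std::memory_order_release);  // publish the new item
        return true;
    }

    std::optional<T> pop() {
        const std::size_t r = read_.load(std::memory_order_relaxed);
        if (r == write_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = buf_[r];
        read_.store((r + 1) % N, std::memory_order_release);  // free the slot
        return value;
    }

private:
    std::array<T, N> buf_{};
    std::atomic<std::size_t> write_{0};
    std::atomic<std::size_t> read_{0};
};

// Sizing: samples per second x seconds to keep, +1 for the spare slot.
constexpr std::size_t kSampleRateHz = 40'000;  // hypothetical sensor rate
constexpr std::size_t kSpanSeconds = 2;        // hypothetical plot history
constexpr std::size_t kCapacity = kSampleRateHz * kSpanSeconds + 1;
```

A `SpscRing<float, kCapacity>` holds about 320 KB of samples, so it is better placed in static storage or on the heap than on a thread's stack.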

You might want to check out RRDtool to see whether it meets your requirements.
RRDtool is a high performance data logging and graphing system for time series data.

I did a similar thing for a device that had a permeability sensor attached via RS232:
package the bytes received from the sensor into packets
use a collection (mainly a list) to store them
prevent the collection from growing beyond a fixed size by trashing the least recent values before new ones arrive
find a suitable graphics library to draw with (maybe SDL if you want to keep it easy and cross-platform), but this choice depends on what kind of graph you need (ncurses may be enough)
last but not least: since you are using a sensor, I suppose your approach will be multi-threaded, so think about that and use a synchronized collection or a collection that allows adding values while other threads are retrieving them (so forget iterators; maybe an array is enough)
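A C++ sketch of the packet and collection steps above: raw bytes from the serial port are grouped into packets, the oldest packets are dropped beyond a fixed count, and the drawing thread takes a locked copy instead of iterating over the live collection. The 4-byte packet size is a placeholder for whatever framing the sensor really uses, and reading from RS232 itself is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

constexpr std::size_t kPacketSize = 4;  // hypothetical; use the sensor's real framing
using Packet = std::vector<std::uint8_t>;

// Thread-safe, size-bounded packet store.
class PacketHistory {
public:
    explicit PacketHistory(std::size_t max_packets) : max_(max_packets) {}

    // Reader thread: groups raw bytes into packets; leftover bytes wait for the next call.
    void feed(const std::uint8_t* bytes, std::size_t n) {
        std::lock_guard<std::mutex> lock(mu_);
        for (std::size_t i = 0; i < n; ++i) {
            partial_.push_back(bytes[i]);
            if (partial_.size() == kPacketSize) {
                packets_.push_back(std::move(partial_));
                partial_.clear();
                if (packets_.size() > max_) packets_.pop_front();  // trash the oldest
            }
        }
    }

    // Drawing thread: gets a copy, so no iterator into the live collection escapes the lock.
    std::vector<Packet> snapshot() const {
        std::lock_guard<std::mutex> lock(mu_);
        return std::vector<Packet>(packets_.begin(), packets_.end());
    }

private:
    std::size_t max_;
    mutable std::mutex mu_;
    Packet partial_;
    std::deque<Packet> packets_;
};
```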
By the way, there are lots of libraries for this; just search for them.

I assume that you will deploy this application on an RTOS. But what will the data rate be, and what are the real-time requirements? As written above, a simple solution may be more than enough. But if you have hard real-time constraints, everything changes drastically. A multi-threaded design with data pipes may solve your real-time problems.
