QProcess How to deal with too much input? - c++

I'm using 3 command-line tools via QProcesses to play music on my Linux (Mint) desktop via the Jack server. It's all working very well, but the input from one of the tools, 'jack_showtime', arrives at about 12,000 lines per second.
I only need to see one line every 0.1 seconds, but the only way I've found to get a full, recent line is something like:
j_s->readAll(); // j_s is the jack_showtime QProcess
waitAbit(20); // a 20 ms delay
QString aShowtimeLine = j_s->readLine();
aShowtimeLine = j_s->readLine();
What would be a better way to deal with so much unwanted input? It seems that without the readAll, a line will be much too old; without the delay, I get a blank line; and without the two readLines, I get part of a line.
I'd also be interested in a Bash script that could absorb most of the input, or similar.

I suggest something like this, such that no matter how fast or how slow you get input from the child process, you always use only the most recent value, every 100 ms:
// at startup or in your class constructor or wherever
connect(j_s, SIGNAL(readyRead()), this, SLOT(ReadDataFromJack()));
connect(&_myQTimer, SIGNAL(timeout()), this, SLOT(UseOneLine()));
_myQTimer.start(100);
void MyClass::ReadDataFromJack()
{
   while (j_s->canReadLine())
   {
      char buf[1024];
      qint64 bytesRead = j_s->readLine(buf, sizeof(buf));
      if ((bytesRead > 0) && (CanParseText(buf)))
      {
         this->_mostRecentResult = ParseText(buf);
      }
   }
}

void MyClass::UseOneLine()
{
   printf("100 ms have elapsed, time to use _mostRecentResult=%i for something!\n", this->_mostRecentResult);
}
(Note that CanParseText(buf) and ParseText(buf) above are imaginary placeholders for whatever code you use to parse ASCII text coming from your child process into data to be used by your program)

Bullseye! Thank you. You seem to know what I already had, so it was easy to add the bits I didn't have; mainly the limited-size buffer (as I've never seen a line longer than 79 characters, I reduced it to 100). I may have been on the right track, while looking for a script solution, when I tried to use 'stdbuf', but dealing with it all in my program is much better.
Lines received are easy to parse. I only want the first number (which can be as low as zero) from something like this:
frame = 293532731 frame_time = 114978548 usecs = 2421437949 state: Rolling
I use the following, which seems reasonably minimal:
QString recentLine = QString::fromLocal8Bit(buf);
recentLine.remove(0, 8);                                        // drop the leading "frame = "
recentLine.chop(recentLine.length() - recentLine.indexOf(" ")); // keep only the first number
int numSamples = recentLine.toInt();
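A shorter alternative, if the "frame = N ..." layout is as fixed as it looks, would be to take the third space-separated field directly:
int numSamples = QString::fromLocal8Bit(buf).section(' ', 2, 2).toInt(); // fields 0 and 1 are "frame" and "="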
I put a counter in the ReadDataFromJack() slot and see between 2,000 and 3,000 visits per 100 ms!
The number represents the position of the 'play head' in samples (at 48k per second) and won't exceed the integer range in Qt, but I see you use a qint64 (long long) in your example. Should I do the same for my number?
Sorry it's an answer (and a further question), but it's too long for a comment.

Related

Storing timepoints or durations

I want to make a simple editor to shift subtitle times around. A WebVTT subtitle is made of chunks like this:
1
00:02:15.000 --> 00:02:20.000
- Hello World!
So as you can see, there is a time at which the subtitle appears and a time at which it disappears. These times can also be clicked to jump to that specific point of the video file.
Now I want to create a simple application that can shift these times left or right by a given amount. What would be the best way to store these timepoints to make calculations and changes easy?
For example:
struct SubtitleElement {
    std::chrono< ?????? > begin; // What is a good candidate here?
    std::chrono< ?????? > end;   // What is a good candidate here?
    std::string text;
};
Later I want to have functions that operate on these elements. E.g.:
void shiftTime(SubtitleElement& element, int millisecs) {
    // reduce begin and end of the element by millisecs
}

DURATION getDuration(const SubtitleElement& element) {
    // return end - begin
}

DURATION totalDuration(const vector<SubtitleElement>& elements) {
    // sum all the durations of the elements in the vector
}
So what would be the cleanest and most modern way of doing this? It is also important that the string "hh:mm:ss.ZZZ" be easy to convert to that member. Please note that hh can be much more than 24, because it's an amount of time, not a time of day! E.g. a video file can be 120 hours long!
Since these are time points relative to the beginning of the video, not to some clock, I suggest keeping it simple and using std::chrono::milliseconds. They support all the operations you require, except that I'm not sure there is an existing implementation for parsing them from a string. But that should be very easy to build.
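A minimal sketch of what that could look like (the parsing helper is my own illustration, assuming well-formed "hh:mm:ss.ZZZ" input and omitting error handling):
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

struct SubtitleElement {
    std::chrono::milliseconds begin;
    std::chrono::milliseconds end;
    std::string text;
};

// A negative offset shifts left, a positive one shifts right.
void shiftTime(SubtitleElement& element, std::chrono::milliseconds offset) {
    element.begin += offset;
    element.end += offset;
}

std::chrono::milliseconds getDuration(const SubtitleElement& element) {
    return element.end - element.begin;
}

std::chrono::milliseconds totalDuration(const std::vector<SubtitleElement>& elements) {
    std::chrono::milliseconds total{0};
    for (const auto& e : elements)
        total += getDuration(e);
    return total;
}

// hh may be arbitrarily large, since it is an amount of time, not a time of day.
std::chrono::milliseconds parseTimepoint(const std::string& s) {
    long long h = 0; int m = 0, sec = 0, ms = 0;
    std::sscanf(s.c_str(), "%lld:%d:%d.%d", &h, &m, &sec, &ms);
    return std::chrono::hours(h) + std::chrono::minutes(m)
         + std::chrono::seconds(sec) + std::chrono::milliseconds(ms);
}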

c++ Print on same line

I am using a chrono to measure the time elapsed during a loop. I would like to display this time on every loop iteration.
I can do this:
for (int i = 0; i < 3; i++)
{
    sleep(2 secs);
    time = get_time();
    cout << "time is : " << time;
}
But I have the output:
time is : 2 time is : 4 time is : 6
I could add an endl to print them in a column, but that is not what I want. My loop runs about a million times, so I don't really want to print a million lines.
I would just like to print like:
time is : 2
and then refresh it to
time is : 4
and so on. Is there a way to do this?
You can use endl with clrscr().
Printing to terminals is very easy, but it can be extremely hard at the same time. At its core, a terminal is simply a file that you can write to or read from. Performing tasks such as changing the cursor's position is in fact system-specific, and your code will have to be platform-dependent.
But don't panic! People have done this before and have even written libraries for it. I think NCurses will do the job: https://www.gnu.org/software/ncurses/ncurses.html
I advise you to refer to this thread to see some issues related to your question: How to clear a specific line with NCurses?
I have never used ncurses myself, so I wish you the best of luck!
Enjoy programming!
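For what it's worth, a minimal sketch of the ncurses approach might look like this (untested on my side; assumes a POSIX system and linking with -lncurses):
#include <ncurses.h>
#include <unistd.h>

int main()
{
    initscr();                                 // enter curses mode
    for (int i = 1; i <= 3; i++) {
        sleep(2);
        mvprintw(0, 0, "time is : %d", 2 * i); // rewrite row 0, column 0 in place
        refresh();                             // push the change to the screen
    }
    endwin();                                  // restore the normal terminal
    return 0;
}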

How to correctly determine fastest CDN, mirror, download server in C++

The question I'm struggling with is how to determine, in C++, which server has the fastest connection for the client to git clone from or download a tarball from. So basically I want to choose, from a collection of known mirrors, the one that will be used for downloading content.
The following code I wrote demonstrates what I am trying to achieve more clearly, perhaps, but I believe it's not something one should use in production :).
So let's say I have two known source mirrors, git-1.example.com and git-2.example.com, and I want to download tag-x.tar.gz from the one the client has the best connectivity to.
CDN.h
#ifndef CDN_H
#define CDN_H

#include <iostream>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <netdb.h>
#include <arpa/inet.h>
#include <sys/time.h>
using namespace std;
class CDN {
public:
    long int dl_time;
    string host;
    string proto;
    string path;
    string dl_speed;
    double kbs;
    double mbs;
    double sec;
    long int ms;
    CDN(string, string, string);
    void get_download_speed();
    bool operator < (const CDN&) const;
};

#endif // CDN_H
CDN.cpp
#include "CND.h"
CDN::CDN(string protocol, string hostname, string downloadpath)
{
    proto = protocol;
    host = hostname;
    path = downloadpath;
    dl_time = ms = sec = mbs = kbs = 0;
    get_download_speed();
}
void CDN::get_download_speed()
{
    struct timeval dl_started;
    gettimeofday(&dl_started, NULL);
    long int download_start = ((unsigned long long) dl_started.tv_sec * 1000000) + dl_started.tv_usec;
    char buffer[256];
    char cmd_output[32];
    sprintf(buffer,"wget -O /dev/null --tries=1 --timeout=2 --no-dns-cache --no-cache %s://%s/%s 2>&1 | grep -o --color=never \"[0-9.]\\+ [KM]*B/s\"",proto.c_str(),host.c_str(),path.c_str());
    fflush(stdout);
    FILE *p = popen(buffer,"r");
    fgets(cmd_output, sizeof(cmd_output), p); // was sizeof(buffer), which could overflow cmd_output
    cmd_output[strcspn(cmd_output, "\n")] = 0;
    pclose(p);
    dl_speed = string(cmd_output);
    struct timeval download_ended;
    gettimeofday(&download_ended, NULL);
    long int download_end = ((unsigned long long)download_ended.tv_sec * 1000000) + download_ended.tv_usec;
    size_t output_type_k = dl_speed.find("KB/s");
    size_t output_type_m = dl_speed.find("MB/s");
    if (output_type_k != string::npos) {
        string dl_bytes = dl_speed.substr(0, output_type_k - 1);
        kbs = atof(dl_bytes.c_str());
        mbs = kbs / 1000;
    } else if (output_type_m != string::npos) {
        string dl_bytes = dl_speed.substr(0, output_type_m - 1);
        mbs = atof(dl_bytes.c_str());
        kbs = mbs * 1000;
    } else {
        cout << "Should catch the errors..." << endl;
    }
    dl_time = ms = download_end - download_start; // elapsed wall-clock time, in microseconds
    sec = ((double) ms) / 1000000.0;              // gettimeofday() counts microseconds, not CLOCKS_PER_SEC ticks
}
bool CDN::operator < (const CDN& other) const
{
    return dl_time < other.dl_time;
}
main.cpp
#include "CDN.h"
int main()
{
    cout << "Checking CDNs" << endl;
    char msg[256]; // large enough for the messages below
    CDN cdn_1 = CDN("http", "git-1.example.com", "test.txt");
    CDN cdn_2 = CDN("http", "git-2.example.com", "test.txt");
    if (cdn_1 < cdn_2) // only operator< is defined, so compare this way around
    {
        sprintf(msg, "Downloading tag-x.tar.gz from %s %s since it's faster than %s %s",
                cdn_1.host.c_str(), cdn_1.dl_speed.c_str(), cdn_2.host.c_str(), cdn_2.dl_speed.c_str());
        cout << msg << endl;
    }
    else
    {
        sprintf(msg, "Downloading tag-x.tar.gz from %s %s since it's faster than %s %s",
                cdn_2.host.c_str(), cdn_2.dl_speed.c_str(), cdn_1.host.c_str(), cdn_1.dl_speed.c_str());
        cout << msg << endl;
    }
    return 0;
}
So what are your thoughts, and how would you approach this? What are the alternatives that replace this wget call and achieve the same thing cleanly in C++?
EDIT:
As @molbdnilo correctly pointed out,
ping measures latency, but you're interested in throughput.
I therefore edited the demonstration code to reflect that; the question, however, remains the same.
For starters, trying to determine "fastest CDN mirror" is an inexact science. There is no universally accepted definition of what "fastest" means. The most one can hope for, here, is to choose a reasonable heuristic for what "fastest" means, and then measure this heuristic as precisely as can be under the circumstances.
In the code example here, the chosen heuristic seems to be how long it takes to download a sample file from each mirror via HTTP.
That's not such a bad choice to make, actually. You could reasonably make an argument that some other heuristic might be slightly better, but the basic test of how long it takes to transfer a sample file, from each candidate mirror, I would think is a very reasonable heuristic.
The big, big problem I see here is the actual implementation of this heuristic. The way that this attempt -- to time the sample download -- is made here does not appear to be very reliable, and it will end up measuring a whole bunch of unrelated factors that have nothing to do with network bandwidth.
I see several opportunities here where external factors completely unrelated to network throughput will muck up the measured timings, and make them less reliable than they should be.
So, let's take a look at the code, and see how it attempts to measure download speed. Here's the meat of it:
sprintf(buffer,"wget -O /dev/null --tries=1 --timeout=2 --no-dns-cache --no-cache %s://%s/%s 2>&1 | grep -o --color=never \"[0-9.]\\+ [KM]*B/s\"",proto.c_str(),host.c_str(),path.c_str());
fflush(stdout);
FILE *p = popen(buffer,"r");
fgets(cmd_output, sizeof(cmd_output), p);
cmd_output[strcspn(cmd_output, "\n")] = 0;
pclose(p);
... and gettimeofday() gets used to sample the system clock before and after, to figure out how long this took. Ok, that's great. But what would this actually measure?
It helps a lot here, to take a blank piece of paper, and just write down everything that happens here as part of the popen() call, step by step:
1) A new child process is fork()ed. The operating system kernel creates a new child process.
2) The new child process exec()s /bin/bash, or your default system shell, passing in a long string that starts with "wget", followed by a bunch of other parameters that you see above.
3) The operating system kernel loads "/bin/bash" as the new child process. The kernel loads and opens any and all shared libraries that the system shell normally needs to run.
4) The system shell process initializes. It reads the $HOME/.bashrc file and executes it, most likely, together with any standard shell initialization files and scripts that your system shell normally runs. That itself can create a bunch of new processes, which have to be initialized and executed, before the new system shell process actually gets around to...
5) ...parsing the "wget" command it originally received as an argument, and exec()uting it.
6) The operating system kernel now loads "wget" as the new child process. The kernel loads and opens any and all shared libraries that the wget process needs. Looking at my Linux box, "wget" loads no less than 25 separate shared libraries, including kerberos and ssl libraries. Each one of those shared libraries gets initialized.
7) The wget command performs a DNS lookup on the host, to obtain the IP address of the web server to connect to. If the local DNS server doesn't have the CDN mirror's hostname's IP address cached, it often takes several seconds to look up the CDN mirrors's DNS zone's authoritative DNS servers, then query them for the IP address, hopping this way and that way, across the intertubes.
Now, one moment... I seem to have forgotten what we were trying to do here... Oh, I remember: which CDN mirror is "fastest", by downloading a sample file from each mirror, right? Yeah, that must be it!
Now, what does all of the work done above, all of that work, have to do with determining which content mirror is the fastest???
Err... Not much, from the way it looks to me. Now, none of the above should really be such shocking news. After all, all of that is described in popen()'s manual page. If you read popen's manual page, it tells you that's ...what it does. Starts a new child process. Then executes the system shell, in order to execute the requested command. Etc, etc, etc...
Now, we're not talking about measuring time intervals that last many seconds, or minutes. If we're trying to measure something that takes a long time to execute, the relative overhead of popen()'s approach would be negligible, and not much to worry about. But the expected time to download the sample file, for the purpose of figuring out how fast each content mirror is -- I would expect that the actual download time would be relatively short. But it seems to me that the overhead to doing it this way, of forking an entirely new process, and executing first the system shell, then the wget command, with its massive list of dependencies, is going to be statistically significant.
And as I mentioned in the beginning, given that this is trying to determine the vaguely nebulous concept of "fastest mirror", which is already an inexact science -- it seems to me that you'd really want to get rid of as much utterly irrelevant overhead here -- as much as possible, in order to get as accurate of a result.
So, it seems to me that you don't really want to measure here anything other than what you're trying to measure: network bandwidth. And you certainly don't want to measure any of what transpires before any network activity takes place.
I still think that trying to time a sample download is a reasonable proposition. What's not reasonable here is all the popen and wget bloat. So, forget all of that. Throw it out the window. You want to measure how long it takes to download a sample file over HTTP, from each candidate mirror? Well, why don't you do just that?
1) Create a new socket().
2) Use getaddrinfo() to perform a DNS lookup, and obtain the candidate mirror's IP address.
3) connect() to the mirror's HTTP port.
4) Format the appropriate HTTP GET request, and send it to the server.
The above does pretty much what the popen/wget approach does, up to this point.
And only now I would start the clock running by grabbing the current gettimeofday(), then wait until I read the entire sample file from the socket, then grab the current gettimeofday() once more, to get the ending time of the transmission, and then calculate the actual time it took to receive the file from the mirror.
Only then will I have some reasonable confidence that I'm actually measuring the time it takes to receive a sample file from a CDN mirror, completely ignoring the time it takes to execute a bunch of unrelated processes; and then, by taking the same sample from multiple CDN mirrors, have any hope of picking one using as sensible a heuristic as possible.
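To make the above concrete, here is a rough sketch of that approach (a hypothetical helper of my own, POSIX sockets, plain HTTP only, no redirects or TLS, and with error handling kept to a minimum):
#include <netdb.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <string>

// Returns the transfer time in microseconds, or -1 on error.
long time_sample_download(const std::string &host, const std::string &path)
{
    addrinfo hints = {}, *res = nullptr;
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host.c_str(), "80", &hints, &res) != 0)
        return -1;

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
        freeaddrinfo(res);
        return -1;
    }
    freeaddrinfo(res);

    std::string req = "GET /" + path + " HTTP/1.1\r\nHost: " + host +
                      "\r\nConnection: close\r\n\r\n";
    send(fd, req.data(), req.size(), 0);

    // Start the clock only now, after DNS lookup and connection setup,
    // so none of that overhead pollutes the measurement.
    timeval start, end;
    gettimeofday(&start, nullptr);

    char buf[4096];
    while (recv(fd, buf, sizeof(buf), 0) > 0)
        ; // drain the entire response; only the elapsed time matters here

    gettimeofday(&end, nullptr);
    close(fd);

    return (end.tv_sec - start.tv_sec) * 1000000L + (end.tv_usec - start.tv_usec);
}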

Arduino substring doesn't work

I have a static method that searches String msg for the value between a TAG and returns it.
This is the function:
static String genericCutterMessage(String TAG, String msg) {
    Serial.print("a-----");
    Serial.println(msg);
    Serial.print("b-----");
    Serial.println(TAG);
    if (msg.indexOf(TAG) >= 0) {
        Serial.print("msg ");
        Serial.println(msg);
        int startTx = msg.indexOf(TAG) + 3;
        int endTx = msg.indexOf(TAG, startTx) - 2;
        Serial.print("startTx ");
        Serial.println(startTx);
        Serial.print("endTx ");
        Serial.println(endTx);
        String newMsg = msg.substring(startTx, endTx);
        Serial.print("d-----");
        Serial.println(newMsg);
        Serial.println("END");
        Serial.println(newMsg.length());
        return newMsg;
    } else {
        Serial.println("d-----TAG NOT FOUND");
        return "";
    }
}
and this is output
a-----[HS][TS]5132[/TS][TO]5000[/TO][/HS]
b-----HS
msg [HS][TS]5132[/TS][TO]5000[/TO][/HS]
startTx 4
endTx 30
d-----
END
0
fake -_-'....go on! <-- print out of genericCutterMessage
In that case I want to return the string between the HS tags, so my expected output is
[TS]5132[/TS][TO]5000[/TO]
but I don't know why I receive an empty string.
To understand how substring works, I just followed the tutorial on the official Arduino site:
http://www.arduino.cc/en/Tutorial/StringSubstring
I'm not an expert in C++ or Arduino, but this looks like a flushing or buffering problem, doesn't it?
Any idea?
Your code is correct; this should not happen. Which forces you to consider the unexpected ways that this could possibly fail. There is really only one candidate mishap I can think of: your Arduino is running out of RAM. It has very little; the Uno only has 2 kilobytes, for example. It doesn't take a lot of string munching to fill that up.
This is not reported in a smooth way. All I can do is point you to the relevant company page. Quoting:
If you run out of SRAM, your program may fail in unexpected ways; it will appear to upload successfully, but not run, or run strangely. To check if this is happening, you can try commenting out or shortening the strings or other data structures in your sketch (without changing the code). If it then runs successfully, you're probably running out of SRAM. There are a few things you can do to address this problem:
If your sketch talks to a program running on a (desktop/laptop) computer, you can try shifting data or calculations to the computer, reducing the load on the Arduino.
If you have lookup tables or other large arrays, use the smallest data type necessary to store the values you need; for example, an int takes up two bytes, while a byte uses only one (but can store a smaller range of values).
If you don't need to modify the strings or data while your sketch is running, you can store them in flash (program) memory instead of SRAM; to do this, use the PROGMEM keyword.
That's not very helpful in your specific case; you'll have to look at the rest of the program for candidates. Or upgrade your hardware. Stack Exchange has a dedicated site for Arduino enthusiasts, surely the best place to get advice.
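To illustrate the last tip from the quote: on Arduino, wrapping a string literal in the F() macro keeps it in flash instead of copying it into SRAM at startup, so debug prints like the ones in your sketch could be trimmed like this:
// String literals wrapped in F() stay in flash (PROGMEM) and are read
// out on demand, so they no longer occupy SRAM at run time.
Serial.print(F("startTx "));
Serial.println(startTx);
Serial.print(F("endTx "));
Serial.println(endTx);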

WxTextCtrl unable to load large texts

I've read about the solution written here in a post a year ago:
wx.TextCtrl.LoadFile()
Now I have a Windows application that generates color frequency statistics that are saved in 3D arrays. Here is part of my code; as you will see below, printing the statistics depends on a slider which specifies the threshold.
void Project1Frm::WxButton2Click(wxCommandEvent& event)
{
    char stat[32] = "";
    int ***report = pGLCanvas->GetPixel();
    float max = pGLCanvas->GetMaxval();
    float dist = WxSlider5->GetValue();
    WxRichTextCtrl1->Clear();
    WxRichTextCtrl1->SetMaxLength(100);
    if (dist > 0)
    {
        WxRichTextCtrl1->AppendText(wxT("Statistics\nR\tG\tB\t\n"));
        for (int m = 0; m < 256; m++) {
            for (int n = 0; n < 256; n++) {
                for (int o = 0; o < 256; o++) {
                    if ((report[m][n][o] / max) >= (dist / 100.0))
                    {
                        sprintf(stat, "%d\t%d\t%d\t%3.6f%%\n", m, n, o, report[m][n][o] / max * 100.0);
                        WxRichTextCtrl1->AppendText(wxT(stat));
                    }
                }
            }
        }
    }
    else if (dist == 0) WxRichTextCtrl1->LoadFile("histodata.txt");
}
The solution I've tried so far is that when I am to print all the statistics, I get them from a text file rather than going through the 3D array. I would like to ask whether the Python implementation of the segmenting can be ported to C++, or whether there are better ways to deal with this problem. Thank you.
EDIT:
Another reason why I used a text file instead is that I observed that whenever I do the sprintf alone [with the line WxRichTextCtrl1->AppendText(wxT(stat)); commented out], the computer starts to slow down.
-Ric
Disclaimer: My answer is more of an alternative than a solution.
I don't believe that there's any situation in which a user of this application is going to find it useful to have a scrolled text window containing ~16 million lines of numbers. It would be impossible for the user to easily scroll to the one specific location in the list that they might need to see. This is all assuming that every single number you output here has some significance to the user, of course (you are showing them on the screen for a reason). Providing the user with controls to look up specific, fixed (reasonable) ranges of those numbers would be a better solution, not only in regards to a better user experience, but also in helping to resolve your issue here.
On the other hand, if you still insist on one single window containing all 64 million numbers, you seem to have a very rigid data structure here, which means you can (and should) take advantage of using a virtual grid control (wxGrid), which is intended to work smoothly even with incredibly large data sets like this. The user will likely find this control easier to read and find the section of data they are looking for.
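A rough sketch of the virtual-table idea (my own illustration; the class and member names are made up, and error handling is omitted): derive from wxGridTableBase and compute each cell on demand, so nothing is stored per row:
// Virtual table: wxGrid asks for cell values lazily as they scroll into view.
class ColorStatsTable : public wxGridTableBase {
public:
    ColorStatsTable(int ***report, float maxval)
        : m_report(report), m_max(maxval) {}

    int GetNumberRows() { return 256 * 256 * 256; } // one row per (R,G,B) triple
    int GetNumberCols() { return 4; }               // R, G, B, frequency %

    wxString GetValue(int row, int col) {
        int r = (row >> 16) & 0xFF, g = (row >> 8) & 0xFF, b = row & 0xFF;
        switch (col) {
            case 0:  return wxString::Format(wxT("%d"), r);
            case 1:  return wxString::Format(wxT("%d"), g);
            case 2:  return wxString::Format(wxT("%d"), b);
            default: return wxString::Format(wxT("%3.6f%%"),
                            m_report[r][g][b] / m_max * 100.0);
        }
    }
    void SetValue(int, int, const wxString&) {} // read-only

private:
    int ***m_report;
    float m_max;
};

// usage: grid->SetTable(new ColorStatsTable(report, max), true /* take ownership */);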