Related
I am writing a code to capture serial readings from the Arduino to C++
Is there a way to capture the readings line by line and then store it into an array? I have read another post similar to mine, but I am still unable to apply it.
Any help is greatly appreciated, thank you.
Environment setup:
Arduino UNO
ADXL 335 accelerometer
Ubuntu 16.04
C++
[Updated] applied solution from Bart
Cpp file
The reason why I added the "for-loop with print and break" is to analyze the array contents.
#include <stdio.h>
#include <string.h>
#include <iostream>
#include <unistd.h>
using namespace std;
char serialPortFilename[] = "/dev/ttyACM0";
int main()
{
char readBuffer[1024];
FILE *serPort = fopen(serialPortFilename, "r");
if (serPort == NULL)
{
printf("ERROR");
return 0;
}
while(1)
{
usleep(1000); //sync up Linux and Arduino
memset(readBuffer, 0, 1024);
fread(readBuffer, sizeof(char),1024,serPort);
for(int i=0; i<1024; i++){
printf("%c",readBuffer[i]);
}
break;
}
return 0;
}
Ino file
Fetching data from the Accelerometer
#include <stdio.h>
const int xPin = A0;
const int yPin = A1;
const int zPin = A2;
void setup() {
Serial.begin(9600);
}
void loop() {
int x = 0, y = 0, z = 0;
x = analogRead(xPin);
y = analogRead(yPin);
z = analogRead(zPin);
char buffer[16];
int n;
n = sprintf(buffer,"<%d,%d,%d>",x,y,z);
Serial.write(buffer);
}
Results
Running the code for three times
Click Here
The ideal outputs should be
<a,b,c><a,b,c><a,b,c>...
but right now, some of the outputs has the values inside "corrupted" (please see the fourth line from the top).
Even if use the start and end markers to determine a correct dataset, the data within the set is still wrong. I suspect the issue lies with the char array from C++, due to it being unsynchronized with Arduino. Else I need to send by Bytes from Arduino (not really sure how)
When dealing with two programs running on different processors they will never start sending/receiving at the same time. What you likely see is not that the results are merged wrong it is more likely the reading program started and stopped half way through the data.
When sending data over a line it is best that you:
On the Arduino:
First frame the data.
Send the frame.
On Linux:
Read in data in a buffer.
Search the buffer for a complete frame and deframe.
1. Framing the data
With framing the data I mean that you need a structure which you can recognize and validate on the receiving side. For example you could add the characters STX and ETX as control characters around your data. When the length of your data varies it is also required to send this.
In the following example we take that the data array is never longer than 255 bytes. This means that you can store the length in a single byte. Below you see pseudo code of how a frame could look like:
STX LENGTH DATA_ARRAY ETX
The total length of the bytes which will be send are thus the length of the data plus three.
2. Sending
Next you do not use println but Serial.write(buf, len) instead.
3. Receiving
On the receiving side you have a buffer in which all data received will be appended.
4. Deframing
Next each time new data has been added search for an STX character, assume the next character is the length. Using the length +1 you should find a ETX. If so you have found a valid frame and you can use the data. Next remove it from the buffer.
for(uint32_t i = 0; i < (buffer.size() - 2); ++i)
{
if(STX == buffer[i])
{
uint8_t length = buffer[i+2];
if(buffer.size() > (i + length + 3) && (ETX == buffer[i + length + 2]))
{
// Do something with the data.
// Clear the buffer from every thing before i + length + 3
buffer.clear(0, i + length + 3);
// Break the loop as by clearing the data the current index becomes invalid.
break;
}
}
}
For an example also using a Cyclic Redundancy Check (CRC) see here
I have a program that generates files containing random distributions of the character A - Z. I have written a method that reads these files (and counts each character) using fread with different buffer sizes in an attempt to determine the optimal block size for reads. Here is the method:
int get_histogram(FILE * fp, long *hist, int block_size, long *milliseconds, long *filelen)
{
char *buffer = new char[block_size];
bzero(buffer, block_size);
struct timeb t;
ftime(&t);
long start_in_ms = t.time * 1000 + t.millitm;
size_t bytes_read = 0;
while (!feof(fp))
{
bytes_read += fread(buffer, 1, block_size, fp);
if (ferror (fp))
{
return -1;
}
int i;
for (i = 0; i < block_size; i++)
{
int j;
for (j = 0; j < 26; j++)
{
if (buffer[i] == 'A' + j)
{
hist[j]++;
}
}
}
}
ftime(&t);
long end_in_ms = t.time * 1000 + t.millitm;
*milliseconds = end_in_ms - start_in_ms;
*filelen = bytes_read;
return 0;
}
However, when I plot bytes/second vs. block size (buffer size) using block sizes of 2 - 2^20, I get an optimal block size of 4 bytes -- which just can't be correct. Something must be wrong with my code but I can't find it.
Any advice is appreciated.
Regards.
EDIT:
The point of this exercise is to demonstrate the optimal buffer size by recording the read times (plus computation time) for different buffer sizes. The file pointer is opened and closed by the calling code.
There are many bugs in this code:
It uses new[], which is C++.
It doesn't free the allocated memory.
It always loops over block_size bytes of input, not bytes_read as returned by fread().
Also, the actual histogram code is rather inefficient, since it seems to loop over each character to determine which character it is.
UPDATE: Removed claim that using feof() before I/O is wrong, since that wasn't true. Thanks to Eric for pointing this out in a comment.
You're not stating what platform you're running this on, and what compile time parameters you use.
Of course, the fread() involves some overhead, leaving user mode and returning. On the other hand, instead of setting the hist[] information directly, you're looping through the alphabet. This is unnecessary and, without optimization, causes some overhead per byte.
I'd re-test this with hist[j-26]++ or something similar.
Typically, the best timing would be achieved if your buffer size equals the system's buffer size for the given media.
I just ran into a free(): invalid next size (fast) problem while writing a C++ program. And I failed to figure out why this could happen unfortunately. The code is given below.
bool not_corrupt(struct packet *pkt, int size)
{
if (!size) return false;
bool result = true;
char *exp_checksum = (char*)malloc(size * sizeof(char));
char *rec_checksum = (char*)malloc(size * sizeof(char));
char *rec_data = (char*)malloc(size * sizeof(char));
//memcpy(rec_checksum, pkt->data+HEADER_SIZE+SEQ_SIZE+DATA_SIZE, size);
//memcpy(rec_data, pkt->data+HEADER_SIZE+SEQ_SIZE, size);
for (int i = 0; i < size; i++) {
rec_checksum[i] = pkt->data[HEADER_SIZE+SEQ_SIZE+DATA_SIZE+i];
rec_data[i] = pkt->data[HEADER_SIZE+SEQ_SIZE+i];
}
do_checksum(exp_checksum, rec_data, DATA_SIZE);
for (int i = 0; i < size; i++) {
if (exp_checksum[i] != rec_checksum[i]) {
result = false;
break;
}
}
free(exp_checksum);
free(rec_checksum);
free(rec_data);
return result;
}
The macros used are:
#define RDT_PKTSIZE 128
#define SEQ_SIZE 4
#define HEADER_SIZE 1
#define DATA_SIZE ((RDT_PKTSIZE - HEADER_SIZE - SEQ_SIZE) / 2)
The struct used is:
struct packet {
char data[RDT_PKTSIZE];
};
This piece of code doesn't go wrong every time. It would crash with the free(): invalid next size (fast) sometimes in the free(exp_checksum); part.
What's even worse is that sometimes what's in rec_checksum stuff is just not equal to what's in pkt->data[HEADER_SIZE+SEQ_SIZE+DATA_SIZE] stuff, which should be the same according to the watch expressions from my debugging tools. Both memcpy and for methods are used but this problem remains.
I don't quite understand why this would happen. I would be very thankful if anyone could explain this to me.
Edit:
Here's the do_checksum() method, which is very simple:
void do_checksum(char* checksum, char* data, int size)
{
for (int i = 0; i < size; i++)
{
checksum[i] = ~data[i];
}
}
Edit 2:
Thanks for all.
I switched other part of my code from the usage of STL queue to STL vector, the results turn to be cool then.
But still I didn't figure out why. I am sure that I would never pop an empty queue.
The error you report is indicative of heap corruption. These can be hard to track down and tools like valgrind can be extremely helpful. Heap corruptions are often hard to debug with a simple debugger because the runtime error often occurs long after the actual corruption.
That said, the most obvious potential cause of your heap corruption, given the code posted so far, is if DATA_SIZE is greater than size. If that occurs then do_checksum will write beyond the end of exp_checksum.
Three immediate suggestions:
Check for size <= 0 (instead of "!size")
Check for size >= DATA_SIZE
Check for malloc returning NULL
Have you tried Valgrind?
Also, make sure to never send more than RDT_PKTSIZE as size to not_corrupt()
bool not_corrupt(struct packet *pkt, int size)
{
if (!size) return false;
if (size > RDT_PKTSIZE) return false;
/* ... */
Valgrind is good ... but validating all your inputs and checking all error conditions is even better.
Stepping through the code in the debugger isn't a bad idea, either.
I would also call "do_checksum (size)" (your actual size), instead of DATA_SIZE (presumably "maximum size").
DATA_SIZE is a macro defined the max length in my program so the size
should be less than DATA_SIZE
even if that is true, your logic only creates enough memory to hold size characters.
so you should call
do_checksum(exp_checksum, rec_data, size);
and, if you do not want to use std::string (which is fine), you should switch from malloc/free to new/delete when talking C++
Attention please:
I already implemented this stuff, just not in any way generic or elegant. This question is motivated by my wanting to learn more tricks with the stl, not the problem itself.
This I think is clear in the way I stated that I already solved the problem, but many people have answered in their best intentions with solutions to the problem, not answers to the question "how to solve this the stl way". I am really sorry if I phrased this question in a confusing way. I hate to waste people's time.
Ok, here it comes:
I get a string full of encoded Data.
It comes in:
N >> 64 bytes
every 3 byte get decoded into an int value
after at most 64 byte (yes,not divisible by 3!) comes a byte as checksum
followed by a line feed.
and so it goes on.
It ends when 2 successive linefeeds are found.
It looks like a nice or at least ok data format, but parsing it elegantly
the stl way is a real bit**.
I have done the thing "manually".
But I would be interested if there is an elegant way with the stl- or maybe boost- magic that doesn't incorporate copying the thing.
Clarification:
It gets really big sometimes. The N >> 64byte was more like a N >>> 64 byte ;-)
UPDATE
Ok, the N>64 bytes seems to be confusing. It is not important.
The sensor takes M measurements as integers. Encodes each of them into 3 bytes. and sends them one after another
when the sensor has sent 64byte of data, it inserts a checksum over the 64 byte and an LF. It doesn't care if one of the encoded integers is "broken up" by that. It just continues in the next line.(That has only the effect to make the data nicely human readable but kindof nasty to parse elegantly.)
if it has finished sending data it inserts a checksum-byte and LFLF
So one data chunk can look like this, for N=129=43x3:
|<--64byte-data-->|1byte checksum|LF
|<--64byte-data-->|1byte checksum|LF
|<--1byte-data-->|1byte checksum|LF
LF
When I have M=22 measurements, this means I have N=66 bytes of data.
After 64 byte it inserts the checksum and LF and continues.
This way it breaks up my last measurement
which is encoded in byte 64, 65 and 66. It now looks like this: 64, checksum, LF, 65, 66.
Since a multiple of 3 divided by 64 carries a residue 2 out of 3 times, and everytime
another one, it is nasty to parse.
I had 2 solutions:
check checksum, concatenate data to one string that only has data bytes, decode.
run through with iterators and one nasty if construct to avoid copying.
I just thought there might be someting better. I mused about std::transform, but it wouldn't work because of the 3 byte is one int thing.
As much as I like STL, I don't think there's anything wrong with doing things manually, especially if the problem does not really fall into the cases the STL has been made for. Then again, I'm not sure why you ask. Maybe you need an STL input iterator that (checks and) discards the check sums and LF characters and emits the integers?
I assume the encoding is such that LF can only appear at those places, i.e., some kind of Base-64 or similar?
It seems to me that something as simple as the following should solve the problem:
string line;
while( getline( input, line ) && line != "" ) {
int val = atoi( line.substr(0, 3 ).c_str() );
string data = line.substr( 3, line.size() - 4 );
char csum = line[ line.size() - 1 ];
// process val, data and csum
}
In a real implementation you would want to add error checking, but the basic logic should remain the same.
As others have said, there is no silver bullet in stl/boost to elegantly solve your problem. If you want to parse your chunk directly via pointer arithmetic, perhaps you can take inspiration from std::iostream and hide the messy pointer arithmetic in a custom stream class. Here's a half-arsed solution I came up with:
#include <cctype>
#include <iostream>
#include <vector>
#include <boost/lexical_cast.hpp>
class Stream
{
public:
enum StateFlags
{
goodbit = 0,
eofbit = 1 << 0, // End of input packet
failbit = 1 << 1 // Corrupt packet
};
Stream() : state_(failbit), csum_(0), pos_(0), end_(0) {}
Stream(char* begin, char* end) {open(begin, end);}
void open(char* begin, char* end)
{state_=goodbit; csum_=0; pos_=begin, end_=end;}
StateFlags rdstate() const {return static_cast<StateFlags>(state_);}
bool good() const {return state_ == goodbit;}
bool fail() const {return (state_ & failbit) != 0;}
bool eof() const {return (state_ & eofbit) != 0;}
Stream& read(int& measurement)
{
measurement = readDigit() * 100;
measurement += readDigit() * 10;
measurement += readDigit();
return *this;
}
private:
int readDigit()
{
int digit = 0;
// Check if we are at end of packet
if (pos_ == end_) {state_ |= eofbit; return 0;}
/* We should be at least csum|lf|lf away from end, and we are
not expecting csum or lf here. */
if (pos_+3 >= end_ || pos_[0] == '\n' || pos_[1] == '\n')
{
state_ |= failbit;
return 0;
}
if (!getDigit(digit)) {return 0;}
csum_ = (csum_ + digit) % 10;
++pos_;
// If we are at checksum, check and consume it, along with linefeed
if (pos_[1] == '\n')
{
int checksum = 0;
if (!getDigit(checksum) || (checksum != csum_)) {state_ |= failbit;}
csum_ = 0;
pos_ += 2;
// If there is a second linefeed, we are at end of packet
if (*pos_ == '\n') {pos_ = end_;}
}
return digit;
}
bool getDigit(int& digit)
{
bool success = std::isdigit(*pos_);
if (success)
digit = boost::lexical_cast<int>(*pos_);
else
state_ |= failbit;
return success;
}
int csum_;
unsigned int state_;
char* pos_;
char* end_;
};
int main()
{
// Use (8-byte + csum + LF) fragments for this example
char data[] = "\
001002003\n\
300400502\n\
060070081\n\n";
std::vector<int> measurements;
Stream s(data, data + sizeof(data));
int meas = 0;
while (s.read(meas).good())
{
measurements.push_back(meas);
std::cout << meas << " ";
}
return 0;
}
Maybe you'll want to add extra StateFlags to determine if failure is due to checksum error or framing error. Hope this helps.
You should think of your communication protocol as being layered. Treat
|<--64byte-data-->|1byte checksum|LF
as fragments to be reassembled into larger packets of contiguous data. Once the larger packet is reconstituted, it is easier to parse its data contiguously (you don't have to deal with measurements being split up across fragments). Many existing network protocols (such as UDP/IP) does this sort of reassembly of fragments into packets.
It's possible to read the fragments directly into their proper "slot" in the packet buffer. Since your fragments have footers instead of headers, and there is no out-of-order arrival of your fragments, this should be fairly easy to code (compared to copyless IP reassembly algorithms). Once you receive an "empty" fragment (the duplicate LF), this marks the end of the packet.
Here is some sample code to illustrate the idea:
#include <vector>
#include <cassert>
class Reassembler
{
public:
// Constructs reassembler with given packet buffer capacity
Reassembler(int capacity) : buf_(capacity) {reset();}
// Returns bytes remaining in packet buffer
int remaining() const {return buf_.end() - pos_;}
// Returns a pointer to where the next fragment should be read
char* back() {return &*pos_;}
// Advances the packet's position cursor for the next fragment
void push(int size) {pos_ += size; if (size == 0) complete_ = true;}
// Returns true if an empty fragment was pushed to indicate end of packet
bool isComplete() const {return complete_;}
// Resets the reassembler so it can process a new packet
void reset() {pos_ = buf_.begin(); complete_ = false;}
// Returns a pointer to the accumulated packet data
char* data() {return &buf_[0];}
// Returns the size in bytes of the accumulated packet data
int size() const {return pos_ - buf_.begin();}
private:
std::vector<char> buf_;
std::vector<char>::iterator pos_;
bool complete_;
};
int readFragment(char* dest, int maxBytes, char delimiter)
{
// Read next fragment from source and save to dest pointer
// Return number of bytes in fragment, except delimiter character
}
bool verifyChecksum(char* fragPtr, int size)
{
// Returns true if fragment checksum is valid
}
void processPacket(char* data, int size)
{
// Extract measurements which are now stored contiguously in packet
}
int main()
{
const int kChecksumSize = 1;
Reassembler reasm(1000); // Use realistic capacity here
while (true)
{
while (!reasm.isComplete())
{
char* fragDest = reasm.back();
int fragSize = readFragment(fragDest, reasm.remaining(), '\n');
if (fragSize > 1)
assert(verifyChecksum(fragDest, fragSize));
reasm.push(fragSize - kChecksumSize);
}
processPacket(reasm.data(), reasm.size());
reasm.reset();
}
}
The trick will be making an efficient readFragment function that stops at every newline delimiter and stores the incoming data into the given destination buffer pointer. If you tell me how you acquire your sensor data, then I can perhaps give you more ideas.
An elegant solution this isn't. It would be more so by using a "transition matrix", and only reading one character at a time. Not my style. Yet this code has a minimum of redundant data movement, and it seems to do the job. Minimally C++, it really is just a C program. Adding iterators is left as an exercise for the reader. The data stream wasn't completely defined, and there was no defined destination for the converted data. Assumptions noted in comments. Lots of printing should show functionality.
// convert series of 3 ASCII decimal digits to binary
// there is a checksum byte at least once every 64 bytes - it can split a digit series
// if the interval is less than 64 bytes, it must be followd by LF (to identify it)
// if the interval is a full 64 bytes, the checksum may or may not be followed by LF
// checksum restricted to a simple sum modulo 10 to keep ASCII format
// checksum computations are only printed to allowed continuation of demo, and so results can be
// inserted back in data for testing
// there is no verification of the 3 byte sets of digits
// results are just printed, non-zero return indicates error
int readData(void) {
int binValue = 0, digitNdx = 0, sensorCnt = 0, lineCnt = 0;
char oneDigit;
string sensorTxt;
while( getline( cin, sensorTxt ) ) {
int i, restart = 0, checkSum = 0, size = sensorTxt.size()-1;
if(size < 0)
break;
lineCnt++;
if(sensorTxt[0] == '#')
continue;
printf("INPUT: %s\n", &sensorTxt[0]); // gag
while(restart<size) {
for(i=0; i<min(64, size); i++) {
oneDigit = sensorTxt[i+restart] & 0xF;
checkSum += oneDigit;
binValue = binValue*10 + oneDigit;
//printf("%3d-%X ", binValue, sensorTxt[i+restart]);
digitNdx++;
if(digitNdx == 3) {
sensorCnt++;
printf("READING# %d (LINE %d) = %d CKSUM %d\n",
sensorCnt, lineCnt, binValue, checkSum);
digitNdx = 0;
binValue = 0;
}
}
oneDigit = sensorTxt[i+restart] & 0x0F;
char compCheckDigit = (10-(checkSum%10)) % 10;
printf(" CKSUM at sensorCnt %d ", sensorCnt);
if((checkSum+oneDigit) % 10)
printf("ERR got %c exp %c\n", oneDigit|0x30, compCheckDigit|0x30);
else
printf("OK\n");
i++;
restart += i;
}
}
if(digitNdx)
return -2;
else
return 0;
}
The data definition was extended with comments, you you can use the following as is:
# normal 64 byte lines with 3 digit value split across lines
00100200300400500600700800901001101201301401501601701801902002105
22023024025026027028029030031032033034035036037038039040041042046
# short lines, partial values - remove checksum digit to combine short lines
30449
0451
0460479
0480490500510520530540550560570580590600610620630641
# long line with embedded checksums every 64 bytes
001002003004005006007008009010011012013014015016017018019020021052202302402502602702802903003103203303403503603703803904004104204630440450460470480490500510520530540550560570580590600610620630640
# dangling digit at end of file (with OK checksum)
37
Why are you concerned with copying? Is it the time overhead or the space overhead?
It sounds like you are reading all of the unparsed data into a big buffer, and now you want to write code that makes the big buffer of unparsed data look like a slightly smaller buffer of parsed data (the data minus the checksums and linefeeds), to avoid the space overhead involved in copying it into the slightly smaller buffer.
Adding a complicated layer of abstraction isn't going to help with the time overhead unless you only need a small portion of the data. And if that's the case, maybe you could just figure out which small portion you need, and copy that. (Then again, most of the abstraction layer may be already written for you, e.g. the Boost iterator library.)
An easier way to reduce the space overhead is to read the data in smaller chunks (or a line at a time), and parse it as you go. Then you only need to store the parsed version in a big buffer. (This assumes you're reading it from a file / socket / port, rather than being passed a large buffer that is not under your control.)
Another way to reduce the space overhead is to overwrite the data in place as you parse it. You will incur the cost of copying it, but you will only need one buffer. (This assumes that the 64-byte data doesn't grow when you parse it, i.e. it is not compressed.)
here's a problem I've solved from a programming problem website(codechef.com in case anyone doesn't want to see this solution before trying themselves). This solved the problem in about 5.43 seconds with the test data, others have solved this same problem with the same test data in 0.14 seconds but with much more complex code. Can anyone point out specific areas of my code where I am losing performance? I'm still learning C++ so I know there are a million ways I could solve this problem, but I'd like to know if I can improve my own solution with some subtle changes rather than rewrite the whole thing. Or if there are any relatively simple solutions which are comparable in length but would perform better than mine I'd be interested to see them also.
Please keep in mind I'm learning C++ so my goal here is to improve the code I understand, not just to be given a perfect solution.
Thanks
Problem:
The purpose of this problem is to verify whether the method you are using to read input data is sufficiently fast to handle problems branded with the enormous Input/Output warning. You are expected to be able to process at least 2.5MB of input data per second at runtime. Time limit to process the test data is 8 seconds.
The input begins with two positive integers n k (n, k<=10^7). The next n lines of input contain one positive integer ti, not greater than 10^9, each.
Output
Write a single integer to output, denoting how many integers ti are divisible by k.
Example
Input:
7 3
1
51
966369
7
9
999996
11
Output:
4
Solution:
#include <iostream>
#include <stdio.h>
using namespace std;
int main(){
//n is number of integers to perform calculation on
//k is the divisor
//inputnum is the number to be divided by k
//total is the total number of inputnums divisible by k
int n,k,inputnum,total;
//initialize total to zero
total=0;
//read in n and k from stdin
scanf("%i%i",&n,&k);
//loop n times and if k divides into n, increment total
for (n; n>0; n--)
{
scanf("%i",&inputnum);
if(inputnum % k==0) total += 1;
}
//output value of total
printf("%i",total);
return 0;
}
The speed is not being determined by the computation—most of the time the program takes to run is consumed by i/o.
Add setvbuf calls before the first scanf for a significant improvement:
setvbuf(stdin, NULL, _IOFBF, 32768);
setvbuf(stdout, NULL, _IOFBF, 32768);
-- edit --
The alleged magic numbers are the new buffer size. By default, FILE uses a buffer of 512 bytes. Increasing this size decreases the number of times that the C++ runtime library has to issue a read or write call to the operating system, which is by far the most expensive operation in your algorithm.
By keeping the buffer size a multiple of 512, that eliminates buffer fragmentation. Whether the size should be 1024*10 or 1024*1024 depends on the system it is intended to run on. For 16 bit systems, a buffer size larger than 32K or 64K generally causes difficulty in allocating the buffer, and maybe managing it. For any larger system, make it as large as useful—depending on available memory and what else it will be competing against.
Lacking any known memory contention, choose sizes for the buffers at about the size of the associated files. That is, if the input file is 250K, use that as the buffer size. There is definitely a diminishing return as the buffer size increases. For the 250K example, a 100K buffer would require three reads, while a default 512 byte buffer requires 500 reads. Further increasing the buffer size so only one read is needed is unlikely to make a significant performance improvement over three reads.
I tested the following on 28311552 lines of input. It's 10 times faster than your code. What it does is read a large block at once, then finishes up to the next newline. The goal here is to reduce I/O costs, since scanf() is reading a character at a time. Even with stdio, the buffer is likely too small.
Once the block is ready, I parse the numbers directly in memory.
This isn't the most elegant of codes, and I might have some edge cases a bit off, but it's enough to get you going with a faster approach.
Here are the timings (without the optimizer my solution is only about 6-7 times faster than your original reference)
[xavier:~/tmp] dalke% g++ -O3 my_solution.cpp
[xavier:~/tmp] dalke% time ./a.out < c.dat
15728647
0.284u 0.057s 0:00.39 84.6% 0+0k 0+1io 0pf+0w
[xavier:~/tmp] dalke% g++ -O3 your_solution.cpp
[xavier:~/tmp] dalke% time ./a.out < c.dat
15728647
3.585u 0.087s 0:03.72 98.3% 0+0k 0+0io 0pf+0w
Here's the code.
#include <iostream>
#include <stdio.h>
using namespace std;
const int BUFFER_SIZE=400000;
const int EXTRA=30; // well over the size of an integer
void read_to_newline(char *buffer) {
int c;
while (1) {
c = getc_unlocked(stdin);
if (c == '\n' || c == EOF) {
*buffer = '\0';
return;
}
*buffer++ = c;
}
}
int main() {
char buffer[BUFFER_SIZE+EXTRA];
char *end_buffer;
char *startptr, *endptr;
//n is number of integers to perform calculation on
//k is the divisor
//inputnum is the number to be divided by k
//total is the total number of inputnums divisible by k
int n,k,inputnum,total,nbytes;
//initialize total to zero
total=0;
//read in n and k from stdin
read_to_newline(buffer);
sscanf(buffer, "%i%i",&n,&k);
while (1) {
// Read a large block of values
// There should be one integer per line, with nothing else.
// This might truncate an integer!
nbytes = fread(buffer, 1, BUFFER_SIZE, stdin);
if (nbytes == 0) {
cerr << "Reached end of file too early" << endl;
break;
}
// Make sure I read to the next newline.
read_to_newline(buffer+nbytes);
startptr = buffer;
while (n>0) {
inputnum = 0;
// I had used strtol but that was too slow
// inputnum = strtol(startptr, &endptr, 10);
// Instead, parse the integers myself.
endptr = startptr;
while (*endptr >= '0') {
inputnum = inputnum * 10 + *endptr - '0';
endptr++;
}
// *endptr might be a '\n' or '\0'
// Might occur with the last field
if (startptr == endptr) {
break;
}
// skip the newline; go to the
// first digit of the next number.
if (*endptr == '\n') {
endptr++;
}
// Test if this is a factor
if (inputnum % k==0) total += 1;
// Advance to the next number
startptr = endptr;
// Reduce the count by one
n--;
}
// Either we are done, or we need new data
if (n==0) {
break;
}
}
// output value of total
printf("%i\n",total);
return 0;
}
Oh, and it very much assumes the input data is in the right format.
try to replace if statement with count += ((n%k)==0);. that might help little bit.
but i think you really need to buffer your input into temporary array. reading one integer from input at a time is expensive. if you can separate data acquisition and data processing, compiler may be able to generate optimized code for mathematical operations.
The I/O operations are bottleneck. Try to limit them whenever you can, for instance load all data to a buffer or array with buffered stream in one step.
Although your example is so simple that I hardly see what you can eliminate - assuming it's a part of the question to do subsequent reading from stdin.
A few comments to the code: Your example doesn't make use of any streams - no need to include iostream header. You already load C library elements to global namespace by including stdio.h instead of C++ version of the header cstdio, so using namespace std not necessary.
You can read each line with gets(), and parse the strings yourself without scanf(). (Normally I wouldn't recommend gets(), but in this case, the input is well-specified.)
A sample C program to solve this problem:
#include <stdio.h>
int main() {
int n,k,in,tot=0,i;
char s[1024];
gets(s);
sscanf(s,"%d %d",&n,&k);
while(n--) {
gets(s);
in=s[0]-'0';
for(i=1; s[i]!=0; i++) {
in=in*10 + s[i]-'0'; /* For each digit read, multiply the previous
value of in with 10 and add the current digit */
}
tot += in%k==0; /* returns 1 if in%k is 0, 0 otherwise */
}
printf("%d\n",tot);
return 0;
}
This program is approximately 2.6 times faster than the solution you gave above (on my machine).
You could try to read input line by line and use atoi() for each input row. This should be a little bit faster than scanf, because you remove the "scan" overhead of the format string.
I think the code is fine. I ran it on my computer in less than 0.3s
I even ran it on much larger inputs in less than a second.
How are you timing it?
One small thing you could do is remove the if statement.
start with total=n and then inside the loop:
total -= int( (input % k) / k + 1) //0 if divisible, 1 if not
Though I doubt CodeChef will accept it, one possibility is to use multiple threads, one to handle the I/O, and another to process the data. This is especially effective on a multi-core processor, but can help even with a single core. For example, on Windows you code use code like this (no real attempt at conforming with CodeChef requirements -- I doubt they'll accept it with the timing data in the output):
#include <windows.h>
#include <process.h>
#include <iostream>
#include <time.h>
#include "queue.hpp"
namespace jvc = JVC_thread_queue;
struct buffer {
static const int initial_size = 1024 * 1024;
char buf[initial_size];
size_t size;
buffer() : size(initial_size) {}
};
jvc::queue<buffer *> outputs;
void read(HANDLE file) {
// read data from specified file, put into buffers for processing.
//
char temp[32];
int temp_len = 0;
int i;
buffer *b;
DWORD read;
do {
b = new buffer;
// If we have a partial line from the previous buffer, copy it into this one.
if (temp_len != 0)
memcpy(b->buf, temp, temp_len);
// Then fill the buffer with data.
ReadFile(file, b->buf+temp_len, b->size-temp_len, &read, NULL);
// Look for partial line at end of buffer.
for (i=read; b->buf[i] != '\n'; --i)
;
// copy partial line to holding area.
memcpy(temp, b->buf+i, temp_len=read-i);
// adjust size.
b->size = i;
// put buffer into queue for processing thread.
// transfers ownership.
outputs.add(b);
} while (read != 0);
}
// A simplified istrstream that can only read int's.
class num_reader {
buffer &b;
char *pos;
char *end;
public:
num_reader(buffer *buf) : b(*buf), pos(b.buf), end(pos+b.size) {}
num_reader &operator>>(int &value){
int v = 0;
// skip leading "stuff" up to the first digit.
while ((pos < end) && !isdigit(*pos))
++pos;
// read digits, create value from them.
while ((pos < end) && isdigit(*pos)) {
v = 10 * v + *pos-'0';
++pos;
}
value = v;
return *this;
}
// return stream status -- only whether we're at end
operator bool() { return pos < end; }
};
int result;
unsigned __stdcall processing_thread(void *) {
int value;
int n, k;
int count = 0;
// Read first buffer: n & k followed by values.
buffer *b = outputs.pop();
num_reader input(b);
input >> n;
input >> k;
while (input >> value && ++count < n)
result += ((value %k ) == 0);
// Ownership was transferred -- delete buffer when finished.
delete b;
// Then read subsequent buffers:
while ((b=outputs.pop()) && (b->size != 0)) {
num_reader input(b);
while (input >> value && ++count < n)
result += ((value %k) == 0);
// Ownership was transferred -- delete buffer when finished.
delete b;
}
return 0;
}
int main() {
HANDLE standard_input = GetStdHandle(STD_INPUT_HANDLE);
HANDLE processor = (HANDLE)_beginthreadex(NULL, 0, processing_thread, NULL, 0, NULL);
clock_t start = clock();
read(standard_input);
WaitForSingleObject(processor, INFINITE);
clock_t finish = clock();
std::cout << (float)(finish-start)/CLOCKS_PER_SEC << " Seconds.\n";
std::cout << result;
return 0;
}
This uses a thread-safe queue class I wrote years ago:
#ifndef QUEUE_H_INCLUDED
#define QUEUE_H_INCLUDED
namespace JVC_thread_queue {
template<class T, unsigned max = 256>
class queue {
HANDLE space_avail; // at least one slot empty
HANDLE data_avail; // at least one slot full
CRITICAL_SECTION mutex; // protect buffer, in_pos, out_pos
T buffer[max];
long in_pos, out_pos;
public:
queue() : in_pos(0), out_pos(0) {
space_avail = CreateSemaphore(NULL, max, max, NULL);
data_avail = CreateSemaphore(NULL, 0, max, NULL);
InitializeCriticalSection(&mutex);
}
void add(T data) {
WaitForSingleObject(space_avail, INFINITE);
EnterCriticalSection(&mutex);
buffer[in_pos] = data;
in_pos = (in_pos + 1) % max;
LeaveCriticalSection(&mutex);
ReleaseSemaphore(data_avail, 1, NULL);
}
T pop() {
WaitForSingleObject(data_avail,INFINITE);
EnterCriticalSection(&mutex);
T retval = buffer[out_pos];
out_pos = (out_pos + 1) % max;
LeaveCriticalSection(&mutex);
ReleaseSemaphore(space_avail, 1, NULL);
return retval;
}
~queue() {
DeleteCriticalSection(&mutex);
CloseHandle(data_avail);
CloseHandle(space_avail);
}
};
}
#endif
Exactly how much you gain from this depends on the amount of time spent reading versus the amount of time spent on other processing. In this case, the other processing is sufficiently trivial that it probably doesn't gain much. If more time was spent on processing the data, multi-threading would probably gain more.
2.5mb/sec is 400ns/byte.
There are two big per-byte processes, file input and parsing.
For the file input, I would just load it into a big memory buffer. fread should be able to read that in at roughly full disc bandwidth.
For the parsing, sscanf is built for generality, not speed. atoi should be pretty fast. My habit, for better or worse, is to do it myself, as in:
#define DIGIT(c)((c)>='0' && (c) <= '9')
bool parsInt(char* &p, int& num){
while(*p && *p <= ' ') p++; // scan over whitespace
if (!DIGIT(*p)) return false;
num = 0;
while(DIGIT(*p)){
num = num * 10 + (*p++ - '0');
}
return true;
}
The loops, first over leading whitespace, then over the digits, should be nearly as fast as the machine can go, certainly a lot less than 400ns/byte.
Dividing two large numbers is hard. Perhaps an improvement would be to first characterize k a little by looking at some of the smaller primes. Let's say 2, 3, and 5 for now. If k is divisible by any of these, than inputnum also needs to be or inputnum is not divisible by k. Of course there are more tricks to play (you could use bitwise and of inputnum to 1 to determine whether you are divisible by 2), but I think just removing the low prime possibilities will give a reasonable speed improvement (worth a shot anyway).