I was benchmarking some I/O code with a large input (a 1 MB text file of integers, separated by tabs, spaces, or newlines). The normal cin method
int temp;
cin >> temp;
while (temp != 0) { cin >> temp; }
got into an infinite loop with temp stuck at 15, even though there is no such repeated sequence in the input file.
The cooked-up integer parsing method with fread, however, did just fine, with a clock time of around 0.02 ms:
#include <cstdio>
#include <cstring>

// Buffer state for the hand-rolled reader (the declarations were not shown
// in the original post; a 64 KB buffer is assumed here).
static char stdinBuffer[1 << 16];
static char *stdinPos = stdinBuffer;
static char *stdinDataEnd = stdinBuffer;

void readAhead(size_t amount) {
    size_t remaining = stdinDataEnd - stdinPos;
    if (remaining < amount) {
        // Shift the unread tail to the front, then refill the rest from stdin.
        memmove(stdinBuffer, stdinPos, remaining);
        size_t sz = fread(stdinBuffer + remaining, 1, sizeof(stdinBuffer) - remaining, stdin);
        stdinPos = stdinBuffer;
        stdinDataEnd = stdinBuffer + remaining + sz;
        if (stdinDataEnd != stdinBuffer + sizeof(stdinBuffer)) {
            *stdinDataEnd = 0;  // NUL-terminate so the parser stops at end of data
        }
    }
}
int readInt() {
    readAhead(16);
    int x = 0;
    bool neg = false;
    // Skip whitespace manually
    while (*stdinPos == ' ' || *stdinPos == '\n' || *stdinPos == '\t') {
        ++stdinPos;
    }
    if (*stdinPos == '-') {
        ++stdinPos;
        neg = true;
    }
    while (*stdinPos >= '0' && *stdinPos <= '9') {
        x *= 10;
        x += *stdinPos - '0';
        ++stdinPos;
    }
    return neg ? -x : x;
}
Any direction on how cin might get stuck?
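One plausible mechanism (a guess, since the exact input isn't shown): if an extraction fails, say on a stray non-numeric token, or on reaching end of file before any 0, cin sets failbit and every subsequent cin >> temp becomes a no-op. A pre-C++11 library leaves temp unmodified on failure (C++11 writes 0 instead), which would leave temp stuck at its last good value, here 15, forever. A minimal sketch of a read loop that checks the stream state:

int temp;
while (cin >> temp && temp != 0) {
    // process temp
}
if (cin.fail() && !cin.eof()) {
    // extraction stopped on a token that is not an integer
}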
I have an assignment where I need to encode and decode txt files. For example: hello how are you? has to be encoded as hel2o how are you?, and aaaaaaaaaajkle as a10jkle.
// (Dutch identifiers: invoer = input, uitvoer = output, kar = character,
//  vorigeKar = previous character)
while (!invoer.eof()) {
    if (kar >= '0' && kar <= '9') {
        counter = kar - 48;        // single digit only: the limitation described below
        while (counter > 1) {
            uitvoer.put(vorigeKar);
            counter--;
        }
    } else if (kar == '/') {       // escaped literal digit
        kar = invoer.get();
        uitvoer.put(kar);
    } else {
        uitvoer.put(kar);
    }
    vorigeKar = kar;
    kar = invoer.get();
}
But the problem I have is when I need to decode a12bhr: the answer should be aaaaaaaaaaaabhr, but I can't seem to get the 12 as a number without problems. I also can't use any strings.
Emit the repeated character only once the next character is not a digit or you have reached the end of the string. To prepare for that, build the number up as you parse: in other words, convert the digit sequence to an integer on the fly.
bool isNumeric(char ch) {
    return '0' <= ch && ch <= '9';
}

string decode(const string& s) {
    int counter = 0;
    string result;
    char prevCh = '\0';  // initialize so a leading digit cannot read garbage
    for (size_t i = 0; i < s.length(); i++) {
        if (isNumeric(s[i])) { // update counter
            counter = counter * 10 + (s[i] - '0');
            // check the bounds before looking at s[i + 1]
            if (i + 1 == s.length() || !isNumeric(s[i + 1])) {
                // now, put the previously stacked character
                while (counter-- > 1) {
                    result.push_back(prevCh);
                }
                counter = 0;
            }
        }
        else {
            result.push_back(s[i]);
            prevCh = s[i];
        }
    }
    return result;
}
Now decode("a12bhr3") returns aaaaaaaaaaaabhrrr; it works well.
I am looking for ways to optimize this function for better performance and speed; it is targeted at an embedded device. I welcome any pointers and suggestions, thanks.
The function converts a BCD string to decimal:
int ConvertBCDToDecimal(const std::string& str, int splitLength)
{
    int NumSubstrings = str.length() / splitLength;
    std::vector<std::string> ret;
    int newvalue;
    for (auto i = 0; i < NumSubstrings; i++)
    {
        ret.push_back(str.substr(i * splitLength, splitLength));
    }
    // If there are leftover characters, create a shorter item at the end.
    if (str.length() % splitLength != 0)
    {
        ret.push_back(str.substr(splitLength * NumSubstrings));
    }
    string temp;
    for (int i = 0; i < (int)ret.size(); i++)
    {
        temp += ReverseBCDFormat(ret[i]);
    }
    return newvalue = std::stoi(temp);
}
string ReverseBCDFormat(string num)
{
    if      (num == "0000") { return "0"; }
    else if (num == "0001") { return "1"; }
    else if (num == "0010") { return "2"; }
    else if (num == "0011") { return "3"; }
    else if (num == "0100") { return "4"; }
    else if (num == "0101") { return "5"; }
    else if (num == "0110") { return "6"; }
    else if (num == "0111") { return "7"; }
    else if (num == "1000") { return "8"; }
    else if (num == "1001") { return "9"; }
    else                    { return "0"; }
}
Update
This is what I expect to get: for the BCD value 0010000000000000, the decimal result is 2000.
BCD is a method of encoding decimal numbers, two digits to a byte.
For instance, 0x12345678 is the BCD representation of the decimal number 12345678. But that doesn't seem to be what you're processing, so I'm not sure you mean BCD when you say BCD.
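For contrast, a minimal sketch of decoding packed BCD in that usual sense (my illustration, not the question's format):

#include <cstddef>

// Packed BCD: two decimal digits per byte, e.g. bytes {0x12, 0x34} -> 1234.
int PackedBCDToInt(const unsigned char* p, size_t n)
{
    int v = 0;
    for (size_t i = 0; i < n; ++i)
        v = v * 100 + (p[i] >> 4) * 10 + (p[i] & 0x0F);
    return v;
}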
As for the code, you could speed it up quite a bit by iterating over each substring and directly calculating the value. At a minimum, change ReverseBCDFormat to return an integer instead of a string and accumulate the value on the fly:
temp = temp * 10 + ReverseBCDFormat(...)
Something like that.
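A sketch of that suggestion, keeping the question's names (the switch of the return type to int is the whole point):

// Returns the digit value 0..9 for a group of '0'/'1' characters,
// replacing the string-returning lookup table above.
int ReverseBCDFormat(const std::string& num)
{
    int v = 0;
    for (char c : num)
        v = 2 * v + (c - '0');
    return v > 9 ? 0 : v;  // the original mapped unrecognized patterns to "0"
}

The accumulation in the caller then becomes temp = temp * 10 + ReverseBCDFormat(ret[i]); with temp declared as an int rather than a string.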
What you call BCD is not actually BCD.
With that out of the way, you can do this:
int ConvertBCDToDecimal(const std::string& str, int splitLength)
{
    int ret = 0;
    for (unsigned i = 0, n = unsigned(str.size()); i < n; )
    {
        int v = 0;
        for (unsigned j = 0; j < splitLength && i < n; ++j, ++i)
            v = 2*v + ('1' == str[i] ? 1 : 0); // or 2*v + (str[i]-'0')
        ret = 10*ret + v;
    }
    return ret;
}
Get rid of all the useless vector making and string copying. You don't need any of those.
Also, I think your code has a bug when processing strings with lengths that aren't a multiple of splitLength. I think your code always considers them to be zero. In fact, now that I think about it, your code won't work with any splitLength other than 4.
BTW, if you provide some sample inputs along with their expected outputs, I would be able to actually verify my code against yours (given that your definition of BCD differs from that of most people, what your code does is not exactly clear.)
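For what it's worth, this version checks out against the sample in the update: ConvertBCDToDecimal("0010000000000000", 4) reads the groups 0010, 0000, 0000, 0000 as 2, 0, 0, 0, so ret goes 2 → 20 → 200 → 2000, matching the expected decimal result of 2000.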
Since you're optimizing the function, here is a different variant:
int ConvertBCDToDecimal(const std::string& str) {
    unsigned int result = 0;
    const std::string::size_type l = str.length();
    for (std::string::size_type i = 0; i < l; i += 4)
        result = result * 10
               + ((str[i]     - '0') << 3)
               + ((str[i + 1] - '0') << 2)
               + ((str[i + 2] - '0') << 1)
               +  (str[i + 3] - '0');
    return result;
}
Note: you don't need the splitLength argument, as you know that every digit is 4 symbols (this variant therefore assumes the length is a multiple of 4).
When I input
0x123456789
I get incorrect outputs, and I can't figure out why. At first I thought it was a maximum possible int value problem, but I changed my variables to unsigned long and the problem was still there.
#include <iostream>
using namespace std;

long htoi(char s[]);

int main()
{
    cout << "Enter Hex \n";
    char hexstring[20];
    cin >> hexstring;
    cout << htoi(hexstring) << "\n";
}

// Converts a hex string to an integer
long htoi(char s[])
{
    int charsize = 0;
    while (s[charsize] != '\0')
    {
        charsize++;
    }
    int base = 1;
    unsigned long total = 0;
    unsigned long multiplier = 1;
    for (int i = charsize; i >= 0; i--)
    {
        if (s[i] == '0' || s[i] == 'x' || s[i] == 'X' || s[i] == '\0')
        {
            continue;
        }
        if ((s[i] >= '0') && (s[i] <= '9'))
        {
            total = total + ((s[i] - '0') * multiplier);
            multiplier = multiplier * 16UL;
            continue;
        }
        if ((s[i] >= 'A') && (s[i] <= 'F'))
        {
            total = total + ((s[i] - '7') * multiplier); // '7' is 55 in ASCII, while 'A' is 65
            multiplier = multiplier * 16UL;
            continue;
        }
        if ((s[i] >= 'a') && (s[i] <= 'f'))
        {
            total = total + ((s[i] - 'W') * multiplier); // 'W' is 87 in ASCII, while 'a' is 97
            multiplier = multiplier * 16UL;
            continue;
        }
    }
    return total;
}
long is probably 32 bits on your computer as well. Try long long.
You need more than 32 bits to store that number. Your long type could well be as small as 32 bits.
Use a std::uint64_t instead. This is always a 64 bit unsigned type. If your compiler doesn't support that, use a long long. That must be at least 64 bits.
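As an aside, once the type is wide enough the standard library can also do this conversion for you; a minimal sketch using std::strtoull:

#include <cstdlib>
#include <iostream>

int main()
{
    // std::strtoull parses in the given base; with base 16 it also
    // accepts an optional 0x prefix.
    unsigned long long v = std::strtoull("0x123456789", nullptr, 16);
    std::cout << v << "\n";  // prints 4886718345
}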
The idea follows the polynomial nature of a number. 123 is the same as
1*10^2 + 2*10^1 + 3*10^0
In other words, I had to multiply the first digit by ten two times. I had to multiply 2 by ten one time. And I multiplied the last digit by one. Again, reading from left to right:
Multiply zero by ten and add the 1 → 0*10+1 = 1.
Multiply that by ten and add the 2 → 1*10+2 = 12.
Multiply that by ten and add the 3 → 12*10+3 = 123.
We will do the same thing:
#include <cctype>
#include <ciso646>
#include <iostream>
#include <string>   // for std::string (missing in the original listing)

using namespace std;

unsigned long long hextodec( const std::string& s )
{
    unsigned long long result = 0;
    for (char c : s)
    {
        result *= 16;
        if (isdigit( c )) result |= c - '0';
        else result |= toupper( c ) - 'A' + 10;
    }
    return result;
}

int main( int argc, char** argv )
{
    cout << hextodec( argv[1] ) << "\n";
}
You may notice that the function is more than three lines. I did that for clarity. C++ idioms can make that loop a single line:
for (char c : s)
    result = (result << 4) | (isdigit( c ) ? (c - '0') : (toupper( c ) - 'A' + 10));
You can also do validation if you like. What I have presented is not the only way to do the digit-to-value conversion. There exist other methods that are just as good (and some that are better).
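One caveat: hextodec as sketched does not skip a 0x prefix, so pass it the bare digits. hextodec("123456789") returns 4886718345 (the question's 0x123456789), while a leading 0x would be misparsed unless stripped first.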
I do hope this helps.
I found out what was happening: when I input "1234567890" it would skip over the '0', so I had to modify the code. The other problem was that long was indeed 32 bits, so I changed it to uint64_t as suggested by @Bathsheba. Here's the final working code.
#include <cstdint>   // for uint64_t (missing in the original listing)
#include <iostream>
using namespace std;

uint64_t htoi(char s[]);

int main()
{
    char hexstring[20];
    cin >> hexstring;
    cout << htoi(hexstring) << "\n";
}

// Converts a hex string to an integer
uint64_t htoi(char s[])
{
    int charsize = 0;
    while (s[charsize] != '\0')
    {
        charsize++;
    }
    uint64_t total = 0;
    uint64_t multiplier = 1;
    for (int i = charsize; i >= 0; i--)
    {
        if (s[i] == 'x' || s[i] == 'X' || s[i] == '\0')
        {
            continue;
        }
        if ((s[i] >= '0') && (s[i] <= '9'))
        {
            total = total + ((uint64_t)(s[i] - '0') * multiplier);
            multiplier = multiplier * 16;
            continue;
        }
        if ((s[i] >= 'A') && (s[i] <= 'F'))
        {
            total = total + ((uint64_t)(s[i] - '7') * multiplier); // '7' is 55 in ASCII, while 'A' is 65
            multiplier = multiplier * 16;
            continue;
        }
        if ((s[i] >= 'a') && (s[i] <= 'f'))
        {
            total = total + ((uint64_t)(s[i] - 'W') * multiplier); // 'W' is 87 in ASCII, while 'a' is 97
            multiplier = multiplier * 16;
            continue;
        }
    }
    return total;
}
I am trying to create a utility for generating palmdoc/mobipocket format ebook files. It is said that mobi uses the LZ77 compression technique to compress its records, but I found quite a deviation from standard LZ77. My main source of reference is the Calibre ebook creator, with a C implementation for palmdoc.
In this file, uncompress works well, but I have not been able to compress a mobi record identically, either using other implementations or this one (the Calibre code doesn't decompress to the same result).
I found some differences, noted below (my comments follow <-- markers in the code):
static Py_ssize_t                                 <-- can be replaced with size_t
cpalmdoc_do_compress(buffer *b, char *output) {
    Py_ssize_t i = 0, j, chunk_len, dist;
    unsigned int compound;
    Byte c, n;
    bool found;
    char *head;
    buffer temp;
    head = output;
    temp.data = (Byte *)PyMem_Malloc(sizeof(Byte)*8); temp.len = 0;
    if (temp.data == NULL) return 0;
    while (i < b->len) {
        c = b->data[i];
        // do repeats
        if (i > 10 && (b->len - i) > 10) {        <-- ignores any match outside this range
            found = false;
            for (chunk_len = 10; chunk_len > 2; chunk_len--) {
                j = cpalmdoc_rfind(b->data, i, chunk_len);
                dist = i - j;
                if (j < i && dist <= 2047) {      <-- 2048 window size instead of 4096
                    found = true;
                    compound = (unsigned int)((dist << 3) + chunk_len-3);
                    *(output++) = CHAR(0x80 + (compound >> 8));
                    *(output++) = CHAR(compound & 0xFF);
                    i += chunk_len;
                    break;
                }
            }
            if (found) continue;
        }
        // write single character
        i++;
        if (c == 32 && i < b->len) {              <-- if a space is encountered, skip it and check whether the next character can be paired with it, otherwise fall through; this part gave me wrong results
            n = b->data[i];
            if (n >= 0x40 && n <= 0x7F) {
                *(output++) = CHAR(n^0x80); i++; continue;
            }
        }
        if (c == 0 || (c > 8 && c < 0x80))
            *(output++) = CHAR(c);
        else { // write binary data               <-- why binary data? LZ is for text encoding
            j = i;
            temp.data[0] = c; temp.len = 1;
            while (j < b->len && temp.len < 8) {
                c = b->data[j];
                if (c == 0 || (c > 8 && c < 0x80)) break;
                temp.data[temp.len++] = c; j++;
            }
            i += temp.len - 1;
            *(output++) = (char)temp.len;
            for (j = 0; j < temp.len; j++) *(output++) = (char)temp.data[j];
        }
    }
    PyMem_Free(temp.data);
    return output - head;
}
Is this implementation correct?
PalmDoc compression is essentially byte-pair compression, i.e. a variant of LZ77.
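To make the token layout concrete, here is a minimal decoder sketch inferred from the encoder above (from the code, not from an official spec; it assumes well-formed input and a large enough output buffer):

#include <cstddef>

size_t palmdoc_decompress(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        unsigned char c = in[i++];
        if (c >= 0x01 && c <= 0x08) {
            // length-prefixed run of literal ("binary") bytes
            while (c--) out[o++] = in[i++];
        } else if (c <= 0x7F) {
            // 0x00 and 0x09..0x7F represent themselves
            out[o++] = c;
        } else if (c >= 0xC0) {
            // a space packed together with the following character
            out[o++] = ' ';
            out[o++] = c ^ 0x80;
        } else {
            // 0x80..0xBF: back-reference with 11-bit distance and 3-bit (length - 3)
            unsigned compound = ((unsigned)(c & 0x3F) << 8) | in[i++];
            unsigned dist = compound >> 3;
            unsigned len = (compound & 7) + 3;
            while (len--) { out[o] = out[o - dist]; ++o; }
        }
    }
    return o;
}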
OK, I've been trying to teach myself C++, and as such decided to try to make an encrypt/decrypt program. The idea is to open a file and edit its bits according to the password. I'm having some problems with my code, and by using breakpoints I have found that the error arises when I open the file (it is in main(), about a third of the way down). Visual C++ tells me that the heap has become corrupt, and I'm at a loss as to why. Any help would be greatly appreciated.
#include <iostream>
#include <fstream>
#include <sstream>
#include <stdio.h>
#include <sys/stat.h>

using namespace std;

unsigned char fileData[31];
bool *password;
int count(0), maxCount;
/*
* Programmer: P7r0
* Program: Encrypt/Decrypt
* Version: InDev
* Date Released: -
*
* Notes:
* -
*/
struct bits {
    // Breaks each byte into its 8 bits
    unsigned int b1 : 1;
    unsigned int b2 : 1;
    unsigned int b3 : 1;
    unsigned int b4 : 1;
    unsigned int b5 : 1;
    unsigned int b6 : 1;
    unsigned int b7 : 1;
    unsigned int b8 : 1;
};

// Toggles the bits, ie if 1 make 0
int swap(int Obj){
    if (Obj = 1){return 0;}
    else if (Obj = 0){return 1;}
}
void conversion(string convert){
    // User password to a boolean array
    int ascii, loop, count, a, counter(0);
    const char *code;
    bool bin[] = {false,false,false,false,false,false,false,false};
    // Create an array for the booleans
    password = new bool [convert.length()];
    code = convert.c_str();
    for (loop = 0; loop < convert.length(); loop++){
        for (a = 0; a < 8; a++){bin[a] = false;}
        // Get the equivalent ASCII code
        ascii = int(code[loop]);
        while (ascii > 0){
            // Develop a temporary binary array based off of the ASCII values
            if (ascii >= 128){ascii -= 128; bin[0] = true;}
            else if (ascii >= 64){ascii -= 64; bin[1] = true;}
            else if (ascii >= 32){ascii -= 32; bin[2] = true;}
            else if (ascii >= 16){ascii -= 16; bin[3] = true;}
            else if (ascii >= 8){ascii -= 8; bin[4] = true;}
            else if (ascii >= 4){ascii -= 4; bin[5] = true;}
            else if (ascii >= 2){ascii -= 2; bin[6] = true;}
            else if (ascii >= 1){ascii -= 1; bin[7] = true;}
        }
        for (count = 0; count < 8; count++){
            // Move out of the temporary array into the main array for global use
            //cout << bin[count];
            password[counter] = bin[count];
            counter++;
        }
        //cout << ":\n";
    }
}
int encrypt(int loop){
    // Changes everything bit by bit in blocks of bytes the size of loop, typically 32
    int a, b, counter(0);
    bits bit;
    for (a = 0; a == loop; a++){
        bit = *(bits*)(&fileData[a]);
        cout << bit.b1 << "\t";
        for (b = 0; b == 7; b++){
            if (count = maxCount){count = 0;}
            if (password[count] = true){
                // If current password array is true then toggle current bit
                if (b = 0){bit.b1 = swap(bit.b1);}
                else if (b = 1){bit.b2 = swap(bit.b2);}
                else if (b = 2){bit.b3 = swap(bit.b3);}
                else if (b = 3){bit.b4 = swap(bit.b4);}
                else if (b = 4){bit.b5 = swap(bit.b5);}
                else if (b = 5){bit.b6 = swap(bit.b6);}
                else if (b = 6){bit.b7 = swap(bit.b7);}
                else if (b = 7){bit.b8 = swap(bit.b8);}
                count++;
            }
            else {count++;}
        }
        cout << counter;
        fileData[counter] = *(unsigned char*)(&bit);
        counter++;
    }
    return 0;
}
int main(){
    fstream file;
    char *remainder;
    int counter, size, temp, b(0), stackCount(0);
    long begin, end;
    string usrin, pass, pause, filedir;
    cout << "Please input password, must be one word\n";
    cin >> pass;
    maxCount = pass.length();
    conversion(pass);
    // Change the password data stored at its location so as to avoid unwanted detection of the password
    pass = "default";
    cout << "\nPlease input file path\n";
    cin >> filedir;
    // The error seems to be here
    file.open(filedir.c_str(), ios::in | ios::out | ios::binary);
    // Check that the file is open
    if (file.is_open()){
        cout << "Encrypting...\n";
        counter = 32;
        // Work out size (bytes) of the file
        begin = file.tellg();
        file.seekg(0, ios::end);
        end = file.tellg();
        file.seekg(0, ios::beg);
        b = file.tellg();
        size = end - begin;
        while ((int)b <= size){
            // Had to typecast as the unsigned/signed mismatch was throwing compile errors
            file.read((char*)(&fileData), counter);
            encrypt(counter);
            if (size - b >= 32){
                file.write((char*)(&fileData), counter);
                b = file.tellg();
            } else if (size - b < 32 && size - b > 0) {
                remainder = new char [size - b];
                for (int a = 0; a != size - b; a++){remainder[a] = fileData[a];}
                file.write((char*)(remainder), size - b);
                // To cancel out of the while loop
                b += 1;
            } else if (size - b == 0){b += 1;}
        }
        file.close();
        cout << "\nEncrypted.\nPlease enter a letter to continue\n";
        cin >> pause;
    // Prompt user if unable to open the file
    } else {cout << "Failed to open the file";}
    return 0;
}
In your conversion() method you have the following code:
int counter(0);
// ...
password = new bool [convert.length()];
// ...
for (count = 0; count < 8; count++){
    password[counter] = bin[count];
    counter++;
}
That inner loop writes 8 entries per character of convert, so over the whole outer loop it stores 8 * convert.length() bools into an array that only has convert.length() slots: you write outside the password array for any non-empty password.
A heap corruption will usually not be detected at once, which is why you do not get the error until the file is opened.
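A minimal fix, as a sketch (assuming the intent is one bool per bit, which is how conversion() fills the array and how encrypt() reads it):

password = new bool [convert.length() * 8];  // one slot per bit, not per character

maxCount in main() would then also need to be pass.length() * 8, since encrypt() wraps its password index at maxCount.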
Not sure if this is the cause of your problem, but in any case it is unwise to write directly to the file that you are reading from. Write to a temp file and rename the files when done.
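A sketch of that approach (illustrative names; the actual encryption loop is elided):

#include <cstdio>   // std::remove, std::rename
#include <string>

// Write the encrypted output to a temporary file, then swap it into place.
bool replaceWithTemp(const std::string& path)
{
    const std::string tmp = path + ".tmp";
    // ... read from path, write encrypted bytes to tmp ...
    if (std::remove(path.c_str()) != 0) return false;    // drop the original
    return std::rename(tmp.c_str(), path.c_str()) == 0;  // move the temp file into place
}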