Simple byte changing with fgetc/fwrite fails - c++

UPDATE
I solved it with the answer that's marked as valid, but with one slight difference: I open the file with fopen(file, "r+b") instead of fopen(file, "r+"). The b opens it in binary mode and doesn't screw up the file.
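In case it helps anyone, here is a minimal sketch of the loop that ended up working for me (the accepted answer's fputc() version plus the binary flag):

#include <cstdio>

int main(int argc, char* argv[])
{
    if (argc != 2) return 1;
    FILE *f = fopen(argv[1], "r+b"); // "b": binary mode, so the runtime doesn't translate bytes
    if (f == NULL) return 2;
    int c;                           // int, not char, so EOF is distinguishable from data
    long i = 0;
    while ((c = fgetc(f)) != EOF)
    {
        fseek(f, i++, SEEK_SET);     // reposition between the read and the write
        fputc(c ^ 0x13, f);
        fseek(f, i, SEEK_SET);       // reposition between the write and the next read
    }
    fclose(f);
    return 0;
}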
I was doing a simple program which I called "fuzzer".
This is my code:
#include <cstdio>
#include <iostream>
using namespace std;

int main(int argc, char* argv[]){
    // Here go some checks, such as argc being correct, etc.
    // ...
    // Read source file
    FILE *fSource;
    fSource = fopen(argv[1], "r+");
    if(fSource == NULL){
        cout << "Can't open file!";
        return 2;
    }
    // Loop source file
    char b;
    int i = 0;
    while((b = fgetc(fSource)) != EOF){
        b ^= 0x13;
        fseek(fSource, i++, SEEK_SET);
        fwrite(&b, 1, sizeof(b), fSource);
    }
    fclose(fSource);
    cout << "Fuzzed.";
    return 0;
}
However, it doesn't work. Before, I used while(!feof(fSource)), but that didn't work either, and I read that it's not correct, so I changed it to (b = fgetc(fSource)) != EOF (I suppose that's correct, right?).
When I run it with mode "a+", it gets stuck in an endless loop: it doesn't modify the original file, but appends tildes to it, and the file quickly grows until I stop the program. If I change the open mode from "a+" to "r+" (as shown above), it at least doesn't get stuck in an endless loop, but it simply deletes the contents of the file.
Note: I understand that this isn't any kind of obfuscation or encryption. I'm not trying to encode files, just practicing with C++ and files.

This code worked for me when tested on an Ubuntu 12.04 derivative with GCC 4.9.0:
#include <iostream>
#include <stdio.h>
using namespace std;

int main(int argc, char* argv[])
{
    if (argc != 2)
    {
        cerr << "Usage: " << argv[0] << " file\n";
        return 1;
    }
    FILE *fSource = fopen(argv[1], "r+");
    if (fSource == NULL)
    {
        cerr << "Can't open file: " << argv[1] << "\n";
        return 2;
    }
    int c;
    int i = 0;
    while ((c = fgetc(fSource)) != EOF)
    {
        char b = c ^ 0x13;
        fseek(fSource, i++, SEEK_SET);
        fwrite(&b, 1, sizeof(b), fSource);
        fseek(fSource, i, SEEK_SET);
    }
    fclose(fSource);
    cout << "Fuzzed: " << argv[1] << "\n";
    return 0;
}
It reports file names; it reports errors to standard error (cerr); and it uses int c; to read the character, copying that to char b so that the fwrite() works. When run on (a copy of) its own source code, the first run turns the file into gibberish and a second run recovers the original.
This loop, using fputc() instead of fwrite(), also works without needing the intermediate variable b:
while ((c = fgetc(fSource)) != EOF)
{
    fseek(fSource, i++, SEEK_SET);
    fputc(c ^ 0x13, fSource);
    fseek(fSource, i, SEEK_SET);
}
The fseek() between a read and the following write, and between a write and the following read, is mandated by the C standard for files opened in update mode. I'm not sure whether that's the main cause of your trouble, but in theory it could be one of the issues.

You need int b;. fgetc() returns an int precisely so that EOF can be distinguished from every valid byte value; stored into a char, the comparison against EOF either never matches or falsely matches a legitimate byte. The manual describes all this. All in all, something like this:
for (int b, i = 0; (b = fgetc(fSource)) != EOF; ++i)
{
    unsigned char x = b;
    x ^= 0x13;
    fseek(fSource, i, SEEK_SET);
    fwrite(&x, 1, 1, fSource);
    fseek(fSource, i + 1, SEEK_SET);
}
You should also open the file with mode "rb+", and seek between each read and write (thanks @Jonathan Leffler).

Related

Splitting large csv files into small files with dynamic names using C++

I am a beginner, so I apologise if my question looks childish. I have 38 large files in a folder, and I want to split each of them into smaller parts with dynamic names. Lines 1 to 13 work well; the challenge is in lines 16-19. The output shows that the data from the ifstream is not all appearing as char, and this error makes it difficult to split the files. Please, what am I getting wrong?
#include <iostream>
#include <fstream>
#include <cstdio>
#include <cstring>
#include "TString.h" // TString comes from ROOT
#define SEGMENT 728300 //approximate target size of small file
using namespace std;

long file_size(char *name); //function definition below

int main(int argc, char **argv)
{
    char input_file_1[100]; // input file
    strcpy(input_file_1, argv[1]);
    string PathToData = "path to the files";
    TString name = PathToData + input_file_1;
    std::cout << "Reading file " << name << endl;
    char getdata[35000];
    ifstream csv_db(name);
    while (csv_db.getline(getdata, sizeof(csv_db)))
        if (csv_db.eof())
            csv_db.close();
    int segments = 0, i, accum;
    FILE *fp1, *fp2;
    unsigned int huga = strlen(getdata);
    char largeFileName[huga + 100]; // Make sure there's enough space
    strcpy(largeFileName, getdata);
    std::cout << largeFileName << endl;
    std::cout << largeFileName << endl;
    long sizeFile = file_size(largeFileName);
    segments = sizeFile/SEGMENT + 1980; //ensure end of file
    char filename[360] = {"path to folder where to keep the result"};
    char smallFileName[360];
    char line[1080];
    fp1 = fopen(largeFileName, "r");
    if (fp1)
    {
        for (i = 1980; i < segments; i++)
        {
            accum = 0;
            sprintf(smallFileName, "%s%d.csv", filename, i);
            fp2 = fopen(smallFileName, "w");
            if (fp2)
            {
                while (fgets(line, 1080, fp1) && accum <= SEGMENT)
                {
                    accum += strlen(line); //track size of growing file
                    fputs(line, fp2);
                }
                fclose(fp2);
            }
        }
        fclose(fp1);
    }
    return 0;
}

long file_size(char *name)
{
    FILE *fp = fopen(name, "rb"); //must be binary read to get bytes
    long size = -1;
    if (fp)
    {
        fseek(fp, 0, SEEK_END);
        size = ftell(fp) + 1;
        fclose(fp);
    }
    return size;
}
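One detail stands out in the lines the question points at: sizeof(csv_db) measures the std::ifstream object itself, not the getdata buffer, so getline is given the wrong size limit. A minimal sketch of the corrected call:

// The buffer's own size is what getline needs:
while (csv_db.getline(getdata, sizeof(getdata)))
{
    // process one line of getdata here
}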

Cannot Read Binary files in byte mode in C++

I am trying to read a binary file's data. Sadly, opening files in C++ is rather different from Python, which has a byte mode ('rb'); it seems C++ does not have that.
for (auto p = directory_iterator(path); p != directory_iterator(); p++) {
    if (!is_directory(p->path()))
    {
        byte tmpdata;
        std::ifstream tmpreader;
        tmpreader.open(desfile, std::ios_base::binary);
        int currentByte = tmpreader.get();
        while (currentByte >= 0)
        {
            //std::cout << "Does this get Called?" << std::endl;
            int currentByte = tmpreader.get();
            tmpdata = currentByte;
        }
        tmpreader.close();
    }
    else
    {
        continue;
    }
}
I basically want a clone of Python's way of opening a file in 'rb' mode: getting the actual byte data of the entire contents (which is not printable, as it contains nonprintable chars even for C++). Most of it probably can't be treated as signed chars, because it contains zlib-compressed data that I need to feed into my DLL to decompress.
I do know that in Python I can do something like this:
file_object = open('[file here]', 'rb')
It turns out that replacing the C++ code above with the following helps. fopen is flagged as deprecated by the compiler, but I don't care.
What the code above did not do was work, because I was not reading from the buffered data. I realized later that fopen, fseek, fread, and fclose were the functions I needed for read-bytes mode ('rb').
for (auto p = directory_iterator(path); p != directory_iterator(); p++) {
    if (!is_directory(p->path()))
    {
        std::string desfile = p->path().filename().string();
        byte tmpdata;
        unsigned char* data2;
        FILE *fp = fopen("data.d", "rb");
        fseek(fp, 0, SEEK_END); // GO TO END OF FILE
        size_t size = ftell(fp);
        fseek(fp, 0, SEEK_SET); // GO BACK TO START
        data2 = new unsigned char[size];
        tmpdata = fread(data2, 1, size, fp);
        fclose(fp);
    }
    else
    {
        continue;
    }
}
int currentByte = tmpreader.get();
while (currentByte >= 0)
{
    //std::cout << "Does this get Called?" << std::endl;
    int currentByte = tmpreader.get();
    //^ here!
You are declaring a second variable that hides the outer one. This inner one is only valid within the while loop's body, so the while condition checks the outer variable, which is never modified again. Rather, do it this way:
int currentByte;
while ((currentByte = tmpreader.get()) >= 0)
{
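As an aside, if the goal is a clone of Python's open(name, 'rb').read(), plain iostreams can do that without fopen(). A minimal sketch, assuming the file fits in memory:

#include <fstream>
#include <iterator>
#include <vector>

// Read a whole file into memory, byte for byte (like Python's open(name, 'rb').read()).
std::vector<unsigned char> read_all_bytes(const char *name)
{
    std::ifstream in(name, std::ios_base::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}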

How to make 10 copies of an initial file, so that if the first file is as-1.txt the second is as-2.txt, and so on

The loop isn't making 10 copies, and I have no idea how to change the file names.
#include "iostream"
#include "fstream"
#include "windows.h"
using namespace std;
void main()
{
char str[200];
ifstream myfile("as-1.txt");
if (!myfile)
{
cerr << "file not opening";
exit(1);
}
for (int i = 0; i < 10; i++)
{
ofstream myfile2("as-2.txt");
while (!myfile.eof())
{
myfile.getline(str, 200);
myfile2 << str << endl;
}
}
system("pause");
}
Solution using plain C API from <cstdio>. Easily customizable.
const char* file_name_format = "as-%d.txt"; //Change this if you need a different name pattern
const char* original_file_name = "as-1.txt"; //Original file
const size_t max_file_name = 255;

FILE* original_file = fopen(original_file_name, "r+");
if (!original_file)
{
    //file not found, handle error here
}

fseek(original_file, 0, SEEK_END); //(*)
long file_size = ftell(original_file);
fseek(original_file, 0, SEEK_SET);

char* original_content = (char*)malloc(file_size);
fread(original_content, file_size, 1, original_file);
fclose(original_file);

size_t copies_num = 10;
size_t first_copy_number = 2;
char file_name[max_file_name];
for (size_t n = first_copy_number; n < first_copy_number + copies_num; ++n)
{
    snprintf(file_name, max_file_name, file_name_format, (int)n); //cast: "%d" expects an int
    FILE* file = fopen(file_name, "w");
    fwrite(original_content, file_size, 1, file);
    fclose(file);
}
free(original_content);
(*) As noted on this page, SEEK_END may not necessarily be supported (i.e. it is not a portable solution). However most POSIX-compliant systems (including the most popular Linux distros), Windows family and OSX support this without any problems.
Oh, and one more thing. This line
while (!myfile.eof())
is not quite correct. Read this question - it explains why you shouldn't write such code.
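For completeness: if C++17 is available, std::filesystem can copy the file directly, with no manual buffering at all. A sketch, assuming the as-N.txt naming from the question:

#include <filesystem>
#include <string>

int main()
{
    namespace fs = std::filesystem;
    for (int i = 2; i <= 11; ++i) // as-2.txt ... as-11.txt, ten copies of as-1.txt
        fs::copy_file("as-1.txt", "as-" + std::to_string(i) + ".txt",
                      fs::copy_options::overwrite_existing);
    return 0;
}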
#include <fstream>
#include <sstream>

int main()
{
    const int copies_of_file = 10;
    for (int i = 1; i <= copies_of_file; ++i)
    {
        std::ostringstream name;
        name << "filename as-" << i << ".txt";
        std::ofstream ofile(name.str().c_str());
        ofile.close();
    }
    return 0;
}
That will make 10 empty .txt files named "filename as-1.txt", "filename as-2.txt", and so on.
Note also the use of int main: main always returns int, never void.

Number of open files in a C++ program

Is there a simple way to get the number of files currently opened by a C++ program?
I would like to do it from my own code, ideally in C++.
I found this blog article, which loops through all the available file descriptors and tests the result of fstat, but I am wondering if there is any simpler way to do it.
Edit
It seems that there is no solution other than keeping a count of the files opened. Thanks to everybody for your help.
Kevin
Since the files are FILE *, we could do something like this:
In a headerfile that gets included everywhere:
#define fopen(x, y) debug_fopen(x, y, __FILE__, __LINE__)
#define fclose(x) debug_fclose(x)
in "debugfile.cpp" (must obviously NOT use the above #define's)
#include <cstdio>
#include <iostream>
#include <map>
#include <string>

struct FileInfo
{
    FileInfo(const char *nm, const char *fl, int ln) :
        name(nm), file(fl), line(ln) {}
    std::string name;
    const char *file;
    int line;
};

std::map<FILE*, FileInfo> filemap;

FILE *debug_fopen(const char *fname, const char *mode, const char *file, int line)
{
    FILE *f = fopen(fname, mode);
    if (f)
    {
        FileInfo inf(fname, file, line);
        filemap.insert(std::make_pair(f, inf)); // insert: FileInfo has no default constructor
    }
    return f;
}

int debug_fclose(FILE *f)
{
    int res = fclose(f);
    filemap.erase(f);
    return res;
}

// Call at some suitable points.
void debug_list_openfiles()
{
    for (auto &i : filemap)
    {
        std::cerr << "File " << (void *) i.first << " opened as " << i.second.name
                  << " at " << i.second.file << ":" << i.second.line << std::endl;
    }
}
(I haven't compiled this code; it's meant to show the concept. It may have minor bugs, but I think the concept holds - as long as it's your code, and not some third-party library, that is leaking.)
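As a usage sketch (assuming the definitions above are in scope), the dump can be hooked to run automatically at program exit:

#include <cstdio>
#include <cstdlib>

int main()
{
    atexit(debug_list_openfiles);        // list whatever is still registered at exit
    FILE *f = fopen("example.txt", "w"); // with the project-wide #define, this routes through debug_fopen
    (void)f;                             // deliberately never closed, so the dump reports it
    return 0;
}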
This is a legitimate question: I count open file descriptors in unit tests to verify none has leaked. On Linux systems there's one entry in /proc/self/fd for each open file descriptor, so you just have to count them. In C++17 it looks like this:
long file_descriptor_count() {
    return std::distance(std::filesystem::directory_iterator("/proc/self/fd"),
                         std::filesystem::directory_iterator{});
}
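In a unit test the helper might be used like this (a sketch; exercise_code_under_test is a placeholder for whatever you are checking):

#include <cassert>

void test_no_fd_leak()
{
    long before = file_descriptor_count();
    exercise_code_under_test();            // hypothetical function being verified
    long after = file_descriptor_count();
    assert(after == before);               // any difference means a descriptor leaked
}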
It is good practice to keep the scope in which a file is open as small as possible: open it, dump all the information you want (or read it into a buffer), then close it. In the usual case you will then have just the three standard descriptors open (std in/out/err) plus whatever files are open at that moment.
If you do keep files open, tracking them manually is best:
put a global fdCounter variable,
increment it after each successful file open, and decrement it after closing.
If you are under Linux, this information is available under /proc/<your pid>/fd.
Then use lstat on each file descriptor entry to keep only regular files.
If you encapsulated your file handling properly, it should be simple to add reference counters or logging to it and print them to the console.
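A rough sketch of that idea (Linux-specific; it uses stat() rather than lstat(), because the /proc entries are symlinks and it is the target's file type that matters):

#include <cstdio>
#include <dirent.h>
#include <sys/stat.h>

// Count open descriptors that refer to regular files, via /proc/self/fd.
int count_regular_fds()
{
    DIR *dp = opendir("/proc/self/fd");
    if (dp == NULL) return -1;
    int count = 0;
    struct dirent *ep;
    while ((ep = readdir(dp)) != NULL)
    {
        if (ep->d_name[0] == '.') continue;  // skip "." and ".."
        char path[64];
        snprintf(path, sizeof(path), "/proc/self/fd/%s", ep->d_name);
        struct stat sb;
        if (stat(path, &sb) == 0 && S_ISREG(sb.st_mode)) // follow the symlink to the real file
            ++count;
    }
    closedir(dp);
    return count;
}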
One approach to debugging this is to override the open calls with your own implementation that calls the real thing, and put some logging in to see if you lose file descriptors. How do you open the files - with open(), or are you using fopen()?
Something like this maybe:
#include <fstream>
#include <iostream>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

inline int log_open(const char *p, int flags, int mode)
{
    int fd = ::open(p, flags, mode);
    std::cout << "OPEN: " << fd << std::endl;
    return fd;
}

inline int log_close(int fd)
{
    int rc = ::close(fd);
    std::cout << "CLOSE: " << fd << std::endl;
    return rc;
}

#define open(p, f, m) log_open(p, f, m)
#define close(fd) log_close(fd)

int main(int argc, char *argv[])
{
    int fd = open("tmp.txt", O_RDWR | O_CREAT | O_TRUNC, 0666);
    std::cout << "FD: " << fd << std::endl;
    if (fd != -1)
        close(fd);
    return 0;
}
In my experience, by the time you need to count the number of open file descriptors, you don't know where they were opened, or by what submodule or library, so wrapping open/close is not a viable strategy. Brute-force counting seems to be the only way.
The domain hosting the original blog post no longer resolves in DNS. I copy two proposals from Find current number of open filehandle (NOT lsof):
int j, n = 0;

// count open file descriptors
for (j = 0; j < FDMAX; ++j) // FDMAX should be retrieved from process limits,
                            // but a constant value of >= 4K should be
                            // adequate for most systems
{
    int fd = dup(j);
    if (fd < 0)
        continue;
    ++n;
    close(fd);
}
printf("%d file descriptors open\n", n);
and also this:
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>

int main(void)
{
    DIR *dp;
    struct dirent *ep;

    dp = opendir("/proc/MYPID/fd/"); // substitute the actual PID; /proc/self/fd/ also works for the current process
    if (dp != NULL)
    {
        while ((ep = readdir(dp)) != NULL)
            puts(ep->d_name);
        (void) closedir(dp);
    }
    else
        perror("Couldn't open the directory");

    return 0;
}

Help Editing Code to Fix "Argument list too long" Error

I am currently doing some testing with a new addition to the ICU dictionary-based break iterator.
I have code that allows me to test the word-breaking on a text document, but when the text document is too large it gives the error: bash: ./a.out: Argument list too long
I am not sure how to edit the code to break up the argument list when it gets too long, so that a file of any size can be run through the code. The original code author is quite busy; would someone be willing to help out?
I tried removing the printing of what is being examined to see if that would help, but I still get the error on large files (printing what is being examined isn't necessary - I just need the result).
If the code could be modified to read the source text file line by line and export the results line by line to another text file (ending up with all the lines when it is done), that would be perfect.
The code is as follows:
/*
   Written by George Rhoten to test how word segmentation works.
   Code inspired by the break ICU sample.

   Here is an example to run this code under Cygwin.

   PATH=$PATH:icu-test/source/lib ./a.exe "`cat input.txt`" > output.txt

   Encode input.txt as UTF-8.
   The output text is UTF-8.
*/

#include <stdio.h>
#include <stdlib.h>
#include <unicode/brkiter.h>
#include <unicode/ucnv.h>

#define ZW_SPACE "\xE2\x80\x8B"

void printUnicodeString(const UnicodeString &s) {
    int32_t len = s.length() * U8_MAX_LENGTH + 1;
    char *charBuf = new char[len];
    len = s.extract(0, s.length(), charBuf, len, NULL);
    charBuf[len] = 0;
    printf("%s", charBuf);
    delete [] charBuf;
}

/* Creating and using text boundaries */
int main(int argc, char **argv)
{
    ucnv_setDefaultName("UTF-8");
    UnicodeString stringToExamine("Aaa bbb ccc. Ddd eee fff.");
    printf("Examining: ");
    if (argc > 1) {
        // Override the default charset.
        stringToExamine = UnicodeString(argv[1]);
        if (stringToExamine.charAt(0) == 0xFEFF) {
            // Remove the BOM
            stringToExamine = UnicodeString(stringToExamine, 1);
        }
    }
    printUnicodeString(stringToExamine);
    puts("");

    // Print each word in order.
    UErrorCode status = U_ZERO_ERROR;
    BreakIterator* boundary = BreakIterator::createWordInstance(NULL, status);
    if (U_FAILURE(status)) {
        printf("Failed to create word break iterator. status = %s",
               u_errorName(status));
        exit(1);
    }
    printf("Result: ");
    boundary->setText(stringToExamine);
    int32_t start = boundary->first();
    int32_t end = boundary->next();
    while (end != BreakIterator::DONE) {
        if (start != 0) {
            printf(ZW_SPACE);
        }
        printUnicodeString(UnicodeString(stringToExamine, start, end - start));
        start = end;
        end = boundary->next();
    }
    delete boundary;
    return 0;
}
Thanks so much!
-Nathan
The Argument list too long error message is coming from the bash shell and happens before your code even starts executing.
The only code you could fix to eliminate this problem is the bash source code (the actual limit, ARG_MAX, is enforced by the kernel), and even then you're always going to run into a limit. If you increase it from 2048 files on the command line to 10,000, then some day you'll need to process 10,001 files ;-)
There are numerous solutions to managing 'too big' argument lists.
The standardized solution is the xargs utility.
find / -print | xargs echo
is an unhelpful, but working, example.
See How to use "xargs" properly when argument list is too long for more info.
Even xargs has problems, because file names can contain spaces, newline chars, and other unfriendly stuff; find ... -print0 | xargs -0 copes with those.
I hope this helps.
The code below reads the content of a file whose name is given as the first parameter on the command line and places it in a std::string buffer. Then, instead of calling UnicodeString with argv[1], use that buffer instead.
#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main(int argc, char **argv)
{
    std::string buffer;
    if (argc > 1) {
        std::ifstream t;
        t.open(argv[1]);
        std::string line;
        while (t) {
            std::getline(t, line);
            buffer += line + '\n';
        }
    }
    cout << buffer;
    return 0;
}
Update:
Input to UnicodeString should be char*. The function GetFileIntoCharPointer does that.
Note that only the most rudimentary error checking is implemented below!
#include <iostream>
#include <fstream>
#include <cstdio>
using namespace std;

char * GetFileIntoCharPointer(char *pFile, long &lRet)
{
    FILE * fp = fopen(pFile, "rb");
    if (fp == NULL) return 0;

    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    fseek(fp, 0, SEEK_SET);

    char *pData = new char[size + 1];
    lRet = fread(pData, sizeof(char), size, fp);
    pData[lRet] = '\0'; //terminate, so the buffer can be printed as a C string

    fclose(fp);
    return pData;
}

int main(int argc, char **argv)
{
    long Len;
    char * Data = GetFileIntoCharPointer(argv[1], Len);
    std::cout << Data << std::endl;

    if (Data != NULL)
        delete [] Data;

    return 0;
}
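With either helper, the buffer can then take the place of argv[1] in the original ICU program. A sketch, assuming the input really is UTF-8 and that ICU's UnicodeString::fromUTF8 is available:

#include <unicode/unistr.h>
#include <unicode/stringpiece.h>

// Build the string to examine from file contents instead of the command line.
long len = 0;
char *data = GetFileIntoCharPointer(argv[1], len);
UnicodeString stringToExamine = UnicodeString::fromUTF8(StringPiece(data, (int32_t)len));
delete [] data;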