Splitting large CSV files into small files with dynamic names using C++

I am a beginner, so I apologise if my question looks childish. I have 38 large files in a folder, and I want to split each of them into smaller parts with dynamically generated names. Lines 1 to 13 work well; the challenge is in lines 16-19. The output shows that not all of the data read from the ifstream ends up in the char array, which makes it difficult to split the files. What am I getting wrong?
#define SEGMENT 728300 //approximate target size of small file
using namespace std;
long file_size(char *name);//function definition below
int main(int argc, char **argv)
{
char input_file_1[100]; // input file
strcpy(input_file_1,argv[1]);
string PathToData = "path to the files";
TString name = PathToData+input_file_1;
std::cout << "Reading file " << name << endl;
char getdata[35000];
ifstream csv_db(name);
while(csv_db.getline(getdata,sizeof(csv_db)))
if (csv_db.eof())
csv_db.close();
int segments=0, i, accum;
FILE *fp1, *fp2;
unsigned int huga=strlen(getdata);
char largeFileName[huga + 100]; // Make sure there's enough space
strcpy(largeFileName, getdata);
std::cout << largeFileName << endl;
std::cout << largeFileName << endl;
long sizeFile = file_size(largeFileName);
segments = sizeFile/SEGMENT + 1980;//ensure end of file
char filename[360]={"path to folder where to keep the result"};
char smallFileName[360];
char line[1080];
fp1 = fopen(largeFileName, "r");
if(fp1)
{
for(i=1980;i<segments;i++)
{
accum = 0;
sprintf(smallFileName, "%s%d.csv", filename, i);
fp2 = fopen(smallFileName, "w");
if(fp2)
{
while(fgets(line, 1080, fp1) && accum <= SEGMENT)
{
accum += strlen(line);//track size of growing file
fputs(line, fp2);
}
fclose(fp2);
}
}
fclose(fp1);
}
return 0;
}
long file_size(char *name)
{
FILE *fp = fopen(name, "rb"); //must be binary read to get bytes
long size=-1;
if(fp)
{
fseek (fp, 0, SEEK_END);
size = ftell(fp)+1;
fclose(fp);
}
return size;
}
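A few things in the code above look suspicious. `sizeof(csv_db)` is the size of the stream object, not of the `getdata` buffer, so `getline` fails on any line longer than that. The `while` loop has no braces, so its body is only the `if` statement. After the loop, `getdata` holds a line of data, not a file name, yet it is then passed to `file_size` and `fopen`. The following is a minimal sketch of the splitting step only, assuming the input path is already known; `split_csv`, its parameters and the `<prefix><n>.csv` naming scheme are hypothetical and not taken from the original code.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: splits `input` into files named <prefix><n>.csv,
// starting a new part once roughly `segment_bytes` have been written.
// Returns the number of part files created, or -1 on an open failure.
int split_csv(const std::string& input, const std::string& prefix,
              std::size_t segment_bytes)
{
    std::ifstream in(input);
    if (!in) return -1;

    std::ofstream out;
    std::string line;
    std::size_t written = 0;
    int parts = 0;

    // The read is the loop condition, so the loop ends cleanly at end of file.
    while (std::getline(in, line)) {
        if (!out.is_open() || written >= segment_bytes) {
            if (out.is_open()) out.close();
            ++parts;
            out.open(prefix + std::to_string(parts) + ".csv");
            if (!out) return -1;
            written = 0;
        }
        out << line << '\n';
        written += line.size() + 1;   // +1 for the newline
    }
    return parts;
}
```

Because whole lines are written, a part can exceed `segment_bytes` by at most one line, and no CSV row is cut in half.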

Related

fscanf line with condition

My goal is to read a data file consisting of just one number per line and write the data into a histogram. The file also contains some comment lines that start with a # character, and I want to skip these lines.
I have started writing:
TH1F *hist = new TH1F("hist","",4096, -0.5,4095.5);
//TF1 *fitfunc;
char filename[100];
double val;
int i;
char line[256];
sprintf(filename,"test.dat");
FILE* pfile = fopen(filename, "r");
for (i=0;i<=14;i++) {
fgets(line,256,pfile);
cout<<line<<endl;
fscanf(pfile, "%lf /n", &val);
hist->SetBinContent(i,val);
}
But only every other line gets written as "line", while the others are consumed by fscanf.
It would be very nice if someone could give me a hint.
...so this will obviously not work properly:
TH1F *hist = new TH1F("hist","",4096, -0.5,4095.5);
//TF1 *fitfunc;
char filename[100];
double val;
int i;
char zeile[256];
sprintf(filename,"test.dat");
FILE* pfile = fopen(filename, "r");
for (i=0;i<=14;i++)
{
fgets(zeile,256,pfile);
cout<<"fgets: "<<zeile<<endl;
if (zeile[0]!='#')
{
fscanf(pfile, "%lf /n", &val);
cout<<"val: "<<val<<endl;
hist->SetBinContent(i,val);
}
}
You need to use sscanf() instead of fscanf() after you've read the line with fgets():
TH1F *hist = new TH1F("hist", "", 4096, -0.5, 4095.5);
char filename[100];
char zeile[256];
sprintf(filename, "test.dat");
FILE *pfile = fopen(filename, "r");
if (pfile == 0)
…handle error; do not continue…
for (int i = 0; i < 14 && fgets(zeile, sizeof(zeile), pfile) != 0; i++)
{
cout << "fgets: " << zeile << endl;
if (zeile[0] != '#')
{
double val;
if (sscanf(zeile, "%lf", &val) == 1)
{
cout << "val: " << val << endl;
hist->SetBinContent(i, val);
}
// else … optionally report that line was erroneous
}
}
I left the sprintf() for the file name in place, but it provides marginal value. I'd be tempted to use const char *filename = "test.dat"; so that the error message can report the file name that failed to open without repeating the string literal.
Converted into a standalone test program:
#include <cstdio>   // fopen, fgets, sscanf, sprintf, fclose
#include <cstdlib>
#include <iostream>
using namespace std;
int main()
{
char filename[100];
char zeile[256];
sprintf(filename, "test.dat");
FILE *pfile = fopen(filename, "r");
if (pfile != 0)
{
for (int i = 0; i < 14 && fgets(zeile, sizeof(zeile), pfile) != 0; i++)
{
cout << "fgets: " << zeile;
if (zeile[0] != '#')
{
double val;
if (sscanf(zeile, "%lf", &val) == 1)
cout << "val: " << val << endl;
}
}
fclose(pfile);
}
return 0;
}
and given a test data file test.dat containing:
1.234
2.345
#3.456
#4.567
5.678
the output from the program shown is:
fgets: 1.234
val: 1.234
fgets: 2.345
val: 2.345
fgets: #3.456
fgets: #4.567
fgets: 5.678
val: 5.678
This generates the three expected val lines and reads but ignores the two comment lines.

Inflation of pdf stream using zlib blank sometimes

I am a beginner programmer trying to inflate text streams from PDFs. I have adopted and slightly altered some open source code which uses zlib, and generally it works very well. However, I have been testing on some different PDFs lately, and some of the inflated streams come back blank. Could anybody advise me as to why?
I have come across the question below, which seems to address the same problem but does not really give a definitive answer:
zLib inflate has empty result in some cases
#include <iostream>
#include <fstream>
#include <string>
#include "zlib.h"
int main()
{
//Discard existing output:
//Open the PDF source file:
std::ifstream filei("C:\\Users\\dpbowe\\Desktop\\PIDSearch\\P&ID.PDF", std::ios::in|std::ios::binary|std::ios::ate);
if (!filei) std::cout << "Error Opening Input File" << std::endl;
//decoded output
std::ofstream fileo;
fileo.open("C:\\Users\\dpbowe\\Desktop\\Decoded.txt", std::ios::binary | std::ofstream::out);
if (!fileo) std::cout << "Error opening output file" << std::endl;
if (filei && fileo)
{
//Get the file length:
long filelen = filei.tellg(); //fseek==0 if ok
filei.seekg(0, std::ios::beg);
//Read the entire file into memory (!):
char* buffer = new char [filelen];
if (buffer == NULL) {fputs("Memory error", stderr); exit(EXIT_FAILURE);}
filei.read(buffer,filelen);
if (buffer == '\0') {fputs("Reading error", stderr); exit(EXIT_FAILURE);}
bool morestreams = true;
//Now search the buffer repeatedly for streams of data
while (morestreams)
{
//Search for stream, endstream. Should check the filter of the object to make sure it is FlateDecode, but skip that for now!
size_t streamstart = FindStringInBuffer (buffer, "stream", filelen); //This is my own search function
size_t streamend = FindStringInBuffer (buffer, "endstream", filelen); //This is my own search function
if (streamstart>0 && streamend>streamstart)
{
//Skip to beginning and end of the data stream:
streamstart += 6;
if (buffer[streamstart]==0x0d && buffer[streamstart+1]==0x0a) streamstart+=2;
else if (buffer[streamstart]==0x0a) streamstart++;
if (buffer[streamend-2]==0x0d && buffer[streamend-1]==0x0a) streamend-=2;
else if (buffer[streamend-1]==0x0a) streamend--;
//Assume output will fit into 10 times input buffer:
size_t outsize = (streamend - streamstart)*10;
char* output = new char [outsize]; ZeroMemory(output, outsize);
//Now use zlib to inflate:
z_stream zstrm; ZeroMemory(&zstrm, sizeof(zstrm));
zstrm.avail_in = streamend - streamstart + 1;
zstrm.avail_out = outsize;
zstrm.next_in = (Bytef*)(buffer + streamstart);
zstrm.next_out = (Bytef*)output;
int rsti = inflateInit(&zstrm);
if (rsti == Z_OK)
{
int rst2 = inflate (&zstrm, Z_FINISH);
if (rst2 >= 0)
{
size_t totout = zstrm.total_out;
//Write inflated output to file "Decoded.txt"
fileo<<output;
fileo<<"\r\nStream End\r\n\r\n";
}
else std::cout<<"output uncompressed stream is blank"<<std::endl;
}
delete[] output; output=0;
buffer+= streamend + 7;
filelen = filelen - (streamend+7);
}
else
{
morestreams = false;
std::cout<<"End of File"<<std::endl;
}
}
filei.close();
}
else
{
std::cout << "File Could Not Be Accessed\n";
}
if (fileo) fileo.close();
}
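One possible explanation, offered as an assumption rather than a confirmed diagnosis for these particular PDFs: the inflated data is written with `operator<<` on a `char*`, which treats the buffer as a C string and stops at the first zero byte. A stream whose inflated content is binary, or merely starts with a zero byte, would then appear blank even though `inflate` succeeded. The code above already reads `zstrm.total_out` into `totout` but never uses it. The sketch below uses only the standard library to show the difference between the two ways of writing; `via_insert` and `via_write` are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Writes `data` with operator<<, which treats the pointer as a C string
// and therefore stops at the first zero byte (the length is ignored).
std::string via_insert(const char* data, std::size_t /*len*/)
{
    std::ostringstream out;
    out << data;
    return out.str();
}

// Writes exactly `len` bytes, zero bytes included.
std::string via_write(const char* data, std::size_t len)
{
    std::ostringstream out;
    out.write(data, static_cast<std::streamsize>(len));
    return out.str();
}
```

Applied to the code above, this would mean writing the result with `fileo.write(output, totout)`. It is also worth checking `rst2` against `Z_STREAM_END` and reporting `zstrm.msg` on failure, since a stream that is not FlateDecode cannot be inflated at all.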

How to make 10 copies of an initial file: if the first file is as-1.txt, the second should be as-2.txt, and so on

The loop isn't making 10 copies, and I have no idea how to change the file names.
#include "iostream"
#include "fstream"
#include "windows.h"
using namespace std;
void main()
{
char str[200];
ifstream myfile("as-1.txt");
if (!myfile)
{
cerr << "file not opening";
exit(1);
}
for (int i = 0; i < 10; i++)
{
ofstream myfile2("as-2.txt");
while (!myfile.eof())
{
myfile.getline(str, 200);
myfile2 << str << endl;
}
}
system("pause");
}
Solution using plain C API from <cstdio>. Easily customizable.
const char* file_name_format = "as-%d.txt"; //Change that if you need different name pattern
const char* original_file_name = "as-1.txt"; //Original file
const size_t max_file_name = 255;
FILE* original_file = fopen(original_file_name, "rb"); //binary mode, so ftell() reports the byte count
if(!original_file)
{
    //file not found, handle error and do not continue
}
fseek(original_file, 0, SEEK_END); //(*)
long file_size = ftell(original_file);
fseek(original_file, 0, SEEK_SET);
char* original_content = (char*)malloc(file_size);
fread(original_content, 1, file_size, original_file);
fclose(original_file);
size_t copies_num = 10;
size_t first_copy_number = 2;
char file_name[max_file_name];
for(size_t n = first_copy_number; n < first_copy_number + copies_num; ++n)
{
    snprintf(file_name, max_file_name, file_name_format, (int)n); //%d expects an int
    FILE* file = fopen(file_name, "wb");
    if(!file)
        continue; //could not create this copy, handle error
    fwrite(original_content, 1, file_size, file);
    fclose(file);
}
free(original_content);
(*) As noted on this page, SEEK_END may not necessarily be supported (i.e. it is not a portable solution). However most POSIX-compliant systems (including the most popular Linux distros), Windows family and OSX support this without any problems.
Oh, and one more thing. This line
while (!myfile.eof())
is not quite correct. Read this question - it explains why you shouldn't write such code.
#include <fstream>
#include <sstream>
int main()
{
    const int copies_of_file = 10;
    for (int i = 1; i <= copies_of_file; ++i)
    {
        // Build the file name from the loop counter
        std::ostringstream name;
        name << "filename as-" << i << ".txt";
        std::ofstream ofile(name.str().c_str());
        ofile.close();
    }
    return 0;
}
That will make 10 copies of a blank .txt file named "filename as-1.txt", "filename as-2.txt", etc.
Note also the use of int main: main always returns int, never void.

OpenSSL SHA256 Wrong result

I have the following piece of code that is supposed to calculate the SHA256 of a file. I am reading the file chunk by chunk and calling EVP_DigestUpdate for each chunk. When I test the code with a file that has the content
Test Message
Hello World
in Windows, it gives me SHA256 value of 97b2bc0cd1c3849436c6532d9c8de85456e1ce926d1e872a1e9b76a33183655f but the value is supposed to be 318b20b83a6730b928c46163a2a1cefee4466132731c95c39613acb547ccb715, which can be verified here too.
Here is the code:
#include <openssl\evp.h>
#include <iostream>
#include <string>
#include <fstream>
#include <cstdio>
const int MAX_BUFFER_SIZE = 1024;
std::string FileChecksum(std::string, std::string);
int main()
{
std::string checksum = FileChecksum("C:\\Users\\Dell\\Downloads\\somefile.txt","sha256");
std::cout << checksum << std::endl;
return 0;
}
std::string FileChecksum(std::string file_path, std::string algorithm)
{
EVP_MD_CTX *mdctx;
const EVP_MD *md;
unsigned char md_value[EVP_MAX_MD_SIZE];
int i;
unsigned int md_len;
OpenSSL_add_all_digests();
md = EVP_get_digestbyname(algorithm.c_str());
if(!md) {
printf("Unknown message digest %s\n",algorithm);
exit(1);
}
mdctx = EVP_MD_CTX_create();
std::ifstream readfile(file_path,std::ifstream::in|std::ifstream::binary);
if(!readfile.is_open())
{
std::cout << "COuldnot open file\n";
return 0;
}
readfile.seekg(0, std::ios::end);
long filelen = readfile.tellg();
std::cout << "LEN IS " << filelen << std::endl;
readfile.seekg(0, std::ios::beg);
if(filelen == -1)
{
std::cout << "Return Null \n";
return 0;
}
EVP_DigestInit_ex(mdctx, md, NULL);
long temp_fil = filelen;
while(!readfile.eof() && readfile.is_open() && temp_fil>0)
{
int bufferS = (temp_fil < MAX_BUFFER_SIZE) ? temp_fil : MAX_BUFFER_SIZE;
char *buffer = new char[bufferS+1];
buffer[bufferS] = 0;
readfile.read(buffer, bufferS);
std::cout << strlen(buffer) << std::endl;
EVP_DigestUpdate(mdctx, buffer, strlen(buffer));
temp_fil -= bufferS;
delete[] buffer;
}
EVP_DigestFinal_ex(mdctx, md_value, &md_len);
EVP_MD_CTX_destroy(mdctx);
printf("Digest is: ");
//char *checksum_msg = new char[md_len];
//int cx(0);
for(i = 0; i < md_len; i++)
{
//_snprintf(checksum_msg+cx,md_len-cx,"%02x",md_value[i]);
printf("%02x", md_value[i]);
}
//std::string res(checksum_msg);
//delete[] checksum_msg;
printf("\n");
/* Call this once before exit. */
EVP_cleanup();
return "";
}
I tried to write the hash generated by the program as a string using _snprintf, but it didn't work. How can I generate the correct hash and return the value as a string from the FileChecksum function? The platform is Windows.
EDIT: It seems the problem was caused by a CRLF issue. Because Windows saves the file using \r\n, the calculated checksum was different. How should I handle this?
MS-DOS used the CR-LF convention, so a file saved on Windows uses \r\n for each line break, whereas the online tool you linked sees only the \n character.
Thus you have to compare against the checksum of the string Test Message\r\nHello World\r\n, which is equivalent to creating and reading the file on Windows, as is the case here.
However, the checksum of a given file will be the same wherever it is computed.
Note: your code works fine :)
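If the goal is a checksum of the text that does not depend on the platform's line endings, one option is to normalize CRLF to LF before hashing. This is only a sketch of that idea: `normalize_newlines` is a hypothetical helper, and it only makes sense for text files. The result no longer matches tools that hash the raw bytes of the file.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `text` with every "\r\n" pair
// replaced by a single '\n'. Lone '\r' characters are left untouched.
std::string normalize_newlines(const std::string& text)
{
    std::string result;
    result.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n')
            continue;   // drop the CR, keep the LF that follows
        result += text[i];
    }
    return result;
}
```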
It seems the problem was associated with the length value I passed to EVP_DigestUpdate. I had passed the value from strlen, but replacing it with bufferS fixed the issue.
The code was modified as:
while(!readfile.eof() && readfile.is_open() && temp_fil>0)
{
int bufferS = (temp_fil < MAX_BUFFER_SIZE) ? temp_fil : MAX_BUFFER_SIZE;
char *buffer = new char[bufferS+1];
buffer[bufferS] = 0;
readfile.read(buffer, bufferS);
EVP_DigestUpdate(mdctx, buffer, bufferS);
temp_fil -= bufferS;
delete[] buffer;
}
To return the checksum as a string, I modified the code as:
EVP_DigestFinal_ex(mdctx, md_value, &md_len);
EVP_MD_CTX_destroy(mdctx);
char str[2 * EVP_MAX_MD_SIZE + 1] = { 0 }; // two hex digits per byte plus the terminating NUL
char *ptr = str;
std::string ret;
for(i = 0; i < md_len; i++)
{
//_snprintf(checksum_msg+cx,md_len-cx,"%02x",md_value[i]);
sprintf(ptr,"%02x", md_value[i]);
ptr += 2;
}
ret = str;
/* Call this once before exit. */
EVP_cleanup();
return ret;
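An alternative to the sprintf() loop is to build the string with a stream, which avoids sizing a char buffer by hand. This is a sketch with a hypothetical helper name, `to_hex`. In the function above it would be called as `to_hex(md_value, md_len)`.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats `len` raw digest bytes as lowercase hex.
std::string to_hex(const unsigned char* bytes, std::size_t len)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < len; ++i)
        out << std::setw(2) << static_cast<unsigned int>(bytes[i]);  // setw applies to the next item only
    return out.str();
}
```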
As for the wrong checksum earlier, the problem lay in how Windows stores line endings. As suggested by Zangetsu, Windows was saving the text file with CRLF, but Linux and the site I mentioned earlier use LF, so the checksum values differed. For files other than text, e.g. a DLL, the code now computes the correct checksum as a string.

WAV file writing and loading using C++

I have been working on a C++ project based on a WAV file, but no matter what I put in, the output is always only half the size of the original input. I'm using the code from the following pages:
http://rogerchansdigitalworld.blogspot.com/2010/05/how-to-read-wav-format-file-in-c.html
http://ltheory.com/blog/writeWav.txt
Can anyone give a hint?
My main function is:
int main()
{
string filename = "U:\\workspace\\ECE420\\psola.github\\aa.wav";
string directory;
const size_t last_slash_idx = filename.rfind('\\');
if (std::string::npos != last_slash_idx)
{
directory = filename.substr(0, last_slash_idx);
}
string filename1 = "U:\\workspace\\ECE420\\psola.github\\voiceprocessed.wav";
string directory1;
const size_t last_slash_idx1 = filename1.rfind('\\');
if (std::string::npos != last_slash_idx1)
{
directory1 = filename1.substr(0, last_slash_idx1);
}
WavData song;
int num = filename.size();
char file_char[100];
for (int a=0;a<=num;a++){
file_char[a]=filename[a];
}
WavData song1;//
int num1 = filename1.size();
char file_char1[100];
for (int a=0;a<=num1;a++){
file_char1[a]=filename1[a];
}
loadWaveFile(file_char,&song);
cout<<"there are "<<song.size/2<<" samples in this WAV file."<<endl;
writeWAVData(file_char1, song.data, song.size, 44100, 1);
cout<<"2,there are "<<song1.size/2<<" samples in this WAV file."<<endl;
cout<<"success"<<endl;
return 0;
}