Code for searching for a string in a binary file - c++

I asked this question a few days ago:
How to look for an ANSI string in a binary file?
and I got a really nice answer, which later turned into a much harder question: Can input iterators be used where forward iterators are expected? That one is now well beyond the level I can understand.
I am still learning C++ and I am looking for an easy way to search for a string in a binary file.
Could someone show me simple code for a minimalistic C++ console program that looks for a string in a binary file and prints the locations to stdout?
If possible, can you show me
a version where the file is copied into memory (assuming the binary file is small)
and another one that uses the proper way from the linked questions
Sorry if it sounds like I'm asking for someone's code, but I am just learning C++ and I think others could benefit from this question if someone posted some high quality code that is nice to learn from.

Your requirements are unclear. For example, where does "121" appear in "12121": just at the first character (after which searching continues at the 4th), or at the 3rd as well? The code below uses the former approach.
#include <iostream>
#include <fstream>
#include <string>
#include <string.h>
int main(int argc, const char* argv[])
{
if (argc != 3)
{
std::cerr << "Usage: " << argv[0] << " filename search_term\n"
"Prints offsets where search_term is found in file.\n";
return 1;
}
const char* filename = argv[1];
const char* search_term = argv[2];
size_t search_term_size = strlen(search_term);
std::ifstream file(filename, std::ios::binary);
if (file)
{
file.seekg(0, std::ios::end);
size_t file_size = file.tellg();
file.seekg(0, std::ios::beg);
std::string file_content;
file_content.reserve(file_size);
char buffer[16384];
std::streamsize chars_read;
while (file.read(buffer, sizeof buffer), chars_read = file.gcount())
file_content.append(buffer, chars_read);
if (file.eof())
{
for (std::string::size_type offset = 0, found_at;
file_size > offset &&
(found_at = file_content.find(search_term, offset)) !=
std::string::npos;
offset = found_at + search_term_size)
std::cout << found_at << std::endl;
}
}
else
{
std::cerr << "Cannot open " << filename << '\n';
return 1;
}
}
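The overlapping versus non-overlapping choice mentioned above comes down to how far the search resumes after each hit. Here is a minimal sketch of both behaviours; `find_all` and its `allow_overlap` parameter are hypothetical names introduced for illustration, not part of the program above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): collects every offset at
// which `needle` occurs in `haystack`.
std::vector<std::size_t> find_all(const std::string& haystack,
                                  const std::string& needle,
                                  bool allow_overlap)
{
    std::vector<std::size_t> offsets;
    if (needle.empty()) return offsets;  // avoid an endless loop on ""

    // After a hit, resume one byte later (overlapping) or just past the
    // match (non-overlapping, as in the program above).
    const std::size_t step = allow_overlap ? 1 : needle.size();
    for (std::size_t pos = haystack.find(needle);
         pos != std::string::npos;
         pos = haystack.find(needle, pos + step))
    {
        offsets.push_back(pos);
    }
    return offsets;
}
```

With "12121" and "121", the non-overlapping mode reports only offset 0, while the overlapping mode reports offsets 0 and 2.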

This is one way to do part 1. I'm not sure I would describe it as high quality, but it is perhaps on the minimalist side.
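#include <iostream>
#include <fstream>
#include <iterator>
#include <string>
using namespace std;
int main(int argc, char *argv[])
{
// Expect a file name and a search term on the command line.
if (argc != 3)
{
cerr << "Usage: " << argv[0] << " filename search_term" << endl;
return 1;
}
std::ifstream ifs(argv[1], ios::binary);
// Read the whole file into memory; the extra parentheses keep this from being parsed as a function declaration.
std::string str((std::istreambuf_iterator<char>(ifs)), std::istreambuf_iterator<char>());
// Only the first occurrence is reported.
size_t pos = str.find(argv[2]);
if (pos != string::npos)
cout << "string found at position: " << pos << endl;
else
cout << "could not find string" << endl;
return 0;
}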

Related

Splitting read file C++

Let's start with the fact that I have absolutely no experience with C++, but I got this project to connect a POS with a Verifone. We do not have the standard Verifone SDK but something custom.
At first I needed to prepare data to send to C++, and C++ will send it to the Verifone. This is where I am getting stuck: I have a .txt file, which I can read with C++, but now I need to split the data.
This is my current code:
#include "stdafx.h"
#include <sstream>
#include <iostream>
#include <fstream>
#include <vector>
using namespace std;
string file_get_contents(const char *filename)
{
ifstream in(filename);
if (in.fail())
{
cerr << "File not found: " << filename << endl;
return "";
}
std::stringstream buffer;
buffer << in.rdbuf();
in.close();
return buffer.str();
}
int main(int argc, char **argv)
{
vector<string> strings;
string contents = file_get_contents("C:/wamp/www/cmd/config.txt");
string s;
while (contents, s, '||') {
cout << s << endl;
strings.push_back(s);
}
cout << s; // ECHO CONTENTS
std::cin.ignore(); // pause
return 0;
}
With this code my console just stays blank, no data is being displayed.
The full string I am splitting is:
"notepad://amount=10320.53||session_id=7946548443287465/"
The result that I want is to get an array that uses "amount" and "session_id" as keys and their values as value.
What is the best way of achieving this?
I used the following code to actually display the string in my console which was working:
int main(int argc, char **argv)
{
string contents = file_get_contents("config.txt");
cout << contents; // ECHO CONTENTS
std::cin.ignore(); // pause
return 0;
}
This shows how to use a regex to extract the information you want. There are a lot of online resources on how to read files properly, so I left that part out.
#include <iostream>
#include <regex>
#include <unordered_map>
#include <string>
int main(int argc, char **argv)
{
std::regex pattern("amount=([[:digit:]\\.]*)\\|\\|session_id=([[:digit:]]*)");
std::smatch results;
std::unordered_map<std::string, std::string> data;
std::string contents = "notepad://amount=10320.53||session_id=7946548443287465/";
//string contents = file_get_contents("C:/wamp/www/cmd/file.txt");
if(std::regex_search(contents, results, pattern))
{
data["amount"] = results[1];
data["session_id"] = results[2];
}
std::cout << "Amount: " << data["amount"] << std::endl;
std::cout << "Session ID: " << data["session_id"] << std::endl;
return 0;
}
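The regex above is tied to the two known field names. If the keys are not known in advance, a plain split on "||" and "=" is one alternative. This is only a sketch: `parse_pairs` is a hypothetical helper, and it assumes the input always looks like the sample string, with an optional "scheme://" prefix and a single trailing "/".

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: turns "scheme://k1=v1||k2=v2/" into a key/value map.
std::map<std::string, std::string> parse_pairs(std::string text)
{
    // Drop an optional "scheme://" prefix and a single trailing '/'.
    const std::size_t scheme_end = text.find("://");
    if (scheme_end != std::string::npos) text.erase(0, scheme_end + 3);
    if (!text.empty() && text.back() == '/') text.pop_back();

    std::map<std::string, std::string> result;
    const std::string delimiter = "||";
    std::size_t start = 0;
    while (start <= text.size()) {
        // Cut out the next "key=value" field.
        std::size_t end = text.find(delimiter, start);
        if (end == std::string::npos) end = text.size();
        const std::string field = text.substr(start, end - start);

        // Split the field on its first '='; fields without '=' are skipped.
        const std::size_t eq = field.find('=');
        if (eq != std::string::npos)
            result[field.substr(0, eq)] = field.substr(eq + 1);

        start = end + delimiter.size();
    }
    return result;
}
```

For the sample string from the question, the map would hold "amount" and "session_id" as keys with their values as strings.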

C++ find all files of type in folder?

I am trying to list all the files of a certain type in a folder, so that I can loop through them. This should be simple, surely, but I can't get it.
I have found some example using dirent.h, but I need to do this in straight c++.
What is the best way to go about this?
Thanks.
You cannot do this in "straight C++", because C++ does not have a filesystem API yet.
I'd traditionally recommend Boost.Filesystem here, but you allegedly want to "avoid using third party headers if [you] can".
So your best bet is to use POSIX dirent.h, as you have been doing all along. It's about as "non-third party" as you're going to get for the time being.
Something like this? This finds all suid files in folders you specify, but can be modified to find any number of things, or use a regex for the extension if that is what you mean by 'type'.
#include <sys/stat.h>
#include <sys/types.h>
#include <iostream>
#include <string>
#include <sstream>
#include <dirent.h>
#include <vector>
#include <cstdlib>
bool is_suid(const char *file)
{
struct stat results;
// stat() returns non-zero on failure and leaves results unset.
if (stat(file, &results) != 0) return false;
return (results.st_mode & S_ISUID) != 0;
}
void help_me(char *me) {
std::cout
<< "Usage:" << std::endl
<< " " << me << " /bin/ /usr/sbin/ /usr/bin/ /usr/bin/libexec/" << std::endl;
exit(1);
}
int main(int argc, char **argv)
{
if (argc < 2) help_me(argv[0]);
std::string file_str;
std::vector<std::string> file_list;
for (int path_num = 1; path_num != argc; path_num++) {
const char * path = argv[path_num];
DIR *the_dir;
struct dirent *this_dir;
the_dir = opendir(path);
if (the_dir != NULL) {
while ((this_dir = readdir(the_dir)) != NULL) file_list.push_back(std::string(this_dir->d_name));
closedir(the_dir); // release the directory handle
}
std::string name;
for(int file_num = 0; file_num != file_list.size(); file_num++) {
name = file_list[file_num];
std::string path_to_file = std::string(path) + file_list[file_num];
if (is_suid(path_to_file.c_str()) == true) std::cout << path_to_file << std::endl;
}
file_list.clear();
file_list.shrink_to_fit();
}
exit(0);
}

How to determine size of a huge binary file in c++

Determining the size of a binary file always seems to involve reading the whole file into memory. How do I determine the size of a very large binary file that is known to be far bigger than memory can hold?
On most systems, there are stat() and fstat() functions (not part of ANSI C, but part of POSIX). For Linux, look at the man page.
EDIT: For Windows, the documentation is here.
EDIT: For a more portable version, use the Boost library:
#include <iostream>
#include <boost/filesystem.hpp>
using namespace boost::filesystem;
int main(int argc, char* argv[])
{
if (argc < 2)
{
std::cout << "Usage: tut1 path\n";
return 1;
}
std::cout << argv[1] << " " << file_size(argv[1]) << '\n';
return 0;
}
#include <cstdio>
FILE *fp = std::fopen("filename", "rb");
std::fseek(fp, 0, SEEK_END);
long filesize = std::ftell(fp);
std::fclose(fp);
Or, use ifstream:
#include <fstream>
std::ifstream fstrm("filename", std::ios_base::in | std::ios_base::binary);
fstrm.seekg(0, std::ios_base::end);
long filesize = fstrm.tellg();
This should work:
uintmax_t file_size(std::string path) {
return std::ifstream(path, std::ios::binary|std::ios::ate).tellg();
}

Binary file simple read then write not working

Below is the code. For right now, all I want to do is read in a binary file and then write that same binary to make sure I did the reading and the writing correctly [i.e. without changing the file].
I used test.rar (size 333 bytes -- a rar'ed txt file). The output file was 133kb and fails to extract using winrar (after being renamed test.rar). So I must be doing something wrong and cannot find the mistake.
Also, when I let the commented code run, it outputs "This program cannot run in dos" and starts making beeps and boops repeatedly as it iterates through my vector. It's as if using cout with this data is executing a program. If you know what is causing that, it would be nice to know.
#include "Dip.h"
#include <iostream>
#include <vector>
#include <fstream>
using namespace std;
#define USAGE "s\n"
int main(int argc, char **argv)
{
if (argc < 1)
{
cout << USAGE;
return 1;
}
ifstream in(argv[0], ios::binary);
fstream::streampos beg = in.tellg();
in.seekg(0, ios::end);
const fstream::streampos BUFFER_SIZE = in.tellg() - beg;
vector<char> outputBuffer;
if(BUFFER_SIZE)
{
in.seekg(0, ios::beg);
outputBuffer.resize(BUFFER_SIZE);
in.read(&outputBuffer[0], outputBuffer.size());
in.close();
std::ofstream out("output_file", ios::binary);
out.write(&outputBuffer[0], outputBuffer.size());
out.close();
}
else
{
cout << "main::file is empty" << endl;
return 1;
}
//for(vector<char>::const_iterator itr = outputBuffer.begin(); itr !=outputBuffer.end(); ++itr)
//cout << *itr;
// success!
return 0;
}
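Judging only from the code above, a likely cause is that it opens `argv[0]`, which by convention names the running program itself rather than the first argument (that would be `argv[1]`). That would explain both the unexpected output size and the "cannot run in DOS" text, which is a string embedded in Windows executables. Here is a minimal sketch of a byte-for-byte copy under that assumption; `copy_binary` is a hypothetical helper.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: copies `source` to `destination` byte for byte.
bool copy_binary(const std::string& source, const std::string& destination)
{
    std::ifstream in(source, std::ios::binary);
    if (!in) return false;
    std::ofstream out(destination, std::ios::binary);
    if (!out) return false;

    char buffer[4096];
    // read() may deliver a short final chunk; gcount() says how much arrived.
    while (in.read(buffer, sizeof buffer), in.gcount() > 0)
        out.write(buffer, in.gcount());

    out.flush();
    // Success means the input ended normally and every write went through.
    return in.eof() && out.good();
}
```

In `main`, this would be called as `copy_binary(argv[1], "output_file")` after checking that `argc` is at least 2.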

How can I get the duration of an MP3 file (CBR or VBR) with a very small library or native code c/c++?

I can't use any mp3 code that is patented by Fraunhofer, so no encoders OR decoders (e.g. ffmpeg, lame, MAD, etc.), plus it's too big.
I am doing this on Windows, but DirectShow's IMediaDet seems to slow down over time, calling it a few hundred times brings my system to a crawl, even re-using the same interface object and just putting the file name and getting duration!
So, is there some code out there that can read VBR files with C/C++ and get the duration?
There was another post on here to do CBR in C++, but the code makes a ton of assumptions and won't work for VBR, of course.
Most MP3 files have an ID3 header. It is not hard to decode that and get the duration, provided the tag contains a TLEN (length) frame, which is what the code below looks for.
Here is some very basic and ugly code that illustrates the technique.
#include <iostream>
#include <iomanip>
#include <string>
#include <exception>
size_t GetMP3Duration(const std::string sFileName);
int main(int argc, char* argv[])
{
try
{
if (argc < 2)
{
std::cout << "Usage: " << argv[0] << " file.mp3" << std::endl;
return 1;
}
size_t nLen = GetMP3Duration(argv[1]);
if (nLen==0)
{
std::cout << "Not Found" << std::endl;
}
else
{
std::cout << nLen << " milliseconds" << std::endl;
std::cout << nLen/60000 << ":";
nLen %= 60000;
std::cout << nLen/1000 << ".";
std::cout << std::setw(3) << std::setfill('0') << nLen%1000 << std::endl;
}
}
catch (std::exception &e)
{
std::cout << "Exception: " << e.what() << std::endl;
}
return 0;
}
#include <cstring>
#include <vector>
#include <iostream>
#include <fstream>
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>
unsigned DecodeMP3SafeInt(unsigned nVal)
{
// nVal has 4 bytes (8 bits each)
// - discard the most significant bit from each byte
// - reverse the byte order
// - concatenate the four 7-bit groups into a 28-bit size.
unsigned char *pValParts = reinterpret_cast<unsigned char *>(&nVal);
return (pValParts[3] & 0x7F) |
((pValParts[2] & 0x7F) << 7) |
((pValParts[1] & 0x7F) << 14) |
((pValParts[0] & 0x7F) << 21);
}
#pragma pack(1)
struct MP3Hdr {
char tag[3];
unsigned char maj_ver;
unsigned char min_ver;
unsigned char flags;
unsigned int size;
};
struct MP3ExtHdr {
unsigned int size;
unsigned char num_flag_bytes;
unsigned char extended_flags;
};
struct MP3FrameHdr {
char frame_id[4];
unsigned size;
unsigned char flags[2];
};
#pragma pack()
size_t GetMP3Duration(const std::string sFileName)
{
std::ifstream fin(sFileName.c_str(), std::ifstream::binary);
if (!fin)
throw std::invalid_argument("Cannot open file");
// Read Header
MP3Hdr hdr = { 0 };
fin.read(reinterpret_cast<char *>(&hdr), sizeof(hdr));
if (!fin.good())
throw std::invalid_argument("Error reading file");
if (0 != ::memcmp(hdr.tag, "ID3", 3))
throw std::invalid_argument("Not an MP3 File");
// Read extended header, if present
if (0 != (hdr.flags&0x40))
{
fin.seekg(sizeof(MP3ExtHdr), std::ifstream::cur);
if (!fin.good())
throw std::invalid_argument("Error reading file");
}
// read a chunk of file.
const size_t nDefaultSize(2048);
std::vector<char> vBuff(nDefaultSize);
fin.read(&vBuff[0], vBuff.size());
size_t nSize = fin.gcount();
if (!nSize)
throw std::invalid_argument("Error reading file");
vBuff.resize(nSize);
size_t nUsed = 0;
while (nSize-nUsed > sizeof(MP3FrameHdr))
{
MP3FrameHdr *pFrame = reinterpret_cast<MP3FrameHdr *>(&vBuff[nUsed]);
nUsed += sizeof(MP3FrameHdr);
size_t nDataLen = DecodeMP3SafeInt(pFrame->size);
if (nDataLen > (nSize-nUsed))
throw std::invalid_argument("Corrupt file");
if (!::isupper(static_cast<unsigned char>(pFrame->frame_id[0]))) // past end of tags
return 0;
if (0 == ::memcmp(pFrame->frame_id, "TLEN", 4))
{
// skip an int
nUsed += sizeof(int);
// data is next
return atol(&vBuff[nUsed]);
}
else
{
nUsed += nDataLen;
}
}
return 0;
}
Jeff,
the only valid way is to go through the whole MP3 file, find every MP3 frame inside it and compute the total duration from them.
The main characteristic of MP3 files is that their density might differ, and also that lots of other binary data could be included inside them: ID3 tags, for example, which any decoder will skip when reading.
Anyway - look here for mp3 frame header info:
http://www.mp3-converter.com/mp3codec/mp3_anatomy.htm
Try to write code that correctly parses header by header, calculates each frame's duration (from the sampling frequency) and then totals the durations for all frames.
You don't have to decode the frames; just use their headers.
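As a rough illustration of the header-only approach, here is a sketch that reads one 4-byte frame header and derives the frame's length and playing time. It makes several assumptions: only MPEG-1 Layer III is handled, the bitrate and sample rate tables are the standard ones for that combination, and `FrameInfo` and `parse_frame_header` are hypothetical names. Treat it as a starting point rather than a complete parser.

```cpp
#include <cstddef>

// Hypothetical result type for this sketch.
struct FrameInfo {
    bool valid;                // false if the bytes are not a supported header
    std::size_t length_bytes;  // distance to the next frame header
    double duration_seconds;   // playing time of this one frame
};

// Parses a 4-byte frame header; only MPEG-1 Layer III is handled here.
FrameInfo parse_frame_header(const unsigned char h[4])
{
    static const int kBitrateKbps[16] = {0, 32, 40, 48, 56, 64, 80, 96,
                                         112, 128, 160, 192, 224, 256, 320, 0};
    static const int kSampleRateHz[4] = {44100, 48000, 32000, 0};
    const FrameInfo invalid = {false, 0, 0.0};

    // 11 sync bits set, version bits 11 (MPEG-1), layer bits 01 (Layer III).
    if (h[0] != 0xFF || (h[1] & 0xE0) != 0xE0) return invalid;
    if (((h[1] >> 3) & 0x03) != 0x03 || ((h[1] >> 1) & 0x03) != 0x01)
        return invalid;

    const int bitrate = kBitrateKbps[h[2] >> 4] * 1000;
    const int sample_rate = kSampleRateHz[(h[2] >> 2) & 0x03];
    const int padding = (h[2] >> 1) & 0x01;
    if (bitrate == 0 || sample_rate == 0) return invalid;  // free-format or reserved

    FrameInfo info;
    info.valid = true;
    // Layer III frame length in bytes, including the header itself.
    info.length_bytes =
        static_cast<std::size_t>(144 * bitrate / sample_rate + padding);
    // An MPEG-1 Layer III frame always carries 1152 samples.
    info.duration_seconds = 1152.0 / sample_rate;
    return info;
}
```

A full scanner would skip any ID3 tag, then repeatedly parse a header, add `duration_seconds` to a running total and advance by `length_bytes` until the end of the file. Because each frame carries its own bitrate, this works for VBR as well as CBR.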
If you don't mind LGPL try http://sourceforge.net/projects/mpg123net/
I found a library that does it, LGPL v3: http://www.codeproject.com/KB/audio-video/mpegaudioinfo.aspx
How about TagLib or id3lib?
They are not decoders per se; they are more about extracting the track/artist/album and a host of other information, which should let you do what you need to do.