Get parent directory from file in C++ - c++

I need to get the parent directory from a file path in C++.
For example:
Input:
D:\Devs\Test\sprite.png
Output:
D:\Devs\Test\ [or D:\Devs\Test]
I can do this with a function:
char *str = "D:\\Devs\\Test\\sprite.png";
for(int i = strlen(str) - 1; i>0; --i)
{
if( str[i] == '\\' )
{
str[i] = '\0';
break;
}
}
But I just want to know whether a built-in function exists for this.
I use VC++ 2003.
Thanks in advance.

If you're using std::string instead of a C-style char array, you can use string::find_last_of and string::substr in the following manner:
std::string str = "D:\\Devs\\Test\\sprite.png";
str = str.substr(0, str.find_last_of("/\\"));
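One thing to watch with the snippet above: if the path contains no separator at all, find_last_of returns std::string::npos and substr(0, npos) gives back the whole string unchanged. A minimal sketch that handles that case explicitly follows; the name parentDirectory and the choice to return an empty string are assumptions made for illustration, not part of any library.

```cpp
#include <string>

// Returns everything before the last '/' or '\\' in path.
// If path contains no separator, returns an empty string (an assumption
// made for this sketch; returning "." would be another reasonable choice).
std::string parentDirectory(const std::string& path) {
    const std::string::size_type pos = path.find_last_of("/\\");
    if (pos == std::string::npos) {
        return std::string();
    }
    return path.substr(0, pos);
}
```

For example, parentDirectory("D:\\Devs\\Test\\sprite.png") yields D:\Devs\Test, while a bare file name such as "sprite.png" yields an empty string.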

Since C++17, it is possible to use std::filesystem::path::parent_path:
#include <filesystem>
#include <iostream>
#include <string>
namespace fs = std::filesystem;
int main() {
fs::path p = "D:\\Devs\\Test\\sprite.png";
std::cout << "parent of " << p << " is " << p.parent_path() << std::endl;
// parent of "D:\\Devs\\Test\\sprite.png" is "D:\\Devs\\Test"
std::string as_string = p.parent_path().string();
return 0;
}

A heavy-duty, cross-platform way would be to use boost::filesystem::path::parent_path(). But obviously this adds overhead you may not desire.
Alternatively, you could make use of cstring's strrchr function, something like this:
#include <cstring>
char * lastSlash = strrchr( str, '\\');
if ( lastSlash != NULL ) *(lastSlash + 1) = '\0'; // cut the string just after the last backslash

Modifying a string literal is undefined behavior, so declare a writable array instead:
char str[] = "D:\\Devs\\Test\\sprite.png";
Then you can use the one-liner below to get the desired result:
*(strrchr(str, '\\') + 1) = 0; // strrchr returns NULL if the path contains no '\', so add a NULL check if that can happen
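The NULL check mentioned in the comment could look roughly like this. It is only a sketch; stripFileName is a made-up name, and it assumes the caller passes a writable buffer rather than a string literal.

```cpp
#include <cstring>

// Truncates path in place so that it ends just after the last backslash.
// Returns false and leaves path untouched if it contains no backslash.
// path must point to a writable buffer, not a string literal.
bool stripFileName(char* path) {
    char* lastSlash = std::strrchr(path, '\\');
    if (lastSlash == NULL) {
        return false;
    }
    *(lastSlash + 1) = '\0';
    return true;
}
```

Called on a char array holding D:\Devs\Test\sprite.png, this leaves D:\Devs\Test\ in the array and returns true.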

On POSIX-compliant systems (*nix) there is a commonly available function for this: dirname(3). On Windows there is _splitpath.
The _splitpath function breaks a path
into its four components.
void _splitpath(
const char *path,
char *drive,
char *dir,
char *fname,
char *ext
);
So the result (it's what I think you are looking for) would be in dir.
Here's an example:
#include <cstdlib>
#include <iostream>
using namespace std;
int main()
{
const char *path = "c:\\that\\rainy\\day";
char dir[256];
char drive[8];
errno_t rc;
rc = _splitpath_s(
path, /* the path */
drive, /* drive */
8, /* drive buffer size */
dir, /* dir buffer */
256, /* dir buffer size */
NULL, /* filename */
0, /* filename size */
NULL, /* extension */
0 /* extension size */
);
if (rc != 0) {
cerr << "_splitpath_s failed with error code " << rc << endl; /* rc is an errno value, not a GetLastError() code */
exit (EXIT_FAILURE);
}
cout << drive << dir << endl;
return EXIT_SUCCESS;
}

On Windows platforms, you can use
PathRemoveFileSpec or PathCchRemoveFileSpec
to achieve this.
However for portability I'd go with the other approaches that are suggested here.

You can use dirname to get the parent directory
Check this link for more info
Raghu
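For what it's worth, here is a small sketch of calling dirname from C++ on a POSIX system. The wrapper name parentDirPosix is made up for this example. Note that dirname only understands forward slashes and is allowed to modify the buffer it is given, which is why the sketch works on a copy.

```cpp
#include <libgen.h>   // dirname (POSIX)
#include <string>
#include <vector>

// Returns the parent directory of a POSIX-style path using dirname(3).
// dirname may modify its argument, so it is called on a private copy.
std::string parentDirPosix(const std::string& path) {
    std::vector<char> buffer(path.begin(), path.end());
    buffer.push_back('\0');
    return std::string(dirname(&buffer[0]));
}
```

With "/home/user/test/sprite.png" this returns "/home/user/test"; for a bare file name, dirname returns ".".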

Related

C++ for-loop logic: break after getting strings from cmdline arguments?

With the help of many people here, I've been writing a program that writes the contents of the Windows clipboard to a text file. (I'm working in Visual Studio 2010.) I've been trying to work out the logic of a for loop that will test the command-line arguments (if any); the arguments can be
a codepage number
a filename or path
or both, in any order. If no codepage is specified (or if the user specifies an invalid codepage), the program uses the default Windows codepage (typically 1252). If no filename is specified, the program writes the output to "#clip.txt".
I know my method of reading the arguments is inefficient, but it's the best I can figure out right now. I use two for loops. The first checks each command-line parameter; if the string is NOT all-digits, it uses the string as a filename and then breaks. The next loop again checks each parameter, and if the string is all-digits, it assigns it as the codepage number and then breaks.
The idea is that if the user enters
clipwrite 500 850
only the first (500) should get used as the codepage. And if the user enters
clipwrite foo.txt bar.txt
the output should be written to foo.txt.
My code seems to work correctly if the user enters no arguments, one argument only, or one number and one alpha string. But I'm clearly doing something wrong, because if the user enters
clipwrite 500 850
then 850 gets used (it should be ignored). And if the user enters
clipwrite foo.txt bar.txt
the program crashes. Can anyone help me sort what's wrong with my logic? Here's the relevant code (which uses a command-line parsing routine to get argc and argv):
if (argc > 1) {
// get name of output file if specified
for ( i = 1; i < argc; i++ ) {
if (i < 3) {
string argstr = argv[i];
//if string is not digits-only, use as filename
for (size_t n = 0; n <argstr.length(); n++) {
if (!isdigit( argstr[ n ]) ) {
OutFile = argv[i];
break;
}
}
}
}
// get codepage number if specified
for ( i = 1; i < argc; i++ ) {
if (i < 3) {
string argstr = argv[i];
for (size_t n = 0; n <argstr.length(); n++) {
if (!isdigit( argstr[ n ]) ) {
// if all chars are digits
} else {
// convert codepage string to integer
int cpint = atoi(argstr.c_str());
// check if codepage is valid; if so use it
if (IsValidCodePage(cpint)) {
codepage = "."+argstr;
}
break;
}
}
}
}
}
Many thanks for any help with this beginner-level problem.
Maybe something like this:
#include <iostream>
#include <string>
#include <cstring>
#include <cctype>
#include <cstdlib>
int main(int, char **argv) {
std::string filename = "#clip.txt";
int codepage = 1252;
bool bFilenameSet = false;
bool bCodepageSet = false;
for (++argv; *argv; ++argv) { // *argv == NULL at end of arguments
char *p = *argv;
for ( ; *p; ++p)
if (!isdigit(*p))
break;
if (*p) { // non-digit found
if (!bFilenameSet) {
filename = *argv;
bFilenameSet = true;
}
}
else {
if (!bCodepageSet) {
codepage = atoi(*argv);
bCodepageSet = true;
}
}
}
std::cout<< "Filename: "<< filename<< "\n";
std::cout<< "Codepage: "<< codepage<< "\n";
return 0;
}
I ran your program, but it wasn't complete, so I assumed that the IsValidCodePage() function always returns true.
From your code, I can see that you are overwriting codepage and OutFile because break only leaves the inner loop; the outer loop keeps going over the remaining arguments.
I don't see any immediate reason for a crash, but:
When issuing clipwrite 500 850 you use the codepage 850 since your break; only leaves the inner loop but your code keeps iterating over
the arguments and your codepage variable gets overwritten.
Your usage of isdigit is faulty. Whenever a string starts with a digit, you try to interpret it as an integer, even if it's 1bla.txt.
atoi() is evil since it fails to report if a given string can't be parsed as a number. Better use std::stringstream and the >> operator.
Maybe you should do it like this:
int cpint = -1;
std::string fname="";
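for ( int i = 1; i < argc && i<3; i++ ) {
std::stringstream argss(argv[i]);
int value = 0;
// Treat the argument as a codepage only if the whole
// string parses as a decimal number; otherwise it is a filename.
// Only the first argument of each kind is kept.
if( (argss >> value) && argss.eof() ) {
if( cpint == -1 ) cpint = value;
} else if( fname.empty() ) {
fname=argv[i];
}
}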
if(!fname.empty())
std::cerr << "filename '" << fname << "'" << std::endl;
if(cpint!=-1)
std::cerr << "codepage: #" << cpint << std::endl;
Not really tested but I hope you get the idea
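To make the "first one wins" rule from the question easier to check, the same idea can be pulled into a small function that works on a list of strings. This is only a sketch: Options and parseArgs are hypothetical names, the defaults are the ones mentioned in the question, and the IsValidCodePage check is left out so the code does not depend on the Windows API.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct Options {
    std::string filename;
    int codepage;
};

// Hypothetical helper: scans the arguments once and keeps the first
// all-numeric argument as the codepage and the first other argument as
// the filename. Later arguments of the same kind are ignored.
Options parseArgs(const std::vector<std::string>& args) {
    Options opts;
    opts.filename = "#clip.txt";   // default from the question
    opts.codepage = 1252;          // default from the question
    bool filenameSet = false;
    bool codepageSet = false;
    for (std::vector<std::string>::size_type i = 0; i < args.size(); ++i) {
        std::istringstream in(args[i]);
        int value = 0;
        // Numeric only if the extraction succeeds and consumes the whole string.
        const bool isNumber = (in >> value) && in.eof();
        if (isNumber) {
            if (!codepageSet) {
                opts.codepage = value;
                codepageSet = true;
            }
        } else if (!filenameSet) {
            opts.filename = args[i];
            filenameSet = true;
        }
    }
    return opts;
}
```

With the arguments 500 850 this keeps 500 as the codepage, with foo.txt bar.txt it keeps foo.txt, and 1bla.txt is treated as a filename because the number does not cover the whole string.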
Using the answers here I finally got everything working, though I know my code is still inefficient. Here is the VS2010 source code that I used for this clipboard writing utility. Thanks to all who responded.
// ClipWrite.cpp
#include "stdafx.h"
#include <Windows.h>
#include <shellapi.h>
#include <iostream>
#include <fstream>
#include <codecvt> // for wstring_convert
#include <locale> // for codecvt_byname
#include <sstream>
using namespace std;
// helper gets path to this application
string ExePath() {
char buffer[MAX_PATH];
GetModuleFileNameA( NULL, buffer, MAX_PATH );
string::size_type pos = string( buffer ).find_last_of( "\\/" );
return string( buffer ).substr( 0, pos);
//return std::string( buffer ).substr( 0, pos);
}
// set variable for command-line arguments
char **argv = NULL;
// helper to get command-line arguments
int ParseCommandLine() {
int argc, BuffSize, i;
WCHAR *wcCommandLine;
LPWSTR *argw;
wcCommandLine = GetCommandLineW();
argw = CommandLineToArgvW( wcCommandLine, &argc);
argv = (char **)GlobalAlloc( LPTR, (argc + 1) * sizeof( char * ) ); // room for argc pointers plus a terminating NULL
for( i=0; i < argc; i++) {
BuffSize = WideCharToMultiByte( CP_ACP, WC_COMPOSITECHECK, argw[i], -1, NULL, 0, NULL, NULL );
argv[i] = (char *)GlobalAlloc( LPTR, BuffSize );
WideCharToMultiByte( CP_ACP, WC_COMPOSITECHECK, argw[i], -1, argv[i], BuffSize, NULL, NULL );
}
return argc;
}
int CALLBACK WinMain(
_In_ HINSTANCE hInstance,
_In_ HINSTANCE hPrevInstance,
_In_ LPSTR lpCmdLine,
_In_ int nCmdShow)
{
// for logging in case of error
int writelog = 0;
string logtext = "";
// create output filename
string filename = ExePath() + "\\#clip.txt";
// get default codepage from Windows, typically 1252
int iCP=GetACP();
string sCP;
ostringstream convert;
convert << iCP;
sCP = convert.str();
// construct string to use for conversion routines (e.g. ".1252")
string sDefaultCP = "."+sCP;
string sOutputCP = "."+sCP;
// read command line for alternate codepage and/or filename
int i, argc;
argc = ParseCommandLine( );
if (argc > 1) {
bool bFilenameSet = false;
bool bCodepageSet = false;
int cpint = -1;
for ( i = 1; i < argc && i<3; i++ ) {
std::string argstr = argv[i];
//if string has only digits, use as codepage;
for (size_t n = 0; n <argstr.length(); n++) {
if (!isdigit( argstr[ n ]) ) {
if (!bFilenameSet) {
filename = argv[i];
bFilenameSet = true;
}
} else {
// convert codepage string to integer
if (!bCodepageSet) {
std::stringstream argss(argv[i]);
if( (argss >> cpint) || !argss.eof()) {
argstr = argv[i];
logtext = logtext + "Requested codepage (if any): " + argstr + "\n";
cout << "Requested codepage (if any): " << argstr << endl;
// check if codepage is valid; if so, use it
if (IsValidCodePage(cpint)) {
sCP = argstr;
sOutputCP = "."+argstr;
}
bCodepageSet = true;
}
}
}
}
}
}
cout << "Codepage used: " + sCP << endl;
// get clipboard text
string cliptext = "";
if (OpenClipboard(NULL)) {
if(IsClipboardFormatAvailable(CF_TEXT)) {
HGLOBAL hglb = GetClipboardData(CF_TEXT);
if (hglb != NULL) {
LPSTR lptstr = (LPSTR)GlobalLock(hglb);
if (lptstr != NULL) {
// read the contents through the locked pointer
cliptext = lptstr;
// release the lock
GlobalUnlock(hglb);
}
}
}
CloseClipboard();
}
// create conversion routines
typedef std::codecvt_byname<wchar_t,char,std::mbstate_t> codecvt;
std::wstring_convert<codecvt> cp1252(new codecvt(sDefaultCP));
std::wstring_convert<codecvt> outpage(new codecvt(sOutputCP));
ofstream OutStream; // open an output stream
OutStream.open(filename, std::ios_base::binary | ios::out | ios::trunc);
// make sure file is successfully opened
if(!OutStream) {
writelog = 1;
logtext = logtext + "Error opening file " + filename + " for writing.\n";
//return 1;
} else {
// convert to DOS/Win codepage number in "outpage"
OutStream << outpage.to_bytes(cp1252.from_bytes(cliptext)).c_str();
OutStream.close(); // close output stream
if (writelog == 1) {
logtext = logtext + "Output file: " + filename + "\n";
}
}
if (writelog == 1) {
logtext = logtext + "Codepage used: " + sCP + "\n";
string LogFile = ExePath() + "\\#log.txt";
ofstream LogStream;
LogStream.open(LogFile, ios::out | ios::trunc);
if(!LogStream) {
cout << "Error opening file " << LogFile << " for writing.\n";
return 1;
}
LogStream << logtext;
LogStream.close(); // close output stream
}
return 0;
}

OpenSSL SHA256 Wrong result

I have following piece of code that is supposed to calculate the SHA256 of a file. I am reading the file chunk by chunk and using EVP_DigestUpdate for the chunk. When I test the code with the file that has content
Test Message
Hello World
in Windows, it gives me SHA256 value of 97b2bc0cd1c3849436c6532d9c8de85456e1ce926d1e872a1e9b76a33183655f but the value is supposed to be 318b20b83a6730b928c46163a2a1cefee4466132731c95c39613acb547ccb715, which can be verified here too.
Here is the code:
#include <openssl\evp.h>
#include <iostream>
#include <string>
#include <fstream>
#include <cstdio>
const int MAX_BUFFER_SIZE = 1024;
std::string FileChecksum(std::string, std::string);
int main()
{
std::string checksum = FileChecksum("C:\\Users\\Dell\\Downloads\\somefile.txt","sha256");
std::cout << checksum << std::endl;
return 0;
}
std::string FileChecksum(std::string file_path, std::string algorithm)
{
EVP_MD_CTX *mdctx;
const EVP_MD *md;
unsigned char md_value[EVP_MAX_MD_SIZE];
int i;
unsigned int md_len;
OpenSSL_add_all_digests();
md = EVP_get_digestbyname(algorithm.c_str());
if(!md) {
printf("Unknown message digest %s\n",algorithm.c_str());
exit(1);
}
mdctx = EVP_MD_CTX_create();
std::ifstream readfile(file_path,std::ifstream::in|std::ifstream::binary);
if(!readfile.is_open())
{
std::cout << "COuldnot open file\n";
return "";
}
readfile.seekg(0, std::ios::end);
long filelen = readfile.tellg();
std::cout << "LEN IS " << filelen << std::endl;
readfile.seekg(0, std::ios::beg);
if(filelen == -1)
{
std::cout << "Return Null \n";
return "";
}
EVP_DigestInit_ex(mdctx, md, NULL);
long temp_fil = filelen;
while(!readfile.eof() && readfile.is_open() && temp_fil>0)
{
int bufferS = (temp_fil < MAX_BUFFER_SIZE) ? temp_fil : MAX_BUFFER_SIZE;
char *buffer = new char[bufferS+1];
buffer[bufferS] = 0;
readfile.read(buffer, bufferS);
std::cout << strlen(buffer) << std::endl;
EVP_DigestUpdate(mdctx, buffer, strlen(buffer));
temp_fil -= bufferS;
delete[] buffer;
}
EVP_DigestFinal_ex(mdctx, md_value, &md_len);
EVP_MD_CTX_destroy(mdctx);
printf("Digest is: ");
//char *checksum_msg = new char[md_len];
//int cx(0);
for(i = 0; i < md_len; i++)
{
//_snprintf(checksum_msg+cx,md_len-cx,"%02x",md_value[i]);
printf("%02x", md_value[i]);
}
//std::string res(checksum_msg);
//delete[] checksum_msg;
printf("\n");
/* Call this once before exit. */
EVP_cleanup();
return "";
}
I tried to write the hash generated by the program as a string using _snprintf, but it didn't work. How can I generate the correct hash and return the value as a string from the FileChecksum function? The platform is Windows.
EDIT: It seems the problem was caused by CRLF line endings. Because Windows saves the file using \r\n, the calculated checksum was different. How should I handle this?
MS-DOS used the CR-LF convention, so when a text file is saved on Windows, each line break is stored as \r\n (carriage return plus newline). The online tool you used for testing only sees \n.
So to reproduce your program's result, you have to compute the checksum of the string Test Message\r\nHello World\r\n, which is what the file created on Windows actually contains.
However, the checksum of a given file will be the same wherever it is computed, as long as its bytes are unchanged.
Note: your code works fine :)
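If what you actually want is a checksum that does not depend on the line-ending convention, one option is to normalize the text before hashing it. That is a design decision, not a fix: the result will no longer match a checksum of the raw file bytes, and it should never be applied to binary files. A minimal sketch, with normalizeLineEndings being a made-up helper name:

```cpp
#include <string>

// Returns a copy of text with every "\r\n" pair replaced by "\n".
// Lone '\r' characters are kept as they are.
std::string normalizeLineEndings(const std::string& text) {
    std::string result;
    result.reserve(text.size());
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue;  // skip the '\r'; the '\n' is copied on the next pass
        }
        result += text[i];
    }
    return result;
}
```

The normalized string would then be passed to EVP_DigestUpdate instead of the raw buffer.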
It seems the problem was associated with the length value I passed to EVP_DigestUpdate. I had passed the value from strlen, but replacing it with bufferS fixed the issue.
The code was modified as:
while(!readfile.eof() && readfile.is_open() && temp_fil>0)
{
int bufferS = (temp_fil < MAX_BUFFER_SIZE) ? temp_fil : MAX_BUFFER_SIZE;
char *buffer = new char[bufferS+1];
buffer[bufferS] = 0;
readfile.read(buffer, bufferS);
EVP_DigestUpdate(mdctx, buffer, bufferS);
temp_fil -= bufferS;
delete[] buffer;
}
and to send the checksum string, I modified the code as:
EVP_DigestFinal_ex(mdctx, md_value, &md_len);
EVP_MD_CTX_destroy(mdctx);
char str[2 * EVP_MAX_MD_SIZE + 1] = { 0 }; // two hex characters per digest byte, plus the terminator
char *ptr = str;
std::string ret;
for(i = 0; i < md_len; i++)
{
//_snprintf(checksum_msg+cx,md_len-cx,"%02x",md_value[i]);
sprintf(ptr,"%02x", md_value[i]);
ptr += 2;
}
ret = str;
/* Call this once before exit. */
EVP_cleanup();
return ret;
As for the wrong checksum earlier, the problem was related to how Windows stores line breaks. As suggested by Zangetsu, Windows was saving the text file with CRLF, but Linux and the site I mentioned earlier were using LF, so the checksum values differed. For files other than text, e.g. a DLL, the code now computes the correct checksum as a string.
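As a side note, the conversion from digest bytes to a hex string does not need sprintf or a fixed-size char buffer. Here is a minimal sketch that builds the std::string directly; toHex is a hypothetical helper name and does not come from OpenSSL.

```cpp
#include <cstddef>
#include <string>

// Converts len raw digest bytes into a lowercase hexadecimal string.
std::string toHex(const unsigned char* data, std::size_t len) {
    static const char digits[] = "0123456789abcdef";
    std::string hex;
    hex.reserve(len * 2);
    for (std::size_t i = 0; i < len; ++i) {
        hex += digits[data[i] >> 4];    // high nibble
        hex += digits[data[i] & 0x0F];  // low nibble
    }
    return hex;
}
```

In FileChecksum this would be used as return toHex(md_value, md_len); after the call to EVP_DigestFinal_ex.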

How to list all CSV files in a Windows Directory using C++?

I'm a bit new to C++ and I've to list all CSV files in a Windows Directory,
I've googled and I found a lot of ways to list all files in a directory and
I came up with the following solution:
int listFiles(string addDir, vector<string> &list) {
DIR *dir = 0;
struct dirent *entrada = 0;
int isFile = 32768;
dir = opendir(addDir.c_str());
if (dir == 0) {
cerr << "Could not open the directory." << endl;
exit(1);
}
while (entrada = readdir(dir))
if (entrada->d_type == isFile)
{
list.push_back(entrada->d_name);
cout << entrada->d_name << endl;
}
closedir(dir);
return 0;
}
It is using the dirent.h for Windows (I'm using VS2013) but the problems are:
- Is it correct to set isFile = 32768? Will it always work on Windows?
- How to know if the file is a CSV file?
Another thing, I've tried to use windows.h / FindNextFile but it didn't work.
Is it better to use FindNextFile or the above solution?
I guess FindNextFile would be easier to list only the CSV File, but I don't know how to do it.
My output should be a string because it is the input of a function that reads the CSV files.
Tks guys.
PS: I cant use boost...
int listFiles(const string& addDir, vector<string> &list, const std::string& _ext) {
DIR *dir = 0;
struct dirent *entrada = 0;
int isFile = 32768;
std::string ext("." + _ext);
for (string::size_type i = 0; i < ext.length(); ++i)
ext[i] = tolower(ext[i]);
dir = opendir(addDir.c_str());
if (dir == 0) {
cerr << "Could not open the directory." << endl;
exit(1);
}
while (entrada = readdir(dir))
if (entrada->d_type == isFile)
{
const char *name = entrada->d_name;
size_t len = strlen(entrada->d_name);
if (len >= ext.length()) {
std::string fext(name + len - ext.length());
for (string::size_type i = 0; i < fext.length(); ++i)
fext[i] = tolower(fext[i]);
if (fext == ext) {
list.push_back(entrada->d_name);
cout << entrada->d_name << endl;
}
}
}
closedir(dir);
return 0;
}
int main()
{
vector<string> flist;
listFiles("c:\\", flist, "csv");
system("PAUSE");
}
If you want to use FindNextFile, MSDN has an example for enumerating all files in a directory, which you can adapt.
EDIT: To expand on the Windows API method:
argv is an array of TCHAR*, meaning either char* or wchar_t* depending on #ifdef UNICODE. It's a type used by all Windows API calls which take a string parameter. To create a TCHAR literal you can use TEXT("text"). To create a wchar_t literal you can use L"text". If you do not want to use TCHAR semantics you can redefine main to be of type int main(int argc, char* argv[]), or int wmain(int argc, wchar_t* argv[]). Converting between the two types involves dealing with Unicode and code pages, which you should probably use a 3rd party library for.
Converting from ASCII (std::string or char* with code points in 0-127) to Unicode (std::wstring or wchar_t*) is a simple matter of creating a std::wstring(str.cbegin(), str.cend()), where str is the std::string.
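The ASCII-only conversion can be wrapped in a one-line helper. This is a sketch under the stated assumption that every character is in the 0 to 127 range; widenAscii is a made-up name, and anything outside that range needs a real code page conversion instead.

```cpp
#include <string>

// Widens a narrow string character by character. This is only correct
// when every character is plain ASCII (values 0 to 127).
std::wstring widenAscii(const std::string& narrow) {
    return std::wstring(narrow.begin(), narrow.end());
}
```

For example, widenAscii("C:\\data.csv") gives the same text as the literal L"C:\\data.csv".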
Here is a code example demonstrating use of WinAPI functions to list files in a directory:
#include <windows.h>
#include <tchar.h>
#include <string>
#include <vector>
#include <iostream>
#ifdef UNICODE
typedef std::wstring tstring;
#else
typedef std::string tstring;
#endif
#ifdef UNICODE
std::wostream& tcout = std::wcout;
std::wostream& tcerr = std::wcerr;
#else
std::ostream& tcout = std::cout;
std::ostream& tcerr = std::cerr;
#endif
int listFiles(const tstring& directory, std::vector<tstring> &list, const tstring& extension)
{
using std::endl;
WIN32_FIND_DATA file;
HANDLE hListing;
int error;
tstring query;
if (directory.length() > MAX_PATH - 2 - extension.length()) {
tcerr << "directory name too long" << endl;
return ERROR_FILENAME_EXCED_RANGE;
}
query = directory + TEXT("*.") + extension;
hListing = FindFirstFile(query.c_str(), &file);
if (hListing == INVALID_HANDLE_VALUE) {
error = GetLastError();
if (error == ERROR_FILE_NOT_FOUND)
tcout << "no ." << extension << " files found in directory " << directory << endl;
return error;
}
do
{
if ((file.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) == 0)
{
tcout << file.cFileName << endl;
list.push_back(file.cFileName);
}
} while (FindNextFile(hListing, &file) != 0);
error = GetLastError();
if (error == ERROR_NO_MORE_FILES)
error = 0;
FindClose(hListing);
return error;
}
int _tmain(int argc, TCHAR* argv[])
{
std::vector<tstring> files;
listFiles(TEXT("C:\\"), files, TEXT("sys"));
if (argc > 1)
listFiles(argv[1], files, TEXT("csv"));
}
If you want to simplify it, you can make your application either locked into Unicode or completely ignorant of Unicode by removing all T variants (TCHAR, TEXT(), the newly defined tstring, tcout, tcerr) and using purely wide or non-wide types (i.e. char*, string, simple literals, cout OR wchar_t*, wstring, L"" literals, wcout).
If you do this, you need to use the specialized versions of the WinAPI functions (i.e. FindFirstFileA for non-wide and FindFirstFileW for wide).

How to find the full path of the C++ Linux program from within?

I have this requirement where I need to find the full path for the C++ program from within. For Windows, I have the following solution. The argv[0] may or may not contain the full path. But I need to be certain.
TCHAR drive[_MAX_DRIVE], dir[_MAX_DIR], base[_MAX_FNAME], ext[_MAX_EXT];
TCHAR fullPath[255+1];
_splitpath(argv[0],drive,dir,base,ext);
SearchPath(NULL,base,ext,255,fullPath,NULL);
What is the Linux (gcc) equivalent for the above code? Would love to see a portable code.
On Linux you have a symbolic link /proc/self/exe, which points to the full path of the executable. (This is Linux-specific and not part of POSIX.)
On Windows, use GetModuleFileName.
Never rely on argv[0], which is not guaranteed to be anything useful.
Note that paths and file systems are not part of the language and thus necessarily a platform-dependent feature.
The top answer to this question lists techniques for a whole bunch of OSes.
string get_path( )
{
char arg1[20];
char exepath[PATH_MAX + 1] = {0};
sprintf( arg1, "/proc/%d/exe", getpid() );
readlink( arg1, exepath, PATH_MAX );
return string( exepath );
}
For Linux:
Function to execute system command
int syscommand(string aCommand, string & result) {
FILE * f;
if ( !(f = popen( aCommand.c_str(), "r" )) ) {
cout << "Can not open file" << endl;
return NEGATIVE_ANSWER;
}
const int BUFSIZE = 4096;
char buf[ BUFSIZE ];
if (fgets(buf,BUFSIZE,f)!=NULL) {
result = buf;
}
pclose( f );
return POSITIVE_ANSWER;
}
Then we get app name
string getBundleName () {
pid_t procpid = getpid();
stringstream toCom;
toCom << "cat /proc/" << procpid << "/comm";
string fRes="";
syscommand(toCom.str(),fRes);
size_t last_pos = fRes.find_last_not_of(" \n\r\t") + 1;
if (last_pos != string::npos) {
fRes.erase(last_pos);
}
return fRes;
}
Then we extract application path
string getBundlePath () {
pid_t procpid = getpid();
string appName = getBundleName();
stringstream command;
command << "readlink /proc/" << procpid << "/exe | sed \"s/\\(\\/" << appName << "\\)$//\"";
string fRes;
syscommand(command.str(),fRes);
return fRes;
}
Do not forget to trim the line after
If you came here when Googling for GetModuleFileName Linux... you're probably looking for the ability to do this for dynamically-loaded libraries. This is how you do it:
struct link_map *lm;
dlinfo(module, RTLD_DI_LINKMAP, &lm);
lm->l_name // use this
#include <string>
#include <unistd.h>
#include <limits.h>
std::string getApplicationDirectory() {
char result[ PATH_MAX ];
ssize_t count = readlink( "/proc/self/exe", result, PATH_MAX );
std::string appPath = std::string( result, (count > 0) ? count : 0 );
std::size_t found = appPath.find_last_of("/\\");
return appPath.substr(0,found);
}

Help Editing Code to Fix "Argument list too long" Error

I am currently doing some testing with a new addition to the ICU dictionary-based break iterator.
I have code that allows me to test the word-breaking on a text document but when the text document is too large it gives the error: bash: ./a.out: Argument list too long
I am not sure how to edit the code to break-up the argument list when it gets too long so that a file of any size can be run through the code. The original code author is quite busy, would someone be willing to help out?
I tried removing the printing of what is being examined to see if that would help, but I still get the error on large files (printing what is being examined isn't necessary - I just need the result).
If the code could be modified to read the source text file line by line and export the results line by line to another text file (ending up with all the lines when it is done), that would be perfect.
The code is as follows:
/*
Written by George Rhoten to test how word segmentation works.
Code inspired by the break ICU sample.
Here is an example to run this code under Cygwin.
PATH=$PATH:icu-test/source/lib ./a.exe "`cat input.txt`" > output.txt
Encode input.txt as UTF-8.
The output text is UTF-8.
*/
#include <stdio.h>
#include <unicode/brkiter.h>
#include <unicode/ucnv.h>
#define ZW_SPACE "\xE2\x80\x8B"
void printUnicodeString(const UnicodeString &s) {
int32_t len = s.length() * U8_MAX_LENGTH + 1;
char *charBuf = new char[len];
len = s.extract(0, s.length(), charBuf, len, NULL);
charBuf[len] = 0;
printf("%s", charBuf);
delete[] charBuf;
}
/* Creating and using text boundaries */
int main(int argc, char **argv)
{
ucnv_setDefaultName("UTF-8");
UnicodeString stringToExamine("Aaa bbb ccc. Ddd eee fff.");
printf("Examining: ");
if (argc > 1) {
// Override the default charset.
stringToExamine = UnicodeString(argv[1]);
if (stringToExamine.charAt(0) == 0xFEFF) {
// Remove the BOM
stringToExamine = UnicodeString(stringToExamine, 1);
}
}
printUnicodeString(stringToExamine);
puts("");
//print each sentence in forward and reverse order
UErrorCode status = U_ZERO_ERROR;
BreakIterator* boundary = BreakIterator::createWordInstance(NULL, status);
if (U_FAILURE(status)) {
printf("Failed to create sentence break iterator. status = %s",
u_errorName(status));
exit(1);
}
printf("Result: ");
//print each word in order
boundary->setText(stringToExamine);
int32_t start = boundary->first();
int32_t end = boundary->next();
while (end != BreakIterator::DONE) {
if (start != 0) {
printf(ZW_SPACE);
}
printUnicodeString(UnicodeString(stringToExamine, start, end-start));
start = end;
end = boundary->next();
}
delete boundary;
return 0;
}
Thanks so much!
-Nathan
The Argument list too long error message is reported by the bash shell, and it happens before your code even starts executing.
The only code you could change to eliminate this limit is the bash source code (or more likely the kernel, which enforces the limit when a program is executed), and even then you're always going to run into a limit. If you increase from 2048 files on the command line to 10,000, then some day you'll need to process 10,001 files ;-)
There are numerous solutions to managing 'too big' argument lists.
The standardized solution is the xargs utility.
find / -print | xargs echo
is an unhelpful, but working, example.
See How to use "xargs" properly when argument list is too long for more info.
Even xargs has problems, because file names can contain spaces, new-line chars, and other unfriendly stuff.
I hope this helps.
The code below reads the contents of the file whose name is given as the first parameter on the command line and places it in a std::string named buffer. Then, instead of calling UnicodeString with argv[1], pass it that buffer.
#include<iostream>
#include<fstream>
#include<string>
using namespace std;
int main(int argc, char **argv)
{
std::string buffer;
if(argc > 1) {
std::ifstream t;
t.open(argv[1]);
std::string line;
// test the result of getline itself so no extra empty line is appended at end of file
while(std::getline(t, line)){
buffer += line + '\n';
}
}
cout << buffer;
return 0;
}
Update:
Input to UnicodeString should be char*. The function GetFileIntoCharPointer does that.
Note that only the most rudimentary error checking is implemented below!
#include<cstdio>
#include<iostream>
#include<fstream>
using namespace std;
char * GetFileIntoCharPointer(char *pFile, long &lRet)
{
FILE * fp = fopen(pFile,"rb");
if (fp == NULL) return 0;
fseek(fp, 0, SEEK_END);
long size = ftell(fp);
fseek(fp, 0, SEEK_SET);
char *pData = new char[size + 1];
lRet = fread(pData, sizeof(char), size, fp);
pData[lRet] = '\0'; // terminate so the buffer can be used as a C string
fclose(fp);
return pData;
}
int main(int argc, char **argv)
{
if (argc < 2) return 1;
long Len = 0;
char * Data = GetFileIntoCharPointer(argv[1], Len);
if (Data == NULL) return 1;
std::cout << Data << std::endl;
delete [] Data;
return 0;
}