Get extension std::out_of_range error - c++

What is wrong with this piece of code? Is there another way to do this?
It keeps throwing std::out_of_range error.
std::wstring ext(FileInformation.cFileName);
ext = ext.substr(ext.find(L"."));

What's wrong is that you are not handling the case that the file has no extension.
What happens is that ext.find(L".") returns std::wstring::npos (the highest possible number - indicating "not found") because it doesn't find a dot.
You are then calling ext.substr(std::wstring::npos) which is of course out of range.
You have to check for this case:
std::wstring ext(FileInformation.cFileName);
std::size_t dotPos = ext.find(L".");
if(dotPos != std::wstring::npos) {
ext = ext.substr(dotPos);
} else {
ext = L"."; // assuming you want to treat an empty extension like this
}
However, if your goal is to extract the file extension, there are some more gotchas you need to be aware of:
Windows considers only the part after the last dot as the file extension. Your code will give .a.b for a file named file.a.b while Windows will consider the file extension to be just .b. So you probably need to use rfind instead of find, which searches backwards.
But then there is another subtlety: A file extension can't contain a space (file.hello world is a file without extension), so you would need to check this as well...
Therefore, since you are obviously already using WinAPI, I'd advise you to use the WinAPI function made for exactly this purpose: PathFindExtension. This way, you can't get it wrong.
Example (PathFindExtension returns a pointer to the dot, so the result already includes it; if there is no extension, the result is an empty string):
ext = PathFindExtension(FileInformation.cFileName);
Alternatively, the Boost library can also extract the file extension via boost::filesystem::path::extension, but it's fairly heavy, and if you don't already use Boost, it's not worth pulling in just for this.
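If neither WinAPI nor Boost is an option, the gotchas listed above can be sketched in portable C++. This is only a sketch: the name extension_of is hypothetical, and the rules it applies (last dot only, the dot must come after the last path separator, no spaces in the extension) come from the description above, not from any API.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the extension including the leading dot,
// or an empty string if the name has no extension.
std::wstring extension_of(const std::wstring& name) {
    const std::wstring::size_type dot = name.rfind(L'.');
    if (dot == std::wstring::npos)
        return std::wstring();                 // no dot at all
    const std::wstring::size_type sep = name.find_last_of(L"\\/");
    if (sep != std::wstring::npos && dot < sep)
        return std::wstring();                 // the dot belongs to a directory name
    assert(dot < name.size());                 // so substr(dot) cannot throw
    const std::wstring ext = name.substr(dot);
    if (ext.find(L' ') != std::wstring::npos)
        return std::wstring();                 // "file.hello world" has no extension
    return ext;
}
```

Because npos is checked before substr is called, the std::out_of_range from the question cannot occur here.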

If FileInformation.cFileName does not include a dot, then find() will return string::npos.
So you need to check for string::npos first, before using substr, because if the first argument to substr is greater than the string length, it throws out_of_range.

FileInformation.cFileName may contain no ".", in which case find will return std::wstring::npos, so first check the return value of find and only then call substr.
Maybe something like this:
std::wstring ext(FileInformation.cFileName);
std::size_t found=ext.find(L".");
if (found!=std::wstring::npos)
ext = ext.substr(found);

Related

How do I get warn()'s output into a string?

I'm using the non-standard function warn() (provided by BSD) to output an error message if a file can't be opened, like so:
std::string path = get_path() ;
std::ifstream file(path) ;
if (file.is_open()) { /* do something */ }
else {
warn("%s", path.c_str()) ;
// uses errno to figure out what the error was and outputs it nicely along with the filename
}
That's all very well for outputting it, but what if I want to use the entire string somewhere else, in addition to printing it? The warn() functions don't seem to have a form that writes the error to a string. I've tried rolling my own, but it seems awfully cumbersome in comparison (besides not getting the program's name):
this->foo((boost::format("%s: %s") % path % strerror(errno)).str()) ;
So how do I get warn()'s output as a string?
warn puts its output on the standard error output. So you would have to create a mechanism to redirect standard error output to a location that you can read back into a string. The most straightforward way may be to redirect standard error to a file, and then read the file back as a string. You could, for instance, try to use dup2() to accomplish this (as explained in the answer to this question).
However, wrapping your own version of warn is probably a better choice. You may consider the C vsnprintf() function to implement it, though. There are answers to this question that address both using boost::format and vsnprintf().
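A minimal sketch of the vsnprintf() route, assuming a C++11 compiler (for va_copy). The name format_errno is hypothetical; the errno value is passed in explicitly so that it can be saved right after the failing call, before anything else overwrites it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdarg>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: formats like printf, then appends ": " and the
// text for the given errno value, and returns the result as a string.
std::string format_errno(int err, const char* fmt, ...) {
    assert(fmt != NULL);
    va_list args;
    va_start(args, fmt);
    va_list copy;
    va_copy(copy, args);
    // First pass: ask vsnprintf how many characters are needed.
    const int needed = std::vsnprintf(NULL, 0, fmt, copy);
    va_end(copy);
    std::string message;
    if (needed > 0) {
        // Second pass: format into a buffer of exactly the right size.
        std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(&buf[0], buf.size(), fmt, args);
        message.assign(&buf[0], static_cast<std::size_t>(needed));
    }
    va_end(args);
    return message + ": " + std::strerror(err);
}
```

With this in place, the call from the question could become something like this->foo(format_errno(errno, "%s", path.c_str())). Unlike warn(), this sketch does not prepend the program's name.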
You're right: there's no sprintf analog (that is, no hypothetical swarn function).
Your approach seems viable.
It would appear that your gyrations produce a result similar to:
path + ": " + strerror(errno);
At a guess, the "program's name" that it's including is probably just argv[0], so you could apparently produce a rough equivalent of warn that just returns a std::string, with something along these lines:
std::string warn_s(std::string const &path) {
    char *pname = strrchr(argv[0], '/');
    if (pname == NULL)
        pname = argv[0];
    else
        ++pname; // skip past the '/'
    // warn() prints "program: message: error text"
    return std::string(pname) + ": " + path + ": " + strerror(errno);
}
The major difficulty here is that argv is local to main, so you'll probably need to either save it into an accessible location in main, or else use some non-standard mechanism to re-retrieve that data in your function.
Unfortunately, the documentation for warn I was able to find was poor enough that a bit of testing/trial and error will probably be needed if you want to duplicate its output precisely.
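One way to handle the argv problem is to save the program name in main before anything else runs. The sketch below assumes that approach; set_program_name, warn_string and g_program_name are hypothetical names, and the output format ("program: message: error text") is an approximation of what warn() prints, not a guaranteed match.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical global that main() fills in; not part of warn() itself.
static std::string g_program_name = "unknown";

// Call once from main(), passing argv[0].
void set_program_name(const char* argv0) {
    assert(argv0 != NULL);
    const char* slash = std::strrchr(argv0, '/');
    // Keep only the part after the last '/', if there is one.
    g_program_name = (slash != NULL) ? slash + 1 : argv0;
}

// Builds "program: message: error text" for the given errno value.
std::string warn_string(const std::string& message, int err) {
    return g_program_name + ": " + message + ": " + std::strerror(err);
}
```

In main you would call set_program_name(argv[0]) once, and later pass warn_string(path, errno) to whatever needs the text.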

Stuck with removing "\r" from text files! C++

OK so I've almost completed a program. However whilst it works on Windows I would prefer to run it on my Mac to test differences in performance (my Mac has much faster hardware).
I have an unordered map that is storing in values from a text file and I am also copying this map to reverse the key/value pairs.
The text files keep adding a new line, and from research I've found it to be because Windows adds its own carriage return (why?!) and it's at the end of every second element in my map.
The file is "stringx,stringy" and so am using stringstream to split the string x and y into the key/value pair.
EDIT: thanks for the answers guys, worked a treat!
That isn't how std::string::replace works, you should read up on how it works here.
In order to do a basic replace, you could write your own function to do it, however in your case it seems to be a trimming issue since the carriage return is usually on the right side of the string.
You can remove the carriage return and new line by doing something like this:
std::string& rtrim(std::string& str) {
    // find_last_not_of returns npos when the string is nothing but "\r\n";
    // npos + 1 wraps around to 0, so the whole string is erased in that case.
    str.erase(str.find_last_not_of("\r\n") + 1);
    return str;
}
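For the "stringx,stringy" file described in the question, the trimming can also be done while reading. The following is a sketch under assumptions: read_pairs is a hypothetical name, lines without a comma are skipped, and only a single trailing '\r' per line is removed.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: reads "key,value" lines and tolerates both
// "\n" and "\r\n" line endings.
std::unordered_map<std::string, std::string> read_pairs(std::istream& in) {
    std::unordered_map<std::string, std::string> result;
    std::string line;
    while (std::getline(in, line)) {
        // getline removed '\n'; drop a '\r' left over from a Windows file.
        if (!line.empty() && line[line.size() - 1] == '\r')
            line.erase(line.size() - 1);
        const std::string::size_type comma = line.find(',');
        if (comma == std::string::npos)
            continue;  // skip lines that are not "key,value"
        assert(comma < line.size());
        result[line.substr(0, comma)] = line.substr(comma + 1);
    }
    return result;
}
```

Passing a std::ifstream opened on the text file works the same way as the std::istringstream used for testing.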
On some implementations, like Windows, using a read mode of "r" or a write mode of "w" will cause "\r\n" to be read/written when you meant to pass "\n" through. Use "wb" or "rb". For iostream functions, I believe you need to pass in the ios::binary flag.
Windows uses "\r\n" to end lines. Usually programs that are supposed to run on various platforms use some #ifdef to handle similar differences.
I think I understand what the question is now. It's not about dealing with the differences in code - you are actually trying to use a "DOS/Windows" file on a non-Dos/Windows machine - you need to use dos2unix to fix up the end of lines on your file!

Read text file step-by-step

I have a file which has text like this:
#1#14#ADEADE#CAH0F#0#0.....
I need to write code that finds the text following each # symbol, stores it in a variable and then writes it to a file WITHOUT the # symbol, but with a space before it. So from the previous line I will get:
1 14 ADEADE CAH0F 0 0......
I first tried to do it in Python, but the files are really big and it takes a very long time to process them, so I decided to write this part in C++. However, I know nothing about C++ regex, and I'm looking for help. Could you please recommend an easy regex library (I don't know C++ very well) or a well-documented one? It would be even better if you provided a small example (I know how to write to a file using fstream, but I need help with reading the file as described above).
This looks like a job for std::locale and his trusty sidekick imbue:
#include <locale>
#include <iostream>
struct hash_is_space : std::ctype<char> {
hash_is_space() : std::ctype<char>(get_table()) {}
static mask const* get_table()
{
static mask rc[table_size];
rc['#'] = std::ctype_base::space;
return &rc[0];
}
};
int main() {
using std::string;
using std::cin;
using std::locale;
cin.imbue(locale(cin.getloc(), new hash_is_space));
string word;
while(cin >> word) {
std::cout << word << " ";
}
std::cout << "\n";
}
IMO, C++ is not the best choice for your task. But if you have to do it in C++ I would suggest you have a look at Boost.Regex, part of the Boost library.
If you are on Unix, a simple sed 's/#/ /g' <infile >outfile would suffice (the g flag is needed so that every # on a line is replaced, not just the first).
Sed stands for 'stream editor' (and supports regexes! whoo!), so it would be well-suited for the performance that you are looking for.
Alright, I'm just going to make this an answer instead of a comment. Don't use regex. It's almost certainly overkill for this task. I'm a little rusty with C++, so I'll not post any ugly code, but essentially what you could do is parse the file one character at a time, putting anything that wasn't a # into a buffer, then writing it out to the output file along with a space when you do hit a #. In C# at least two really easy methods for solving this come to mind:
StreamReader fileReader = new StreamReader(new FileStream("myFile.txt",
FileMode.Open));
string fileContents = fileReader.ReadToEnd();
string outFileContents = fileContents.Replace("#", " ");
StreamWriter outFileWriter = new StreamWriter(new FileStream("outFile.txt",
FileMode.Create), Encoding.UTF8);
outFileWriter.Write(outFileContents);
outFileWriter.Flush();
Alternatively, you could replace
string outFileContents = fileContents.Replace("#", " ");
With
StringBuilder outFileContents = new StringBuilder();
string[] parts = fileContents.Split('#');
foreach (string part in parts)
{
outFileContents.Append(part);
outFileContents.Append(" ");
}
I'm not saying you should do it either of these ways or my suggested method for C++, nor that any of these methods are ideal - I'm just pointing out here that there are many many ways to parse strings. Regex is awesome and powerful and may even save the day in extreme circumstances, but it's not the only way to parse text, and may even destroy the world if used for the wrong thing. Really.
If you insist on using regex (or are forced to, as in for a homework assignment), then I suggest you listen to Chris and use Boost.Regex. Alternatively, I understand Boost has a good string library as well if you'd like to try something else. Just look out for Cthulhu if you do use regex.
You've left out one crucial point: if you have two (or more) consecutive #s in the input, should they turn into one space, or into the same number of spaces as there are #s?
If you want to turn an entire run of #s into a single space, then #Rob's solution should work quite nicely.
If you want each # turned into a space, then it's probably easiest to just write C-style code:
#include <stdio.h>
int main() {
int ch;
while (EOF!=(ch=getchar()))
if (ch == '#')
putchar(' ');
else
putchar(ch);
return 0;
}
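Both behaviours discussed above (one space per #, or one space per run of #s) can be sketched with C++ streams. The names replace_hashes and collapse_runs are hypothetical, and replace_hashes_demo is only there to show the difference between the two modes.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical helper: copies in to out, turning '#' into ' '.
// With collapse_runs set, a run such as "###" becomes a single space.
void replace_hashes(std::istream& in, std::ostream& out, bool collapse_runs) {
    bool previous_was_hash = false;
    char ch;
    while (in.get(ch)) {
        if (ch == '#') {
            if (!(collapse_runs && previous_was_hash))
                out.put(' ');
            previous_was_hash = true;
        } else {
            out.put(ch);
            previous_was_hash = false;
        }
    }
}

// Small self-check showing both behaviours on the same input.
void replace_hashes_demo() {
    std::istringstream a("#1##2"), b("#1##2");
    std::ostringstream kept, collapsed;
    replace_hashes(a, kept, false);
    replace_hashes(b, collapsed, true);
    assert(kept.str() == " 1  2");      // two spaces for "##"
    assert(collapsed.str() == " 1 2");  // one space for "##"
}
```

For real files, std::ifstream and std::ofstream objects can be passed in place of the string streams.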
So, you want to replace each single '#' character with a single ' ' character, right?
Then it's easy, since you can replace any portion of a file with a string of exactly the same length without disturbing the layout of the file.
Repeating such a replacement lets you transform the file chunk by chunk, so you avoid reading the whole file into memory, which is problematic when the file is very big.
Here's the code in Python 2.7.
Chunk-by-chunk replacement may not be enough to make it faster, and you may have a hard time writing the same thing in C++. But in general, when I have proposed code like this, it has reduced the execution time satisfactorily.
def treat_file(file_path, chunk_size):
    from os import fsync
    from os.path import getsize
    file_size = getsize(file_path)
    with open(file_path, 'rb+') as g:
        fd = g.fileno()  # file descriptor, it's an integer
        while True:
            x = g.read(chunk_size)
            g.seek(-len(x), 1)
            g.write(x.replace('#', ' '))
            g.flush()
            fsync(fd)
            if g.tell() >= file_size:
                break
Comments:
open(file_path,'rb+')
it's absolutely obligatory to open the file in binary mode 'b' to control precisely the positions and movements of the file's pointer;
mode '+' is to be able to read AND write in the file
fd = g.fileno()
file descriptor, it's an integer
x = g.read(chunk_size)
reads a chunk of size chunk_size. Ideally this would match the size of the read buffer, but I don't know how to find that buffer's size, so a power of 2 is a sensible choice.
g.seek(- len(x),1)
the file's pointer is moved back to the position from which the chunk was just read. The offset must be len(x), not chunk_size, because the last chunk read is in general shorter than chunk_size
g.write(x.replace('#',' '))
writes on the same length with the modified chunk
g.flush()
fsync(fd)
these two instructions force the write; otherwise the modified chunk could remain in the write buffer and be written at an uncontrolled moment
if g.tell() >= file_size: break
after the last portion of the file has been read, whatever its length (less than or equal to chunk_size), the file's pointer is at the maximum position of the file, that is to say file_size, and the program must stop
In case you would like to replace several consecutive '###...' with only one space, the code is easily modified to meet this requirement, since writing a shortened chunk doesn't erase the characters still unread further along in the file. It only needs two file pointers.
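The same chunk-by-chunk, in-place idea can be sketched in C++ with std::fstream. This is an assumption-laden sketch rather than a tuned implementation: replace_hashes_in_place is a hypothetical name, the default chunk size is arbitrary, and it handles only the one-for-one replacement, not the collapsing variant.

```cpp
#include <algorithm>
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: replaces every '#' with ' ' inside the file itself,
// one chunk at a time. Returns false if the file could not be opened.
bool replace_hashes_in_place(const std::string& path,
                             std::size_t chunk_size = 65536) {
    assert(chunk_size > 0);
    // Binary mode keeps file positions exact; in|out allows read AND write.
    std::fstream f(path.c_str(),
                   std::ios::in | std::ios::out | std::ios::binary);
    if (!f)
        return false;
    std::vector<char> buf(chunk_size);
    std::streamoff pos = 0;
    for (;;) {
        f.seekg(pos);
        f.read(&buf[0], static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = f.gcount();
        if (got <= 0)
            break;   // nothing left to read
        f.clear();   // a short final read sets eofbit and failbit
        std::replace(buf.begin(), buf.begin() + got, '#', ' ');
        f.seekp(pos);          // go back to where this chunk started
        f.write(&buf[0], got); // overwrite it with the same number of bytes
        pos += got;
    }
    return true;
}
```

As in the Python version, the file size never changes, so no temporary file is needed.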

Check whether a string is a valid filename with Qt

Is there a way with Qt 4.6 to check if a given QString is a valid filename (or directory name) on the current operating system ? I want to check for the name to be valid, not for the file to exist.
Examples:
// Some valid names
test
under_score
.dotted-name
// Some specific names
colon:name // valid under UNIX OSes, but not on Windows
what? // valid under UNIX OSes, but still not on Windows
How would I achieve this ? Is there some Qt built-in function ?
I'd like to avoid creating an empty file, but if there is no other reliable way, I would still like to see how to do it in a "clean" way.
Many thanks.
This is the answer I got from Silje Johansen - Support Engineer - Trolltech ASA (in March 2008 though)
However, the complexity of including locale settings and finding
a unified way to query the filesystems on Linux/Unix about their
functionality is close to impossible.
However, to my knowledge, all applications I know of ignore this
problem.
(read: they aren't going to implement it)
Boost doesn't solve the problem either; it gives only some vague notion of the maximum length of paths, especially if you want to be cross-platform. As far as I know, many have tried and failed to crack this problem (at least in theory; in practice it is most definitely possible to write a program that creates valid filenames in most cases).
If you want to implement this yourself, it might be worth considering a few not immediately obvious things such as:
Complications with invalid characters
The difference between file system limitations and OS and software limitations. Windows Explorer, which I consider part of the Windows OS does not fully support NTFS for example. Files containing ':' and '?', etc... can happily reside on an ntfs partition, but Explorer just chokes on them. Other than that, you can play safe and use the recommendations from Boost Filesystem.
Complications with path length
The second problem not fully tackled by the Boost page is the length of the full path. Probably the only thing that is certain at this moment is that no OS/filesystem combination supports unlimited path lengths. However, statements like "Windows maximum paths are limited to 260 chars" are wrong. The Unicode API of Windows does allow you to create paths up to 32,767 UTF-16 characters long. I haven't checked, but I imagine Explorer chokes on these just as devotedly, which would make this feature utterly useless for software having any users other than yourself (on the other hand you might prefer not to have your software choke in chorus).
There exists an old variable that goes by the name of PATH_MAX, which sounds promising, but the problem is that PATH_MAX simply isn't.
To end with a constructive note, here are some ideas on possible ways to code a solution.
Use defines to make OS specific sections. (Qt can help you with this)
Use the advice given on the boost page and OS and filesystem documentation to decide on your illegal characters
For path length, the only workable idea that springs to my mind is a binary-search, trial-and-error approach that uses the system call's error handling to check for a valid path length. This is rather roundabout, but might be the only way of getting accurate results on a variety of systems.
Get good at elegant error handling.
Hope this has given some insights.
Based on User7116's answer here:
How do I check if a given string is a legal/valid file name under Windows?
I quit being lazy - looking for elegant solutions, and just coded it. I got:
bool isLegalFilePath(QString path)
{
if (!path.length())
return false;
// Anything following the raw filename prefix should be legal.
if (path.left(4)=="\\\\?\\")
return true;
// Windows filenames are not case sensitive.
path = path.toUpper();
// Trim the drive letter off
if (path.length() >= 2 && path[1]==':' && (path[0]>='A' && path[0]<='Z'))
path = path.right(path.length()-2);
QString illegal="<>:\"|?*";
foreach (const QChar& c, path)
{
// Check for control characters
if (c.toLatin1() >= 0 && c.toLatin1() < 32)
return false;
// Check for illegal characters
if (illegal.contains(c))
return false;
}
// Check for device names in filenames
static QStringList devices;
if (!devices.count())
devices << "CON" << "PRN" << "AUX" << "NUL" << "COM0" << "COM1" << "COM2"
<< "COM3" << "COM4" << "COM5" << "COM6" << "COM7" << "COM8" << "COM9" << "LPT0"
<< "LPT1" << "LPT2" << "LPT3" << "LPT4" << "LPT5" << "LPT6" << "LPT7" << "LPT8"
<< "LPT9";
const QFileInfo fi(path);
const QString basename = fi.baseName();
foreach (const QString& d, devices)
if (basename == d)
// Note: Names with ':' other than with a drive letter have already been rejected.
return false;
// Check for trailing periods or spaces
if (path.right(1)=="." || path.right(1)==" ")
return false;
// Check for pathnames that are too long (disregarding raw pathnames)
if (path.length()>260)
return false;
// Exclude raw device names
if (path.left(4)=="\\\\.\\")
return false;
// Since we are checking for a filename, it mustn't be a directory
if (path.right(1)=="\\")
return false;
return true;
}
Features:
Probably faster than using regexes
Checks for illegal characters and excludes device names (note that '\' is not illegal, since it can be in path names)
Allows drive letters
Allows full path names
Allows network path names
Allows anything after \\?\ (raw file names)
Disallows anything starting with \\.\ (raw device names)
Disallows names ending in "\" (i.e. directory names)
Disallows names longer than 260 characters not starting with \\?\
Disallows trailing spaces and periods
Note that it does not check the length of filenames starting with \\?\, since that is not a hard and fast rule. Also note, as pointed out here, names containing multiple backslashes and forward slashes are NOT rejected by the win32 API.
I don't think that Qt has a built-in function, but if Boost is an option, you can use Boost.Filesystem's name_check functions.
If Boost isn't an option, its page on name_check functions is still a good overview of what to check for on various platforms.
Difficult to do reliably on windows (some odd things such as a file named "com" still being invalid) and do you want to handle unicode, or subst tricks to allow a >260 char filename.
There is already a good answer here How do I check if a given string is a legal / valid file name under Windows?
see example (from Digia Qt Creator sources) in: https://qt.gitorious.org/qt-creator/qt-creator/source/4df7656394bc63088f67a0bae8733f400671d1b6:src/libs/utils/filenamevalidatinglineedit.cpp
I'd just create a simple function to validate the filename for the platform, which just searches through the string for any invalid characters. Don't think there's a built-in function in Qt. You could use #ifdefs inside the function to determine what platform you're on. Clean enough I'd say.
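A minimal sketch of that idea, written against std::string so it stands alone (a Qt version would take a QString and could use Qt's own platform macros instead of _WIN32). The name is_valid_file_name is hypothetical, and the character lists are assumptions based on the answers above; the sketch checks a single name, not a full path, and deliberately ignores reserved device names, trailing dots or spaces, and length limits.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: checks a single file name (not a full path)
// against a per-platform list of forbidden characters.
bool is_valid_file_name(const std::string& name) {
#ifdef _WIN32
    const std::string forbidden = "<>:\"/\\|?*";
#else
    const std::string forbidden = "/";
#endif
    if (name.empty() || name == "." || name == "..")
        return false;
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(name[i]);
        if (c == '\0')
            return false;  // NUL is never allowed in a file name
#ifdef _WIN32
        if (c < 32)
            return false;  // control characters are rejected on Windows
#endif
        if (forbidden.find(name[i]) != std::string::npos)
            return false;
    }
    return true;
}

// Checks that hold on every platform covered by the sketch.
void is_valid_file_name_demo() {
    assert(is_valid_file_name(".dotted-name"));
    assert(!is_valid_file_name("dir/name"));
}
```

Names such as colon:name or what? come out valid or invalid depending on which branch of the #ifdef is compiled, which matches the examples in the question.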

How do you change the filename extension stored in a string in C++?

Alright, here's the deal: I'm taking an intro to C++ class at my university and am having trouble figuring out how to change the extension of a file. First, what we are supposed to do is read in a .txt file and count words, sentences, vowels etc. Well, I got this, but the next step is what's troubling me. We are then supposed to create a new file using the same file name as the input file but with the extension .code instead of .txt (in that new file we are then to encode the string by adding random numbers to the ASCII code of each character, if you are interested). Being a beginner in programming, I'm not quite sure how to do this. I'm using the following piece of code to first get the input file:
cout << "Enter filename: ";
cin >> filename;
infile.open(filename.c_str());
I'm assuming to create a new file I'm going to be using something like:
outfile.open("test.code");
But I won't know what the file name is until the user enters it, so I can't say "test.txt". So if anyone knows how to change that extension when I create a new file, I would very much appreciate it!
I occasionally ask myself this question and end up on this page, so for future reference, here is the single-line syntax:
string newfilename=filename.substr(0,filename.find_last_of('.'))+".code";
There are several approaches to this.
You can take the super lazy approach, and have them enter in just the file name, and not the .txt extension. In which case you can append .txt to it to open the input file.
infile.open(filename + ".txt");
Then you just call
outfile.open(filename + ".code");
The next approach would be to take the entire filename including extension, and just append .code to it so you'd have test.txt.code.
It's a bit ambiguous if this is acceptable or not.
Finally, you can use std::string methods find, and replace to get the filename with no extension, and use that.
Of course, if this were not homework but a real-world project, you'd probably do yourself -- as well as other people reading your code -- a favor by using Boost.Filesystem's replace_extension() instead of rolling your own. There's just no functionality that is simple enough that you couldn't come up with a bug, at least in some corner case.
Not to give it away since learning is the whole point of the exercise, but here's a hint.
You're probably going to want a combination of find_last_of and replace.
Here are a few hints. You have a filename already entered - what you want to do is get the part of the filename that doesn't include the extension:
std::string basename(const std::string &filename)
{
// fill this bit in
}
Having written that function, you can use it to create the name of the new file:
std::string codeFile = basename(filename) + ".code";
outFile.open(codeFile);
Pseudo code would be to do something like
outFilename = filename;
<change outFilename>
outfile.open(outFilename);
For changing outFilename, look at strrchr and strcpy as a starting point (might be more appropriate methods -- that would work great with a char* though)
In Windows (at least) you can use _splitpath to dissect the base name from the rest of the pieces, and then reassemble them using your favorite string formatter.
Why not use the string method find_last_of()?
std::string new_filename = filename;
std::string::size_type result = new_filename.find_last_of('.');
// The test is needed: new_filename.erase(std::string::npos) would throw std::out_of_range.
if (std::string::npos != result)
    new_filename.erase(result);
// append extension:
new_filename.append(".code");
I would just append ".code" to the filename the user entered. If they entered "test.txt" then the output file would be "test.txt.code". If they entered a file name with no extension, like "test" then the output file would be "test.code".
I use this technique all the time with programs that generate output files and some sort of related logging/diagnostic output. It's simple to implement and, in my opinion, makes the relationships between files much more explicit.
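How about using strrchr (strstr would find the first dot rather than the last):
const char* newExtension = ".code";
// filename must point to a writable buffer with room for the new extension.
void ChangeFileExtension(char* filename) {
    char* lastDot = strrchr(filename, '.');
    if (lastDot != NULL)
        strcpy(lastDot, newExtension); // overwrite the old extension
    else
        strcat(filename, newExtension); // no extension yet, so append one
}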
What you'll need to do is copy the original filename into a new variable where you can change the extension. Something like this:
string outFilename;
size_t extPos = filename.rfind('.');
if (extPos != string::npos)
{
// Copy everything up to (but not including) the '.'
outFilename.assign(filename, 0, extPos);
// Add the new extension.
outFilename.append(".code");
// outFilename now has the filename with the .code extension.
}
It's possible you could use the "filename" variable if you don't need to keep the original filename around for later use. In that case you could just use:
size_t extPos = filename.rfind('.');
if (extPos != string::npos)
{
// Erase the current extension.
filename.erase(extPos);
// Add the new extension.
filename.append(".code");
}
The key is to look at the definition of the C++ string class and understand what each member function does. Using rfind will search backwards through the string and you won't accidentally hit any extensions in folder names that might be part of the original filename (e.g. "C:\MyStuff.School\MyFile.txt"). When working with the offsets from find, rfind, etc., you'll also want to be careful to use them properly when passing them as counts to other methods (e.g. do you use assign(filename, 0, extPos-1), assign(filename, 0, extPos), assign(filename, 0, extPos+1)).
Hope that helps.
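One case the rfind approach above does not cover is a file without an extension inside a folder whose name contains a dot (e.g. "C:\MyStuff.School\MyFile"): rfind would then find the folder's dot. A sketch that also compares against the last path separator, with with_extension as a hypothetical name:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns filename with its extension replaced by
// new_ext (which should include the leading dot).
std::string with_extension(const std::string& filename,
                           const std::string& new_ext) {
    assert(!new_ext.empty() && new_ext[0] == '.');
    const std::string::size_type dot = filename.rfind('.');
    const std::string::size_type sep = filename.find_last_of("\\/");
    // The dot only counts if it comes after the last path separator.
    const bool has_ext = dot != std::string::npos &&
                         (sep == std::string::npos || dot > sep);
    // substr(0, dot) keeps everything up to, but not including, the dot.
    return (has_ext ? filename.substr(0, dot) : filename) + new_ext;
}
```

For the assignment in the question, the call would be with_extension(filename, ".code").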
size_t pos = filename.rfind('.');
if(pos != string::npos)
filename.replace(pos, filename.length() - pos, ".code");
else
filename.append(".code");
Very easy, as long as both the old and the new extension are exactly three characters long:
string str = "file.ext";
str[str.size()-3]='a';
str[str.size()-2]='b';
str[str.size()-1]='c';
cout<<str;
Result:
"file.abc"