Simple text file formatter crashes under Linux, but fine in Windows - c++

I've made a simple .acf to .json file formatter. It runs correctly under Windows (GCC via msys2), but under Linux it segfaults every time after executing a string insert or replace.
It converts the file below into a JSON-compatible format: it appends a comma after each entry, inserts a colon between each key and its value, and puts braces around the whole thing.
Save as test.acf:
"AppState"
{
"appid" "730"
"Universe" "1"
"name" "Counter-Strike: Global Offensive"
"StateFlags" "4"
"installdir" "Counter-Strike Global Offensive"
"LastUpdated" "1462547468"
"UpdateResult" "0"
"SizeOnDisk" "14990577143"
"buildid" "1110931"
"LastOwner" "76561198013962068"
"BytesToDownload" "8768"
"BytesDownloaded" "8768"
"AutoUpdateBehavior" "1"
"AllowOtherDownloadsWhileRunning" "0"
"UserConfig"
{
"Language" "english"
}
"MountedDepots"
{
"731" "205709710082221598"
"734" "5169984513691014102"
}
}
Minimal main code with defects triple slashed:
#include <iostream>
#include <fstream>
#include <string>
int main(int argc, char* argv[])
{
std::ifstream file; // declaration was missing from the snippet
file.open("test.acf");
std::string data((std::istreambuf_iterator<char>(file)), (std::istreambuf_iterator<char>()));
int indexQuote = 0;
int index[4];
int insertCommaNext = -1;
std::string delims = "\"{}"; // It skips between braces and quotes only
std::size_t found = data.find_first_of(delims);
while(found != std::string::npos)
{
int inc = 1; // 0-4 depending on the quote - 0"key1" 2"value3" 4{
char c = data.at(found);
if (c != '"') {
if (c == '}')
insertCommaNext = found + 1; // Record index to insert comma after (following closing brace)
else if (c == '{') {
///data.insert(index[1] + 1, ":");
///inc++;
}
indexQuote = 0;
} else {
if (insertCommaNext != -1) {
///data.insert(insertCommaNext, ",");
///inc++;
insertCommaNext = -1;
}
index[indexQuote] = found;
if (indexQuote == 2) { // Join 'key: value' by placing the colon
///data.replace(index[1] + 1, 1, ":");
} else if (indexQuote == 4) { // Add comma after each key/value entry
indexQuote = 0;
///data.insert(index[3] + 1, ",");
///inc++;
}
indexQuote++;
}
found = data.find_first_of(delims, found + inc);
}
data = "{" + data + "}";
}
If you uncomment any of the triple slashed /// lines - containing an insert/replace, it will crash.
I'm certain the code quality is not great; there are probably better ways to achieve this. Cheers.

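The problem is that indexQuote reaches 4 while index has only four elements (valid indices 0 to 3), so index[indexQuote] = found; writes out of bounds. An out-of-bounds write is undefined behaviour, which is why it can appear to work on one platform and crash on another. The branch further down resets indexQuote to 0, but that reset has to happen before index[indexQuote] is written, not after.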
For reference, I debugged this by adding prints everywhere and printing all the variables until I found where it crashed.
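One way to make this kind of bug show up immediately on every platform is a bounds-checked container. The following is only a minimal sketch, assuming the four quote positions are kept in a std::array; recordQuote and recordQuoteWrapped are hypothetical helpers, not part of the original program:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: stores the position of the current quote.
// at() throws std::out_of_range instead of silently writing past the end.
void recordQuote(std::array<std::size_t, 4>& index, std::size_t indexQuote, std::size_t found)
{
    index.at(indexQuote) = found;
}

// Hypothetical variant that resets the counter before the write,
// which is the ordering the answer asks for.
std::size_t recordQuoteWrapped(std::array<std::size_t, 4>& index, std::size_t indexQuote, std::size_t found)
{
    if (indexQuote >= index.size())
        indexQuote = 0;          // reset first ...
    index[indexQuote] = found;   // ... then write
    return indexQuote + 1;       // counter value to use for the next quote
}
```

With at(), the faulty write raises an exception on Windows and Linux alike, so the bug cannot hide behind a lucky memory layout.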

Related

CppUnitTestFramework: Test Method Fails, Stack Trace Lists Line Number at the End of Method, Debug Test Passes

I know, I know - that question title is very much all over the place. However, I am not sure what could be an issue here that is causing what I am witnessing.
I have the following method in class Project that is being unit tested:
bool Project::DetermineID(std::string configFile, std::string& ID)
{
std::ifstream config;
config.open(configFile);
if (!config.is_open()) {
WARNING << "Failed to open the configuration file for processing ID at: " << configFile;
return false;
}
std::string line = "";
ID = "";
bool isConfigurationSection = false;
bool isConfiguration = false;
std::string tempID = "";
while (std::getline(config, line))
{
std::transform(line.begin(), line.end(), line.begin(), ::toupper); // transform the line to all capital letters
boost::trim(line);
if ((line.find("IDENTIFICATIONS") != std::string::npos) && (!isConfigurationSection)) {
// remove the "IDENTIFICATIONS" part from the current line we're working with
std::size_t idStartPos = line.find("IDENTIFICATIONS");
line = line.substr(idStartPos + strlen("IDENTIFICATIONS"), line.length() - idStartPos - strlen("IDENTIFICATIONS"));
boost::trim(line);
isConfigurationSection = true;
}
if ((line.find('{') != std::string::npos) && isConfigurationSection) {
std::size_t bracketPos = line.find('{');
// we are working within the ids configuration section
// determine if this is the first character of the line, or if there is an ID that precedes the {
if (bracketPos == 0) {
// is the first char
// remove the bracket and keep processing
line = line.substr(1, line.length() - 1);
boost::trim(line);
}
else {
// the text before { is a temp ID
tempID = line.substr(0, bracketPos - 1);
isConfiguration = true;
line = line.substr(bracketPos, line.length() - bracketPos);
boost::trim(line);
}
}
if ((line.find("PORT") != std::string::npos) && isConfiguration) {
std::size_t indexOfEqualSign = line.find('=');
if (indexOfEqualSign == std::string::npos) {
WARNING << "Unable to determine the port # assigned to " << tempID;
}
else {
std::string portString = "";
portString = line.substr(indexOfEqualSign + 1, line.length() - indexOfEqualSign - 1);
boost::trim(portString);
// confirm that the obtained port string is not an empty value
if (portString.empty()) {
WARNING << "Failed to obtain the \"Port\" value that is set to " << tempID;
}
else {
// attempt to convert the string to int
int workingPortNum = 0;
try {
workingPortNum = std::stoi(portString);
}
catch (...) {
WARNING << "Failed to convert the obtained \"Port\" value that is set to " << tempID;
}
if (workingPortNum != 0) {
// check if this port # is the same port # we are publishing data on
if (workingPortNum == this->port) {
ID = tempID;
break;
}
}
}
}
}
}
config.close();
if (ID.empty())
return false;
else
return true;
}
The goal of this method is to parse any text file for the ID portion, based on matching the port # that the application is publishing data to.
Format of the file is like this:
Identifications {
ID {
port = 1001
}
}
A separate Visual Studio project unit tests various methods, including this Project::DetermineID method:
#define STRINGIFY(x) #x
#define EXPAND(x) STRINGIFY(x)
TEST_CLASS(ProjectUnitTests) {
Project* parser;
std::string projectDirectory;
TEST_METHOD_INITIALIZE(ProjectUnitTestInitialization) {
projectDirectory = EXPAND(UNITTESTPRJ);
projectDirectory.erase(0, 1);
projectDirectory.erase(projectDirectory.size() - 2);
parser = Project::getClass(); // singleton method getter/initializer
}
// Other test methods are present and pass/fail accordingly
TEST_METHOD(DetermineID) {
std::string ID = "";
bool x = parser->DetermineID(projectDirectory + "normal.cfg", ID);
Assert::IsTrue(x);
}
};
Now, when I run the tests, DetermineID fails and the stack trace states:
DetermineID
Source: Project Tests.cpp line 86
Duration: 2 sec
Message:
Assert failed
Stack Trace:
ProjectUnitTests::DetermineID() line 91
Now, in my test .cpp file, TEST_METHOD(DetermineID) { is present on line 86. But that method's } is located on line 91, as the stack trace indicates.
And, when debugging, the unit test passes, because the return of x in the TEST_METHOD is true.
Only when running the test individually or running all tests does that test method fail.
Some notes that may be relevant:
This is a single-threaded application with no tasks scheduled (no race condition to worry about supposedly)
There is another method in the Project class that also processes a file with an std::ifstream same as this method does
That method has its own test method that has been written and passes without any problems
The test method also access the "normal.cfg" file
Yes, this->port has an assigned value
Thus, my questions are:
Why does the stack trace reference the closing bracket for the test method instead of the single Assert within the method that is supposedly failing?
How do I get the unit test to pass when it is run? (It currently only passes during debugging, where I can confirm that x is true.)
If the issue is a race condition where perhaps the other test method is accessing the "normal.cfg" file, why does the test method fail even when it is run individually?
Any support/assistance here is very much appreciated. Thank you!

C++ Parsing char array as a script file (syntax)

I have made a simple Script reading class in C++ which allows me to read and parse scripts.
Basically there's a FILE class, which then I proceed to open with "fopen".
In functions I proceed to call "fgetc" and "ftell" to parse the script file as needed, note this ain't an interpreter.
Every script file is supposed to follow a syntax, but this is why I'm asking here for a solution.
Here's how a script looks like:
# Script File Comment
USERNAME = "Joe"
PASSWORD = "pw0001"
ACCESSLEVEL = 3
DATABASE = ("localhost",3306,"db","user","password")
Basically I have a few functions:
// This function searches for "variables"
nextToken();
// After I have the variable, e.g: USERNAME, PASSWORD, ACCESSLEVEL or DATABASE
// I proceed to call this function
// This function reads the char array for (,-{}()[]=) these are symbols
readSymbol();
// In a condition I check what "token/variable" I got and proceed to read
// it accordingly
// e.g; for USERNAME I do:
readString(); // reads text inside "
// e.g; for ACCESSLEVEL I do:
readNumber(); // reads digits until the next char ain't a digit
// e.g; for DATABASE I do:
readSymbol(); // (
readString(); // localhost
readSymbol(); // ,
readNumber(); // 3306
readSymbol(); // ,
readString(); // db
readSymbol(); // ,
readString(); // user
readSymbol(); // ,
readString(); // password
readSymbol(); // )
I would like to be able to read a variable declaration like this:
DATABASELIST = {"data1","data2","data3"}
or
DATABASELIST = {"data1"}
I could easily do readSymbol and readString to read for 3 different string definitions inside the variable, however this list is supposed to have custom user data, like 5 different strings, or 8 different strings - depends.
And I seriously have no idea how can I do this with the parser I wrote.
Please note that I am basing this in some Pseudo code I took from a scripter for this type of format, I have the pseudo code extracted from IDA, if you would like to see it for better understanding post here
Here's an example of my "readSymbol" function.
READSYMBOL
int TReadScriptFile::readSymbol()
{
int currentData = 0;
int stringStart = -1;
// Check if we can't read anymore
if (end)
return 0;
while (true)
{
// Basically get chars in the script
currentData = fgetc(File);
// Check for end of file
if (currentData == -1)
{
end = true;
break;
}
if (stringStart == -1)
{
if (isdigit(currentData) || isalpha(currentData))
{
printf("TReadScriptFile::readSymbol: Symbol expected\n");
close();
return 0;
}
else if
(
currentData == '=' || currentData == ',' ||
currentData == '(' || currentData == ')' ||
currentData == '{' || currentData == '}' ||
currentData == '>' || currentData == '<' ||
currentData == ':' || currentData == '-'
)
{
#ifdef __DEBUG__
printf("Symbol: %c\n", currentData);
#endif
stringStart = ftell(File);
break;
}
}
}
return 1;
}
NEXTTOKEN
int TReadScriptFile::nextToken()
{
int currentData = 0;
int stringStart = -1;
int stringEnd = -1;
RecursionDepth = -1;
memset(String, 0, 4000);
// Check if we can't read anymore
if (end)
return 0;
while (true)
{
// ** Syntax **
if (isdigit(getNext()) || getNext() == -1)
{
printf("No more tokens left.\n");
end = true;
close();
return 0;
}
// End
// Basically get chars in the script
currentData = fgetc(File);
// Check for end of file
if (currentData == -1)
{
end = true;
break;
}
// Syntax Checking Part, this really isn't needed but w/e
if (stringStart == -1)
{
if (currentData == '=' || isdigit(currentData))
{
printf("TReadScriptFile::nextToken: Syntax Error: string expected\n");
close();
return 0;
}
}
// End Syntax Checking
// It's a comment line, we should skip
if (currentData == '#')
{
seekNewLn();
continue;
}
// There are no variables, yet
if (stringStart == -1)
{
// We found a letter, we are near a token!
if (isalpha(currentData))
{
stringStart = ftell(File);
// We might as well add the letter to the string
RecursionDepth++;
String[RecursionDepth] = currentData;
continue;
}
}
else if (stringStart != -1)
{
// Let's wait until we get an identifier or space
// We found a digit, error
if (isdigit(currentData))
{
printf("TReadScriptFile::nextToken: string expected\n");
close();
return 0;
}
// We found a space, maybe we should stop looking for tokens?
else if (isspace(currentData))
{
#ifdef __DEBUG__
printf("Token: %s\n", String);
#endif
break;
}
RecursionDepth++;
String[RecursionDepth] = currentData;
}
}
return 1;
}
I found a good example of the approach I followed here:
http://llvm.org/docs/tutorial/LangImpl1.html
One mechanism to deal with DATABASELIST would be this:
After finding the variable DATABASELIST, read a symbol using readSymbol() and check that it is a {. Then, in a loop, call readString() and add the result to a std::vector (or some other suitable container), then check for a , or } using readSymbol(). If it is a comma, go back and read another string, add it to the vector, and so on until you finally reach }. When you are finished you have a vector (dynamic array) of strings that represents the DATABASELIST.
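The loop described above can be sketched as follows. This is an illustration only: it parses from a std::string rather than through the FILE-based class, and parseStringList and skipSpaces are hypothetical names standing in for the readSymbol()/readString() calls.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: advances pos past any whitespace.
static void skipSpaces(const std::string& s, std::size_t& pos)
{
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos])))
        ++pos;
}

// Parses text of the form {"a","b",...} into out.
// Returns false on a syntax error (out may then hold a partial list).
// An empty list {} is rejected to keep the sketch short.
bool parseStringList(const std::string& s, std::vector<std::string>& out)
{
    std::size_t pos = 0;
    out.clear();
    skipSpaces(s, pos);
    if (pos >= s.size() || s[pos] != '{')      // readSymbol(): expect {
        return false;
    ++pos;
    while (true) {
        skipSpaces(s, pos);
        if (pos >= s.size() || s[pos] != '"')  // readString(): expect an opening quote
            return false;
        const std::size_t close = s.find('"', pos + 1);
        if (close == std::string::npos)
            return false;
        out.push_back(s.substr(pos + 1, close - pos - 1));
        pos = close + 1;
        skipSpaces(s, pos);
        if (pos >= s.size())
            return false;
        if (s[pos] == '}')                     // readSymbol(): } ends the list
            return true;
        if (s[pos] != ',')                     // readSymbol(): , means another string follows
            return false;
        ++pos;
    }
}
```

Because the strings go into a vector, the same code handles one entry, five entries or eight entries without any change.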

c++ cli comparing hexadecimal bytes from a file not working

I have this file called ab.exe it contains this in hexadecimal
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000BBAAE8CAFDFFFF83C408000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000054AAE8CAFDFFFF83C40800000000000000000000000000000000000000000000000000000000000000000000000000AAE8CAFDFFFF83C4088D000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
I have this C++ code that is supposed to detect whether a sequence of hexadecimal bytes is in a file and, if it is, add it to the list box.
array<Byte>^ target1 = { 0xAA,0xE8,0xCA,0xFD,0xFF,0xFF,0x83,0xC4,0x08,0x8D };
array<Byte>^ target2 = { 0x54,0xAA,0xE8,0xCA,0xFD,0xFF,0xFF,0x83,0xC4,0x08 };
array<Byte>^ target3 = { 0xBB,0xAA,0xE8,0xCA,0xFD,0xFF,0xFF,0x83,0xC4,0x08 };
int matched1 = 0;
int matched2 = 0;
int matched3 = 0;
FileStream^ fs2 = gcnew FileStream(line, FileMode::Open, FileAccess::Read, FileShare::ReadWrite);
int value;
do
{
value = fs2->ReadByte();
if (value == target1[matched1]) {
matched1++;
}
else
matched1 = 0;
if (value == target2[matched2]) {
matched2++;
}
else
matched2 = 0;
if (value == target3[matched3]) {
matched3++;
}
else
matched3 = 0;
if(matched1 == target1->Length)
{
listBox1->Items->Add(line + "1");
}
if(matched2 == target2->Length)
{
listBox1->Items->Add(line + "2");
}
if(matched3 == target3->Length)
{
listBox1->Items->Add(line + "3");
}
} while (value != -1);
fs2->Close();
The problem is that it only adds line + 3 to the list box, and not line + 1 or line + 2.
I do not know why, because all 3 of the byte sequences are in the file, so they should all be added to the list box. For some reason only the last one is added: when I tried with just 2 targets, the second one got added. Can someone show me why they are not all being added to the list box?
Thanks
Update1
After playing around with it some more: it is not the last target that gets added each time, it is the first sequence that appears in the file. I stepped through the program using message boxes. Say 54AAE8CAFDFFFF83C408 is the first sequence to appear in the file: then line + 2 is added, but after that the matched counters for all 3 targets stop counting and stay at 0 for the rest of the file. Can someone explain why that is and how to fix it?
Update2
Here is the answer to the problem. All I needed to do was add a matched = 0; after each add-to-list-box command.
listBox1->Items->Add(line + "1");
matched1 = 0;
listBox1->Items->Add(line + "2");
matched2 = 0;
listBox1->Items->Add(line + "3");
matched3 = 0;
It seems to me that after the first full match of one pattern (here target3), matched3 equals target3->Length, so on the next iteration target3[matched3] reads beyond the last byte of target3 (because of matched3++). This may cause undesired behavior.
Update1:
if(matched1 == target1->Length)
{
matched1 = 0; // pattern matched so reset counter
...
}
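The reset can be packaged so that it cannot be forgotten. Below is a rough sketch in portable C++ rather than C++/CLI; PatternMatcher is a hypothetical class, and it assumes a non-empty pattern. Unlike the original loop, it also lets a mismatching byte start a new attempt.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical streaming matcher for one byte pattern (pattern must be non-empty).
class PatternMatcher {
public:
    explicit PatternMatcher(std::vector<std::uint8_t> target)
        : target_(std::move(target)), matched_(0) {}

    // Feeds one byte; returns true when the whole pattern has just been seen.
    bool feed(std::uint8_t value)
    {
        if (value == target_[matched_]) {
            ++matched_;
        } else {
            // Restart, but let the current byte begin a new attempt.
            matched_ = (value == target_[0]) ? 1 : 0;
        }
        if (matched_ == target_.size()) {
            matched_ = 0;   // reset so target_[matched_] stays in range on the next byte
            return true;
        }
        return false;
    }

private:
    std::vector<std::uint8_t> target_;
    std::size_t matched_;
};
```

One matcher per target replaces the matched1/matched2/matched3 counters. This is still not a full string-search algorithm: patterns with a repeated prefix can miss overlapping matches, so std::search over a buffer is the safer choice when the file fits in memory.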

Getting word under caret - C++, wxWidgets

I am writing a text editor using the wxWidgets framework. I need to get the word under caret from the text control. Here is what I came up with.
static bool IsWordBoundary(wxString& text)
{
return (text.Cmp(wxT(" ")) == 0 ||
text.Cmp(wxT('\n')) == 0 ||
text.Cmp(wxT('\t')) == 0 ||
text.Cmp(wxT('\r')) == 0);
}
static wxString GetWordUnderCaret(wxTextCtrl* control)
{
int insertion_point = control->GetInsertionPoint();
wxTextPos last_position = control->GetLastPosition();
int start_at, ends_at = 0;
// Finding starting position:
// from the current caret position, move back each character until
// we hit a word boundary.
int caret_pos = insertion_point;
start_at = caret_pos;
while (caret_pos)
{
wxString text = control->GetRange (caret_pos - 1, caret_pos);
if (IsWordBoundary (text)) {
break;
}
start_at = --caret_pos;
}
// Finding ending position:
// from the current caret position, move forward each character until
// we hit a word boundary.
caret_pos = ends_at = insertion_point;
while (caret_pos < last_position)
{
wxString text = control->GetRange (caret_pos, caret_pos + 1);
if (IsWordBoundary (text)) {
break;
}
ends_at = ++caret_pos;
}
return (control->GetRange (start_at, ends_at));
}
This code works as expected. But I am wondering is this the best way to approach the problem? Do you see any possible fixes on the above code?
Any help would be great!
Is punctuation part of a word? It is in your code -- is that what you want?
Here is how I would do it:
wxString word_boundary_marks = " \n\t\r";
wxString text_in_control = control->GetValue();
int ends_at = text_in_control.find_first_of( word_boundary_marks, insertion_point) - 1;
int start_at = text_in_control.Mid(0,insertion_point).find_last_of(word_boundary_marks) + 1;
I haven't tested this, so there likely are one or two "off-by-one" errors and you should add checks for "not found", end of string, and any other word markers. My code should give you the basis for what you need.
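For reference, here is the same idea with the "not found" checks filled in. It is a sketch on a plain std::string so that it can be tried without wxWidgets; wordAt is a hypothetical name, and only space, newline, tab and carriage return count as boundaries.

```cpp
#include <cstddef>
#include <string>

// Returns the word surrounding position caret (0 <= caret <= text.size()).
std::string wordAt(const std::string& text, std::size_t caret)
{
    static const char* const kBoundaries = " \n\t\r";
    if (caret > text.size())
        caret = text.size();

    // Start: one past the last boundary before the caret, or 0 if there is none.
    std::size_t start = 0;
    if (caret > 0) {
        const std::size_t prev = text.find_last_of(kBoundaries, caret - 1);
        if (prev != std::string::npos)
            start = prev + 1;
    }

    // End: the first boundary at or after the caret, or the end of the text.
    std::size_t end = text.find_first_of(kBoundaries, caret);
    if (end == std::string::npos)
        end = text.size();

    return text.substr(start, end - start);
}
```

With a wxTextCtrl, the text would come from control->GetValue() and the caret from control->GetInsertionPoint(), as in the snippet above.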

How to get file extension from string in C++

Given a string "filename.conf", how do I verify the extension part?
I need a cross platform solution.
Is this too simple of a solution?
#include <iostream>
#include <string>
int main()
{
std::string fn = "filename.conf";
if(fn.substr(fn.find_last_of(".") + 1) == "conf") {
std::cout << "Yes..." << std::endl;
} else {
std::cout << "No..." << std::endl;
}
}
The best way is to not write any code that does it but call existing methods. In windows, the PathFindExtension method is probably the simplest.
So why would you not write your own?
Well, take the strrchr example, what happens when you use that method on the following string "c:\program files\AppleGate.Net\readme"? Is ".Net\readme" the extension? It is easy to write something that works for a few example cases, but can be much harder to write something that works for all cases.
With C++17 and its std::filesystem::path::extension (the library is the successor to boost::filesystem) you would make your statement more expressive than using e.g. std::string.
#include <iostream>
#include <filesystem> // C++17
namespace fs = std::filesystem;
int main()
{
fs::path filePath = "my/path/to/myFile.conf";
if (filePath.extension() == ".conf") // Heed the dot.
{
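// .string() avoids the quotes that operator<< adds when a path is streamed directly
std::cout << filePath.stem().string() << " is a valid type."; // Output: "myFile is a valid type."
}
else
{
std::cout << filePath.filename().string() << " is an invalid type."; // Output: e.g. "myFile.cfg is an invalid type"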
}
}
See also std::filesystem::path::stem, std::filesystem::path::filename.
You have to make sure you take care of file names with more than one dot.
example: c:\.directoryname\file.name.with.too.many.dots.ext would not be handled correctly by strchr or find.
My favorite would be the Boost.Filesystem library, which has an extension(path) function.
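To make the multi-dot pitfall concrete, here is a small sketch comparing a first-dot search with a last-dot search. The function names are made up for the illustration.

```cpp
#include <string>

// Extension taken from the FIRST dot (what strchr or find would give).
std::string extFromFirstDot(const std::string& path)
{
    const std::string::size_type dot = path.find('.');
    return dot == std::string::npos ? "" : path.substr(dot + 1);
}

// Extension taken from the LAST dot (what strrchr or rfind would give).
std::string extFromLastDot(const std::string& path)
{
    const std::string::size_type dot = path.rfind('.');
    return dot == std::string::npos ? "" : path.substr(dot + 1);
}
```

For the path in the example, the first-dot version returns everything after ".directoryname"'s dot, while the last-dot version returns "ext". The last-dot version is still fooled when only a directory contains a dot, e.g. c:\.directoryname\readme, which is the case a path library handles for you.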
Assuming you have access to STL:
std::string filename("filename.conf");
std::string::size_type idx;
idx = filename.rfind('.');
if(idx != std::string::npos)
{
std::string extension = filename.substr(idx+1);
}
else
{
// No extension found
}
Edit: This is a cross platform solution since you didn't mention the platform. If you're specifically on Windows, you'll want to leverage the Windows specific functions mentioned by others in the thread.
Someone else mentioned boost but I just wanted to add the actual code to do this:
#include <boost/filesystem.hpp>
using std::string;
string texture = foo->GetTextureFilename();
string file_extension = boost::filesystem::extension(texture);
cout << "attempting load texture named " << texture
<< " whose extensions seems to be "
<< file_extension << endl;
// Use JPEG or PNG loader function, or report invalid extension
Actually the STL can do this without much code. I advise you to learn a bit about the STL because it lets you do some fancy things. Anyway, this is what I use:
std::string GetFileExtension(const std::string& FileName)
{
if(FileName.find_last_of(".") != std::string::npos)
return FileName.substr(FileName.find_last_of(".")+1);
return "";
}
This solution will always return the extension, even on strings like "this.a.b.c.d.e.s.mp3"; if it cannot find the extension it will return "".
Actually, the easiest way is
char* ext;
ext = strrchr(filename,'.');
One thing to remember: if '.' doesn't exist in filename, ext will be NULL.
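A minimal sketch of that NULL check, wrapped in a hypothetical helper (extensionOf is not a standard function):

```cpp
#include <cstring>
#include <string>

// Returns the text after the last '.', or an empty string if there is none.
std::string extensionOf(const char* filename)
{
    if (filename == nullptr)
        return "";
    const char* ext = std::strrchr(filename, '.');   // nullptr when no '.' exists
    return ext ? std::string(ext + 1) : std::string();
}
```

Dereferencing the result of strrchr without this check is undefined behaviour for names such as "README".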
I've stumbled onto this question today myself, even though I already had a working code I figured out that it wouldn't work in some cases.
While some people already suggested using some external libraries, I prefer to write my own code for learning purposes.
Some answers included the method I was using in the first place (looking for the last "."), but I remembered that on linux hidden files/folders start with ".".
So if the file is hidden and has no extension, the whole file name would be taken for the extension.
To avoid that I wrote this piece of code:
bool getFileExtension(const char * dir_separator, const std::string & file, std::string & ext)
{
std::size_t ext_pos = file.rfind(".");
std::size_t dir_pos = file.rfind(dir_separator);
if(ext_pos != std::string::npos && ext_pos>dir_pos+1) // also reject names with no dot at all
{
ext.append(file.begin()+ext_pos,file.end());
return true;
}
return false;
}
I haven't tested this fully, but I think that it should work.
I'd go with boost::filesystem::extension (std::filesystem::path::extension with C++17) but if you cannot use Boost and you just have to verify the extension, a simple solution is:
bool ends_with(const std::string &filename, const std::string &ext)
{
return ext.length() <= filename.length() &&
std::equal(ext.rbegin(), ext.rend(), filename.rbegin());
}
if (ends_with(filename, ".conf"))
{ /* ... */ }
Using std::string's find/rfind solves THIS problem, but if you work a lot with paths then you should look at boost::filesystem::path since it will make your code much cleaner than fiddling with raw string indexes/iterators.
I suggest boost since it's a high quality, well tested, (open source and commercially) free and fully portable library.
For char array-type strings you can use this:
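#include <ctype.h>
#include <stdbool.h> /* bool, true, false when compiled as C */
#include <string.h>
bool compare_extension(char *filename, char *extension); /* defined below main */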
int main()
{
char filename[] = "apples.bmp";
char extension[] = ".jpeg";
if(compare_extension(filename, extension) == true)
{
// .....
} else {
// .....
}
return 0;
}
bool compare_extension(char *filename, char *extension)
{
/* Sanity checks */
if(filename == NULL || extension == NULL)
return false;
if(strlen(filename) == 0 || strlen(extension) == 0)
return false;
if(strchr(filename, '.') == NULL || strchr(extension, '.') == NULL)
return false;
/* Iterate backwards through respective strings and compare each char one at a time */
for(int i = 0; i < strlen(filename); i++)
{
if(tolower(filename[strlen(filename) - i - 1]) == tolower(extension[strlen(extension) - i - 1]))
{
if(i == strlen(extension) - 1)
return true;
} else
break;
}
return false;
}
Can handle file paths in addition to filenames. Works with both C and C++. And cross-platform.
If you use Qt library, you can give a try to QFileInfo's suffix()
Good answers, but I see that most of them have some problems:
First of all, I think a good answer should work for complete file names that include their path, and it should work on Linux or Windows; as mentioned, it should be cross-platform. Most of the answers fail to return the correct extension for a file name with no extension whose path contains a folder name with a dot. Some test cases:
const char filename1[] = {"C:\\init.d\\doc"}; // => no extension
const char filename2[] = {"..\\doc"}; // relative path name => no extension
const char filename3[] = {""}; // empty file name => no extension
const char filename4[] = {"testing"}; // only single name => no extension
const char filename5[] = {"tested/k.doc"}; // normal file name => doc
const char filename6[] = {".."}; // parent folder => no extension
const char filename7[] = {"/"}; // linux root => no extension
const char filename8[] = {"/bin/test.d.config/lx.wize.str"}; // ordinary path! => str
"brian newman"'s suggestion will fail for filename1 and filename4.
Most of the other answers, which are based on a reverse find, will fail for filename1.
I suggest including the following function in your source.
It returns the index of the first character of the extension, or the length of the given string if no extension is found.
size_t find_ext_idx(const char* fileName)
{
size_t len = strlen(fileName);
size_t idx = len-1;
for(size_t i = 0; *(fileName+i); i++) {
if (*(fileName+i) == '.') {
idx = i;
} else if (*(fileName + i) == '/' || *(fileName + i) == '\\') {
idx = len - 1;
}
}
return idx+1;
}
you could use the above code in your c++ application like below:
std::string get_file_ext(const char* fileName)
{
return std::string(fileName).substr(find_ext_idx(fileName));
}
One last point: if a folder name containing a dot is passed as the file name argument, the function will return whatever follows the folder's dot, so it is better to first check that the given name is a file name and not a folder name.
This is a solution I came up with. Then I noticed that it is similar to what @serengeor posted.
It works with std::string and find_last_of, but the basic idea will also work if modified to use char arrays and strrchr.
It handles hidden files, and extra dots representing the current directory. It is platform independent.
string PathGetExtension( string const & path )
{
string ext;
// Find the last dot, if any.
size_t dotIdx = path.find_last_of( "." );
if ( dotIdx != string::npos )
{
// Find the last directory separator, if any.
size_t dirSepIdx = path.find_last_of( "/\\" );
// If the dot is at the beginning of the file name, do not treat it as a file extension.
// e.g., a hidden file: ".alpha".
// This test also incidentally avoids a dot that is really a current directory indicator.
// e.g.: "alpha/./bravo"
if ( dotIdx > dirSepIdx + 1 )
{
ext = path.substr( dotIdx );
}
}
return ext;
}
Unit test:
int TestPathGetExtension( void )
{
int errCount = 0;
string tests[][2] =
{
{ "/alpha/bravo.txt", ".txt" },
{ "/alpha/.bravo", "" },
{ ".alpha", "" },
{ "./alpha.txt", ".txt" },
{ "alpha/./bravo", "" },
{ "alpha/./bravo.txt", ".txt" },
{ "./alpha", "" },
{ "c:\\alpha\\bravo.net\\charlie.txt", ".txt" },
};
int n = sizeof( tests ) / sizeof( tests[0] );
for ( int i = 0; i < n; ++i )
{
string ext = PathGetExtension( tests[i][0] );
if ( ext != tests[i][1] )
{
++errCount;
}
}
return errCount;
}
A .NET (C++/CLI) version using System::String
System::String^ GetFileExtension(System::String^ FileName)
{
int Ext=FileName->LastIndexOf('.');
if( Ext != -1 )
return FileName->Substring(Ext+1);
return "";
}
_splitpath, _wsplitpath, _splitpath_s, _wsplitpath_s
This is Windows (Platform SDK) only
You can use strrchr() to find the last occurrence of . (dot) and get the dot-based file extension.
Check the below code for example.
#include <stdio.h>
#include <string.h> /* strrchr */
void GetFileExtension(const char* file_name) {
int ext = '.';
const char* extension = NULL;
extension = strrchr(file_name, ext);
if(extension == NULL){
printf("Invalid extension encountered\n");
return;
}
printf("File extension is %s\n", extension);
}
int main()
{
const char* file_name = "c:\\.directoryname\\file.name.with.too.many.dots.ext";
GetFileExtension(file_name);
return 0;
}
Here's a function that takes a path/filename as a string and returns the extension as a string. It is all standard C++, and should work cross-platform for most platforms.
Unlike several other answers here, it handles the odd cases that Windows' PathFindExtension handles, based on PathFindExtension's documentation.
wstring get_file_extension( wstring filename )
{
size_t last_dot_offset = filename.rfind(L'.');
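// This assumes your directory separators are either \ or /.
// find_last_of is used instead of max(rfind, rfind): npos is the largest size_t,
// so max() would return npos whenever only one kind of separator is present.
size_t last_dirsep_offset = filename.find_last_of( L"\\/" );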
// no dot = no extension
if( last_dot_offset == wstring::npos )
return L"";
// directory separator after last dot = extension of directory, not file.
// for example, given C:\temp.old\file_that_has_no_extension we should return "" not "old"
if( (last_dirsep_offset != wstring::npos) && (last_dirsep_offset > last_dot_offset) )
return L"";
return filename.substr( last_dot_offset + 1 );
}
I use these two functions to get the extension and filename without extension:
std::string fileExtension(std::string file){
std::size_t found = file.find_last_of(".");
return file.substr(found+1);
}
std::string fileNameWithoutExtension(std::string file){
std::size_t found = file.find_last_of(".");
return file.substr(0,found);
}
And these regex approaches for certain extra requirements:
std::string fileExtension(std::string file){
std::regex re(".*[^\\.]+\\.([^\\.]+$)");
std::smatch result;
if(std::regex_match(file,result,re))return result[1];
else return "";
}
std::string fileNameWithoutExtension(std::string file){
std::regex re("(.*[^\\.]+)\\.[^\\.]+$");
std::smatch result;
if(std::regex_match(file,result,re))return result[1];
else return file;
}
Extra requirements that are met by the regex method:
If filename is like .config or something like this, extension will be an empty string and filename without extension will be .config.
If filename doesn't have any extension, the extension will be an empty string and the filename without extension will be the filename unchanged.
EDIT:
The extra requirements can also be met by the following:
std::string fileExtension(const std::string& file){
std::string::size_type pos=file.find_last_of('.');
if(pos!=std::string::npos&&pos!=0)return file.substr(pos+1);
else return "";
}
std::string fileNameWithoutExtension(const std::string& file){
std::string::size_type pos=file.find_last_of('.');
if(pos!=std::string::npos&&pos!=0)return file.substr(0,pos);
else return file;
}
Note:
Pass only the filenames (not path) in the above functions.
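If you only have a full path, one option is to cut off the directory part first. A small sketch, assuming / and \ are the only separators; baseName is a hypothetical helper, and fileExtension repeats the logic of the function above so that the snippet stands on its own:

```cpp
#include <string>

// Hypothetical helper: returns the part of path after the last '/' or '\\'.
std::string baseName(const std::string& path)
{
    const std::string::size_type sep = path.find_last_of("/\\");
    return sep == std::string::npos ? path : path.substr(sep + 1);
}

// Same logic as the fileExtension function above.
std::string fileExtension(const std::string& file)
{
    const std::string::size_type pos = file.find_last_of('.');
    if (pos != std::string::npos && pos != 0)
        return file.substr(pos + 1);
    return "";
}
```

Calling fileExtension(baseName(path)) gives an empty extension for /home/user.d/.config, whereas passing the full path directly would report "config".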
Try to use strstr
char* lastSlash;
lastSlash = strstr(filename, ".");
Or you can use this:
char *ExtractFileExt(char *FileName)
{
std::string s = FileName;
int Len = s.length();
while(Len >= 0) // stop at the start of the string if there is no '.'
{
if(FileName[Len] != '.')
Len--;
else
{
char *Ext = new char[s.length()-Len+1];
for(int a=0; a<s.length()-Len; a++)
Ext[a] = FileName[s.length()-(s.length()-Len)+a];
Ext[s.length()-Len] = '\0';
return Ext; // the caller must delete[] the result
}
}
return NULL; // no '.' found
}
This code is cross-platform
So, using std::filesystem is the best answer, but if for whatever reason you don't have C++17 features available, this will work even if the input string includes directories:
string getextn (const string &fn) {
int sep = fn.find_last_of(".\\/");
return (sep >= 0 && fn[sep] == '.') ? fn.substr(sep) : "";
}
I'm adding this because the rest of the answers here are either strangely complicated or fail if the path to the file contains a dot and the file doesn't. I think the fact that find_last_of can look for multiple characters is often overlooked.
It works with both / and \ path separators. It fails if the extension itself contains a slash but that's usually too rare to matter. It doesn't do any filtering for filenames that start with a dot and contain no other dots -- if this matters to you then this is the least unreasonable answer here.
Example input / output:
/ => ''
./ => ''
./pathname/ => ''
./path.name/ => ''
pathname/ => ''
path.name/ => ''
c:\path.name\ => ''
/. => '.'
./. => '.'
./pathname/. => '.'
./path.name/. => '.'
pathname/. => '.'
path.name/. => '.'
c:\path.name\. => '.'
/.git_ignore => '.git_ignore'
./.git_ignore => '.git_ignore'
./pathname/.git_ignore => '.git_ignore'
./path.name/.git_ignore => '.git_ignore'
pathname/.git_ignore => '.git_ignore'
path.name/.git_ignore => '.git_ignore'
c:\path.name\.git_ignore => '.git_ignore'
/filename => ''
./filename => ''
./pathname/filename => ''
./path.name/filename => ''
pathname/filename => ''
path.name/filename => ''
c:\path.name\filename => ''
/filename. => '.'
./filename. => '.'
./pathname/filename. => '.'
./path.name/filename. => '.'
pathname/filename. => '.'
path.name/filename. => '.'
c:\path.name\filename. => '.'
/filename.tar => '.tar'
./filename.tar => '.tar'
./pathname/filename.tar => '.tar'
./path.name/filename.tar => '.tar'
pathname/filename.tar => '.tar'
path.name/filename.tar => '.tar'
c:\path.name\filename.tar => '.tar'
/filename.tar.gz => '.gz'
./filename.tar.gz => '.gz'
./pathname/filename.tar.gz => '.gz'
./path.name/filename.tar.gz => '.gz'
pathname/filename.tar.gz => '.gz'
path.name/filename.tar.gz => '.gz'
c:\path.name\filename.tar.gz => '.gz'
If you happen to use Poco libraries you can do:
#include <Poco/Path.h>
...
std::string fileExt = Poco::Path("/home/user/myFile.abc").getExtension(); // == "abc"
If you consider the extension to be the last dot and the characters after it, but only if they don't contain the directory separator character, the following function returns the starting index of the extension, or -1 if no extension is found. Once you have that you can do whatever you want: strip the extension, change it, check it, etc.
long get_extension_index(string path, char dir_separator = '/') {
// Look from the end for the first '.',
// but give up if finding a dir separator char first
for(long i = path.length() - 1; i >= 0; --i) {
if(path[i] == '.') {
return i;
}
if(path[i] == dir_separator) {
return -1;
}
}
return -1;
}
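As an illustration of what can be done with that index, here is a sketch that strips or replaces the extension. strip_extension and replace_extension are hypothetical helpers; get_extension_index is copied from above (taking the string by const reference) so that the snippet compiles on its own.

```cpp
#include <string>

// Copy of the function above so that the sketch is self-contained.
long get_extension_index(const std::string& path, char dir_separator = '/')
{
    for (long i = static_cast<long>(path.length()) - 1; i >= 0; --i) {
        if (path[i] == '.') return i;
        if (path[i] == dir_separator) return -1;
    }
    return -1;
}

// Hypothetical helper: removes the extension (including the dot), if any.
std::string strip_extension(const std::string& path, char dir_separator = '/')
{
    const long idx = get_extension_index(path, dir_separator);
    return idx < 0 ? path : path.substr(0, idx);
}

// Hypothetical helper: replaces the extension, or appends one if there is none.
// new_ext is expected to include the leading dot, e.g. ".conf".
std::string replace_extension(const std::string& path, const std::string& new_ext,
                              char dir_separator = '/')
{
    return strip_extension(path, dir_separator) + new_ext;
}
```

Because the search gives up at the first directory separator, a path such as dir.d/file is left unchanged by strip_extension.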
I used PathFindExtension() function to know whether it is a valid tif file or not.
#include <Shlwapi.h>
bool A2iAWrapperUtility::isValidImageFile(string imageFile)
{
char * pStrExtension = ::PathFindExtension(imageFile.c_str());
if (pStrExtension != NULL && strcmp(pStrExtension, ".tif") == 0)
{
return true;
}
return false;
}