C++ Reading a filename from stdin

So I'm trying to build a version of wc, and one of the key abilities of this program is that you can specify files in two ways:
wc file.txt
and
wc < file.txt
I have figured out how to implement the first way, but I am struggling with the second way. How could I approach this?

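Tools like this (and many others, such as grep) work as follows: if there are no arguments on the command line that specify file names, input is read from std::cin. Note that with wc < file.txt your program never sees the name file.txt at all. The shell opens the file and connects it to the program's standard input, so from inside the program it is indistinguishable from typed or piped input.
In a simple sense, if argc is 1 then you have only the executable name as an argument, so no files were specified. In a more practical situation you'd use something like an argument parser, which may interpret various flags but will give you a count of the non-flag arguments.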

Related

How can I replicate compile-time hex string interpretation at run time?

In my code the following line gives me data that performs the task it's meant for:
const char *key = "\xf1`\xf8\a\\\x9cT\x82z\x18\x5\xb9\xbc\x80\xca\x15";
The problem is that it gets converted at compile time according to rules that I don't fully understand. How does "\x" work in a string?
What I'd like to do is to get the same result but from a string exactly like that fed in at run time. I have tried a lot of things and looked for answers but none that match closely enough for me to be able to apply.
I understand that \x denotes a hex number. But I don't know in which form that gets 'baked out' by the compiler (gcc).
What does that ` translate into?
Does the "\a" do something similar to "\x"?
This conversion is indeed done by the compiler, but it is not exposed anywhere in the standard library. That means that you are left with 3 ways:
dynamically generate a C++ source file containing the string literal, whose program writes the string to its standard output. Compile it and (provided popen is available) run it from your main program and read its output. Pretty ugly, isn't it...
use the source of an existing compiler, or directly its internal libraries. Clang is probably a good starting point because it has been designed to be modular. But it could take a good amount of work to find where that damned specific point is coded and how to use it...
just mimic what the compiler does, and write your own parser by hand. It is not that hard, and it will teach you why tests are useful...
If it was not clear until here, I strongly urge you to use the third way ;-)
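If you want to translate "escape" codes in strings that you get as input at run time, then you need to do it yourself, explicitly. The rules the compiler applies are simple. \x is followed by one or more hex digits and consumes all of them, so \x9cT is the byte 0x9C followed by a plain T, and \x5\xb9 is 0x05 followed by 0xB9. Characters that are not part of an escape, like the backtick (`), are stored as-is (the backtick is 0x60). \a is a different, single-character escape meaning "alert" (BEL, 0x07), and \\ is one literal backslash.
One way is to read the input into one string. Then copy the characters from that source string into a new destination string, one by one. If you see a backslash, discard it and fetch the next character. If it's an x, convert the hex digits that follow into an integer (e.g. with std::stoi(str, &pos, 16), or by hand) and append that value to the destination string as a single char. Appending the integer itself with std::to_string, or with an output string stream and operator <<, would add its decimal digits as text instead of the byte; cast it to char first.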

Indexing string literals in a C++ project

I have a huge c++ project and I find myself rgrep-ing for patterns that I know are in string literals. Is there a way to get clang or xtags or cscope or whatever to build a file with a mapping of each string literal in the project to the file and line where it was found?
I don't know of a way to make cscope or friends do this. You could almost certainly write a custom Starscope extractor that would, if you don't mind writing a dozen or so lines of Ruby (starscope: https://github.com/eapache/starscope, adding an extractor: https://github.com/eapache/starscope/blob/master/doc/LANGUAGE_SUPPORT.md#how-to-add-another-language)
Alternatively it may just be enough to use something like ag instead, which is grep-like but generally a lot faster: https://github.com/ggreer/the_silver_searcher

GDB backtrace with long function names

I am doing some debugging of an application that uses boost::spirit. This means that backtraces are very deep and that many of the intermediate layers have function names that take several pages to print. The length of the function names makes examining the backtrace difficult. How can I have gdb limit the length of a function name to 1 or 2 lines? I'd still like to see the full path to the file and the line number, but I don't need four pages of template parameters!
I don't think it can be done directly right now. I think it would be a reasonable feature.
However, you can write your own implementation of "bt" in Python and then apply whatever transforms you like. This isn't actually very hard.
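Hooking into bt itself needs GDB's Python API, which is not shown here, but the transform is plain string processing. As a sketch in C++ (the name collapse_templates is made up), this keeps each outermost template name and replaces everything between its angle brackets with <...>, leaving the file path and line number alone. It is naive: operator< or operator<< in a name, and GDB's own markers such as <optimized out>, will confuse or be collapsed by it.

```cpp
#include <string>

// Replaces each top-level template argument list "<...anything...>" with "<...>".
std::string collapse_templates(const std::string& frame) {
    std::string out;
    int depth = 0;                       // current '<' nesting level
    for (char c : frame) {
        if (c == '<') {
            if (depth++ == 0) out += "<...>";
        } else if (c == '>' && depth > 0) {
            --depth;                     // closes a bracket we opened
        } else if (depth == 0) {
            out += c;                    // outside any template argument list
        }
    }
    return out;
}
```

The same logic could be ported into a Python bt replacement and applied to each frame's function name, or run line by line over a backtrace saved to a text file.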

How to get multiple file types from FindFirstFile(): is it possible?

If I want to get a list of JPEG files, I pass *.jpg to the function (at the end of the path parameter) and FindNextFile() returns the .jpg files one by one. What if I want the function to return both jpg and mp3 files? Is it possible to do it with one search, without building two strings to pass to the function?
You could pass *.* mask to the function and do additional checking once you receive next file.
From the MSDN documentation:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa364418(v=vs.85).aspx
You can read:
lpFileName [in]
The directory or path, and the file name, which can include wildcard characters, for example, an asterisk (*) or a question mark (?).
So, you can at most use wildcards, which is not enough to match two different extensions.
You might need to perform two searches, first with *.jpg and then with *.mp3. If you are concerned about efficiency, it's best to profile each method.
The answers you've received imply only two possibilities: either search two entirely separate times, once for *.jpg and once for *.mp3, or else search once for *.* (and figure out on your own whether a file matches what you care about or not).
At least in this particular case, there's a little bit of middle ground. You can search for *.?p? because the second letter of both extensions you care about is p. With this you'll still need to do some sort of comparison on your own to find whether a given file really has one of the two extensions you care about. As such, it won't simplify your code a whole lot.
At the same time, it can speed up the search quite a bit. A call to FindNextFile has a fair amount of overhead, so if the directory you're looking at has a lot of files that don't match the ?p? extension, avoiding retrieving them all only to ignore them can save quite a bit of time.
Of course, this is specific to the case where you have at least one matching letter, so it's not really a completely general technique.
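A sketch of the single-search approach: one FindFirstFileA/FindNextFileA loop with a broad pattern (*.* or the narrower *.?p?), plus a case-insensitive extension check on each result. The helper names has_extension and find_files are made up for this example, and the ANSI (...A) variants are used only to keep it short. The Windows part is guarded by #ifdef _WIN32 so the filter itself builds anywhere.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// True if `name` ends with one of `exts` (each including its dot), ignoring case.
bool has_extension(const std::string& name, const std::vector<std::string>& exts) {
    for (const std::string& ext : exts) {
        if (name.size() < ext.size()) continue;
        // Compare from the end so only the trailing characters matter.
        if (std::equal(ext.rbegin(), ext.rend(), name.rbegin(), [](char a, char b) {
                return std::tolower(static_cast<unsigned char>(a)) ==
                       std::tolower(static_cast<unsigned char>(b));
            }))
            return true;
    }
    return false;
}

#ifdef _WIN32
// One directory scan; `pattern` could be "*.*" or a narrower mask such as "*.?p?".
std::vector<std::string> find_files(const std::string& dir, const std::string& pattern,
                                    const std::vector<std::string>& exts) {
    std::vector<std::string> result;
    WIN32_FIND_DATAA data;
    HANDLE h = FindFirstFileA((dir + "\\" + pattern).c_str(), &data);
    if (h == INVALID_HANDLE_VALUE) return result;   // no match or bad path
    do {
        const bool is_dir = (data.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
        if (!is_dir && has_extension(data.cFileName, exts))
            result.push_back(data.cFileName);
    } while (FindNextFileA(h, &data));
    FindClose(h);
    return result;
}
#endif
```

A call such as find_files(dir, "*.?p?", {".jpg", ".mp3"}) performs a single search; the mask only reduces how many entries come back, and has_extension still decides which ones you keep.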

C++ - Splitting Filename and File Extension

Ok, first of all I don't want to use Boost, or any external libraries. I just want to use the C++ Standard Library. I can easily split strings with a given delimiter with my split() function:
#include <sstream>
#include <string>
#include <vector>
void split(const std::string &str, std::vector<std::string> &tokens, char delim) {
    std::string ea;
    std::stringstream stream(str);
    while (std::getline(stream, ea, delim))
        tokens.push_back(ea);
}
I do this on filenames. But there's a problem: some files have extensions like tar.gz, tar.bz2, etc. There are also filenames with extra dots, such as Some.file.name.tar.gz, which I wish to split into Some.file.name and tar.gz. Note: the number of dots in a filename isn't constant.
I also tried PathFindExtension, with no luck. Is this possible? If so, please enlighten me. Thank you.
Edit: I'm very sorry about not specifying the OS. It's Windows.
I think you could use std::string::find_last_of to get the index of the last ., and substr to cut the string (although "compound extensions" involving multiple dots will require additional work).
There is no way of doing what you want that does not involve a database of extensions for your purpose. There's nothing magical about extensions, they are just part of a filename (if you gunzip foo.tar.gz you'll likely get a foo.tar, so for this application .gz actually is "the extension"). So, in order to do what you want, build a database of extensions that you want to look for and fall back on "last dot" if you don't find one.
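A sketch of that database-plus-fallback idea. The table kCompoundExtensions is a hypothetical starting list, not an authoritative one, and split_extension is a made-up name; matching is case-sensitive for simplicity. It returns the base name and the extension without the separating dot.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical table of multi-part extensions to recognise; extend as needed.
const std::vector<std::string> kCompoundExtensions = {"tar.gz", "tar.bz2", "tar.xz"};

// Returns {base, extension}; the extension has no leading dot.
std::pair<std::string, std::string> split_extension(const std::string& filename) {
    for (const std::string& ext : kCompoundExtensions) {
        // Require a non-empty base and a dot right before the compound extension.
        if (filename.size() > ext.size() + 1 &&
            filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0 &&
            filename[filename.size() - ext.size() - 1] == '.') {
            const std::size_t dot = filename.size() - ext.size() - 1;
            return {filename.substr(0, dot), ext};
        }
    }
    // Fall back on the last dot; no dot, or only a leading one (".profile"), means no extension.
    const std::size_t dot = filename.find_last_of('.');
    if (dot == std::string::npos || dot == 0) return {filename, ""};
    return {filename.substr(0, dot), filename.substr(dot + 1)};
}
```

Keeping the table as data rather than code makes it easy to load extra entries at run time, for instance from a command-line option.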
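Before C++17 there was nothing for this in the C++ standard library. Since C++17, std::filesystem::path offers stem() and extension(), which split at the last dot (so Some.file.name.tar.gz still yields .gz). Beyond that, every operating system I know of provides this functionality in a variety of ways.
In Windows you can use _splitpath(), which also returns the extension after the last dot. In Linux, dirname() and basename() split a path into its directory and file-name parts, but leave the extension attached.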
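For the single-dot case, C++17's std::filesystem already does the split portably. A short sketch (split_last_dot is a made-up name) that needs a C++17 compiler; since it splits at the last dot only, a .tar.gz file still needs a special-case table as described in the other answers.

```cpp
#include <filesystem>
#include <string>
#include <utility>

// Splits a file name at its last dot using std::filesystem (C++17).
// extension() keeps the leading dot, e.g. ".gz".
std::pair<std::string, std::string> split_last_dot(const std::string& filename) {
    const std::filesystem::path p(filename);
    return {p.stem().string(), p.extension().string()};
}
```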
The problem is indeed filenames like *.tar.gz, which cannot be split consistently, because (at least in Windows) the .tar part isn't part of the extension. You'll either have to keep a list of these special cases and use a one-dot string::rfind for the rest, or find some pre-implemented way. Note that the .tar.* extensions aren't infinite, and are very much standardized (there are about ten of them, I think).
You could create a look-up table of file extensions that you think you might encounter, and also add a command-line option to add a new one to the look-up table if you encounter anything new. Then parse through the file name to see if any entry in the look-up table is a substring of the file name.
EDIT: You can also refer to this question: C++/STL string: How to mimic regex like function with wildcards?