searching files? - c++

This code is part of a larger program that indexes files and tokenizes the words in each file, so that you can search for a certain word in the large number of files you have (like Google).
This function is supposed to search your files for a word that you want to find. But I don't completely understand how it works!
Can someone please explain what this code does and how it does it?
In addition, I have several questions:
1) What exactly is "infile"?
2) What does the built-in function c_str() do?
3) Why does the variable "currentlineno" start at 1? Couldn't the first line in a file start at 0?
4) What is the difference between ++x and x++?
5) What is the difference between the condition "currentlineno < lineNumber" and "currentlineno != lineNumber" ?
This is the code:
void DisplayResult(string fileName, int lineNumber)
{
    ifstream infile(fileName.c_str(), ifstream::in);
    char line[1000];
    int currentlineno = 1;
    while(currentlineno < lineNumber)
    {
        infile.getline(line, 1000);
        ++currentlineno;
    }
    infile.getline(line, 1000);
    cout<<endl<<"\nResult from ("<<fileName<<" ), line #"<<lineNumber<<": "<<endl;
    cout<<"\t"<<line;
    infile.close();
}

This function displays the line at the line number passed as a parameter.
1/ infile is an input file stream (ifstream) object; constructing it opens the file for reading: http://www.cplusplus.com/reference/fstream/ifstream/
2/ c_str() returns the contents of a string as a const char* (a null-terminated char array). That is the representation used for strings in C, which explains why the method name is "c_str". In C++ we usually prefer string over char* because it is much simpler to use.
3/ Why does currentlineno start at 1? The loop reads and discards the lines before the given line number. Then the function reads one more time to get the wanted line. Starting at 1 makes the counter match lineNumber, which counts lines from 1.
4/ ++x is pre-increment, x++ is post-increment.
With ++x, x is incremented before its value is used; with x++, the old value is used and x is incremented afterwards.
int x = 1;
cout << ++x; // display 2
x = 1;
cout << x++; // display 1
5/ Look at operators : http://www.cplusplus.com/doc/tutorial/operators/

1) What exactly is "infile"?
ANS:: It is an ifstream object. Its constructor constructs the object and optionally opens the file.
2) What does the built-in function c_str() do?
ANS:: It is needed to get a const char* representation of the text stored inside a std::string object.
3) Why does the variable "currentlineno" start at 1? Couldn't the first line in a file start at 0?
ANS:: It depends on how the second parameter of DisplayResult, lineNumber, is counted. Here line numbers are counted from 1, so the counter starts at 1 as well.
4) What is the difference between ++x and x++?
ANS:: ++x is pre-increment and x++ is post-increment; you have probably heard of these terms.
5) What is the difference between the condition "currentlineno < lineNumber" and "currentlineno != lineNumber" ?
ANS:: With currentlineno < lineNumber the loop runs only while currentlineno is smaller than lineNumber, so it stops as soon as the two are equal or currentlineno is larger. With currentlineno != lineNumber the loop runs whenever the two values differ, whether currentlineno is smaller or larger, and stops only when they are exactly equal.
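The difference between the two conditions only matters when lineNumber is smaller than the starting value of the counter, for example 0. A minimal sketch of that case, counting loop iterations instead of reading a file; the function names and the cap parameter are made up for this illustration:

```cpp
// Counts how many lines the loop would skip when it uses "<".
int skipsWithLess(int lineNumber) {
    int current = 1;
    int skips = 0;
    while (current < lineNumber) {
        ++current;
        ++skips;
    }
    return skips;
}

// Same loop with "!=". The cap is an assumption added for this sketch so
// that a lineNumber below 1 cannot keep the loop running indefinitely.
int skipsWithNotEqual(int lineNumber, int cap) {
    int current = 1;
    int skips = 0;
    while (current != lineNumber && skips < cap) {
        ++current;
        ++skips;
    }
    return skips;
}
```

For lineNumber 5 both versions skip 4 lines. For lineNumber 0 the "<" version skips nothing, while the "!=" version only stops because of the cap.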

This function does not search for words.
It takes as input a file name and a line number. It tries to find and read that line.
After two line breaks, the output has a line stating: "Result from (fileName ), line #lineNumber: "
It is followed by a second line that starts with a tab and then contains the found line contents. This second line of output is left incomplete (not followed by a newline).
The found contents are empty if the file has fewer than the requested number of lines or if any of the lines before the requested line has more than 999 characters.
If the requested line has more than 999 characters it is truncated to 999 characters.
Other questions:
1) infile is a function-scope object of automatic storage duration and type std::basic_ifstream<char, std::char_traits<char>>, which is initialized for reading from the file named in fileName.
2) The member function c_str() built into the standard library string class returns a pointer to the string contents as a non-modifiable, null-terminated character array, which is the format typically used in C for strings (type const char *). For historical reasons the file-based standard library streams took their file name arguments only in this format before C++11; since C++11 they also accept a std::string directly.
3) Humans typically count line numbers starting with one. That is the convention used for the lineNumber parameter. The algorithm used must match this. The currentlineno local variable is used to mean 'the number of the next line to be read'. As such it must be initialized with 1. (This is somewhat confusing, considering the name of the variable.) Other implementations that initialize the line counter with 0 are possible - and indeed natural to most C++ programmers.
4) See any textbook or online reference of C++. Look for "pre-increment" (++x) and "post-increment" (x++) operators. They have the same side effect (increment x), but differ in the value of the expression. If you don't use the result they are equivalent (for basic types).
C++ programmers usually prefer pre-increment as it can generally be implemented more efficiently for user-defined types.
5) Even more basic textbook question. a < b tests for a less-than relationship, a != b tests for inequality.
Note: All answers assume that the types used are from the standard C++ library, i.e. that appropriate includes of the <string>, <fstream> and <iostream> headers and necessary using directives or declarations are used.

Related

integers, chars and floating points in structs

So, I'm having some issues with my c++ code. I have the following code, but so far I can't get most of the data stored into the structured data type.
//structured data declaration
struct item
{
    int itemCode;
    char description[20];
    float price;
};
And then the get code looks like this.
cout << setprecision(2) << fixed << showpoint;
ofstream salesFile ("Sales.txt");
ifstream stockFile ("Stock.txt");
for (counter = 0; counter < 9; counter++)
{
    stockFile >> instock[counter].itemCode;
    stockFile.getline (instock[counter].description, 20);
    stockFile >> instock[counter].price;
}
The output should have looked like:
1234 "description here" 999.99
Quantity X
And this was the output:
1234 0.00
Quantity 5
If you have a file format that is of the form (for one entry)
1234
description here
999.99
(across multiple lines) then the explanation is simple
The reading code in your loop, which does
stockFile >> instock[counter].itemCode;
stockFile.getline (instock[counter].description, 20);
stockFile >> instock[counter].price;
will work in this sequence
The value of instock[counter].itemCode will receive the value 1234. However (and this is important to understand) the newline after the 1234 will still be waiting in the stream to be read.
The call of getline() will encounter the newline, and return immediately. instock[counter].description will contain the string "".
The expression stockFile >> instock[counter].price will encounter the d in description. This cannot be interpreted as a floating-point value, so the extraction fails and the stream enters a failed state; every later read in the loop then fails as well. Since C++11 a failed extraction stores 0 in instock[counter].price (earlier standards left it unchanged).
With a pre-C++11 library the same output appears if some preceding code (which you haven't shown) sets instock[counter].price to 0. Either way, the above sequence of events will explain your output.
The real problem is that you are mixing styles of input on the one stream. In this case, mixing usage of streaming operators >> with use of line-oriented input (getline()). As per my description of the sequence above, different styles of input interact in different ways, because (as in this case) they behave differently when encountering a newline.
Some people will just tell you to skip over the newline after reading instock[counter].itemCode. That advice is flawed, since it doesn't cope well with changes (e.g. what happens if the file format changes to include an additional field on another line?, what happens if the file isn't "quite" in the expected format for some reason?).
The more general solution is to avoid mixing styles of input on the one stream. A common way would be to use getline() to read all data from the stream (i.e. not use >> to interact directly with stockFile). Then interpret/parse each string to find the information needed.
Incidentally, rather than using arrays of char to hold a string, try using the standard std::string (from standard header <string>). This has the advantage that std::string can adjust its length as needed. std::getline() also has an overload that can happily read to an std::string. Once data is read from your stream as an std::string, it can be interpreted as needed.
There are many ways of interpreting a string (e.g. to extract integral values from it). I'll leave finding an approach for that as an exercise - you will learn more by doing it yourself.
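One possible way to apply the getline-only approach described above is sketched here. It assumes the three-lines-per-entry file format shown earlier; the Item struct (with a std::string description) and the function name readStock are made up for this example, and parsing each line through std::istringstream is just one of the many options.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Item {
    int itemCode;
    std::string description;
    float price;
};

// Reads records stored as three lines each: code, description, price.
// Only std::getline touches the input stream; the numeric fields are
// parsed from the strings that were read.
std::vector<Item> readStock(std::istream& in) {
    std::vector<Item> items;
    std::string codeLine, descLine, priceLine;
    while (std::getline(in, codeLine) &&
           std::getline(in, descLine) &&
           std::getline(in, priceLine)) {
        Item item;
        std::istringstream codeStream(codeLine);
        std::istringstream priceStream(priceLine);
        if (!(codeStream >> item.itemCode) || !(priceStream >> item.price))
            break;  // malformed record: stop instead of storing garbage
        item.description = descLine;
        items.push_back(item);
    }
    return items;
}
```

Because every field is read as a whole line first, a leftover newline can no longer end up in the description.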

Reading from a file into a List using C++

I'm a little new to using file input/output so bear with me.
I've got a function called RunList(filename) that takes the name of the file as input and returns nothing. The file starts with one useless line, which I plan to skip using ignore(); the next line, which is the important one, has the format
"i 1 2 3 4 5 ...."
where the numbers go on for a very long way, about 250000 or so.
So what I want to do is to open this file, ignore the first line, and then for each number in the file I want to use the function void insert(x, p) which is a function I have defined to insert x after the current iterator position p. The end result is that I want to have my list contain all of the numbers in the file after the "i" and be in the same order. I have also defined the functions ListItr find(x) and ListItr first() which will return the iterator to the position that views x and to the first position respectively.
Could anyone provide me with a means of doing this? I was thinking of using a for() loop and taking in each word at a time from the file and using my function to insert each element, but I'm a little lost as to how to do this, as I said I'm very new to using file input/output.
So, my RunList function currently looks something like this, although obviously it's not finished nor does it really work, hence me needing some help with it.
void Runlist(filename){
ifstream in;
in.open(filename);
in.ignore(1000, '\n'); //this is me trying to ignore the first line
for (int i, i < 250000, i++){
int number;
in >> number
void insert(number, i)
}
}
But the plan was, I select the file, ignore the first line, then set up a for loop where i can use my void insert(number, i) to insert each number, but then i don't really understand how to read in each word at a time, or to preserve the order because if I just kept using the function on each number over and over then the list would have the numbers in the reverse order I believe.
There are several issues in your code:
You do not specify a type for the filename parameter of the function.
Instead of ignore, you could just drop the first line when reading by calling getline once.
Your for loop is invalid: it uses commas instead of semicolons, and i is never initialized.
The statement in >> number is missing its semicolon.
insert is not shown, but you could probably use append anyway since that is what you seem to be doing.
i is not an "iterator" either, so you probably meant an index.
You have a function declaration in the middle of the function rather than a call to it.
This pseudo code should get you going about understanding the input file stream class and its usage for this in C++:
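void Runlist(const string& filename)
{
    ifstream in(filename.c_str());
    string line;
    getline(in, line);      // read and discard the useless first line
    string tag;
    in >> tag;              // skip the leading "i" on the second line
    int number;
    while (in >> number)    // stops at end of file or at the first non-number
        append(number);     // append is assumed to add at the end of your list
}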
Disclaimer: this pseudo code is missing proper error checking, and so on.

Does an empty string contain an empty string in C++?

Just had an interesting argument in the comment to one of my questions. My opponent claims that the statement "" does not contain "" is wrong.
My reasoning is that if "" contained another "", that one would also contain "" and so on.
Who is wrong?
P.S.
I am talking about a std::string
P.S. P.S
I was not talking about substrings, but even if I add to my question " as a substring", it still makes no sense. An empty substring is nonsense. If you allow empty substrings to be contained in strings, that means you have an infinity of empty substrings. What is the point of that?
Edit:
Am I the only one that thinks there's something wrong with the function std::string::find?
C++ reference clearly says
Return Value: The position of the first character of the first match.
Ok, let's assume it makes sense for a minute and run this code:
string empty1 = "";
string empty2 = "";
string::size_type position = empty1.find(empty2);
cout << "found \"\" at index " << position << endl;
The output is: found "" at index 0
Nonsense part: how can there be index 0 in a string of length 0? It is nonsense.
To be able to even have a 0th position, the string must be at least 1 character long.
And C++ throws an exception in this case, which proves my point:
cout << empty2.at( empty1.find(empty2) ) << endl;
If it really contained an empty string it would have had no problem printing it out.
It depends on what you mean by "contains".
The empty string is a substring of the empty string, and so is contained in that sense.
On the other hand, if you consider a string as a collection of characters, the empty string can't contain the empty string, because its elements are characters, not strings.
Relating to sets, the set
{2}
is a subset of the set
A = {1, 2, 3}
but {2} is not a member of A - all A's members are numbers, not sets.
In the same way, {} is a subset of {}, but {} is not an element in {} (it can't be because it's empty).
So you're both right.
C++ agrees with your "opponent":
#include <iostream>
#include <string>
using namespace std;
int main()
{
    bool contains = string("").find(string("")) != string::npos;
    cout << "\"\" contains \"\": "
         << boolalpha << contains;
}
Output: "" contains "": true
It's easy. String A contains sub-string B if there is an argument offset such that A.substr(offset, B.size()) == B. No special cases for empty strings needed.
So, let's see. std::string("").substr(0,0) turns out to be std::string(""). And we can even check your "counter-example". std::string("").substr(0,0).substr(0,0) is also well-defined and empty. Turtles all the way down.
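The definition above translates almost literally into code. A minimal sketch, where the function name containsAt is made up for this example:

```cpp
#include <string>

// True if some offset exists with a.substr(offset, b.size()) == b.
bool containsAt(const std::string& a, const std::string& b) {
    // offset == a.size() is still a valid argument to substr and yields "".
    for (std::string::size_type offset = 0; offset <= a.size(); ++offset) {
        if (a.substr(offset, b.size()) == b)
            return true;
    }
    return false;
}
```

With this definition, containsAt("", "") is true at offset 0, and no special case for empty strings was needed to get there.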
The first thing that is unclear is whether you are talking about std::string or null-terminated C strings; the second is why it should matter. I will assume std::string.
The requirements on std::string determine how the component must behave, not what its internal representation must be (although some of the requirements affect the internal representation). As long as the requirements for the component are met, whether it holds something internally is an implementation detail that you might not even be able to test.
In the particular case of an empty string, there is nothing that mandates that it holds anything. It could just hold a size member set to 0 and a pointer (for the dynamically allocated memory if/when not empty) also set to 0. The requirement in operator[] requires that it returns a reference to a character with value 0, but since that character cannot be modified without causing undefined behavior, and since strict aliasing rules allow reading from an lvalue of char type, the implementation could just return a reference to one of the bytes in the size member (all set to 0) in the case of an empty string.
Some implementations of std::string use small object optimizations; in those implementations there will be memory reserved for small strings, including an empty string. While the std::string will obviously not contain a std::string internally, it might contain the sequence of characters that compose an empty string (i.e. a terminating null character).
empty string doesn't contain anything - it's EMPTY. :)
Of course an empty string does not contain an empty string. It'll be turtles all the way down if it did.
Take String empty = ""; that is declaring a string literal that is empty, if you want a string literal to represent a string literal that is empty you would need String representsEMpty = """"; but of course, you need to escape it, giving you string actuallyRepresentsEmpty = "\"\"";
ps, I am taking a pragmatic approach to this. Leave the maths nonsense at the door.
Thinking about your amendment, it could be that what your 'opponent' meant was that an 'empty' std::string still has internal storage for characters which is itself empty of characters. That would be an implementation detail, I am sure; it could perhaps just keep an array of characters of a certain size (say 10) 'just in case', so it will technically not be empty.
Of course, there is the trick question answer that 'nothing' fits into anything infinite times, a sort of 'divide by zero' situation.
Today I had the same question since I'm currently bound to a lousy STL implementation (dating back to the pre-C++98 era) that differs from C++98 and all following standards:
TEST_ASSERT(std::string().find(std::string()) == string::npos); // WRONG!!! (non-standard)
This is especially bad if you try to write portable code because it's so hard to prove that no feature depends on that behaviour. Sadly in my case that's actually true: it does string processing to shorten phone numbers input depending on a subscriber line spec.
On Cppreference, I see in std::basic_string::find an explicit description about empty strings that I think matches exactly the case in question:
an empty substring is found at pos if and only if pos <= size()
The referred pos defines the position where to start the search, it defaults to 0 (the beginning).
A standard-compliant C++ Standard Library will pass the following tests:
TEST_ASSERT(std::string().find(std::string()) == 0);
TEST_ASSERT(std::string().substr(0, 0).empty());
TEST_ASSERT(std::string().substr().empty());
This interpretation of "contain" answers the question with yes.
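The rule quoted above (an empty substring is found at pos if and only if pos <= size()) can be checked directly with the two-argument form of find. A small sketch, where the helper name findEmpty is made up for this example:

```cpp
#include <string>

// Position of the first empty match at or after pos, or std::string::npos.
std::string::size_type findEmpty(const std::string& s,
                                 std::string::size_type pos) {
    return s.find(std::string(), pos);
}
```

On a standard-compliant library, findEmpty("abc", 3) returns 3, because position 3 equals size(), while findEmpty("abc", 4) returns std::string::npos. The returned position marks where a match starts; it is not necessarily the index of an existing character, which is why passing it to at() can still throw.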

Uninitialized char

I'm reading over a C++ class for parsing CSV files in one of my programming books for class. I primarily write in C# for work and don't interact with C++ code very often. One of the functions, getline, uses an uninitialized char variable and I'm confused as to whether it's a typo or not.
// getline: get one line, grow as needed
int Csv::getline(string& str)
{
    char c;
    for (line = ""; fin.get(c) && !endofline(c); )
        line += c;
    split();
    str = line;
    return !fin.eof();
}
fin is an istream. The documentation I'm reading shows the get (char& c); function being passed a reference, but which char in the stream is returned? What's the initial value of c?
The initial value of c is indeterminate, but that does not matter, since a successful call to get extracts the next character from the stream and stores it in c. Since there is a sequence point after the left-hand side of the || and && operators, all the side effects of get are complete before endofline sees the modified value of c; and because && short-circuits, endofline is not called at all when get fails.

Standard Stream Printing Unrelated Characters

I was coding something and my code didn't work correctly in some situations, so I decided to write some output to a file for debugging. The program just concatenates some characters from a string (without going out of bounds) and prints them to the file. It has nothing like error reporting, and the input string is just a bunch of random characters. But I get some junk in the output, such as:
f::xsgetn error reading the file
sgetn error reading the file
ilebuf::xsgetn error reading the file
(I removed program's output and this is just the extra stuff.)
As far as I know, if there are any errors, an exception must be thrown. What happens and how can I fix it?
The same thing happens when I print the output using standard output. All used libraries are standard libraries (e.g. iostream, fstream, etc.)
PS: For some reasons, I can't publish all the code. Here is the part that creates the output and passes it to the stream: (tri is a string, and is defined previously. center is an integer and is inside the bounds of the string. fout is a previously defined file stream.)
string op = "" + tri[center];
fout << center << "<>" << op << endl;
Since tri is a string, tri[center] is a char.
The type of "" is const char[1], an array, which can't be added to a char.
Instead it is implicitly converted to const char*, which can be added to a char.
Unfortunately for you, the result of that is that the integer value of tri[center] is added to that pointer as an offset, not as a string concatenation, and the particular area of memory that the result refers to doesn't contain what you're looking for but instead contains other static strings like e.g. "error reading the file".
To fix it, use
string op = string("") + tri[center];
instead.
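Two other ways to get a well-defined result are sketched here, with made-up function names (charAt, withHash). Both make sure a std::string is involved before any + or += is applied to the char.

```cpp
#include <string>

// Builds a one-character string from the character at index center.
std::string charAt(const std::string& tri, std::string::size_type center) {
    // string(count, ch) constructs a string of count copies of ch;
    // at() throws std::out_of_range if center is not a valid index.
    return std::string(1, tri.at(center));
}

// Appending a char and then a literal to an existing std::string.
std::string withHash(char c) {
    std::string result;
    result += c;    // std::string += char appends one character
    result += "#";  // std::string += const char* appends the literal
    return result;
}
```

In both functions the left-hand operand is always a std::string, so no pointer arithmetic on a string literal can occur.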
I encountered the same problem in another program, where I had written:
str += blablabla + "#";
and I saw some unrelated characters being printed. I fixed it this way:
str = str + blablabla + "#";
and it worked!
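The += operator for string is not at fault, though. Presumably blablabla was a char, so blablabla + "#" was evaluated first, as pointer arithmetic on the literal, and only the result of that was appended. In str + blablabla + "#" the leftmost operand is a std::string, so every + is a real string concatenation.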