Can't understand how to use seekg and tellg to achieve the following result - C++

As part of a larger program, my task is to read each line of an input file and store the offset of each line. Later, given all the offsets, I want to be able to go directly to that position in the file and print the corresponding line. Can someone help me figure out what I am doing incorrectly in the following code?
#include <iostream>
#include <sstream>
#include <vector>
#include <iterator>
#include <stdio.h>
#include <string.h>
void create_index_info(ifstream& myFile, std::unordered_map<size_t, std::pair<std::streampos, size_t>>& map_index_start_end_pos)
{
size_t uiLength;
size_t uiCount = 0;
std::string line;
while (getline(myFile, line))
{
start_pos = myFile.tellg();
uiLength = strlen(line.c_str());
map_index_start_end_pos.emplace( uiCount, std::make_pair(start_pos, uiLength) );
uiCount++;
}
}
void print_index_info(ifstream& myFile, const std::unordered_map<size_t, std::pair<std::streampos, size_t>>& map_index_start_end_pos)
{
size_t uiLength;
for(auto it = map_index_start_end_pos.begin(); it != map_index_start_end_pos.end(); it++)
{
auto res = it->second;
myFile.clear();
myFile.seekg(res.first, ios::beg);
uiLength = res.second;
char* buffer = (char*)malloc(uiLength * sizeof(char));
myFile.read(buffer, uiLength);
for(size_t uiCount = 0; uiCount < uiLength; uiCount++)
{
std::cout<< buffer[uiCount];
}
std::cout<<"\n";
free(buffer);
buffer = NULL;
}
}
int main()
{
std::unordered_map<size_t, std::pair<std::streampos, size_t>> map_index_start_end_pos;
ifstream myFile("Filename.txt");
create_index_info(myFile, map_index_start_end_pos);
myFile.close();
ifstream myFile1("Filename.txt");
print_index_info(myFile1, map_index_start_end_pos);
myFile1.close();
return 0;
}
The data in the input text file contains the following entries:
9 10 11
8 7 5
67 34 12 45 9
20
The ideal output should be the same as the input, up to the ordering of the lines (I am using an unordered map, so the lines need not be printed in the order they were read). However, my code outputs garbage, mostly characters of the form \x00. Can someone please help me figure out what is wrong with my code?

Your program does not compile.
Why are you using a map? Why is a vector not enough?
Why getline and not ignore? ignore is faster and does not allocate memory.
You are probably looking for this:
#include <iostream>
#include <limits>
#include <vector>
#include <string>
#include <fstream>
std::istream& get_line_idx( std::istream& is, std::vector<std::streampos>& result )
{
for( std::streampos pos = is.tellg(); is.ignore( std::numeric_limits<std::streamsize>::max(), '\n' ); pos = is.tellg() )
result.push_back( pos );
return is;
}
void print( std::istream& is, std::vector<std::streampos>& pos )
{
for( auto curpos : pos )
{
std::string line;
is.seekg( curpos );
std::getline( is, line );
std::cout << line << std::endl;
}
}
int main()
{
std::vector<std::streampos> result;
{
std::ifstream is { "c:\\temp\\test.txt" };
if( !is )
return -1;
get_line_idx( is, result );
}
{
std::ifstream is { "c:\\temp\\test.txt" };
if( !is )
return -2;
print( is, result );
}
return 0;
}

Related

Counting words in an input string in C++ (with consideration for typos)

I've been looking for ways to count the number of words in a string, but specifically for strings that may contain typos (i.e. "_This_is_a___test" as opposed to "This_is_a_test"). Most of the pages I've looked at only handle single spaces.
This is actually my first time programming in C++, and I don't have much other programming experience to speak of (2 years of college in C and Java). Although what I have is functional, I'm also aware it's complex, and I'm wondering if there is a more efficient way to achieve the same results?
This is what I have currently. Before I run the string through numWords(), I run it through a trim function that removes leading whitespace, then check that there are still characters remaining.
int numWords(string str) {
int count = 1;
for (int i = 0; i < str.size(); i++) {
if (str[i] == ' ' || str[i] == '\t' || str[i] == '\n') {
bool repeat = true;
int j = 1;
while (j < (str.size() - i) && repeat) {
if (str[i + j] != ' ' && str[i + j] != '\t' && str[i + j] != '\n') {
repeat = false;
i = i + j;
count++;
}
else
j++;
}
}
}
return count;
}
Also, I wrote mine to take a string argument, but most of the examples I've seen used (char* str) instead, which I wasn't sure how to use with my input string.
You don't need stringstreams just to count word boundaries:
#include <string>
#include <cctype>
int numWords(std::string str)
{
bool space = true; // not in word
int count = 0;
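for(auto c:str){
// cast first: passing a negative char to std::isspace is undefined behavior
if(std::isspace(static_cast<unsigned char>(c)))space=true;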
else{
if(space)++count;
space=false;
}
}
return count;
}
One solution is to utilize std::istringstream to count the number of words and to skip over spaces automatically.
#include <sstream>
#include <string>
#include <iostream>
int numWords(std::string str)
{
int count = 0;
std::istringstream strm(str);
std::string word;
while (strm >> word)
++count;
return count;
}
int main()
{
std::cout << numWords(" This is a test ");
}
Output:
4
That said, as mentioned, std::istringstream is heavier in terms of performance than writing your own loop.
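Sam's comment made me write a function that does not allocate strings for the words but just creates string_views into the input string. Note that the iterator-pair std::string_view constructor it relies on requires C++20.
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>
#include <string_view>
#include <iostream>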
std::vector<std::string_view> get_words(const std::string& input)
{
std::vector<std::string_view> words;
// the first word begins at an alpha character
auto begin_of_word = std::find_if(input.begin(), input.end(), [](const char c) { return std::isalpha(c); });
// start scanning at the first word, so leading non-alpha characters are skipped
auto end_of_word = begin_of_word;
auto end_of_input = input.end();
// parse the whole string
while (end_of_word != end_of_input)
{
// as long as you see text characters, advance end_of_word
while ((end_of_word != end_of_input) && std::isalpha(*end_of_word)) end_of_word++;
// create a string view from begin of word to end of word.
// no new string memory will be allocated
// std::vector will do some dynamic memory allocation to store string_view (metadata of word positions)
words.emplace_back(begin_of_word, end_of_word);
// then skip all non readable characters.
while ((end_of_word != end_of_input) && !std::isalpha(*end_of_word) ) end_of_word++;
// and if we haven't reached the end then we are at the beginning of a new word.
if ( end_of_word != input.end()) begin_of_word = end_of_word;
}
return words;
}
int main()
{
std::string input{ "This, this is a test!" };
auto words = get_words(input);
for (const auto& word : words)
{
std::cout << word << "\n";
}
return 0;
}
You can use the standard function std::distance with std::istringstream in the following way:
#include <iostream>
#include <sstream>
#include <string>
#include <iterator>
int main()
{
std::string s( " This is a test" );
std::istringstream iss( s );
auto count = std::distance( std::istream_iterator<std::string>( iss ),
std::istream_iterator<std::string>() );
std::cout << count << '\n';
}
The program output is
4
If you want you can place the call of std::distance in a separate function like
#include <iostream>
#include <sstream>
#include <string>
#include <iterator>
size_t numWords( const std::string &s )
{
std::istringstream iss( s );
return std::distance( std::istream_iterator<std::string>( iss ),
std::istream_iterator<std::string>() );
}
int main()
{
std::string s( " This is a test" );
std::cout << numWords( s ) << '\n';
}
If the separators can include characters other than whitespace, for example punctuation, then you should use the find_first_of and find_first_not_of member functions of std::string or std::string_view.
Here is a demonstration program.
#include <iostream>
#include <string>
#include <string_view>
size_t numWords( const std::string_view s, std::string_view delim = " \t" )
{
size_t count = 0;
for ( std::string_view::size_type pos = 0;
( pos = s.find_first_not_of( delim, pos ) ) != std::string_view::npos;
pos = s.find_first_of( delim, pos ) )
{
++count;
}
return count;
}
int main()
{
std::string s( "Is it a test ? Yes ! Now we will run it ..." );
std::cout << numWords( s, " \t!?.," ) << '\n';
}
The program output is
10
You can do it easily with regex:
#include <iterator>
#include <regex>
#include <string>
int numWords(std::string str)
{
std::regex re("\\S+"); // or `[^ \t\n]+` to exactly match the question
return std::distance(
std::sregex_iterator(str.begin(), str.end(), re),
std::sregex_iterator()
);
}

C++ std::fstream how to move to a certain line and column of a file

So I have been searching for how to move to a certain line and column in a file, but I can't seem to find an answer. I want something like this:
std::fstream file("example.txt");
file.move_to( line number , column number );
The following function gets you to the line you want:
std::string GetLine(std::istream& fs, long long index)
{
std::string line;
for (size_t i = 0; i <= index; i++)
{
std::getline(fs, line);
}
return line;
}
The above function gets the line at index (index == 0 - 1st line, index == 2 - 3rd line, etc.).
Usage:
#include <iostream>
#include <string>
#include <fstream>
std::string GetLine(std::istream& fs, long long index)
{
std::string line;
for (size_t i = 0; i <= index; i++)
{
std::getline(fs, line);
}
return line;
}
int main()
{
std::ifstream fs("input.txt"); // fstream works too
std::cout << GetLine(fs, 1);
}
input.txt
This is
a
test file
Output:
a

How to start reading file from a particular position c++

I am reading a file using fstream and getline. I want to specify a starting position; e.g., my file has 13 lines and I want to start reading from the 7th line. Here is my code:
#include<iostream>
#include <stdlib.h>
#include <string>
#include <vector>
#include<iterator> // for iterators
#include<map>
using namespace std;
int main()
{
string line;
int start= 7;
unsigned long int index;
For( int z=1; z<=13; z++){
if (f_node.is_open())
{
getline(f_node, line);
if ((line.find("$EndNodes") != string::npos))
{
cout << "$EndNodes found file closed .... " << endl;
f_node.close();
return false;
}
// Point index.
int i = 0;
int j = line.find_first_of(" ", i);
index = strtoul((line.substr(i, j)).c_str(), NULL, 0);//
}
}
I am reading only the indexes, and I want to start from the 7th one. How can I do it?
To discard some number of lines, something like:
#include <fstream>
#include <string>
int main() {
std::ifstream infile{"myfile.txt"};
std::string line;
int starting_line = 7;
// Read and discard beginning lines
for (int n = 1; n < starting_line; n += 1) {
if (!std::getline(infile, line)) {
// Error or premature end of file! Handle appropriately.
}
}
while (std::getline(infile, line)) {
// Do something with the lines you care about.
}
return 0;
}
Except with actual error checking and handling and such.
"there is no way to tell code the starting position like seekg and tellg?" No. NL is just like any other character, it does not receive any special treatment.
You simply must scan the stream, counting the newline characters:
std::istream& seek_line(std::istream& is, const int n, std::ios_base::seekdir way = std::ios_base::beg)
{
is.seekg(0, way);
int i = 0;
char c;
while (is.get(c) && i < n)
if (c == '\n')
++i;
is.putback(c);
return is;
}
And this is how you use the above function:
int main()
{
using namespace std;
ifstream is{ "c:\\temp\\test.txt" };
if (!is)
return -1;
if (!seek_line(is, 3))
return -2;
string s;
getline(is, s);
cout << s << endl;
return 0;
}

Input from text file into char *array[9]

I have a file with 9 words and I have to store each word in an array of 9 char pointers, but I keep getting an error message. I cannot use vectors!
#include <iostream>
#include <fstream>
using namespace std;
int main()
{
char *words[9];
ifstream inStream;
inStream.open("sentence.txt");
if (inStream.fail())
{
cout << "Input file opening failed.\n";
exit(1);
}
for ( int i = 0; i < 10; i++)
{
inStream >> words[i];
}
inStream.close();
return 0;
}
The declaration
char *words[9];
declares a raw array of pointers. This array is not initialized so the pointers have indeterminate values. Using any of them would be Undefined Behavior.
Instead you want
vector<string> words;
where vector is std::vector from the <vector> header, and string is std::string from the <string> header.
Use the push_back member function to add strings to the end of the vector.
Also you need to move the close call out of the loop. Otherwise it will close the file in the first iteration.
This approach gives the code (off the cuff, disclaimer...)
#include <fstream>
#include <iostream>
#include <vector>
#include <string>
using namespace std;
int main()
{
vector<string> words;
ifstream inStream;
inStream.open("sentence.txt");
for ( int i = 0; i < 10; i++)
{
string word;
if( inStream >> word )
words.push_back( word );
}
inStream.close();
}
If you can't use std::string and std::vector then you need to initialize the array of pointers, and make sure that you don't read more into the buffers than there's room for.
The main problem here is that >> is unsafe for reading into a raw array given by a pointer. It doesn't know how large that array is. It can easily lead to a buffer overrun, with dire consequences.
And so this gets a bit complicated, but it can look like this:
#include <assert.h> // assert
#include <ctype.h> // isspace
#include <fstream>
#include <iostream>
#include <locale.h> // setlocale, LC_ALL
#include <stdio.h> // EOF
#include <stdlib.h> // EXIT_FAILURE, exit
using namespace std;
void fail( char const* const message )
{
cerr << "! " << message << "\n";
exit( EXIT_FAILURE );
}
void readWordFrom( istream& stream, char* const p_buffer, int const buffer_size )
{
int charCode;
// Skip whitespace:
while( (charCode = stream.get()) != EOF and isspace( charCode ) ) {}
int n_read = 0;
char* p = p_buffer;
while( n_read < buffer_size - 1 and charCode != EOF and not isspace( charCode ) )
{
*p = charCode; ++p;
++n_read;
charCode = stream.get();
}
*p = '\0'; // Terminating null-byte.
if( charCode != EOF )
{
stream.putback( charCode );
if( not isspace( charCode ) )
{
assert( n_read == buffer_size - 1 ); // We exceeded buffer size.
stream.setstate( ios::failbit );
}
}
}
int main()
{
static int const n_words = 9;
static int const max_word_length = 80;
static int const buffer_size = max_word_length + 1; // For end byte.
char *words[n_words];
for( auto& p_word : words ) { p_word = new char[buffer_size]; }
ifstream inStream{ "sentence.txt" };
if( inStream.fail() ) { fail( "Input file opening failed." ); }
setlocale( LC_ALL, "" ); // Pedantically necessary for `isspace`.
for( auto const p_word : words )
{
readWordFrom( inStream, p_word, buffer_size );
if( inStream.fail() ) { fail( "Reading a word failed." ); }
}
for( auto const p_word : words ) { cout << p_word << "\n"; }
for( auto const p_word : words ) { delete[] p_word; }
}
You never allocate any memory for your char* pointers kept in the array.
The idiomatic way to write this in C++ would be:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
int main() {
std::vector<std::string> words(9);
std::ifstream inStream;
inStream.open("sentence.txt");
for ( int i = 0; inStream && i < 9; i++) {
inStream >> words[i];
}
}
The inStream.close() call isn't necessary, and would even be wrong inside the loop. The std::ifstream will be closed automatically as soon as the variable goes out of scope.
There are a few problems with your code.
char *words[9];
This allocates space for 9 pointers, not nine strings. Since you don't know how big the strings are you have two choices. You can either "guess" how much you'll need and limit the inputs accordingly, or you can use dynamic memory allocation (malloc or new) to create the space you need to store the strings. Dynamic memory would be my choice.
for ( int i = 0; i < 10; i++)
This loop will execute on words[0] through words[9]. However, there is no words[9] (that would be the tenth word), so you'll overwrite memory that you have not allocated.
inStream >> words[i];
This will send your input stream to memory that you don't "own". You need to allocate space for the words to live before capturing them from the input stream. To do this correctly, you'll need to know how much space each word will need so you can allocate it.
you could try something like this:
int main()
{
char *words[9];
char tempInput[256]; // space to capture the input, up to a maximum size of 256 chars
ifstream inStream;
inStream.open("sentence.txt");
if (inStream.fail())
{
cout << "Input file opening failed.\n";
exit(1);
}
for ( int i = 0; i < 9; i++)
{
//Clear the input buffer
memset(tempInput, 0, 256);
//Capture the next word
inStream >> tempInput;
//allocate space to save the word
words[i] = new char[strlen(tempInput) + 1]; // +1 for the terminating '\0'
//Copy the word to its final location
strcpy(words[i], tempInput);
}
inStream.close();
return 0;
}
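One remaining risk in this kind of code is that inStream >> tempInput places no limit on how many characters are read. Here is a sketch of a bounded variant, assuming words of at most 255 characters. The function name read_words is hypothetical, and the caller is responsible for calling delete[] on each stored pointer.

```cpp
#include <cstddef>
#include <cstring>
#include <iomanip>
#include <istream>

// Reads up to `capacity` whitespace-separated words into newly allocated buffers.
// Returns the number of words stored; the caller must delete[] each one.
std::size_t read_words(std::istream& in, char* words[], std::size_t capacity)
{
    char temp[256];
    std::size_t n = 0;
    // setw limits each extraction to sizeof(temp) - 1 characters plus the terminator.
    while (n < capacity && in >> std::setw(static_cast<int>(sizeof temp)) >> temp) {
        words[n] = new char[std::strlen(temp) + 1];   // +1 for '\0'
        std::strcpy(words[n], temp);
        ++n;
    }
    return n;
}
```

A word longer than the limit is split across consecutive entries rather than overflowing the buffer.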

Counting occurrences of word in vector of characters

I have written a program to store a text file in a vector of characters.
#include<iostream>
#include<fstream>
#include <algorithm>
#include<vector>
using namespace std;
int main()
{
vector<char> vec;
ifstream file("text.txt");
if(!file.eof() && !file.fail())
{
file.seekg(0, std::ios_base::end);
std::streampos fileSize = file.tellg();
vec.resize(fileSize);
file.seekg(0, std::ios_base::beg);
file.read(&vec[0], fileSize);
}
int c = count(vec.begin(), vec.end(), 'U');
cout << c;
return 0;
}
I want to count the occurrences of "USER" in the text file, but using count I can only count single characters. How can I count the number of occurrences of "USER" in the vector of characters?
For example
text.txt
USERABRUSER#$$* 34 USER ABC RR IERUSER
Then the count of "USER" is 4. Words can only be in uppercase.
std::string has a find member function that will find an occurrence of one string inside another. You can use that to count occurrences something like this:
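#include <iostream>
#include <string>
size_t count(std::string const &haystack, std::string const &needle) {
std::size_t occurrences = 0;
auto len = needle.size();
std::string::size_type pos = 0; // must be able to hold std::string::npos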
while (std::string::npos != (pos = haystack.find(needle, pos))) {
++occurrences;
pos += len;
}
return occurrences;
}
For example:
int main() {
std::string input{ "USERABRUSER#$$* 34 USER ABC RR IERUSER" };
std::cout << count(input, "USER");
}
...produces an output of 4.
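Since the question keeps the file in a std::vector<char>, the same non-overlapping count can also be done directly on the vector with std::search, without copying the data into a std::string. A sketch, where count_in_vector is a made-up name:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Counts non-overlapping occurrences of `needle` in `haystack`.
std::size_t count_in_vector(const std::vector<char>& haystack, const std::string& needle)
{
    if (needle.empty()) return 0;   // avoid an endless loop on an empty pattern
    std::size_t occurrences = 0;
    auto it = haystack.begin();
    while ((it = std::search(it, haystack.end(), needle.begin(), needle.end())) != haystack.end()) {
        ++occurrences;
        it += static_cast<std::ptrdiff_t>(needle.size());   // continue after the match just found
    }
    return occurrences;
}
```

In the question's program this would replace the std::count call, for example count_in_vector(vec, "USER").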
This is how I would do it:
#include <fstream>
#include <sstream>
#include <iostream>
#include <unordered_map>
#include <string>
using namespace std;
int main() {
unordered_map<string, size_t> data;
string line;
ifstream file("text.txt");
while (getline(file, line)) {
istringstream is(line);
string word;
while (is >> word) {
++data[word];
}
}
cout << data["USER"] << endl;
return 0;
}
Let's try again. Once again, a vector isn't necessary. This is what I would consider to be the most C++ idiomatic way. It uses std::string's find() method to repeatedly find the substring in order until the end of the string is reached.
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
int main() {
// Read entire file into a single string.
std::ifstream file_stream("text.txt");
// The extra parentheses around the first argument keep this line from being
// parsed as a function declaration (the "most vexing parse").
std::string file_contents((std::istreambuf_iterator<char>(file_stream)),
std::istreambuf_iterator<char>());
unsigned count = 0;
std::string substr = "USER";
for (size_t i = file_contents.find(substr); i != std::string::npos;
i = file_contents.find(substr, i + substr.length())) {
++count;
}
std::cout << count << '\n';
}