Okay, so I'm having a bit of a problem here. The thing is this code works on a friend's computer but I'm getting segmentation faults when I try to run it.
I am reading a file that looks like this:
word 2 wor ord
anotherword 7 ano oth the her erw wor ord
...
I want to parse every word of the file. The first two words on each line (e.g. word and 2) should be removed, but the first one should be saved in another variable in the process.
I've looked around a bit for how to accomplish this, and I've come up with this half-assed piece of code that seems to work on my friend's computer but not on mine.
Dictionary::Dictionary() {
ifstream ip;
ip.open("words.txt", ifstream::in);
string input;
string buf;
vector<string> tokens; // Holds words
while(getline(ip, input)){
if(input != " ") {
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
string werd = tokens.at(0);
tokens.erase(tokens.begin()); // Remove the word from the vector
tokens.erase(tokens.begin()); // Remove the number indicating trigrams
Word curr(werd, tokens);
words[werd.length()].push_back(curr); // Put the word at the vector with word length i.
tokens.clear();
}
}
ip.close();
}
What's the best way of parsing this kind of structure in a file, removing the first two elements but keeping the others? As you can see, I'm making a Word object that contains a string and a vector for later use.
Regards
EDIT: It seems to add the first line fine, but it crashes with a segmentation fault when removing the second element.
EDIT: words.txt contains this:
addict 4 add ddi dic ict
sinister 6 ini ist nis sin ste ter
test 2 est tes
cplusplus 7 cpl lus lus plu plu spl usp
There are no leading or trailing blanks. Not that it reads that far anyway.
Word.cc:
#include <string>
#include <vector>
#include <algorithm>
#include <iterator> // for back_inserter
#include "word.h"
using namespace std;
Word::Word(const string& w, const vector<string>& t) : word(w), trigrams(t) {}
string Word::get_word() const {
return word;
}
unsigned int Word::get_matches(const vector<string>& t) const {
vector<string> sharedTrigrams;
set_intersection(t.begin(),t.end(), trigrams.begin(), trigrams.end(), back_inserter(sharedTrigrams));
return sharedTrigrams.size();
}
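A note on get_matches: std::set_intersection only gives meaningful results when both input ranges are sorted. The trigram lists in words.txt happen to be sorted already, but a query vector passed in later might not be. A minimal sketch that sorts copies first (countSharedTrigrams is a hypothetical free function, not part of the original Word class):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: counts the trigrams shared by two lists.
// std::set_intersection requires both ranges to be sorted, so the
// parameters are taken by value and sorted before intersecting.
unsigned int countSharedTrigrams(std::vector<std::string> query,
                                 std::vector<std::string> trigrams) {
    std::sort(query.begin(), query.end());
    std::sort(trigrams.begin(), trigrams.end());
    std::vector<std::string> shared;
    std::set_intersection(query.begin(), query.end(),
                          trigrams.begin(), trigrams.end(),
                          std::back_inserter(shared));
    return static_cast<unsigned int>(shared.size());
}
```

Duplicates such as "lus lus" are counted as many times as they appear in both lists, which matches how set_intersection treats sorted ranges with repeated elements.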
First of all, there is an error in the number of closing braces in your posted code. If you indent it properly, you will see that your code is:
while(getline(ip, input))
{
if(input != " ")
{
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
}
string werd = tokens.at(0);
tokens.erase(tokens.begin());
tokens.erase(tokens.begin());
Word curr(werd, tokens);
words[werd.length()].push_back(curr);
tokens.clear();
}
}
Assuming that is just a typo in posting, the other problem is that tokens is an empty list when input == " ", yet you continue to use tokens as though it had two or more items in it.
You can fix that by moving everything inside the if statement.
while(getline(ip, input))
{
if(input != " ")
{
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
string werd = tokens.at(0);
tokens.erase(tokens.begin());
tokens.erase(tokens.begin());
Word curr(werd, tokens);
words[werd.length()].push_back(curr);
tokens.clear();
}
}
I would add further checks to make it more robust, for example against an empty line, which passes the input != " " test but yields no tokens.
while(getline(ip, input))
{
if(input != " ")
{
stringstream ss(input);
while(ss >> buf) {
tokens.push_back(buf);
}
string werd;
if ( !tokens.empty() )
{
werd = tokens.at(0);
tokens.erase(tokens.begin());
}
if ( !tokens.empty() )
{
tokens.erase(tokens.begin());
}
Word curr(werd, tokens);
words[werd.length()].push_back(curr);
tokens.clear();
}
}
You forgot to include the declaration of the variable "words" in your code. Just looking at it, I am guessing you declared "words" as a fixed-length array of vectors and then read a word whose length is past the end of the array. Bang, you're dead. Add a check on werd.length() to ensure it is strictly less than the length of "words".
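A minimal sketch of that check, assuming words is a fixed-size std::array of vectors indexed by word length. The real declaration is not shown in the question, so MAX_WORD_LENGTH, the simplified Word struct and addWord are all hypothetical:

```cpp
#include <array>
#include <string>
#include <vector>

// Hypothetical stand-in for the question's Word class.
struct Word {
    std::string word;
    std::vector<std::string> trigrams;
};

// Hypothetical capacity: words[i] holds the words of length i.
constexpr unsigned MAX_WORD_LENGTH = 25;
std::array<std::vector<Word>, MAX_WORD_LENGTH> words;

// Returns false instead of writing past the end of the array
// when the word is too long to have a slot.
bool addWord(const std::string& werd, const std::vector<std::string>& tokens) {
    if (werd.length() >= words.size()) {
        return false;  // words[werd.length()] would be out of bounds
    }
    words[werd.length()].push_back(Word{werd, tokens});
    return true;
}
```

Alternatively, declaring words as a std::vector<std::vector<Word>> and resizing it whenever a longer word appears would avoid a fixed limit altogether.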
ifstream ip;
ip.open("words.txt", ifstream::in);
string input;
while(getline(ip, input)){
istringstream iss(input);
string str;
unsigned int count = 0;
if(iss >> str >> count) {
vector<string> tokens { istream_iterator<string>(iss), istream_iterator<string>() }; // Holds the remaining words (the trigrams)
if(tokens.size() == count)
words[str.length()].emplace_back(str, tokens);
}
}
ip.close();
This is what I used to make it work.
I'm working on an assignment and I'm at my wits' end. Right now I can't figure out what's missing or what I could change.
I need the program to read a file. If it finds a word beginning with the search string, it prints the word and its meaning. If it finds more than one such word, it prints only the words, without their meanings.
Right now, if the program finds several words, it prints the meaning for the first word and only the word for the others.
I don't know what other loop I could use. If you could help me, I would be grateful.
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include<bits/stdc++.h>
using namespace std;
int main()
{
ifstream dictionary("dictionary.txt");
if(!dictionary.is_open()){
cout<< "File failed to open" << endl;
return 0;
}
int option;
cout << "1.<starting>" << endl;
cout << "4.<stop>" << endl;
cin >> option;
string find_word;
string word, meaning;
string line;
string found;
int count = 0;
if (option == 1)
{
cout << "Find the meaning of the word beginning with the characters:";
cin >> find_word;
while (getline(dictionary,line))
{
stringstream ss(line);
getline (ss, word, ';');
getline (ss, meaning, ';');
if (word.rfind(find_word, 0) != string::npos)
{
count++;
if (count <=1)
{
found = word + meaning;
cout << found << endl;
}
if (count >= 2)
{
found = word ;
cout << found << endl;
}
}
}
}
if (option == 4)
{
return 0;
}
dictionary.close();
return 0;
}
EDIT
dictionary.txt looks like this:
attention; attentionmeaning
attention; attentionmeaning2
computer; computermeaning
criminal; criminalmeaning
boat; boatmeaning
alien; alienmeaning
atter; meaning
.
.
etc.
For example, if the input is:
Find the meaning of the word beginning with the characters: att
This is what I get now (output):
attention attentionmeaning
attention
atter
This is what I expect (desired output):
attention
attention
atter
If the program finds only one matching word, it should write this:
Find the meaning of the word beginning with the characters: bo
output:
boat boatmeaning
As was already suggested, while reading the file you don't know whether more than one entry will match your search term. That means you need some intermediate structure to store all the matching entries.
After you have gathered all the results, you can easily check whether there is more than one, in which case you print only the "word" without the meaning. If there is only one result, you print the "word" together with its meaning.
The code for that could look something like this:
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>
struct Entry {
std::string name;
std::string meaning;
bool startsWith(const std::string& str) const {
// rfind(str, 0) can only match at position 0, so this is a real prefix test
return name.rfind(str, 0) == 0;
}
};
Entry createEntry(const std::string& line) {
Entry entry;
std::stringstream ss(line);
std::getline(ss, entry.name, ';');
std::getline(ss >> std::ws, entry.meaning, ';'); // skip the space after ';'
return entry;
}
int main() {
std::string query = "att";
std::ifstream dictionary("dictionary.txt");
std::vector<Entry> entries;
std::string line;
while (std::getline(dictionary, line)) {
Entry entry = createEntry(line);
if (entry.startsWith(query)) {
entries.emplace_back(std::move(entry));
}
}
// One match: print word and meaning. Several matches: print only the words.
for (const Entry& entry : entries) {
std::cout << entry.name << (entries.size() > 1 ? "\n" : " " + entry.meaning + '\n');
}
}
This code could definitely be more optimized, but for the sake of simplicity, this should suffice.
The problem is that the first time through the loop you do not know whether one or several words will match your string. I would suggest creating an empty list outside the loop and pushing all the matching word and meaning pairs onto it. Afterwards, if the list has exactly one element, output the word and meaning pair; otherwise, use a for loop to print just the words.
I am writing a program that reads data from a file.
the .txt file looks like this:
Text, numbers, : 5,3,5
Text, numbers, : 1,3,7,8
I was successful in extracting the "Text" and "Numbers"; however, when I came to the numbers after the ":" ("5,3,5"), I got really stuck.
I need to convert these numbers to ints and add them to a vector of int. So I got rid of the commas (replacing them with spaces) and then tried using stoi to convert them to ints; however, stoi only "extracted" the first number, in this case only 5 and 1. This is what I've tried:
while(getline(file, line)){
stringstream ss(line);
getline(ss, text, ',');
getline (ss, nums, ':');
getline (ss, numList, ',' );
replace(numList.begin(), numList.end(), ',' , ' ');
vec.push_back(stoi(numList));
randomStruct str = {text, nums, numList};
randomStructVec.push_back(str);
}
I need the output to look like this when printing the elements of the vector:
5 3 5
1 3 7 8
What I'm getting is:
5
1
And at other times I get duplicate numbers as well:
5
1111
or
5555
11
I need a way to make stoi convert all the numbers in that one line of string to ints and store them in a vector of ints.
Any help would be greatly appreciated.
Check out my solution at How do I tokenize a string that uses the String Toolkit Library
Here is a pared-down version for your case:
#include <iostream>
#include <vector>
#include <string>
#include <strtk.hpp> //String Toolkit Library
const char *whitespace = " \t\r\n\f";
const char *whitespace_and_punctuation = " \t\r\n\f;,=";
int main()
{
// parsing a string into a vector of integers with separators
// of spaces and punctuation
std::string s("3; 4, 4, 8,-1");
std::vector<int> values;
if( strtk::parse( s, whitespace_and_punctuation, values ) )
{
for(size_t i = 0; i < values.size(); ++i )
std::cout << values[i] << std::endl;
}
return 0;
}
You will notice the conversion of the values into a vector of integers.
The library is header-only. It is very fast and can handle almost anything you need to do with strings and parsing.
There are two main problems in your code.
First of all, getline(ss, numList, ','); stops at the first value of the list. When your list is 5,3,5, getline(ss, numList, ','); reads the 5, then hits the , and stops. At this point, numList contains only the 5.
This is quite simple to fix: just remove the delimiter character, i.e. getline(ss, numList);. With this, numList contains 5,3,5.
Alright, now you have all your values. You replace ',' with ' ' in order to separate your numbers. Good, numList now holds 5 3 5.
Then comes your second error: vec.push_back(stoi(numList));. stoi(numList) returns a single int and stops converting at the first character that cannot be part of a number, here the space after the first 5. So it only converts the first 5 and returns it. You will never get the other numbers, since there is no loop.
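If you want to keep using stoi, its optional second parameter reports how many characters were consumed, which lets you resume right after the previous number. A minimal sketch, assuming the input is a comma-separated list such as " 5,3,5" with optional spaces (parseInts is a hypothetical name):

```cpp
#include <string>
#include <vector>

// Converts every integer in a comma-separated string like " 5,3,5".
// std::stoi skips leading whitespace and stores in `used` how many
// characters it consumed, so the scan can continue after each number.
// It throws std::invalid_argument if a field is not a number.
std::vector<int> parseInts(const std::string& list) {
    std::vector<int> result;
    std::string::size_type pos = 0;
    while (pos < list.size()) {
        std::size_t used = 0;
        result.push_back(std::stoi(list.substr(pos), &used));
        pos += used;
        // Skip the separator (and any spaces) before the next number.
        while (pos < list.size() && (list[pos] == ',' || list[pos] == ' ')) {
            ++pos;
        }
    }
    return result;
}
```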
Here is my solution: convert your string to a stringstream and use the >> operator:
std::stringstream numStream(numList);
int value;
while(numStream >> value) // read until no more ints can be extracted
vec.push_back(value);
So we end up with your final code (I removed the std:: prefixes, as it seems that you wrote using namespace std somewhere in your code):
struct randomStruct
{
string txt;
string nb;
vector<int> data;
};
// -------
while(getline(file, line)){
stringstream ss(line);
getline(ss, text, ',');
getline (ss, nums, ':');
getline (ss, numList);
replace(numList.begin(), numList.end(), ',' , ' ');
stringstream numStream(numList);
int value;
while(numStream >> value)
vec.push_back(value);
randomStruct str = {text, nums, vec};
randomStructVec.push_back(str);
vec.clear();
}
// Accessing and printing data
for (auto str : randomStructVec)
{
for (auto i : str.data)
{
cout << i << " ";
}
cout << endl;
}
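The same steps can be packaged into a small function that parses one line into a struct, which also makes them easy to try out without a file. A sketch, assuming the line format shown in the question ("Text, numbers, : 5,3,5"); the names Record and parseRecord are hypothetical:

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type mirroring randomStruct.
struct Record {
    std::string txt;
    std::string nb;
    std::vector<int> data;
};

// Parses a line such as "Text, numbers, : 5,3,5".
Record parseRecord(const std::string& line) {
    Record rec;
    std::stringstream ss(line);
    std::getline(ss, rec.txt, ',');  // "Text"
    std::getline(ss, rec.nb, ':');   // " numbers, "
    std::string numList;
    std::getline(ss, numList);       // " 5,3,5" (rest of the line)
    std::replace(numList.begin(), numList.end(), ',', ' ');
    std::stringstream numStream(numList);
    int value;
    while (numStream >> value) {     // stops when no int is left
        rec.data.push_back(value);
    }
    return rec;
}
```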
Use this function, stoi_(). It scans the string and collects each run of consecutive digits (0-9) into a temporary string, which it converts to a number; any other character ends the current number, and the process repeats until the end of the string is reached. To handle negative numbers, one extra condition checks for a '-' immediately followed by a digit.
vector<int> stoi_(string s){
vector<int>ans;
int i = 0;
int n =s.length();
while(i<n){
string temp = ""; // current number
if(s[i]=='-' && (i+1<n && (s[i+1]>='0' && s[i+1]<='9'))){ // handle -ve numbers
temp = temp + s[i];
i++;
}
while(i<n && s[i]>='0' && s[i]<='9'){ // if current character is number append it into cur number
temp = temp + s[i];
i++;
}
if(temp.length()>0){
ans.push_back(stoi(temp)); // here using stoi() for simplicity
}
else{
i++;
}
}
return ans;
}
I am trying to read through a text file that could look like this:
HI bye
goodbye
foo bar
boy girl
one two three
I am trying to take the lines with exactly two words and store them in a map, where the first word is the key and the second word is the value.
Below is the code I came up with, but I can't figure out how to ignore the lines that do not have two words on them.
This only works properly if every line has two words. I understand why, but I'm not sure what condition I can add to prevent it.
pair<string, string> myPair;
map<string, string> myMap;
while(getline(file2, line, '\0'))
{
stringstream ss(line);
string word;
while(!ss.eof())
{
ss >> word;
myPair.first = word;
ss >> word;
myPair.second = word;
myMap.insert(myPair);
}
}
map<string, string>::iterator it=myMap.begin();
for(it=myMap.begin(); it != myMap.end(); it++)
{
cout<<it->first<<" "<<it->second<<endl;
}
Read two words into a temporary pair. If you can't, do not add the pair to the map. If you can read two words, see if you can read a third word. If you can, you have too many words on the line. Do not add.
Example:
while(getline(file2, line)) // one line at a time; a '\0' delimiter would read the whole file as one string
{
stringstream ss(line);
pair<string,string> myPair;
string junk;
if (ss >> myPair.first >> myPair.second && !(ss >> junk))
{ // successfully read into pair, but not into a third junk variable
myMap.insert(myPair);
}
}
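Wrapping that check in a function that takes any std::istream makes it easy to test with an istringstream instead of a file. A sketch, assuming newline-separated lines (loadPairs is a hypothetical name):

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Reads the stream line by line and keeps only lines with exactly two words:
// the first word becomes the key, the second the value.
std::map<std::string, std::string> loadPairs(std::istream& in) {
    std::map<std::string, std::string> result;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::string key, value, junk;
        // True only if two words were read and reading a third one fails.
        if (ss >> key >> value && !(ss >> junk)) {
            result.emplace(key, value);
        }
    }
    return result;
}
```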
Let me suggest a slightly different implementation:
std::string line;
while (std::getline(infile, line)) {
// Vector of string to save tokens
vector <string> tokens;
// stringstream class check1
stringstream check1(line);
string intermediate;
// Tokenizing w.r.t. space ' '
while(getline(check1, intermediate, ' ')) {
if (!intermediate.empty()) // consecutive spaces would otherwise produce empty tokens
tokens.push_back(intermediate);
}
if (tokens.size() == 2) {
// your condition of 2 words in a line apply
// process 1. and 2. item of vector here
}
}
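You can use fscanf to read input from a file and sscanf to read formatted input from a string. sscanf returns the number of input items it successfully matched and assigned, so you can easily check how many words a line has.
#include <cstdio>
#include <iostream>
using namespace std;
int main()
{
char line[100];
FILE *fp = fopen("inp.txt", "r");
if (fp == NULL) return 1; // could not open the file
// " %99[^\n]" skips leading whitespace (including the previous newline), then reads the rest of the line
while(fscanf(fp, " %99[^\n]", line) == 1)
{
cout<<line<<endl;
char s1[100], s2[100], s3[100];
// sscanf returns how many words it read (at most 3 here); 2 means exactly two words
int take = sscanf(line, "%99s %99s %99s", s1, s2, s3);
cout<<take<<endl;
}
fclose(fp);
return 0;
}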
Description of the program: the program must read in a variable number of words until a sentinel value is entered ("#" in this case). It stores the words in an array.
Problem: I use getline to read in the string and parse it with a stringstream. My problem is that the stringstream is not swallowing the newline character at the end of each line and is instead extracting it.
Some solutions I have thought of are to cut off the last character by taking a substring, or to check whether the next extracted word is a newline character, but I feel there is a better, more efficient solution, such as changing the conditions of my loops.
I have included a minimized version of the overall code that reproduces the problem.
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main()
{
const int MAX_LIST_SIZE = 1000;
string str;
string list[MAX_LIST_SIZE];
int numWords = 0;
// program starts here
getline(cin, str); // read input
stringstream parse(str); // use stringstream to parse input
while(str != "#") // read in until sentinel value
{
while(!parse.fail()) // until all words are extracted from the line
{
parse >> list[numWords]; // store words
numWords++;
}
getline(cin,str); // get next line
parse.clear();
parse.str(str);
}
// print number of words
cout << "Number of words : " << numWords << endl;
}
And here is a set of test input data that reproduces the problem:
Input:
apples oranges mangos
bananas
pineapples strawberries
Output:
Number of words : 9
Expected Output:
Number of words : 6
I would appreciate any suggestions on how to deal with this problem in an efficient manner.
Your logic for parsing the stream isn't quite correct. fail() only becomes true after a >> operation fails, so you end up doing an extra increment for each line. For example:
while(!parse.fail())
{
parse >> list[numWords]; // fails
numWords++; // increment numWords anyway
} // THEN check !fail(), but we incremented already!
All of these operations return values that you should check as you go to avoid this problem:
while (getline(cin, str)) { // fails if no more lines in cin
if (str == "#") { // stop at the sentinel line
break;
}
stringstream parse(str);
while (parse >> list[numWords]) { // fails if no more words
++numWords; // *only* increment if we got one!
}
}
Even better would be to not use an array at all for the list of words:
std::vector<std::string> words;
which can then be filled in the inner loop:
std::string temp;
while (parse >> temp) {
words.push_back(temp);
}
The increment of numWords happens one more time than you intend at the end of each line. Use a std::vector<std::string> for your list. Then you can use list.size().
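Putting both answers together, a sketch that reads lines until the "#" sentinel and collects the words in a std::vector. readWordsUntil is a hypothetical name; it reads from any std::istream, so std::cin works as well as an istringstream:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads lines until end of input or a line equal to `sentinel`,
// returning every whitespace-separated word seen before it.
std::vector<std::string> readWordsUntil(std::istream& in,
                                        const std::string& sentinel = "#") {
    std::vector<std::string> words;
    std::string line;
    while (std::getline(in, line) && line != sentinel) {
        std::istringstream parse(line);
        std::string word;
        while (parse >> word) {  // the body runs only after a successful read
            words.push_back(word);
        }
    }
    return words;
}
```

With the question's test input, words.size() is 6, so no separate counter is needed.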
I am trying to split a string using spaces as a delimiter. I would like to store each token in an array or vector.
This is what I have tried:
string tempInput;
cin >> tempInput;
string input[5];
stringstream ss(tempInput); // Insert the string into a stream
int i=0;
while (ss >> tempInput){
input[i] = tempInput;
i++;
}
The problem is that if I input "this is a test", the array only seems to store input[0] = "this". The other elements do not contain any values.
I have also tried using a vector but with the same result.
See the duplicate questions to learn how to split a string into words; your method is actually correct, though. The actual problem lies in how you are reading the input before trying to split it:
string tempInput;
cin >> tempInput; // !!!
When you use cin >> tempInput, you only get the first word of the input, not the whole text. There are two possible ways out of that, the simplest of which is to forget about the stringstream and iterate directly on the input:
std::string tempInput;
std::vector< std::string > tokens;
while ( std::cin >> tempInput ) {
tokens.push_back( tempInput );
}
// alternatively, including algorithm and iterator headers:
std::vector< std::string > tokens;
std::copy( std::istream_iterator<std::string>( std::cin ),
std::istream_iterator<std::string>(),
std::back_inserter(tokens) );
This approach will give you all the tokens in the input in a single vector. If you need to work with each line separately, use getline from the <string> header instead of cin >> tempInput:
std::string tempInput;
while ( getline( std::cin, tempInput ) ) { // read line
// tokenize the line, possibly with your own code or
// any answer in the 'duplicate' question
}
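Filling in that loop with the question's own stringstream approach might look like this, assuming one vector of tokens per input line is wanted (tokenizeLines is a hypothetical name):

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits each line of `in` into whitespace-separated tokens,
// keeping the tokens of different lines apart.
std::vector<std::vector<std::string>> tokenizeLines(std::istream& in) {
    std::vector<std::vector<std::string>> lines;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        std::vector<std::string> tokens;
        std::string token;  // a separate variable, not the line being parsed
        while (ss >> token) {
            tokens.push_back(token);
        }
        lines.push_back(tokens);
    }
    return lines;
}
```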
Notice that it’s much easier just to use copy:
vector<string> tokens;
copy(istream_iterator<string>(cin),
istream_iterator<string>(),
back_inserter(tokens));
As for why your code doesn’t work: you’re reusing tempInput. Don’t do that. Furthermore, you’re first reading a single word from cin, not the whole string. That’s why only a single word is put into the stringstream.
The easiest way: Boost.Tokenizer
std::vector<std::string> tokens;
std::string s = "This is, a test";
boost::tokenizer<> tok(s);
for(boost::tokenizer<>::iterator it=tok.begin(); it != tok.end(); ++it)
{
tokens.push_back(*it);
}
// tokens is ["This", "is", "a", "test"]
You can parameterize the delimiters (and escape sequences) to split only on spaces if you wish; by default it tokenizes on both spaces and punctuation.
Here is a little algorithm that splits a string into a list, just like Python does.
std::list<std::string> split(std::string text, std::string split_word) {
std::list<std::string> list;
std::string word = "";
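std::string::size_type is_word_over = 0;
// < rather than <=: text[text.length()] is the terminating '\0', which must not end up in the last word
for (std::string::size_type i = 0; i < text.length(); i++) {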
if (i <= text.length() - split_word.length()) {
if (text.substr(i, split_word.length()) == split_word) {
list.insert(list.end(), word);
word = "";
is_word_over = 1;
}
//now we want that it jumps the rest of the split character
else if (is_word_over >= 1) {
if (is_word_over != split_word.length()) {
is_word_over += 1;
continue;
}
else {
word += text[i];
is_word_over = 0;
}
}
else {
word += text[i];
}
}
else {
word += text[i];
}
}
list.insert(list.end(), word);
return list;
}
There probably exists a more optimal way to write this.
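One possibly simpler variant uses std::string::find to jump straight to each occurrence of the delimiter. A sketch, assuming a non-empty delimiter and Python-like behaviour of keeping empty fields between adjacent delimiters (splitOn is a hypothetical name):

```cpp
#include <list>
#include <string>

// Splits `text` on every occurrence of `delim` (which must be non-empty),
// like Python's str.split(sep): adjacent delimiters yield empty fields.
std::list<std::string> splitOn(const std::string& text, const std::string& delim) {
    std::list<std::string> parts;
    std::string::size_type start = 0;
    std::string::size_type found;
    while ((found = text.find(delim, start)) != std::string::npos) {
        parts.push_back(text.substr(start, found - start));
        start = found + delim.length();  // skip the whole delimiter
    }
    parts.push_back(text.substr(start));  // remainder after the last delimiter
    return parts;
}
```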