string parsing for C++ - c++

I have a text file that contains # characters. It looks something like this:
#Stuff
1
2
3
#MoreStuff
a
b
c
I am trying to use the std::string::find() function to get the positions of the # characters and then go from there, but I'm not sure how to actually code this.
This is my attempt:
int pos1=0;
while(i<string.size()){
int next=string.find('#', pos1);
i++;}

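Here's one I made a while ago (in C):
#include <string.h>
/* Returns the 1-based position of the first c in str, or -1 if c is absent. */
int char_pos(char c, const char *str) {
const char *pch = strchr(str, c);
if (pch == NULL) return -1; /* pch - str is undefined for a null pointer */
return (int)(pch - str) + 1;
}
Port it to C++ and there you go! ;)
If not found: returns -1.
Otherwise: returns the 1-based position of the first match.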

It's hard to tell from your question what you mean by "position", but it looks like you are trying to do something like this:
#include <fstream>
#include <iostream>
#include <string>
int main()
{
std::ifstream incoming{"string-parsing-for-c.txt"};
std::string const hash{"#"};
std::string line;
for (auto line_number = 0U; std::getline(incoming, line); ++line_number)
{
auto const column = line.find(hash);
if (std::string::npos != column)
{
std::cout << hash << " found on line " << line_number
<< " in column " << column << ".\n";
}
}
}
...or possibly this:
#include <fstream>
#include <iostream>
int main()
{
std::ifstream incoming{"string-parsing-for-c.txt"};
char const hash{'#'};
char byte{};
for (auto offset = 0U; incoming.read(&byte, 1); ++offset)
{
if (hash == byte)
{
std::cout << hash << " found at offset " << offset << ".\n";
}
}
}

Related

c++ stack overflow due to recursive function how can I improve the data handling

I'm tackling an exercise that is designed to cause exactly this problem: exhausting memory. I'm loading files of various sizes, from 1,000 to 5 million lines, with entries like this in a .txt file (1 line = 1 entry):
SHFIv,aiSdG
PlgNB,bPHoP
ZHWJU,gfwgC
UAygL,Vqvhi
BlyzX,LLbCo
jbvrT,Utblj
...
Every entry has two values separated by a comma. In my code, I separate these values and try to find another entry with a matching value. There are always exactly two matching values: each time one value is found, the value it is paired with points to another pair, and so on until the final one is found.
For example, SHFIv,aiSdG would point to aiSdG,YDUVo.
I know my code is not very efficient, partly because it uses recursion, but I couldn't figure out a better way to do the job, so any suggestions on how to improve it to handle larger inputs would be greatly appreciated.
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include <map>
#include <unordered_map>
#include <stdio.h>
#include <vector>
#include <iterator>
#include <utility>
#include <functional>
#include <algorithm>
using namespace std;
template<typename T>
void search_bricks_backwards(string resume, vector<T>& vec, vector<string>& vec2) {
int index = 0;
for (const auto& pair : vec) {
//cout << "iteration " << index << endl;
if (pair.second == resume) {
vec2.insert(vec2.begin(), resume);
cout << "found " << resume << " and " << pair.second << endl;
search_bricks_backwards(pair.first, vec, vec2);
}
if (index + 1 == vec.size()) {
cout << "end of backward search, exitting..." << endl;
}
index++;
}
}
template<typename T>
void search_bricks(string start, vector<T>& vec, vector<string>& vec2) {
int index = 0;
for (const auto& pair : vec) {
//cout << "iteration " << index << endl;
if (pair.first == start) {
vec2.push_back(start);
cout << "found " << start << " and " << pair.first << endl;
search_bricks(pair.second, vec, vec2);
}
if (index + 1 == vec.size()) {
//search_bricks_backwards(start, vec, vec2);
// this also gets called on every recursion rather than just once
// as I originally intended when the forward iteration gets finished
}
index++;
}
}
template<typename T> // printing function
void printVectorElements(vector<T>& vec)
{
for (auto i = 0; i < vec.size(); ++i) {
cout << "(" << vec.at(i).first << ","
<< vec.at(i).second << ")" << endl ;
}
cout << endl;
}
vector<string> split(string s, string delimiter) { // filtering function
size_t pos_start = 0, pos_end, delim_len = delimiter.length();
string token;
vector<string> res;
while ((pos_end = s.find(delimiter, pos_start)) != string::npos) {
token = s.substr(pos_start, pos_end - pos_start);
pos_start = pos_end + delim_len;
res.push_back(token);
}
res.push_back(s.substr(pos_start));
return res;
}
int main()
{
vector<pair<string, string>> bricks;
vector<string> sorted_bricks;
ifstream inFile;
inFile.open("input-pairs-5K.txt"); // transferring data from .txt to a string
stringstream strStream;
strStream << inFile.rdbuf();
string str = strStream.str();
istringstream iss(str);
for (string line; getline(iss, line); )
// filtering data from string and dividing on ","
{
string delimiter = ",";
string s = line;
vector<string> v = split(s, delimiter);
string s1 = v.at(0);
string s2 = v.at(1);
bricks.push_back(make_pair(s1, s2));
}
search_bricks(bricks[0].second, bricks, sorted_bricks);
//printVectorElements(bricks);
//for (auto i = sorted_bricks.begin(); i != sorted_bricks.end(); ++i)
//cout << *i << " "; // this is just to check if vectors have data
}
Here is a link to the 1k test data that works for me (only for search_bricks, without the backwards search, since that triggers on every recursion). Again, thanks for any suggestions on how to improve or get rid of the recursion. I don't code in C++ often and don't really know how else to tackle this.
Although implementing a non-recursive version of your algorithm is the canonical solution, if you really need to solve the problem without modifying the code, you can increase the stack size with a linker option. About 100 MB is usually sufficient.
With MSVC (linker option): /STACK:104857600
With GCC on Windows (MinGW, passed through to the linker): -Wl,--stack,104857600

How do I remove repeated words from a string and only show it once with their wordcount

Basically, I have to show each word with its count, but repeated words show up again in my program.
How do I remove them using loops, or should I use 2D arrays to store both the word and the count?
#include <iostream>
#include <stdio.h>
#include <iomanip>
#include <cstring>
#include <conio.h>
#include <time.h>
using namespace std;
char* getstring();
void xyz(char*);
void tokenizing(char*);
int main()
{
char* pa = getstring();
xyz(pa);
tokenizing(pa);
_getch();
}
char* getstring()
{
static char pa[100];
cout << "Enter a paragraph: " << endl;
cin.getline(pa, 1000, '#');
return pa;
}
void xyz(char* pa)
{
cout << pa << endl;
}
void tokenizing(char* pa)
{
char sepa[] = " ,.\n\t";
char* token;
char* nexttoken;
int size = strlen(pa);
token = strtok_s(pa, sepa, &nexttoken);
while (token != NULL) {
int wordcount = 0;
if (token != NULL) {
int sizex = strlen(token);
//char** fin;
int j;
for (int i = 0; i <= size; i++) {
for (j = 0; j < sizex; j++) {
if (pa[i + j] != token[j]) {
break;
}
}
if (j == sizex) {
wordcount++;
}
}
//for (int w = 0; w < size; w++)
//fin[w] = token;
//cout << fin[w];
cout << token;
cout << " " << wordcount << "\n";
}
token = strtok_s(NULL, sepa, &nexttoken);
}
}
This is the output I get:
I want to show, for example, the word "i" once with its count of 5, and then not show it again.
First of all, since you are using C++, I would recommend splitting the text the C++ way (some examples are here) and storing every word in a map or unordered_map. You can find an example of my implementation here.
But if you don't want to rewrite your code, you can simply add a variable that indicates whether a copy of the word was already found before the current word's position. If no earlier copy was found, print the word with its count.
This post gives an example that saves each word from your strtok function into a vector of strings. It then uses std::string::compare to compare each word with word[0]. The indexes of the words that match word[0] are recorded in an int array, 'used', and the number of matches equals the number of recorded indexes ('nused'). The matched words are then removed from the vector, and the remaining words carry on to the next comparison pass. The program ends when no words remain.
If you prefer not to use std::vector and std::string, you can write your own word-comparing function to replace 'str.compare(str2)'.
#include <iostream>
#include <string>
#include <vector>
#include<iomanip>
#include<cstring>
using namespace std;
char* getstring();
void xyz(char*);
void tokenizing(char*);
int main()
{
char* pa = getstring();
xyz(pa);
tokenizing(pa);
}
char* getstring()
{
static char pa[100] = "this is a test and is a test and is test.";
return pa;
}
void xyz(char* pa)
{
cout << pa << endl;
}
void tokenizing(char* pa)
{
char sepa[] = " ,.\n\t";
char* token;
char* nexttoken;
std::vector<std::string> word;
int used[64];
std::string tok;
int nword = 0, nsize, nused;
int size = strlen(pa);
token = strtok_s(pa, sepa, &nexttoken);
while (token)
{
word.push_back(token);
++nword;
token = strtok_s(NULL, sepa, &nexttoken);
}
for (int i = 0; i<nword; i++) std::cout << word[i] << std::endl;
std::cout << "total " << nword << " words.\n" << std::endl;
nsize = nword;
while (nsize > 0)
{
nused = 0;
tok = word[0] ;
used[nused++] = 0;
for (int i=1; i<nsize; i++)
{
if ( tok.compare(word[i]) == 0 )
{
used[nused++] = i; }
}
std::cout << tok << " : " << nused << std::endl;
for (int i=nused-1; i>=0; --i)
{
for (int j=used[i]; j<(nsize+i-nused); j++) word[j] = word[j+1];
}
nsize -= nused;
}
}
Notice that the used words have to be removed in backward order. If you removed them in forward order, the indexes recorded in the 'used' array would need to be adjusted after each removal. A test run:
$ ./a.out
this is a test and is a test and is test.
this
is
a
test
and
is
a
test
and
is
test
total 11 words.
this : 1
is : 3
a : 2
test : 3
and : 2
I read your last comment.
I am very sorry, but I do not know C, so I will answer with the standard C++ approach.
That is usually only about 10 lines of code (this version needs C++17 for the if-with-initializer and the structured bindings):
#include <iostream>
#include <algorithm>
#include <map>
#include <string>
#include <regex>
// Regex Helpers
// Regex to find a word
static const std::regex reWord{ R"(\w+)" };
// Result of search for one word in the string
static std::smatch smWord;
int main() {
std::cout << "\nPlease enter text: \n";
if (std::string line; std::getline(std::cin, line)) {
// Words and its appearance count
std::map<std::string, int> words{};
// Count the words
for (std::string s{ line }; std::regex_search(s, smWord, reWord); s = smWord.suffix())
words[smWord[0]]++;
// Show result
for (const auto& [word, count] : words) std::cout << word << "\t\t--> " << count << '\n';
}
return 0;
}

How to compare two text files and find the similarities between them?

I have loaded both of my files into an array and I'm trying to compare them to find the matches between the files. However, when I run my code I don't receive any output.
This is the contents of both files.
file1
tdogicatzhpigu
file2
dog
pig
cat
rat
fox
cow
So the comparison is between the words from search1.txt and the text from text1.txt: I want to find the occurrence of each word from search1.txt in text1.txt.
What I eventually want to output is whether each word has been found and the index of its location inside the array, e.g.
"dog". Found, location 1.
Here is my code
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream file1("text1.txt");
if (file1.is_open())
{
string myArray[1];
for (int i = 0; i < 1; i++)
{
file1 >> myArray[i];
Any further help would be greatly appreciated. Thanks in advance.
I believe the goal is to search the text in file1 for each word in file2.
You can't use equality for the two strings, as they aren't equal. You'll need to use the std::string::find method:
std::string target_string;
std::getline(file1, target_string);
std::string keyword;
while (getline(file2, keyword))
{
const std::string::size_type position = target_string.find(keyword);
std::cout << "string " << keyword << " ";
if (position == std::string::npos)
{
std::cout << "not found.\n";
}
else
{
std::cout << "found at position " << position << "\n";
}
}
Edit 1:
An implemented example:
#include <iostream>
#include <string>
using std::cout;
using std::string;
using std::endl;
int main()
{
const std::string target_string = "tdogicatzhpigu";
const std::string key_list[] =
{
"dog",
"pig",
"cat",
"rat",
"fox",
"cow",
};
static const unsigned int key_quantity =
sizeof(key_list) / sizeof(key_list[0]);
for (unsigned int i = 0; i < key_quantity; ++i)
{
const std::string::size_type position = target_string.find(key_list[i]);
std::cout << "string " << key_list[i] << " ";
if (position == std::string::npos)
{
std::cout << "not found.\n";
}
else
{
std::cout << "found at position " << position << "\n";
}
}
return 0;
}

C++ reading sentences

string a = MwZwXxZwDwJrBxHrHxMrGrJrGwHxMrFrZrZrDrKwZxLrZrFwZxErMrXxArZw;
Assume I have this data in my string. I want to count how many times each of M, Z, X, D, J (and the other capital letters I didn't mention) appears in the string. How can I do it? My friends say a vector can do it, but I don't really know how to use a vector. Is there an alternative way?
I tried using for loops to find the M and then resetting the pointer to 0 to find the next capital letter, but I'm not sure whether there is an easier way to do it.
First I'll show you a way that seems easier to me.
#include <cctype>
#include <iostream>
#include <map>
#include <string>
using namespace std;
int main(int argc, const char * argv[]) {
string str = "MwZwXxZwDwJrBxHrHxMrGrJrGwHxMrFrZrZrDrKwZxLrZrFwZxErMrXxArZw";
map<char,int> map;
for (int i=0; i<str.length(); i++) {
char ch = str[i];
if (isupper(ch)) {
map[ch] ++;
}
}
for (auto item : map) {
cout<<item.first<<':'<<item.second<<endl;
}
return 0;
}
You only need one loop to solve your problem.
'isupper(int c)' is a function from the standard library (declared in <cctype>) that tells you whether a character is a capital letter.
'map' is a data structure from the standard library too; it provides key-value storage.
This program outputs:
A:1
B:1
D:2
E:1
F:2
G:2
H:3
J:2
K:1
L:1
M:4
X:2
Z:8
is this what you want?
Use regex.
// regex_search example
#include <iostream>
#include <map>
#include <regex>
#include <string>
using namespace std;
int main ()
{
std::string s ("MwZwXxZwDwJrBxHrHxMrGrJrGwHxMrFrZrZrDrKwZxLrZrFwZxErMrXxArZw;");
std::smatch m;
std::regex e ("[A-Z]"); // matches one uppercase letter at a time
map<string,int> counts; // letter -> number of occurrences
std::cout << "Target sequence: " << s << std::endl;
std::cout << "Regular expression: [A-Z]" << std::endl;
std::cout << "The following matches were found:" << std::endl;
while (std::regex_search (s,m,e)) {
// m[0] is the whole match; this pattern has no capture groups
counts[m[0].str()] ++;
// continue searching in the text after this match
s = m.suffix().str();
}
for (auto item : counts) {
cout<<item.first<<':'<<item.second<<endl;
}
return 0;
}
The most direct translation of "loop through the string and count the uppercase letters" into C++ I can think of:
#include <cctype>
#include <iostream>
#include <map>
#include <string>
int main()
{
std::string a = "MwZwXxZwDwJrBxHrHxMrGrJrGwHxMrFrZrZrDrKwZxLrZrFwZxErMrXxArZw";
std::map<char, int> count;
// Loop through the string...
for (auto c: a)
{
// ... and count the uppercase letters.
// The cast avoids undefined behavior for negative char values.
if (std::isupper(static_cast<unsigned char>(c)))
{
count[c] += 1;
}
}
// Show the result.
for (auto it: count)
{
std::cout << it.first << ": " << it.second << std::endl;
}
}

Read and print a csv file with more than 2 column in c++ using multimap

I'm a beginner in C++ and I'm required to write a C++ program to read and print a CSV file like this:
DateTime,value1,value2
12/07/16 13:00,3.60,50000
14/07/16 20:00,4.55,3000
How can I proceed with the programming?
So far I have managed to get only the date, using simple multimap code.
I spent some time putting together an almost exact solution for you (see the note at the end).
I assume that your program is a console application that receives the name of the original CSV file as a command-line argument.
See the following code and make whatever changes you need:
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <map>
#include <string>
std::vector<std::string> getLineFromCSV(std::istream& str, std::map<int, int>& widthMap)
{
std::vector<std::string> result;
std::string line;
std::getline(str, line);
std::stringstream lineStream(line);
std::string cell;
int cellCnt = 0;
while (std::getline(lineStream, cell, ','))
{
result.push_back(cell);
int width = cell.length();
if (width > widthMap[cellCnt])
widthMap[cellCnt] = width;
cellCnt++;
}
return result;
}
int main(int argc, char * argv[])
{
std::vector<std::vector<std::string>> result; // table with data
std::map<int, int> columnWidths; // map to store maximum length (value) of a string in the column (key)
std::ifstream inpfile;
// check file name in the argv[1]
if (argc > 1)
{
inpfile.open(argv[1]);
if (!inpfile.is_open())
{
std::cout << "File " << argv[1] << " cannot be read!" << std::endl;
return 1;
}
}
else
{
std::cout << "Run program as: " << argv[0] << " input_file.csv" << std::endl;
return 2;
}
// read from file stream line by line
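while (inpfile.good())
{
std::vector<std::string> row = getLineFromCSV(inpfile, columnWidths);
if (!row.empty()) // skip blank lines, including the one after the final newline
result.push_back(row);
}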
// close the file
inpfile.close();
// output the results
std::cout << "Content of the file:" << std::endl;
for (std::vector<std::vector<std::string>>::iterator i = result.begin(); i != result.end(); i++)
{
int rawLen = i->size();
for (int j = 0; j < rawLen; j++)
{
std::cout.width(columnWidths[j]);
std::cout << (*i)[j] << " | ";
}
std::cout << std::endl;
}
return 0;
}
NOTE: Your remaining task is just to replace the vector of vectors (type std::vector<std::vector<std::string>>, used for result) with a multimap (I hope you understand what the key should be in your solution).
Of course, there are lots of possible solutions for this task (if you open this question and look through the answers you will see this).
First of all, I suggest considering the following example and trying to solve your task in the simplest way:
#include <iostream>
#include <sstream>
#include <vector>
#include <string>
using namespace std;
int main()
{
string str = "12/07/16 13:00,3.60,50000";
stringstream ss(str);
vector<string> singleRow;
char ch;
string s = "";
while (ss.get(ch)) // get() keeps the space in the date/time; ss >> ch would skip it
{
s += ch;
if (ss.peek() == ',' || ss.peek() == EOF )
{
ss.ignore();
singleRow.push_back(s);
s.clear();
}
}
for (vector<string>::iterator i = singleRow.begin(); i != singleRow.end(); i++)
cout << *i << endl;
return 0;
}
I think it can be useful for you.