How to compare two text files and find the similarities between them?

I have loaded both of my files into an array and I'm trying to compare the two files to find the matches between them. However, when I run my code I don't get any output.
These are the contents of both files:
file1
tdogicatzhpigu
file2
dog
pig
cat
rat
fox
cow
When comparing the words from search1.txt with the text in text1.txt, I want to find the occurrence of each word from search1.txt in text1.txt.
What I eventually want to output is whether each word has been found and, if so, the index of its location inside the array.
For example:
"dog". Found, location 1.
Here is my code:
#include <iostream>
#include <fstream>
#include <string>
using namespace std;
int main()
{
ifstream file1("text1.txt");
if (file1.is_open())
{
string myArray[1];
for (int i = 0; i < 1; i++)
{
file1 >> myArray[i];
Any further help would be greatly appreciated. Thanks in advance.

I believe the goal is to search the text in file1 for each word in file2.
You can't compare the two strings with ==, because they aren't equal: each keyword is only a part of the text. You'll need to use the std::string::find method instead:
std::string target_string;
std::getline(file1, target_string);
std::string keyword;
while (getline(file2, keyword))
{
const std::string::size_type position = target_string.find(keyword);
std::cout << "string " << keyword << " ";
if (position == std::string::npos)
{
std::cout << "not found.\n";
}
else
{
std::cout << "found at position " << position << "\n";
}
}
Edit 1:
A complete example:
#include <iostream>
#include <string>
using std::cout;
using std::string;
using std::endl;
int main()
{
const std::string target_string = "tdogicatzhpigu";
const std::string key_list[] =
{
"dog",
"pig",
"cat",
"rat",
"fox",
"cow",
};
static const unsigned int key_quantity =
sizeof(key_list) / sizeof(key_list[0]);
for (unsigned int i = 0; i < key_quantity; ++i)
{
const std::string::size_type position = target_string.find(key_list[i]);
std::cout << "string " << key_list[i] << " ";
if (position == std::string::npos)
{
std::cout << "not found.\n";
}
else
{
std::cout << "found at position " << position << "\n";
}
}
return 0;
}

Related

How can I find the positions of characters in a string with string::find?

I need the positions of characters in a string.
The string contains:
"username":"secret", "password":"also secret", "id":"secret too", "token":"secret"
and I need the positions of the quotation marks of the token part: "token":"secret".
I have experimented with the code from http://www.cplusplus.com/reference/string/string/find
but nothing worked. Can anyone help me?
Here is what I have tried, but it only ever prints 0:
#include <iostream>
#include <string>
int main() {
std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
size_t found = buffer.find('"');
if (found == std::string::npos)std::cout << "something went wrong\n";
if (found != std::string::npos)
std::cout << "first " << '"' << " found at: " << found << '\n';
for (int j = 0; j <= 17; ++j) {
found = buffer.find('"');
found + 1, 6;
if (found != std::string::npos)
std::cout << "second " << '"' << " found at : " << found << '\n';
}
return 0;
}

There are many possible solutions, so it is hard to give a single answer.
Basically, you need to iterate through the string position by position, check whether the character at each position is the one you are searching for, and then do something with the result.
A first simple implementation could be:
#include <iostream>
#include <string>
const std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
int main() {
for (size_t position{}, counter{}; position < buffer.length(); ++position) {
if (buffer[position] == '\"') {
++counter;
std::cout << "Character \" number " << counter << " found at position " << position << '\n';
}
}
return 0;
}
But your question was about the usage of std::string::find(). In your implementation, you always start the search at the beginning of the string, so you always find the same " at position 0.
Solution: after you have found a match, pass the resulting position (incremented by one) as the second parameter to std::string::find(). The search then starts after the " you just found, so it finds the next one. All of this can be done in a normal for-loop.
Here is a simple example:
#include <iostream>
#include <string>
const std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
int main() {
for (size_t position{}, counter{}; std::string::npos != (position = buffer.find("\"", position)); ++position, ++counter) {
std::cout << "Character \" number " << counter << " found at position " << position << '\n';
}
return 0;
}
There are more solutions, depending on what you really want to do. You could extract all keywords and data with a simple regex.
Something like this:
#include <iostream>
#include <string>
#include <regex>
#include <vector>
const std::regex re{ R"(\"([ a-zA-Z0-9]+)\")" };
const std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
int main() {
std::vector part(std::sregex_token_iterator(buffer.begin(), buffer.end(), re, 1), {});
std::cout << part[7] << '\n';
return 0;
}
Or you can split everything into tokens and values, like this:
#include <iostream>
#include <string>
#include <regex>
#include <vector>
#include <map>
#include <iomanip>
const std::regex re1{ "," };
const std::regex re2{ R"(\"([^\"]+)\")" };
const std::string buffer("\"username\":\"secret\", \"password\":\"also secret\", \"id\":\"secret too\", \"token\":\"secret\"");
int main() {
std::vector<std::string> block(std::sregex_token_iterator(buffer.begin(), buffer.end(), re1, -1), {});
std::map<std::string, std::string> entry{};
for (const auto& b : block) {
std::vector blockPart(std::sregex_token_iterator(b.begin(), b.end(), re2, 1), {});
entry[blockPart[0]] = blockPart[1];
}
for (const auto& [token, value] : entry)
std::cout << std::setw(20) << token << " --> " << value << '\n';
return 0;
}
But if the input is in a complex format such as JSON, there are so many special cases that the only sensible approach is to use an existing library.

How do I remove repeated words from a string and show each word only once with its word count?

Basically, I have to show each word with its count, but repeated words show up again in my program.
How do I remove them using loops, or should I use 2D arrays to store both the word and its count?
#include <iostream>
#include <stdio.h>
#include <iomanip>
#include <cstring>
#include <conio.h>
#include <time.h>
using namespace std;
char* getstring();
void xyz(char*);
void tokenizing(char*);
int main()
{
char* pa = getstring();
xyz(pa);
tokenizing(pa);
_getch();
}
char* getstring()
{
static char pa[100];
cout << "Enter a paragraph: " << endl;
cin.getline(pa, sizeof(pa), '#'); // pa holds only 100 chars, so never read more than that
return pa;
}
void xyz(char* pa)
{
cout << pa << endl;
}
void tokenizing(char* pa)
{
char sepa[] = " ,.\n\t";
char* token;
char* nexttoken;
int size = strlen(pa);
token = strtok_s(pa, sepa, &nexttoken);
while (token != NULL) {
int wordcount = 0;
if (token != NULL) {
int sizex = strlen(token);
//char** fin;
int j;
for (int i = 0; i <= size; i++) {
for (j = 0; j < sizex; j++) {
if (pa[i + j] != token[j]) {
break;
}
}
if (j == sizex) {
wordcount++;
}
}
//for (int w = 0; w < size; w++)
//fin[w] = token;
//cout << fin[w];
cout << token;
cout << " " << wordcount << "\n";
}
token = strtok_s(NULL, sepa, &nexttoken);
}
}
This is the output I get:
I want to show, for example, the word "i" once with its count of 5, and then not show it again.

First of all, since you are using C++, I would recommend splitting the text the C++ way (some examples are here) and storing every word in a std::map or std::unordered_map. You can find an example of my implementation here.
But if you don't want to rewrite your code, you can simply add a variable that indicates whether a copy of the word was already found before the current word's position. If no earlier copy was found, print the word together with its count.

This answer shows how to save each word returned by your strtok function into a vector of strings. Then string::compare is used to compare each word with word[0]. The indexes that match word[0] are marked in an int array, used. The number of matches equals the number of marks in used (nused). The marked words are then removed from the vector, and the remaining words go through the next round of comparisons. The program ends when no words remain.
You could write your own word-comparison function to replace str.compare(str2) if you prefer not to use std::vector and std::string.
#include <iostream>
#include <string>
#include <vector>
#include<iomanip>
#include<cstring>
using namespace std;
char* getstring();
void xyz(char*);
void tokenizing(char*);
int main()
{
char* pa = getstring();
xyz(pa);
tokenizing(pa);
}
char* getstring()
{
static char pa[100] = "this is a test and is a test and is test.";
return pa;
}
void xyz(char* pa)
{
cout << pa << endl;
}
void tokenizing(char* pa)
{
char sepa[] = " ,.\n\t";
char* token;
char* nexttoken;
std::vector<std::string> word;
int used[64];
std::string tok;
int nword = 0, nsize, nused;
int size = strlen(pa);
token = strtok_s(pa, sepa, &nexttoken);
while (token)
{
word.push_back(token);
++nword;
token = strtok_s(NULL, sepa, &nexttoken);
}
for (int i = 0; i<nword; i++) std::cout << word[i] << std::endl;
std::cout << "total " << nword << " words.\n" << std::endl;
nsize = nword;
while (nsize > 0)
{
nused = 0;
tok = word[0] ;
used[nused++] = 0;
for (int i=1; i<nsize; i++)
{
if ( tok.compare(word[i]) == 0 )
{
used[nused++] = i; }
}
std::cout << tok << " : " << nused << std::endl;
for (int i=nused-1; i>=0; --i)
{
for (int j=used[i]; j<(nsize+i-nused); j++) word[j] = word[j+1];
}
nsize -= nused;
}
}
Note that the used words have to be removed in backward order. If you remove them in forward order, the marked indexes in the used array would have to be adjusted after each removal. A test run:
$ ./a.out
this is a test and is a test and is test.
this
is
a
test
and
is
a
test
and
is
test
total 11 words.
this : 1
is : 3
a : 2
test : 3
and : 2

I read your last comment.
I am very sorry, but I do not know C, so I will answer in C++, using the standard C++ approach. That is usually only about 10 lines of code:
#include <iostream>
#include <algorithm>
#include <map>
#include <string>
#include <regex>
// Regex Helpers
// Regex to find a word
static const std::regex reWord{ R"(\w+)" };
// Result of search for one word in the string
static std::smatch smWord;
int main() {
std::cout << "\nPlease enter text: \n";
if (std::string line; std::getline(std::cin, line)) {
// Words and its appearance count
std::map<std::string, int> words{};
// Count the words
for (std::string s{ line }; std::regex_search(s, smWord, reWord); s = smWord.suffix())
words[smWord[0]]++;
// Show result
for (const auto& [word, count] : words) std::cout << word << "\t\t--> " << count << '\n';
}
return 0;
}

How to run a string search algorithm through the whole body of text

I am using the brute-force string search algorithm to search through a short sentence. However, I want the algorithm to report every occurrence of the string instead of finding it once and then stopping.
//Declare and initialise variables
string pat, text;
text = "This is a test sentence, find test within this string";
cout << text << endl;
//User input for pat
cout << "Please enter the string you want to search for" << endl;
cin >> pat;
//Set the length of the pat and text
int patLength = pat.size();
int textLength = text.size();
//Algorithm
for (int i = 0; i < textLength - patLength; ++i)
{
//Do while loop to run through the whole text
do
{
int j;
for (j = 0; j < patLength; j++)
{
if (text[i + j] != pat[j])
break; // Doesn't match here.
}
if (j == patLength)
{
finds.push(i); // Matched here.
}
} while (i < textLength);
}
//Print output
cout << "String: " << pat << " was found at positions: " << finds.top();
The program stores each find in a queue. When I run this program, it asks for the 'pat', then does nothing. I have done a bit of debugging and found that the problem is probably the do-while loop, but I can't find a fix.

You could use the std::string::find function combined with a function that you call for each find.
#include <iostream>
#include <functional>
#include <string>
#include <vector>
#include <sstream>
void Algorithm(
const std::string& text, const std::string& pat,
std::function<void(const std::string&,size_t)> f, std::vector<size_t>& positions)
{
size_t pos=0;
while((pos=text.find(pat, pos)) != std::string::npos) {
// store the position
positions.push_back(pos);
// call the supplied function
f(text, pos++);
}
}
// function to call for each position in which the pattern is found
void gotit(const std::string& found_in, size_t pos) {
std::cout << "Found in \"" << found_in << "\" # " << pos << "\n";
}
int main(int argc, char* argv[]) {
std::vector<std::string> args(argv+1, argv+argc);
if(args.size()==0)
args.push_back("This is a test sentence, find test within this string");
for(const auto& text : args) {
std::vector<size_t> found_at;
std::cout << "Please enter the string you want to search for: ";
std::string pat;
std::cin >> pat;
Algorithm(text, pat, gotit, found_at);
std::cout << "collected positions:\n";
for(size_t pos : found_at) {
std::cout << pos << "\n";
}
}
}

My first bit of advice would be to structure your code into separate functions.
Let's say you have a function that returns the position of the pattern's first occurrence in a sequence of characters:
using position = typename std::string::const_iterator;
position first_occurrence(position text_begin, position text_end, const std::string& pattern);
If there are no more occurrences of the pattern, it returns text_end.
You can now write a very simple loop:
auto occurrence = first_occurrence(text_begin, text_end, pattern);
while (occurrence != text_end) {
occurrences.push_back(occurrence);
occurrence = first_occurrence(occurrence + 1, text_end, pattern);
}
to accumulate all the occurrences of the pattern.
The first_occurrence function already exists in the standard library under the name std::search. Since C++17, you can pass it a specialized searcher such as std::boyer_moore_searcher, which pre-processes the pattern so that it can be found faster in the string. Here's an example applied to your problem:
#include <algorithm>
#include <string>
#include <vector>
#include <functional>
using occurrence = typename std::string::const_iterator;
std::vector<occurrence> find_occurrences(const std::string& input, const std::string& pattern) {
auto engine = std::boyer_moore_searcher(pattern.begin(), pattern.end());
std::vector<occurrence> occurrences;
auto it = std::search(input.begin(), input.end(), engine);
while (it != input.end()) {
occurrences.push_back(it);
it = std::search(std::next(it), input.end(), engine);
}
return occurrences;
}
#include <iostream>
int main() {
std::string text = "This is a test sentence, find test within this string";
std::string pattern = "st";
auto occs = find_occurrences(text, pattern);
for (auto occ: occs) std::cout << std::string(occ, std::next(occ, pattern.size())) << std::endl;
}

C++ reading sentences

string a = "MwZwXxZwDwJrBxHrHxMrGrJrGwHxMrFrZrZrDrKwZxLrZrFwZxErMrXxArZw";
Assume I have this data in my string. I want to count how many M, Z, X, D, J (and the other capital letters I didn't mention) appear in the string. How can I do it? My friends say I can use a vector, but I don't really know how to use vectors. Is there any alternative way to do it?
I tried using for loops to find the M and then resetting the index to 0 to find the next capital letter, but I'm not sure whether there is an easier way.

First, I'll show you a way that I find easier:
#include <cctype>
#include <iostream>
#include <map>
#include <string>
using namespace std;
int main(int argc, const char * argv[]) {
string str = "MwZwXxZwDwJrBxHrHxMrGrJrGwHxMrFrZrZrDrKwZxLrZrFwZxErMrXxArZw";
map<char,int> map;
for (int i=0; i<str.length(); i++) {
char ch = str[i];
if (isupper(ch)) {
map[ch] ++;
}
}
for (auto item : map) {
cout<<item.first<<':'<<item.second<<endl;
}
return 0;
}
You only need one loop to solve your problem.
isupper() is a standard library function (declared in <cctype>) that tells you whether a character is an uppercase letter.
std::map is a standard library container that stores key-value pairs for you.
This program outputs:
A:1
B:1
D:2
E:1
F:2
G:2
H:3
J:2
K:1
L:1
M:4
X:2
Z:8
Is this what you want?

Use regex.
// regex_search example
#include <iostream>
#include <string>
#include <regex>
#include <map>
using namespace std; // must come after the std headers, or namespace std is not declared yet
int main ()
{
std::string s ("MwZwXxZwDwJrBxHrHxMrGrJrGwHxMrFrZrZrDrKwZxLrZrFwZxErMrXxArZw;");
std::smatch m;
std::regex e ("[A-Z]"); // each match is a single uppercase letter
map<string,int> map;
std::cout << "Target sequence: " << s << std::endl;
std::cout << "Regular expression: [A-Z]" << std::endl;
std::cout << "The following matches and submatches were found:" << std::endl;
while (std::regex_search (s,m,e)) {
for (auto x:m)
{
//cout << x << " ";
map[x.str()] ++;
}
//cout << std::endl;
s = m.suffix().str();
}
for (auto item : map) {
cout<<item.first<<':'<<item.second<<endl;
}
return 0;
}

The most direct translation of "loop through the string and count the uppercase letters" into C++ I can think of:
#include <cctype>
#include <iostream>
#include <map>
#include <string>
int main()
{
std::string a = "MwZwXxZwDwJrBxHrHxMrGrJrGwHxMrFrZrZrDrKwZxLrZrFwZxErMrXxArZw";
std::map<char, int> count;
// Loop through the string...
for (auto c: a)
{
// ... and count the uppercase letters (std::isupper needs an unsigned char value).
if (std::isupper(static_cast<unsigned char>(c)))
{
count[c] += 1;
}
}
// Show the result.
for (auto it: count)
{
std::cout << it.first << ": " << it.second << std::endl;
}
}

string parsing for C++

I have a text file that has # characters in it. It looks something like this:
#Stuff
1
2
3
#MoreStuff
a
b
c
I am trying to use the std::string::find() function to get the positions of the # characters and then go from there, but I'm not sure how to actually code this.
This is my attempt:
int pos1=0;
while(i<string.size()){
int next=string.find('#', pos1);
i++;}

Here's one I made a while ago (in C):
#include <string.h>
int char_pos(char c, char *str) {
char *pch=strchr(str,c);
if (pch == NULL) return -1; /* not found */
return (pch-str)+1;         /* 1-based position of the first match */
}
Port it to C++ and there you go! ;)
If the character is not found, it returns -1. Otherwise it returns the (positive) 1-based position of the first match.

It's hard to tell from your question what you mean by "position", but it looks like you are trying to do something like this:
#include <fstream>
#include <iostream>
#include <string>
int main()
{
std::ifstream incoming{"string-parsing-for-c.txt"};
std::string const hash{"#"};
std::string line;
for (auto line_number = 0U; std::getline(incoming, line); ++line_number)
{
auto const column = line.find(hash);
if (std::string::npos != column)
{
std::cout << hash << " found on line " << line_number
<< " in column " << column << ".\n";
}
}
}
...or possibly this:
#include <fstream>
#include <iostream>
int main()
{
std::ifstream incoming{"string-parsing-for-c.txt"};
char const hash{'#'};
char byte{};
for (auto offset = 0U; incoming.read(&byte, 1); ++offset)
{
if (hash == byte)
{
std::cout << hash << " found at offset " << offset << ".\n";
}
}
}
