Identifying related words with the use of Wildcards in c++

Identifying related words with the use of Wildcards in c++ - c++

Ok, I've been trying to come up with a solution to my problem. The problem is:
Given a list of 3 letter words (size of the list is irrelevant I think), how can I identify those words in the list that differ with the first word in the list by at most one letter.
Say I have the word pat
then I would like to identify all the words in the list that are:
pa_ such as pay
p_t such as pot
_ot such as rot
Is there a way to implement wildcards in c++?

[This may be more complex than the assignment requires but it avoids slow string comparisons and regular expressions]
Given that everything is a 3-letter word, you might consider representing each word as a 4-byte integer. For example, in "pat" the letter 'p' is 0x70 (ascii), 'a' is 0x61 and 't' is 0x74 so represent "pat" by the integer 0x706174. Do likewise for all the 3-letter words in the test list.
Next, the combination of tests required for 2 of the 3 letters to match (in same order) is:
p?t where the test is 0x70??74
?at where the test is 0x??6174
pa? where the test is 0x7061??
PS can I just add that the stackoverflow 'code sample' button which is suppose to reformat your selection as code is weird in Firefox. This 1-minute post has taken 20 minutes to format!
// assume array words[] of strings
int word0 = calc_int_from_word(words[0]);
for (int ii = 1; ii < words.count; ii++)
{
int wordii = calc_int_from_word(words[ii]);
if (wordii & 0xFFFF00 == word0 & 0xFFFF00 ||
wordii & 0xFF00FF == word0 & 0xFF00FF ||
wordii & 0x00FFFF == word0 & 0x00FFFF)
{
// words[ii] matches words[0] in at least two letters
}
}

try looking at the strcmp function and some of the other related functions listed here : http://www.cplusplus.com/reference/cstring/strcmp/
....or you could use regex like Tony the Lion just said.
EDIT:
....also is the strcspn function
http://www.cplusplus.com/reference/cstring/strcspn/
/* strcspn example */
#include <stdio.h>
#include <string.h>
int main ()
{
char str[] = "fcba73";
char keys[] = "1234567890";
int i;
i = strcspn (str,keys);
printf ("The first number in str is at position %d.\n",i+1);
return 0;
}
Output:
The first number in str is at position 5
There's also strstr which finds the first occurence of str1 in str2.
for instance:
string str1 = "po";
string str2 = "potlock";
char * pch;
pch = strstr (str2,str1);
Second EDIT:
you could definitely do a loop with
if (strcspn (str1,str2)){
//there is at least one match
}else {
//no matches
}

Related

How to sort non-numeric strings by converting them to integers? Is there a way to convert strings to unique integers while being ordered?

I am trying to convert strings to integers and sort them based on the integer value. These values should be unique to the string, no other string should be able to produce the same value. And if a string1 is bigger than string2, its integer value should be greater. Ex: since "orange" > "apple", "orange" should have a greater integer value. How can I do this?
I know there are an infinite number of possibilities between just 'a' and 'b' but I am not trying to fit every single possibility into a number. I am just trying to possibly sort, let say 1 million values, not an infinite amount.
I was able to get the values to be unique using the following:
long int order = 0;
for (auto letter : word)
order = order * 26 + letter - 'a' + 1;
return order;
but this obviously does not work since the value for "apple" will be greater than the value for "z".
This is not a homework assignment or a puzzle, this is something I thought of myself. Your help is appreciated, thank you!

You are almost there ... just a minor tweaks are needed:
you are multiplying by 26
however you have letters (a..z) and empty space so you should multiply by 27 instead !!!
Add zeropading
in order to make starting letter the most significant digit you should zeropad/align the strings to common length... if you are using 32bit integers then max size of string is:
floor(log27(2^32)) = 6
floor(32/log2(27)) = 6
Here small example:
int lexhash(char *s)
{
int i,h;
for (h=0,i=0;i<6;i++) // process string
{
if (s[i]==0) break;
h*=27;
h+=s[i]-'a'+1;
}
for (;i<6;i++) h*=27; // zeropad missing letters
return h;
}
returning these:
14348907 a
28697814 b
43046721 c
373071582 z
15470838 abc
358171551 xyz
23175774 apple
224829626 orange
ordered by hash:
14348907 a
15470838 abc
23175774 apple
28697814 b
43046721 c
224829626 orange
358171551 xyz
373071582 z
This will handle all lowercase a..z strings up to 6 characters length which is:
26^6 + 26^5 +26^4 + 26^3 + 26^2 + 26^1 = 321272406 possibilities
For more just use bigger bitwidth for the hash. Do not forget to use unsigned type if you use the highest bit of it too (not the case for 32bit)

You can use position of char:
std::string s("apple");
int result = 0;
for (size_t i = 0; i < s.size(); ++i)
result += (s[i] - 'a') * static_cast<int>(i + 1);
return result;
By the way, you are trying to get something very similar to hash function.

my run-length encoding doesn't work with big numbers

I have a assingment were I need to code and decode txt files, for example: hello how are you? has to be coded as hel2o how are you? and aaaaaaaaaajkle as a10jkle.
while ( ! invoer.eof ( ) ) {
if (kar >= '0' && kar <= '9') {
counter = kar-48;
while (counter > 1){
uitvoer.put(vorigeKar);
counter--;
}
}else if (kar == '/'){
kar = invoer.get();
uitvoer.put(kar);
}else{
uitvoer.put(kar);
}
vorigeKar = kar;
kar = invoer.get ( );
}
but the problem I have is if need to decode a12bhr, the answer is aaaaaaaaaaaabhr but I can't seem to get the 12 as number without problems, I also can't use any strings or array's.
c++

I believe that you are making following mistake: imagine you give a32, then you read the character a and save it as vorigeKar (previous character, I am , Flemish so I understand Dutch :-) ).
Then you read 3, you understand that it is a number and you repeat vorigeKar three times, which leads to aaa. Then you read 2 and repeat vorigeKar two times, leading to aaaaa (five times, five equals 3 + 2).
You need to learn how to keep on reading numeric characters, and translate them into complete numbers (like 32, or 12 in your case).

Like #Dominique said in his answers, You're doing it wrong.
Let me tell you my logic, you can try it.
Pesudo Code + Logic:
Store word as a char array or string, so that it'll be easy to print at last
Loop{
Read - a //check if it's number by subtracting from '0'
Read - 1 //check if number = true. Store it in int res[] = res*10 + 1
//Also store the previous index in an index array(ie) index of char 'a' if you encounter a number first time.
Read - 2 //check if number = true. Store it in res = res*10 + 2
Read - b , h and so on till "space" character
If you encounter another number, then store it's previous character's index in index array and then store the number in a res[] array.
Now using index array you can get the index of your repeating character to be printed and print it for it's corresponding times which we have stored in the result array.
This goes for the second, third...etc:- numbers in your word till the end of the word
}

First, even though you say you can't use strings, you still need to know the basic principle behind how to turn a stream of digit characters into an integer.
Assuming the number is positive, here is a simple function that turns a series of digits into a number:
#include <iostream>
#include <cctype>
int runningTotal(char ch, int lastNum)
{
return lastNum * 10 + (ch -'0');
}
int main()
{
// As a test
char s[] = "a123b23cd1/";
int totalNumber = 0;
for (size_t i = 0; s[i] != '/'; ++i)
{
char digit = s[i]; // This is the character "read from the file"
if ( isdigit( digit) )
totalNumber = runningTotal(digit, totalNumber);
else
{
if ( totalNumber > 0 )
std::cout << totalNumber << "\n";
totalNumber = 0;
}
}
std::cout << totalNumber;
}
Output:
123
23
1
So what was done? The character array is the "file". I then loop for each character, building up the number. The runningTotal is a function that builds the integer from each digit character encountered. When a non-digit is found, we output that number and start the total from 0 again.
The code does not save the letter to "multiply" -- I leave that to you as homework. But the code above illustrates how to take digits and create the number from them. For using a file, you would simply replace the for loop with the reading of each character from the file.

Given a word and a text, return the count of the occurrences of anagrams of the word in the text [duplicate]

This question already has answers here:
Given a word and a text, we need to return the occurrences of anagrams
(6 answers)
Closed 9 years ago.
For eg. word is for and the text is forxxorfxdofr, anagrams of for will be ofr, orf, fro, etc. So the answer would be 3 for this particular example.
Here is what I came up with.
#include<iostream>
#include<cstring>
using namespace std;
int countAnagram (char *pattern, char *text)
{
int patternLength = strlen(pattern);
int textLength = strlen(text);
int dp1[256] = {0}, dp2[256] = {0}, i, j;
for (i = 0; i < patternLength; i++)
{
dp1[pattern[i]]++;
dp2[text[i]]++;
}
int found = 0, temp = 0;
for (i = 0; i < 256; i++)
{
if (dp1[i]!=dp2[i])
{
temp = 1;
break;
}
}
if (temp == 0)
found++;
for (i = 0; i < textLength - patternLength; i++)
{
temp = 0;
dp2[text[i]]--;
dp2[text[i+patternLength]]++;
for (j = 0; j < 256; j++)
{
if (dp1[j]!=dp2[j])
{
temp = 1;
break;
}
}
if (temp == 0)
found++;
}
return found;
}
int main()
{
char pattern[] = "for";
char text[] = "ofrghofrof";
cout << countAnagram(pattern, text);
}
Does there exist a faster algorithm for the said problem?

Most of the time will be spent searching, so to make the algorithm more time efficient, the objective is to reduce the quantities of searches or optimize the search.
Method 1: A table of search starting positions.
Create a vector of lists, one vector slot for each letter of the alphabet. This can be space-optimized later.
Each slot will contain a list of indices into the text.
Example text: forxxorfxdofr
Slot List
'f' 0 --> 7 --> 11
'o' 1 --> 5 --> 10
'r' 2 --> 6 --> 12
For each word, look up the letter in the vector to get a list of indexes into the text. For each index in the list, compare the text string position from the list item to the word.
So with the above table and the word "ofr", the first compare occurs at index 1, second compare at index 5 and last compare at index 10.
You could eliminate near-end of text indices where (index + word length > text length).

You can use the commutativity of multiplication, along with uniqueness of primal decomposition. This relies on my previous answer here
Create a mapping from each character into a list of prime numbers (as small as possible). For e.g. a-->2, b-->3, c-->5, etc.. This can be kept in a simple array.
Now, convert the given word into the multiplication of the primes matching each of its characters. This results will be equal to a similar multiplication of any anagram of that word.
Now sweep over the array, and at any given step, maintain the multiplication of the primes matching the last L characters (where L is the length of your word). So every time you advance you do
mul = mul * char2prime(text[i]) / char2prime(text[i-L])
Whenever this multiplication equals that of your word - increment the overall counter, and you're done
Note that this method would work well on short words, but the primes multiplication can overflow a 64b var pretty fast (by ~9-10 letters), so you'll have to use a large number math library to support longer words.

This algorithm is reasonably efficient if the pattern to be anagrammed is so short that the best way to search it is to simply scan it. To allow longer patterns, the scans represented here by the 'for jj' and 'for mm' loops could be replaced by more sophisticated search techniques.
// sLine -- string to be searched
// sWord -- pattern to be anagrammed
// (in this pseudo-language, the index of the first character in a string is 0)
// iAnagrams -- count of anagrams found
iLineLim = length(sLine)-1
iWordLim = length(sWord)-1
// we need a 'deleted' marker char that will never appear in the input strings
chNil = chr(0)
iAnagrams = 0 // well we haven't found any yet have we
// examine every posn in sLine where an anagram could possibly start
for ii from 0 to iLineLim-iWordLim do {
chK = sLine[ii]
// does the char at this position in sLine also appear in sWord
for jj from 0 to iWordLim do {
if sWord[jj]=chK then {
// yes -- we have a candidate starting posn in sLine
// is there an anagram of sWord at this position in sLine
sCopy = sWord // make a temp copy that we will delete one char at a time
sCopy[jj] = chNil // delete the char we already found in sLine
// the rest of the anagram would have to be in the next iWordLim positions
for kk from ii+1 to ii+iWordLim do {
chK = sLine[kk]
cc = false
for mm from 0 to iWordLim do { // look for anagram char
if sCopy[mm]=chK then { // found one
cc = true
sCopy[mm] = chNil // delete it from copy
break // out of 'for mm'
}
}
if not cc then break // out of 'for kk' -- no anagram char here
}
if cc then { iAnagrams = iAnagrams+1 }
break // out of 'for jj'
}
}
}
-Al.

How to remove a character from the string and change data if need it?

I have possible inputs 1M 2M .. 11M and 1Y (M and Y stand for months ) and I want to output "somestring1 somestring2.... and somestring12" note M and Y are removed and the last string is changed to 12
Example: input "11M" "hello" output: hello11
input "1Y" "hello" output: hello1
char * (const char * date, const char * somestr)
{
// just need to output final string no need to change the original string
cout<< finalStr<<endl;
}

The second string is getting output as a whole itself. So no change in its output.
The second string would be output as long as M or Y are encountered. As Stack Overflow discourages providing exact source codes, so I can give you some portion of it. There is a condition to be placed which is up to you to figure out.(The second answer gives that as well)
Code would be somewhat like this.
//Code for first string. Just for output.
for (auto i = 0 ; date[i] != '\0' ; ++i)
{
// A condition comes here.
cout << date[i] ;
}
And note that this is considering you just output the string. Otherwise you can create another string and add up the two or concatenate the existing ones.

is this homework? If not, here's what i'd suggest. (i ask about homework because you may have restrictions, not because we're not here to help)
1) do a find on 'M' in your string (using find), insert a '\0' at that position if one is found (btw i'm assuming you have well formatted input)
2) do a find on 'Y'. if one is found, insert a '\0' at that position. then do an atoi() or stringstream conversion on your string to convert to number. multiply by 12.
3) concatenate your string representation of part 1 or part 2 to your somestr
4) output.
This can probably be done in < 10 lines if i could be bothered.
the a.find('M') part and its checks can be conditional operator, then the conversion/concatenation in two or three lines at most.

Compare part of the string

Okay so here is what I'm trying to accomplish.
First of all below table is just an example of what I created, in my assignment I'm not suppose to know any of these. Which means I don't know what they will pass and what is the length of each string.
I'm trying to accomplish one task is to get to be able to compare part of the string
//In Array `phrase` // in array `word`
"Backdoor", 0 "mark" 3 (matches "Market")
"DVD", 1 "of" 2 (matches "Get off")
"Get off", 2 "" -1 (no match)
"Market", 3 "VD" 1 (matches "DVD")
So as you can see from the above codes from the left hand side is the set of array which I store them in my class and they have upto 10 words
Here is the class definition.
class data
{
char phrase[10][40];
public:
int match(const char word[ ]);
};
so I'm using member function to access this private data.
int data::match(const char word[ ])
{
int n,
const int wordLength = strlen(word);
for (n=0 ; n <= 10; n++)
{
if (strncmp (phrase[n],word,wordLength) == 0)
{
return n;
}
}
return -1;
}
The above code that I'm trying to make it work is that it should match and and return if it found the match by returning the index n if not found should always return -1.
What happen now is always return 10.

You're almost there but your code is incomplete so I''m shootin in the dark on a few things.
You may have one too many variables representing an index. Unless n and i are different you should only use one. Also try to use more descriptive names, pos seems to represent the length of the text you are searching.
for (n=0 ; n <= searchLength ; n++)
Since the length of word never changes you don't need to call strlen every time. Create a variable to store the length in before the for loop.
const int wordLength = strlen(word);
I'm assuming the text you are searching is stored in a char array. This means you'll need to pass a pointer to the first element stored at n.
if (strncmp (&phrase[n],word,wordLength) == 0)
In the end you have something that looks like the following:
char word[256] = "there";
char phrase[256] = "hello there hippie!";
const int wordLength = strlen(word);
const int searchLength = strlen(phrase);
for (int n = 0; n <= searchLength; n++)
{
// or phrase + n
if (strncmp(&phrase[n], word, wordLength) == 0)
{
return n;
}
}
return -1;
Note: The final example is now complete to the point of returning a match.

I'm puzzled about your problem. There are some cases unclear. For eaxmple abcdefg --- abcde Match "abcde"? how many words match? any other examples, abcdefg --- dcb Match "c"?and abcdefg --- aoodeoofoo Match "a" or "adef"? if you want to find the first matched word, it's OK and very simple. But if you are to find the longest and discontinuous string, it is a big question. I think you should have a research about LCS problem (Longest Common Subsequence)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Identifying related words with the use of Wildcards in c++ - c++

Related

How to sort non-numeric strings by converting them to integers? Is there a way to convert strings to unique integers while being ordered?

my run-length encoding doesn't work with big numbers

Given a word and a text, return the count of the occurrences of anagrams of the word in the text [duplicate]

How to remove a character from the string and change data if need it?

Compare part of the string

Categories

Resources