Best way to detect grouped words

A word is grouped if, for each letter in the word, all occurrences of that letter form exactly one consecutive sequence. In other words, no two equal letters are separated by one or more letters that are different.
Given a vector<string> return the number of grouped words.
For example :
{"ab", "aa", "aca", "ba", "bb"}
return 4.
Here, "aca" is not a grouped word.
My quick and dirty solution :
int howMany(vector <string> words) {
    int ans = 0;
    for (int i = 0; i < words.size(); i++) {
        bool grouped = true;
        for (int j = 0; j < words[i].size()-1; j++)
            if (words[i][j] != words[i][j+1])
                for (int k = j+1; k < words[i].size(); k++)
                    if (words[i][j] == words[i][k])
                        grouped = false;
        if (grouped) ans++;
    }
    return ans;
}
I want a better algorithm for the same problem.

Try the following :
bool isGrouped( string const& str )
{
    set<char> foundCharacters;
    char currentCharacter='\0';
    for( int i = 0 ; i < str.size() ; ++i )
    {
        char c = str[i];
        if( c != currentCharacter )
        {
            if( foundCharacters.insert(c).second )
            {
                currentCharacter = c;
            }
            else
            {
                return false;
            }
        }
    }
    return true;
}

Just considering one word, here is an O(n log n) destructive algorithm:
std::string::iterator unq_end = std::unique( word.begin(), word.end() );
std::sort( word.begin(), unq_end );
return std::unique( word.begin(), unq_end ) == unq_end;
Edit: The first call to unique reduces runs of consecutive letters to single letters. The call to sort groups identical letters together. The second call to unique checks whether sort formed any new groups of consecutive letters. If it did, then the word must not be grouped.
Advantage over the others posted is that it doesn't require storage — although that's not much of an advantage.
Here's a simple version of the alternative algo, also requiring only O(1) storage (and yes, also tested):
if ( word.empty() ) return true;
bitset<CHAR_MAX+1> symbols;
for ( string::const_iterator it = word.begin() + 1; it != word.end(); ++ it ) {
    if ( it[0] == it[-1] ) continue;
    if ( symbols[ it[0] ] ) return false;
    symbols[ it[-1] ] = true;
}
return ! symbols[ * word.rbegin() ];
Note that you would need minor modifications to work with characters outside ASCII. bitset comes from the header <bitset>.
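As a rough sketch, the sort-and-unique fragment above could be wrapped in a complete function and used to count the grouped words from the question. The names isGroupedBySorting and howManyGrouped are made up for this example. Taking the word by value is one assumed way of keeping the "destructive" step away from the caller's data.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical wrapper around the sort-and-unique idea.
// The word is taken by value, so the caller's string is not modified.
bool isGroupedBySorting(std::string word) {
    // Collapse each run of equal letters to a single letter.
    std::string::iterator unq_end = std::unique(word.begin(), word.end());
    // Bring letters that came from different runs next to each other.
    std::sort(word.begin(), unq_end);
    // If sorting created a new run, some letter had two separate runs.
    return std::unique(word.begin(), unq_end) == unq_end;
}

// Hypothetical counterpart of the question's howMany().
int howManyGrouped(const std::vector<std::string>& words) {
    return static_cast<int>(
        std::count_if(words.begin(), words.end(), isGroupedBySorting));
}
```

With the question's input {"ab", "aa", "aca", "ba", "bb"}, howManyGrouped returns 4.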

You could use a Set of some kind (preferable one with O(1) insertion and lookup times).
Each time you encounter a character that differs from the previous one, check if the set contains it. If it does, your match fails. If it doesn't, add it to the set and carry on.

This might work in two loops per word:
1) Loop over the word counting the number of distinct symbols that appear. (This will require extra storage at most equal to the length of the string - probably some sort of hash.)
2) Loop over the word counting the number of times symbol n is different from symbol n+1.
If those two values aren't different by exactly one, the word is not grouped.
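A minimal sketch of this counting idea, assuming an std::unordered_set as the "some sort of hash" and treating the empty word as grouped (the function name isGroupedByCounting is invented for the example):

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical helper illustrating the two-count approach.
bool isGroupedByCounting(const std::string& word) {
    if (word.empty()) return true;  // assumption: an empty word counts as grouped

    // Loop 1: count the distinct symbols.
    std::unordered_set<char> distinct(word.begin(), word.end());

    // Loop 2: count the positions where symbol n differs from symbol n+1.
    std::size_t changes = 0;
    for (std::size_t n = 0; n + 1 < word.size(); ++n)
        if (word[n] != word[n + 1]) ++changes;

    // The word consists of changes + 1 runs; it is grouped exactly when
    // every distinct symbol owns one run.
    return distinct.size() == changes + 1;
}
```

For "aca" there are 2 distinct symbols but 2 changes (3 runs), so the word is rejected.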

Here's a way with two loops per word, except that one of the loops runs over the alphabet rather than over the word. Worst case is O(NLs), where N = number of words, L = length of the words, s = alphabet size:
for each word wrd:
{
    for each character c in the alphabet:
    {
        let poz = last position of character c seen so far in wrd; initially poz = -1
        for each index i in wrd:
        {
            if ( c == wrd[i] )
            {
                if ( poz != -1 && poz != i - 1 )
                    mark wrd as not grouped // separated by at least one letter from the previous sequence
                poz = i;
            }
        }
    }
    // grouped if it was never marked as not grouped above
}
Basically, this checks that every letter of the alphabet either doesn't occur in the word or occurs in exactly one substring made up only of that letter.
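The per-letter scan could look roughly like this in C++. The sketch assumes the alphabet is the lowercase letters 'a' to 'z' in an ASCII-compatible character set, and the name isGroupedPerLetter is hypothetical.

```cpp
#include <string>

// Hypothetical helper: checks one word, one alphabet letter at a time.
// Assumption: the alphabet is 'a'..'z'.
bool isGroupedPerLetter(const std::string& wrd) {
    for (char c = 'a'; c <= 'z'; ++c) {
        int poz = -1;  // last position at which c was seen, -1 if none yet
        for (int i = 0; i < static_cast<int>(wrd.size()); ++i) {
            if (wrd[i] != c) continue;
            // c was seen before, but not at the directly preceding position:
            // its occurrences are split into two sequences.
            if (poz != -1 && poz != i - 1) return false;
            poz = i;
        }
    }
    return true;
}
```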

public static Boolean isGrouped( String input )
{
    char[] c = input.ToCharArray();
    int pointer = 0;
    while ( pointer < c.Length - 1 )
    {
        char current = c[pointer];
        char next = c[++ pointer];
        if ( next != current &&
             ( next + 1 ) != current &&
             ( next - 1 ) == current
           ) return false;
    }
    return true;
}
(C#, but the principle applies)

Here is a multi-line, verbose, regexp to match failures:
(?: # Non capturing group of ...
(\S)\1* # One or more of the same non-space character (captured).
)
(?! # Then a position without
\1 # ... the captured character
).+ # ... at least once.
\1 # Followed by the captured character.
Or smaller:
"(?:(\S)\1*)(?!\1).+\1"
I am just presuming that C++ has a regexp implementation that is up to it. It does work in Python and should work in Perl and Ruby too.
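For what it's worth, here is a sketch of how the compact pattern might be used from C++11 with std::regex, whose default ECMAScript grammar supports back-references and negative lookahead. The function name isGroupedByRegex is made up, and regex performance on long words has not been measured here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: a word is grouped when the "failure" pattern
// does not match anywhere in it.
bool isGroupedByRegex(const std::string& word) {
    // A run of one character, then something else, then that character again.
    static const std::regex failure(R"((?:(\S)\1*)(?!\1).+\1)");
    return !std::regex_search(word, failure);
}
```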

Related

my run-length encoding doesn't work with big numbers

I have an assignment where I need to encode and decode text files. For example, hello how are you? has to be encoded as hel2o how are you?, and aaaaaaaaaajkle as a10jkle.
while ( ! invoer.eof ( ) ) {
    if (kar >= '0' && kar <= '9') {
        counter = kar-48;
        while (counter > 1){
            uitvoer.put(vorigeKar);
            counter--;
        }
    }else if (kar == '/'){
        kar = invoer.get();
        uitvoer.put(kar);
    }else{
        uitvoer.put(kar);
    }
    vorigeKar = kar;
    kar = invoer.get ( );
}
The problem is that when I need to decode a12bhr, the answer should be aaaaaaaaaaaabhr, but I can't seem to read the 12 as a single number. I also can't use any strings or arrays.
c++

I believe that you are making the following mistake: imagine you are given a32. You read the character a and save it as vorigeKar (previous character; I am Flemish, so I understand Dutch :-) ).
Then you read 3, you understand that it is a number and you repeat vorigeKar three times, which leads to aaa. Then you read 2 and repeat vorigeKar two times, leading to aaaaa (five times, five equals 3 + 2).
You need to learn how to keep on reading numeric characters, and translate them into complete numbers (like 32, or 12 in your case).

Like @Dominique said in his answer, you're doing it wrong.
Let me tell you my logic; you can try it.
Pseudocode + logic:
Store the word as a char array or string, so that it'll be easy to print at the end.
Loop {
    Read 'a' // check whether it's a digit by subtracting '0'
    Read '1' // it is a digit: accumulate it, res = res*10 + 1
    // Also, the first time you encounter a digit, store the index of the previous character (here, the index of 'a') in an index array.
    Read '2' // it is a digit: res = res*10 + 2
    Read 'b', 'h' and so on until the "space" character
    If you encounter another number, store the index of its previous character in the index array and store the number in the res[] array.
    Using the index array, you can then find each character to be repeated and print it the corresponding number of times stored in the res[] array.
    The same goes for the second, third, etc. numbers in your word, until the end of the word.
}

First, even though you say you can't use strings, you still need to know the basic principle behind how to turn a stream of digit characters into an integer.
Assuming the number is positive, here is a simple function that turns a series of digits into a number:
#include <iostream>
#include <cctype>
int runningTotal(char ch, int lastNum)
{
    return lastNum * 10 + (ch -'0');
}
int main()
{
    // As a test
    char s[] = "a123b23cd1/";
    int totalNumber = 0;
    for (size_t i = 0; s[i] != '/'; ++i)
    {
        char digit = s[i]; // This is the character "read from the file"
        if ( isdigit( digit) )
            totalNumber = runningTotal(digit, totalNumber);
        else
        {
            if ( totalNumber > 0 )
                std::cout << totalNumber << "\n";
            totalNumber = 0;
        }
    }
    std::cout << totalNumber;
}
Output:
123
23
1
So what was done? The character array is the "file". I then loop for each character, building up the number. The runningTotal is a function that builds the integer from each digit character encountered. When a non-digit is found, we output that number and start the total from 0 again.
The code does not save the letter to "multiply" -- I leave that to you as homework. But the code above illustrates how to take digits and create the number from them. For using a file, you would simply replace the for loop with the reading of each character from the file.

Separating every second digit in an integer C++

I am currently finishing up an assignment for my OOP class and I am struggling with one part in particular. Keep in mind I am still a beginner. The question is as follows:
If the string contains 13 characters, all of the characters are digits and the check digit is modulo 10, this function returns true; false otherwise.
This is in regard to an EAN. I basically have to separate every second digit from the rest of the digits. For example, for 9780003194876 I need to do calculations with 7, 0, 0, 1, 4, 7. I have no clue how to do this.
Any help would be greatly appreciated!
bool isValid(const char* str){
if (atoi(str) == 13){
}
return false;
}

You can start with a for loop which increments itself by 2 for each execution:
for (int i = 1, len = strlen(str); i < len; i += 2)
{
    int digit = str[i] - '0';
    // do something with digit
}
The above is just an example though...

Since the question is tagged C++ (not C, so I'd suggest that other answerers not solve this with the C library; let's build the OP's C++ knowledge the right way from the beginning) and this is for an OOP class, I'm going to solve it the C++ way, using the std::string class:
bool is_valid( const std::string& str )
{
    if( str.size() != 13 )
        return false;

    // Indices 1, 3, 5, ... pick out every second digit (7, 0, 0, 1, 4, 7 in the question's example)
    for( std::size_t i = 1 ; i < 13 ; i += 2 )
    {
        int digit = str[i] - '0';
        //Do what you want with the digit
    }
    return true; // placeholder: the real result depends on the check-digit calculation
}

First, if it's EAN, you have to process every digit, not just
every other one. In fact, all you need to do is a weighted sum
of the digits; for EAN-13, the weights alternate between 1 and
3, starting with 1 at the leftmost digit (so the digit just
before the check digit gets 3). The simplest solution is probably
to put them in a table (i.e. int weight[] = { 1, 3, 1, 3... };)
and iterate over the string (in this case, using an index rather
than iterators, since you want to be able to index into
weight as well), converting each digit into a numerical value
(str[i] - '0', if isdigit(static_cast<unsigned char>(str[i]))
is true; if it's false, you haven't got a digit), then
multiplying it by its weight and adding the product to the
running total. When you're finished, if the total, modulo 10,
is 0, it's correct. Otherwise, it isn't.
You certainly don't want to use atoi, since you don't want the
numerical value of the string; you want to treat each digit
separately.
Just for the record, professionally, I'd write something like:
bool
isValidEAN13( std::string const& value )
{
    return value.size() == 13
        && std::find_if(
                value.begin(),
                value.end(),
                []( unsigned char ch ){ return !isdigit( ch ); } )
            == value.end()
        && calculateEAN13( value ) == value.back() - '0';
}
where calculateEAN13 does the actual calculations (and can be
used for both generation and checking). I suspect that this
goes beyond the goal of the assignment, however, and that all
your teacher is looking for is the calculateEAN13 function,
with the last check (which is why I'm not giving it in full).
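As an illustration of the weighted sum described above (not the answerer's calculateEAN13, which was deliberately left out), a self-contained check might look like the following. The name isValidEan13Sketch is invented, and the weights are computed from the index instead of being stored in a table.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch: validates a 13-character EAN by summing all
// thirteen digits with weights 1, 3, 1, 3, ... from the left.
bool isValidEan13Sketch(const std::string& value) {
    if (value.size() != 13) return false;
    int total = 0;
    for (std::size_t i = 0; i < value.size(); ++i) {
        unsigned char ch = static_cast<unsigned char>(value[i]);
        if (!std::isdigit(ch)) return false;   // not a digit: reject
        int weight = (i % 2 == 0) ? 1 : 3;     // even index: 1, odd index: 3
        total += weight * (ch - '0');
    }
    return total % 10 == 0;
}
```

For the question's example 9780003194876, the weighted sum is 100, so the number is accepted.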

Given a word and a text, return the count of the occurrences of anagrams of the word in the text [duplicate]

For example, if the word is for and the text is forxxorfxdofr, the anagrams of for are ofr, orf, fro, etc., so the answer is 3 for this particular example.
Here is what I came up with.
#include<iostream>
#include<cstring>
using namespace std;
int countAnagram (char *pattern, char *text)
{
    int patternLength = strlen(pattern);
    int textLength = strlen(text);
    int dp1[256] = {0}, dp2[256] = {0}, i, j;
    for (i = 0; i < patternLength; i++)
    {
        dp1[pattern[i]]++;
        dp2[text[i]]++;
    }
    int found = 0, temp = 0;
    for (i = 0; i < 256; i++)
    {
        if (dp1[i]!=dp2[i])
        {
            temp = 1;
            break;
        }
    }
    if (temp == 0)
        found++;
    for (i = 0; i < textLength - patternLength; i++)
    {
        temp = 0;
        dp2[text[i]]--;
        dp2[text[i+patternLength]]++;
        for (j = 0; j < 256; j++)
        {
            if (dp1[j]!=dp2[j])
            {
                temp = 1;
                break;
            }
        }
        if (temp == 0)
            found++;
    }
    return found;
}
int main()
{
    char pattern[] = "for";
    char text[] = "ofrghofrof";
    cout << countAnagram(pattern, text);
}
Does there exist a faster algorithm for the said problem?

Most of the time will be spent searching, so to make the algorithm more time-efficient, the objective is to reduce the number of searches or to optimize each search.
Method 1: A table of search starting positions.
Create a vector of lists, one vector slot for each letter of the alphabet. This can be space-optimized later.
Each slot will contain a list of indices into the text.
Example text: forxxorfxdofr
Slot List
'f' 0 --> 7 --> 11
'o' 1 --> 5 --> 10
'r' 2 --> 6 --> 12
For each word, look up the letter in the vector to get a list of indexes into the text. For each index in the list, compare the text string position from the list item to the word.
So with the above table and the word "ofr", the first compare occurs at index 1, second compare at index 5 and last compare at index 10.
You could eliminate near-end of text indices where (index + word length > text length).

You can use the commutativity of multiplication, along with the uniqueness of prime factorization. This relies on my previous answer here
Create a mapping from each character to a prime number (as small as possible), e.g. a-->2, b-->3, c-->5, etc. This can be kept in a simple array.
Now, convert the given word into the product of the primes matching each of its characters. This result will be equal to the same product computed for any anagram of that word.
Now sweep over the text, and at each step maintain the product of the primes matching the last L characters (where L is the length of your word). So every time you advance, you do
mul = mul * char2prime(text[i]) / char2prime(text[i-L])
Whenever this product equals that of your word, increment the overall counter, and you're done.
Note that this method works well on short words, but the product of primes can overflow a 64-bit variable pretty fast (by ~9-10 letters), so you'll have to use a big-number library to support longer words.
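A small sketch of the rolling product, under some assumptions that are not in the answer: the word consists of lowercase a to z only, every other character is mapped to the spare prime 103, and words longer than 9 letters are refused (by returning -1) so the product cannot overflow a 64-bit unsigned integer. The names charToPrime and countAnagramsByPrimes are made up.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical mapping: 'a'..'z' to the first 26 primes, anything else to 103.
std::uint64_t charToPrime(char c) {
    static const std::uint64_t primes[26] = {
        2,  3,  5,  7,  11, 13, 17, 19, 23, 29, 31, 37, 41,
        43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101};
    return (c >= 'a' && c <= 'z') ? primes[c - 'a'] : 103;
}

// Returns the number of windows of text that are anagrams of word,
// or -1 if the word is empty or too long for this 64-bit sketch.
int countAnagramsByPrimes(const std::string& word, const std::string& text) {
    const std::size_t L = word.size();
    if (L == 0 || L > 9) return -1;  // 103^9 still fits in 64 bits, 103^10 does not
    std::uint64_t target = 1;
    for (char c : word) target *= charToPrime(c);

    std::uint64_t mul = 1;
    int count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Divide first so the product never holds more than L factors.
        if (i >= L) mul /= charToPrime(text[i - L]);
        mul *= charToPrime(text[i]);
        if (i + 1 >= L && mul == target) ++count;
    }
    return count;
}
```

For the word for and the text forxxorfxdofr this counts the windows for, orf and ofr, giving 3.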

This algorithm is reasonably efficient if the pattern to be anagrammed is so short that the best way to search it is to simply scan it. To allow longer patterns, the scans represented here by the 'for jj' and 'for mm' loops could be replaced by more sophisticated search techniques.
// sLine -- string to be searched
// sWord -- pattern to be anagrammed
// (in this pseudo-language, the index of the first character in a string is 0)
// iAnagrams -- count of anagrams found
iLineLim = length(sLine)-1
iWordLim = length(sWord)-1
// we need a 'deleted' marker char that will never appear in the input strings
chNil = chr(0)
iAnagrams = 0 // well we haven't found any yet have we
// examine every posn in sLine where an anagram could possibly start
for ii from 0 to iLineLim-iWordLim do {
    chK = sLine[ii]
    // does the char at this position in sLine also appear in sWord
    for jj from 0 to iWordLim do {
        if sWord[jj]=chK then {
            // yes -- we have a candidate starting posn in sLine
            // is there an anagram of sWord at this position in sLine
            sCopy = sWord // make a temp copy that we will delete one char at a time
            sCopy[jj] = chNil // delete the char we already found in sLine
            // the rest of the anagram would have to be in the next iWordLim positions
            for kk from ii+1 to ii+iWordLim do {
                chK = sLine[kk]
                cc = false
                for mm from 0 to iWordLim do { // look for anagram char
                    if sCopy[mm]=chK then { // found one
                        cc = true
                        sCopy[mm] = chNil // delete it from copy
                        break // out of 'for mm'
                    }
                }
                if not cc then break // out of 'for kk' -- no anagram char here
            }
            if cc then { iAnagrams = iAnagrams+1 }
            break // out of 'for jj'
        }
    }
}
-Al.

Compare part of the string

Okay, so here is what I'm trying to accomplish.
First of all, the table below is just an example that I created; in my assignment I'm not supposed to know any of these values, which means I don't know what will be passed in or how long each string is.
The task is to be able to compare part of a string:
// in array `phrase`    // in array `word`
"Backdoor", 0           "mark"   3   (matches "Market")
"DVD",      1           "of"     2   (matches "Get off")
"Get off",  2           ""      -1   (no match)
"Market",   3           "VD"     1   (matches "DVD")
The left-hand side of the table is the array that I store in my class; it can hold up to 10 phrases.
Here is the class definition.
class data
{
char phrase[10][40];
public:
int match(const char word[ ]);
};
So I'm using a member function to access this private data.
int data::match(const char word[ ])
{
    int n;
    const int wordLength = strlen(word);
    for (n=0 ; n <= 10; n++)
    {
        if (strncmp (phrase[n],word,wordLength) == 0)
        {
            return n;
        }
    }
    return -1;
}
The code above is supposed to return the index n of the phrase that matches, and -1 if no match is found.
What happens now is that it always returns 10.

You're almost there, but your code is incomplete, so I'm shooting in the dark on a few things.
You may have one too many variables representing an index. Unless n and i are different, you should only use one. Also, try to use more descriptive names; pos seems to represent the length of the text you are searching.
for (n=0 ; n <= searchLength ; n++)
Since the length of word never changes, you don't need to call strlen every time. Create a variable to store the length in before the for loop.
const int wordLength = strlen(word);
I'm assuming the text you are searching is stored in a char array. This means you'll need to pass a pointer to the element stored at n.
if (strncmp (&phrase[n],word,wordLength) == 0)
In the end you have something that looks like the following:
char word[256] = "there";
char phrase[256] = "hello there hippie!";
const int wordLength = strlen(word);
const int searchLength = strlen(phrase);
for (int n = 0; n <= searchLength; n++)
{
    // or phrase + n
    if (strncmp(&phrase[n], word, wordLength) == 0)
    {
        return n;
    }
}
return -1;
Note: The final example is now complete to the point of returning a match.
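Going by the asker's example table, the search apparently has to look inside each stored phrase, ignore case ("mark" matches "Market"), and treat an empty word as no match. Under those assumptions, a sketch over an array of phrases might look like this. matchesAt and matchPhrase are invented names, and the phrase count is passed in explicitly because the class shown does not record how many phrases are in use.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true if word occurs in text starting at text[start],
// ignoring case. The caller guarantees start + wordLength <= strlen(text).
static bool matchesAt(const char* text, std::size_t start,
                      const char* word, std::size_t wordLength) {
    for (std::size_t k = 0; k < wordLength; ++k) {
        unsigned char a = static_cast<unsigned char>(text[start + k]);
        unsigned char b = static_cast<unsigned char>(word[k]);
        if (std::tolower(a) != std::tolower(b)) return false;
    }
    return true;
}

// Returns the index of the first phrase containing word, or -1.
int matchPhrase(const char phrases[][40], int phraseCount, const char word[]) {
    const std::size_t wordLength = std::strlen(word);
    if (wordLength == 0) return -1;  // assumption: "" never matches
    for (int n = 0; n < phraseCount; ++n) {  // n < count, not n <= count
        const std::size_t phraseLength = std::strlen(phrases[n]);
        for (std::size_t start = 0; start + wordLength <= phraseLength; ++start)
            if (matchesAt(phrases[n], start, word, wordLength)) return n;
    }
    return -1;
}
```

Note the loop bound n < phraseCount: with n <= 10 on a 10-element array, the last iteration reads past the end of the array.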

I'm puzzled by your problem; some cases are unclear. For example, for abcdefg and abcde, is the match "abcde"? How many words match? Other examples: for abcdefg and dcb, is the match "c"? For abcdefg and aoodeoofoo, is the match "a" or "adef"? If you want to find the first matched word, that's OK and very simple. But if you need to find the longest, non-contiguous match, it is a much bigger question, and you should read up on the LCS (Longest Common Subsequence) problem.

Caesar cipher in C++

To start off, I'm four weeks into a C++ course and I don't even know loops yet, so please speak baby talk?
Okay, so I'm supposed to read a twelve character string (plus NULL makes thirteen) from a file, and then shift the letters backwards three, and then print my results to screen and file. I'm okay with everything except the shifting letters. I don't want to write miles of code to take each character individually, subtract three, and re-assemble the string, but I'm not sure how to work with the whole string at once. Can someone recommend a really simple method of doing this?

If you are dealing with simple letters (A to Z or a to z), then you can assume that the internal codes are consecutive.
Letters are coded as numbers between 0 and 127 (in ASCII). A is coded as 65, B as 66, C as 67, Z as 90.
In order to shift letters, you just have to change the internal letter code as if it were a number, so basically just subtract 3 from the character. Beware of edge cases, though, because subtracting 3 from 'A' will give you '>' (code 62) and not 'X' (code 88). You can deal with them using "if" statements or the modulo operator ("%").
Here is an ASCII characters table to help you
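To make the edge case concrete, here is a minimal sketch of the "if" variant for a single character, assuming an ASCII-compatible character set. The function name shiftBackThree is made up, and characters that are not letters are returned unchanged.

```cpp
// Hypothetical helper: shifts one letter back by three places,
// wrapping A, B, C round to X, Y, Z (and a, b, c round to x, y, z).
char shiftBackThree(char letter) {
    if (letter >= 'A' && letter <= 'Z') {
        letter -= 3;
        if (letter < 'A')   // fell off the front of the alphabet
            letter += 26;
    } else if (letter >= 'a' && letter <= 'z') {
        letter -= 3;
        if (letter < 'a')
            letter += 26;
    }
    return letter;
}
```

With this, 'D' becomes 'A' and 'A' becomes 'X' instead of '>'.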

Once you've loaded your string in, you can use the modulo operator to rotate while staying within the A-Z range.
I'd keep track of whether the letter was a capital to start with:
bool isCaps = ( letter >= 'A' ) && ( letter <= 'Z' );
if( isCaps )
    letter -= 'A'-'a';
and then just do the cipher shift like this:
int shift = -3;
letter -= 'a'; // to make it a number from 0-25
letter = ( letter + shift + 26 ) % 26;
// add 26 in case the shift is negative
letter += 'a'; // back to ascii code
finally finish off with
if( isCaps )
    letter += 'A'-'a';
so, putting all this together we get:
char mystring[] = "Hello, World"; // ciphertext (example value)
int shift = -3; // ciphershift
for( char *letter = mystring; *letter; ++letter ) // stop at the terminating '\0'
{
    bool isCaps = ( *letter >= 'A' ) && ( *letter <= 'Z' );
    if( isCaps )
        *letter -= 'A'-'a'; // convert to lower case
    if( *letter < 'a' || *letter > 'z' )
        continue; // leave non-letters untouched
    *letter -= 'a';
    *letter = ( *letter + shift + 26 ) % 26;
    *letter += 'a';
    if( isCaps )
        *letter += 'A'-'a'; // restore upper case
}

You're going to have to learn loops. They will allow you to repeat some code over the characters of a string, which is exactly what you need here. You'll keep an integer variable that will be your index into the string, and inside the loop do your letter-shifting on the character at that index and increment the index variable by one until you reach NULL.
Edit: If you're not expected to know about loops yet in your course, maybe they want you to do this:
string[0] -= 3; // this is short for "string[0] = string[0] - 3;"
string[1] -= 3;
string[2] -= 3;
...
It will only result in 12 lines of code rather than miles. You don't have to "reassemble" the string this way, you can just edit each character in-place. Then I bet after making you do that, they'll show you the fast way of doing it using loops.

Iterate over the characters with a for loop. And do what you want with the char*. Then put the new char back.

for(int i=0; i<12; i++){
    string[i] = string[i] - 3;
}
Where string is your character array (string). There is a bit more involved if you want to make it periodic (i.e. have A wrap round to Z), but the above code should help you get started.

I'm a little unclear about what you mean by "shift the letters backwards 3".
Does that mean D ==> A?
If so, here's a simple loop.
(I didn't do the reading from the file or the writing to the file... That's your part.)
#include <stdio.h>
#include <string.h>
int main(void)
{
    char input[13] = "ABCDEFGHIJKL";
    int i;
    int len = strlen(input);
    for(i=0; i<len; ++i)
    {
        input[i] = input[i]-3;
    }
    printf("%s", input); // OUTPUT is: ">?@ABCDEFGHI"
    return 0;
}
