Generate string lexicographically larger than input - c++

Given an input string A, is there a concise way to generate a string B that is lexicographically larger than A, i.e. A < B == true?
My raw solution would be to say:
B = A;
++B.back();
but in general this won't work because:
A might be empty
The last character of A may be close to wraparound, in which case the resulting character will have a smaller value i.e. B < A.
Adding an extra character every time is wasteful and will quickly result in unreasonably large strings.
So I was wondering whether there's a standard library function that can help me here, or if there's a strategy that scales nicely when I want to start from an arbitrary string.

You can duplicate A into B then look at the final character. If the final character isn't the final character in your range, then you can simply increment it by one.
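Otherwise you can look at last-1, last-2, last-3, and so on, incrementing the first character you find that isn't at the end of the range. If you reach the front of the string without finding one, then append a character to extend the length.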
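A minimal sketch of that strategy, assuming the "range" is the full range of char values. std::string compares characters as unsigned char, so the sketch treats 0xFF as the largest value. The name next_greater is made up for illustration, and appending '\0' in the fallback case is just one possible choice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns a string that compares greater than `input`.
std::string next_greater(const std::string& input) {
    // std::string compares characters as unsigned char, so 0xFF is the largest value.
    const unsigned char kMax = 0xFF;
    std::string result = input;
    // Walk from the back; `i-- > 0` is a safe countdown for an unsigned index.
    for (std::size_t i = result.size(); i-- > 0;) {
        const unsigned char c = static_cast<unsigned char>(result[i]);
        if (c != kMax) {
            // First character from the back that can still be incremented.
            result[i] = static_cast<char>(c + 1);
            return result;
        }
    }
    // Empty input, or every character is already at the maximum:
    // only a longer string can compare greater, so append one character.
    result.push_back('\0');
    return result;
}
```

With this sketch the string only grows when no existing character can be incremented.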

Here is my dummy solution:
std::string make_greater_string(std::string const &input)
{
std::string ret{std::numeric_limits<
std::string::value_type>::min()};
if (!input.empty())
{
if (std::numeric_limits<std::string::value_type>::max()
== input.back())
{
ret = input + ret;
}
else
{
ret = input;
++ret.back();
}
}
return ret;
}
Ideally I'd hope to avoid the explicit handling of all special cases, and use some facility that can more naturally handle them. Already looking at the answer by @JosephLarson I see that I could increment more than the last character, which would improve the range achievable without adding more characters.
And here's the refinement after the suggestions in this post:
std::string make_greater_string(std::string const &input)
{
constexpr char minC = ' ', maxC = '~';
// Working with limits was a pain,
// using ASCII typical limit values instead.
std::string ret{minC};
auto rit = input.rbegin();
while (rit != input.rend())
{
if (maxC == *rit)
{
++rit;
if (rit == input.rend())
{
ret = input + ret;
break;
}
}
else
{
ret = input;
++(*(ret.rbegin() + std::distance(input.rbegin(), rit)));
break;
}
}
return ret;
}

You can copy the string and append some letters - this will produce a lexicographically larger result.
B = A + "a"

Related

Finding last word in a string

I'm trying to return the last word in a string but am having trouble with the for loops. When I try to test the function I am only getting empty strings. Not really sure what the problem is. Any help is much appreciated.
string getLastWord(string text)
{
string revLastWord = "";
string lastWord = "";
if(text == "")
{
return text;
}
for(size_t i = text.size()-1; i > -1; i--)
{
if((isalpha(text[i])))
{
revLastWord+=text[i];
}
if(revLastWord.size()>=1 && !isalpha(text[i-1]))
{
break;
}
}
for(size_t k = revLastWord.size()-1; k > -1; k--)
{
lastWord+=revLastWord[k];
}
return lastWord;
}
I was coding up another solution until I checked back and read the comments; they are extremely helpful. Moreover, the suggestion from @JustinRandall was incredibly helpful. I find that find_last_of() and substr() better state the intent of the function: easier to write and easier to read. Thanks! Hope this helps! It helped me.
std::string get_last_word(std::string s) {
auto index = s.find_last_of(' ');
std::string last_word = s.substr(++index);
return last_word;
}
/**
* Here I have edited the above function in accordance with
* the recommendations.
* @param s is a const reference to a std::string
* @return the substring directly
*/
std::string get_last_word(const std::string& s) {
auto index = s.find_last_of(' ');
return s.substr(++index);
}
The other answers tell you what's wrong, though you should also know why it's wrong.
In general, you should be very careful about using unsigned value types in loop conditions. Comparing an unsigned type like std::size_t and a signed type, like your constant -1, will cause the signed to get converted into an unsigned type, so -1 becomes the largest possible std::size_t value.
If you put some print statements throughout your code, you'll notice that your loops are never actually entered, because the conditional is always false. Use a signed type such as int when performing arithmetic, and especially when the value is compared against signed numbers.
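To make the conversion concrete, here is a small sketch. The static_assert shows what -1 turns into, and the function shows one way to scan backwards with an unsigned index without ever comparing against -1. The name last_word_reverse_scan is invented for this example, and it assumes a "word" is a run of alphabetic characters, as in the question.

```cpp
#include <cctype>
#include <cstddef>
#include <limits>
#include <string>

// -1 converted to std::size_t is the largest representable value,
// so `i > -1` can never be true for a std::size_t i.
static_assert(static_cast<std::size_t>(-1) ==
                  std::numeric_limits<std::size_t>::max(),
              "-1 wraps around to the maximum size_t");

// Hypothetical helper: returns the last run of alphabetic characters in text.
std::string last_word_reverse_scan(const std::string& text) {
    std::size_t end = text.size();
    // Skip trailing non-letters; test `end > 0` before looking at end - 1.
    while (end > 0 && !std::isalpha(static_cast<unsigned char>(text[end - 1])))
        --end;
    std::size_t begin = end;
    // Extend backwards over the letters of the last word.
    while (begin > 0 && std::isalpha(static_cast<unsigned char>(text[begin - 1])))
        --begin;
    return text.substr(begin, end - begin);
}
```

Checking `index > 0` before reading `index - 1` keeps everything unsigned and avoids the wraparound entirely.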

C++ - Get the "difference" of 2 strings like git

I'm currently working on a project which includes a Win32 console program on my Windows 10 PC and an app for my Windows 10 Mobile Phone. It's about controlling the master and audio session volumes on my PC over the app on my Windows Phone.
The "little" problem I have right now is to get the "difference" between 2 strings.
Let's take these 2 strings for example:
std::string oldVolumes = "MASTER:50:SYSTEM:50:STEAM:100:UPLAY:100";
std::string newVolumes = "MASTER:30:SYSTEM:50:STEAM:100:ROCKETLEAGUE:80:CHROME:100";
Now I want to compare these 2 strings. Let's say I explode each string to a vector with the ":" as delimiter (I have a function named explode to cut the given string by the delimiter and write the string before into a vector).
Good enough. But as you can see, in the old string there's UPLAY with the value 100, but it's missing in the new string. Also, there are 2 new values (RocketLeague and Chrome), which are missing in the old one. But not only the "audio sessions/names" are different, the values are different too.
What I want now is for each session, which is in both strings (like master and system), to compare the values and if the new value is different to the old one, I want to append this change into another string, like:
std::string volumeChanges = "MASTER:30"; // Cause Master is changed, System not
If there's a session in the old string, but not in the new one, I want to append:
std::string volumeChanges = "MASTER:30:REMOVE:UPLAY";
If there's a session in the new one, which is missing in the old string, I want to append it like that:
std::string volumeChanges = "MASTER:30:REMOVE:UPLAY:ADD:ROCKETLEAGUE:ROCKETLEAGUE:80:ADD:CHROME:CHROME:100";
The volumeChanges string is just to show you, what I need. I'll try to make a better one afterwards.
Do you have any ideas of how to implement such a comparison? I don't need a specific code example or something, just some ideas of how I could do that in theory. It's like GIT at least. If you make changes in a text file, you see in red the deleted text and in green the added one. Something similar to this, just with strings or vectors of strings.
Let's say I explode each string to a vector with the ":" as delimiter (I have a function named explode to cut the given string by the delimiter and write the string before into a vector).
I'm going to advise you further extend that logic to separate them into property objects that discretely maintain a name + value:
struct property {
std::string name;
int32_t value; // fixed-width integer from <cstdint>
bool same_name(property const& o) const {
return name == o.name;
}
bool same_value(property const& o) const {
return value == o.value;
}
bool operator==(property const& o) const {
return same_name(o) && same_value(o);
}
bool operator<(property const& o) const {
if(!same_name(o)) return name < o.name;
else return value < o.value;
}
};
This will dramatically simplify the logic needed to work out which properties were changed/added/removed.
The logic for "tokenizing" this kind of string isn't too difficult:
std::set<property> tokenify(std::string input) {
bool finding_name = true;
property prop;
std::set<property> properties;
while (input.size() > 0) {
auto colon_index = input.find(':');
if (finding_name) {
prop.name = input.substr(0, colon_index);
finding_name = false;
}
else {
prop.value = std::stoi(input.substr(0, colon_index));
finding_name = true;
properties.insert(prop);
}
if(colon_index == std::string::npos)
break;
else
input = input.substr(colon_index + 1);
}
return properties;
}
Then, the function to get the difference:
std::string get_diff_string(std::string const& old_props, std::string const& new_props) {
std::set<property> old_properties = tokenify(old_props);
std::set<property> new_properties = tokenify(new_props);
std::string output;
//We first scan for properties that were either removed or changed
for (property const& old_property : old_properties) {
auto predicate = [&](property const& p) {
return old_property.same_name(p);
};
auto it = std::find_if(new_properties.begin(), new_properties.end(), predicate);
if (it == new_properties.end()) {
//We didn't find the property, so we need to indicate it was removed
output.append("REMOVE:" + old_property.name + ':');
}
else if (!it->same_value(old_property)) {
//Found the property, but the value changed.
output.append(it->name + ':' + std::to_string(it->value) + ':');
}
}
//Finally, we need to see which were added.
for (property const& new_property : new_properties) {
auto predicate = [&](property const& p) {
return new_property.same_name(p);
};
auto it = std::find_if(old_properties.begin(), old_properties.end(), predicate);
if (it == old_properties.end()) {
//We didn't find the property, so we need to indicate it was added
output.append("ADD:" + new_property.name + ':' + new_property.name + ':' + std::to_string(new_property.value) + ':');
}
//The previous loop detects changes, so we don't need to bother here.
}
if (output.size() > 0)
output = output.substr(0, output.size() - 1); //Trim off the last colon
return output;
}
And we can demonstrate that it's working with a simple main function:
int main() {
std::string diff_string = get_diff_string("MASTER:50:SYSTEM:50:STEAM:100:UPLAY:100", "MASTER:30:SYSTEM:50:STEAM:100:ROCKETLEAGUE:80:CHROME:100");
std::cout << "Diff String was \"" << diff_string << '\"' << std::endl;
}
Which yields an output (according to IDEONE.com):
Diff String was "MASTER:30:REMOVE:UPLAY:ADD:CHROME:CHROME:100:ADD:ROCKETLEAGUE:ROCKETLEAGUE:80"
Which, although the contents are in a slightly different order than your example, still contains all the correct information. The contents are in different order because std::set implicitly sorted the attributes by name when tokenizing the properties; if you want to disable that sorting, you'd need to use a different data structure which preserves entry order. I chose it because it eliminates duplicates, which could cause odd behavior otherwise.
In this particular instance, you could do it as follows:
Split the old and new strings by the delimiter, and store the results in a vector.
Loop over the vector with the old data. Look for each word in the vector with new data: e.g. find("MASTER").
If not found add "REMOVE:MASTER" to your results.
If found, compare the numbers and add it to the results if it has been changed.
The added string can be found by looping over the new string and searching for the words in the old string.
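Roughly, those steps could look like the sketch below. It swaps the plain vectors for a std::map from session name to value, which makes the "look for each word" step a simple find. The names parse_sessions and diff_sessions are invented here, and the sketch assumes the input is always well-formed NAME:VALUE pairs.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse "NAME:VALUE:NAME:VALUE" into name -> value.
std::map<std::string, std::string> parse_sessions(const std::string& s) {
    std::map<std::string, std::string> out;
    std::istringstream in(s);
    std::string name, value;
    // Read one name and one value per iteration; stop at the first failure.
    while (std::getline(in, name, ':') && std::getline(in, value, ':'))
        out[name] = value;
    return out;
}

// Hypothetical helper: list the changes needed to get from old_s to new_s.
std::vector<std::string> diff_sessions(const std::string& old_s,
                                       const std::string& new_s) {
    const std::map<std::string, std::string> old_map = parse_sessions(old_s);
    const std::map<std::string, std::string> new_map = parse_sessions(new_s);
    std::vector<std::string> changes;
    // Pass 1: sessions from the old data that were removed or changed.
    for (const auto& kv : old_map) {
        const auto it = new_map.find(kv.first);
        if (it == new_map.end())
            changes.push_back("REMOVE:" + kv.first);
        else if (it->second != kv.second)
            changes.push_back(kv.first + ":" + it->second);
    }
    // Pass 2: sessions that exist only in the new data.
    for (const auto& kv : new_map) {
        if (old_map.find(kv.first) == old_map.end())
            changes.push_back("ADD:" + kv.first + ":" + kv.first + ":" + kv.second);
    }
    return changes;
}
```

Because std::map iterates in key order, the entries come out sorted by name rather than in their original order.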
I suggest that you enumerate some features (in your case, for example: UPLAY is present, REMOVE is present, ...)
and for each of them assign a weight that applies if the two strings differ in that feature.
At the end, sum up the weights of the features present in one string and absent in the other to get a number.
This number should represent what you are looking for.
You can adjust weights until you are satisfied with the result.
Maybe my answer will give you some new thoughts. In fact, by tweaking the current code, you can find all the missing words.
std::vector<std::string> splitString(const std::string& str, const char delim)
{
std::vector<std::string> out;
std::stringstream ss(str);
std::string s;
while (std::getline(ss, s, delim)) {
out.push_back(s);
}
return out;
}
std::vector<std::string> missingWords(const std::string& first, const std::string& second)
{
std::vector<std::string> missing;
const auto firstWords = splitString(first, ' ');
const auto secWords = splitString(second, ' ');
size_t i = 0, j = 0;
for(; i < firstWords.size();){
auto findSameWord = std::find(secWords.begin() + j, secWords.end(), firstWords[i]);
if(findSameWord == secWords.end()) {
missing.push_back(firstWords[i]);
j++;
} else {
j = distance(secWords.begin(), findSameWord);
}
i++;
}
return missing;
}

String compression (Interview prepare)

I need to compress a string. I can assume that each character in the string doesn't appear more than 255 times. I need to return the compressed string and its length.
For the last 2 years I worked with C# and forgot C++. I will be glad to hear your comments about the code, the algorithm and C++ programming practices.
// StringCompressor.h
class StringCompressor
{
public:
StringCompressor();
~StringCompressor();
unsigned long Compress(string str, string* strCompressedPtr);
string DeCompress(string strCompressed);
private:
string m_StrCompressed;
static const char c_MaxLen;
};
// StringCompressor.cpp
#include "StringCompressor.h"
const char StringCompressor::c_MaxLen = 255;
StringCompressor::StringCompressor()
{
}
StringCompressor::~StringCompressor()
{
}
unsigned long StringCompressor::Compress(string str, string* strCompressedPtr)
{
if (str.empty())
{
return 0;
}
char currentChar = str[0];
char count = 1;
for (string::iterator it = str.begin() + 1; it != str.end(); ++it)
{
if (*it == currentChar)
{
count++;
if (count == c_MaxLen)
{
return -1;
}
}
else
{
m_StrCompressed+=currentChar;
m_StrCompressed+=count;
currentChar = *it;
count = 1;
}
}
m_StrCompressed += currentChar;
m_StrCompressed += count;
*strCompressedPtr = m_StrCompressed;
return m_StrCompressed.length();
}
string StringCompressor::DeCompress(string strCompressed)
{
string res;
if (strCompressed.length() % 2 != 0)
{
return res;
}
for (string::iterator it = strCompressed.begin(); it != strCompressed.end(); it+=2)
{
char dup = *(it + 1);
res += string(dup, *it);
}
return res;
}
There are many possible improvements:
Do not return -1 from a function whose return type is unsigned long.
Consider using size_t or ssize_t to represent sizes.
Learn const.
m_StrCompressed is left in a bogus state if Compress is called repeatedly. Since that member cannot be reused, you may as well make the function static.
Compressed data should generally be treated as a byte buffer, not a string. Redesign your interface.
Comments! Nobody knows you are doing RLE (run-length encoding) here.
Bonus: add a fallback mechanism for when your compression yields a larger result, e.g. a flag to denote an uncompressed buffer, or just return failure.
I assume efficiency is not a major concern here.
A few things:
I'm all for using classes, and perhaps you could do that here in a way that makes more sense. But given the scope of what you are trying to do, this here would be better off as two functions. One for compression, one for decompression. For instance, why are you storing the string in the class as an object and never using it? How does grouping this as a class actually enhance the functionality or make it more reusable?
You should pass your compressed string return as a reference instead of a pointer.
It looks like you are trying to count the number of times characters are repeated in a row and save that. For most common strings this will make the size of your compressed string larger than uncompressed as it takes two bytes to store each non-repeated character.
There are many possible characters, but only two kinds of bits. If you used this method to group repeated bits instead, you'd be more successful (and that's actually one simple method of lossless compression).
If you are allowed, just use a library like zlib to do compression of arbitrary data types.
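To illustrate several of the points above (free functions instead of a class, const references, a byte buffer for the compressed data), here is one possible sketch of run-length encoding. The names rle_compress and rle_decompress are made up. Unlike the original code, this version splits runs longer than 255 into several pairs instead of reporting an error, which is a design choice of the sketch and not something the question requires.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Run-length encode input as (character, count) byte pairs, count in 1..255.
std::vector<unsigned char> rle_compress(const std::string& input) {
    std::vector<unsigned char> out;
    std::size_t i = 0;
    while (i < input.size()) {
        const char ch = input[i];
        std::size_t run = 1;
        // Extend the run while the next character matches and the count fits in a byte.
        while (i + run < input.size() && input[i + run] == ch && run < 255)
            ++run;
        out.push_back(static_cast<unsigned char>(ch));
        out.push_back(static_cast<unsigned char>(run));
        i += run;
    }
    return out;
}

// Expand (character, count) pairs back into the original string.
// A trailing unpaired byte, if any, is ignored.
std::string rle_decompress(const std::vector<unsigned char>& data) {
    std::string out;
    for (std::size_t i = 0; i + 1 < data.size(); i += 2)
        out.append(static_cast<std::size_t>(data[i + 1]),
                   static_cast<char>(data[i]));
    return out;
}
```

The compressed size is simply the size of the returned vector, so no separate length result or output parameter is needed.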

Checking if a word is contained within an array

I want to check for a word contained within a bigger string, but not necessarily in the same order. Example: The program will check if the word "car" exists in "crqijfnsa". In this case, it does, because the second string contains c, a, and r.
You could build a map containing the letters of "car" with the values set to 0. Cycle through the array with all the letters and, if a letter is in the word "car", change its value to 1. If all the keys in the map have a value greater than 0, then the word can be constructed. Try implementing this.
An anagram is a type of word play, the result of rearranging the letters of a word or phrase to produce a new word or phrase, using all the original letters exactly once;
So, actually what you are looking for is an algorithm to check whether two words are "anagrams" or not.
The following thread provides pseudocode that might be helpful:
Finding anagrams for a given word
A very primitive code would be something like this:
for ( std::string::iterator it=str.begin(); it!=str.end(); ++it)
for ( std::string::iterator it2=str2.begin(); it2!=str2.end(); ++it2) {
if (*it == *it2) {
str2.erase(it2); // erase the matched letter from str2, using str2's own iterator
break;
}
}
if (str2.empty())
found = true;
You could build up a table of count of characters of each letter in the word you are searching for, then decrement those counts as you work through the search string.
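#include <array>
#include <numeric>

bool IsWordInString(const char* word, const char* str)
{
    // build up table of characters in word to match
    std::array<int, 256> cword = {0};
    for(;*word;++word) {
        // index as unsigned char: plain char may be signed, giving a negative index
        cword[static_cast<unsigned char>(*word)]++;
    }
    // work through str matching characters in word
    for(;*str; ++str) {
        if (cword[static_cast<unsigned char>(*str)] > 0) {
            cword[static_cast<unsigned char>(*str)]--;
        }
    }
    // every needed character was matched if no counts remain
    return std::accumulate(cword.begin(), cword.end(), 0) == 0;
}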
It's also possible to return as soon as you find a match, but the code isn't as simple.
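bool IsWordInString(const char* word, const char* str)
{
    // an empty word always matches
    if (*word == 0)
        return true;
    // build up table of characters in word to match
    int unmatched = 0;
    int cword[256] = {0}; // int counts, so many repeats of a letter cannot overflow
    for(;*word;++word) {
        // index as unsigned char: plain char may be signed, giving a negative index
        cword[static_cast<unsigned char>(*word)]++;
        unmatched++;
    }
    // work through str matching characters in word
    for(;*str; ++str) {
        if (cword[static_cast<unsigned char>(*str)] > 0) {
            cword[static_cast<unsigned char>(*str)]--;
            unmatched--;
            // stop as soon as every character of word has been matched
            if (unmatched == 0)
                return true;
        }
    }
    return false;
}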
Some test cases
"" in "crqijfnsa" => 1
"car" in "crqijfnsa" => 1
"ccar" in "crqijfnsa" => 0
"ccar" in "crqijfnsac" => 1
I think the easiest (and probably fastest, test that yourself :) ) implementation would be done with std::includes:
std::string testword {"car"};
std::string testarray {"crqijfnsa"};
std::sort(testword.begin(),testword.end());
std::sort(testarray.begin(),testarray.end());
bool is_in_array = std::includes(testarray.begin(),testarray.end(),
testword.begin(),testword.end());
This also handles all cases of duplicate letters correctly.
The complexity of this approach should be O(n * log n), where n is the length of testarray (sort is O(n log n) and includes has linear complexity).

skipping a character in an array if previous character is the same

I'm iterating through an array of chars to do some manipulation. I want to "skip" an iteration if there are two adjacent characters that are the same.
e.g. x112abbca
skip----------^
I have some code but it's not elegant and was wondering if anyone can think of a better way? I have a few cases in the switch statement and would be happy if I didn't have to use an if statement inside the switch.
switch(ent->d_name[i])
{
if(i > 0 && ent->d_name[i] == ent->d_name[i-1])
continue;
case ' ' :
...//code omitted
case '-' :
...
}
By the way, an instructor once told me "avoid continues unless much code is required to replace them". Does anyone second that? (Actually he said the same about breaks)
Put the if outside the switch.
While I don't have anything against using continue and break, you can certainly bypass them this time without much code at all: simply invert the condition and put the whole switch statement within the if-block.
Answering the rectified question: what's clean depends on many factors. How long is this list of characters to consider: should you iterate over them yourself, or perhaps use a utility function from <algorithm>? In any case, if you are referring to the same character multiple times, perhaps you ought to give it an alias:
std::string interesting_chars("-_;,.abc");
// ...
char prev = '\0';
for (i...) {
    char cur = abc->def[i];
    if (cur != prev || interesting_chars.find(cur) == std::string::npos)
        switch (cur) { /* ... */ }
    prev = cur; // remember this character for the next iteration
}
char chr = '\0';
char *cur = &ent->d_name[0];
while (*cur != '\0') {
if (chr != *cur) {
switch(...) {
}
}
chr = *cur++;
}
If you can clobber the content of the array you are analyzing, you can preprocess it with std::unique():
ent->d_name.erase(std::unique(ent->d_name.begin(), ent->d_name.end()), ent->d_name.end());
This should replace every run of identical characters with a single copy and shorten the string appropriately (it assumes d_name is a std::string). If you can't clobber the string itself, you can create a copy in which each run of identical characters is reduced to a single character:
std::string tmp;
std::unique_copy(ent->d_name.begin(), ent->d_name.end(), std::back_inserter(tmp));
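As a self-contained illustration of the copying variant, assuming the name is available as a std::string (the function name collapse_runs is made up for this example):

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical helper: copy name, keeping one character per run of equal neighbours.
std::string collapse_runs(const std::string& name) {
    std::string out;
    // unique_copy skips every element that equals the one just before it.
    std::unique_copy(name.begin(), name.end(), std::back_inserter(out));
    return out;
}
```

For the example from the question, collapse_runs("x112abbca") gives "x12abca", and the original string is left untouched.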
In case you are using C-strings: use std::string instead. If you insist on using C-strings and don't want to play with std::unique(), a nicer approach than yours is to keep track of the previous character, initialized to 0 (this can't be part of a C-string, after all):
char previous(0);
for (size_t i(0); ent->d_name[i]; ++i) {
if (ent->d_name[i] != previous) {
switch (previous = ent->d_name[i]) {
...
}
}
}
I hope I understand what you are trying to do, anyway this will find matching pairs and skip over a match.
char c_anotherValue[] = "Hello World!";
int i_len = strlen(c_anotherValue);
for(int i = 0; i < i_len-1;i++)
{
if(c_anotherValue[i] == c_anotherValue[i+1])
{
printf("%c%c",c_anotherValue[i],c_anotherValue[i+1]);
i++;//this will force the loop to skip
}
}