Difficulty writing run length encoder in C++

I'm trying to write this run-length encoder. It basically works, but it is not passing the test cases because of a trailing '\0'.
Code
std::string run_length_encode(const std::string& str)
{
std::string encoded = "";
char prevch;
char newch;
int count = 1;
prevch = str[0];
for (int i = 0; i <= str.length(); i++)
{
newch = str[i];
if (prevch == newch)
{
count++;
}
else
{
encoded += prevch;
if (count > 1)
{
encoded += std::to_string(count);
}
prevch = newch;
count = 1;
}
}
if (prevch == newch)
{
encoded += newch;
if (count > 1)
{
encoded += std::to_string(count);
}
}
return encoded;
}
Error message:
Expected equality of these values:
run_length_encode("A")
Which is: "A\0"
"A"
Answer should be A but my code returns A\0.

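for (int i = 0; i <= str.length(); i++)
should be
for (int i = 0; i < str.length(); i++)
In C++, string indexes start at zero and end one before the length of the string. For a std::string, str[str.length()] yields the null character '\0', so the <= loop treats that terminator as one more character to encode. Note also that, as posted, the loop starts at i = 0 while prevch already holds str[0] and count already starts at 1, so the first run is counted one too many. Starting the loop at i = 1 and appending the final run unconditionally after the loop avoids that.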
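A minimal sketch of the encoder with both changes applied, assuming the intended format is the one in the test case (a count is written only when a run is longer than one character). The empty-string guard is an addition that is not in the question:

```cpp
#include <string>

// Run-length encode: "AAB" -> "A2B", "A" -> "A".
std::string run_length_encode(const std::string& str)
{
    std::string encoded;
    if (str.empty()) {
        return encoded; // nothing to encode (guard added for this sketch)
    }

    char prevch = str[0];
    int count = 1; // str[0] is already counted, so the loop starts at 1

    for (std::string::size_type i = 1; i < str.length(); ++i) {
        if (str[i] == prevch) {
            ++count;
        } else {
            // The run ended: flush it and start a new one.
            encoded += prevch;
            if (count > 1) {
                encoded += std::to_string(count);
            }
            prevch = str[i];
            count = 1;
        }
    }

    // Flush the final run; there is always exactly one pending here.
    encoded += prevch;
    if (count > 1) {
        encoded += std::to_string(count);
    }
    return encoded;
}
```

Because the loop never reads str[str.length()], no '\0' can end up in the result.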

Related

Function to count the frequency of each word in a string using two parallel arrays

I am trying to count the frequency of each word in a given string using two arrays and WITHOUT using Maps or Vectors.
One array to store the words and the other to count the frequency of each word I believe.
I have been teaching myself C++ in my off time and this problem has given me more trouble than I'd like to admit and I've been stuck on it. Maps and Vectors are easier to me, but the problem says specifically not to use them.
This is the code where I used a map to store the pairs, but now I need two arrays to do basically the same thing.
void wordCounter(string str, string wordsArray[], int countArray[]){
map<string, int> passage;
string word = "";
for (int i = 0; i < str.size(); i++) {
if (str[i] == ' '){
if(passage.find(word) == passage.end()){
passage.insert(make_pair(word, 1));
word = "";
}else{
passage[word]++;
word = "";
}
}else
word += str[i];
}
if(passage.find(word) == passage.end())
passage.insert(make_pair(word, 1));
else
passage[word]++;
for(auto& it : passage) {
cout << it.first << " - " << it.second << endl;
}
}
The output would be something like:
thisword - 2
thatword - 3
anotherword - 1
etc.
// Needs <iostream> and <string>. capacity is an added parameter: a raw array
// argument does not carry its length, so the caller must say how many slots exist.
void wordCounter(const string& str, string wordsArray[], int countArray[], int capacity) {
    int used = 0; // number of distinct words stored so far
    string word = "";
    for (size_t i = 0; i <= str.size(); i++) {
        if (i == str.size() || str[i] == ' ') { // a space or the end of input closes a word
            if (!word.empty()) {
                // linear search for the word among the ones already stored
                int spot = 0;
                while (spot < used && wordsArray[spot] != word)
                    spot++;
                if (spot < used) {
                    countArray[spot]++; // word seen before
                } else if (used < capacity) {
                    wordsArray[used] = word; // new word: append to both arrays
                    countArray[used] = 1;
                    used++;
                }
            }
            word = "";
        } else
            word += str[i];
    }
    // print the two arrays in parallel
    for (int k = 0; k < used; k++) {
        cout << wordsArray[k] << " - " << countArray[k] << endl;
    }
}
I haven't used C++ in a little while, but essentially: if you can't find the word in wordsArray, append it to the end of wordsArray and append 1 to the end of countArray.

Does going once through a string to count its length take longer than moving the string a couple of times?

I wrote the following two functions. In the second function, I used reserve() so that there is no memory reallocation, but unfortunately the second function is slower than the first.
I built in release mode and used the CPU profiler in Visual Studio to measure the time. In the second function, reallocation takes place 33 times. So my question is: really? Does going through the string once to count the length take longer than moving the string 33 times?
string commpres2(string str)
{
string strOut;
int count = 0;
for (int i = 0; i < str.length(); ++i)
{
++count;
if (i < str.length() - 1)
{
if (str[i + 1] != str[i])
{
strOut += str[i];
strOut += to_string(count);
count = 0;
}
}
else
{
strOut += str[i] + to_string(count);
}
}
return strOut.length() < str.length() ? strOut : str;
}
string commpres3(string str)
{
int compressedLength = 0;
int countConsecutive = 0;
for (int i = 0; i < str.length(); ++i)
{
++countConsecutive;
if (i + 1 >= str.length() || str[i] != str[i + 1])
{
compressedLength += 1 +
to_string(countConsecutive).length();
countConsecutive = 0;
}
}
if (compressedLength >= str.length())
return str;
string strOut;
strOut.reserve(compressedLength);
int count = 0;
for (int i = 0; i < str.length(); ++i)
{
++count;
if (i < str.length() - 1)
{
if (str[i + 1] != str[i])
{
strOut += str[i];
strOut += to_string(count);
count = 0;
}
}
else
{
strOut += str[i] + to_string(count);
}
}
return strOut;
}
int main()
{
string str = "aabcccccaaa";
//str.size ~ 11000000;
for (int i = 0; i < 20; ++i)
str += str;
commpres2(str); //107ms //30,32% CPU
commpres3(str); //147ms //42,58% CPU
}
The 2nd function is doing more work than the 1st function, so of course it is going to take longer. Profiling the code should have shown you exactly where the code is spending its time. For instance, the 1st function loops through the str at most 1 time, but the 2nd function may loop through the same str 2 times, which by definition takes longer.
And you haven't eliminated all memory allocations from the 2nd function, either. to_string() allocates memory, and you are calling it many times before and after calling reserve(). Eliminating all of the to_string() allocations is fairly simple, using std::snprintf() into a local buffer and then std::string::append() to add that buffer to your output std::string.
You could forgo all of the pre-calculating and just reserve() the full str length, even if you don't end up using all of that memory. You are not going to use up more than the original str length in the worst-case scenario (no compression possible at all):
inline int to_buffer(size_t number, char *buf, size_t bufsize)
{
return snprintf(buf, bufsize, "%zu", number);
}
string commpres3(const string &str)
{
string::size_type strLen = str.length();
string strOut;
strOut.reserve(strLen);
size_t count = 0;
char buf[25];
for (string::size_type i = 0; i < strLen; ++i)
{
++count;
if (i < strLen - 1)
{
if (str[i + 1] != str[i])
{
strOut += str[i];
strOut.append(buf, to_buffer(count, buf, sizeof(buf)));
count = 0;
}
}
else
{
strOut += str[i];
strOut.append(buf, to_buffer(count, buf, sizeof(buf)));
}
if (strOut.length() >= strLen)
return str;
}
return strOut;
}
Or, if you must pre-calculate, you can replace the 1st set of to_string() calls with something else that returns the needed length without allocating memory dynamically. When calculating the size to reserve, you don't need to actually convert an integer 123 to an allocated string "123" to know that it would take up 3 chars.
inline int to_buffer(size_t number, char *buf, size_t bufsize)
{
return snprintf(buf, bufsize, "%zu", number);
}
inline int to_buffer_length(size_t number)
{
return to_buffer(number, nullptr, 0);
}
string commpres3(const string &str)
{
string::size_type strLen = str.length();
string::size_type compressedLength = 0;
size_t countConsecutive = 0;
for (string::size_type i = 0; i < strLen; ++i)
{
++countConsecutive;
if (i < (strLen - 1))
{
if (str[i + 1] != str[i])
{
compressedLength += 1 + to_buffer_length(countConsecutive);
countConsecutive = 0;
}
}
else
{
compressedLength += 1 + to_buffer_length(countConsecutive);
}
}
if (compressedLength >= strLen)
return str;
string strOut;
strOut.reserve(compressedLength);
size_t count = 0;
char buf[25];
for (string::size_type i = 0; i < strLen; ++i)
{
++count;
if (i < strLen - 1)
{
if (str[i + 1] != str[i])
{
strOut += str[i];
strOut.append(buf, to_buffer(count, buf, sizeof(buf)));
count = 0;
}
}
else
{
strOut += str[i];
strOut.append(buf, to_buffer(count, buf, sizeof(buf)));
}
}
return strOut;
}
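The digit count can also be obtained without any formatting call at all. The sketch below is one possible way, using repeated division; count_digits is a hypothetical helper name that does not appear in the answer above:

```cpp
#include <cstddef>

// Number of decimal digits needed to print `number` (0 needs one digit).
std::size_t count_digits(std::size_t number)
{
    std::size_t digits = 1;
    while (number >= 10) {
        number /= 10;
        ++digits;
    }
    return digits;
}
```

In the first loop, 1 + count_digits(countConsecutive) would then play the same role as 1 + to_buffer_length(countConsecutive).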
33 memory allocations vs ~11000000 extra if statements.
You are doing if (i < str.length() - 1) check in every iteration but you need to do it only once.
Consider the following:
if (str.empty()) return str;
const auto last = str.length() - 1;
for (size_t i = 0; i < last; ++i)
{
++count;
if (str[i + 1] != str[i])
{
strOut += str[i];
strOut += to_string(count);
count = 0;
}
}
strOut += str[last] + to_string(count + 1); // +1: the loop above never counted the last character
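Put together as a complete function, the hoisted check might look like the sketch below. It assumes the same output rule as commpres2 (return the original string unless the compressed one is shorter); compress_hoisted is an illustrative name. The last run is written with count + 1 because the loop stops before the final character:

```cpp
#include <cstddef>
#include <string>

std::string compress_hoisted(const std::string& str)
{
    if (str.empty()) {
        return str;
    }

    std::string strOut;
    std::size_t count = 0;
    const std::size_t last = str.length() - 1;

    // Every index in this loop has a successor, so no per-iteration bounds check.
    for (std::size_t i = 0; i < last; ++i) {
        ++count;
        if (str[i + 1] != str[i]) {
            strOut += str[i];
            strOut += std::to_string(count);
            count = 0;
        }
    }

    // The final character closes the last run and has not been counted yet.
    strOut += str[last];
    strOut += std::to_string(count + 1);

    return strOut.length() < str.length() ? strOut : str;
}
```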
Some optimization hints:
You can avoid adding count if it equals one. Otherwise, your algorithm "compresses" "abc" to "a1b1c1".
Add an indicator that the following byte is a count not a regular character to distinguish between "a5" and "aaaaa". For instance, use 0xFF. Hence, "a5" gets encoded to "a5", but "aaaaa" -> {'a', 0xFF, 5}
Store count in binary form, not ASCII. For instance, you can write 3 (0x03) instead of '3' (0x33). You can use one byte to store count up to 255.
constexpr char COMPRESS_COUNT_SEPARATOR = 0xFF;
string compress(const string &str)
{
string strOut;
if (str.empty()) return strOut;
unsigned char count = 0;
const auto last = str.length() - 1;
for (size_t i = 0; i < last; ++i)
{
++count;
if (str[i + 1] != str[i] || count == 255)
{
strOut += str[i];
if (count > 1) {
strOut += COMPRESS_COUNT_SEPARATOR;
strOut += static_cast<char>(count);
}
count = 0;
}
}
strOut += str[last];
if (count) {
strOut += COMPRESS_COUNT_SEPARATOR;
strOut += static_cast<char>(count+1);
}
return strOut;
}
Or you can even use 0x00 as COMPRESS_COUNT_SEPARATOR because C-strings cannot contain null terminators but std::string can.
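For completeness, a matching decoder could look roughly like this. It is only a sketch: decompress and kCountSeparator are names made up here, and it assumes the original text never contains the marker byte 0xFF itself, since the format above cannot tell such a byte apart from a marker.

```cpp
#include <cstddef>
#include <string>

// Same marker byte as COMPRESS_COUNT_SEPARATOR in the encoder.
constexpr char kCountSeparator = static_cast<char>(0xFF);

// Expands {'a', 0xFF, 5} back to "aaaaa"; plain characters are copied as-is.
std::string decompress(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char c = in[i];
        // A character followed by the marker and one count byte encodes a run.
        if (i + 2 < in.size() && in[i + 1] == kCountSeparator) {
            const auto count = static_cast<unsigned char>(in[i + 2]);
            out.append(static_cast<std::size_t>(count), c);
            i += 2; // skip the marker and the count byte
        } else {
            out += c;
        }
    }
    return out;
}
```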

How to capitalize and title a string

If I was given a string, I need to either capitalize or title the string. For example:
There are some problems with your code. Here is the modified version of your code that works fine. Here:
std::string Capitalize(const std::string &str) {
std::string Ret;
for (int i = 0; i < str.length(); i++){
char c = str[i];
if (i == 0){
Ret += toupper(c);
}
else if (i != 0){
Ret += (tolower(c));
}
}
return Ret;}
The condition in the for loop needs to be str.length(), not Ret.length(). And here:
std::string Title(const std::string &str) {
std::string Ret;
int i=0;
for (int i=0;i<str.size();i++) {
if(!(i==0 && str[i]==' '))
Ret += tolower(str[i]);
}
int size = Ret.length();
for (int i = 0; i < size; i++) {
if (i==0 || Ret[i - 1] == ' ')
{
Ret[i] = toupper(Ret[i]);
}
}
return Ret;}
Check if i is 0 to prevent out of range access to string.
Use a stringstream to first split all words, so that you can do this easily with a vector. This is an implementation of the Title function:
std::string Title(const std::string &str) {
std::vector<std::string> words;
std::string res = str, ans = "";
// It's better to pass the string AFTER you convert it all lowercase. Or you can only work with the capitalized characters:
for(int i = 0; i < res.size(); ++i){
if(res[i] >= 'A' && res[i] <= 'Z'){
res[i] = tolower(res[i]);
}
}
std::istringstream ss(res); // We push the modified string into a stringstream (needs <sstream>).
while(ss >> res){
words.push_back(res); // operator>> splits on whitespace; each word is pushed into the vector.
}
for(int i = 0; i < words.size(); ++i){
res = words[i];
res[0] = toupper(res[0]); // For each word, we capitalize it
ans += res; // We add the word to our return string.
if(i < words.size() - 1){
ans += " "; // If this is not the last word, add a space
}
}
return ans;
}
As for the capitalization, you can do something like this:
std::string Capitalize(const std::string &str){
std::string res = str;
res[0] = toupper(res[0]);
for(int i = 1; i < res.size(); ++i){
if(res[i] >= 'A' && res[i] <= 'Z'){
res[i] = tolower(res[i]); // converting if only an uppercase character.
}
}
return res; // res is a copy, so the caller's string is left unchanged and the result must be returned.
}

C++ string compress

I wrote a program to compress a string using the counts of repeated characters. If the compressed string is longer than the original string, then we still return the original string. Below is my program:
void stringCompress(char* src) {
char* original;
original = src;
char* rst;
rst = src;
int histogram[256];
for (int i = 0; i < 256; i++) {
histogram[i] = 0;
}
int length = 0;
while (*src != NULL) {
length++;
src++;
}
src = original;
int j = 0;
for (int i = 0; i < length; i++) {
histogram[(int) src[i]]++;
if (histogram[(int) src[i]] == 1) {
rst[j] = src[i];
j++;
}
}
rst[j] = '\0';
char* final;
rst = original;
int index = 0;
char buffer[33];
for (int i = 0; i < j; i++) {
final[index] = rst[i];
stringstream number;
number<<histogram[(int)rst[i]];
-------> //cout<<number.str()<<endl;
char* temp = new char[number.str().length()+1];
strcpy(temp, number.str().c_str());
index++;
cout<<temp<<endl;
for(int k =0 ;k<number.str().length();k++)
{
final[index]=temp[k];
index++;
}
}
final[index] = '\0';
src = original;
if (index <= length) {
for (int i = 0; i < index; i++)
cout<<final[i];
} else {
cout << src << endl;
}
}
But strange thing is that if I leave the cout sentence cout<<number.str()<<endl; there (the arrow points to the sentence), then the output is right. For example, aaaabcdaa outputs a6b1c1d1 and aabcd outputs aabcd. However if I comment out cout<<number.str()<<endl;, then nothing is generated. Any help is appreciated.
The variable final is uninitialized in your code. When I initialize it with a memory buffer, then your program prints the desired output whether the line you pointed to is commented out or not.
Perhaps you meant to use buffer (which is unused) as memory for final, such as:
final = buffer;
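Note that buffer holds only 33 bytes, so it is only large enough for short inputs. As an alternative sketch, assuming the goal is the per-character totals shown in the question (a6b1c1d1 for aaaabcdaa), letting std::string manage the memory removes the need for final, temp, and buffer altogether. histogram_compress is an illustrative name, not something from the question:

```cpp
#include <string>

// Writes each distinct character once, in order of first appearance,
// followed by its total count. Returns the input if that is not shorter or equal.
std::string histogram_compress(const std::string& src)
{
    int histogram[256] = {0};
    std::string order; // distinct characters, in first-seen order

    for (unsigned char c : src) {
        if (histogram[c]++ == 0) {
            order += static_cast<char>(c);
        }
    }

    std::string out;
    for (unsigned char c : order) {
        out += static_cast<char>(c);
        out += std::to_string(histogram[c]);
    }

    // Same rule as the question: keep the result only if it is not longer.
    return out.size() <= src.size() ? out : src;
}
```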

Finding the largest palindrome in string implementation

I'm trying to solve a problem that asks to find the largest palindrome in a string up to 20,000 characters. I've tried to check every sub string whether it's a palindrome, that worked, but obviously was too slow. After a little googling I found this nice algorithm
http://stevekrenzel.com/articles/longest-palnidrome. I've tried to implement it, however I can't get it to work. Also the given string contains illegal characters, so I have to convert it to only legal characters and output the longest palindrome with all characters.
Here's my attempt:
int len = original.length();
int longest = 0;
string answer;
for (int i = 0; i < len-1; i++){
int lower(0), upper(0);
if (len % 2 == 0){
lower = i;
upper = i+1;
} else {
lower = i;
upper = i;
}
while (lower >= 0 && upper <= len){
string s2 = original.substr(lower,upper-lower+1);
string s = convert(s2);
if (s[0] == s[s.length()-1]){
lower -= 1;
upper += 1;
} else {
if (s.length() > longest){
longest = s.length();
answer = s2;
}
break;
}
}
}
I can't get it to work, I've tried using this exact algorithm on paper and it worked, please help. Here's full code if you need it : http://pastebin.com/sSskr3GY
EDIT:
int longest = 0;
string answer;
string converted = convert(original);
int len = converted.length();
if (len % 2 == 0){
for (int i = 0; i < len - 1; i++){
int lower(i),upper(i+1);
while (lower >= 0 && upper <= len && converted[lower] == converted[upper]){
lower -= 1;
upper += 1;
}
string s = converted.substr(lower+1,upper-lower-1);
if (s.length() > longest){
longest = s.length();
answer = s;
}
}
} else {
for (int i = 0; i < len; i++){
int lower(i), upper(i);
while (lower >= 0 && upper <= len && converted[lower] == converted[upper]){
lower -= 1;
upper += 1;
}
string s = converted.substr(lower+1,upper-lower-1);
if (s.length() > longest){
longest = s.length();
answer = s;
}
}
}
Okay so I fixed the problems, it works perfectly fine but only if the length of converted string is odd. Please help.
I can see two major errors:
Whether you initialise your upper/lower pointers to i,i or i,i+1 depends on the parity of the palindrome's length you want to find, not the original string. So (without any further optimisations) you'll need two separate loops with i going from 0 to len (len-1), one for odd palindrome lengths and another one for even.
The algorithms should be executed on the converted string only. You have to convert the original string first for it to work.
Consider this string: abc^ba (where ^ is an illegal character), the longest palindrome excluding illegal characters is clearly abcba, but when you get to i==2, and move your lower/upper bounds out by one, they will define the bc^ substring, after conversion it becomes bc, and b != c so you concede this palindrome can't be extended.
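A minimal sketch of the two points above, assuming the input has already been reduced to legal characters (the asker's convert step). longest_palindrome is a name chosen here for illustration. Every index is tried as the centre of an odd-length palindrome and as the left centre of an even-length one:

```cpp
#include <string>

// Returns the first longest palindromic substring of s (expand around centre, O(n^2)).
std::string longest_palindrome(const std::string& s)
{
    const int len = static_cast<int>(s.size());
    int bestStart = 0;
    int bestLen = 0;

    // Grow outwards from [lower, upper] while the ends match.
    auto expand = [&](int lower, int upper) {
        while (lower >= 0 && upper < len && s[lower] == s[upper]) {
            --lower;
            ++upper;
        }
        // The loop overshoots by one on each side.
        const int curLen = upper - lower - 1;
        if (curLen > bestLen) {
            bestLen = curLen;
            bestStart = lower + 1;
        }
    };

    for (int i = 0; i < len; ++i) {
        expand(i, i);     // odd-length palindromes centred on i
        expand(i, i + 1); // even-length palindromes centred between i and i + 1
    }
    return s.substr(bestStart, bestLen);
}
```

Mapping the result back to the original string with its illegal characters would additionally need the original index of each kept character, which this sketch leaves out.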
#include <iostream>
using namespace std;
int main()
{
string s;
cin >> s;
signed int i=1;
signed int k=0;
int ml=0;
int mi=0;
bool f=0;
while(i<s.length())
{
if(s[i]!=s[i+1])
{
for(k=1;;k++)
{
if(!((i-k)>=0 && (i+k)<s.length() && s[i-k]==s[i+k])) // check the bounds before indexing
{
break;
}
else if(ml < k)
{
ml=k;
mi=i;
f=1;
}
}
}
i++;
}
i=0;
while(i<s.length())
{
if(s[i]==s[i+1])
{
for(k=1;;k++)
{
if(!((i-k)>=0 && (k+1+i)<s.length() && s[i-k]==s[k+1+i])) // check the bounds before indexing
{
break;
}
else if(ml < k)
{
ml=k;
mi=i;
}
}
}
i++;
}
if(ml < 1)
{
cout << "No palindrome found";
return 0;
}
if(f==0)
{
cout << s.substr(mi-ml,2*ml+2);
}
else
{
cout << s.substr(mi-ml,2*ml+1);
}
return 0;
}
@biziclop: As you said, I used two while loops, one for even-length and one for odd-length palindromes. I was finally able to fix it. Thanks for your suggestion.
public void LongestPalindrome()
{
string str = "abbagdghhkjkjbbbbabaabbbbbba";
StringBuilder str1=new StringBuilder();
StringBuilder str2= new StringBuilder();
for (int i = 0; i < str.Length; i++)
{
str1.Append((str[i]));
for (int j = i + 1; j < str.Length; j++)
{
str1.Append((str[j]));
if (Checkpalindrome(str1))
{
str2.Append(str1);
str2.Append(" ");
}
}
str1.Clear();
}
var Palstr = str2.ToString().Split(' ');
var Longestpal = Palstr.Where(a => a.Length >= (Palstr.Max(y => y.Length)));
foreach (var s in Longestpal)
{
Console.WriteLine(s);
}
}
public bool Checkpalindrome(StringBuilder str)
{
string str1 = str.ToString();
StringBuilder str2=new StringBuilder();
var revstr = str1.Reverse();
foreach (var c in revstr )
{
str2.Append(c);
}
if (str1.Equals(str2.ToString()))
{
return true;
}
return false;
}