How can I transform a string into an abbreviated form?

How can I transform a string into an abbreviated form? - c++

I want to fit strings into a specific width. Example, "Hello world" -> "...world", "Hello...", "He...rld". Do you know where I can find code for that? It's a neat trick, very useful for representing information, and I'd like to add it in my applications (of course).
Edit: Sorry, I forgot to mention the Font part. Not just for fixed width strings but according to the font face.

It's a pretty simple algorithm to write yourself if you can't find it anywhere - the pseudocode would be something like:
if theString.Length > desiredWidth:
theString = theString.Left(desiredWidth-3) + "...";
or if you want the ellipsis at the start of the string, that second line would be:
theString = "..." + theString.Right(desiredWidth-3);
or if you want it in the middle:
theString = theString.Left((desiredWidth-3)/2) + "..." + theString.Right((desiredWidth-3)/2 + ((desiredWidth-3) mod 2))
Edit:
I'll assume you're using MFC. Since you're wanting it with fonts, you could use the CDC::GetOutputTextExtent function. Try:
CString fullString
CSize size = pDC->GetOutputTextExtent(fullString);
bool isTooWide = size.cx > desiredWidth;
If that's too big, then you can then do a search to try and find the longest string you can fit; and it could be as clever a search as you want - for instance, you could just try "Hello Worl..." and then "Hello Wor..." and then "Hello Wo..."; removing one character until you find it fits. Alternatively, you could do a binary search - try "Hello Worl..." - if that doesn't work, then just use half the characters of the original text: "Hello..." - if that fits, try halfway between it and : "Hello Wo..." until you find the longest that does still fit. Or you could try some estimating heuristic (divide the total length by the desired length, proportionately estimate the required number of characters, and search from there.
The simple solution is something like:
unsigned int numberOfCharsToUse = fullString.GetLength();
bool isTooWide = true;
CString ellipsis = "...";
while (isTooWide)
{
numberOfCharsToUse--;
CString string = fullString.Left(numberOfCharsToUse) + ellipsis;
CSize size = pDC->GetOutputTextExtent(string);
isTooWide = size.cx > desiredWidth;
}

It's really pretty trivial; I don't think you'll find specific code unless you have something more structured in mind.
You basically want:
to get the length of the string you have, and the window width.
figure out how many charaters you can take from the original string, which will basically be window width-3. Call that k.
Depending on whether you want to put the ellipsis in the middle or at the right hand end, either take the first floor(k/2) characters from one end, concatenated with "...", then concatenated with the last floor(k/2) characters (with possibly one more character needed because of the division); or take the first k characters, ollowed by "...".

I think Smashery's answer is a good start. One way to get to the end result would be to write some test code with some test inputs and desired outputs. Once you have a good set of tests setup, you can implement your string manipulation code until you get all of your tests to pass.

Calculate the width of the text (
based on the font)
In case you are using MFC the API GetOutputTextExtent will get you the value.
if the width exceeds the given
specific width, calculate the ellipse width first:
ellipseWidth = calculate the width of (...)
Remove the string part with width ellipseWidth from the end and append ellipse.
something like: Hello...

For those who are interested for a complete routine, this is my answer :
/**
* Returns a string abbreviation
* example: "hello world" -> "...orld" or "hell..." or "he...rd" or "h...rld"
*
* style:
0: clip left
1: clip right
2: clip middle
3: pretty middle
*/
CString*
strabbr(
CDC* pdc,
const char* s,
const int area_width,
int style )
{
if ( !pdc || !s || !*s ) return new CString;
int len = strlen(s);
if ( pdc->GetTextExtent(s, len).cx <= area_width ) return new CString(s);
int dots_width = pdc->GetTextExtent("...", 3).cx;
if ( dots_width >= area_width ) return new CString;
// My algorithm uses 'left' and 'right' parts of the string, by turns.
int n = len;
int m = 1;
int n_width = 0;
int m_width = 0;
int tmpwidth;
// fromleft indicates where the clip is done so I can 'get' chars from the other part
bool fromleft = (style == 3 && n % 2 == 0)? false : (style > 0);
while ( TRUE ) {
if ( n_width + m_width + dots_width > area_width ) break;
if ( n <= m ) break; // keep my sanity check (WTF), it should never happen 'cause of the above line
// Here are extra 'swap turn' conditions
if ( style == 3 && (!(n & 1)) )
fromleft = (!fromleft);
else if ( style < 2 )
fromleft = (!fromleft); // (1)'disables' turn swapping for styles 0, 1
if ( fromleft ) {
pdc->GetCharWidth(*(s+n-1), *(s+n-1), &tmpwidth);
n_width += tmpwidth;
n--;
}
else {
pdc->GetCharWidth(*(s+m-1), *(s+m-1), &tmpwidth);
m_width += tmpwidth;
m++;
}
fromleft = (!fromleft); // (1)
}
if ( fromleft ) m--; else n++;
// Final steps
// 1. CString version
CString* abbr = new CString;
abbr->Format("%*.*s...%*.*s", m-1, m-1, s, len-n, len-n, s + n);
return abbr;
/* 2. char* version, if you dont want to use CString (efficiency), replace CString with char*,
new CString with _strdup("") and use this code for the final steps:
char* abbr = (char*)malloc(m + (len-n) + 3 +1);
strncpy(abbr, s, m-1);
strcpy(abbr + (m-1), "...");
strncpy(abbr+ (m-1) + 3, s + n, len-n);
abbr[(m-1) + (len-n) + 3] = 0;
return abbr;
*/
}

If you just want the "standard" ellipsis format, ("Hello...") you can use the
DrawText (or the equivalient MFC function) function passing DT_END_ELLIPSIS.
Check out also DT_WORD_ELLIPSIS (it truncates any word that does not fit in the rectangle and adds ellipses) or DT_PATH_ELLIPSIS (what Explorer does to display long paths).

Related

Recursive Anagram Calculator C++

I am trying to make a program that, with a given word, can calculate and print every letter combination.
To be more specific, I am asked to use a recursive function and what I should get is something like this:
Given word: HOME
EHOM
EMOH
MEHO
The approach I am taking in to swap the content #x with #x+1 like this
string[0]->string[1]
string[0]->string[2]
This is what I came up with
void anagram(char * s, int len, int y)
{
char temp; //used to store the content to swap betwen the two
if (len < 0) //when the total lenght of the array gets to 0 it means that every single swap has been made
return;
temp = s[len]; //swapping
s[len] = s[y];
s[y] = temp;
puts(s); //prints the string
if (y == 0)
return anagram(s, len-1, y - 1);
return anagram(s, len, y - 1);
}
What I get is just a huge mess and a Break Point from VS (if not a crash).
Can somebody help me please?

Trying to properly substring with String

Given the following function:
If I execute setColor("R:0,G:0,B:255,");
I'm expecting the red, grn, blu values to be:
0 0 255 except I'm getting 0 0 0
It's working fine for R:255,G:0,B:0, or R:0,G:255,B:0, though.
int setColor(String command) {
//Parse the incoming command string
//Example command R:123,G:100,B:50,
//RGB values should be between 0 to 255
int red = getColorValue(command, "R:", "G");
int grn = getColorValue(command, "G:", "B");
int blu = getColorValue(command, "B:", ",");
// Set the color of the entire Neopixel ring.
uint16_t i;
for (i = 0; i < strip.numPixels(); i++) {
strip.setPixelColor(i, strip.Color(red, grn, blu));
}
strip.show();
return 1;
}
int getColorValue(String command, String first, String second) {
int rgbValue;
String val = command.substring(command.indexOf(first)+2, command.indexOf(second));
val.trim();
rgbValue = val.toInt();
return rgbValue;
}

Without knowing your String implementation, I can only make an educated guess.
What happens is that indexOf(second) doesn't give you what you think.
"R:0,G:0,B:255,"
^ ^- indexOf("B:")
|- indexOf(",")
It works for your other cases as none of the things they look for occur more than once in the string.
Looking at the SparkCore Docs we find the documentation for both indexOf and substring.
indexOf()
Locates a character or String within another String. By default, searches from the beginning of the String, but can also start from a given index, allowing for the locating of all instances of the character or String.
string.indexOf(val)
string.indexOf(val, from)
substring()
string.substring(from)
string.substring(from, to)
So now to fix your problem you can use the second variant of indexOf and pass that the index you found from your first search.
int getColorValue(String command, String first, String second) {
int rgbValue;
int beg = command.indexOf(first)+2;
int end = command.indexOf(second, beg);
String val = command.substring(beg, end);
val.trim();
rgbValue = val.toInt();
return rgbValue;
}

In this instance, I would split the string using a comma as the delimiter then parse each substring into a key-value pair. You could use a vector of value for the second part if you always have the sequence "R,G,B" in which case why have the "R:", "G:" or "B:" at all?

I can suppose that command.indexOf(second) will always find you the first comma, therefore for B the val becomes empty string.
Assuming that indexOf is something similar to .Net's, maybe try
int start = command.indexOf(first)+2;
int end = command.indexOf(second, start)
String val = command.substring(start+2, end);
Note the second argument for the second call to indexOf, I think it will make indexOf to look for matches after start. I also think you'd better pass a "," as a second for all calls, and add +1 or -1 to end to compensate for this passing "," instead of "G" and "B".
Or just use another limiter for B part, like R:0,G:0,B:0. (dot instead of comma).

I ended up just modifying my code:
int setColor(String command) {
int commaIndex = command.indexOf(',');
int secondCommaIndex = command.indexOf(',', commaIndex+1);
int lastCommaIndex = command.lastIndexOf(',');
String red = command.substring(0, commaIndex);
String grn = command.substring(commaIndex+1, secondCommaIndex);
String blu = command.substring(lastCommaIndex+1);
// Set the color of the entire Neopixel ring.
uint16_t i;
for (i = 0; i < strip.numPixels(); i++) {
strip.setPixelColor(i, strip.Color(red.toInt(), grn.toInt(), blu.toInt()));
}
strip.show();
return 1;
}
I simply just do: 255,0,0 and it works a treat.

Looping through a string, more than one char gets output

I am using C++ and SDL to make a game for fun.
I show the kill count on the screen by turning it into a surface using TTF_RenderText, which needs a const char*. There are gaps in between where I want the individual digits to be shown so I split up the string into individual chars.
This is the code I wrote to split up the string and render it on the screen:
SpareStream.str("");
SpareStream << Kills;
std::string KillsString = SpareStream.str();
for (int i = 0; i <= 4; i++)
{
if(i < KillsString.size())
{
std::string Cheat = KillsString.substr(i,i+1);
const char *KillsChar = Cheat.c_str();
Message = TTF_RenderText_Solid(EightBitLimit,KillsChar,White);
}
else Message = TTF_RenderText_Solid(EightBitLimit,"0",White);
ApplySurface(540 + (45 * i),(500 - HUD->h) + 24,Message,Screen);
}
However, when the kill count exceeds more than 100, this happens:
the tens and ones are shown where only the tens should be.
Why is this happening?

from my reference, std::string::substr is:
string substr (size_t pos = 0, size_t len = npos) const;
Thats a start/length pair, not start/end, so you need:
std::string Cheat = KillsString.substr(i,1);
btw, while start/length pairs are rarely use for containers in teh standard library, they were so universal for char* management in C that it does frequently turn up in the string classes.

Adapting Boyer-Moore Implementation

I'm trying to adapt the Boyer-Moore c(++) Wikipedia implementation to get all of the matches of a pattern in a string. As it is, the Wikipedia implementation returns the first match. The main code looks like:
char* boyer_moore (uint8_t *string, uint32_t stringlen, uint8_t *pat, uint32_t patlen) {
int i;
int delta1[ALPHABET_LEN];
int *delta2 = malloc(patlen * sizeof(int));
make_delta1(delta1, pat, patlen);
make_delta2(delta2, pat, patlen);
i = patlen-1;
while (i < stringlen) {
int j = patlen-1;
while (j >= 0 && (string[i] == pat[j])) {
--i;
--j;
}
if (j < 0) {
free(delta2);
return (string + i+1);
}
i += max(delta1[string[i]], delta2[j]);
}
free(delta2);
return NULL;
}
I have tried to modify the block after if (j < 0) to add the index to an array/vector and letting the outer loop continue, but it doesn't appear to be working. In testing the modified code I still only get a single match. Perhaps this implementation wasn't designed to return all matches, and it needs more than a few quick changes to do so? I don't understand the algorithm itself very well, so I'm not sure how to make this work. If anyone can point me in the right direction I would be grateful.
Note: The functions make_delta1 and make_delta2 are defined earlier in the source (check Wikipedia page), and the max() function call is actually a macro also defined earlier in the source.

Boyer-Moore's algorithm exploits the fact that when searching for, say, "HELLO WORLD" within a longer string, the letter you find in a given position restricts what can be found around that position if a match is to be found at all, sort of a Naval Battle game: if you find open sea at four cells from the border, you needn't test the four remaining cells in case there's a 5-cell carrier hiding there; there can't be.
If you found for example a 'D' in eleventh position, it might be the last letter of HELLO WORLD; but if you found a 'Q', 'Q' not being anywhere inside HELLO WORLD, this means that the searched-for string can't be anywhere in the first eleven characters, and you can avoid searching there altogether. A 'L' on the other hand might mean that HELLO WORLD is there, starting at position 11-3 (third letter of HELLO WORLD is a L), 11-4, or 11-10.
When searching, you keep track of these possibilities using the two delta arrays.
So when you find a pattern, you ought to do,
if (j < 0)
{
// Found a pattern from position i+1 to i+1+patlen
// Add vector or whatever is needed; check we don't overflow it.
if (index_size+1 >= index_counter)
{
index[index_counter] = 0;
return index_size;
}
index[index_counter++] = i+1;
// Reinitialize j to restart search
j = patlen-1;
// Reinitialize i to start at i+1+patlen
i += patlen +1; // (not completely sure of that +1)
// Do not free delta2
// free(delta2);
// Continue loop without altering i again
continue;
}
i += max(delta1[string[i]], delta2[j]);
}
free(delta2);
index[index_counter] = 0;
return index_counter;
This should return a zero-terminated list of indexes, provided you pass something like a size_t *indexes to the function.
The function will then return 0 (not found), index_size (too many matches) or the number of matches between 1 and index_size-1.
This allows for example to add additional matches without having to repeat the whole search for the already found (index_size-1) substrings; you increase num_indexes by new_num, realloc the indexes array, then pass to the function the new array at offset old_index_size-1, new_num as the new size, and the haystack string starting from the offset of match at index old_index_size-1 plus one (not, as I wrote in a previous revision, plus the length of the needle string; see comment).
This approach will report also overlapping matches, for example searching ana in banana will find b*ana*na and ban*ana*.
UPDATE
I tested the above and it appears to work. I modified the Wikipedia code by adding these two includes to keep gcc from grumbling
#include <stdio.h>
#include <string.h>
then I modified the if (j < 0) to simply output what it had found
if (j < 0) {
printf("Found %s at offset %d: %s\n", pat, i+1, string+i+1);
//free(delta2);
// return (string + i+1);
i += patlen + 1;
j = patlen - 1;
continue;
}
and finally I tested with this
int main(void)
{
char *s = "This is a string in which I am going to look for a string I will string along";
char *p = "string";
boyer_moore(s, strlen(s), p, strlen(p));
return 0;
}
and got, as expected:
Found string at offset 10: string in which I am going to look for a string I will string along
Found string at offset 51: string I will string along
Found string at offset 65: string along
If the string contains two overlapping sequences, BOTH are found:
char *s = "This is an andean andeandean andean trouble";
char *p = "andean";
Found andean at offset 11: andean andeandean andean trouble
Found andean at offset 18: andeandean andean trouble
Found andean at offset 22: andean andean trouble
Found andean at offset 29: andean trouble
To avoid overlapping matches, the quickest way is to not store the overlaps. It could be done in the function but it would mean to reinitialize the first delta vector and update the string pointer; we also would need to store a second i index as i2 to keep saved indexes from going nonmonotonic. It isn't worth it. Better:
if (j < 0) {
// We have found a patlen match at i+1
// Is it an overlap?
if (index && (indexes[index] + patlen < i+1))
{
// Yes, it is. So we don't store it.
// We could store the last of several overlaps
// It's not exactly trivial, though:
// searching 'anana' in 'Bananananana'
// finds FOUR matches, and the fourth is NOT overlapped
// with the first. So in case of overlap, if we want to keep
// the LAST of the bunch, we must save info somewhere else,
// say last_conflicting_overlap, and check twice.
// Then again, the third match (which is the last to overlap
// with the first) would overlap with the fourth.
// So the "return as many non overlapping matches as possible"
// is actually accomplished by doing NOTHING in this branch of the IF.
}
else
{
// Not an overlap, so store it.
indexes[++index] = i+1;
if (index == max_indexes) // Too many matches already found?
break; // Stop searching and return found so far
}
// Adapt i and j to keep searching
i += patlen + 1;
j = patlen - 1;
continue;
}

Efficiently check string for one of several hundred possible suffixes

I need to write a C/C++ function that would quickly check if string ends with one of ~1000 predefined suffixes. Specifically the string is a hostname and I need to check if it belongs to one of several hundred predefined second-level domains.
This function will be called a lot so it needs to be written as efficiently as possible. Bitwise hacks etc anything goes as long as it turns out fast.
Set of suffixes is predetermined at compile-time and doesn't change.
I am thinking of either implementing a variation of Rabin-Karp or write a tool that would generate a function with nested ifs and switches that would be custom tailored to specific set of suffixes. Since the application in question is 64-bit to speed up comparisons I could store suffixes of up to 8 bytes in length as const sorted array and do binary search within it.
Are there any other reasonable options?

If the suffixes don't contain any expansions/rules (like a regex), you could build a Trie of the suffixes in reverse order, and then match the string based on that. For instance
suffixes:
foo
bar
bao
reverse order suffix trie:
o
-a-b (matches bao)
-o-f (matches foo)
r-a-b (matches bar)
These can then be used to match your string:
"mystringfoo" -> reverse -> "oofgnirtsym" -> trie match -> foo suffix

You mention that you're looking at second-level domain names only, so even without knowing the precise set of matching domains, you could extract the relevant portion of the input string.
Then simply use a hashtable. Dimension it in such a way that there are no collisions, so you don't need buckets; lookups will be exactly O(1). For small hash types (e.g. 32 bits), you'd want to check if the strings really match. For a 64-bit hash, the probability of another domain colliding with one of the hashes in your table is already so low (order 10^-17) that you can probably live with it.

I would reverse all of the suffix strings, build a prefix tree of them and then test the reverse of your IP string against that.

I think that building your own automata would be the most efficient way.. it's a sort of your second solution, according to which, starting from a finite set of suffixes, it generates an automaton fitted for that suffixes.
I think you can easily use flex to do it, taking care of reversing the input or handling in a special way the fact that you are looking just for suffixes (just for efficienty matters)..
By the way using a Rabin-Karp approach would be efficient too since your suffixes will be short. You can fit a hashset with all the suffixes needed and then
take a string
take the suffix
calculate the hash of the suffix
check if suffix is in the table

Just create a 26x26 array of set of domains. e.g. thisArray[0][0] will be the domains that end in 'aa', thisArray[0][1] is all the domains that end in 'ab' and so on...
Once you have that, just search your array for thisArray[2nd last char of hostname][last char of hostname] to get the possible domains. If there's more than one at that stage, just brute force the rest.

I think that the solution should be very different depending on the type of input strings. If the strings are some kind of string class that can be iterated from the end (such as stl strings) it is a lot easier than if they are NULL-terminated C-strings.
String Class
Iterate the string backwards (don't make a reverse copy - use some kind of backward iterator). Build a Trie where each node consists of two 64-bit words, one pattern and one bitmask. Then check 8 characters at a time in each level. The mask is used if you want to match less than 8 characters - e.g. deny "*.org" would give a mask with 32 bits set. The mask is also used as termination criteria.
C strings
Construct an NDFA that matches the strings on a single-pass over them. That way you don't have to first iterate to the end but can instead use it in one pass. An NDFA can be converted to a DFA, which will probably make the implementation more efficient. Both construction of the NDFA and conversion to DFA will probably be so complex that you will have to write tools for it.

After some research and deliberation I've decided to go with trie/finite state machine approach.
The string is parsed starting from the last character going backwards using a TRIE as long as the portion of suffix that was parsed so far can correspond to multiple suffixes. At some point we either hit the first character of one of the possible suffixes which means that we have a match, hit a dead end, which means there are no more possible matches or get into situation where there is only one suffix candidate. In this case we just do compare remainder of the suffix.
Since trie lookups are constant time, worst case complexity is o(maximum suffix length). The function turned out to be pretty fast. On 2.8Ghz Core i5 it can check 33,000,000 strings per second for 2K possible suffixes. 2K suffixes totaling 18 kilobytes, expanded to 320kb trie/state machine table. I guess that I could have stored it more efficiently but this solution seems to work good enough for the time being.
Since suffix list was so large, I didn't want to code it all by hand so I ended up writing C# application that generated C code for the suffix checking function:
public static uint GetFourBytes(string s, int index)
{
byte[] bytes = new byte[4] { 0, 0, 0, 0};
int len = Math.Min(s.Length - index, 4);
Encoding.ASCII.GetBytes(s, index, len, bytes, 0);
return BitConverter.ToUInt32(bytes, 0);
}
public static string ReverseString(string s)
{
char[] chars = s.ToCharArray();
Array.Reverse(chars);
return new string(chars);
}
static StringBuilder trieArray = new StringBuilder();
static int trieArraySize = 0;
static void Main(string[] args)
{
// read all non-empty lines from input file
var suffixes = File
.ReadAllLines(#"suffixes.txt")
.Where(l => !string.IsNullOrEmpty(l));
var reversedSuffixes = suffixes
.Select(s => ReverseString(s));
int start = CreateTrieNode(reversedSuffixes, "");
string outFName = #"checkStringSuffix.debug.h";
if (args.Length != 0 && args[0] == "--release")
{
outFName = #"checkStringSuffix.h";
}
using (StreamWriter wrt = new StreamWriter(outFName))
{
wrt.WriteLine(
"#pragma once\n\n" +
"#define TRIE_NONE -1000000\n"+
"#define TRIE_DONE -2000000\n\n"
);
wrt.WriteLine("const int trieArray[] = {{{0}\n}};", trieArray);
wrt.WriteLine(
"inline bool checkSingleSuffix(const char* str, const char* curr, const int* trie) {\n"+
" int len = trie[0];\n"+
" if (curr - str < len) return false;\n"+
" const char* cmp = (const char*)(trie + 1);\n"+
" while (len-- > 0) {\n"+
" if (*--curr != *cmp++) return false;\n"+
" }\n"+
" return true;\n"+
"}\n\n"+
"bool checkStringSuffix(const char* str, int len) {\n" +
" if (len < " + suffixes.Select(s => s.Length).Min().ToString() + ") return false;\n" +
" const char* curr = (str + len - 1);\n"+
" int currTrie = " + start.ToString() + ";\n"+
" while (curr >= str) {\n" +
" assert(*curr >= 0x20 && *curr <= 0x7f);\n" +
" currTrie = trieArray[currTrie + *curr - 0x20];\n" +
" if (currTrie < 0) {\n" +
" if (currTrie == TRIE_NONE) return false;\n" +
" if (currTrie == TRIE_DONE) return true;\n" +
" return checkSingleSuffix(str, curr, trieArray - currTrie - 1);\n" +
" }\n"+
" --curr;\n"+
" }\n" +
" return false;\n"+
"}\n"
);
}
}
private static int CreateTrieNode(IEnumerable<string> suffixes, string prefix)
{
int retVal = trieArraySize;
if (suffixes.Count() == 1)
{
string theSuffix = suffixes.Single();
trieArray.AppendFormat("\n\t/* {1} - {2} */ {0}, ", theSuffix.Length, trieArraySize, prefix);
++trieArraySize;
for (int i = 0; i < theSuffix.Length; i += 4)
{
trieArray.AppendFormat("0x{0:X}, ", GetFourBytes(theSuffix, i));
++trieArraySize;
}
retVal = -(retVal + 1);
}
else
{
var groupByFirstChar =
from s in suffixes
let first = s[0]
let remainder = s.Substring(1)
group remainder by first;
string[] trieIndexes = new string[0x60];
for (int i = 0; i < trieIndexes.Length; ++i)
{
trieIndexes[i] = "TRIE_NONE";
}
foreach (var g in groupByFirstChar)
{
if (g.Any(s => s == string.Empty))
{
trieIndexes[g.Key - 0x20] = "TRIE_DONE";
continue;
}
trieIndexes[g.Key - 0x20] = CreateTrieNode(g, g.Key + prefix).ToString();
}
trieArray.AppendFormat("\n\t/* {1} - {2} */ {0},", string.Join(", ", trieIndexes), trieArraySize, prefix);
retVal = trieArraySize;
trieArraySize += 0x60;
}
return retVal;
}
So it generates code like this:
inline bool checkSingleSuffix(const char* str, const char* curr, const int* trie) {
int len = trie[0];
if (curr - str < len) return false;
const char* cmp = (const char*)(trie + 1);
while (len-- > 0) {
if (*--curr != *cmp++) return false;
}
return true;
}
bool checkStringSuffix(const char* str, int len) {
if (len < 5) return false;
const char* curr = (str + len - 1);
int currTrie = 81921;
while (curr >= str) {
assert(*curr >= 0x20 && *curr <= 0x7f);
currTrie = trieArray[currTrie + *curr - 0x20];
if (currTrie < 0) {
if (currTrie == TRIE_NONE) return false;
if (currTrie == TRIE_DONE) return true;
return checkSingleSuffix(str, curr, trieArray - currTrie - 1);
}
--curr;
}
return false;
}
Since for my particular set of data len in checkSingleSuffix was never more than 9, I tried to replace the comparison loop with switch (len) and hardcoded comparison routines that compared up to 8 bytes of data at a time but it didn't affect overall performance at all either way.
Thanks for everyone who contributed their ideas!

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How can I transform a string into an abbreviated form? - c++

I think Smashery's answer is a good start. One way to get to the end result would be to write some test code with some test inputs and desired outputs. Once you have a good set of tests setup, you can implement your string manipulation code until you get all of your tests to pass.

Related

Recursive Anagram Calculator C++

Trying to properly substring with String

Looping through a string, more than one char gets output

Adapting Boyer-Moore Implementation

Efficiently check string for one of several hundred possible suffixes

Categories

Resources