I am trying to convert strings to integers and sort them by that integer value. Each value should be unique to its string: no other string should be able to produce the same value. Also, if string1 is greater than string2, its integer value should be greater too. For example, since "orange" > "apple", "orange" should have the greater integer value. How can I do this?
I know there are an infinite number of possibilities between just 'a' and 'b', but I am not trying to fit every single possibility into a number. I only need to sort, let's say, 1 million values, not an infinite number.
I was able to make the values unique using the following:
long int order = 0;
for (auto letter : word)
order = order * 26 + letter - 'a' + 1;
return order;
but this obviously does not work since the value for "apple" will be greater than the value for "z".
This is not a homework assignment or a puzzle, this is something I thought of myself. Your help is appreciated, thank you!
You are almost there; just a few minor tweaks are needed:
You are multiplying by 26,
however you have letters (a..z) plus the empty position (no letter), so you should multiply by 27 instead!
Add zero padding:
to make the first letter the most significant digit, you should zero-pad/align the strings to a common length. If you are using 32-bit integers, the maximum string length is:
floor(log27(2^32)) = 6
floor(32/log2(27)) = 6
Here is a small example:
int lexhash(char *s)
{
int i,h;
for (h=0,i=0;i<6;i++) // process string
{
if (s[i]==0) break;
h*=27;
h+=s[i]-'a'+1;
}
for (;i<6;i++) h*=27; // zeropad missing letters
return h;
}
returning these:
14348907 a
28697814 b
43046721 c
373071582 z
15470838 abc
358171551 xyz
23175774 apple
224829626 orange
ordered by hash:
14348907 a
15470838 abc
23175774 apple
28697814 b
43046721 c
224829626 orange
358171551 xyz
373071582 z
This handles all lowercase a..z strings up to 6 characters long, which is:
26^6 + 26^5 + 26^4 + 26^3 + 26^2 + 26^1 = 321272406 possibilities
For longer strings, just use a wider integer type for the hash (64 bits fit floor(64/log2(27)) = 13 characters). Do not forget to use an unsigned type if you also use its highest bit (not needed for 32 bits, since 27^6 < 2^31).
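Following that last point, here is a minimal C++ sketch of the same base-27, zero-padded key widened to 64 bits. The name lexhash64 and the 13-character limit are choices made for this sketch (27^13 - 1 fits in a signed 64-bit integer, while 27^14 does not fit even in an unsigned one), and the input is assumed to contain only lowercase a..z.

```cpp
#include <cstdint>
#include <string>

// Hypothetical 64-bit variant of lexhash(): base-27 digits, 'a' = 1 ... 'z' = 26,
// and 0 meaning "no letter", so shorter strings are zero-padded on the right.
// Assumes s contains only 'a'..'z'; characters beyond the 13th are ignored.
constexpr int kMaxLen = 13;  // 27^13 - 1 < 2^63, so a signed 64-bit value suffices

std::int64_t lexhash64(const std::string& s)
{
    std::int64_t h = 0;
    int i = 0;
    for (; i < kMaxLen && i < static_cast<int>(s.size()); ++i)
        h = h * 27 + (s[i] - 'a' + 1);   // append one base-27 digit
    for (; i < kMaxLen; ++i)
        h *= 27;                         // zero-pad the missing letters
    return h;
}
```

Keys compare the same way the strings do, e.g. lexhash64("apple") < lexhash64("b") < lexhash64("z"); strings that share their first 13 letters get equal keys.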
You can use the position of each character:
int positionValue(const std::string& s) // e.g. s = "apple"
{
int result = 0;
for (size_t i = 0; i < s.size(); ++i)
result += (s[i] - 'a') * static_cast<int>(i + 1);
return result;
}
By the way, what you are trying to build is very similar to a hash function.
I'm solving the following problem:
The assignment is to create and return a string built from the digits up to the int passed in through the function's parameter; the expected output of the call string pattern(int n) would be "1\n22\n..n\n" (digit i repeated i times on line i).
In case you're interested, here is the URL (you need to be signed in to view it) to the full assignment, a Codewars kata
This is one of the tests (with my return included):
Test-case input: pattern(2)
Expected:
1
22
Actual: "OUTPUT"
//string header file and namespace are already included for you
string pattern(int n){
string out = "OUTPUT";
for (int i = 1; i <= n; ++i){
string temp = "";
temp.insert(0, i, i);
out += temp;
}
return out;
}
The code is self-explanatory and I'm sure there are multiple ways of making it run quicker and more efficiently.
My question is two-fold. Why doesn't my loop start (even though the condition 1 <= 2 should hold for the case above)?
And how does my code hold up in the grand scheme of things? Am I breaking any best practices?
The overload of std::string::insert() that you are using takes three arguments:
index
count
character
You are using i as both the count and the character. However, the function expects the character to be of type char, so your i is converted to the characters with codes 1 and 2. Those are non-printing control characters (SOH and STX in ASCII), not digits, so your output really is "OUTPUT" followed by three invisible characters: one from the first iteration and two from the second.
If you look at an ASCII table, you will notice that the digits 0123...9 have codes 48 to 57, so to get the character for a particular digit you can use i + 48, or better i + '0' (where '0' is the character with code 48). The C++ standard guarantees that '0'..'9' are contiguous, so i + '0' works with any character set. Finally, you can do it all in the constructor:
string temp(i, i + '0');
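Putting these fixes together, a corrected version of the function might look like the sketch below. It assumes 1 <= n <= 9 (i + '0' is only a digit for 0..9) and ends every line with '\n' as in the question's description; the kata's expected output may or may not want the final newline, so adjust if needed.

```cpp
#include <string>

// Builds "1\n22\n333\n..." as described in the question.
// Assumption: 1 <= n <= 9, because '0' + i is only a digit for i in 0..9.
std::string pattern(int n)
{
    std::string out;                                       // start empty, not "OUTPUT"
    for (int i = 1; i <= n; ++i) {
        out += std::string(i, static_cast<char>('0' + i)); // i copies of digit i
        out += '\n';                                       // one line per digit
    }
    return out;
}
```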
The loop works, but does nothing visible. You insert the character with code 1, not the character '1'; use:
temp.insert(0, i, '0'+i);
The insert method is not called correctly:
temp.insert(0, i, i); --->
temp.insert(0, i, i+'0');
PROBLEM SOLVED: thanks everyone!
I am almost entirely new to C++ so I apologise in advance if the question seems trivial.
I am trying to convert a string of letters to a set of 2 digit numbers where a = 10, b = 11, ..., Y = 34, Z = 35 so that (for example) "abc def" goes to "101112131415". How would I go about doing this? Any help would really be appreciated. Also, I don't mind whether capitalization results in the same number or a different number. Thank you very much in advance. I probably won't need it for a few days but if anyone is feeling particularly nice how would I go about reversing this process? i.e. "101112131415" --> "abcdef" Thanks.
EDIT: This isn't homework, I'm entirely self taught. I have completed this project before in a different language and decided to try C++ to compare the differences and try to learn C++ in the process :)
EDIT: I have roughly what I want, I just need a little bit of help converting this so that it applies to strings, thanks guys.
#include <iostream>
#include <sstream>
#include <string>
int returnVal (char x)
{
return (int) x - 87;
}
int main()
{
char x = 'g';
std::cout << returnVal(x);
}
A more flexible method is to use a table lookup:
const unsigned int letter_to_value[] =
{10, 11, 12, /*...*/ 35};
// ...
letter = toupper(static_cast<unsigned char>(letter)); // from <cctype>
const unsigned int index = letter - 'A'; // assumes 'A'..'Z' are contiguous, as in ASCII
value = letter_to_value[index];
cout << value;
Each character has an ASCII value. Convert your characters to their ASCII codes and then work with the difference.
Example:
int x = 'a';
cout << x;
will print 97; and
int x = 'a';
cout << x - 87;
will print 10.
Hence, you could write a function like this:
int returnVal(char x)
{
return (int)x - 87;
}
to get the required output.
And your main program could look like:
int main() // assumes <iostream>, <string>, using namespace std, and returnVal() from above
{
string s = "abcdef";
for (unsigned int i = 0; i < s.length(); i++)
{
cout << returnVal(s[i]);
}
return 0;
}
This is a simple, if somewhat messy, way to do it:
map<char, int> vals; // maps a character to an integer
int g = 1; // if a needs to be 10 then set g = 10
string alphabet = "abcdefghijklmnopqrstuvwxyz";
for (char c : alphabet) { // range-based for loop (C++11)
vals[c] = g;
g++;
}
What Daniel said, try it out for yourself.
As a starting point though, casting:
int i = (int)string[0] + offset;
will get you the number for a character, and std::stringstream will be useful too.
How would I go about doing this?
By trying to do something first, and looking for help only if you feel you cannot advance.
That being said, the most obvious solution that comes to mind is based on the fact that characters (i.e. 'a', 'G') are really numbers. Suppose you have the following:
char c = 'a';
You can get the number associated with c by doing:
int n = static_cast<int>(c);
Then, add some offset to 'n':
n += 10;
...and cast it back to a char:
c = static_cast<char>(n);
Note: The above assumes that the letters have consecutive codes, i.e. that the code for 'a' equals the code for 'z' minus 25. This usually holds (it does in ASCII), though not in every character set; EBCDIC, for example, has gaps inside the alphabet.
This can work
int Number = 123; // number to be converted to a string
string Result; // string which will contain the result
ostringstream convert; // stream used for the conversion
convert << Number; // insert the textual representation of 'Number' in the characters in the stream
Result = convert.str(); // set 'Result' to the contents of the stream
You need to add these headers (in C++11 and later, std::to_string(Number) does the same in a single call):
#include <sstream>
#include <string>
Many answers will tell you that characters are encoded in ASCII and that you can convert a letter to an index by subtracting 'a'.
This is not proper C++. It is acceptable when your program requirements include a specification that ASCII is in use. However, the C++ standard alone does not require this. There are C++ implementations with other character sets.
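In the absence of knowledge that ASCII is in use, you can use translation tables (the ['a'] = 10 initializer syntax below is C99; standard C++ does not accept designated array initializers, so in C++ you would fill the table at run time):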
#include <limits.h>
// Define a table to translate from characters to desired codes:
static unsigned int Translate[UCHAR_MAX + 1] = // + 1 so that every unsigned char value is a valid index
{
['a'] = 10,
['b'] = 11,
…
};
Then you may translate characters to numbers by looking them up in the table:
unsigned char x = something;
int result = Translate[x];
Once you have the translation, you could print it as two digits using printf("%02d", result);.
Translating in the other direction requires reading two characters, converting them to a number (interpreting them as decimal), and performing a similar translation. You might have a different translation table set up for this reverse translation.
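In standard C++, where designated array initializers are not available, the same idea can be sketched without assuming anything about letter codes: look each letter up in an explicit alphabet string and use its position. encode() and decode() are hypothetical helper names; this version skips non-letters, gives upper and lower case the same code, and also covers the reverse translation the question asks about.

```cpp
#include <string>

// Letter positions come from explicit alphabet strings, so nothing assumes
// that 'a'..'z' have consecutive character codes.
const std::string kLower = "abcdefghijklmnopqrstuvwxyz";
const std::string kUpper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

// Encodes letters as two-digit codes: a/A = 10 ... z/Z = 35.
std::string encode(const std::string& text)
{
    std::string out;
    for (char ch : text) {
        std::string::size_type pos = kLower.find(ch);
        if (pos == std::string::npos)
            pos = kUpper.find(ch);          // same code for upper case
        if (pos == std::string::npos)
            continue;                       // skip spaces and other non-letters
        out += std::to_string(pos + 10);    // always two digits: 10..35
    }
    return out;
}

// Reverses encode() by reading the digits two at a time (lower case output).
// Assumes well-formed input; '0'..'9' are guaranteed contiguous in C++.
std::string decode(const std::string& digits)
{
    std::string out;
    for (std::string::size_type i = 0; i + 1 < digits.size(); i += 2) {
        int code = (digits[i] - '0') * 10 + (digits[i + 1] - '0');
        out += kLower[code - 10];
    }
    return out;
}
```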
Just do this:
(s[i] - 'A' + 1)
Basically, we convert a char to a number by subtracting 'A' and then adding 1, so that 'A' becomes 1, 'B' becomes 2, and so on (add 10 instead of 1 to get the question's A = 10).
In my program I need to search a fairly large string (~1 MB) for a relatively small substring (< 1 KB).
The problem is that the substring contains simple wildcards, in the sense that "a?c" means I want to match strings like "abc" or "apc", ... (I am only interested in the first occurrence).
So far I have been using the trivial approach (here in pseudocode):
algorithm "search", input: haystack(string), needle(string)
for(i = 0, i <= length(haystack) - length(needle), ++i)
if(!CompareMemory(haystack+i,needle,length(needle))
return i;
return -1; (Not found)
Here "CompareMemory" returns 0 iff its first two arguments are identical over the number of bytes given by the third argument, with a wildcard in the needle matching any byte.
My question is whether there is a fast algorithm for this (you don't have to give it, but if you do, I would prefer C++, C or pseudocode). I started here
but I think most of the fast algorithms don't allow wildcards (because of the way they exploit the nature of strings).
I hope the format of the question is ok because I am new here, thank you in advance!
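For comparison, the brute-force approach from the question can be written as runnable C++. The names matchesAt and naiveWildcardSearch are hypothetical; '?' is assumed to be the only wildcard and may only appear in the needle. This takes O(length(haystack) * length(needle)) time in the worst case.

```cpp
#include <cstddef>
#include <string>

// Returns true if needle matches haystack starting at pos,
// where '?' in the needle matches any single character.
// Caller guarantees pos + needle.size() <= haystack.size().
bool matchesAt(const std::string& haystack, std::size_t pos,
               const std::string& needle, char wild = '?')
{
    for (std::size_t k = 0; k < needle.size(); ++k)
        if (needle[k] != wild && needle[k] != haystack[pos + k])
            return false;
    return true;
}

// Brute-force search: try every start position that leaves room for the needle.
// Returns the first match position, or -1 if there is none.
long naiveWildcardSearch(const std::string& haystack, const std::string& needle)
{
    if (needle.size() > haystack.size())
        return -1;
    for (std::size_t i = 0; i + needle.size() <= haystack.size(); ++i)
        if (matchesAt(haystack, i, needle))
            return static_cast<long>(i);
    return -1;
}
```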
A fast way, which is essentially what a regexp would do (and I would recommend a regexp anyway), is to find a character that is fixed in the needle (e.g. "a", as opposed to "?"), search for that character, and then check whether you have a complete match:
j = firstNonWildcardPos(needle)
for(i = j, i <= length(haystack)-length(needle)+j, ++i)
if(haystack[i] == needle[j])
if(!CompareMemory(haystack+i-j,needle,length(needle))
return i-j;
return -1; (Not found)
A regexp would generate code similar to this (I believe).
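A minimal C++ sketch of this anchor-character idea, assuming '?' is the only wildcard; anchoredWildcardSearch is a hypothetical name. It uses std::string::find to jump between occurrences of the first fixed needle character and returns the start of the match (i - j), not the anchor position.

```cpp
#include <cstddef>
#include <string>

// Scans only for the needle's first non-wildcard character (the anchor) and
// runs the full wildcard comparison where that character occurs.
long anchoredWildcardSearch(const std::string& haystack,
                            const std::string& needle, char wild = '?')
{
    const std::size_t n = needle.size(), m = haystack.size();
    if (n > m)
        return -1;
    const std::size_t j = needle.find_first_not_of(wild); // anchor offset in needle
    if (j == std::string::npos)
        return 0;                                  // all wildcards (or empty)
    const std::size_t lastStart = m - n;           // last valid match start
    for (std::size_t i = haystack.find(needle[j], j);   // candidate anchor position
         i != std::string::npos && i - j <= lastStart;
         i = haystack.find(needle[j], i + 1)) {
        const std::size_t start = i - j;           // where the match would begin
        std::size_t k = 0;
        while (k < n && (needle[k] == wild || needle[k] == haystack[start + k]))
            ++k;
        if (k == n)
            return static_cast<long>(start);       // report the match start
    }
    return -1;
}
```

The worst case is still O(length(haystack) * length(needle)), but when the anchor character is rare most positions are skipped; picking the rarest fixed character of the needle as the anchor usually helps further.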
Among strings over an alphabet of c characters, let S have length s and let T_1 ... T_k have average length b. S will be searched for each of the k target strings. (The problem statement doesn't mention multiple searches of a given string; I mention it below because in that paradigm my program does well.)
The program uses O(s+c) time and space for setup, and (if S and the T_i are random strings) O(k*u*s/c) + O(k*b + k*b*s/c^u) total time for searching, with u = 3 in the program as shown. For longer targets, u should be increased, and rare, widely separated key characters chosen.
In step 1, the program creates an array L of s+TsizMax integers (in the program, TsizMax = maximum allowed target length) and uses it for c lists of the locations of next occurrences of characters, with list heads in H[] and tails in T[]. This is the O(s+c) time and space step.
In step 2, the program repeatedly reads and processes target strings. Step 2A chooses u = 3 different non-wild key characters in the current target. As shown, the program just uses the first three such characters; with a little more work, it could instead use the rarest characters in the target, to improve performance. Note that it doesn't handle targets with fewer than three such characters.
The line "L[T[r]] = L[g+i] = g+i;" within Step 2A sets up a guard cell in L with proper delta offset so that Step 2G will automatically execute at end of search, without needing any extra testing during the search. T[r] indexes the tail cell of the list for character r, so cell L[g+i] becomes a new, self-referencing, end-of-list for character r. (This technique allows the loops to run with a minimum of extraneous condition testing.)
Step 2B sets vars a,b,c to head-of-list locations, and sets deltas dab, dac, and dbc corresponding to distances between the chosen key characters in target.
Step 2C checks if key characters appear in S. This step is necessary because otherwise a while loop in Step 2E will hang. We don't want more checks within those while loops because they are the inner loops of search.
Step 2D repeats steps 2E to 2I until var c points past the end of S, at which point no more matches are possible.
Step 2E consists of u = 3 while loops, that "enforce delta distances", that is, crawl indexes a,b,c along over each other as long as they are not pattern-compatible. The while loops are fairly fast, each being in essence (with ++si instrumentation removed) "while (v+d < w) v = L[v]" for various v, d, w. Replicating the three while loops a few times may increase performance a little and will not change net results.
In Step 2G, we know that the u key characters match, so we do a complete compare of the target at the match point, with wild-character handling. Step 2H reports the result of the compare. The program as given also reports non-matches in this step; remove that in production.
Step 2I advances all the key-character indexes, because none of the currently-indexed characters can be the key part of another match.
You can run the program to see a few operation-count statistics. For example, the output
Target 5=<de?ga>
012345678901234567890123456789012345678901
abc1efgabc2efgabcde3gabcdefg4bcdefgabc5efg
# 17, de?ga and de3ga match
# 24, de?ga and defg4 differ
# 31, de?ga and defga match
Advances: 'd' 0+3 'e' 3+3 'g' 3+3 = 6+9 = 15
shows that Step 2G was entered 3 times (i.e., the key characters matched 3 times); the full compare succeeded twice; the step 2E while loops advanced indexes 6 times; step 2I advanced indexes 9 times; there were 15 advances in all, to search the 42-character string for the de?ga target.
/* jiw
$Id: stringsearch.c,v 1.2 2011/08/19 08:53:44 j-waldby Exp j-waldby $
Re: Concept-code for searching a long string for short targets,
where targets may contain wildcard characters.
The user can enter any number of targets as command line parameters.
This code has 2 long strings available for testing; if the first
character of the first parameter is '1' the jay[42] string is used,
else kay[321].
Eg, for tests with *hay = jay use command like
./stringsearch 1e?g a?cd bc?e?g c?efg de?ga ddee? ddee?f
or with *hay = kay,
./stringsearch bc?e? jih? pa?j ?av??j
to exercise program.
Copyright 2011 James Waldby. Offered without warranty
under GPL v3 terms as at http://www.gnu.org/licenses/gpl.html
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <limits.h>
//================================================
int main(int argc, char *argv[]) {
char jay[]="abc1efgabc2efgabcde3gabcdefg4bcdefgabc5efg";
char kay[]="ludehkhtdiokihtmaihitoia1htkjkkchajajavpajkihtijkhijhipaja"
"etpajamhkajajacpajihiatokajavtoia2pkjpajjhiifakacpajjhiatkpajfojii"
"etkajamhpajajakpajihiatoiakavtoia3pakpajjhiifakacpajjhkatvpajfojii"
"ihiifojjjjhijpjkhtfdoiajadijpkoia4jihtfjavpapakjhiifjpajihiifkjach"
"ihikfkjjjjhijpjkhtfdoiajakijptoik4jihtfjakpapajjkiifjpajkhiifajkch";
char *hay = (argc>1 && argv[1][0]=='1')? jay:kay;
enum { chars=1<<CHAR_BIT, TsizMax=40, Lsiz=TsizMax+sizeof kay, L1, L2 };
int L[L2], H[chars], T[chars], g, k, par;
// Step 1. Make arrays L, H, T.
for (k=0; k<chars; ++k) H[k] = T[k] = L1; // Init H and T
for (g=0; hay[g]; ++g) { // Make linked character lists for hay.
k = hay[g]; // In same loop, could count char freqs.
if (T[k]==L1) H[k] = T[k] = g;
T[k] = L[T[k]] = g;
}
// Step 2. Read and process target strings.
for (par=1; par<argc; ++par) {
int alpha[3], at[3], a=g, b=g, c=g, da, dab, dbc, dac, i, j, r;
char * targ = argv[par];
enum { wild = '?' };
int sa=0, sb=0, sc=0, ta=0, tb=0, tc=0;
printf ("Target %d=<%s>\n", par, targ);
// Step 2A. Choose 3 non-wild characters to follow.
// As is, chooses first 3 non-wilds for a,b,c.
// Could instead choose 3 rarest characters.
for (j=0; j<3; ++j) alpha[j] = -j;
for (i=j=0; targ[i] && j<3; ++i)
if (targ[i] != wild) {
r = alpha[j] = targ[i];
if (alpha[0]==alpha[1] || alpha[1]==alpha[2]
|| alpha[0]==alpha[2]) continue;
at[j] = i;
L[T[r]] = L[g+i] = g+i;
++j;
}
if (j != 3) {
printf (" Too few target chars\n");
continue;
}
// Step 2B. Set a,b,c to head-of-list locations, set deltas.
da = at[0];
a = H[alpha[0]]; dab = at[1]-at[0];
b = H[alpha[1]]; dbc = at[2]-at[1];
c = H[alpha[2]]; dac = at[2]-at[0];
// Step 2C. See if key characters appear in haystack
if (a >= g || b >= g || c >= g) {
printf (" No match on some character\n");
continue;
}
for (g=0; hay[g]; ++g) printf ("%d", g%10);
printf ("\n%s\n", hay); // Show haystack, for user aid
// Step 2D. Search for match
while (c < g) {
// Step 2E. Enforce delta distances
while (a+dab < b) {a = L[a]; ++sa; } // Replicate these
while (b+dbc < c) {b = L[b]; ++sb; } // 3 abc lines as many
while (a+dac > c) {c = L[c]; ++sc; } // times as you like.
while (a+dab < b) {a = L[a]; ++sa; } // Replicate these
while (b+dbc < c) {b = L[b]; ++sb; } // 3 abc lines as many
while (a+dac > c) {c = L[c]; ++sc; } // times as you like.
// Step 2F. See if delta distances were met
if (a+dab==b && b+dbc==c && c<g) {
// Step 2G. Yes, so we have 3-letter-match and need to test whole match.
r = a-da;
for (k=0; targ[k]; ++k)
if ((hay[r+k] != targ[k]) && (targ[k] != wild))
break;
printf ("# %3d, %s and ", r, targ);
for (i=0; targ[i]; ++i) putchar(hay[r++]);
// Step 2H. Report match, if found
puts (targ[k]? " differ" : " match");
// Step 2I. Advance all of a,b,c, to go on looking
a = L[a]; ++ta;
b = L[b]; ++tb;
c = L[c]; ++tc;
}
}
printf ("Advances: '%c' %d+%d '%c' %d+%d '%c' %d+%d = %d+%d = %d\n",
alpha[0], sa,ta, alpha[1], sb,tb, alpha[2], sc,tc,
sa+sb+sc, ta+tb+tc, sa+sb+sc+ta+tb+tc);
}
return 0;
}
Note, if you like this answer better than current preferred answer, unmark that one and mark this one. :)
Regular expressions are usually matched with a finite-state-automaton-based search, I think. Try implementing that.
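One concrete automaton-based technique that handles '?' naturally is Shift-And (also called bitap), which simulates the matching NFA with one bit per needle position. The sketch below is an illustration under stated assumptions: the needle is at most 64 characters (one 64-bit word of state; the question's needles of up to 1 KB would need an array of words), chars are 8 bits, and '?' is the only wildcard. shiftAndSearch is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Shift-And: bit k of `state` is set when needle[0..k] matches the text
// ending at the current position. Assumes 8-bit chars and 1 <= needle size <= 64.
long shiftAndSearch(const std::string& text, const std::string& needle,
                    char wild = '?')
{
    const std::size_t n = needle.size();
    if (n == 0 || n > 64)
        return -1;                                 // outside this sketch's assumptions
    std::uint64_t mask[256] = {};                  // mask[c]: needle positions accepting c
    for (std::size_t k = 0; k < n; ++k) {
        const std::uint64_t bit = std::uint64_t{1} << k;
        if (needle[k] == wild) {
            for (auto& entry : mask) entry |= bit; // '?' accepts every byte
        } else {
            mask[static_cast<unsigned char>(needle[k])] |= bit;
        }
    }
    const std::uint64_t accept = std::uint64_t{1} << (n - 1);
    std::uint64_t state = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Advance every partial match by one position and start a new one.
        state = ((state << 1) | 1) & mask[static_cast<unsigned char>(text[i])];
        if (state & accept)
            return static_cast<long>(i + 1 - n);   // start of the first match
    }
    return -1;
}
```

Each haystack character costs one shift, one OR and one AND, regardless of how many wildcards the needle contains.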
Which language is smart enough to understand something like a = 0, 20, ..., 300? That would let you easily create arrays by giving a start value, a step and a last value (or, better, no last value, i.e. an infinite array), and not only for numbers but also for complex numbers and custom structures such as sedenions, which you would probably define yourself as a class or similar.
The point is: find a language, or an algorithm usable in some language, that can catch the rule by which the given sequence of values (or their parameters) changes, and then use that rule to build a structure from which you can get any member(s).
To everyone: the examples you provide are very helpful for all beginners out there, and at the same time they are the basic knowledge required to build such a 'Smart Array' class. So thank you very much for your enthusiastic help.
As JeffSahol noticed:
all possible rules might include some that require evaluation of some/all existing members to generate the nth member.
So it is a hard question. And I think a language that would do this 'naturally' would be great to play/work with, hopefully not only for mathematicians.
Haskell:
Prelude> let a=[0,20..300]
Prelude> a
[0,20,40,60,80,100,120,140,160,180,200,220,240,260,280,300]
btw: infinite lists are possible, too:
Prelude> let a=[0,20..]
Prelude> take 20 a
[0,20,40,60,80,100,120,140,160,180,200,220,240,260,280,300,320,340,360,380]
Excel:
Write 0 in A1
Write 20 in A2
Select A1:A2
Drag the fill handle (bottom-right corner) downwards
MATLAB:
a = [0:20:300]
F#:
> let a = [|0..20..300|];;
val a : int [] =
[|0; 20; 40; 60; 80; 100; 120; 140; 160; 180; 200; 220; 240; 260; 280; 300|]
With complex numbers:
let c1 = Complex.Create( 0.0, 0.0)
let c2 = Complex.Create(10.0, 10.0)
let a = [|c1..c2|]
val a : Complex [] =
[|0r+0i; 1r+0i; 2r+0i; 3r+0i; 4r+0i; 5r+0i; 6r+0i; 7r+0i; 8r+0i; 9r+0i; 10r+0i|]
As you can see it increments only the real part.
If the step is a complex number too, it increments both the real part AND the imaginary part, until the real part of the last value has been reached:
let step = Complex.Create(2.0, 1.0)
let a = [|c1..step..c2|]
val a: Complex [] =
[|0r+0i; 2r+1i; 4r+2i; 6r+3i; 8r+4i; 10r+5i|]
Note that if this behavior doesn't match your needs you still can overload (..) and (.. ..) operators. E.g. you want that it increments the imaginary part instead of the real part:
let (..) (c1:Complex) (c2:Complex) =
seq {
for i in 0..int(c2.i-c1.i) do
yield Complex.Create(c1.r, c1.i + float i)
}
let a = [|c1..c2|]
val a : Complex [] =
[|0r+0i; 0r+1i; 0r+2i; 0r+3i; 0r+4i; 0r+5i; 0r+6i; 0r+7i; 0r+8i; 0r+9i; 0r+10i|]
And PHP:
$a = range(0, 300, 20);
Wait...
Python:
print range(0, 320, 20)
gives
[0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300]
Props to the comments (I knew there was a more succinct way :P)
Scala:
scala> val a = 0 to 100 by 20
a: scala.collection.immutable.Range = Range(0, 20, 40, 60, 80, 100)
scala> a foreach println
0
20
40
60
80
100
Infinite Lists:
scala> val b = Stream from 1
b: scala.collection.immutable.Stream[Int] = Stream(1, ?)
scala> b take 5 foreach println
1
2
3
4
5
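In Python you have
a = xrange(start, stop, step)
(or simply range in Python 3).
This gives you a lazily built sequence from start up to (but not including) stop, so the values are not all stored in memory. It still needs a stop value, though; for an unbounded sequence use itertools.count(start, step).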
>>> a = xrange(0, 300, 20)
>>> for item in a: print item
...
0
20
40
60
80
100
120
140
160
180
200
220
240
260
280
And C++ too [use FC++ library]:
// List is different from STL list
List<int> integers = enumFrom(1); // Lazy list of all numbers starting from 1
// filter and ptr_to_fun definitions provided by FC++
// The idea is to _filter_ prime numbers in this case
// prime is user provided routine that checks if a number is prime
// So the end result is a list of infinite primes :)
List<int> filtered_nums = filter( ptr_to_fun(&prime), integers );
FC++ lazy list implementation: http://www.cc.gatech.edu/~yannis/fc++/New/new_list_implementation.html
More details: http://www.cc.gatech.edu/~yannis/fc++/
Arpan
Groovy:
assert [ 1, *3..5, 7, *9..<12 ] == [1,3,4,5,7,9,10,11]
The SWYM language, which appears to no longer be online, could infer arithmetic and geometric progressions from a few example items and generate an appropriate list.
I believe the syntax in Perl 6 is start, *+increment_value ... end (e.g. 0, *+20 ... 300).
You should instead use math.
- (int) infiniteList: (int)x
{
return (x*20);
}
The "smart" arrays use this format, since I seriously doubt Haskell would let you do this:
a[1] = 15
after defining a.
C# for example does implement Enumerable.Range(int start, int count), PHP offers the function range(mixed low, mixed high, number step), ... There are programming languages that are "smart" enough.
Besides that, an infinite array is pretty much useless if it is actually materialized: it is not infinite at all, it just consumes all available memory.
You cannot simply enumerate complex numbers, as there is no direct successor or predecessor of a given complex number. Edit: This does not mean that you cannot compare complex numbers or create an array with a specified step!
I may be misunderstanding the question, but the answers that specify way to code the specific example you gave (counting by 20's) don't really meet the requirement that the array "cache" an arbitrary rule for generating array members...it seems that almost any complete solution would require a custom collection class that allows generation of the members with a delegated function/method, especially since all possible rules might include some that require evaluation of some/all existing members to generate the nth member.
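A minimal C++ sketch of such a custom collection, assuming the rule is supplied as a function of the index and of all previously generated members (RuleSequence is a hypothetical name). Members are computed lazily and cached, so the sequence is unbounded in principle, but only the members you actually request are stored.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical "smart array": member n is produced on demand by a rule that may
// read every earlier member, and each computed member is cached for reuse.
template <typename T>
class RuleSequence {
public:
    // rule(previous, n) must return member n, given members 0..n-1 in `previous`.
    using Rule = std::function<T(const std::vector<T>&, std::size_t)>;

    explicit RuleSequence(Rule rule) : rule_(std::move(rule)) {}

    // Computes any missing members up to index n, then returns member n by value.
    T operator[](std::size_t n)
    {
        while (cache_.size() <= n) {
            T next = rule_(cache_, cache_.size());
            cache_.push_back(std::move(next));
        }
        return cache_[n];
    }

private:
    Rule rule_;
    std::vector<T> cache_;  // members computed so far
};
```

For example, a rule returning 20 * n gives 0, 20, 40, ..., while a rule returning previous[n - 1] + previous[n - 2] (with 0 and 1 as the first two members) gives the Fibonacci numbers, which is the kind of rule that needs earlier members.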
Just about any program language can give you this sequence. The question is what syntax you want to use to express it. For example, in C# you can write:
Enumerable.Range(0, 301).Where(x => (x % 20) == 0)
or
for (int i = 0; i < 300; i += 20) yield return i;
or encapsulated in a class:
new ArithmaticSequence(0, 301, 20);
or in a method in a static class:
Enumerable2.ArithmaticSequence(0, 301, 20);
So, what are your criteria?
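The encapsulated-class option can be sketched in C++ as well; ArithmeticSequence and take() are hypothetical names for this sketch, not a standard library facility. Member i is computed as start + step * i on demand, so no upper bound is needed, and because only + and * are used, the same template also works for std::complex<double> (matching the F# complex-step example).

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical arithmetic sequence: member i is start + step * i, computed on
// demand. T needs +, * and construction from an index (int, double, complex...).
template <typename T>
struct ArithmeticSequence {
    T start;
    T step;

    T operator[](std::size_t i) const { return start + step * static_cast<T>(i); }

    // Materializes the first `count` members.
    std::vector<T> take(std::size_t count) const
    {
        std::vector<T> out;
        out.reserve(count);
        for (std::size_t i = 0; i < count; ++i)
            out.push_back((*this)[i]);
        return out;
    }
};
```

For example, ArithmeticSequence<int>{0, 20} yields 0, 20, ..., 300 at indexes 0 to 15, and a complex sequence starting at 0 with step 2+1i reaches 10+5i at index 5.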
Assembly:
Assuming edi contains the address of the desired array:
xor eax, eax
loop_location:
mov [edi], eax
add edi, 4
add eax, 20
cmp eax, 300
jle loop_location
MATLAB:
It is not a programming language itself but a tool; still, you can use it like a programming language.
It is built for mathematical operations like these, so arrays are a breeze there :)
a = 0:1:20;
creates an array from 0 to 20 with an increment of 1.
Instead of the number 1 you can also provide any value/expression for the increment.
PHP always makes things much simpler, and sometimes dangerously simple too :)
Well… Java is the only language I've ever seriously used that couldn't do that (although I believe using a Vector instead of an Array allowed that).