Getting the char integer value from a std::string & std::wstring - c++

I am attempting to convert a string into a number by summing the integer value of each letter, in C++ with WinAPI. So in ASCII, the std::string "AA" would equal 130 (65+65).
The string can be either a std::string or a std::wstring.
Why does the following function always return the value of zero no matter what letter I put in it? Shouldn't it return either the ASCII or Unicode integer value of the letter?
printf("TEST a: %d \n", _tstoi(_T("a")));
printf("TEST A: %d \n", _tstoi(_T("A")));
printf("TEST b: %d \n", _tstoi(_T("b")));
My VC++ application is currently built as Unicode, and the code above prints zero for each letter. I remember hearing that Unicode strings are very different from ASCII strings. Can you clear up what exactly is different, other than Unicode having a library of something like 30,000 characters whilst ASCII has 256 (I think)?

The MSDN article says:
"The input string is a sequence of characters that can be interpreted
as a numerical value of the specified type. The function stops reading
the input string at the first character that it cannot recognize as
part of a number."
If you test the code with Unicode strings containing actual numbers, you'll see the correct output:
printf("TEST 1: %d \n", _tstoi(_T("1")));
output:
TEST 1: 1
Like @Ylisar said, the *toi functions convert the textual representation of a number into an integer; they do not return character codes.
The following code prints the numeric value of the character instead. Watch out for the difference between a pointer to a string literal and the character it points to. I've left both versions so you can see the difference:
printf("TEST 1: %d \n", _tstoi(_T("1")));
printf("TEST a: %d \n", _tstoi(_T("a")));
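const WCHAR* b = _T("b");
printf("TEST A: %d \n", _T("A")); // passes the pointer, so an address is printed
printf("TEST b: %d \n", *b);      // dereferences the pointer, so the character value is printed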
Output:
TEST 1: 1
TEST a: 0
TEST A: 13457492
TEST b: 98
Check out more at http://msdn.microsoft.com/en-us/library/yd5xkb5c%28v=vs.80%29.aspx
If you want to sum up (accumulate) the values, I would recommend checking out the STL range functions, which do wonders for such things. For example:
#include <numeric>
#include <string>
printf("TEST a: %d \n", *_T("a")); // 97
printf("TEST b: %d \n", *_T("b")); // 98
std::wstring uString(_T("ba"));
int result = std::accumulate(uString.begin(), uString.end(), 0);
printf("TEST accumulated: %d \n", result);
Results:
TEST a: 97
TEST b: 98
TEST accumulated: 195
This way you don't have to have for-loops going through all the values. The range functions really are nice for stuff like this.
Check out more at: http://www.sgi.com/tech/stl/accumulate.html

The *toi family of functions converts the string representation of a number to an integer, that is, "10" becomes 10. What you actually want is no conversion at all. Change it to:
printf("TEST a: %d \n", _T('a'));
printf("TEST A: %d \n", _T('A'));
printf("TEST b: %d \n", _T('b'));
As for Unicode, the underlying representation depends on the encoding (for example UTF-8, which is very popular, encodes code points 0 to 127 as single bytes identical to ASCII).

The first question, why printf does not work as intended, has already been answered by Ylisar. The other question, about summing the numeric values of the characters, is a little more complex. The _tstoi() function converts a string to a number only if the string represents a number, so "123" is converted to 123. What you want is the sum of the characters' numeric values.
For Unicode code points up to 0x7F (0...127) this is simply the sum of the 1-byte UTF-8 representation. However, on Windows, code compiled with the UNICODE flag uses a 2-byte-per-character representation (UTF-16). Running the following code in the debugger will reveal this.
// ASCII 1 Byte per character
const char* letterA = "A";
int sumOfLetterA = letterA[0] + letterA[0]; // gives 130
// 2 Bytes per character (Windows)
const wchar_t* letterB = TEXT("B");
int sumOfLetterB = letterB[0] + letterB[0]; // gives 132
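The summing idea from the answers above can be written once for both string types. This is a minimal sketch, not the original posters' code: sumCodeUnits is a hypothetical helper name, and it sums code units (bytes for std::string, wchar_t units for std::wstring), which only equals a sum of character codes when each character fits in one unit.

```cpp
#include <numeric>
#include <string>
#include <type_traits>

// Hypothetical helper: sums the numeric value of every code unit in a
// std::string or std::wstring. Not part of any library.
template <typename CharT>
long sumCodeUnits(const std::basic_string<CharT>& text) {
    using Unsigned = typename std::make_unsigned<CharT>::type;
    return std::accumulate(text.begin(), text.end(), 0L,
        [](long total, CharT ch) {
            // Going through the unsigned type keeps values above 127
            // positive on platforms where plain char is signed.
            return total + static_cast<long>(static_cast<Unsigned>(ch));
        });
}
```

With this sketch, sumCodeUnits(std::string("AA")) gives 130 and sumCodeUnits(std::wstring(L"ba")) gives 195, matching the figures quoted above.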


SIMD: Compare two strings char by char and find the total number of matches

I'm developing a bioinformatic tool. I'm interested in applying SIMD to boost its speed. Given two strings of equal length, I would like to rapidly count the total number of indices where the two strings have identical characters.
For example, say we have S1="AATTGGCCAAC" and S2="AATTCTCCAAC". Both have length 11 and they differ only at positions 5 and 6 ("GG" in S1 and "CT" in S2), so the output should be 9.
Here is what I have so far:
#include <string>
#include <immintrin.h>
using namespace std;
#include <memory.h>
int main()
{
register __m128i str_A, str_B, char_eq;
str_A = _mm_load_si128((__m128i*)("AATTGGCCAAC"));
str_B = _mm_load_si128((__m128i*)("AATTCTCCAAC"));
char_eq = _mm_cmpeq_epi8(str_A, str_B);
}
String comparison seems to work fine.
uint8_t val[11];
memcpy(val, &char_eq, sizeof(val));
printf("Numerical: %i %i %i %i %i %i %i %i %i %i %i \n",
val[0], val[1], val[2], val[3], val[4], val[5],
val[6], val[7],val[8], val[9], val[10]);
}
This outputs 255 255 255 255 0 0 255 255 255 255 255.
So now I have a register __m128i object called char_eq which records whether each pair of characters matches or mismatches. How do I turn this __m128i char_eq object into an integer that encodes the number of matching characters?
The only way I can think of is to manually add the boolean values up (i.e. 1+1+1+1+0+0+1+1+1+1+1) but this defeats the purpose of using SIMD since that will require length(str) number of additions.
What is the fastest way to find the total number of matching characters in two strings? I hope to make it O(1). Thank you in advance!
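One commonly used way to reduce the comparison result to a count is to collapse the 16 byte-wide results into a 16-bit mask with _mm_movemask_epi8 and count the set bits. The following is a sketch under assumptions, not a tuned implementation: countMatches is a hypothetical helper name, unaligned loads are used because string data is not guaranteed to be 16-byte aligned, and a scalar loop handles the tail (and the whole input when SSE2 is unavailable). Comparing n characters still takes work proportional to n, so this lowers the constant factor rather than reaching O(1).

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Hypothetical helper: counts positions where a and b hold the same byte.
// Only the common prefix is compared if the lengths differ.
std::size_t countMatches(const std::string& a, const std::string& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::size_t matches = 0;
    std::size_t i = 0;
#ifdef HAVE_SSE2
    for (; i + 16 <= n; i += 16) {
        // Unaligned loads of 16 bytes from each string.
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a.data() + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b.data() + i));
        // Each byte of eq is 0xFF where the inputs match, 0x00 otherwise.
        __m128i eq = _mm_cmpeq_epi8(va, vb);
        // Take the top bit of every byte, giving one bit per character.
        int mask = _mm_movemask_epi8(eq);
        matches += std::bitset<16>(static_cast<unsigned long>(mask)).count();
    }
#endif
    // Scalar loop for the remaining (fewer than 16) characters.
    for (; i < n; ++i) {
        matches += (a[i] == b[i]) ? 1 : 0;
    }
    return matches;
}
```

For the two example strings above, which are shorter than 16 characters, the scalar tail does all the work and the result is 9.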

What's with the sort of repetitive output and how does it differentiate between the number of iterations and the Ascii code?

After running the following code and entering A, this is my output:
char 0 is character A with ascii code 65
char 1 is character
with ascii code 10
I have two questions about this output:
Why is the output
char 0 is character A with ascii code 65
and not
char 65 is character A with ascii code 0
How does the program know that the first "%3d" is associated with the number of iterations (I think) and the second "%d" is associated with the ASCII value?
"%c" is a character data type so it prints A, but there is nothing to differentiate between the two "%d" specifiers except for maybe the 3 in the first "%3d" (but doesn't that just mean the number of positions, including the decimal point)?
Where does
char 1 is character
with ascii code 10
come from? Does it have to do with the c,c part in the code?
#include <stdio.h>
int main ()
{
int c,n=0;
while ((c=getchar()) !='Q' )
printf ("char %3d is character %c with ascii code %d\n", n++, c,c);
}
The format specifiers are in the same order as the arguments to printf, so "%3d" corresponds to the second argument to printf, %c corresponds to the third argument, and "%d" corresponds to the fourth. In general, the n+1st argument to printf corresponds to the nth format specifier.
In the call printf("char %3d is character %c with ascii code %d\n", n++, c, c), "%3d" corresponds to the argument n++, "%c" corresponds to the first c, and "%d" corresponds to the second c.
The reason for the
char 1 is character
with ascii code 10
line is that you enter a newline character when you press the Enter key; getchar() returns it as the next character, and its code is 10.
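The left-to-right pairing of specifiers and arguments described above can be checked in isolation. This is a small sketch, assuming a hypothetical helper named describeChar that formats a single line the same way the loop in the question does:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one line like the loop in the question.
std::string describeChar(int index, int ch) {
    char buffer[64];
    // Specifiers are consumed left to right:
    //   %3d <- index (padded to at least 3 columns)
    //   %c  <- ch, printed as a character
    //   %d  <- ch again, printed as its integer code
    std::snprintf(buffer, sizeof buffer,
                  "char %3d is character %c with ascii code %d\n",
                  index, ch, ch);
    return buffer;
}
```

Calling describeChar(1, '\n') reproduces the puzzling second line: the %c prints the newline itself, which is why the text breaks before "with ascii code 10".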

C++ for loop, while loop, and do while loop to generate a table of decimal numbers

I need help for a starting point, really. We must use these 3 loops to generate a table of decimal numbers, as well as the binary, octal, and hexadecimal equivalents of the decimal numbers, in the range 1-256. Help would be greatly appreciated.
If you don't know where to start that's... not a good sign. Perhaps you should get together with your teacher so that you don't fall behind.
Anyway, the basic idea will be:
for loop counting from 1 to 256
write counter in decimal form
write counter in binary form
write counter in hex form
write counter in octal form
end loop
You really don't need three loops, though you can break it into three if you have to. You can pass different format specifiers to printf and the like to format your output.
Look at this page to learn about specifiers: http://www.cplusplus.com/reference/clibrary/cstdio/printf/
If you use printf and include %d, you print a decimal number. If you use %x you get the same number in unsigned hexadecimal, and %o gives octal.
for example:
int i;
for(i=1;i<=256;i++){
printf("the number %d in dec: %d\n",i,i); // prints i in decimal
printf("the number %d in hex: %x\n",i,i); // prints i in hex
printf("the number %d in oct: %o\n",i,i); // prints i in octal
}
OR
int i = 1;
while(i<=256) {
...
i++;
}
OR
int i = 1;
do {
...
i++;
} while (i<=256);
This page talks about the types of loops: http://www.tutorialspoint.com/cplusplus/cpp_loop_types.htm
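The examples above cover decimal, hex and octal but not binary, for which printf has traditionally had no conversion specifier. One way to fill that gap is std::bitset. The sketch below assumes two hypothetical helpers, formatRow and printTable, and a 9-bit width chosen because 256 is 100000000 in binary:

```cpp
#include <bitset>
#include <cstdio>
#include <string>

// Hypothetical helper: one table row with decimal, binary, octal and hex.
std::string formatRow(unsigned value) {
    // 9 bits are enough for the range 1-256.
    std::string binary = std::bitset<9>(value).to_string();
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%3u %s %3o %3x",
                  value, binary.c_str(), value, value);
    return buffer;
}

// Hypothetical helper: prints the whole table with a for loop; the same
// body works inside the while and do-while variants shown above.
void printTable() {
    for (unsigned i = 1; i <= 256; ++i) {
        std::puts(formatRow(i).c_str());
    }
}
```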

Extracting string data from text C++

I'm currently writing a C++ program that needs to extract string and numeric data from a text file. The format of the data is the following:
3225 C9+ ELECTR C8 C * 1.00E-6 -0.30 0.0
The first entry is an integer, the next 5 entries are strings and the last 3 are floats. No string is ever longer than 7 characters.
I am reading the file line by line and then extracting the data using:
sscanf(ln.c_str(),"%d %s %s %s %s %s %e %e %e",
&rref[numre],&names[numre][0],&names[numre][1],&names[numre][2],&names[numre][3],
&names[numre][4],&nums[numre][0],&nums[numre][1],&nums[numre][2]);
This works fine until I meet a line like:
3098 SIC2H3+ ELECTR SIC2H2 H * 1.50E-7 -0.50 0.0
where one of the entries is the full 7 characters long. In this case I get:
names[3097][0] = "SIC2H3+ELECTR"
and,
names[3097][1] = "ELECTR"
Anybody got any ideas...they will be much appreciated!!
The most likely problem is in the declaration of names: if you declared it as holding seven characters or less, and forgot to allocate space for terminating zero, you'd get the results that you are describing.
char names[MAX][5][7]
will have enough space for strings of length 6 or less; for strings of length 7, you need
char names[MAX][5][8]
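Beyond sizing the buffers for the terminator, a field width in each %s conversion keeps sscanf from writing past the end of a buffer if a longer token ever appears. The following is a sketch under assumptions: Record and parseLine are hypothetical names, and the layout (one integer, five names of at most 7 characters, three floats) is taken from the question.

```cpp
#include <cstdio>

// Hypothetical record type for one line of the file.
struct Record {
    int id;
    char names[5][8];  // 7 characters plus the terminating '\0'
    float nums[3];
};

// Hypothetical helper: returns true when all nine fields were read.
bool parseLine(const char* line, Record& out) {
    // %7s stores at most 7 characters and then appends '\0',
    // which fits exactly in an 8-byte buffer.
    int fields = std::sscanf(line, "%d %7s %7s %7s %7s %7s %e %e %e",
                             &out.id,
                             out.names[0], out.names[1], out.names[2],
                             out.names[3], out.names[4],
                             &out.nums[0], &out.nums[1], &out.nums[2]);
    return fields == 9;
}
```

Checking the return value also catches lines that do not match the expected layout.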

Why does printf not print out just one byte when printing hex?

pixel_data is a vector of char.
When I do printf(" 0x%1x ", pixel_data[0] ) I'm expecting to see 0xf5.
But I get 0xfffffff5 as though I was printing out a 4 byte integer instead of 1 byte.
Why is this? I have given printf a char to print out - it's only 1 byte, so why is printf printing 4?
NB. the printf implementation is wrapped up inside a third party API but just wondering if this is a feature of standard printf?
You're probably getting a benign form of undefined behaviour because the %x modifier expects an unsigned int parameter and a char will usually be promoted to an int when passed to a varargs function.
You should explicitly cast the char to an unsigned int to get predictable results:
printf(" 0x%1x ", (unsigned)pixel_data[0] );
Note that a field width of one is not very useful. It merely specifies the minimum number of digits to display and at least one digit will be needed in any case.
If char on your platform is signed then this conversion will convert negative char values to large unsigned int values (e.g. fffffff5). If you want to treat byte values as unsigned values and just zero extend when converting to unsigned int you should use unsigned char for pixel_data, or cast via unsigned char or use a masking operation after promotion.
e.g.
printf(" 0x%x ", (unsigned)(unsigned char)pixel_data[0] );
or
printf(" 0x%x ", (unsigned)pixel_data[0] & 0xffU );
Better to use the standard format flags:
printf(" %#1x ", pixel_data[0] );
Then printf adds the hex prefix for you.
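Use %hhx:
printf("%#04hhx ", foo);
Here hh is the length modifier, which makes printf convert the argument to unsigned char before printing; 4 is the minimum field width, which includes the 0x prefix.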
The width specifier in printf is actually a minimum width. You can do printf(" 0x%2x ", pixel_data[0] & 0xff) to print the lowest byte (notice the 2, which pads the output to two characters if pixel_data[0] is e.g. 0xffffff02).
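The cast-via-unsigned-char advice from the answers above can be wrapped in a small function. This is a sketch, assuming a hypothetical helper named hexByte and a zero-padded two-digit format, which is one choice among the variants discussed:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats one byte as "0x" followed by two hex digits.
std::string hexByte(char byte) {
    char buffer[8];
    // Converting to unsigned char first discards the sign, so the value
    // is zero-extended (not sign-extended) when it becomes an unsigned int.
    std::snprintf(buffer, sizeof buffer, "0x%02x",
                  static_cast<unsigned>(static_cast<unsigned char>(byte)));
    return buffer;
}
```

Using 02 rather than a bare width pads small values with a zero, so a byte holding 2 is shown as 0x02.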