C++ const char* array: 2 variables into one 'cell'?

I have a text file with 1000 hex values in it like this:
00 2f 3a 2e...
and I'm trying to store them in a const char* array with both values in each 'cell', like '00' '2f' '3a'.
Currently, each 'cell' contains only one character: '0' '0' '2' 'f' '3' 'a'.
The code I'm currently using is:
void Myfunction::Myfile(const char* fileName)
{
    ifstream Myfile(fileName);
    if (Myfile.is_open())
    {
        while (!Myfile.eof() && i < 1000)
        {
            Myfile >> Array[i];
            i++;
        }
    }
    Myfile.close();
}
I've looked at dynamic memory allocation, pairs, typecasting and just haven't found a solution that fits. Can anybody help? Thanks in advance.

A char array is an array of chars; by definition, each "cell" can only hold a single char.
You could, if you computed what number each hex value represents, store that value in a char (or, more appropriately, an unsigned char). Or, if you insist on storing 2 chars for each value, declare an array whose "cells" can each hold 2 chars.

Since your array is of 'char', you can only write a single character to each element. Because operator>> skips whitespace when reading a char, the first character is a '0', the second is a '0', the third is a '2', and so on.
If you want to treat the words of your input as integers and store one of those integers in each element of your array, then you'll need to read them in as strings and convert them to integers. Boost's lexical_cast (http://www.boost.org/doc/libs/1_57_0/doc/html/boost_lexical_cast.html) might help you with that, although it does not parse hexadecimal input by itself; std::stoul with a base of 16 does.

A sort of ANSI C solution would be to make a data structure:
typedef struct _hex_val{
char first;
char second;
}t_hex_val;
Then you would allocate a large array of these (for example with malloc) and fill it up. You would need a little bit of parsing code to skip the ' ' characters between the pairs of chars.
I realize that this is very much against the grain of what C++ can do, but sometimes simple C is the easiest (at least for me).

Related

What does ++array[str[i]]; do?

I found a program to print out the most frequent character in an array of char.
Here is the code.
int main()
{
    int array[255] = {0}; // initialize all elements to 0
    char str[] = "thequickbrownfoxjumpedoverthelazydog";
    int i, max, index;
    for(i = 0; str[i] != 0; i++)
    {
        ++array[str[i]];
    }
    // then find the most used character ...
}
I don't really understand what ++array[str[i]]; does.
We declared the array as int array[255], but it still accepts str[i] as an index, which I believe is of type char.
Is it because str[i] is automatically converted to its ASCII value? And what does the ++ preceding the expression do?
In this code
++array[str[i]];
i walks the length of str (because of the setup of the loop we are inside...).
For each character inside str, the expression str[i] gets the value of that character. I use "value" instead of "character", because it later is treated as an integer index.
With that value the expression array[str[i]] accesses one of the entries in the array. Each entry in that array corresponds to one possible ASCII "character".
The ++ increments the value in the array. I.e. it counts the number of occurrences of e.g. 'a'.
In total, the code makes a histogram of ASCII character frequency inside str.
Note however the important warning by WhozCraig, in case you intend to use this. You have to match the assumptions the code makes (copied with permission, for completeness):
Just fyi, not casting that index to unsigned char is a recipe for disaster. Further, this is not using a table guaranteed to hold enough slots to cover the domain. i.e. 1 << CHAR_BIT in width. It will "work" (term used loosely) for your input string presented here. It is not an end-all general solution to char counting.
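A sketch of the same histogram that follows both points of that warning: the table has UCHAR_MAX + 1 (that is, 1 << CHAR_BIT) slots, and every char is converted to unsigned char before it is used as an index. The names char_histogram and most_frequent are hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Counts how often each byte value occurs in `s`.
// One slot per possible unsigned char value, all value-initialized to 0.
std::vector<std::size_t> char_histogram(const std::string& s) {
    std::vector<std::size_t> counts(UCHAR_MAX + 1);
    for (char c : s) {
        // The cast keeps a negative (signed) char from becoming a negative index.
        ++counts[static_cast<unsigned char>(c)];
    }
    return counts;
}

// Returns the most frequent character (the one with the smallest
// unsigned value on ties). Assumes s is non-empty.
char most_frequent(const std::string& s) {
    std::vector<std::size_t> counts = char_histogram(s);
    std::size_t best = 0;
    for (std::size_t v = 1; v < counts.size(); ++v) {
        if (counts[v] > counts[best]) best = v;
    }
    return static_cast<char>(best);
}
```

For "thequickbrownfoxjumpedoverthelazydog", 'e' and 'o' both occur 4 times, so the tie-break rule returns 'e'.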
First, the array has 255 elements because the ASCII values of the characters (0 to 127) fall within that range. For example, when str[i] is 'a', it is converted to the value 97, which is a valid index into the array. You can see the values in the following ASCII table: http://www.asciitable.com
Second, ++array[str[i]]; uses the pre-increment operator, which simply adds 1 to the value in the array. In this case you could use post-increment instead and get the same result: array[str[i]]++;
Reference on pre- and post-increment:
https://www.geeksforgeeks.org/pre-increment-and-post-increment-in-c/
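The two forms only differ in the value the expression itself yields; when that value is discarded, as in ++array[str[i]];, they behave the same. A minimal illustration (the function names are hypothetical):

```cpp
// ++x yields the new value of x.
int pre_increment_result(int x) {
    int r = ++x;  // x becomes x + 1, r gets the new value
    return r;
}

// x++ yields the old value of x; x is still incremented.
int post_increment_result(int x) {
    int r = x++;  // r gets the old value, x becomes x + 1
    return r;
}
```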

Convert a single character to lowercase in C++ - tolower is returning an integer

I'm trying to convert a string to lowercase, and am treating it as a char* and iterating through each index. The problem is that the tolower function I read about online is not actually converting a char to lowercase: it's taking char as input and returning an integer.
cout << tolower('T') << endl;
prints 116 to the console when it should be printing t.
Is there a better way for me to convert a string to lowercase?
I've looked around online, and most sources say to "use tolower and iterate through the char array", which doesn't seem to be working for me.
So my two questions are:
What am I doing wrong with the tolower function that's making it return 116 instead of 't' when I call tolower('T')
Are there better ways to convert a string to lowercase in C++ other than using tolower on each individual character?
That's because there are two different tolower functions. The one you're using is the one from <cctype>, which returns an int. That's why it prints 116: that's the ASCII value of 't'. If you want to print a char, you can just cast the result back to a char.
Alternatively, you could use the overload from <locale>, which takes the character plus a locale and returns the same character type it was given:
std::cout << std::tolower('T', std::locale()); // prints t
In response to your second question:
Are there better ways to convert a string to lowercase in C++ other than using tolower on each individual character?
Nope.
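For whole strings, the usual idiom still calls tolower once per character, but lets std::transform write the loop. A sketch assuming single-byte characters in the default "C" locale (the function name is hypothetical):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercases `s` in place. The lambda takes unsigned char so that a
// negative char value is never passed to std::tolower (undefined behavior).
void to_lower_in_place(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
}
```

This does not handle multi-byte encodings such as UTF-8 correctly; that needs a Unicode-aware library.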
116 is indeed the correct value; the issue is simply how std::cout prints integers. Use char(tolower(c)) to get the result you want:
std::cout << char(tolower('T')); // print it like this
It's even weirder than that: it takes an int and returns an int. See http://en.cppreference.com/w/cpp/string/byte/tolower.
You need to ensure the value you pass it is representable as an unsigned char - no negative values allowed, even if char is signed.
So you might end up with something like this:
char c = static_cast<char>(tolower(static_cast<unsigned char>('T')));
Ugly, isn't it? But in any case, converting one character at a time is very limiting. Try converting 'ß' to upper case, for example.
tolower works on int, so it returns int. If you look in <ctype.h> (or <cctype> in C++), you will see that it is declared as int tolower ( int c ); You can use a loop to go through the string and convert every single char to lower case. For example:
while (str[i]) // walk the string until the terminating '\0'
{
    c = str[i]; // current char of the string
    putchar(tolower((unsigned char)c)); // print its lowercase form
    i++;
}
The documentation of int tolower(int ch) requires that ch either be representable as an unsigned char or be equal to EOF (which is usually -1, but don't rely on that).
It's not uncommon for character manipulation functions inherited from the C standard library to work in terms of ints. There are two reasons for this:
In the early days of C, all arguments were promoted to int (function prototypes did not exist).
For consistency these functions need to handle the EOF case, which for obvious reasons cannot be a value representable by a char, since that would mean we'd have to lose one of the legitimate encodings for a character.
http://en.cppreference.com/w/cpp/string/byte/tolower
The answer is to cast the result to a char before printing.
e.g.:
std::cout << static_cast<char>(std::tolower('A'));
Generally speaking, to convert an uppercase ASCII letter to lowercase, you only need to add 32 to it, since 32 is the ASCII code difference between lowercase and uppercase letters, e.g., 'a'-'A'=97-65=32.
char c = 'B';
c += 32; // c is now 'b'
printf("c=%c\n", c);
Another easy way is to first map the uppercase character to an offset within the 26 letters of the English alphabet (i.e. 'A' is offset 0 and 'Z' is offset 25, inclusive) and then remap that offset to a lowercase character.
char c = 'B';
c = c - 'A' + 'a'; // c is now 'b'
printf("c=%c\n", c);
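Both snippets above assume c is already an uppercase letter; applied to anything else, += 32 silently produces a different character (for example, '[' becomes '{'). A sketch with the range check added, assuming an ASCII-compatible execution character set (the function name is hypothetical):

```cpp
// Lowercases c only if it is one of the ASCII letters 'A'..'Z';
// every other value (digits, punctuation, non-ASCII bytes) is returned unchanged.
// Assumes 'A'..'Z' and 'a'..'z' are contiguous, as they are in ASCII.
char ascii_to_lower(char c) {
    if (c >= 'A' && c <= 'Z') {
        return static_cast<char>(c - 'A' + 'a');
    }
    return c;
}
```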

Why do "strings", i.e. character arrays, have a null-terminating element, whereas integer arrays don't?

From what I understand, character arrays in C/C++ have a null-terminating character for the purpose of denoting an off-the-end element of that array, while integer arrays don't; they have some internal mechanism that is hidden from the user, but they obviously know their own size since the user can do sizeof(myArray)/sizeof(int) (Is that technically a hack?). Wouldn't it make sense for an integer array to have some null-terminating int -- call it i or something?
Why is this? It has never made any sense to me.
Because, in C, strings are not the same as character arrays; they exist at a level above arrays in much the same way as a linked list exists at a level above structures.
This is an example of a string:
"pax is great"
This is an example of a character array:
{ 'p', 'a', 'x' }
This is an example of a character array that just happens to be equivalent to a string:
{ 'p', 'a', 'x', '\0' }
In other words, C strings are built on top of character arrays.
If you look at it another way, neither integer arrays nor "real" character arrays (like {'a', 'b', 'c'} for example) have a terminating character.
You can quite easily do the same thing (have a terminator) with an integer array of people's ages, using -1 (or any negative number) as the terminator.
The only difference is that you'll write your own code to handle it rather than using code helpfully provided in the C standard library, things like:
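size_t agelen (int *ages) {
    size_t len = 0;
    while (*ages++ >= 0)   // stop at the first negative terminator
        len++;
    return len;
}
int *agecpy (int *src, int *dst) {
    int *d = dst;
    while (*src >= 0)      // copy every age before the terminator
        *d++ = *src++;
    *d = -1;               // terminate the copy
    return dst;
}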
Because a string type does not exist in C.
Because the null terminator is there to mark the end of the input, and the string's length doesn't have to equal the length of the array that holds it.
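A small illustration of that difference, assuming a hypothetical 16-byte buffer named greeting: strlen counts up to the first '\0', while sizeof reports the whole array.

```cpp
#include <cstddef>
#include <cstring>

// Room for 16 chars, but the string stored in it is "hi":
// 'h', 'i', then '\0'; the remaining bytes are zero-filled.
char greeting[16] = "hi";

// strlen walks the array until it reaches the first '\0'.
std::size_t greeting_string_length() { return std::strlen(greeting); }

// sizeof reports the size of the whole array, regardless of its contents.
std::size_t greeting_array_size() { return sizeof greeting; }
```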
This is by convention, treating null as a non-character. Unlike other major systems programming languages of the time, e.g. PL/1, which had a leading integer to denote the length of a variable-length character string, C was designed to treat strings as simple character arrays and avoided the overhead, and in particular any portability issues (such as sizeof int) or limitations (what about very long strings?). The convention has stuck because it worked out rather well.
Denoting the end of an int array as you suggest would require a non-int marker, which could be rather difficult to arrange. As for sizeof: sizeof(myArray)/sizeof(int) works only while myArray is still an array, because the array's size is part of its type at compile time; once it decays to a pointer (for example, when passed to a function or when the memory comes from malloc), sizeof gives the size of the pointer instead. There is absolutely nothing in C to prevent you from cobbling together an "array" by clever management of allocated memory. Modern compilers of course contain many convenience checks on wayward code, and someone with better knowledge of compilers could clarify/rectify my comments here. C++'s std::vector, for example, keeps explicit track of its size and capacity.
In a lot of places you can see a different Field Separator (FS) character used to separate strings, e.g., CSV. But if you were to do that, you would need to write your own std libraries: thousands and thousands of lines of good, tested code.
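A sketch of where the sizeof trick stops working. The helper names are hypothetical; element_count is a template that only accepts real arrays, which is essentially what std::size (C++17, <iterator>) provides.

```cpp
#include <cstddef>

// Element count of a real array, deduced from its type at compile time.
// Passing a pointer does not compile, unlike sizeof(arr) / sizeof(arr[0]).
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) {
    return N;
}

// The parameter `const int arr[]` is adjusted to `const int* arr`, so
// sizeof(arr) here is the size of a pointer, not of the caller's array.
// (Compilers typically warn about exactly this.)
std::size_t size_seen_by_callee(const int arr[]) {
    return sizeof(arr);
}
```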
A C-Style string is a collection of characters terminated by '\0'. It is not an array.
The collection can be indexed like an array.
Because the length of the collection can vary, the length must be determined by counting the number of characters in the collection.
A convenient representation is an array because an array is also a collection.
One difference is that an array is a fixed sized data structure. The collection of characters may not be a fixed size; for example, it can be concatenated.
If you think about the problem of how to represent strings, you have two choices: 1) store a count of letters followed by the letters or 2) store the letters followed by some unique special character used as an end of string marker.
End of string marker is more flexible - longer strings possible, easier to use, etc.
BTW, you can have a terminator on an int array if you want. Nothing stops you from saying that -1, for example, means the end of the list, as long as you are sure that -1 never appears as real data.

Is this the correct syntax for an array of pointers to size-3 character arrays?

I'm trying to make my own version of an "Autocorrect" program that checks for words that are similar to a given word. To accomplish this, I need to look at distances between letters on a standard keyboard, so that I have a metric for how "close" a word is to another word.
In my program I've started to write an array
const char[3]* KEY_DISTS[] = { "aa0", "ab5", "ba5", "ac3", "ca3", "ad2", "da2" ,... };
which is supposed to mean "The distance between 'a' and 'a' is 0, the distance between 'a' and 'b' is 5, the distance between 'b' and 'a' is 5, " etcetera.
That information I will next put in a map that maps pairs of characters to integers, but I'm wondering whether it's written correctly so far and whether you have any suggestions for me.
const char[3]* KEY_DISTS[]
should mean "A constant array of pointers to character arrays of size 3", right?
The declaration matching the title would be:
const char (*arr[])[4] = { &"aa0" };
Note that "aa0" is an array of four chars (it includes the terminating '\0') and that you need to take the address of the string literal (string literals are lvalues and have static storage duration, so this is fine).
Sounds like you could have a 2D array instead:
const char arr[][4] = { "aa0" };
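Since the plan is to move this data into a map from character pairs to integers, here is a sketch of that step on top of the 2D-array form. It uses only the entries shown in the question, and assumes every distance is a single decimal digit; build_key_distance_map is a hypothetical name.

```cpp
#include <map>
#include <utility>

// Each row is two letters plus a one-digit distance; the 4th char is '\0'.
const char KEY_DISTS[][4] = { "aa0", "ab5", "ba5", "ac3", "ca3", "ad2", "da2" };

// Builds a lookup from (letter, letter) to distance.
std::map<std::pair<char, char>, int> build_key_distance_map() {
    std::map<std::pair<char, char>, int> dists;
    for (const auto& entry : KEY_DISTS) {
        // '0'..'9' are guaranteed to be contiguous, so this yields 0..9.
        dists[std::make_pair(entry[0], entry[1])] = entry[2] - '0';
    }
    return dists;
}
```

If distances can exceed 9, a struct { char a; char b; int dist; } table avoids encoding numbers as characters.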

Why doesn't memcpy work when copying a char array into a struct?

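#include <cstring>
#include <fstream>
#include <iostream>
using namespace std;

#define buffer 128

// The struct has to be declared before main() uses it.
struct ID3v1{
    char header[3];
    char title[30];
    char artist[30];
    char album[30];
    char year[4];
    char comment[28];
    unsigned char zerobyte; // raw bytes, so unsigned char rather than bool
    unsigned char track;
    unsigned char genre;
};

int main(){
    char buf[buffer]="";
    ifstream infile("/home/kevin/Music/test.mp3",ios::binary);
    infile.seekg(-buffer,ios::end);
    if(!infile || !infile.read(buf,buffer)){
        cout<<"fail!"<<endl;
        return 1;
    }
    ID3v1 id3;
    cout<<sizeof(id3)<<endl;
    memcpy(&id3,buf,128);
    cout<<id3.header<<endl; // header has no '\0', so this keeps printing past it
}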
When I do the memcpy, it seems to be pushing too much data into the header field. Do I need to go through each of the struct's members and copy the data in? I'm also using C++, but this seems more of a "C" strategy. Is there a better way for C++?
As noted in all the comments, the header field is not '\0'-terminated, but when printing C strings, operator<< expects the sequence of characters to be '\0'-terminated.
Try:
std::cout << std::string(id3.header, id3.header+3) << std::endl;
This will print the three characters in the header field.
The problem is most likely that memcpy does exactly what it is supposed to do.
It copies the 128 bytes into your structure.
Then you try to print out the header. It prints the 1st character, the 2nd, the 3rd... and keeps printing until it finds a '\0' (the string termination character).
Basically, when printing things out, copy the header to another char array and append the termination character (or copy it to a C++ string).
Other problems you may encounter when using memcpy:
The compiler may align your struct members to word boundaries, inserting padding between them.
Most compilers have some pragma or command-line switch to specify the alignment to use.
Some CPUs require shorts or longs to be stored on word boundaries; in that case modifying the alignment will not help you, as you will not be able to read from the unaligned addresses.
If you copy integers larger than char (like short or long), you have to make sure to correct the byte order depending on your CPU architecture.