Searching for Junk Characters in a String

Friends
I want to integrate the following code into the main application code. Junk characters that arrive in the output string make the application dump core.
The following code snippet doesn't work:
void stringCheck(char*);
int main()
{
char some_str[] = "Common Application FE LBS Serverr is down";
stringCheck(some_str);
}
void stringCheck(char * newString)
{
for(int i=0;i<strlen(newString);i++)
{
if ((int)newString[i] >128)
{
TRACE(" JUNK Characters in Application Error message FROM DCE IS = "<<(char)newString[i]<<"++++++"<<(int)newString[i]);
}
}
}
Can someone please show me a better approach to finding junk characters in a string?
Many Thanks

Your char type is probably signed. Cast to unsigned char instead, so that the value does not become a negative integer when converted to int. Also note that ASCII ends at 127, so the comparison should be against 127:
if ((unsigned char)newString[i] > 127)
Depending on your needs, isprint might do a better job, checking for a printable character, including space:
if (!isprint((unsigned char)newString[i]))
...
Note that you have to cast to unsigned char: input for isprint requires values between 0 and UCHAR_MAX as character values.
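A minimal sketch of the isprint approach described above, assuming the default "C" locale. The helper name find_junk is hypothetical and not part of the original code; it returns the positions of suspicious bytes instead of tracing them.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of every byte that is not
// printable according to isprint in the current C locale.
std::vector<std::size_t> find_junk(const std::string& text) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Convert to unsigned char first: passing a negative char value
        // to isprint is undefined behaviour.
        const unsigned char byte = static_cast<unsigned char>(text[i]);
        if (!std::isprint(byte)) {
            positions.push_back(i);
        }
    }
    return positions;
}
```

An empty result means the string contained only printable characters (including spaces); the caller can log or strip the reported positions as needed.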

Related

strtol giving same answer for two different hex strings

So I have two hex strings - "3b101c091d53320c000910" and "071d154502010a04000419". When I use strtol() on them I get same value for both strings.
I tried the following code-
string t1="3b101c091d53320c000910";
long int hext1=strtol(t1.c_str(),0,16);
string t2="071d154502010a04000419";
long int hext2=strtol(t2.c_str(),0,16);
cout<<hext1<<endl;
cout<<hext2<<endl;
Both are giving me same value: 9223372036854775807.
I don't know exactly how strtol() works since I am new to C++, but it's giving me the same value for two different hex strings. Why?

You should start by reading the manual page. It's returning LONG_MAX since your input is too large to fit in a long.
Also, strtol() is a very C way of doing things, and you're programming in C++.

You're not using strtol correctly. You should set errno to
0 before calling it, and check that it is still 0 after;
otherwise, it will contain an error code (which can be displayed
using strerror). Also, you should pass it the address of
a char* (strtol takes a char**, so a char const* will not compile),
so that you can ensure that it has processed
the entire string (otherwise, "xyz" will return 0, without an
error):
errno = 0;
char* end;
long hext1 = strtol( t1.c_str(), &end, 16 );
if ( errno != 0 || *end != '\0' ) {
// Error occurred.
}
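The checks above can be wrapped in a small function. This is only a sketch: parse_hex is a hypothetical name, and it additionally rejects input where no digits were consumed at all.

```cpp
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns true and stores the result only if the
// whole string is a hexadecimal number that fits in a long.
bool parse_hex(const std::string& text, long& value) {
    errno = 0;
    char* end = nullptr;
    const long parsed = std::strtol(text.c_str(), &end, 16);
    if (errno == ERANGE) {
        return false;  // out of range: strtol returned LONG_MAX or LONG_MIN
    }
    if (end == text.c_str() || *end != '\0') {
        return false;  // no digits at all, or trailing characters
    }
    value = parsed;
    return true;
}
```

With the 22-digit strings from the question this returns false, because 88 bits of data cannot fit in a long; values of that size need a wider or arbitrary-precision type.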

How to convert an ASCII char to its ASCII int value?

I would like to convert a char to its ASCII int value.
I could fill an array with all possible values and compare to that, but it doesn't seem right to me. I would like something like
char mychar = "k"
public int ASCItranslate(char c)
return c
ASCItranslate(k) // >> Should return 107 as that is the ASCII value of 'k'.
The point is atoi() won't work here as it is for readable numbers only.
It won't do anything with spaces (ASCII 32).

Just do this:
int(k)
You're just converting the char to an int directly here, no need for a function call.

A char is already a number. It doesn't require any conversion since the ASCII is just a mapping from numbers to character representation.
You could use it directly as a number if you wish, or cast it.

In C++, you could also use static_cast<int>(k) to make the conversion explicit.

Do this:-
char mychar = 'k';
//and then
int k = (int)mychar;

To convert from an ASCII character to its ASCII value:
char c='A';
cout<<int(c);
To convert from an ASCII value to its ASCII character:
int a=67;
cout<<char(a);

#include <iostream>
char mychar = 'k';
int ASCIItranslate(char ch) {
return ch;
}
int main() {
std::cout << ASCIItranslate(mychar);
return 0;
}
That's your original code with the various syntax errors fixed. Assuming you're using a compiler that uses ASCII (which is pretty much every one these days), it works. Why do you think it's wrong?

C++ Character Encoding

This is my C++ Code where i'm trying to encode the received file path to utf-8.
#include <cstring>
#include <string>
#include <iostream>
using namespace std;
void latin1_to_utf8(unsigned char *in, unsigned char *out);
string encodeToUTF8(string _strToEncode);
int main(int argc,char* argv[])
{
// Code to receive fileName from Sockets
cout << "recvd ::: " << recvdFName << "\n";
string encStr = encodeToUTF8(recvdFName);
cout << "encoded :::" << encStr << "\n";
}
void latin1_to_utf8(unsigned char *in, unsigned char *out)
{
while (*in)
{
if (*in<128)
{
*out++=*in++;
}
else
{
*out++=0xc2+(*in>0xbf);
*out++=(*in++&0x3f)+0x80;
}
}
*out = '\0';
}
string encodeToUTF8(string _strToEncode)
{
int len= _strToEncode.length();
unsigned char* inpChar = new unsigned char[len+1];
unsigned char* outChar = new unsigned char[2*(len+1)];
memset(inpChar,'\0',len+1);
memset(outChar,'\0',2*(len+1));
memcpy(inpChar,_strToEncode.c_str(),len);
latin1_to_utf8(inpChar,outChar);
string _toRet = (const char*)(outChar);
delete[] inpChar;
delete[] outChar;
return _toRet;
}
And the OutPut is
recvd ::: /Users/zeus/ÄÈÊÑ.txt
encoded ::: /Users/zeus/AÌEÌEÌNÌ.txt
The function latin1_to_utf8 above comes from an accepted answer to the question "Convert ISO-8859-1 strings to UTF-8 in C/C++", and it looks like it works there. So I think I must be making some mistake, but I'm not able to identify what it is. Can someone help me out with this, please?
I first posted this question on Code Review, but I'm not getting any answers there. Sorry for the duplication.

Do you use a framework, or are you building directly on top of the standard library? Many people need such conversions, so there are libraries for it. I strongly recommend using one, because a library is tested and usually implements the best known approach.
One library I found that does this is Boost.Locale.
Boost is a fairly standard choice. If you use Qt, I recommend Qt's own conversion facilities instead (they are platform independent).
In case you want to do it yourself (to see how it works or for any other reason):
1. Make sure that you allocate memory. This is very important in C and C++. Since you are writing C++, use new[] to allocate memory and delete[] to release it. C++ won't work out when to release it for you; that is the developer's job.
2. Check that you allocate the right amount of memory. UTF-8 output can be larger than the Latin-1 input, because characters outside ASCII take more than one byte.
3. As already mentioned above, read from somewhere (terminal or file) but write the output to a new file. When you open that file in a text editor, make sure the editor's encoding is set to UTF-8 (the editor has to know how to interpret the data).
I hope that helps.

You are first outputting the original Latin-1 string to a terminal that expects a certain encoding, probably Latin-1. You then transcode to UTF-8 and output it to the same terminal, which interprets it differently. Classic mojibake. Try the following with the output instead:
for(size_t i=0, len=strlen(reinterpret_cast<const char*>(outChar)); i!=len; ++i)
std::cout << static_cast<unsigned>(static_cast<unsigned char>(outChar[i])) << ' ';
Note that strlen expects a const char*, hence the reinterpret_cast on the unsigned char buffer. The two static_casts first get the unsigned byte value and then widen it to unsigned, to keep the stream from treating it as a char. Your char might already be unsigned, but that's compiler-dependent.
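To check the conversion independently of any terminal encoding, the transcoding and the byte dump can be combined as in the sketch below. It assumes the input really is ISO-8859-1; this latin1_to_utf8 is a std::string variant written for illustration (not the function from the question), and hex_dump is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Sketch: every Latin-1 byte >= 0x80 becomes a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
        }
    }
    return out;
}

// Hypothetical helper: renders each byte as two hex digits plus a space.
std::string hex_dump(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        std::snprintf(buf, sizeof buf, "%02x ",
                      static_cast<unsigned>(static_cast<unsigned char>(bytes[i])));
        out += buf;
    }
    return out;
}
```

For the Latin-1 byte 0xC4 (Ä) the dump of the converted string should read "c3 84 ", which is the UTF-8 encoding of U+00C4. If the dump looks right but the terminal output does not, the problem is the terminal's encoding rather than the conversion.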

C++ Convert char array to int representation

What is the best way to convert a char array (containing bytes from a file) into a decimal representation so that it can be converted back later?
E.g "test" -> 18951210 -> "test".
EDITED

It can't be done without a bignum class, since there are more possible letter combinations than values in an unsigned long long. (A 64-bit unsigned long long holds at most 8 characters.)
If you have some sort of bignum class:
biguint string_to_biguint(const std::string& s) {
biguint result(0);
for(std::size_t i=0; i<s.length(); ++i) {
// base 256 (UCHAR_MAX + 1): one digit per byte
result *= 256;
result += (unsigned char)s[i];
}
return result;
}
std::string biguint_to_string(biguint u) {
std::string result;
do {
// assumes biguint converts to an unsigned integer;
// prepend so the byte order matches string_to_biguint
result.insert(result.begin(), (char)(unsigned)(u % 256));
u /= 256;
} while (u>0);
return result;
}
Note: leading NUL bytes are lost in the round trip, because they are leading zeros of the number. Store the length separately if they matter.
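For short inputs the same base-256 idea works without a bignum class. The sketch below assumes the data fits in an unsigned long long; pack_bytes and unpack_bytes are hypothetical names, and the length is passed back in explicitly so that leading NUL bytes survive.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Packs up to sizeof(unsigned long long) bytes into one integer,
// treating the first character as the most significant byte.
unsigned long long pack_bytes(const std::string& s) {
    if (s.size() > sizeof(unsigned long long)) {
        throw std::length_error("too many bytes for unsigned long long");
    }
    unsigned long long result = 0;
    for (char ch : s) {
        result = (result << 8) | static_cast<unsigned char>(ch);
    }
    return result;
}

// Reverses pack_bytes; the caller supplies the original length.
std::string unpack_bytes(unsigned long long value, std::size_t length) {
    std::string result(length, '\0');
    for (std::size_t i = length; i-- > 0; ) {
        result[i] = static_cast<char>(value & 0xFF);
        value >>= 8;
    }
    return result;
}
```

With this scheme "test" packs to 0x74657374 (1952805748 in decimal) and unpacks back to "test". Anything longer than the integer type needs a bignum as described above.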

I'm not sure what exactly you mean, but characters are stored in memory as their "representation", so you don't need to convert anything. If you still want to, you have to be more specific.
EDIT: You can either
- read byte by byte, shifting the result 8 bits left and OR-ing it with the next byte, or
- use mpz_inp_raw.

You can use a tree similar to Huffman compression algorithm, and then represent the path in the tree as numbers.
You'll have to keep the dictionary somewhere, but you can just create a constant dictionary that covers the whole ASCII table, since the compression is not the goal here.

There is no conversion needed. You can just use pointers.
Example:
char array[4 * NUMBER];
int *pointer;
Keep in mind that the "length" of pointer is NUMBER.

As mentioned, character strings are already ranges of bytes (and hence easily rendered as decimal numbers) to start with. Number your bytes from 000 to 255 and string them together and you've got a decimal number, for whatever that is worth. It would help if you explained exactly why you would want to be using decimal numbers, specifically, as hex would be easier.
If you care about compression of the underlying arrays forming these numbers for Unicode Strings, you might be interested in:
http://en.wikipedia.org/wiki/Standard_Compression_Scheme_for_Unicode
If you want some benefits of compression but still want fast random-access reads and writes within a "packed" number, you might find my "NSTATE" library to be interesting:
http://hostilefork.com/nstate/
For instance, if you just wanted a representation that only accommodated the 26 English letters, you could store "test" in:
NstateArray<26> myString (4);
You could read and write the letters without going through a compression or decompression process, in a smaller range of numbers than a conventional string. Works with any radix.

Assuming you want to store the integers (I'm reading them as ASCII codes) in a string, this adds the leading zeros you need to get back to the original string. A character is a byte with a maximum value of 255, so it needs three digits in decimal form. Note that the field width resets after every insertion, so it has to be set inside the loop. It can be done without the STL fairly easily too, but why not use the tools you have?
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
char text[] = "test";
int main()
{
stringstream out;
string s=text;
out.fill('0');
for (size_t i = 0; i < s.size(); ++i)
{
// width applies to the next insertion only
out.width(3);
// go through unsigned char so bytes above 127 stay positive
out << (int)(unsigned char)s[i];
}
cout << s << " -> " << out.str();
return 0;
}
output:
test -> 116101115116
Added:
To separate the values, change the output line to
out << (int)(unsigned char)s[i] << ",";
output
test -> 116,101,115,116,
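Since the question also asks for the way back, here is a sketch of the round trip using fixed three-digit groups. The names to_decimal and from_decimal are hypothetical, and from_decimal assumes its input was produced by to_decimal (digits only, length a multiple of three).

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Encodes each byte as exactly three decimal digits, zero padded.
std::string to_decimal(const std::string& s) {
    std::ostringstream out;
    out << std::setfill('0');
    for (char ch : s) {
        // setw applies to the next insertion only, so repeat it per byte.
        out << std::setw(3) << static_cast<int>(static_cast<unsigned char>(ch));
    }
    return out.str();
}

// Decodes groups of three digits back into bytes.
std::string from_decimal(const std::string& digits) {
    std::string result;
    for (std::size_t i = 0; i + 3 <= digits.size(); i += 3) {
        result.push_back(static_cast<char>(std::stoi(digits.substr(i, 3))));
    }
    return result;
}
```

The fixed width is what makes decoding unambiguous: a space (32) becomes "032", so the decoder can always cut the digit string every three characters.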

Very strange char array behaviour

unsigned int fname_length = 0;
//fname length equals 30
file.read((char*)&fname_length,sizeof(unsigned int));
//fname contains random data as you would expect
char *fname = new char[fname_length];
//fname contains all the data 30 bytes long as you would expect, plus 18 bytes of random data on the end (intellisense display)
file.read((char*)fname,fname_length);
//m_material_file (std:string) contains all 48 characters
m_material_file = fname;
// count = 48
int count = m_material_file.length();
Now when trying it this way, IntelliSense still shows the 18 bytes of data after setting the char array to all ' ', and I get exactly the same results, even without the file read:
char name[30];
for(int i = 0; i < 30; ++i)
{
name[i] = ' ';
}
file.read((char*)fname,30);
m_material_file = name;
int count = m_material_file.length();
Any idea what's going wrong here? It's probably something completely obvious, but I'm stumped!
Thanks

Sounds like the string in the file isn't null-terminated, and intellisense is assuming that it is. Or perhaps when you wrote the length of the string (30) into the file, you didn't include the null character in that count. Try adding:
fname[fname_length] = '\0';
after the file.read(). Oh yeah, you'll need to allocate an extra character too:
char * fname = new char[fname_length + 1];

I guess that intellisense is trying to interpret char* as C string and is looking for a '\0' byte.

fname is a char* so both the debugger display and m_material_file = fname will be expecting it to be terminated with a '\0'. You're never explicitly doing that, but it just happens that whatever data follows that memory buffer has a zero byte at some point, so instead of crashing (which is a likely scenario at some point), you get a string that's longer than you expect.

Use
m_material_file.assign(fname, fname + fname_length);
which removes the need for the zero terminator. Also, prefer std::vector to raw arrays.

std::string::operator=(char const*) is expecting a sequence of bytes terminated by a '\0'. You can solve this with any of the following:
extend fname by a character and add the '\0' explicitly as others have suggested or
use m_material_file.assign(&fname[0], &fname[fname_length]); instead or
use repeated calls to file.get(ch) and m_material_file.push_back(ch)
Personally, I would use the last option since it eliminates the explicitly allocated buffer altogether. One fewer explicit new is one fewer chance of leaking memory. The following snippet should do the job:
std::string read_name(std::istream& is) {
unsigned int name_length;
std::string file_name;
if (is.read((char*)&name_length, sizeof(name_length))) {
for (unsigned int i=0; i<name_length; ++i) {
char ch;
if (is.get(ch)) {
file_name.push_back(ch);
} else {
break;
}
}
}
return file_name;
}
Note:
You probably don't want to use sizeof(unsigned int) to determine how many bytes to write to a binary file. The size of unsigned int depends on the compiler and platform. If you have a maximum length, use it to choose a specific byte size for the length field. If the length is guaranteed to be at most 255 bytes, write only a single byte for the length. Then your code will not depend on the size of built-in types.
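A sketch that combines both suggestions: a fixed-width length field and no manually allocated buffer. It assumes the file stores the length as a 32-bit unsigned integer in the host's byte order; this read_name is a variant written for illustration, not the function shown above.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying this without a file
#include <string>

// Reads a 32-bit length followed by that many bytes. No terminator is
// needed because std::string tracks its own length.
std::string read_name(std::istream& is) {
    std::uint32_t length = 0;
    if (!is.read(reinterpret_cast<char*>(&length), sizeof length)) {
        return std::string();
    }
    std::string name(length, '\0');
    if (length > 0 && !is.read(&name[0], length)) {
        // Short read: keep only the bytes that actually arrived.
        name.resize(static_cast<std::size_t>(is.gcount()));
    }
    return name;
}
```

Any bytes that follow the name in the stream are left untouched, so the 18 bytes of trailing garbage from the question cannot leak into the string. For untrusted files it would be sensible to reject implausibly large lengths before allocating.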
