Can a zlib-compressed string contain whitespace? - compression

Can a zlib-compressed string contain whitespace? By whitespace I mean ' ', \n, \t.

Any byte can appear in a zlib-compressed string.
In fact, for a long enough, well-compressed string, every byte value (from 0 to 255) should occur with more-or-less equal probability, or else the string could be compressed further.
You can try this yourself -- for example using Python:
>>> import os, zlib
>>> z = zlib.compress(os.urandom(1000000)) # compress a long string of junk (Python 3)
>>> [z.count(bytes([i])) for i in range(256)] # number of occurrences of each byte
[3936, 3861, 3978, 3951, 3858, 3937, 3945, 3828, 3984, 3871, 3985,
3961, 3879, 3924, 3817, 3984, 3963, 3858, 4029, 3903, 3884, 3817,
... yada ...

Yes; it's just a stream of bytes. Any byte value can appear in there (including zero, which is more likely to cause you problems than whitespace characters!)
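To check this for the whitespace bytes specifically, here is a minimal Python 3 sketch. The helper name byte_values_present and the seeded pseudo-random input are assumptions made for illustration; the only point is that space, newline, tab and zero show up in compressed output like any other byte value.

```python
import random
import zlib


def byte_values_present(data: bytes) -> set:
    """Return the set of distinct byte values found in zlib-compressed data."""
    return set(zlib.compress(data))


# Seeded pseudo-random input so the run is repeatable (hypothetical test data).
rng = random.Random(0)
junk = bytes(rng.getrandbits(8) for _ in range(100_000))

present = byte_values_present(junk)
for ch in (b" ", b"\n", b"\t", b"\x00"):
    # ch[0] is the integer value of the single byte in ch
    print(ch, ch[0] in present)
```

So if the compressed data has to pass through something that treats whitespace or NUL specially, encode it first (for example with base64) rather than hoping those bytes do not occur.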

Related

Add leading zeros after "0x" to make all numbers in a list the same length (8 digits)

I have a long list of data pairs that look like this:
{0x1023350, 0x3014},
{0x1023954, 0x3007},
{0x1023960, 0x10F},
{0x102396C, 0x2FF},
{0x10219, 0x16},
The numbers here can be anywhere from 2 digits to 8 digits, but my requirement is to pad them with leading zeros so that in the final output all the numbers are 8-digits long.
{0x01023350, 0x00003014},
{0x01023954, 0x00003007},
{0x01023960, 0x0000010F},
{0x0102396C, 0x000002FF},
{0x00010219, 0x00000016},
How can I do it using regular expressions? (I am using Notepad++, but I am open to some other tool if I can't do it in Notepad++.)
I am not fluent enough in regex to have tried any solution yet.
Do it in two steps:
Step 1 - Add excess left padding
Search: 0x
Replace: 0x00000000
Step 2 - Match excess exactly and delete it
Search: 0x\d+(?=[\dA-F]{8}[,}])
Replace: 0x
The regex:
\d+ means one or more digits
(?=[\dA-F]{8}[,}]) means followed by 8 hex chars then a comma or }
Some python code to demonstrate:
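import re
text = """{0x1023350, 0x3014},
{0x1023954, 0x3007},
{0x1023960, 0x10F},
{0x102396C, 0x2FF},
{0x10219, 0x16},
"""
# Step 1: over-pad every number with eight zeros
text = text.replace("0x", "0x00000000")
# Step 2: delete the excess so that exactly 8 hex digits remain (raw string keeps \d intact)
text = re.sub(r"0x\d+(?=[\dA-F]{8}[,}])", "0x", text)
print(text)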
Output:
{0x01023350, 0x00003014},
{0x01023954, 0x00003007},
{0x01023960, 0x0000010F},
{0x0102396C, 0x000002FF},
{0x00010219, 0x00000016},
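If a script is an option anyway, the two steps can be collapsed into one pass. This is only a sketch of an alternative, not the method above: the function name pad_hex_literals and its width parameter are made up for illustration, and it relies on a replacement callback, which Notepad++ search and replace does not offer.

```python
import re


def pad_hex_literals(text: str, width: int = 8) -> str:
    """Left-pad the digits of every 0x... literal with zeros up to `width`."""
    return re.sub(
        r"0x([0-9A-Fa-f]+)",
        # zfill pads on the left with '0'; longer literals are left unchanged
        lambda m: "0x" + m.group(1).zfill(width),
        text,
    )


print(pad_hex_literals("{0x1023350, 0x3014},\n{0x10219, 0x16},"))
```

Unlike the lookahead version, this also accepts lowercase hex digits and does not depend on a comma or } following the number.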

How to find the character "\" in a string?

I am trying to manipulate a string by finding the \ character in the string Find\inHere. However, I can't put that as an input in test.find('\', 0). It won't work and gives me the error "missing terminating character." Is there a way to fix test.find('\', 0)?
string test = "Find\inHere";
int x = test.find('\', 0); // error on this line
cout << x; // x should equal 4
\ is the escape character: it introduces special characters, for example \n (newline) or \xDB (the character with hexadecimal code DB).
So, in order to search for this character, you have to escape it by adding another \. Use:
test.find("\\",0);
EDIT: Also, your first string does not actually contain Find\inHere as written: \i is not a valid escape sequence, so the compiler will typically warn about it or reject it. Fix it the same way and write "Find\\inHere".

String masking - inserting dashes

I am writing a function to format a string. I receive a string of numbers, sometimes with dashes, sometimes not. I need to produce an output string of 14 characters, so if the input string contains fewer than 14, I need to pad it with zeros. Then I need to mask the string of numbers by inserting dashes in the appropriate places. Here is what I have so far:
strTemp = strTemp.Replace("-", "")
If IsNumeric(strTemp) Then
If strTemp.Length < 14 Then
strTemp = strTemp.PadRight(14 - strTemp.Length)
End If
output = String.Format(strTemp, "{00-000-0-0000-00-00}")
End If
The above works fine, except it just returns a string of numbers without putting in the dashes. I know I am doing something wrong with String.Format but so far I've only worked with pre-defined formats. Can anyone help? How can I use Regex for string formatting in this case?
This function should do the trick:
Public Function MaskFormat(input As String) As String
    input = input.Replace("-", String.Empty)
    If IsNumeric(input) Then
        If input.Length < 14 Then
            ' PadRight takes the total width; pad with "0" characters up to 14
            input = input.PadRight(14, "0"c)
        End If
        Return String.Format("{0:00-000-0-0000-00-00}", CLng(input))
    Else
        Return String.Empty
    End If
End Function
You can find more on String formatting here.
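The same logic, written out step by step without a numeric format string, might look like the sketch below. It is in Python rather than VB.NET, purely to illustrate the steps; the name mask_format, the groups tuple, and the choice to reject inputs longer than 14 digits are all assumptions, not part of the answer above.

```python
def mask_format(raw: str, groups=(2, 3, 1, 4, 2, 2)) -> str:
    """Strip dashes, right-pad with zeros to the mask width, re-insert dashes."""
    digits = raw.replace("-", "")
    width = sum(groups)  # 14 for the mask 00-000-0-0000-00-00
    # Assumption: anything that is not plain ASCII digits, or is too long, is invalid.
    if not (digits.isascii() and digits.isdigit()) or len(digits) > width:
        return ""
    digits = digits.ljust(width, "0")  # pad on the right with zeros
    parts, pos = [], 0
    for size in groups:
        parts.append(digits[pos:pos + size])  # cut out one dash-separated group
        pos += size
    return "-".join(parts)


print(mask_format("12-345"))
```

Slicing by group sizes keeps the input as text throughout, so leading zeros in the input survive without relying on a numeric conversion.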

Regular Expressions - a string containing an even number of a character among other characters

I'm going through my homework and can't seem to figure out how to do this one.
Say the alphabet is {a,b,c}; we want an expression that matches strings with an even number of c's.
Example strings that are included:
the empty string,
ccab
abcc
cabc
ababababcc
and so on: just an even number of c's.
You can use this regex to allow only an even number of c's in the input:
^(?=(([^c\n]*c){2})*[^\nc]*$)[abc]*$
The regexes below match strings that have an even number of c's, provided there are at least two (they reject the empty string and c-free strings such as ab):
^(?:[^c]*c[^c]*c[^c\n]*)+?$
OR
^(?:[ab]*c[ab]*c[ab]*)+?$
Assuming that it is the total number of c's that counts, not just consecutive c's, there is a nice theoretical approach, based on the fact that the set of strings with an even number of c's can be recognized by a finite state automaton with two states.
The first state is the initial state, and it is also an accepting state. The second one is a rejecting state. Each c toggles us between the states. Other letters do nothing.
Now, you can convert this simple machine to regex using one of the methods described here.
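Before converting it to a regex, the machine itself can be simulated directly. A minimal sketch in Python (the function name has_even_cs is made up for illustration):

```python
def has_even_cs(s: str) -> bool:
    """Simulate the two-state automaton: the state flips on every 'c'."""
    even = True  # initial state, which is also the accepting state
    for ch in s:
        if ch == "c":
            even = not even  # each c toggles between the two states
        # any other letter leaves the state unchanged
    return even


for word in ["", "ccab", "abcc", "cabc", "ababababcc", "abc"]:
    print(repr(word), has_even_cs(word))
```

The single boolean is the whole state of the automaton, which is why a regular expression for this language must exist.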
Something like
^([^c]*(c[^c]*c)+)*[^c]*$
ought to do it. We can break it down thus:
^           # - start-of-line, followed by
(           # - a group, consisting of
  [^c]*     # - zero or more characters other than 'c', followed by
  (         # - a group, consisting of
    c       # - the literal character 'c', followed by
    [^c]*   # - zero or more characters other than 'c', followed by
    c       # - the literal character 'c'
  )+        # - repeated one or more times
)*          # - repeated zero or more times, followed by
[^c]*       # - a final sequence of zero or more characters other than 'c', followed by
$           # - end-of-line
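One way to gain confidence in a pattern like this is to compare it against plain counting on every short string over the alphabet. The sketch below does that in Python; the helper name mismatches and the length limit of 6 are arbitrary choices for illustration.

```python
import re
from itertools import product

EVEN_C = re.compile(r"^([^c]*(c[^c]*c)+)*[^c]*$")


def mismatches(max_len: int = 6):
    """Return strings over {a,b,c} where the regex disagrees with counting."""
    bad = []
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            s = "".join(letters)
            regex_says_even = bool(EVEN_C.match(s))
            count_says_even = s.count("c") % 2 == 0
            if regex_says_even != count_says_even:
                bad.append(s)
    return bad


print(mismatches())
```

An empty list means the regex and the count agree on every string up to that length, including the empty string and strings with no c at all.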
One might note that something like the following C# method will likely perform better and be easier to understand:
public static bool ContainsEvenNumberOfCharacters( this string s , char x ) // extension method: must be static and live in a static class
{
    int cnt = 0 ;
    foreach( char c in s )
    {
        cnt += ( c == x ? 1 : 0 ) ;
    }
    bool isEven = 0 == (cnt&1) ; // it's even if the low-order bit is off.
    return isEven ;
}
Simply
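/^[^c]*(([^c]*c[^c]*){2})*$/
In English:
An optional run of non-c's, followed by zero or more strings, each of which contains exactly two instances of a c, preceded or followed by any number of non-c's. The leading [^c]* is needed so that non-empty strings with no c at all, such as ab, also match.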
This solution has the advantage that it is easily extendable to the case of a string with a number of c's which is multiple of 3, etc., and makes no assumptions about the alphabet.
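As a rough sketch of that extension, the pattern can be generated for any multiple k. The function name multiple_of_k_pattern is made up for illustration, and the sketch puts a run of non-c's in front of the repeated group so that strings containing no c at all are accepted too.

```python
import re


def multiple_of_k_pattern(k: int, ch: str = "c"):
    """Build a regex matching strings whose count of `ch` is a multiple of k (k >= 1)."""
    other = "[^" + re.escape(ch) + "]*"      # any run of characters other than ch
    unit = other + re.escape(ch) + other     # exactly one ch, padded by non-ch runs
    # The leading `other` lets strings with zero occurrences of ch match.
    return re.compile("^" + other + "((" + unit + "){" + str(k) + "})*$")


by_three = multiple_of_k_pattern(3)
for word in ["", "ab", "cacbc", "cc", "cccccc"]:
    print(repr(word), bool(by_three.match(word)))
```

Only the repetition count changes between k = 2 and k = 3, which is the sense in which the pattern extends easily.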

Reading characters from a File with fscanf

I have a problem using the fscanf function :(
I need to read a sequence of characters from a file, like "a b c d" (the characters are separated by spaces),
but it doesn't work :(
How should I read them?
I tried to print it and the result is incorrect. I think it's because of the spaces. I really don't know why it doesn't work.
Please tell me, what is wrong with the array access?
From cplusplus.com:
The function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Then if your code is:
while ( fscanf(fin,"%c", &array[i++]) == 1 );
and your file is like this:
h e l l o
Your array will be:
[h][ ][e][ ][l][ ][l][ ][o]
If you change your code into:
while ( fscanf(fin," %c", &array[i++]) == 1 );
with the same file your array will be:
[h][e][l][l][o]
In any case the code works: it depends on what you want.
Anyway, you should think about starting to use fgets() + sscanf(), for example:
char buff[NUM];
while ( fgets(buff, sizeof buff, fin) )
sscanf(buff,"%c", &array[i++]);
With a single fscanf(), the lack of buffer management can turn into buffer overflow problems.
Add white space before %c =>
while (fscanf(pFile," %c", &alpArr[i++]) == 1);
It should work.