how to define write-byte for 1 in XSLT-1.0 - xslt

I'm trying to make my XSLT 1.0 stylesheet write byte 1 and byte 3 to the output,
in the same way we already do for CR, LF, AMP and so on. How can we write the integer 1 as a hexadecimal character reference?
The processor does not allow me to do it.
CR is allowed, but writing it as &#0D; in the XSLT did not work.
I tried the same approach to implement SOH and it did not work either. Can anyone please help with this?
I have tried many ways to implement the task below; any suggestion would be helpful.
Mnemonic  Hex value  Unicode   Description
<SOH>     X’01’      <U+0001>  Start of Heading message
<ETX>     X’03’      <U+0003>  End of Text message
U+    0    1    2    3
0000  NUL  SOH  STX  ETX
0010  DLE  DC1  DC2  DC3
0020  sp   !    "    #
0030  0    1    2    3
I am trying to implement SOH and ETX; the relevant codes are in the table above (taken from the attached screenshot).

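I believe this cannot be done in XSLT 1.0: a stylesheet is an XML 1.0 document, and XML 1.0 does not allow most control characters, even when they are written as character references such as &#x1;. See the extract from w3.org below.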
Character Range
[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
Also see Things XSLT can't do - an oldie but probably still valid.
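As a quick check of the character rule quoted above, here is a small Python sketch (Python is used only for illustration, it is not part of the original question). It assumes the standard library's expat-based parser behaves like the XML 1.0 parser in front of an XSLT 1.0 processor. The second function shows one possible workaround, post-processing the transformed text; the placeholder tokens [[SOH]] and [[ETX]] are hypothetical names you would emit from the stylesheet yourself.

```python
import xml.etree.ElementTree as ET

def parses_as_xml10(doc):
    """Return True if the XML 1.0 parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

def add_control_framing(text, soh_token="[[SOH]]", etx_token="[[ETX]]"):
    """Swap hypothetical placeholder tokens for the real control characters
    after the XSLT transformation has produced its text output."""
    return text.replace(soh_token, "\x01").replace(etx_token, "\x03")

if __name__ == "__main__":
    print(parses_as_xml10("<a>&#xD;</a>"))   # CR is inside the Char production
    print(parses_as_xml10("<a>&#x1;</a>"))   # SOH is outside it
    print(repr(add_control_framing("[[SOH]]payload[[ETX]]")))
```

The idea is that the stylesheet only ever contains legal characters, and the two control bytes are added by whatever program runs the transformation.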


PostgreSQL - tricky regular expression - what am I missing?

I have data as follows - please see the fiddle here for all data and code below:
INSERT INTO t VALUES
('|0|34| first zero'),
('|45|0| second zero'),
('|0|0| both zeroes');
I want to SELECT from the start of the line
1st character in the line is a pipe (|)
next characters are a valid (possibly negative - one minus sign) INTEGER
after the valid INT, another pipe
then another valid INT
then a pipe
The rest of the line can be anything at all - including sequences with pipe, INT, pipe, INT - but these are not to be SELECTed!
and I'm using a regex to try and SELECT the valid INTEGERs. A single ZERO is also a valid reading - one ZERO and one ZERO only!
The valid integers must be from between the first 3 pipe (|) characters and not elsewhere in the line - i.e.
^|3|3|adfasfadf |555|6666| -- tuple (3, 3) is valid
but
^|--567|-765| adfasdf -- tuple (--567, -765) is invalid - two minus signs!
and
^|This is stuff.... |34|56| -- tuple (34, 56) is invalid - doesn't start pipe, int, pipe, int!
Now, my regexes (so far) are as follows:
SELECT
SUBSTRING(a, '^\|(0{1}|[-+]?[1-9]{1}\d*)\|') AS n1,
SUBSTRING(a, '^\|[-+]?[1-9]{1}\d*\|(0{1}|[-+]?[1-9]{1}\d*)\|') AS n2,
a
FROM t;
and the results I'm getting for my 3 records of interest are:
n1  n2    a
0   NULL  |0|34| first zero    -- don't want NULL, want 34
45  0     |45|0| second zero   -- OK!
0   NULL  |0|0| both zeroes    -- don't want NULL, want 0
3   3     |3|3| some stuff here
...
... other data snipped - but working OK!
...
Now, the reason why it works for the middle one is that I have (0{1}|.... other parts of the regex in both the upper and lower one!
So, that means take 1 and only 1 zero OR... the other parts of the regex. Fine, I've got that much!
However, and this is the crux of my problem, when I try to change:
'^\|[-+]?[1-9]{1}\d*\|(0{1}|[-+]?[1-9]{1}\d*)\|'
to
'^\|0{1}|[-+]?[1-9]{1}\d*\|(0{1}|[-+]?[1-9]{1}\d*)\|'
Notice the 0{1}| bit I've added near the beginning of my regex - so, this should allow one and only one ZERO at the beginning of the second string (preceded by a pipe literal (|)) OR the rest... the pipe at the end of my 5 character snippet above in this case being part of the regex.
But the result I get is unchanged for the first 3 records - shown above, but it now messes up many records further down - one example a record like this:
|--567|-765|A test of bad negatives...
which obviously fails (NULL, NULL) in the first SELECT now returns (NULL,-765) for the second. If the first fails, I want the second to fail!
I'm at a loss to understand why adding 0{1}|... should have this effect, and I'm also at a loss to understand why my (0, NULL), (45, 0) and (0, NULL) don't give me (0, 0), (45, 0) and (0, 0) as I would expect?
The 0{1}| snippet appears to work fine in the capturing groups, but not outside - is this the problem? Is there a problem with PostgreSQL's regex implementation?
All I did was add a bit to the regex which said as well as what you've accepted before, please accept one and only one leading ZERO!
I have a feeling there's something about regexes I'm missing - so my question is as follows:
could I please receive an explanation as to what's going on with my regex at the moment?
could I please get a corrected regex that will work for INTEGERs as I've indicated. I know there are alternatives, but I'd like to get to the bottom of the mistake I'm making here and, finally
is there an optimum/best method to achieve what I want using regexes? This one was sort of cobbled together and then added to as further necessary conditions became clearer.
I would want any answer(s) to work with the fiddle I've supplied.
Should you require any further information, please don't hesitate to ask! This is not a simple "please give me a regex for INTs" question - my primary interest is in fixing this one to gain understanding!
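The cause is operator precedence: | binds more loosely than anything else in a regex, so '^\|0{1}|[-+]?[1-9]{1}\d*\|(...)\|' means "either ^\|0, or everything to the right of that |". The first branch contains no capturing group, which is consistent with the NULL you get for |0|34|, and the second branch is no longer anchored by ^, so it is free to match -567|-765| in the middle of the bad-negatives line. Inside your capturing group the | is confined by the parentheses, which is why it behaves there. Outside, you would need non-capturing parentheses, (?:0|[-+]?[1-9]\d*), so that SUBSTRING still returns the second field. Alternatively, avoid the alternation outside the capture altogether; some simplifications can also be made to the patterns ({1} is redundant):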
SELECT
SUBSTRING(a, '^\|(0|[+-]?[1-9][0-9]*)\|[+-]?[0-9]+\|') AS n1,
SUBSTRING(a, '^\|[+-]?[0-9]+\|(0|[+-]?[1-9][0-9]*)\|') AS n2,
a
FROM t;
n1 | n2 | a
:--- | :--- | :--------------------------------------------------------------
0 | 34 | |0|34| first zero
45 | 0 | |45|0| second zero
0 | 0 | |0|0| both zeroes
3 | 3 | |3|3| some stuff here
null | null | |SE + 18.5D some other stuff
-567 | -765 | |-567|-765|A test of negatives...
null | null | |--567|-765|A test of bad negatives...
null | null | |000|00|A test of zeroes...
54 | 45 | |54|45| yet more stuff
32 | 23 | |32|23| yet more |78|78| stuff
null | null | |This is more text |11|111|22222||| and stuff |||||||
null | null | |1 1|1 1 1|22222|
null | null | |71253412|ahgsdfhgasfghasf
null | null | |aadfsd|34|Fails if first fails - deliberate - Unix philosophy!
db<>fiddle here
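The precedence effect can be reproduced outside the database. The sketch below uses Python's re module rather than PostgreSQL's regex engine, so it only illustrates how the alternation is parsed; the names BROKEN, GROUPED and second_field are made up for this example, and GROUPED is one possible repair using a non-capturing group, not the pattern from the answer above.

```python
import re

# The pattern from the question: the top-level | splits it into
#   ^\|0{1}      OR      [-+]?[1-9]{1}\d*\|(...)\|   (unanchored)
BROKEN = r'^\|0{1}|[-+]?[1-9]{1}\d*\|(0{1}|[-+]?[1-9]{1}\d*)\|'

# A non-capturing group keeps the alternation local to the first field.
GROUPED = r'^\|(?:0|[-+]?[1-9]\d*)\|(0|[-+]?[1-9]\d*)\|'

def second_field(pattern, line):
    """Return capture group 1, or None if there is no match or no capture."""
    m = re.search(pattern, line)
    return m.group(1) if m else None

if __name__ == "__main__":
    for line in ('|0|34| first zero',
                 '|45|0| second zero',
                 '|--567|-765|A test of bad negatives...'):
        print(line, second_field(BROKEN, line), second_field(GROUPED, line))
```

With BROKEN, the first line matches only "|0" (no capture), and the bad-negatives line matches in the middle of the string; with GROUPED both behave as the question intends.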

converting CFG to regular expression

Here's a CFG that generates strings of 0s, 1s, or 0s and 1s arranged like this (001, 011) where one of the characters must have a bigger count than the other like in 00011111 or 00000111 for example.
S → 0S1 | 0A | 0 | 1B | 1
A → 0A | 0
B → 1B | 1
I tried converting it to regular expression using this guide but I got stuck here since I have trouble converting 0S1 given that anything similar to it can't be found in that guide.
S → 0S1 | 0+ | 0 | 1+ | 1
A → 0A | 0 = 0+
B → 1B | 1 = 1+
One of my previous attempts is 0+0+1|0+1+1|1+|0+ but it doesn't accept strings I mentioned above like 00011111 and 00000111.
Plug and Play
^(?!01$)(?!0011$)(?!000111$)(?!00001111$)(?=[01]{1,8}$)0*1*$
You cannot perfectly translate this to a regular expression, but you can get close, by ensuring that the input does not have equal number of 0 and 1. This matches up to 8 digits.
How it works
^ first you start from the beginning of a line
(?!01$) ensure that the characters are not 01
(?!0011$) ensure that the characters are not 0011
the same for 000111 and 00001111
then ensure that there are from 1 to 8 zeroes and ones (this is needed, to ensure that the input is not made of more digits like 000000111111, because their symmetry is not verified)
then match these zeroes and ones till the end of the line
for longer inputs you need to add more text, for up to 10 digits it is this: ^(?!01$)(?!0011$)(?!000111$)(?!00001111$)(?!0000011111$)(?=[01]{1,10}$)0*1*$ (you jump by 2 by adding one more symmetry validation)
it is not possible by other means with regular expressions alone, see the explanation.
Explanation
A and B are easy, as you saw: 0+ and 1+. The alternatives in S after the first are also easy: 00+, 0, 11+, 1, which combine into (0+|1+). The problem is the first alternative, 0S1.
So the problem can be reduced to S = 0S1. This grammar is recursive, but neither left-linear nor right-linear. To recognize an input for this grammar you need to "remember" how many 0s you found, so that you can match the same number of 1s, but the finite-state machines that are built from regular grammars (and from regular expressions) have no computation history. They consist only of states and transitions: the machine "jumps" from one state to another and does not remember the path it travelled over the transitions.
For this reason you need more powerful machinery (such as a push-down automaton), which can be constructed from a context-free grammar like yours.
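To make the limitation concrete, here is a small Python sketch. It assumes the grammar generates exactly the strings 0^i 1^j with i ≠ j (my reading of the productions above), checks that by counting, and compares it with the 8-digit pattern from the "Plug and Play" part. The names in_language and regex_agrees_up_to are invented for this example.

```python
import re
from itertools import product

# The bounded pattern from above (valid for inputs of 1 to 8 digits).
BOUNDED = re.compile(
    r'^(?!01$)(?!0011$)(?!000111$)(?!00001111$)(?=[01]{1,8}$)0*1*$')

def in_language(s):
    """Membership test by counting: zeroes, then ones, in unequal numbers."""
    m = re.fullmatch(r'(0*)(1*)', s)
    return m is not None and len(m.group(1)) != len(m.group(2))

def regex_agrees_up_to(max_len):
    """Compare the bounded regex with the counting test on every
    binary string of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product('01', repeat=n):
            s = ''.join(chars)
            if bool(BOUNDED.match(s)) != in_language(s):
                return False
    return True

if __name__ == "__main__":
    print(regex_agrees_up_to(8))   # the regex is exact within its bound
    print(regex_agrees_up_to(9))   # and stops being exact beyond it
```

The counting step (comparing two lengths) is exactly the "memory" that a finite-state machine does not have.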

How to create a Python (2.7) regular expression to find ASCII 06 followed by any two extended ASCII characters

I'm parsing a file using Python 2.7 and I'm trying to find all occurrences of the following pattern
ASCII 06 followed by any two characters in the range from ASCII 0 to ASCII 255
Naive try #1 - [chr(6)][chr(0)-chr(255][chr(0)-chr(255]
fails with a message that indicates the range cannot be strings.
I've tried several other combos - no success.
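The record that I'm parsing was read in with
# raw string so that \S and \x are not treated as escapes; 'rb' so the bytes are read unchanged
sF = open(r'D:\Scratch\xxxxx.01', 'rb')
record = sF.read()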
Any help will be gratefully appreciated.
Thanx,
Doug
Given you're using python2.7, your open().read() will return bytes. You can use the following bytes regex to match ASCII 06 followed by 2 [0-255] bytes:
reg = re.compile(b'\x06..')
Note that I used . here because any byte satisfies 0 - 255, except that . does not match the newline character \n (\x0a) by default.
If you wanted to also match the newline character you could do one of the following things:
# Uses the DOTALL flag to force `.` to match `\n`
reg = re.compile(b'\x06..', re.DOTALL)
# Makes a character class which matches all bytes
reg = re.compile(b'\x06[\x00-\xff]{2}')
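A short usage sketch of the two variants, on made-up sample bytes (the data value is purely illustrative). The br'...' prefix is used so that the same code is accepted by both Python 2.7 and Python 3.

```python
import re

# Hypothetical sample: two ACK-prefixed triples, the first containing a newline.
data = b'ab\x06\n\xffcd\x06AB'

dot_pattern = re.compile(br'\x06..')                # . skips \n by default
class_pattern = re.compile(br'\x06[\x00-\xff]{2}')  # matches every byte value

dot_matches = dot_pattern.findall(data)
class_matches = class_pattern.findall(data)

if __name__ == "__main__":
    print(dot_matches)
    print(class_matches)
```

On this sample the plain-dot pattern misses the first triple because its second byte is a newline, while the character-class pattern finds both.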

Storing crypted data with libconfig

I'm using libconfig to create a configuration file, and one of the fields holds the content of an encrypted file. The problem is that the content contains some escape characters, which cause only part of it to be stored. What is the best way to store this data so that escape characters are not interpreted accidentally? Convert to Unicode?
Any suggestion?
You can use either URL encoding, where each byte that is not a safe printable ASCII character is encoded as a % character followed by two hex digits, or you can use base64 encoding, where each set of 3 bytes is encoded as 4 ASCII characters (3x8 bits -> 4x6 bits).
For example, if you have the following bytes:
00 01 41 31 80 FE
You can URL encode it as follows:
%00%01A1%80%FE
Or you can base64 encode it like this, with 0-25 = A-Z, 26-51 = a-z, 52-61 = 0-9, 62 = +, 63 = /:
(00000000 00000001 01000001) (00110001 10000000 11111110) -->
(000000 000000 000101 000001) (001100 011000 000011 111110)
AAFBMYD+
The standard for encoding binary data in text used to be uuencode and is now base64. Both use the same idea: a byte has 8 bits, so 3 bytes make 24 bits, or four 6-bit characters.
uuencode simply adds an offset of 32 (the ASCII code for space) to each 6-bit value, so the characters fall in the range 32-95: all in the printable ASCII range, but including space and possibly other characters that could have special meanings.
base64 chose these 64 characters to represent the values 0 to 63 (none of =:;,'"\*(){}[], which could have special meaning...):
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
with the equals sign (=) used as padding at the end of an encoded string, so that the encoded length is always a multiple of 4.
Unfortunately, neither the C nor the C++ standard library offers functions for uuencode or base64 conversion, but you can find good implementations around, with many pointers in this other SO answer: How do I base64 encode (decode) in C?
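For a quick cross-check of both encodings on the six example bytes 00 01 41 31 80 FE, here is a short Python sketch using only the standard library (Python is used just for illustration; the libconfig program itself would need a C or C++ implementation as noted above).

```python
import base64
from urllib.parse import quote

raw = bytes.fromhex("00 01 41 31 80 FE")

# Percent-encode everything except unreserved ASCII characters.
url_encoded = quote(raw, safe="")

# Standard base64 alphabet (A-Z, a-z, 0-9, +, /).
b64_encoded = base64.b64encode(raw).decode("ascii")

# Both encodings are reversible, so the original bytes can be recovered.
round_trip = base64.b64decode(b64_encoded)

if __name__ == "__main__":
    print(url_encoded)
    print(b64_encoded)
    print(round_trip == raw)
```

Either result is plain ASCII without quotes or backslashes, so it can be stored as an ordinary string value.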

Parse certain bytes of a variable in bash

i have a variable that contains the following string (where each dot stands for a non-printable character):
.[?1h.=.81..
which is this in hex:
ESC [ ? 1 h ESC = CR 8 1 CR LF
1b 5b 3f 31 68 1b 3d 0d 38 31 0d 0a
What I want is to isolate the '81'. The number can change, so it could for example be 100, which then takes 3 bytes in the string, but it always sits between the two 0x0d bytes.
So the goal is to isolate all bytes (which are always ASCII digits) between the two 0x0d bytes and save them as an integer in another variable.
Is this possible using only bash? Would it be possible to do it with a regex?
You can do it like this:
a=$'\033[?1h\033=\r81\r\n' # or a=$'\x1b[?1h\x1b=\r81\r\n'
[[ $a =~ $'\r'([0-9]+)$'\r' ]] && echo ${BASH_REMATCH[1]}
The $'...' will interpret escape sequences in a string like \r, \n, octal representation \033 or hex representation \x1b
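For comparison outside bash, the same capture can be sketched in Python, which also covers the "save them as an integer" part of the question. The function name number_between_crs is invented for this example, and sample reproduces the bytes from the question.

```python
import re

def number_between_crs(data):
    """Return the decimal number enclosed by two CR bytes, or None."""
    m = re.search(br'\r([0-9]+)\r', data)
    return int(m.group(1)) if m else None

# ESC [ ? 1 h ESC = CR 8 1 CR LF
sample = b'\x1b[?1h\x1b=\r81\r\n'

if __name__ == "__main__":
    print(number_between_crs(sample))
```

Because the group is ([0-9]+), it captures the whole run of digits regardless of how many there are.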
A simple Regex would capture the required decimal characters in hex as follows:
0[dD](\s*3(\d))*\s*0[dD]
Group 2 captures the decimal value, which is the hex value - 30, so only the second character.
Unfortunately only the last group is captured. If you can restrict yourself to a certain number of maximal decimal places you can simply duplicate the term as in
0[dD](\s*3(\d))(\s*3(\d))?(\s*3(\d))?\s*0[dD]
and replace it by
\2\4\6
to get the decimal value.
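If a scripting language is available, the "only the last group is captured" limitation can be sidestepped by capturing the whole run of 3x bytes once and then picking the digits out in a second pass. This is a sketch in Python, assuming the input is a whitespace-separated hex dump like the one in the question; decimal_from_hex_dump is a made-up name.

```python
import re

def decimal_from_hex_dump(dump):
    """Find 0d, a run of 3x bytes, then 0d in a hex dump and
    return the digits x as an integer (None if not found)."""
    m = re.search(r'0[dD]((?:\s*3[0-9])+)\s*0[dD]', dump)
    if m is None:
        return None
    # Second pass: each byte 3x stands for the ASCII digit x.
    digits = re.findall(r'3([0-9])', m.group(1))
    return int(''.join(digits))

if __name__ == "__main__":
    print(decimal_from_hex_dump("1b 5b 3f 31 68 1b 3d 0d 38 31 0d 0a"))
```

This works for any number of digits, so the manual repetition of the group is not needed.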
Edit
If your input is not hex but an ordinary string, it would look as follows
\x0d(\d)*\x0d
or with manual repetition (here 3x):
\x0d(\d)(\d)?(\d)?\x0d
with the same replacement pattern
\1\2\3
Edit2
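In sed it should work as follows (GNU sed assumed, for \x0d; -E enables unescaped groups and ?, [0-9] replaces \d, which sed does not support, and the p flag prints the result that -n would otherwise suppress):
sed -nE 's/^.*\x0d([0-9])([0-9])?([0-9])?\x0d.*$/\1\2\3/p'
Here the matcher is padded at the start and end, ^.*matcher.*$, so the whole line is replaced by the replacement pattern; the general form is s/search/replace/.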