How to Manipulate Control Characters?

How to Manipulate Control Characters? - clojure

I'd appreciate suggestions on how I can convert ASCII control characters, which are supplied via an HTML text box, to their hexadecimal or binary representations.
Currently my web app takes the input string literally and converts each character: for example, if ^C is entered, the value 5e43 is returned, which represents "^" and "C", not Control-C, which is 03 in hex.
The idea I had was to run a regex against the input to check for control characters, with something like \^[\w]{1}, and then return values from a predefined table for each match.
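One way to read the idea above is as a translation of caret notation, where ^C names the control character whose code is the code of "C" with bit 0x40 flipped, so no lookup table is needed. A minimal sketch in Python; the function names are made up for illustration, and it assumes the text box really delivers a literal "^" followed by one character:

```python
import re

# "^" followed by one character in "?".."_" (this covers "@", "A"-"Z", "[", "\", "]", "^", "_")
# or by a lowercase letter.
CARET = re.compile(r"\^([?-_a-z])")


def caret_to_control(text: str) -> str:
    """Replace caret notation such as "^C" with the control character it names."""
    def repl(match: re.Match) -> str:
        # Flipping bit 0x40 maps "C" (0x43) to 0x03, "[" (0x5B) to 0x1B and "?" (0x3F) to 0x7F.
        return chr(ord(match.group(1).upper()) ^ 0x40)
    return CARET.sub(repl, text)


def to_hex(text: str) -> str:
    """Hex representation of the text after caret notation has been translated."""
    return caret_to_control(text).encode("utf-8").hex()


print(to_hex("^C"))    # 03
print(to_hex("^[OF"))  # 1b4f46
```

Anything that does not look like caret notation (for example "^6") is left untouched and converted character by character as before.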

You can read directly from *in* with (. *in* read), though how the characters reach you depends on a lot of things, most notably that the browser is likely to encode them for HTTP transport before you even get started.
I maintain a secure terminal proxy that has to handle all combinations of control characters, so I thought I would pass on a few notes:
They are not one character long. You need up to six characters to represent them. Try hitting Esc-Ctrl-Alt-left-arrow.
Esc implies Alt, but Alt does not imply Esc. If the first character is an Esc, then the next character is the meta/Alt of its character value. So if you see Esc-b, this is Alt-b.
Some keys (Page Up, for example) send the Esc automatically.
Esc-Esc is its own thing (which I can't say I fully understand).
The best way is to write a small program that reads from the keyboard one character at a time, then start mashing the keyboard and see what you can come up with.
Here I read a character from *in* twice, hitting Home the first time and End the second:
clojure.core=> (. *in* read)
10
clojure.core=> (. *in* read)
10
So clearly one character is not enough to distinguish these two keys. How about two characters?
This next example won't run in the REPL, because the REPL tries to "handle" control characters for you. Make a new project with lein new esc, add this code, then run lein uberjar and java -jar esc-1.0.0-SNAPSHOT-standalone.jar:
(ns esc.core
  (:gen-class))

(defn -main []
  ;; print the integer value of every character read from standard input
  (dorun (repeatedly #(println (. *in* read)))))
Running it and hitting these two keys produces this:
^[OF
27
79
70
10 <-- this is the newline
^[OH
27 <-- esc start marker look for this
79
72
10 <-- this is the newline
Here is Esc-End:
^[^[OF
27
27
79
70
10
And the control-character grand prize winner thus far, Esc-right-arrow:
^[[1;5C
27
91
49
59
53
67
10
taking the prize at six bytes.
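The byte dumps above suggest a way to group incoming bytes into key presses: look for the Esc marker (27) and consume the bytes that belong to it. A rough sketch in Python, assuming the terminal sends sequences of the shapes seen above (Esc O letter, Esc [ parameters letter, optionally preceded by one more Esc); the names KEY_SEQ, KNOWN_KEYS and split_keys are hypothetical, and other terminals may send different sequences:

```python
import re

# Alternatives are tried in order; "." matches any byte, newline included, because of DOTALL.
KEY_SEQ = re.compile(
    rb"\x1b?\x1b\[[0-9;]*[A-Za-z~]"  # Esc [ params final, e.g. 27 91 49 59 53 67
    rb"|\x1b?\x1bO[A-Za-z]"          # Esc O final, e.g. 27 79 70
    rb"|\x1b."                       # Esc plus one byte (Alt/meta of that byte)
    rb"|.",                          # any other single byte
    re.DOTALL,
)

# Names taken from the observations above; any other mapping would be a guess.
KNOWN_KEYS = {
    b"\x1bOH": "Home",
    b"\x1bOF": "End",
    b"\x1b\x1bOF": "Esc-End",
}


def split_keys(data: bytes) -> list:
    """Split raw input bytes into one bytes object per key press."""
    return KEY_SEQ.findall(data)


for key in split_keys(b"\x1bOF\n\x1b\x1bOF\n\x1b[1;5C\n"):
    print(list(key), KNOWN_KEYS.get(key, "?"))
```

A user who types Esc followed by "[" by hand produces the same bytes as the start of an arrow key, so a real reader usually also needs a timeout to tell the two apart.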

Related

Octal number handling in Clojure

I'm trying to create an octal handling function in Clojure, so far I have this:
(defn octal [a]
  (if (= 023 a)
    (Integer/toOctalString a)
    a))
Where I have 023, I would like to replace it with some logic that checks whether a number begins with 0. I'm aware of starts-with? for strings; is there an equivalent for numbers?
I'm trying to do it this way because if I pass Integer/toOctalString an integer without a leading 0, it passes back a larger number. E.g. 2345 becomes 4451.
Thank you

It seems read-string can do this for you:
(read-string "023")
; ==> 19
(read-string "19")
; ==> 19
If you want to read octal without prefix you can simply add the zero before passing it:
(defn octal->integer [s]
  (read-string (str \0 s)))
(octal->integer "23")
; ==> 19

When outputting numbers:
The built in format function does this nicely:
user> (format "%o" 19)
"23"
Of course, nothing about a number records the format it was originally written in. If you put it into a collection and store that information along with it, either directly or in metadata, you can keep track of that.
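To illustrate the point about keeping the original format next to the value, here is a small sketch in Python rather than Clojure. The TaggedInt class and parse function are invented for this example, and the leading-zero rule simply mirrors the convention discussed in this thread:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedInt:
    value: int
    base: int  # radix the number was originally written in (8 or 10 here)

    def render(self) -> str:
        """Write the number back out in the radix it arrived in."""
        if self.base == 8:
            return "0" + format(self.value, "o")
        return str(self.value)


def parse(text: str) -> TaggedInt:
    # A leading zero followed by more digits is treated as octal.
    if len(text) > 1 and text.startswith("0"):
        return TaggedInt(int(text, 8), 8)
    return TaggedInt(int(text), 10)


n = parse("023")
print(n.value, n.render())  # 19 023
```

The integer itself stays an ordinary integer; only the wrapper remembers that it was written in octal.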
As far as reading numbers is concerned:
Clojure reads integer literals as octal if they start with a leading zero.
Just the usual warning: clojure.core/read-string IS TOTALLY UNSAFE to use on untrusted input. It will run code from the string at read time if not carefully managed.
Use clojure.edn/read-string instead.
They both will read an octal number for you just fine:
user> (clojure.edn/read-string "023")
19
user> (read-string "023")
19
And clojure.edn/read-string will refuse to p0wn your server:
user> (read-string "#=(println \"Pwning your server now\")")
Pwning your server now
nil
user> (clojure.edn/read-string "#=(println \"Pwning your server now\")")
RuntimeException No dispatch macro for: = clojure.lang.Util.runtimeException (Util.java:221)
So it's worth being in the habit of using clojure.edn for all data that is not actually part of your program.
PS: there is a dynamic var (*read-eval*) you can set to turn off the reader-eval feature of read-string, but depending on it is an accident waiting to happen.
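The same caution applies in any language: when the input is untrusted, it is safer to accept only the narrow format you expect than to hand the text to a general-purpose reader. A sketch of that idea in Python, with a hypothetical parse_octal helper that accepts octal digits and nothing else:

```python
import re

OCTAL_DIGITS = re.compile(r"[0-7]+")


def parse_octal(text: str) -> int:
    """Parse a string of octal digits; reject everything else."""
    if OCTAL_DIGITS.fullmatch(text) is None:
        raise ValueError(f"not an octal number: {text!r}")
    return int(text, 8)


print(parse_octal("023"))  # 19

try:
    parse_octal('#=(println "Pwning your server now")')
except ValueError as err:
    print("rejected:", err)
```

The explicit check matters here because int(text, 8) on its own also accepts surrounding whitespace, a sign, and a "0o" prefix.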

regex - extract strings at specific positions

I have a huge fixed-width string that looks something like below:
B100000DA3F19C Android 600 AND 2011-08-29 15:03:21.537
352a0D21ffd800000a3a95911801700e iPad 600 iOS 2011-08-29 19:35:12.753
.
.
.
I need to extract the first part (id) and the fourth part (device type - "AND" or "iOS"). The first column starts at 0 and ends at the 51st position for all lines. The fourth part starts at 168 and ends at 171 for all lines. The length of each line is 244 characters. If this is complicated, the other option is to delete everything in this file except id and device type. This single file has around 800K records measuring 180mb but Notepad++ seems to be handling it okay.
I tried a SQL Server data import, but even though the Preview looks fine, the data that gets inserted into the table is not accurate.
I have the following so far, which gives me the first 51 characters:
^(.{51}).*
It would be great if I could have one regex that keeps the id and device type and deletes the rest.

Well, if you are certain it is always at that position, a very simple way is this:
^(.{51}).{117}(.{3})
The parentheses are the captures (the results you get out), while the numbers in curly braces are the repetition counts.
EDIT: Use the following to explicitly discard the rest of the line:
^(.{51}).{117}(.{3}).*$
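If a script is an option instead of an editor, the same pattern can be applied line by line. A minimal sketch in Python using the column positions from the question (id in the first 51 characters, device type at 168 to 170); the extract function and the padded sample line are made up for illustration, and stripping the trailing spaces from the id is an assumption about what is wanted:

```python
import re

LINE = re.compile(r"^(.{51}).{117}(.{3}).*$")


def extract(line: str):
    """Return (id, device_type) for one fixed-width line, or None if it is too short."""
    match = LINE.match(line)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2)


# Build a 244-character sample line with the device type at column 168.
sample = ("B100000DA3F19C".ljust(168) + "AND").ljust(244)
print(extract(sample))  # ('B100000DA3F19C', 'AND')
```

Because the columns are fixed, plain slicing (sample[:51] and sample[168:171]) gives the same two fields without a regex.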

keyboard scan codes in c linux and windows

OK, so I am writing a program that takes input from keyboard keys such as left arrow, right arrow, up and down. My question is: what is the best way to read these keys so that the program runs on both Linux and Windows?
And what am I reading exactly? Am I supposed to read the ASCII values and store them in ints? chars? Or is there another way to do this? I have searched the internet and found that the hex values for the keyboard scan codes are e0 4b, e0 4d, e0 48, e0 50,
but when I actually read the values using getchar() and store them in ints, I get 4 values for each key pressed, for example 27 91 67 10 and 27 91 68 10.
I understand that each key has press, release and other values attached to it, so should I be scanning for the 67, 68, etc. range?
Or is there another way to do this?
I am writing the program in C.

In Linux, it seems like you're seeing ANSI escape sequences. They are used by text terminals, and start with the Escape character, which is '\x1b' (decimal 27).
This is probably not what you want. If you want to make something keyboard-controllable in a direct, game-like manner, you need to use "raw" input. There are plenty of references for that; look at ncurses, for instance.
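To make the escape sequences concrete: the four values in the question are Esc (27), "[" (91), a final letter, and the newline (10) from pressing Enter. A short sketch in Python (not C) that decodes them, using the standard ANSI final letters A, B, C and D for up, down, right and left; decode_arrow is a name invented for this example:

```python
ARROWS = {b"A": "up", b"B": "down", b"C": "right", b"D": "left"}


def decode_arrow(codes):
    """Map values such as [27, 91, 67, 10] to an arrow name, or None if they are not an arrow key."""
    data = bytes(codes)
    # An arrow key arrives as Esc, "[", then one final letter.
    if len(data) >= 3 and data[:2] == b"\x1b[":
        return ARROWS.get(data[2:3])
    return None


print(decode_arrow([27, 91, 67, 10]))  # right
print(decode_arrow([27, 91, 68, 10]))  # left
```

The e0 4b style values found online are PC keyboard scan codes, which a terminal program reading with getchar() never sees.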

Open a terminal and use the command xev. You can then press any key you want and see its corresponding codes. You can also move and click the mouse to see what happens there.

How to find whether byte read is japanese or english?

I have an array which contains Japanese and ASCII characters.
I am trying to find out whether each character read is an English character or a Japanese character.
To solve this, I did the following:
Read the first byte. If the multibyte character width is not equal to one, move the pointer to the next byte,
then display the two bytes together and report that a Japanese character has been read.
If the multibyte character width is equal to one, display the byte and report that an English character has been read.
The above algorithm works fine but fails for half-width Japanese characters, e.g. ｼ, ｧ etc., as they are only one byte.
How can I find out whether characters are Japanese or English?
Note: what I tried
I read on the web that the first byte tells whether a character is Japanese or not, which I have covered in step 1 of my algorithm. But it won't work for half-width characters.
EDIT:
In the problem I was solving, I include the control character 0x80 at the start and end of my characters to delimit the string.
I wrote the following to identify the closing control character:
cntlchar.....(my characters , can be japnese).....cntlchar
if ((buf[*p+1] & 0X80) && (mbMBCS_charWidth(&buf[*p]) == 1))
// end of control characters reached
else
// *p++
It worked fine for English but didn't work for Japanese half-width characters.
How can I handle this?

Your data must be using Windows Codepage 932. That is a guess, but examining the codepoints shows what you are describing.
The codepage shows that bytes in the range 00 to 7F are "English" (a better description is "7-bit ASCII"), bytes in the ranges 81 to 9F and E0 to FC are the first byte of a two-byte code, and everything between A1 and DF is a half-width Kana character.
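Those byte ranges translate directly into a classifier. The sketch below is in Python rather than C and assumes the data really is code page 932; it takes FC as the highest lead byte, which is my understanding of that code page, and the classify_cp932 name and the category labels are invented for this example:

```python
def classify_cp932(data: bytes):
    """Walk the bytes and label each character as ascii, halfwidth-kana, double-byte or other."""
    result = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            kind, width = "ascii", 1
        elif 0xA1 <= b <= 0xDF:
            kind, width = "halfwidth-kana", 1  # single-byte Japanese
        elif 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            kind, width = "double-byte", 2     # lead byte, the next byte belongs to it
        else:
            kind, width = "other", 1           # e.g. 0x80 used as a delimiter
        result.append((kind, data[i:i + width]))
        i += width
    return result


for kind, raw in classify_cp932("Aあｼ".encode("cp932")):
    print(kind, raw.hex())
```

With this split, "is it Japanese?" becomes "is it halfwidth-kana or double-byte?", and the character width alone no longer decides the answer.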

For individual bytes this is impractical to impossible. For larger sets of data you could do statistical analysis on the bytes and see if it matches known English or Japanese patterns. For example, vowels are very common in English text, whereas Japanese text would show a different frequency pattern.
Things get more complicated than testing bits if your data includes accented characters.
If you're dealing with Shift-JIS data and Windows-1252 encoded text, ideally you just remap it to UTF-8. There's no standard way to identify text encoding within a text file, although things like MIME can help if added on externally as metadata.

How to increase or decrease numbers in a visual block on the fly?

I often find myself adding a number on the fly to a list of numbers,
e.g.
38
12 x
215 x
98 x
03 x
23
What I want to do is select a visual block of numbers (marked x in the above example)
and increase or decrease them by another number.
I tried to do it using two macros (I suppose a single macro isn't possible):
#a to increase the number --> 5#a --> increases every number by 5 (#a = '^Aj')
#x to decrease the number --> 5#x --> decreases every number by 5 (#x = '^Xj')
but...
I don't know
1) how to use the macro only in my selection (without counting lines)
2) how to change the increase/decrease amount on the fly without creating a whole new macro
3) how to resolve this: when I add 100 to the above numbers, the numbers 12, 98 and 03 are moved 1 character to the right.
Another solution is to create a function, but in my opinion it is too complex to enter a value in an input box every time for the few numbers I have to change.

Once you have all your lines selected, you can do:
:'<,'>norm 5<C-v><C-a> <-- inserts ^A
to add 5 to every number.
The alignment problem can't be avoided AFAIK and yes, vimscript is probably the right tool for the job.

Incrementing the numbers isn't too bad. You have a handful of options, but I personally suggest using Tim Pope's speeddating plugin. It provides a nice <c-a> visual-mode mapping.
However, if you want a quick and dirty mapping, here you go:
xnoremap <silent> <c-a> :<c-u>exe "'<,'>norm! ".min([col("'<"),col("'>")]).'<bar>'.v:count1."\<c-a>"<cr>
Since you mentioned alignment you may also want to look at godlygeek's Tabular plugin. Drew over at vimcasts did a screencast on using tabular.
If you decide to make your own mapping/function/plugin I would also suggest you look into the following:
:h :s
:h /\%V
:h sub-replace-expression
:h printf(
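For the alignment part, the logic a custom function would need is small: add the amount to the leading number of every selected line, then right-justify all results to the width of the widest one. The sketch below shows that logic in Python, not Vim script, purely as an illustration; add_to_block is a hypothetical name, and it assumes each number sits at the start of its line:

```python
import re

LEADING_INT = re.compile(r"^(\s*)(\d+)")


def add_to_block(lines, delta):
    """Add delta to the leading number of each line and right-align the results."""
    values = []
    for line in lines:
        match = LEADING_INT.match(line)
        values.append(int(match.group(2)) + delta if match else None)
    # Width of the widest result, so shorter numbers get padded on the left.
    width = max((len(str(v)) for v in values if v is not None), default=0)
    result = []
    for line, value in zip(lines, values):
        if value is None:
            result.append(line)  # no leading number, leave the line alone
        else:
            rest = LEADING_INT.sub("", line, count=1)
            result.append(str(value).rjust(width) + rest)
    return result


print(add_to_block(["12", "215", "98", "03"], 5))
```

In Vim itself, printf() with a width (see :h printf( above) would do the padding step.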
