replace escaped unicode with elisp - regex

By calling the Google Dictionary API from Emacs,
http://www.google.com/dictionary/json?callback=cb&q=word&sl=en&tl=en&restrict=pr%%2Cde&client=te
I get a response like the one below:
"entries": [{
"type": "example",
"terms": [{
"type": "text",
"text": "his grandfather\x27s \x3cem\x3ewords\x3c/em\x3e had been meant kindly",
"language": "en"
}]
}]
As you can see, there are escaped characters (\x27, \x3c, \x3e) in "text". I want to convert them with a function like the one below.
(defun unescape-string (string)
"Return STRING with its escaped Unicode sequences unescaped."
...
)
(unescape-string "his grandfather\x27s \x3cem\x3ewords\x3c/em\x3e")
=> "his grandfather's <em>words</em>"
(insert #x27)'
(insert #x3c)<
(insert #x3e)>
Here's what I tried
replace-regexp-in-string
custom replace like http://www.emacswiki.org/emacs/ElispCookbook#toc33
But I don't know how to replace an escape such as '\x123' with the corresponding Unicode character in a buffer or string.
Thanks in advance

Seems like the simplest way to do it:
(read (princ "\"his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e had been meant kindly\""))
;; "his grandfather's ώm>words</em> had been meant kindly"
It is also interesting that Emacs parses \x3ce rather than \x3c. I'm not sure whether this is a bug or intended behaviour; I always thought it was not supposed to read more than two characters after x...
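To make the difference concrete, here is a minimal Python sketch (Python is used only for illustration, and both function names are hypothetical) contrasting a reader that consumes every hex digit after \x with one that stops after exactly two:

```python
import re

def unescape_greedy(text):
    # Consume every hex digit that follows \x, which mimics the
    # behaviour observed above: \x3ce becomes one character, U+03CE.
    return re.sub(r"\\x([0-9a-fA-F]+)",
                  lambda m: chr(int(m.group(1), 16)), text)

def unescape_two_digits(text):
    # Consume exactly two hex digits after \x, so \x3ce becomes "<e".
    return re.sub(r"\\x([0-9a-fA-F]{2})",
                  lambda m: chr(int(m.group(1), 16)), text)

sample = r"\x3cem\x3ewords\x3c/em\x3e"
print(unescape_greedy(sample))      # ώm>words</em>
print(unescape_two_digits(sample))  # <em>words</em>
```

Only the first escape differs, because it is the only one followed by a character (e) that is itself a valid hex digit.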
If you still want to use the read + princ combination, you need to terminate the hex escape with a backslash followed by a space so that Emacs stops parsing further hex digits, like so: \x3c\ e. Alternatively, here's something quick I came up with:
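(defun replace-c-escape-codes (input)
  "Replace each \\xNN escape in INPUT with the character it encodes."
  (replace-regexp-in-string
   "\\\\x[[:xdigit:]][[:xdigit:]]"
   (lambda (match)
     ;; Skip the leading "\x" and parse the two hex digits.
     (make-string 1 (string-to-number (substring match 2) 16)))
   ;; FIXEDCASE and LITERAL are t so the replacement is inserted verbatim.
   input t t))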
(replace-c-escape-codes "his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e")
"his grandfather's <em>words</em>"

Related

How can I use # in let

Here is sample code
(def showscp
  (let [cf (seesaw.core/frame :title "cframe")]
    (seesaw.core/config! cf :content (seesaw.core/button :id :me :text "btn"))
    (.setSize cf 300 300)
    (seesaw.core/show! cf)
    cf))
To get the button, I use this:
(defn find-me
  ([frame]
   (let [btn (seesaw.core/select frame [:#me])]
     btn)))
It causes an error like:
Syntax error reading source at (REPL:2:1).
EOF while reading, starting at line 2
(I guess :#me is the problem in the macro.)
Why does the error occur, and how can I avoid it?
Is there a smarter way than (keyword "#me")?
# is only special at the beginning of a token, to control how that token is parsed. It's perfectly valid as part of a variable name, or a keyword. Your code breaks if I paste it into a repl, but works if I retype it by hand. This strongly suggests to me that you've accidentally included some non-printing character, or other weird variant character, into your function.
You can't use #, because it is a dispatch character.
# is a special character that tells the Clojure reader (the component that takes Clojure source and "reads" it as Clojure data) how to interpret the next character.
The pound character (aka octothorpe) is a special reader control character in Clojure, so you can't use it in a literal keyword, variable name, etc.
Your suggestion of (keyword "#me") will work, although it would probably be better to modify your code to just use the string "#me", or to eliminate the need for the pound-char altogether.

Meaning of # in clojure

In Clojure you can create anonymous functions using #, e.g.
#(+ % 1)
is a function that takes in a parameter and adds 1 to it.
But we also have to use # for regexes, e.g.
(clojure.string/split "hi, buddy" #",")
Are these two # related?
There are also sets #{}, fully qualified class name constructors #my.klass_or_type_or_record[:a :b :c], instants #inst "yyyy-mm-ddThh:mm:ss.fff+hh:mm" and some others.
They are related in the sense that in these cases # starts a sequence recognizable by the Clojure reader, which dispatches every such instance to an appropriate reader. There's a guide that expands on this.
I think this convention exists to reduce the number of different syntaxes to just one and thus simplify the reader.
The two uses have no (direct) relationship.
In Clojure, when you see the # symbol, it is a giant clue that you are "talking" to the Clojure Reader, not to the Clojure Compiler. See the full docs on the Reader here: https://clojure.org/reference/reader.
The Reader is responsible for converting plain text from a source file into a collection of data structures. For example, comparing Clojure to Java we have
; Clojure ; Java
"Hello" => new String( "Hello" )
and
[ "Goodbye" "cruel" "world!" ] ; Clojure vector of 3 strings
; Java ArrayList of 3 strings
var msg = new ArrayList<String>();
msg.add( "Goodbye" );
msg.add( "cruel" );
msg.add( "world!" );
Similarly, there are shortcuts that the Reader recognizes even within Clojure source code (before the compiler converts it to Java bytecode), just to save you some typing. These "Reader Macros" get converted from your "short form" source code into "standard Clojure" even before the Clojure compiler gets started. For example:
@my-atom => (deref my-atom) ; not using `#`
#'map => (var map)
#{ 1 2 3 } => (hash-set 1 2 3)
#_(launch-missiles 12.3 45.6) => `` ; i.e. "nothing"
#(+ 1 %) => (fn [x] (+ 1 x))
and so on. As the @ or deref operator shows, not all Reader Macros use the # (hash/pound/octothorpe) symbol. Note that, even in the case of a vector literal:
[ "Goodbye" "cruel" "world!" ]
the Reader creates a result as if you had typed:
(vector "Goodbye" "cruel" "world!" )
Are these two # related?
No, they aren't. The # literal is used in different ways. Some of them you've already mentioned: these are an anonymous function and a regex pattern. Here are some more cases:
Prepending an expression with #_ removes it before the compiler sees it, as if it had never been written. For example, #_(/ 0 0) is discarded at the reader level, so no exception is thrown.
Tagging primitives to coerce them to complex types: for example, #inst "2019-03-09" produces an instance of the java.util.Date class. There are also #uuid and other built-in tags, and you may register your own.
Tagging ordinary maps to coerce them to typed maps: e.g. #project.models/User {:name "John" :age 42} produces a record of the type declared with (defrecord User ...).
Other Lisps have proper programmable readers, and consequently read macros. Clojure doesn't really have a programmable reader - users cannot easily add new read macros - but the Clojure system does internally use read macros. The # read macro is the dispatch macro, the character following the # being a key into a further read macro table.
So yes, the # does mean something; but it's so deep and geeky that you do not really need to know this.
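The idea of "the character following the # being a key into a further table" can be pictured with a toy lookup. The Python sketch below is only a mental model, not how the Clojure reader is implemented; the table contents come from the forms mentioned in this thread, and the names DISPATCH and classify are made up for illustration:

```python
# Toy model only: maps the character after '#' to the kind of form it starts.
DISPATCH = {
    "(": "anonymous function",
    "{": "set",
    '"': "regex",
    "'": "var quote",
    "_": "discard next form",
}

def classify(token):
    # Anything that does not start with '#' is not a dispatch form at all.
    if not token.startswith("#") or len(token) < 2:
        return "not a dispatch form"
    # The second character selects the entry; unknown keys are lumped together.
    return DISPATCH.get(token[1], "tagged literal or other dispatch form")

print(classify("#(+ % 1)"))            # anonymous function
print(classify('#","'))                # regex
print(classify('#inst "2019-03-09"'))  # tagged literal or other dispatch form
```

Seen this way, #(...) and #"..." share a prefix but end up in unrelated entries of the table.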

How to find the day of the week from timestamp

I have a timestamp like 2015-11-01 21:45:25,296. Is it possible to extract the day of the week (Mon, Tue, etc.) using a regular expression or grok pattern?
Thanks in advance
This is quite easy if you use the ruby filter. I am lazy, so that is the only approach I'm showing.
Here is my filter:
filter {
  ruby {
    code => "
      p = Time.parse(event['message']);
      event['day-of-week'] = p.strftime('%A');
    "
  }
}
The 'message' variable is the field that contains your timestamp
With stdin and stdout and your string, you get:
artur@pandaadb:~/dev/logstash$ ./logstash-2.3.2/bin/logstash -f conf2/
Settings: Default pipeline workers: 8
Pipeline main started
2015-11-01 21:45:25,296
{
"message" => "2015-11-01 21:45:25,296",
"@version" => "1",
"@timestamp" => "2016-08-03T13:07:31.377Z",
"host" => "pandaadb",
"day-of-week" => "Sunday"
}
Hope that is what you need,
Artur
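What you want is the following. Assuming your string is 2015-11-01 21:45:25,296, strip the time portion and pass the date to date (the -d option assumes GNU date):
mydate='2015-11-01 21:45:25,296'
date +%a -d "${mydate% *}"
This prints the abbreviated day name.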
Short answer: no, you can't.
A regex, according to Wikipedia:
...is a sequence of characters that define a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations.
So a regex allows you to parse a String: it searches for information within the String, but it doesn't perform calculations on it.
If you want to make such calculations you need help from a programming language (Java, C#, or Ruby [as @pandaadb suggested], etc.) or some other tool that makes those calculations (Epoch Converter).
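As a small illustration of that split between matching and calculating, here is a Python sketch (Python chosen only for illustration; day_of_week and DAY_NAMES are hypothetical names). The format string only describes the shape of the timestamp, while the date library does the calendar arithmetic:

```python
from datetime import datetime

# Fixed English abbreviations, so the result does not depend on the locale.
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def day_of_week(timestamp):
    # %f accepts the three millisecond digits that follow the comma.
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S,%f")
    # weekday() returns 0 for Monday through 6 for Sunday.
    return DAY_NAMES[parsed.weekday()]

print(day_of_week("2015-11-01 21:45:25,296"))  # Sun
```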

Evaluate math string in Clojure

I need to implement a function called eval-math-string in Clojure, which takes a math string as input and evaluates it:
(eval-math-string "7+8/2") => 11
So I've managed to break apart an expression using re-seq, and now I want to evaluate it using Incanter. However, I have an expression like ("7" "+" "8" "/" "2"), but Incanter needs an expression like ($= 7 + 8 / 2), where $= is the incanter keyword. How can I feed the list of one-character strings into a list including $= so that it executes properly. If the arguments are strings, the function won't work, but I can't convert +, *, / etc. to numbers, so I'm a little stuck.
Does anyone know how I can do this, or if there is a better way to do this?
Incanter's $= macro just calls the infix-to-prefix function, so all you need to do is convert your list of strings to a list of symbols and numbers, then call infix-to-prefix.
I'm going to assume that the input is just a flat list of strings, each of which represents either an integer (e.g. "7", "8", "2") or a symbol (e.g. "+", "/"). If that assumption is correct, you could write a conversion function like this:
(defn convert [s]
  (try
    (Long/parseLong s)
    (catch NumberFormatException _
      (symbol s))))
Example:
(infix-to-prefix (map convert ["7" "+" "8" "/" "2"]))
Of course, if you're just writing a macro on top of $=, there's no need to call infix-to-prefix, as you would just be assembling a list with $= as the first item. I'm going to assume that you already have a function called math-split that can transform something like "7+8/2" into something like ["7" "+" "8" "/" "2"], in which case you can do this:
(defmacro eval-math-string [s]
  `($= ~@(map convert (math-split s))))
Example:
(macroexpand-1 '(eval-math-string "7+8/2"))
;=> (incanter.core/$= 7 + 8 / 2)
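For readers who want to see the whole pipeline (split, convert, evaluate with the usual precedence) without Incanter, here is a rough Python sketch. It is not a port of $= or infix-to-prefix; math_split and convert reuse the names from the answer above but are re-implemented here as assumptions, and only non-negative integers and the four basic operators are handled:

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def math_split(s):
    # Break "7+8/2" into ["7", "+", "8", "/", "2"].
    return re.findall(r"\d+|[-+*/]", s)

def convert(token):
    # Numbers become ints; anything else stays an operator string.
    try:
        return int(token)
    except ValueError:
        return token

def eval_math_string(s):
    items = [convert(t) for t in math_split(s)]
    # First pass: fold * and / into the running list as they appear.
    reduced = [items[0]]
    for op, rhs in zip(items[1::2], items[2::2]):
        if op in ("*", "/"):
            reduced[-1] = OPS[op](reduced[-1], rhs)
        else:
            reduced += [op, rhs]
    # Second pass: apply the remaining + and - from left to right.
    result = reduced[0]
    for op, rhs in zip(reduced[1::2], reduced[2::2]):
        result = OPS[op](result, rhs)
    return result

print(eval_math_string("7+8/2"))  # 11.0
```

The division yields a float in Python, so the result prints as 11.0 rather than 11.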

Regex Searching in Emacs

I'm trying to write some Elisp code to format a bunch of legacy files.
The idea is that if a file contains a section like "<meta name=\"keywords\" content=\"\\(.*?\\)\" />", then I want to insert a section that contains existing keywords. If that section is not found, I want to insert my own default keywords into the same section.
I've got the following function:
(defun get-keywords ()
  (re-search-forward "<meta name=\"keywords\" content=\"\\(.*?\\)\" />")
  (goto-char 0) ;The section I'm inserting will be at the beginning of the file
  (or (match-string 1)
      "Rubber duckies and cute ponies")) ;;or whatever the default keywords are
When the function fails to find its target, it returns Search failed: "[regex here]" and prevents the rest of evaluation. Is there a way to have it return the default string, and ignore the error?
Use the extra options for re-search-forward and structure it more like
(if (re-search-forward "<meta name=\"keywords\" content=\"\\(.*?\\)\" />" nil t)
    (match-string 1)
  "Rubber duckies and cute ponies")
Also, consider using the nifty "rx" macro to write your regex; it'll be more readable.
(rx "<meta name=\"keywords\" content=\""
    (group (minimal-match (zero-or-more nonl)))
    "\" />")