Generic Hashes as arguments? - crystal-lang

I'm trying to implement a class which accepts a generic payload and then converts it to JSON:
require "json"
class Response
alias Payload = Hash(Symbol | String, String | Bool | Int32 | Int64 | Float32 | Float64 | Nil)
#payload : Payload
def initialize(#payload, #method : String = "sendMessage")
end
def to_json
#payload.merge({"method" => #method}).to_json
end
end
Response.new({
  "chat_id" => update.message.not_nil!.chat.id,
  "text"    => "Crystal is awesome but complicated",
})
But I get the compiler error instance variable '@payload' of Response must be Hash(String | Symbol, Bool | Float32 | Float64 | Int32 | Int64 | String | Nil), not Hash(String, Int64 | String). How can I overcome this? Does Crystal support generic hashes? Why can't it convert, even though the types overlap?
Response is part of a shard, so I don't know which hashes will be passed as arguments, but Payload should be enough.

Your payload hash is of type Hash(String, Int64 | String):
typeof({
  "chat_id" => update.message.not_nil!.chat.id,
  "text"    => "Crystal is awesome but complicated",
}) # => Hash(String, Int64 | String)
But the constructor expects a Hash(Symbol | String, String | Bool | Int32 | Int64 | Float32 | Float64 | Nil).
These are different types and cannot be magically cast. You'll need to ensure your payload has the correct type.
One way to do this is to explicitly declare the type of the hash literal:
Payload{
  "chat_id" => update.message.not_nil!.chat.id,
  "text"    => "Crystal is awesome but complicated",
}
This is of course not great, but depending on your use case it might be sufficient.
If you want a general interface that accepts any type of hash, you'll need to convert it to the Payload type. That means copying the data into a new hash of that type:
def self.new(hash, method : String = "sendMessage")
  # Copy the entries into a fresh Payload hash; each key and value
  # is implicitly upcast to the union types of Payload.
  payload = Payload.new
  hash.each do |key, value|
    payload[key] = value
  end
  new(payload, method)
end
For a real-world example, I am using this approach in Crinja to convert a number of different type variations to the matching ones.
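With that overload in place, the original call site compiles, because the copying constructor widens the hash to the Payload type. A minimal sketch with literal stand-ins for the Telegram values (the chat id 123456789_i64 is made-up sample data):
# Hypothetical usage: the argument is a plain Hash(String, Int64 | String);
# the copying overload converts it into a Payload before initialization.
response = Response.new({
  "chat_id" => 123456789_i64,
  "text"    => "Crystal is awesome but complicated",
})
puts response.to_json
# prints something like:
# {"chat_id":123456789,"text":"Crystal is awesome but complicated","method":"sendMessage"}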

Related

How to make an array of a composed type in Crystal?

I was trying:
class Output
  alias Type = String | Array(Output) | Hash(Symbol, Output)

  getter raw

  def initialize(@raw : Type)
  end
end

hash = Output.new({:a => Output.new("1")})
array = Output.new([hash.raw])
Type includes Array(Output), not Array(Type), so I needed to pass the Output itself and not the raw value:
array = Output.new([hash])
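A short sketch of why the first attempt fails the type check, using typeof (output shown approximately):
hash = Output.new({:a => Output.new("1")})

p typeof([hash.raw]) # => Array(String | Array(Output) | Hash(Symbol, Output)), i.e. Array(Type), not in the union
p typeof([hash])     # => Array(Output), which matches the Array(Output) member of Type

array = Output.new([hash])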

Scala spark how to interact with a List[Option[Map[String, DataFrame]]]

I'm trying to interact with this List[Option[Map[String, DataFrame]]] but I'm having a bit of trouble.
Inside it has something like this:
customer1 -> dataframeX
customer2 -> dataframeY
customer3 -> dataframeZ
Where the customer is an identifier that will become a new column.
I need to do a union of dataframeX, dataframeY and dataframeZ (all the DataFrames have the same columns). Before, I had this:
map(_.get).reduce(_ union _).select(columns:_*)
And it was working fine because I only had a List[Option[DataFrame]] and didn't need the identifier, but I'm having trouble with the new list. My idea is to modify my old mapping. I know I can do things like (0).get, which would give me Map(customer1 -> dataframeX), but I'm not quite sure how to do that iteration in the mapping and get the final DataFrame that is the union of all three plus the identifier. My idea:
map(/*get identifier here along with dataframe*/).reduce(_ union _).select(identifier +: columns:_*)
The final result would be something like:
-------------------------------
|identifier | product |State |
-------------------------------
| customer1| prod1 | VA |
| customer1| prod132 | VA |
| customer2| prod32 | CA |
| customer2| prod51 | CA |
| customer2| prod21 | AL |
| customer2| prod52 | AL |
-------------------------------
You could use collect to unnest Option[Map[String, DataFrame]] to Map[String, DataFrame]. To put the identifier into a column you can use withColumn. Your code could look like:
import org.apache.spark.sql.functions.lit

val result: DataFrame = frames.collect {
  case Some(m) =>
    m.map {
      case (identifier, dataframe) => dataframe.withColumn("identifier", lit(identifier))
    }.reduce(_ union _)
}.reduce(_ union _)
Something like this perhaps?
import org.apache.spark.sql.functions.lit

list
  .flatten
  .flatMap {
    // lit() wraps the plain String id into a Column, as withColumn requires
    _.map { case (id, df) => df.withColumn("identifier", lit(id)) }
  }
  .reduce(_ union _)
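A self-contained sketch of the flatten/flatMap approach with toy data (the SparkSession setup, column names and sample rows are made up for illustration):
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder.master("local[*]").appName("union-demo").getOrCreate()
import spark.implicits._

val dataframeX = Seq(("prod1", "VA"), ("prod132", "VA")).toDF("product", "State")
val dataframeY = Seq(("prod32", "CA"), ("prod51", "CA")).toDF("product", "State")

val frames: List[Option[Map[String, DataFrame]]] = List(
  Some(Map("customer1" -> dataframeX)),
  Some(Map("customer2" -> dataframeY)),
  None // None entries are dropped by flatten
)

val result = frames.flatten
  .flatMap(_.map { case (id, df) => df.withColumn("identifier", lit(id)) })
  .reduce(_ union _)
  .select("identifier", "product", "State")

result.show()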

What is the difference between JSON::Any and JSON::Type in Crystal?

In the Crystal language, what is the difference between JSON::Any and JSON::Type? What are the use cases of these types?
JSON::Any is a struct that is returned as the result of parsing. It has convenient methods to access the underlying data: as_s, as_bool, as_f, etc.:
obj = JSON.parse %({"access": true})
p obj.class # => JSON::Any
p obj["access"] # => true
p obj["access"].class # => JSON::Any
JSON::Type is a union type of all possible JSON types. It is used internally by the JSON::Any struct to represent the data:
p obj.raw # => {"access" => true}
p obj.raw.class # => Hash(String, JSON::Type)
JSON::Type is a recursively-defined "alias":
alias Type = Nil | Bool | Int64 | Float64 | String | Array(Type) | Hash(String, Type)
Aliases are part of Crystal's type grammar. For details, see https://crystal-lang.org/docs/syntax_and_semantics/alias.html
JSON::Any is a Struct (Struct < Value < Object); an instance of JSON::Any holds the "raw" value of any JSON type:
cr(0.24.1) > x=JSON::Any.new("hi")
=> "hi"
icr(0.24.1) > x
=> "hi"
icr(0.24.1) > x.raw
=> "hi"

Nested hash generation error

Given the following code:
require "big"
alias Type = Nil | String | Bool | Int32 | BigFloat | Array(Type) | Hash(String | Symbol, Type)
alias HOpts = Hash(String | Symbol, Type)
ctx = HOpts.new
ctx["test_int"] = 1
ctx["test_s"] = "hello"
c1 = Hash(String, Type).new
ctx["stuff"] = c1
ctx["stuff"]["foo"] = { "bar" => 1 }
I get:
Error in test.cr:13: instantiating 'Hash(String | Symbol, Type)#[]=(String, Hash(String, Type))'
ctx["stuff"] = c1
^
in /opt/crystal/src/hash.cr:43: instantiating 'insert_in_bucket(Int32, String, Hash(String, Type))'
entry = insert_in_bucket index, key, value
^~~~~~~~~~~~~~~~
in /opt/crystal/src/hash.cr:842: instantiating 'Hash::Entry(String | Symbol, Type)#value=(Hash(String, Type))'
entry.value = value
^~~~~
in /opt/crystal/src/hash.cr:881: expanding macro
property value : V
^
in macro 'property' expanded macro: macro_83313872:567, line 10:

   1.
   2.
   3.
   4.   @value : V
   5.
   6.   def value : V
   7.     @value
   8.   end
   9.
>  10.   def value=(@value : V)
  11.   end
  12.
  13.
  14.
  15.

instance variable '@value' of Hash::Entry(String | Symbol, Type) must be Type, not Hash(String, Type)
I would expect to be able to create any kind of nested hash but it does not work.
There are a couple of things wrong here.
The type of c1 is Hash(String, Type), which is not one of the types in the Type union: Hash(String, Type) is not compatible with Hash(String | Symbol, Type).
Either include Hash(String, Type) in the Type union, or give c1 the type Hash(String | Symbol, Type) (i.e. HOpts):
c1 = HOpts.new
You will also have another error on this line of code:
ctx["stuff"]["foo"] = { "bar" => 1 }
ctx["stuff"] will return an object of type Type and not a hash as you expect. If you know for certain that ctx["stuff"] is a hash (which we do from this example) then you need to restrict its type. Also { "bar" => 1 } is of type Hash(String, Int32) and not Hash(String, Type), so you need to specify this too:
ctx["stuff"].as(HOpts)["foo"] = HOpts{ "bar" => 1 }

Spark - remove special characters from rows Dataframe with different column types

Assume I have a DataFrame with many columns: some of type string, others of type int, and others of type map.
e.g.
field/columns types: stringType|intType|mapType<string,int>|...
|--------------------------------------------------------------------------
| myString1 |myInt1| myMap1 |...
|--------------------------------------------------------------------------
|"this_is_#string"| 123 |{"str11_in#map":1,"str21_in#map":2, "str31_in#map": 31}|...
|"this_is_#string"| 456 |{"str12_in#map":1,"str22_in#map":2, "str32_in#map": 32}|...
|"this_is_#string"| 789 |{"str13_in#map":1,"str23_in#map":2, "str33_in#map": 33}|...
|--------------------------------------------------------------------------
I want to remove some characters, like '_' and '#', from all columns of string and map type, so the resulting DataFrame/RDD will be:
|------------------------------------------------------------------------
|myString1 |myInt1| myMap1|... |
|------------------------------------------------------------------------
|"thisisstring"| 123 |{"str11inmap":1,"str21inmap":2, "str31inmap": 31}|...
|"thisisstring"| 456 |{"str12inmap":1,"str22inmap":2, "str32inmap": 32}|...
|"thisisstring"| 789 |{"str13inmap":1,"str23inmap":2, "str33inmap": 33}|...
|-------------------------------------------------------------------------
I am not sure whether it's better to convert the DataFrame into an RDD and work with that, or to do the work on the DataFrame itself.
Also, I'm not sure how best to handle the regexp with the different column types (I am using Scala).
And I would like to perform this action for all columns of these two types (string and map), avoiding hard-coded column names like:
def cleanRows(mytabledata: DataFrame): RDD[String] = {
  // this will do the work for a specific column (myString1) of type string
  val oneColumn_clean = mytabledata.withColumn("myString1", regexp_replace(col("myString1"), "[_#]", ""))
  ...
  // return type can be RDD or DataFrame...
}
Is there any simple solution to perform this?
Thanks
One option is to define two UDFs to handle string type columns and map type columns separately:
import org.apache.spark.sql.functions.udf
val df = Seq(("this_is#string", 3, Map("str1_in#map" -> 3))).toDF("myString", "myInt", "myMap")
df.show
+--------------+-----+--------------------+
| myString|myInt| myMap|
+--------------+-----+--------------------+
|this_is#string| 3|Map(str1_in#map -...|
+--------------+-----+--------------------+
1) A UDF to handle string type columns:
def remove_string: String => String = _.replaceAll("[_#]", "")
def remove_string_udf = udf(remove_string)
2) A UDF to handle map type columns:
def remove_map: Map[String, Int] => Map[String, Int] = _.map{ case (k, v) => k.replaceAll("[_#]", "") -> v }
def remove_map_udf = udf(remove_map)
3) Apply the UDFs to the corresponding columns to clean them up:
df.withColumn("myString", remove_string_udf($"myString")).
  withColumn("myMap", remove_map_udf($"myMap")).show
+------------+-----+-------------------+
| myString|myInt| myMap|
+------------+-----+-------------------+
|thisisstring| 3|Map(str1inmap -> 3)|
+------------+-----+-------------------+
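Since the question asks to avoid hard-coding column names, one possible extension (a sketch, not a definitive implementation) is to walk df.schema and apply the matching UDF by column type. remove_string_udf and remove_map_udf are the UDFs defined above, and the MapType(StringType, IntegerType, _) match assumes map<string,int> columns as in the example:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{IntegerType, MapType, StringType}

def cleanAllColumns(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, field) =>
    field.dataType match {
      case StringType =>
        acc.withColumn(field.name, remove_string_udf(acc(field.name)))
      case MapType(StringType, IntegerType, _) =>
        acc.withColumn(field.name, remove_map_udf(acc(field.name)))
      case _ => acc // leave other column types (e.g. int) untouched
    }
  }

cleanAllColumns(df).show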