Safe persistenceId encoder for complex primary key - akka

What is the best way to generate a persistenceId from a composite key (String, String) when neither part of the key is controlled (either part may contain any character)? Two cases:
Case 1: decoding is required.
Case 2: decoding is not required (the original key is stored in the actor).

One possibility is to encode the length of the first string of the key as the first characters of the persistenceId. An example in Scala:
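import scala.util.Try

type ComplexKey = (String, String)

// Prefix the ID with the length of the first component, so a ',' inside
// either component can never be mistaken for the separator.
def persistenceIdFor(key: ComplexKey): String = {
  val (first, second) = key
  s"${first.length};$first,$second"
}

// (?s) lets '.' also match line terminators, since the keys may contain any character
val PersistenceIdRegex = """(?s)^(\d+);(.*)$""".r

// You might not necessarily ever need to get a key for a given persistenceId, but this shows the above is invertible
def keyForPersistenceId(persistenceId: String): Option[ComplexKey] =
  persistenceId match {
    case PersistenceIdRegex(firstLenStr, content) =>
      Try(firstLenStr.toInt).toOption
        // the ',' separator must sit exactly at index firstLen
        .filter(firstLen => firstLen < content.length && content.charAt(firstLen) == ',')
        .map { firstLen =>
          val (first, rest) = content.splitAt(firstLen)
          (first, rest.drop(1)) // drop the ',' separator
        }
    case _ => None
  }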
Another would be to use escaping, but that can have a lot of subtleties, despite its initially apparent simplicity.
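To illustrate where those subtleties come from, here is a minimal sketch of the escaping approach. The scheme is an assumption for illustration only: `|` separates the two parts, `\` escapes itself and `|`, and `EscapedPersistenceId`, `encode` and `decode` are hypothetical names. Note that the decoder has to reject a dangling `\` and escapes of any other character; otherwise several different IDs would decode to the same key.

```scala
import scala.collection.mutable.ArrayBuffer

object EscapedPersistenceId {
  // Hypothetical scheme: '|' separates the two parts; '\' escapes itself and '|'
  private val Sep = '|'
  private val Esc = '\\'

  private def escape(s: String): String = {
    val sb = new StringBuilder
    s.foreach { c =>
      if (c == Sep || c == Esc) sb.append(Esc) // prefix special characters
      sb.append(c)
    }
    sb.toString
  }

  def encode(key: (String, String)): String =
    escape(key._1) + Sep + escape(key._2)

  // Splits on unescaped separators; rejects dangling or unknown escapes so that
  // each key has exactly one accepted encoding
  def decode(id: String): Option[(String, String)] = {
    val parts = ArrayBuffer(new StringBuilder)
    var i = 0
    while (i < id.length) {
      val c = id.charAt(i)
      if (c == Esc) {
        if (i + 1 >= id.length) return None // dangling escape at the end
        val next = id.charAt(i + 1)
        if (next != Sep && next != Esc) return None // unknown escape
        parts.last.append(next)
        i += 2
      } else if (c == Sep) {
        parts += new StringBuilder // unescaped separator: start the next part
        i += 1
      } else {
        parts.last.append(c)
        i += 1
      }
    }
    if (parts.size == 2) Some((parts(0).toString, parts(1).toString)) else None
  }
}
```

For example, `encode(("a|b", "c\\d"))` produces `a\|b|c\\d`, and decoding that string gives back the original pair.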
The specific Akka Persistence backend in use may enforce restrictions on the persistenceId (e.g. length of IDs).
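If decoding is not required (the original key is stored in the actor) and the backend limits ID length, one option is to hash the unambiguous length-prefixed form into a fixed-length ID. This is a sketch under stated assumptions: SHA-256 is an arbitrary choice, the `key-` prefix and the `HashedPersistenceId` name are hypothetical, and the ID cannot be turned back into the key. Also note that `getBytes` with UTF-8 replaces unpaired surrogate characters, so keys differing only in such characters would map to the same ID.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object HashedPersistenceId {
  // Unambiguous length-prefixed form, the same format as persistenceIdFor above
  private def canonical(key: (String, String)): String =
    s"${key._1.length};${key._1},${key._2}"

  // Fixed-length and NOT invertible: only suitable when the original key is kept
  // elsewhere (e.g. in the actor's state). The "key-" prefix is a hypothetical choice.
  def persistenceIdFor(key: (String, String)): String = {
    val digest = MessageDigest
      .getInstance("SHA-256")
      .digest(canonical(key).getBytes(StandardCharsets.UTF_8))
    "key-" + digest.map(b => f"${b & 0xff}%02x").mkString // 64 lowercase hex chars
  }
}
```

Hashing the length-prefixed form rather than a naive `first + "," + second` keeps keys such as `("a,b", "")` and `("a", "b,")` from being hashed as the same string.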

Related

How to use regular expression in spark dataframe using scala?

In my case I have a DataFrame containing some biological data: protein name, EC number (there may be more than one) and protein domains (there may also be more than one). The DataFrame has a single column containing all of this data, which I would like to split into three columns. The problem is that when a line containing more than one EC number is split, the second EC number goes into the third column and the domains disappear.
Here is my code:
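import spark.implicits._ // for toDF and the $"..." column syntax (assumes a SparkSession named spark)
import org.apache.spark.sql.functions.split

val df = rdd.toDF()
val mydf = df.withColumn("_tmp", split($"value", ";")).select(
  $"_tmp".getItem(0).as("Entry"),
  $"_tmp".getItem(1).as("ECnumber"),
  $"_tmp".getItem(2).as("Domains")
)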
And here is the result
Based on the sample data provided, you can use the following regular expression to extract your data into separate values:
val dataFrameValueRegex = "(\\w++);(([0-9.-]*+;)++)((\\w++;?)++)".r
For example, if a DataFrame value is:
val dataFrameValue = "A6MML6;2.1.-.-;2.1.3.16;IPR037431;IPR037432;IPR037433"
Using the regular expression in a pattern match, you can extract the individual values from it:
val dataFrameValueRegex(entry, ecNumbers, _, domains, _) = dataFrameValue
This binds the values to the corresponding variables:
1.) entry: the Entry string.
2.) ecNumbers: the complete string of EC numbers separated by semicolons, including a trailing semicolon at the end.
3.) domains: the complete string of domains separated by semicolons.
Note: if the DataFrame value does not have the expected format, a scala.MatchError is thrown.
The code below just prints the extracted values:
println(s"Data value: Entry = [$entry], ECnumbers = [${ecNumbers.init}], Domains = [$domains]")
val ecNumber = ecNumbers.init.split(";")
ecNumber.foreach(e => println(s"ecNumber = [$e]"))
val domain = domains.split(";")
domain.foreach(d => println(s"Domain = [$d]"))
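To avoid the MatchError mentioned in the note, the same pattern can be used in a `match` with a fallback case. This is a plain-Scala sketch; `ParseProteinLine`, `parse` and the returned tuple shape are hypothetical choices, and wiring it into the DataFrame (for example through a UDF) is not shown.

```scala
object ParseProteinLine {
  // Same pattern as above
  val dataFrameValueRegex = "(\\w++);(([0-9.-]*+;)++)((\\w++;?)++)".r

  // Returns None instead of throwing a MatchError on unexpected input
  def parse(value: String): Option[(String, Seq[String], Seq[String])] =
    value match {
      case dataFrameValueRegex(entry, ecNumbers, _, domains, _) =>
        // ecNumbers always ends with ';', so drop it before splitting
        Some((entry, ecNumbers.init.split(";").toSeq, domains.split(";").toSeq))
      case _ => None
    }
}
```

For the sample value above this yields the entry `A6MML6`, the EC numbers `2.1.-.-` and `2.1.3.16`, and the three `IPR…` domains, while a malformed line simply yields `None`.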

keys stored in list and using them in keyExtractor

I have a nested object from Firebase. I extracted its keys and values using Object.keys and Object.values and stored them in 'keys' and 'arrayData' below. I can't figure out how to use keyExtractor with the list of keys ('keys') I have created so that they match the array. Any ideas?
let data = this.props.jobHistory
let keys = Object.keys(data)
let arrayData = Object.values(data)
<FlatList
  keyExtractor={(item, index) => keys[index]} //I thought this would work but it is printing nothing
  data={arrayData}
  renderItem={({item}) => {
    return (
      <Text>{JSON.stringify(item)}{"\n"}</Text>
    )
  }}
/>
Thanks
keyExtractor does not display or print any data; it only supplies the "key" that React Native uses to tell list items apart, and that key is never rendered. Assuming every value in keys is unique, your code should work without errors. If you want to verify that the correct key is being paired with each item, you can log it:
keyExtractor={(item, index) => {
  console.log(`${keys[index]}: ${JSON.stringify(arrayData[index])}`)
  return keys[index]
}}

Find maximum w.r.t. substring within each group of formatted strings

I am struggling to find a solution for this scenario. I have a few files in a directory, let's say:
vbBaselIIIData_201802_3_d.data.20180405.txt.gz
vbBaselIIIData_201802_4_d.data.20180405.txt.gz
vbBaselIIIData_201803_4_d.data.20180405.txt.gz
vbBaselIIIData_201803_5_d.data.20180405.txt.gz
Suppose the single-digit number after the second underscore is called the run number. I have to pick only the files with the latest run number, so in this case I need to pick only two of the four files and put them in a mutable Scala list. The ListBuffer should contain:
vbBaselIIIData_201802_4_d.data.20180405.txt.gz
vbBaselIIIData_201803_5_d.data.20180405.txt.gz
Can anybody suggest how to implement this? I am using Scala, but an algorithm alone is also appreciated. Which data structures should I use, and which functions do I need to implement? Any suggestions are welcome.
Here is a hopefully somewhat inspiring proposal that demonstrates a whole bunch of different language features and useful methods on collections:
val list = List(
"vbBaselIIIData_201802_3_d.data.20180405.txt.gz",
"vbBaselIIIData_201802_4_d.data.20180405.txt.gz",
"vbBaselIIIData_201803_4_d.data.20180405.txt.gz",
"vbBaselIIIData_201803_5_d.data.20180405.txt.gz"
)
val P = """[^_]+_(\d+)_(\d+)_.*""".r
val latest = list
.map { str => {val P(id, run) = str; (str, id, run.toInt) }}
.groupBy(_._2) // group by id
.mapValues(_.maxBy(_._3)._1) // find the last run for each id
.values // throw away the id
.toList
.sorted // restore ordering, mostly for cosmetic purposes
latest foreach println
Brief explanation of the not-entirely-trivial parts that you might have missed when reading an introduction to Scala:
"regex pattern".r converts a string into a compiled regex pattern
A block { stmt1 ; stmt2 ; stmt3 ; ... ; stmtN; result } evaluates to the last expression result
Extractor syntax can be used for compiled regex patterns
val P(id, run) = str matches the second and third _-separated values
_.maxBy(_._3)._1 finds the triple with highest run number, then extracts the first component str again
Output:
vbBaselIIIData_201802_4_d.data.20180405.txt.gz
vbBaselIIIData_201803_5_d.data.20180405.txt.gz
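Since the question asks for a mutable ListBuffer, the immutable result can be copied into one. A minimal sketch, where the hard-coded `latest` value merely stands in for the list computed above:

```scala
import scala.collection.mutable.ListBuffer

object LatestAsListBuffer {
  // Stand-in for the `latest` list computed above
  val latest: List[String] = List(
    "vbBaselIIIData_201802_4_d.data.20180405.txt.gz",
    "vbBaselIIIData_201803_5_d.data.20180405.txt.gz"
  )

  // Copy into a fresh mutable buffer; later changes to the buffer do not affect `latest`
  val buffer: ListBuffer[String] = ListBuffer.empty[String] ++= latest
}
```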
It's not clear what performance needs you have, even though you're mentioning an 'algorithm'.
Provided you don't have more specific needs, something like this is easy to do with Scala's Collection API. Even if you were dealing with huge directories, you could probably achieve some good performance characteristics by moving to Streams (at least in memory usage).
So assuming you have a function like def getFilesFromDir(path: String): List[String] where the List[String] is a list of filenames, you need to do the following:
Group files by date (List[String] => Map[String, List[String]])
Extract the Runnumbers, preserving the original filename (List[String] => List[(String, Int)])
Select the max Runnumber (List[(String, Int)] => (String, Int))
Map to just the filename ((String, Int) => String)
Select just the values of the resulting Map (Map[String, String] => Iterable[String])
(Note: if you want to go the pure functional route, you'll want a function something like def getFilesFromDir(path: String): IO[List[String]])
With Scala's Collections API you can achieve the above with something like this:
def extractDate(fileName: String): String = ???
def extractRunnumber(fileName: String): Int = ??? // Int, so that run 10 sorts after run 9

def getLatestRunnumbersFromDir(path: String): List[String] =
  getFilesFromDir(path)
    .groupBy(extractDate)          // List[String] => Map[String, List[String]]
    .mapValues(selectMaxRunnumber) // Map[String, List[String]] => Map[String, String]
    .values                        // Map[String, String] => Iterable[String]
    .toList                        // Iterable[String] => List[String]

def selectMaxRunnumber(fileNames: List[String]): String =
  fileNames.map(f => f -> extractRunnumber(f))
    .maxBy(p => p._2)
    ._1
I've left the extractDate and extractRunnumber implementations blank. These can be done using simple regular expressions — let me know if you're having trouble with that.
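One way to fill in the two extractors, as a sketch: it assumes the same underscore-separated layout matched by the `P` pattern in the first answer, and it takes the list of file names directly instead of calling the hypothetical `getFilesFromDir`. `LatestRuns` and `latestRuns` are illustrative names.

```scala
object LatestRuns {
  // Assumed layout, e.g. vbBaselIIIData_201802_3_d.data.20180405.txt.gz:
  // group 1 = the yyyyMM "date" field, group 2 = the run number
  private val FileName = """[^_]+_(\d+)_(\d+)_.*""".r

  def extractDate(fileName: String): String = fileName match {
    case FileName(date, _) => date
    case _ => throw new IllegalArgumentException(s"Unexpected file name: $fileName")
  }

  def extractRunnumber(fileName: String): Int = fileName match {
    case FileName(_, run) => run.toInt
    case _ => throw new IllegalArgumentException(s"Unexpected file name: $fileName")
  }

  // Numeric comparison, so run 10 beats run 9
  def selectMaxRunnumber(fileNames: List[String]): String =
    fileNames.maxBy(extractRunnumber)

  // Same pipeline as above, applied to a list of names instead of a directory path
  def latestRuns(fileNames: List[String]): List[String] =
    fileNames
      .groupBy(extractDate) // Map[String, List[String]]
      .values
      .map(selectMaxRunnumber)
      .toList
      .sorted // deterministic order for printing
}
```

Applied to the four file names from the question, `latestRuns` returns the `201802_4` and `201803_5` files.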
If you have the file-names as a list, like:
val list = List("vbBaselIIIData_201802_3_d.data.20180405.txt.gz"
, "vbBaselIIIData_201802_4_d.data.20180405.txt.gz"
, "vbBaselIIIData_201803_4_d.data.20180405.txt.gz"
, "vbBaselIIIData_201803_5_d.data.20180405.txt.gz")
Then you can do:
list.map{f =>
val s = f.split("_").toList
(s(1), f)
}.groupBy(_._1)
.map(_._2.max)
.values
This returns:
MapLike.DefaultValuesIterable(vbBaselIIIData_201803_5_d.data.20180405.txt.gz, vbBaselIIIData_201802_4_d.data.20180405.txt.gz)
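as you wanted. Because max compares the file names as plain strings, this relies on the run number being a single digit, as it is in the question.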

How to handle redundant types in Crystal?

I'm using the crystal language, and it's been great so far. Unfortunately, I feel like my code is becoming a bit too messy with types everywhere.
For example:
# ---------=====----++---
# Grab characters
# ---------=====----++---
def handle_character_list(msg, client)
result = {} of String => Array(Tuple(Int64, String, String, Int16, Int8)) | Int32 | Int16 | Int64 | String
result["characters"] = db.query_all "select character_id, character_name, DATE_FORMAT(created, '%b:%d:%Y:%l:%i:%p') AS created, level, cc from rpg_characters where user_id = ? ", client.user_id,
as: {Int64, String, String, Int16, Int8}
result["max_char_slots"] = client.user_data["max_char_slots"]
puts result
end
While looking at the db.query_all method, it says:
returns an array where the value of each row is read as the given type
With the aforementioned above, why do I need to explicitly set those types again to my result variable, if they are going to be returned?
I feel like I'm doing something wrong, and any advice/insight is appreciated.
The first thing that jumps out at me is the size of the type of your hash. You seem to be using Hash the same way as in Ruby. Don't.
In Ruby, or other dynamic languages, Hashes or objects are used as generic data containers, almost like unnamed classes. In Crystal, hashes have a single type for the key, and a single type for the value, which makes them unsuited for the task. You want to tell Crystal more about the structure of your data, so you don't have to keep repeating that.
The first thing to do is look at the result object. It can simply be transformed into a record Result:
record Result,
characters: Array({Int64, String, String, Int16, Int8}),
max_char_slots: Int32
The method then becomes:
def handle_character_list(msg, client)
sql = <<-SQL
SELECT character_id, character_name, DATE_FORMAT(created, '%b:%d:%Y:%l:%i:%p') AS created, level, cc
FROM rpg_characters
WHERE user_id = ?
SQL
characters = db.query_all sql, client.user_id, as: {Int64, String, String, Int16, Int8}
max_char_slots = client.user_data["max_char_slots"]
Result.new(characters, max_char_slots)
end
However, looking at the method, this Result record might only be used in one place: to return data from this method. In that case you probably don't want to give it a formal name, and you could use a NamedTuple instead. NamedTuples are a bit like anonymous records.
def handle_character_list(msg, client)
sql = <<-SQL
SELECT character_id, character_name, DATE_FORMAT(created, '%b:%d:%Y:%l:%i:%p') AS created, level, cc
FROM rpg_characters
WHERE user_id = ?
SQL
{
characters: db.query_all(sql, client.user_id, as: {Int64, String, String, Int16, Int8}),
max_char_slots: client.user_data["max_char_slots"]
}
end
Going further, we can see that a "Character" is also a type:
class Character
  getter id : Int64
  getter name : String
  getter created : Time
  getter level : Int16
  getter cc : Int8

  def initialize(@id, @name, @created, @level, @cc)
  end
end
We can then use DB.mapping to define how the Character class looks in the database.
class Character
  DB.mapping({
    id: Int64,
    name: String,
    created: Time,
    level: Int16,
    cc: Int8
  })

  def initialize(@id, @name, @created, @level, @cc)
  end
end
This definition is equivalent to the previous one because DB.mapping automatically generates getters for us.
def handle_character_list(msg, client)
sql = <<-SQL
SELECT character_id AS id, character_name AS name, created, level, cc
FROM rpg_characters
WHERE user_id = ?
SQL
{
characters: db.query_all(sql, client.user_id, as: Character),
max_char_slots: client.user_data["max_char_slots"]
}
end
Going even further, I'd extract that into two methods, each one doing just one thing, and I'd probably make client.user_data more type safe:
def characters_for_user(user_id)
sql = <<-SQL
SELECT character_id AS id, character_name AS name, created, level, cc
FROM rpg_characters
WHERE user_id = ?
SQL
db.query_all(sql, user_id, as: Character)
end
def handle_character_list(msg, client)
{
characters: characters_for_user(client.user_id),
max_char_slots: client.user_data.max_char_slots
}
end
This is just my thought process on how I'd write the code you've shown. I've made a lot of assumptions about your code and database which might be wrong (e.g. that "created" is a DATETIME in MySQL). I'm attempting to show a thought process, not a finished solution. Hope that helps.

Adding numbers from a list of integer to listbox

I am having trouble getting a list of integers to show in a ListBox.
I have several numbers to show, but I can't even get one number to show.
There is no error, but the ListBox just shows System.Collections.Generic.List`1[System.Int32]
Dim URL = New Uri("http://www.hurriyet.com.tr/sans-oyunlari/sans-topu-sonuclari/")
Dim WebClient As New HttpClient
Dim Source = Await WebClient.GetStringAsync(URL)
Dim ListofNumber As List(Of Integer)
ListofNumber = New List(Of Integer)
Dim WebCode1 As String = "<span id=""_ctl0_ContentPlaceHolder1_lblresutone"" class=""hurriyet2010_so_sanstopu_no_text"">([^>]*)</span></div>"
For Each item As Match In (New Regex(WebCode1)).Matches(Source)
ListofNumber.Add(Integer.Parse(item.Groups(1).Value))
Next
listBox1.Items.Add(ListofNumber)
Currently you're adding a single item to the ListBox: the List(Of Integer) object itself. You need to add each item in the list separately, like this:
For Each i As Integer In ListOfNumber
listBox1.Items.Add(i)
Next
Or with AddRange, which expects an array of Object:
listBox1.Items.AddRange(ListOfNumber.Cast(Of Object)().ToArray())
As was already mentioned in the comments, but bears repeating, regex is typically the wrong tool for the job when you're parsing HTML. Using an HTML parser/DOM would be preferable in most cases.
Instead of:
listBox1.Items.Add(ListofNumber)
...it should be:
listBox1.DataSource = ListofNumber
This way you are binding your list of objects (in your case ListofNumber) to the listBox.
In fact you can bind any type of list, and the ListBox will show the .ToString() of each item (in your case Integer.ToString(), which is the number as a string).
An alternative to binding the data source would be to call listBox1.Items.Clear() and then add your items one by one with listBox1.Items.Add(yourItem), or as a group with listBox1.Items.AddRange(ListofNumber.Cast(Of Object)().ToArray()).
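I believe the issue is the id in WebCode1: an HTML id is meant to be used only once, and that is the case in the downloaded source, so a pattern that includes it matches only one number. Matching on the class alone picks up all of them.
Please try this: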
Dim URL = New Uri("http://www.hurriyet.com.tr/sans-oyunlari/sans-topu-sonuclari/")
Dim WebClient As New HttpClient
Dim Source = Await WebClient.GetStringAsync(URL)
Dim WebCode1 As String = "class=""hurriyet2010_so_sanstopu_no_text"">([^>]*)</span></div>"
ListBox1.DataSource =
(
From T In (New Regex(WebCode1)).Matches(Source) _
.Cast(Of System.Text.RegularExpressions.Match)() _
Select T.Groups(1).Value) _
.ToList