Crystal language: what to use instead of runtime String::to_sym

I am trying to convert a Ruby program to Crystal, and I am stuck because String::to_sym is missing.
I have a big XML file that is too large to fit in memory, so parsing it all at once is out of the question. Fortunately I do not need all of the information, only a portion of it, so I am parsing it myself and dropping most of the lines. In Ruby I used String::to_sym to store the data, like this:
:param_name1 => 1
:param_name2 => 11
:param_name1 => 2
:param_name2 => 22
:param_name1 => 3
:param_name2 => 33
What should I use in Crystal?
Memory is the bottleneck. I do not want to store param_name1 multiple times.

If you have a known list of parameters you can for example use an enum:
enum Parameter
Name1
Name2
Name3
end
a = "Name1"
b = {'N', 'a', 'm', 'e', '1'}.join
pp a.object_id == b.object_id # => false
pp Parameter.parse(a) == Parameter.parse(b) # => true
If the list of parameters is unknown you can use the less efficient StringPool:
require "string_pool"
pool = StringPool.new
a = "param1"
b = {'p', 'a', 'r', 'a', 'm', '1'}.join
pp a.object_id == b.object_id # => false
a = pool.get(a)
b = pool.get(b)
pp a.object_id == b.object_id # => true


Julia equivalent to python list multiplication

In python I can quickly concatenate and create lists with repeated elements using the + and * operators. For example:
my_list = [1] * 3 + ['a'] * 4 # == [1, 1, 1, 'a', 'a', 'a', 'a']
Similarly in Julia, I can quickly concatenate and create strings with repeated elements using the * and ^ operators. For example:
my_string = "1"^3 * "a"^4 # == "111aaaa"
My question is whether or not there is a convenient equivalent for lists (arrays) in Julia. If not, then what is the simplest way to define arrays with repeated elements and concatenation?
For the above scenario, a shorter form is fill:
[fill(1,3); fill('a', 4)]
You could also define a Python style operator if you like:
⊕(a::AbstractVector{T}, n::Integer) where T = repeat(a, n)
⊕(a::T, n::Integer) where T = fill(a, n)
The symbol ⊕ can be entered in Julia by typing \oplus and pressing Tab.
Now you can do just as in Python:
julia> [1,2] ⊕ 2
4-element Vector{Int64}:
1
2
1
2
julia> 3 ⊕ 2
2-element Vector{Int64}:
3
3
You can use repeat, e.g.
[repeat([1], 3); repeat(['a'],4)]
produces Any[1, 1, 1, 'a', 'a', 'a', 'a'].

How to substitute integers with letters in Ruby

I am new to Ruby and programming. I am working on a card game. I have a variable (straightHigh) currently filled with a number n representing a rank of a card. I want certain numbers (11-14) to be replaced with specific letters (11 => J, 12 => Q, 13 => K, 14 => A).
I've tried gsub and gsub! with and without regular expressions. But regular expressions are very foreign to me.
if y == 5
straightHigh = n + 4
#straightHigh.to_s.gsub!(/[11-14]/, 11 => 'J', 12 => 'Q', 13 => 'k', 14 => 'A')
p straightHigh.to_s
end
I've tried:
straightHigh.to_s.gsub!(/[11-14]/, 14 => 'Ace', 13 => K, 12 => Q, 11 => J)
which resulted in syntax errors.
I've tried
straightHigh.to_s.gsub!(/[11-14]/, 'Ace')
This does not throw an error, but it does not seem to alter the values either.
Maybe you should use a case statement:
def get_card(number)
case number
when 2..10
return number.to_s
when 11
return 'J'
when 12
return 'Q'
when 13
return 'K'
when 14
return 'Ace'
end
end
I am not sure what you are trying to do, but I believe you are trying to map an integer to a string? If so, you can use a hash:
# straight_high Integer
# returns String
def get_card(straight_high)
card_values = {
11 => 'J',
12 => 'Q',
13 => 'K',
14 => 'Ace',
}
# Fall back to the number itself for ranks without a letter (2..10)
card_values.fetch(straight_high, straight_high.to_s)
end

How to define a regex-matched string type in Typescript?

Is it possible to define an interface which has some information on the format of a string? Take the following example:
interface timeMarkers{
markerTime: string[]
};
An example would be:
{
markerTime: ["0:00","1:30", "1:48"]
}
My question: Is there a way to define the type for markerTime such that the string value must always match this regex, instead of declaring it as simply string[] and going from there?
var reg = /[0-9]?[0-9]:[0-9][0-9]/;
There is no way to define such a type. There is a proposal on GitHub to support this, but it currently does not appear to be a priority. Vote on it and maybe the team might include it in a future release.
Edit
Starting in 4.1 you can define a type that would validate the string without actually defining all the options:
type MarkerTime =`${number| ''}${number}:${number}${number}`
let a: MarkerTime = "0-00" // error
let b: MarkerTime = "0:00" // ok
let c: MarkerTime = "09:00" // ok
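Since `${number}` is not limited to a single digit, the template literal type above is fairly permissive, and values that arrive as plain strings at runtime still need a check. A minimal sketch of such a check follows; the guard name isMarkerTime and the ^/$ anchors added to the question's regex are assumptions for illustration, not part of the original answer.

```typescript
// Same template literal type as in the answer above.
type MarkerTime = `${number | ''}${number}:${number}${number}`;

// Anchored variant of the regex from the question (anchors added here).
const markerTimePattern = /^[0-9]?[0-9]:[0-9][0-9]$/;

// Hypothetical type guard: narrows a plain string to MarkerTime at runtime.
function isMarkerTime(value: string): value is MarkerTime {
  return markerTimePattern.test(value);
}

// Strings of unknown shape, e.g. read from user input.
const inputs: string[] = ["0:00", "09:00", "0-00", "1:3"];

// filter() with a type guard yields MarkerTime[] instead of string[].
const valid: MarkerTime[] = inputs.filter(isMarkerTime);
console.log(valid); // [ '0:00', '09:00' ]
```

The type keeps literals in source code honest, while the guard covers strings that only exist at runtime.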
Until regex types become available to the language, you can now use template literal types in TS 4.1.
Let me refer to the example in the question and illustrate how to model a time-restricted string type called Time. For simplicity, Time expects strings in the format hh:mm (e.g. "23:59") here.
Step 1: define HH and MM types
Paste the following code into your browser web console:
Array.from({length:24},(v,i)=> i).reduce((acc,cur)=> `${acc}${cur === 0 ? "" : "|"}'${String(cur).padStart(2, "0")}'`, "type HH = ")
Array.from({length:60},(v,i)=> i).reduce((acc,cur)=> `${acc}${cur === 0 ? "" : "|"}'${String(cur).padStart(2, "0")}'`, "type MM = ")
Generated result, which we can use as types in TS:
type HH = '00'|'01'|'02'|'03'|'04'|'05'|'06'|'07'|...|'22'|'23'
type MM = '00'|'01'|'02'|'03'|'04'|'05'|'06'|'07'|...|'58'|'59'
Step 2: Declare Time
type Time = `${HH}:${MM}`
Simple as that.
Step 3: Some testing
const validTimes: Time[] = ["00:00","01:30", "23:59", "16:30"]
const invalidTimes: Time[] = ["30:00", "23:60", "0:61"] // all emit error
Here is a live code example to play around with Time.
type D1 = 0|1;
type D3 = D1|2|3;
type D5 = D3|4|5;
type D9 = D5|6|7|8|9;
type Hours = `${D9}` | `${D1}${D9}` | `2${D3}`;
type Minutes = `${D5}${D9}`;
type Time = `${Hours}:${Minutes}`;
A compact solution aggregating ideas from @bela53 and @yoel halb.
This solution has 2040 union members for the Time type (34 hour strings times 60 minute strings).
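The size of the resulting union can be checked by enumerating the same combinations at runtime. The sketch below mirrors the D1/D3/D5/D9 definitions with string arrays; the helper combine and the variable names are hypothetical and only serve the count.

```typescript
// Runtime mirror of the D1/D3/D5/D9 unions above (digits as strings).
const d1: string[] = ["0", "1"];
const d3: string[] = [...d1, "2", "3"];
const d5: string[] = [...d3, "4", "5"];
const d9: string[] = [...d5, "6", "7", "8", "9"];

// Hypothetical helper: every concatenation x + y, like `${X}${Y}` on unions.
const combine = (xs: string[], ys: string[]): string[] => {
  const out: string[] = [];
  for (const x of xs) for (const y of ys) out.push(x + y);
  return out;
};

// Hours = `${D9}` | `${D1}${D9}` | `2${D3}`
const hours: string[] = [...d9, ...combine(d1, d9), ...combine(["2"], d3)];

// Minutes = `${D5}${D9}`
const minutes: string[] = combine(d5, d9);

// Time = `${Hours}:${Minutes}`
const times: string[] = combine(hours.map((h) => h + ":"), minutes);

// A Set removes duplicates, as a union type would.
const distinctTimes = new Set(times).size;
console.log(hours.length, minutes.length, distinctTimes); // 34 60 2040
```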
Based on @bela53's answer but much simpler, here is a solution similar to @Titian's, but without the drawbacks:
type HourPrefix = '0'|'1'|'2';
type MinutePrefix = HourPrefix | '3'|'4'|'5';
type Digit = MinutePrefix |'6'|'7'|'8'|'9';
type Time = `${HourPrefix | ''}${Digit}:${MinutePrefix}${Digit}`
const validTimes: Time[] = ["00:00","01:30", "23:59", "16:30"]
const invalidTimes: Time[] = ["30:00", "23:60", "0:61"] // all emit error
WARNING: There is a limit to what TypeScript can handle with @bela53's approach...
Based upon @bela53's answer, I attempted the following type definition for an IPv4 address. It results in "TS2590: Expression produces a union type that is too complex to represent." The definition also caused IntelliJ to consume a lot of CPU time when I ignored the TypeScript error and tried to build anyway, on the premise that it was still valid (I ended up needing to kill and restart IntelliJ).
type segment = '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|'10'|'11'|'12'|'13'|'14'|'15'|'16'|'17'|'18'|'19'|'20'|'21'|'22'|'23'|'24'|'25'|'26'|'27'|'28'|'29'|'30'|'31'|'32'|'33'|'34'|'35'|'36'|'37'|'38'|'39'|'40'|'41'|'42'|'43'|'44'|'45'|'46'|'47'|'48'|'49'|'50'|'51'|'52'|'53'|'54'|'55'|'56'|'57'|'58'|'59'|'60'|'61'|'62'|'63'|'64'|'65'|'66'|'67'|'68'|'69'|'70'|'71'|'72'|'73'|'74'|'75'|'76'|'77'|'78'|'79'|'80'|'81'|'82'|'83'|'84'|'85'|'86'|'87'|'88'|'89'|'90'|'91'|'92'|'93'|'94'|'95'|'96'|'97'|'98'|'99'|'100'|'101'|'102'|'103'|'104'|'105'|'106'|'107'|'108'|'109'|'110'|'111'|'112'|'113'|'114'|'115'|'116'|'117'|'118'|'119'|'120'|'121'|'122'|'123'|'124'|'125'|'126'|'127'|'128'|'129'|'130'|'131'|'132'|'133'|'134'|'135'|'136'|'137'|'138'|'139'|'140'|'141'|'142'|'143'|'144'|'145'|'146'|'147'|'148'|'149'|'150'|'151'|'152'|'153'|'154'|'155'|'156'|'157'|'158'|'159'|'160'|'161'|'162'|'163'|'164'|'165'|'166'|'167'|'168'|'169'|'170'|'171'|'172'|'173'|'174'|'175'|'176'|'177'|'178'|'179'|'180'|'181'|'182'|'183'|'184'|'185'|'186'|'187'|'188'|'189'|'190'|'191'|'192'|'193'|'194'|'195'|'196'|'197'|'198'|'199'|'200'|'201'|'202'|'203'|'204'|'205'|'206'|'207'|'208'|'209'|'210'|'211'|'212'|'213'|'214'|'215'|'216'|'217'|'218'|'219'|'220'|'221'|'222'|'223'|'224'|'225'|'226'|'227'|'228'|'229'|'230'|'231'|'232'|'233'|'234'|'235'|'236'|'237'|'238'|'239'|'240'|'241'|'242'|'243'|'244'|'245'|'246'|'247'|'248'|'249'|'250'|'251'|'252'|'253'|'254'|'255';
export type ipAddress = `${segment}.${segment}.${segment}.${segment}`;
I'm not sure if there is any workaround for this.
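One possible workaround, offered as a sketch rather than a confirmed fix: keep the compile-time type loose by using `${number}` for each segment, and check the 0..255 range at runtime. The names IPv4 and isIPv4 are hypothetical, and the loose type will still accept some strings that are not valid addresses, which is why the guard exists.

```typescript
// Loose compile-time shape: numeric parts separated by three dots.
// This avoids building a union of 256^4 literal members.
type IPv4 = `${number}.${number}.${number}.${number}`;

// Hypothetical guard: each segment must be a decimal integer in 0..255
// without leading zeros.
function isIPv4(value: string): value is IPv4 {
  const parts = value.split(".");
  return (
    parts.length === 4 &&
    parts.every((p) => /^(0|[1-9][0-9]{0,2})$/.test(p) && Number(p) <= 255)
  );
}

const localhost: IPv4 = "127.0.0.1"; // accepted by the compiler
// const tooShort: IPv4 = "127.0.0";  // rejected: only three parts

console.log(localhost, isIPv4("10.0.0.255"), isIPv4("10.0.0.256"));
```

The trade-off is that range errors surface at runtime rather than at compile time, but the compiler no longer has to represent the full union.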
For MySQL date/time strings
I was trying to make a type that reflected MySQL datetime string values ie "2022-07-31 23:11:54".
Interestingly, you can almost do it currently, but if you add any more specificity it will either end up being any or the compiler will complain that it can't add more typing. I think there is a limit to the number of typings it can create?
type OneToNine = 1|2|3|4|5|6|7|8|9
type ZeroToNine = 0|1|2|3|4|5|6|7|8|9
export type DateTimeType = `${
`${number}`
}-${
`0${OneToNine}` | `1${0|1|2}`
}-${
`0${OneToNine}` | `1${ZeroToNine}` | `2${ZeroToNine}` | `3${0|1}`
} ${
`0${OneToNine}` | `1${0|OneToNine}` | `2${0|1|2|3}`
}:${number}:${number}`
I was just looking for a similar feature right now, too!
And I ended up thinking about this:
Wouldn't it be possible to get this running by setting up a slightly more complex dev environment? Maybe you could use a file watcher to trigger tsc and look for TypeError events to update your *.d.ts file.
I mean something like:
export type superRegexType = 'type-1' | 'type-2' | '/type-/';
and as a hook something like this (a rudimentary suggestion):
const onTypeError = (err: Error, nameOfTypeSuperRegexType: string) => {
// Pass an encoding so readFileSync returns a string rather than a Buffer
const myTypesFile = require('fs').readFileSync(`path/to/\*d.ts`, 'utf8') as string;
const searchFor = `export type ${nameOfTypeSuperRegexType} =`;
const getDefinition = (inMyTypesFile: string, searchFor: string) => {
const typeDefinitionString = inMyTypesFile.split(searchFor)[0].split(';')[0] + ';';
const stringsInThere = typeDefinitionString.split(' | ').map(_str => _str.trim());
const myRegexStr = stringsInThere.pop();
return {
oldTypeDefinitionString: typeDefinitionString,
stringsInThere,
myRegexStr,
myRegex: new RegExp(myRegexStr)
};
};
const myTypeDefinition = getDefinition(myTypesFile, searchFor);
const shouldDynamicallyAddType = myTypeDefinition.myRegex.exec(err.message);
if (!shouldDynamicallyAddType) {
console.log("this is a real TypeError");
console.error(err);
return;
} else {
const indexInErrMessage = shouldDynamicallyAddType.index;
const _endIdx = err.message.indexOf('\'');
const typeToAdd = err.message.slice(indexInErrMessage, _endIdx).trim();
myTypeDefinition.stringsInThere.push(typeToAdd);
const updatedTypeDefinitionString = `${searchFor} ${myTypeDefinition.stringsInThere.join(' | ')} ${myTypeDefinition.myRegexStr};`;
// replace() returns a new string, so keep the result
const updatedTypesFile = myTypesFile.replace(myTypeDefinition.oldTypeDefinitionString, updatedTypeDefinitionString);
// --> save that new d.ts and return hopefully watch the lint-error disappearing
}
}
Maybe this kind of solution would allow you to dynamically add types based on your RegEx at compile time.
What do you think?
My 2 cents
type digit01 = '0' | '1';
type digit03 = digit01 | '2' | '3';
type digit05 = digit03 | '4' | '5';
type digit09 = digit05 | '6' | '7' | '8' | '9';
type minutes = `${digit05}${digit09}`;
type hour = `${digit01 | ''}${digit09}` | `2${digit03}`;
type MarkerTime = `${hour}:${minutes}`;
const ok: Record<string, MarkerTime> = {
a: '0:00',
b: '09:00',
c: '23:59',
};
const notOk: Record<string, MarkerTime> = {
a: '0-00',
b: '24:00',
c: '93.242:942.23',
};
Just to complement @bela53's answer: const assertions can be used for the type construction.
const hours = [
'00', '01', '02', '03', '04', '05', '06', '07', '08', '09',
'10', '11', '12', '13', '14', '15', '16', '17', '18', '19',
'20', '21', '22', '23'
] as const
type HH = typeof hours[number]
const minutes = [
'00', '01', '02', '03', '04', '05', '06', '07', '08', '09',
'10', '11', '12', '13', '14', '15', '16', '17', '18', '19',
'20', '21', '22', '23', '24', '25', '26', '27', '28', '29',
'30', '31', '32', '33', '34', '35', '36', '37', '38', '39',
'40', '41', '42', '43', '44', '45', '46', '47', '48', '49',
'50', '51', '52', '53', '54', '55', '56', '57', '58', '59'
] as const
type MM = typeof minutes[number]
type Time = `${HH}:${MM}`

How to remove double quotes from keys in RDD and split JSON into two lines?

I need to modify the data so that it can be given as input to a CEP system. My current data looks like this:
val rdd = {"var":"system-ready","value":0.0,"objectID":"2018","partnumber":2,"t":"2017-08-25 11:27:39.000"}
I need output like this:
t = "2017-08-25 11:27:39.000"
Check = { var = "system-ready",value = 0.0, objectID = "2018", partnumber = 2 }
I have written the RDD map operations below to achieve this; if anybody can suggest a better option, it is welcome. colCount is the number of columns.
rdd.map(x => x.split("\":").mkString("\" ="))
.map((f => (f.dropRight(1).split(",").last.toString, f.drop(1).split(",").toSeq.take(colCount-1).toString)))
.map(f => (f._1, f._2.replace("WrappedArray(", "Check = {")))
.map(f => (f._1.drop(0).replace("\"t\"", "t"), f._2.dropRight(1).replace("(", "{")))
.map(f => f.toString().split(",C").mkString("\nC").replace(")", "}").drop(0).replace("(", "")) // replacing , with \n, dropping (
.map(f => f.replace("\" =\"", "=\"").replace("\", \"", "\",").replace("\" =", "=").replace(", \"", ",").replace("{\"", "{"))
Scala's JSON parser seems to be a good choice for this problem:
import scala.util.parsing.json.JSON
rdd.map( x => {
// Values are not all strings (e.g. 0.0 parses as a Double), so use Map[String, Any]
JSON.parseFull(x).get.asInstanceOf[Map[String, Any]]
})
This will result in an RDD[Map[String, Any]]. You can then access the t field from the JSON, for example, using:
.map(dict => "t = "+dict("t"))

Can I combine a list of similar dataframes into a single dataframe? [duplicate]

I have a list of data frames:
foo <- list(df1 = data.frame(x=c('a', 'b', 'c'),y = c(1,2,3)),
df2 = data.frame(x=c('d', 'e', 'f'),y = c(4,5,6)))
Can I convert it to a single dataframe of the form:
data.frame(x = c('a', 'b', 'c', 'd', 'e', 'f'), y= c(1,2,3,4,5,6))
?
do.call("rbind", foo) should do the trick.
With plyr:
foo <- list(df1 = data.frame(x=c('a', 'b', 'c'),y = c(1,2,3)),
df2 = data.frame(x=c('d', 'e', 'f'),y = c(4,5,6)))
library(plyr)
ldply(foo)[,-1]
x y
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
There are several problems with your code.
The first is that the assignment statement in the list doesn't work. This needs to be fixed by, for example:
foo <- list(
df1 = data.frame(x=c('a', 'b', 'c'), y = c(1,2,3)),
df2 = data.frame(x=c('d', 'e', 'f'), y = c(4,5,6))
)
You can then use rbind() to combine the data frames:
rbind(foo$df1, foo$df2)
x y
1 a 1
2 b 2
3 c 3
4 d 4
5 e 5
6 f 6
But this poses more questions. The first is why you combine the data frames in a list in the first place. The second is whether you really need to use data frames rather than vectors. Finally, I generally try to avoid rbind() and use merge() instead when combining data frames in this way.
How about merge(foo[[1]], foo[[2]], all = TRUE)?