Why do Sets of Characters appear to be ordered?

I was always under the impression that Sets aren't ordered, but noticed that Sets of Characters do seem to be ordered:
(seq #{\e \c \b \z \a})
=> (\a \b \c \e \z)
If I introduce other kinds of characters, it seems as though they're being ordered according to the codes of the characters:
(seq #{\e \A \c \space \b \z \a})
=> (\space \A \a \b \c \e \z)
Why are characters being sorted according to their code, but Sets of numbers appear to have arbitrary ordering?

It's because the hash of a Character is simply its code point, and sets are built on hash maps. But if you introduce enough characters that some of them start colliding in the hash table (their hashes share the same low bits), the apparent ordering doesn't entirely hold together:
; the whole alphabet is small enough to avoid collisions
user=> (apply str (set "abcdefghijklmnopqrstuvwxyz"))
"abcdefghijklmnopqrstuvwxyz"
; and observe the hashes are indeed sequential
user=> (map hash (set "abcdefghijklmnopqrstuvwxyz"))
(97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122)
; but go from 26 to 36 elements, and you start to see collisions
user=> (apply str (set "0123456789abcdefghijklmnopqrstuvwxyz"))
"abcdefghijklmno0p1q2r3s4t5u6v7w8x9yz"
user=> (map hash (set "0123456789abcdefghijklmnopqrstuvwxyz"))
(97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 48 112 49 113 50 114 51 115 52 116 53 117 54 118 55 119 56 120 57 121 122)
But of course as you know, this is not a defined behavior, but just how the implementation happens to work at the moment.
Now, you ask why this doesn't happen for numbers: the reason is that Clojure explicitly avoids it! (.hashCode 1) returns 1, because that's how Java defines hash codes for integers. But Clojure's hash function uses Murmur3, which returns quite different values for numbers than just the input: (hash 1) yields 1392991556. I'm not an expert on this, but as far as I know the motivation for using Murmur3 instead of Java's built-in hash codes (a change made in Clojure 1.6) was to spread hashes out so that small numbers, and collections built from them, collide far less often. It was about performance rather than security.
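The character ordering seen above can be reproduced with a small model. This is only a sketch in Python, not Clojure's actual code: it assumes the set is a 32-way hash trie that consumes the hash five bits at a time, lowest bits first, and visits children in index order, and it takes the hash of a character to be its code point (as the REPL output shows). The name trie_order is made up for this illustration.

```python
def trie_order(chars):
    """Order characters the way the assumed 32-way hash trie would visit them."""
    def path(c):
        h = ord(c)  # assumption: hash of a character == its code point
        # Split the hash into 5-bit chunks, lowest bits first.
        return [(h >> shift) & 0x1F for shift in range(0, 32, 5)]
    # Characters sharing the low 5 bits land in the same slot and are then
    # ordered by the next 5 bits, which is why digits interleave with letters.
    return "".join(sorted(set(chars), key=path))

print(trie_order("abcdefghijklmnopqrstuvwxyz"))
print(trie_order("0123456789abcdefghijklmnopqrstuvwxyz"))
```

Under this model \0 (48) and \p (112) share the low five bits (16), so they sit next to each other, with \0 first because its next five bits are smaller.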

Related

Regex to capture and reposition the same pattern

I have a list of numbers that I would like to reformat, but I'm having difficulty with (I think) the substitution -- I'm capturing the groups as I intend to, but they aren't being rendered the way I expect them to be.
Here's some of the text:
Rear seal:
102
111
112
113
137
156
And the expected output is this:
Rear seal:
102 111 112
113 137 156
I'm using this regex to distinguish the first, second, and third lines:
(\d{3}[\n\r])(\d{3}[\n\r])(\d{3}[\n\r]) coupled with \1\t\2\t\3\n for the substitution. But for some reason it comes out as
Rear seal:
102
111
112
113
137
156
I'm using the excellent site regex101.com for testing, but I could use some human input. Specific link is
https://regex101.com/r/R7niEU/1 for this issue.
Thanks in advance.

You are capturing the newline inside each capturing group, so it becomes part of \1, \2 and \3 and is written back by the replacement.
Capture only the digits and match the newline outside the groups:
(\d{3})[\n\r](\d{3})[\n\r](\d{3})[\n\r]
Then replace with \1\t\2\t\3\n
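As a quick check outside regex101, here is a minimal Python sketch of the same substitution. The sample text is taken from the question; the variable names are only for illustration.

```python
import re

text = "Rear seal:\n102\n111\n112\n113\n137\n156\n"

# The groups capture only the digits; the line breaks are matched outside them.
pattern = re.compile(r"(\d{3})[\n\r](\d{3})[\n\r](\d{3})[\n\r]")

# In a Python replacement template, \t and \n become a tab and a newline.
result = pattern.sub(r"\1\t\2\t\3\n", text)
print(result)
```

Each run of three numbers ends up on one tab-separated line.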

Regex match multiple numbers stop at string (word) despite more matches exist

Goal;
Match all variations of phone numbers with 8 digits + (optional) country code.
Stop match when "keyword" is found, even if more matches exist after the "keyword".
Need this in a one-liner and have tried a plethora of variations with lookahead/behind and negate [^keyword] but I am unable to understand how to achieve this.
Example of text;
abra 90998855
kadabra 04 94 84 54
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Ladida
keyword
I Want It To Stop Matching Here Or Right Before The "keyword"
more nice text with some matches
cat 132 23 564
oh the nice Hat +41985 32 565
+17 98 56 32 56
Example of regex;
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})[^keyword]
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?!keyword)
-> This matches all numbers also below the keyword
(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?=keyword)
-> This matches nothing
((\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})(?:(?!keyword))*)
-> This matches all numbers also below the keyword
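The attempts above fail for two reasons: [^keyword] is a character class (any single character other than k, e, y, w, o, r, d), and (?!keyword) only looks at the text immediately after each number. One possible approach, sketched here in Python as an assumption rather than a definitive answer, is to cut the text at the first keyword and run the unchanged pattern on what precedes it. The helper name numbers_before and the shortened sample are made up for this example.

```python
import re

# The pattern from the question, unchanged.
PHONE = re.compile(
    r"(\+\d{1,2})?[\s]?\(?\d{2,3}\)?[\s]?(\d{2})[\s]?(\d{2})?[\s]?(\d{2,3})"
)

def numbers_before(text, stop_word="keyword"):
    # Keep only the part of the text that precedes the first stop word;
    # if the stop word is absent, partition() returns the whole text.
    head = text.partition(stop_word)[0]
    # The pattern may swallow a leading whitespace character, so strip it.
    return [m.group(0).strip() for m in PHONE.finditer(head)]

sample = "abra 90998855\nkadabra 04 94 84 54\nLadida\nkeyword\ncat 132 23 564\n"
print(numbers_before(sample))
```

If a single regex is required, appending a lookahead such as (?=[\s\S]*keyword) to the pattern keeps only numbers that still have a keyword somewhere after them, which stops at the last keyword rather than the first.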

Regex only searching part of a string

I'm trying to do some validation on a string.
Y0 40 98 86 A
I would like to be able to replace the 0's which occur in the first 2 characters ie Y0 with O's.
I know how to do the replace part but am struggling to just select that first 0.
It should match all 0's within those first two characters, i.e. 00, 0Z, etc.
To clarify, I don't mind what language; I just need help making the regex selector.

One-step replacement
Thanks to @Rawing for the comment:
"00 40 98 86 A".gsub(/^0|(?<=^.)0/, 'O')
# "OO 40 98 86 A"
The regex matches either:
a zero at the start of the string, or
a zero immediately preceded by the first character of the string.
Another variant by @SebastianProske:
"A0 40 98 86 A".gsub(/(?<!..)0/, 'O')
# "AO 40 98 86 A"
It means: a 0, but only when it is not preceded by two characters.
Two-step replacement
It might be easier to do it in two steps. Replace the first character by O if it's a 0, then replace the second character if it's a 0.
Here's a Ruby example with a matching group:
"Y0 40 98 86 A".sub(/^0/,'O').sub(/^(.)0/,'\1O')
# "YO 40 98 86 A"
You could also use a lookbehind:
"Y0 40 98 86 A".sub(/^0/,'O').sub(/(?<=^.)0/,'O')
# "YO 40 98 86 A"
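Since the question is language-agnostic, the same lookbehind idea carries over to other engines. Here is a minimal sketch with Python's re module, which accepts this lookbehind because it has a fixed width. The function name fix_leading_zeros is made up for the example.

```python
import re

def fix_leading_zeros(s):
    # A 0 that is NOT preceded by two characters can only sit at
    # index 0 or index 1, i.e. within the first two characters.
    return re.sub(r"(?<!..)0", "O", s)

print(fix_leading_zeros("Y0 40 98 86 A"))
print(fix_leading_zeros("00 40 98 86 A"))
```

Zeros further along the string, such as the one in 40, are left alone because two characters precede them.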

add number as prefix

I have a list of numbers:
19
20
21
22
23
24
25
26
many more numbers...
I want to add one number to all of them as a prefix so that they all become three-digit numbers:
219
220
221
222
223
224
225
226
It should go like this in the find section: \S{2,}. Then what should I put in the replace section? 2$1 or something else? I am not an expert.

Find all two digits and capture them (with parentheses).
\b(\d\d)\b
Replace captured groups with an additional 2 in front.
2$1
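For reference, here is how the same find and replace might look in Python, as a small sketch. Python writes the $1 backreference as \g<1> (or \1), and the function name add_prefix is only illustrative.

```python
import re

def add_prefix(text, prefix="2"):
    # \b(\d\d)\b matches a standalone two-digit number and captures it;
    # \g<1> is Python's explicit spelling of the $1 backreference.
    return re.sub(r"\b(\d\d)\b", prefix + r"\g<1>", text)

print(add_prefix("19\n20\n21\n"))
```

Numbers that already have three digits are not touched, because \b(\d\d)\b only matches exactly two digits standing on their own.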

Matching across multiple lines regular expression

I have several lists in a single text file that look like the one below. Each list always starts with 0 and always ends with the word Unique at the start of a new line. I would like to get rid of all of it apart from the line with Unique on it. I looked through Stack Overflow and tried the following, but it returns the whole text file (there are other strings in the file that I haven't put in this example). Basically, the problem is how to account for the newlines in the regex selection:
^0(.|\n)*
Input:
0 145
1 139
2 175
3 171
4 259
5 262
6 293
7 401
8 430
9 417
10 614
11 833
12 1423
13 3062
14 10510
15 57587
16 5057575
17 10071
18 375
19 152
20 70
21 55
22 46
23 31
24 25
25 22
26 25
27 14
28 16
29 16
30 8
31 10
32 8
33 21
34 8
35 51
36 65
37 605
38 32
39 2
40 1
41 2
44 1
48 2
51 1
52 1
57 1
63 2
68 1
82 1
94 1
95 1
101 3
102 7
103 1
110 1
111 1
119 1
123 1
129 2
130 3
131 2
132 1
135 1
136 2
137 7
138 4
Unique: 252851
Expected output:
Unique: 252851

You need to use something like
^0[\s\S]*?[\n\r]Unique:
and replace with Unique:.
^ - start of a line
0 - a literal 0
[\s\S]*? - zero or more characters, including newlines, as few as possible
[\n\r] - a line break character
Unique: - the literal text Unique:
Another possible regex is:
^0[^\r]*(?:\r(?!Unique:)[^\r]*)*
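where \r is the line ending used in the current file. Replace with an empty string.
Note that you could also use the (?m)^0.*?[\r\n]Unique: regex (again replacing with Unique:), provided your regex flavor's (?m) option makes the dot match newlines, as it does in Ruby:
m: multi-line (dot(.) matches newline)
In many other flavors (PCRE, Python, Java, .NET) that behavior is switched on with (?s) instead, and (?m) only changes where ^ and $ match.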
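A minimal Python sketch of the first pattern, for anyone who wants to try it outside an editor. In Python, re.MULTILINE is what lets ^ match at the start of every line, and [\s\S] already matches newlines, so no dot-all flag is needed. The shortened sample data and the name strip_lists are assumptions for the example.

```python
import re

def strip_lists(text):
    # Lazily match from a line starting with 0 up to the next line that
    # begins with "Unique:", and keep only the "Unique:" part.
    return re.sub(r"^0[\s\S]*?[\n\r]Unique:", "Unique:", text, flags=re.MULTILINE)

sample = "header\n0 145\n1 139\n2 175\nUnique: 252851\nfooter\n"
print(strip_lists(sample))
```

Because the quantifier is lazy, each list stops at its own Unique: line, so a file containing several lists is handled list by list.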

Your method of matching newlines should work, although it's not optimal (alternation is rather slow); the next problem is to make sure the match stops before Unique:
(?s)^0.*(?=Unique:)
should work if there is only one Unique: in your file.
Explanation:
(?s) # Start "dot matches all" (including newlines) mode
^0 # Match "0" at the start of the file
.* # Match as many characters as possible
(?=Unique:) # but then backtrack until you're right before "Unique:"
