Duplicate key exception reading map from edn file


I am persisting a Clojure map to a file using
(spit (str tmpdir "/" "results.edn") (.toString c))
where c is a very large hash-map (the file is 201MB). The keys of the map are strings and the values are numbers.
When I try to read the map back in from the file, I get a duplicate key exception. Here is how I'm trying to read it back in:
(def phrases (edn/read
               (PushbackReader.
                 (io/reader
                   "/tmp/mednotes8910030368496690883/results.edn"))))
How can the map end up with duplicate keys when it is written out to the edn file? Any ideas?
Here's the meat of the exception:
Caused by java.lang.IllegalArgumentException
Duplicate key: ? 5
PersistentHashMap.java: 67 clojure.lang.PersistentHashMap/createWithCheck
RT.java: 1538 clojure.lang.RT/map
EdnReader.java: 631 clojure.lang.EdnReader$MapReader/invoke
EdnReader.java: 142 clojure.lang.EdnReader/read
EdnReader.java: 108 clojure.lang.EdnReader/read
edn.clj: 35 clojure.edn/read
edn.clj: 33 clojure.edn/read
AFn.java: 154 clojure.lang.AFn/applyToHelper
AFn.java: 144 clojure.lang.AFn/applyTo
As requested, here is a sample of what is in the map:
{"cervical region of" 32,
"partial brachial" 64,
"is an effective medication for" 32,
", as stopping them" 32,
", supportive treatment" 160,
"should be eating a normal" 128,
"o call back if supplement" 32,
"diagnosiscod diabc }" 32,
"days in case of allergies" 32,
"ointment 8 drops" 32,
"leg from pinched" 32,
...
@fl00r suggested in the comments that I grep the results.edn file for the string "? 5". Using egrep to show the 20 characters before and after this string, I get 4 entries with that key:
[/tmp/mednotes8910030368496690883]> egrep -o ".{20}\"\? 5\" .{20}" results.edn
oothbrush for" 160, "? 5" 32, ". ) during his
t with sutures" 32, "? 5" 352, "4.81 pounds" 1
"being up all" 32, "? 5" 32, "limited financi
everytime she" 32, "? 5" 32, "had a partial m
Interestingly, 3 of the 4 have exactly the same count. I'm still not sure how this can happen when the file is written from a seemingly good map that doesn't have these duplicates.
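To list every repeated key rather than grepping for one at a time, the same check can be sketched in Python. This is only a rough sketch: it assumes the file is a flat map of double-quoted string keys to integer values, as in the sample above, and the names ENTRY_RE and duplicate_keys are made up for illustration.

```python
import re

# Matches one `"key" number` entry; the key may contain escaped quotes.
# Assumption: values are plain integers, as in the sample shown above.
ENTRY_RE = re.compile(r'"((?:[^"\\]|\\.)*)"\s+(-?\d+)')

def duplicate_keys(edn_text):
    """Return {key: [values...]} for every key that appears more than once."""
    seen = {}
    for key, value in ENTRY_RE.findall(edn_text):
        seen.setdefault(key, []).append(int(value))
    return {k: v for k, v in seen.items() if len(v) > 1}

sample = '{"toothbrush for" 160, "? 5" 32, "with sutures" 32, "? 5" 352, "4.81 pounds" 1}'
dupes = duplicate_keys(sample)
print(dupes)
```

For the real file, the text would be read with open(path, encoding="utf-8").read(), which holds the whole 201MB in memory at once.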

Related

How can find some pin point with regex

I am trying to analyze a string with regex (e.g. 20, 38,, 20, 24 n2,, 20, 28, 38,, 851, 859 n3,) in XML files.
Example text:
<p>Gilmer v Interstate/Johnson Lane Corp. (1991) 500 US 20, 38, 111 S Ct 1647:</p>
<p>Gilmer v Interstate/Johnson Lane Corp. (1991) 500 US 20, 24 n2, 111 S Ct 1647</p>
<p>Gilmer v Interstate/Johnson Lane Corp.</italic> (1991) 500 US 20, 28, 38, 111 S Ct 1647</p>
<p>International Bhd. of Elec. Workers v Hechler (1987) 481 US 851, 859 n3, 107 S Ct 2161:</p>
I want to modify the (\([^()]*)|([0-9]+,)\s*[0-9]+,?\s*[0-9]+, regex because I am replacing the text with $1$2.
(https://regex101.com/r/jWt2w1/2)

Use
(\([^()]*)|([0-9]+,)\s*[0-9]+(?:\s+[a-z]+)?,?\s*[0-9]+(?:\s+[a-z]+)?,
See proof
The (?:\s+[a-z]+)? optionally matches one or more whitespace characters and one or more letters.
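As a quick check of the pattern outside regex101, here is a small Python sketch. The function name drop_pin_cites is hypothetical, and the callable replacement stands in for the $1$2 replacement string so that a group that did not take part in the match contributes an empty string.

```python
import re

# The pattern from the answer: group 1 keeps a parenthesised year such as "(1991",
# group 2 keeps the first page number together with its comma.
PATTERN = re.compile(
    r"(\([^()]*)|([0-9]+,)\s*[0-9]+(?:\s+[a-z]+)?,?\s*[0-9]+(?:\s+[a-z]+)?,"
)

def drop_pin_cites(text):
    """Replace each match with group 1 followed by group 2 (the $1$2 replacement)."""
    return PATTERN.sub(lambda m: (m.group(1) or "") + (m.group(2) or ""), text)

with_note = drop_pin_cites(
    "Gilmer v Interstate/Johnson Lane Corp. (1991) 500 US 20, 24 n2, 111 S Ct 1647"
)
two_pages = drop_pin_cites(
    "Gilmer v Interstate/Johnson Lane Corp. (1991) 500 US 20, 28, 38, 111 S Ct 1647"
)
print(with_note)
print(two_pages)
```

Both inputs should come out as "... (1991) 500 US 20, 111 S Ct 1647", with the pin cites removed and the year left untouched.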

Remove regex pattern from string and store in csv

I am trying to clean up a CSV by using regex. I have accomplished the first part which extracts the regex pattern from the address table and writes it to the street_numb field. The part I need help with is removing that same pattern from the street field so I only end up with the following (i.e., Steinway St, 31 St, 82nd Rd, and 19th St) stored in the street field. Hence these values would be removed (-78, -45, -35, -54) from the street field.
b street_numb street address zipcode
1 246 FIFTH AVE 246 FIFTH AVE 11215
2 30 -78 -78 STEINWAY ST 30 -78 STEINWAY ST 11016
3 25 -45 -45 31ST ST 25 -45 31ST ST 11102
4 123 -35 -35 82ND RD 123 -35 82ND RD 11415
5 22 -54 -54 19TH ST 22 -54 19TH ST 11105
Sample Data (above)
import csv
import re
path = '/Users/darchcruise/Desktop/bldg_zip_codes.csv'
with open(path, 'rU') as infile, open(path+'out.csv', 'w') as outfile:
    fieldnames = ['b', 'street_numb', 'street', 'address', 'zipcode']
    readablefile = csv.DictReader(infile)
    writablefile = csv.DictWriter(outfile, fieldnames=fieldnames)
    for row in readablefile:
        add = re.match(r'\d+\s*-\s*\d+', row['address'])
        if add:
            row['street_numb'] = add.group()
            # row['street'] = remove re.string (add.group()) from street field
            writablefile.writerow(row)
        else:
            writablefile.writerow(row)
What code in line 12 (# remove re.string from row['street']) could be used to resolve my issue (removing -78, -45, -35, -54 from the street field)?

You can use capturing groups with re.findall, like this:
re.findall(r"(\d+\s*(-\s*\d+\s+)?)((\w|\s)+)", row['address'])[0][0]  # gives the street number, e.g. "30 -78 " (note the trailing space)
re.findall(r"(\d+\s*(-\s*\d+\s+)?)((\w|\s)+)", row['address'])[0][2]  # gives the street, e.g. "STEINWAY ST"
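An alternative that stays closer to the re.match call in the question is to slice off whatever the match consumed. This is a minimal sketch, not the answerer's method: NUMBER_RE and split_address are made-up names, and the pattern is widened on purpose so that a plain number such as "246" also counts as a street number.

```python
import re

# Leading house number, optionally followed by a hyphenated part ("30 -78").
NUMBER_RE = re.compile(r"\d+(?:\s*-\s*\d+)?")

def split_address(address):
    """Return (street_numb, street) for one address string."""
    m = NUMBER_RE.match(address)
    if not m:
        # No leading number: leave the whole address as the street.
        return "", address.strip()
    return m.group(), address[m.end():].strip()

addresses = ["246 FIFTH AVE", "30 -78 STEINWAY ST", "25 -45 31ST ST"]
results = [split_address(a) for a in addresses]
print(results)
```

Inside the loop from the question this would become row['street_numb'], row['street'] = split_address(row['address']) before the writerow call.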

Repeating Capture Groups Regex

I have a large chunk of class data that I need to run a regular expression on and get data back from. The problem is that I need a repeating capturing group in order to accomplish that.
Womn St 157A QUEERHISTORY MAKING
CCode Typ Sec Unt Instructor Time Place Max Enr Req Rstr Status
32680 LEC A 4 SHAH, P. TuTh 11:00-12:20p IAB 131 35 37 60 FULL
Womn St 171 SEX/RACE & CONQUEST
CCode Typ Sec Unt Instructor Time Place Max Enr Req Rstr Status
32710 LEC A 4 O'TOOLE, R. TuTh 2:00- 3:20p DBH 1300 52 13/45 24 OPEN
~ Same as 25610 (GlblClt 103B, Lec A); 26350 (History 169, Lec A); and
~ 60320 (Anthro 139, Lec B).
32711 DIS 1 0 MONSON, A. W 9:00- 9:50 HH 105 25 5/23 8 OPEN
O'TOOLE, R.
~ Same as 25612 (GlblClt 103B, Dis 1); 26351 (History 169, Dis 1); and
~ 60321 (Anthro 139, Dis 1).
The result I need would return two matches
Match
Group1:Womn St 157A
Group2:QUEERHISTORY MAKING
Group3:32680
Group4:LEC
Group5:A
Group6:SHAH, P.
Group7:TuTh 11:00-12:20p
Group8:IAB 131
Match
Group1:Womn St 171
Group2:SEX/RACE & CONQUEST
Group3:32710
Group4:LEC
Group5:A
Group6:O'TOOLE, R.
Group7:TuTh 2:00- 3:20p
Group8:DBH 1300
Group9:25610
Group10:26350
Group11:60320
Group12:32711
Group13:DIS
Group14:1
Group15:MONSON, A.
Group16: W 9:00- 9:50
Group17:HH 105
Group18:25612
Group19:26351
Group20:60321
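The question does not name a regex flavor, so the following is only a sketch in Python, where a repeated capturing group keeps just its last repetition. One common workaround is a second pass with findall over the text the first pass located. The line is taken from the sample data; the variable names are made up.

```python
import re

line = "~ Same as 25610 (GlblClt 103B, Lec A); 26350 (History 169, Lec A); and"

# A repeated group only remembers its final repetition.
repeated = re.match(r"~ Same as (?:(\d{5}) \([^)]*\);? ?)+", line)
last_only = repeated.group(1)

# Second pass: collect every five-digit course code on the line.
all_codes = re.findall(r"\b\d{5}\b", line)

print(last_only)
print(all_codes)
```

Here last_only holds only "26350", while all_codes holds both "25610" and "26350", which is why a fixed list of numbered groups is hard to get from a single pattern.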

How to extract character component in values and replace the values with -99

My data looks like:
VAR_A: 134, 15M3, 2004, 301ME, 201E, 41, 53, 22
I'd like to change this vector like below:
VAR_A: 134, -99, 2004, -99, -99, 41, 53, 22
If a value contains letters (e.g., M, E), I want to replace that value with -99.
How could I do it in R? I've heard that regular expression would be a possible way, but I'm not good at it.

It seems to me you want to replace the values that are not digits, if that is the case ...
x <- c('134', '15M3', '2004', '301ME', '201E', '41', '53', '22')
sub('.*\\D.*', '-99', x)
# [1] "134" "-99" "2004" "-99" "-99" "41" "53" "22"
Or essentially you could do:
x[grepl('\\D', x)] <- -99
as.numeric(x)
# [1] 134 -99 2004 -99 -99 41 53 22

JasperReports: Converting String into array and populating List with it

What I have is this String [125, 154, 749, 215, 785, 1556, 3214, 7985]
(The string can contain anywhere from 1 to 15 IDs. It is a string and not a List because it's being sent through a URL.)
I need to populate a List called campusAndFaculty with it
I am using iReport 5.0.0
I've tried entering this in the campusAndFaculty default value Expression field
Array.asList(($P{campusAndFacultyString}.substring( 1, ($P{campusAndFacultyString}.length() -2 ))).split("\\s*,\\s*"))
But it does not populate the campusAndFaculty List
Any idea how I can populate the List campusAndFaculty using that String ("campusAndFacultyString")?
======================
UPDATE
I have these variables in iReport (5.0.0)
String campusAndFacultyFromBack = "[111, 125, 126, 4587, 1235, 1259]"
String noBrackets = $P{campusAndFacultyFromBack}.substring(1, ($P{campusAndFacultyFromBack}.length() -1 ))
List campusAndFacultyVar = java.util.Arrays.asList(($V{noBrackets}).split("\\s*,\\s*"))
When I print campusAndFacultyVar It returns "[111, 125, 126, 4587, 1235, 1259]"
but when I use it in a Filter I get the "Cannot evaluate the following expression: org_organisation.org_be_id in null"

This works for me:
String something = "[125, 154, 749, 215, 785, 1556, 3214, 7985]";
Arrays.asList((something.substring(1, (something.length() -1 ))).split("\\s*,\\s*"));
Which means you can do this in iReport:
java.util.Arrays.asList(($P{campusAndFacultyString}.substring(1, ($P{campusAndFacultyString}.length() -1 ))).split("\\s*,\\s*"))
Differences with your snippet:
It's Arrays, not Array
You should take 1, not 2 from the length
Fully qualified reference to Arrays class (which may or may not matter depending on how your iReport is configured)
