Opening a compressed DICOM (MONOCHROME2) file in Python - compression
I tried to open a DICOM file that is compressed with JPEG 2000 Image Compression.
I used several free software tools as well as Python libraries (pydicom, VTK, OpenCV) to open it, but none of them succeeded.
In pydicom, `decompress()` adapts the transfer syntax of the data set, but not the Photometric Interpretation.
It seems GDCM can also handle compressed files, but there is no "one-click" solution for installing GDCM on Windows. [Solved! `pip install python-gdcm` or the conda-forge `gdcm` package now takes care of this.]
I would appreciate any suggestions on this.
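For reference, the straightforward pydicom route I tried looks roughly like this (a minimal sketch; the filename is hypothetical, pydicom 2.x is assumed, and a JPEG 2000 handler such as GDCM or Pillow must be installed):

```python
from pydicom import dcmread

ds = dcmread("compressed.dcm")  # hypothetical filename
print(ds.file_meta.TransferSyntaxUID)  # JPEG 2000 Image Compression

# Decompresses the Pixel Data in place and rewrites the transfer
# syntax to Explicit VR Little Endian; needs a JPEG 2000 plugin.
ds.decompress()
arr = ds.pixel_array  # for this particular file: fails or returns garbage
```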
DICOM tags:

```
(0008, 0016) SOP Class UID UI:
(0008, 0018) SOP Instance UID UI: 1.2.276.0.75.2.2.42.114374073191699.20160725100733359.3844234830
(0008, 0020) Study Date DA: ''
(0008, 0021) Series Date DA: ''
(0008, 0022) Acquisition Date DA: ''
(0008, 002a) Acquisition DateTime DT: ''
(0008, 0030) Study Time TM: '100304'
(0008, 0031) Series Time TM: '100733'
(0008, 0032) Acquisition Time TM: '100733'
(0008, 0050) Accession Number SH: ''
(0008, 0060) Modality CS: ''
(0008, 0070) Manufacturer LO: ''
(0008, 0080) Institution Name LO: ''
(0008, 1010) Station Name SH: ''
(0008, 1030) Study Description LO: ''
(0008, 103e) Series Description LO: 'Model Acquistion'
(0008, 1070) Operators' Name PN: ''
(0008, 1072) Operator Identification Sequence 1 item(s) ----
(0008, 0080) Institution Name LO: ''
---------
(0008, 1090) Manufacturer's Model Name LO: '4000'
(0010, 0010) Patient's Name PN: ''
(0010, 0020) Patient ID LO: ''
(0010, 0021) Issuer of Patient ID LO: ''
(0010, 0030) Patient's Birth Date DA: ''
(0010, 0040) Patient's Sex CS: ''
(0010, 4000) Patient Comments LT: ''
(0018, 0088) Spacing Between Slices DS: "0.0"
(0018, 1000) Device Serial Number LO: '11307'
(0018, 1020) Software Versions LO: '6.5.0.772'
(0018, 1030) Protocol Name LO: ''
(0020, 000d) Study Instance UID UI: 1.2.276.0.75.2.2.42.114374073191699.20160725100304296.3844877810
(0020, 000e) Series Instance UID UI: 1.2.276.0.75.2.2.42.114374073191699.20160725100733328.3844960960
(0020, 0010) Study ID SH: '2016072510030428'
(0020, 0011) Series Number IS: "-2147483648"
(0020, 0012) Acquisition Number IS: "0"
(0020, 0013) Instance Number IS: "1"
(0020, 0052) Frame of Reference UID UI: 1.2.276.0.75.2.2.42.114374073191699.20160725100733328.3844960960
(0020, 0060) Laterality CS: 'OS'
(0020, 0200) Synchronization Frame of Reference UI: 1.2.276.0.75.2.2.42.114374073191699.20140417175345187.1980128920
(0020, 4000) Image Comments LT: ''
(0028, 0002) Samples per Pixel US: 1
(0028, 0004) Photometric Interpretation CS: 'MONOCHROME2'
(0028, 0008) Number of Frames IS: "2"
(0028, 0010) Rows US: 1024
(0028, 0011) Columns US: 1024
(0028, 0030) Pixel Spacing DS: '0.005865103,0.001955034'
(0028, 0100) Bits Allocated US: 8
(0028, 0101) Bits Stored US: 8
(0028, 0102) High Bit US: 7
(0028, 0103) Pixel Representation US: 1
(0028, 1050) Window Center DS: "29.0"
(0028, 1051) Window Width DS: "210.0"
(0028, 2110) Lossy Image Compression CS: '01'
(0028, 2112) Lossy Image Compression Ratio DS: "10.0"
(0032, 1060) Requested Procedure Description LO: ''
(0040, 0008) Scheduled Protocol Code Sequence 1 item(s) ----
(0008, 0100) Code Value SH: 'SD-E1'
(0008, 0102) Coding Scheme Designator SH: '99CZM'
(0008, 0103) Coding Scheme Version SH: '1.0'
(0008, 0104) Code Meaning LO: 'ALL SCANS'
(0008, 010d) Context Group Extension Creator UID UI: CZM
(0061, 0111) Private tag data UL: 1
(0061, 0113) Private tag data LO: '1'
(0061, 0115) Private tag data LO: ''
(0061, 0117) Private tag data LO: 'SD-E1.xml'
(0061, 0119) Private tag data LO: 'False'
(0061, 011b) Private tag data LO: ''
(0061, 011d) Private tag data LO: 'True'
---------
(0040, 0244) Performed Procedure Step Start Date DA: ''
(0040, 0245) Performed Procedure Step Start Time TM: '100733'
(0040, 0260) Performed Protocol Code Sequence 1 item(s) ----
(0008, 0100) Code Value SH: 'SD-S2'
(0008, 0102) Coding Scheme Designator SH: ''
(0008, 0103) Coding Scheme Version SH: '1.0'
(0008, 0104) Code Meaning LO: 'Macular Cube 512x128'
(0008, 010d) Context Group Extension Creator UID UI: CZM
(0061, 0111) Private tag data UL: 2
(0061, 0113) Private tag data LO: '1'
(0061, 0115) Private tag data LO: ''
(0061, 0117) Private tag data LO: 'SD-S2.xml'
(0061, 0119) Private tag data LO: 'False'
(0061, 011b) Private tag data LO: 'No Hires in center.No parameters can be modified.'
(0061, 011d) Private tag data LO: 'False'
---------
(0040, 1001) Requested Procedure ID SH: ''
(0057, 0001) Private Creator UI: 1.2.276.0.75.2.2.42.7
(0057, 1003) Private tag data UL: 1
(0057, 1015) Private tag data LO: 'CZMI'
(0057, 1021) Private tag data LO: ''
(0057, 1023) Private tag data LO: '116525681374'
(0059, 1000) Private tag data LO: 'DATAFILES/E039/3RT8Q85TM1Y7VT7X31WNLNBV99GA43BOJ27IOLS4X2ZU.EX.DCM'
(0059, 1005) Private tag data SL: 0
(0059, 3500) Private tag data SL: 1
(0063, 1000) Private tag data FL: 0.0
(0063, 1005) Private tag data FL: 6.0
(0063, 1010) Private tag data FL: 2.0
(0063, 1015) Private tag data FL: -64.0
(0063, 1020) Private tag data UL: 141
(0063, 1025) Private tag data FL: 0.0
(0063, 1026) Private tag data FL: 0.0
(0063, 1030) Private tag data FL: 1.0
(0063, 1032) Private tag data FL: 1.0
(0063, 1035) Private tag data SL: 113
(0063, 1047) Private tag data FL: 297.0
(0063, 1048) Private tag data FL: 872.0
(0063, 1049) Private tag data FL: 1222.0
(0071, 1070) Private tag data FL: -292.0
(0071, 1095) Private tag data FL: 0.0
(0071, 1100) Private tag data FL: 0.0
(0071, 1105) Private tag data SL: 0
(0073, 1085) Private tag data FL: 1.0
(0073, 1090) Private tag data SL: 0
(0073, 1095) Private tag data SL: 0
(0073, 1100) Private tag data SL: 0
(0073, 1105) Private tag data FL: 0.0
(0073, 1110) Private tag data FL: 0.0
(0073, 1125) Private tag data SL: Array of 128 elements
(0073, 1135) Private tag data FL: 0.6000000238418579
(0073, 1200) Private tag data SL: Array of 128 elements
(0075, 1015) Private tag data 0 item(s) ----
(0075, 1020) Private tag data SL: 0
(0075, 1021) Private tag data SL: 0
(0075, 1035) Private tag data FL: 3.5299999713897705
(0075, 1065) Private tag data FL: 0.0
(0075, 1070) Private tag data FL: 0.0
(0075, 1075) Private tag data FL: 0.0
(0075, 1080) Private tag data FL: 0.0
(0075, 1085) Private tag data SL: 0
(0075, 1210) Private tag data UL: 0
(0075, 1215) Private tag data FL: -inf
(0075, 1220) Private tag data FL: -inf
(7fe0, 0010) Pixel Data OB: Array of 211106 elements
```
I've determined how to unscramble these obfuscated CZM DICOM datasets. Essentially, CZM transposes several regions of the JPEG 2000 codestream and XORs every 7th byte of the whole frame with 0x5A, which is pretty nasty.
The following function should reverse this, producing a normal JPEG 2000 data stream that can then be written to a file or opened with Pillow:
```python
import math


def unscramble_czm(frame: bytes) -> bytearray:
    """Return an unscrambled image frame.

    Parameters
    ----------
    frame : bytes
        The scrambled CZM JPEG 2000 data frame as found in the DICOM dataset.

    Returns
    -------
    bytearray
        The unscrambled JPEG 2000 data.
    """
    # Fix the 0x5A XORing
    frame = bytearray(frame)
    for ii in range(0, len(frame), 7):
        frame[ii] = frame[ii] ^ 0x5A

    # Offset to the start of the JP2 header - empirically determined
    jp2_offset = math.floor(len(frame) / 5 * 3)

    # Double check that the empirically determined jp2_offset is correct
    offset = frame.find(b"\x00\x00\x00\x0C")
    if offset == -1:
        raise ValueError("No JP2 header found in the scrambled pixel data")

    if jp2_offset != offset:
        raise ValueError(
            f"JP2 header found at offset {offset} rather than the expected "
            f"{jp2_offset}"
        )

    # Undo the transposition of the scrambled regions
    d = bytearray()
    d.extend(frame[jp2_offset:jp2_offset + 253])
    d.extend(frame[993:1016])
    d.extend(frame[276:763])
    d.extend(frame[23:276])
    d.extend(frame[1016:jp2_offset])
    d.extend(frame[:23])
    d.extend(frame[763:993])
    d.extend(frame[jp2_offset + 253:])

    assert len(d) == len(frame)

    return d
```
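As a rough usage sketch (the filenames are hypothetical; pydicom 2.x and a Pillow build with JPEG 2000 support are assumed), each encapsulated frame can be pulled out of the Pixel Data, unscrambled, and then decoded as an ordinary JP2 stream:

```python
from io import BytesIO

from PIL import Image
from pydicom import dcmread
from pydicom.encaps import generate_pixel_data_frame

ds = dcmread("scrambled.dcm")  # hypothetical filename
nr_frames = int(ds.NumberOfFrames)

# Iterate over the encapsulated JPEG 2000 frames in the Pixel Data
for idx, frame in enumerate(generate_pixel_data_frame(ds.PixelData, nr_frames)):
    jp2 = unscramble_czm(frame)
    Image.open(BytesIO(bytes(jp2))).save(f"frame_{idx}.png")
```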