I have a problem with saving a binary representation of a file to a file...
Let me show you my pain:
Everything starts with a file, file.pdf
Then the file is sent via POST to a website with some additional data:
curl --data "sector=4&name=John&surname=Smith&email=john#smith.com&isocode=PL&theFile=$(cat file.pdf | base64)" http://localhost/awesomeUpload
then the data is received and decoded:
var decoded = BinaryDecode(data.theFile, "Base64");
then I attempt to save it by:
var theFilePath = ExpandPath("/localserver/temp/theFile.pdf");
fileWrite(theFilePath , data.theFile);
or:
var file_output_steam = CreateObject("java","java.io.FileOutputStream").init(theFilePath);
file_output_steam.write(data.theFile);
file_output_steam.close();
The files do not match ;(
The original one looks like:
%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(pl-PL) /StructTreeRoot 13 0 R/MarkInfo<</Marked true>>>>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids[ 3 0 R] >>
endobj
3 0 obj
<</Type/Page/Parent 2 0 R/Resources<</Font<</F1 5 0 R/F2 10 0 R>>/ProcSet[/PDF/Text/ImageB/ImageC/ImageI] >>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>>
whereas the copy that went through ColdFusion looks like:
%PDF-1.5
%µµµµ
1 0 obj
<</Type/Catalog/Pages 2 0 R/Lang(pl-PL) /StructTreeRoot 13 0 R/MarkInfo<</Marked true>> B™[™ŘšBŚŘšBŹŐ\KÔYŮ\ËĐŰÝ[ťKŇÚYÖČČ—H€Đ¦VćFö& ĐŁ2ö& ĐŁĂÂőG—RővRő&VçB""ő&W6÷W&6W3ĂÂôföçCĂÂôcR"ôc"#ŕ˝AÉ˝ŤM•Ńl˝A˝Q•áĐ˝%µ…ť•˝%µ…ť•˝%µ…ť•%t€>/MediaBox[ 0 0 595.32 841.92] /Contents 4 0 R/Group<</Type/Group/S/Transparency/CS/DeviceRGB>>/Tabs/S/StructParents 0>B™[™ŘšBŤŘšBŹŃš[\‹Ń›]QXŰŮKÓ[™ÝMŚOŹBśÝ™X[CBž'cłB°Ś!8Ě1Ď]CsôŘQ&‰2 PäV˝ëËöĽ¨QŰge•ź
ďÂŃ,đť#"aKR•˘<1™[ä¸
(ÄňĄyoâ9S\Śĺ <ę8I±D¬‰#…Ć”ťLé‘ا÷ÍnU|WŸ‰t`ýuşąĽ\hlu&âĂ7ß
ů"Ĺ\Ŕ>pÇč÷÷.°ß’Ř——•‹ĚB™[™Ý™X[CB™[™ŘšBŤHŘšBŹŐ\Kћ۝ÔÝXť\KŐ\LĐ\ŮQ›ŰťĐPŃQJĐŘ[XśšKŃ[ŰŮ[™ËŇY[ť]KRŃ\ŘŮ[™[ť›ŰťČ
‹ŐŐ[šXŰŮHŚŹŹB™[™ŘšBŤŘšB–Č
Č—HB™[™ŘšBŤČŘšBŹĐ\ŮQ›ŰťĐPŃQJĐŘ[XśšKÔÝXť\KĐŇQ›Űť\L‹Ő\Kћ۝ĐŇQŃŇQX\ŇY[ť]KŃČLĐŇQŢ\Ý[R[™›Č‹Ń›Űť\ŘÜš\ÜH‹ŐČŚŕЦVćFö& ĐŁ‚ö& ĐŁĂÂô÷&FW&–ćr„–FVçF—G’’ő&Vv—7G'’„Fö&R’ő7WĆVÖVçBăŕЦVćFö& ĐŁ’ö& ĐŁĂÂőG—RôföçDFW67&—F÷"ôföçDć
please help
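One possible cause, offered as a guess rather than a confirmed diagnosis: curl's --data sends the Base64 text as-is, and a form parser turns every '+' in it into a space, so the decoded bytes go wrong wherever the Base64 text had a '+'. The Python sketch below reproduces the effect and shows that percent-encoding the value first preserves it. Python is used only for illustration, and the payload and variable names are made up. With curl, --data-urlencode is the option that performs that encoding.

```python
import base64
from urllib.parse import parse_qs, urlencode

# Hypothetical payload standing in for file.pdf; the trailing bytes are
# chosen so that their Base64 form consists of '+' characters.
original = b"%PDF-1.5\n" + bytes([0xFB, 0xEF, 0xBE]) * 4
encoded = base64.b64encode(original).decode("ascii")

# Raw value in the form body, as `curl --data` would send it:
# the form parser reads each '+' as a space.
raw_body = "theFile=" + encoded
received_raw = parse_qs(raw_body)["theFile"][0]

# Percent-encoding the value first keeps every character intact.
safe_body = urlencode({"theFile": encoded})
received_safe = parse_qs(safe_body)["theFile"][0]

print("raw value survived: ", received_raw == encoded)
print("safe value survived:", received_safe == encoded)
```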
[(['Piano'], 'Beethoven - opus22 4.mid'), (['Piano'], 'Borodin - ps7.mid'), (['Piano'], 'Chopin - op18.mid'), ([None, 'Guitar', 'StringInstrument', 'Acoustic Bass'], 'Cyndi Lauper - True Colors.mid'), (['Piano', 'Fretless Bass', 'StringInstrument', None], 'Frank Mills - Musicbox Dancer.mid'), (['Piano', 'Acoustic Bass', None, 'Baritone Saxophone'], 'George Benson - On Broadway.mid'), (['Piano'], 'Grieg - Voeglein.mid'), (['Piano'], 'Mozart - 333 3.mid'), ([None, 'Pan Flute', 'Piano', 'Piccolo', 'Violin'], 'The Corrs - Dreams.mid'), (['Piano', None, 'Fretless Bass'], 'ABBA - Money Money Money.mid')]
The list above pairs each song with the instruments used in it. I want to build a boolean pandas DataFrame from these songs, with the None instrument removed. The image below shows an example:
Given dataframe
I tried to build a DataFrame for every single instrument and merge them, but this did not produce the expected DataFrame.
Try:
import pandas as pd
lst = [
    (["Piano"], "Beethoven - opus22 4.mid"),
    (["Piano"], "Borodin - ps7.mid"),
    (["Piano"], "Chopin - op18.mid"),
    (
        [None, "Guitar", "StringInstrument", "Acoustic Bass"],
        "Cyndi Lauper - True Colors.mid",
    ),
    (
        ["Piano", "Fretless Bass", "StringInstrument", None],
        "Frank Mills - Musicbox Dancer.mid",
    ),
    (
        ["Piano", "Acoustic Bass", None, "Baritone Saxophone"],
        "George Benson - On Broadway.mid",
    ),
    (["Piano"], "Grieg - Voeglein.mid"),
    (["Piano"], "Mozart - 333 3.mid"),
    (
        [None, "Pan Flute", "Piano", "Piccolo", "Violin"],
        "The Corrs - Dreams.mid",
    ),
    (["Piano", None, "Fretless Bass"], "ABBA - Money Money Money.mid"),
]
all_data = []
for instruments, title in lst:
    # one row per song: mark each real instrument with 1, skip None
    d = {"title": title}
    for i in instruments:
        if i is not None:
            d[i] = 1
    all_data.append(d)
df = pd.DataFrame(all_data).fillna(0).set_index("title").astype(int)
df.index.name = None
print(df)
Prints:
                                   Piano  Guitar  StringInstrument  Acoustic Bass  Fretless Bass  Baritone Saxophone  Pan Flute  Piccolo  Violin
Beethoven - opus22 4.mid               1       0                 0              0              0                   0          0        0       0
Borodin - ps7.mid                      1       0                 0              0              0                   0          0        0       0
Chopin - op18.mid                      1       0                 0              0              0                   0          0        0       0
Cyndi Lauper - True Colors.mid         0       1                 1              1              0                   0          0        0       0
Frank Mills - Musicbox Dancer.mid      1       0                 1              0              1                   0          0        0       0
George Benson - On Broadway.mid        1       0                 0              1              0                   1          0        0       0
Grieg - Voeglein.mid                   1       0                 0              0              0                   0          0        0       0
Mozart - 333 3.mid                     1       0                 0              0              0                   0          0        0       0
The Corrs - Dreams.mid                 1       0                 0              0              0                   0          1        1       1
ABBA - Money Money Money.mid           1       0                 0              0              1                   0          0        0       0
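Since the question asks for a boolean frame, a variant that yields True/False instead of 1/0 may be useful. This is a sketch of a different route, not the approach above: it flattens the list into (title, instrument) pairs and uses pd.crosstab. The shortened two-song list and the column names title and instrument are made up for illustration.

```python
import pandas as pd

# Shortened, hypothetical stand-in for the full song list.
lst = [
    (["Piano"], "a.mid"),
    ([None, "Guitar", "Piano"], "b.mid"),
]

# One row per (title, instrument) pair, skipping the None entries.
pairs = pd.DataFrame(
    [
        (title, instrument)
        for instruments, title in lst
        for instrument in instruments
        if instrument is not None
    ],
    columns=["title", "instrument"],
)

# crosstab counts each pair; any count above zero becomes True.
bool_df = pd.crosstab(pairs["title"], pairs["instrument"]).astype(bool)
print(bool_df)
```

Note that crosstab sorts rows and columns alphabetically, and a song whose instruments are all None would drop out of the result.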
So far this is my code:
from django.template import (Context, Template) # v1.11
from weasyprint import HTML # v0.42
import codecs
template = Template(codecs.open("/path/to/my/template.html", mode="r", encoding="utf-8").read())
context = Context({})
html = HTML(string=template.render(context))
pdf_file = html.write_pdf()
#with open("/path/to/my/file.pdf", "wb") as f:
# f.write(self.pdf_file)
Errorstack:
[17/Jan/2019 08:14:13] INFO [handle_correspondence:54] 'utf8' codec can't
decode byte 0xe2 in position 10: invalid continuation byte. You passed in
'%PDF-1.3\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<</Author <> /Creator (cairo 1.14.6
(http://cairographics.org))\n /Keywords <> /Producer (WeasyPrint 0.42.3
\\(http://weasyprint.org/\\))>>\nendobj\n2 0 obj\n<</Pages 3 0 R /Type
/Catalog>>\nendobj\n3 0 obj\n<</Count 1 /Kids [4 0 R] /Type
/Pages>>\nendobj\n4 0 obj\n<</BleedBox [0 0 595 841] /Contents 5 0 R
/Group\n <</CS /DeviceRGB /I true /S /Transparency /Type /Group>>
MediaBox\n [0 0 595 841] /Parent 3 0 R /Resources 6 0 R /TrimBox [0 0 595
841]\n /Type /Page>>\nendobj\n5 0 obj\n<</Filter /FlateDecode /Length 15
0 R>>\nstream\nx\x9c+\xe4*T\xd0\x0fH,)I-\xcaSH.V\xd0/0U(N\xceS\xd0O4PH/\xe62P0P0\xb54U\xb001T(JUH\xe3\n\x04B\x00\x8bi\r\x89\nendstream\nendobj\n6 0
obj\n<</ExtGState <</a0 <</CA 1 /ca 1>>>> /Pattern <</p5 7 0
R>>>>\nendobj\n7 0 obj\n<</BBox [0 1123 794 2246] /Length 8 0 R /Matrix
[0.75 0 0 0.75 0 -843.5]\n /PaintType 1 /PatternType 1 /Resources
<</XObject <</x7 9 0 R>>>>\n /TilingType 1 /XStep 1588 /YStep
2246>>\nstream\n /x7 Do\n \n\nendstream\nendobj\n8 0 obj\n10\nendobj\n9 0
obj\n<</BBox [0 1123 794 2246] /Filter /FlateDecode /Length 10 0 R
/Resources\n 11 0 R /Subtype /Form /Type /XObject>>\nstream\nx\x9c+\xe4\nT(\xe42P0221S0\xb74\xd63\xb3\xb4T\xd05442\xd235R(JU\x08W\xc8\xe3*\xe42T0\x00B\x10\t\x942VH\xce\xe5\xd2O4PH/V\xd0\xaf04Tp\xc9\xe7\n\x04B\x00`\xf0\x10\x11\nendstream\nendobj\n10 0 obj\n77\nendobj\n11 0 obj\n<</ExtGState
<</a0 <</CA 1 /ca 1>>>> /XObject <</x11 12 0 R>>>>\nendobj\n12 0
obj\n<</BBox [0 1123 0 1123] /Filter /FlateDecode /Length 13 0 R
/Resources\n 14 0 R /Subtype /Form /Type /XObject>>\nstream\nx\x9c+\xe4\n
xe4\x02\x00\x02\x92\x00\xd7\nendstream\nendobj\n13 0 obj\n12\nendobj\n14 0
obj\n<<>>\nendobj\n15 0 obj\n58\nendobj\nxref\n0 16\n0000000000 65535
f\r\n0000000015 00000 n\r\n0000000168 00000 n\r\n0000000215 00000
n\r\n0000000270 00000 n\r\n0000000489 00000 n\r\n0000000620 00000
n\r\n0000000697 00000 n\r\n0000000923 00000 n\r\n0000000941 00000
n\r\n0000001165 00000 n\r\n0000001184 00000 n\r\n0000001264 00000
n\r\n0000001422 00000 n\r\n0000001441 00000 n\r\n0000001462 00000
n\r\ntrailer\n\n<</Info 1 0 R /Root 2 0 R /Size 16>>\nstartxref\n1481
n%%EOF\n' (<type 'str'>)
It actually works via a web request (returning the PDF as the response) and via the shell (typing the code in manually). The code is tested and has never given me problems. The files are saved with the correct encoding, and setting the encoding kwarg in HTML doesn't help; the mode value used to open the template is also correct, which I checked because other questions traced the problem to that.
However, I was adding a management command to run it periodically (for bigger PDFs I cannot do it via a web request because the server's timeout could trigger before it finishes), and when I call it, I only get a UnicodeDecodeError saying 'utf8' codec can't decode byte 0xe2 in position 10: invalid continuation byte.
The PDF (at least from what I see) starts with these characters:
%PDF-1.3\n%\xe2\xe3\xcf\xd3\n1 0
which translates into this:
%PDF-1.3
%âãÏÓ
1 0 obj
So the problem is all about the character â. But it's a trap!
Instead, the problem is this line of code:
pdf_file = html.write_pdf()
Changing it to:
html.write_pdf()
Just works as expected!
So my question is: what reason could there be for Python to throw a UnicodeDecodeError when assigning a string to a variable? I've dug into weasyprint's code in my virtualenv, but I didn't see any conversions there.
So I don't know why, but now suddenly it works. I literally didn't modify anything: I just ran the command again and it works.
I'm not marking the question as answered, since someone who hits the same problem in the future may be able to post a correct answer.
So disturbing.
EDIT
So it looks like I'm a very intelligent person who tried to set self.pdf_file, which is a models.FileField, to the content of the created PDF instead of to the file itself.
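The error message itself can be reproduced without Django or WeasyPrint: the PDF output is bytes, and anything that tries to treat those bytes as UTF-8 text fails at the 0xe2 byte in position 10. The sketch below uses only the first bytes quoted in the error and a made-up temporary file name. In Django, wrapping the bytes in django.core.files.base.ContentFile before saving them to the FileField is the usual way to store them as a file.

```python
import os
import tempfile

# First bytes of the PDF as quoted in the error message.
pdf_bytes = b"%PDF-1.3\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"

# Treating the bytes as UTF-8 text reproduces the failure.
try:
    pdf_bytes.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
print(decode_error)

# Writing the bytes in binary mode keeps them intact.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "file.pdf")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(pdf_bytes)
    with open(path, "rb") as f:
        round_trip = f.read()
```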
I want to count areas of interest in my dataframe column 'which_AOI' (ranging from 0 to 9). I would like to add the results as a new column to a dataframe, grouped by a variable 'marker' (ranging from 0 to x) which tells me when one 'picture' ends and the next begins (one marker can span a variable number of rows). This is my code so far, but it seems to get stuck and keeps running without producing output. I tried rebuilding it from the beginning once, but as soon as I get to 'if df.marker == num' it doesn't stop. What am I missing?
(example dataframe below)
## AOI count of spec. type function (in progress):
import numpy as np
import pandas as pd
path_i = "/Users/Desktop/Pilot/results/gazedata_filename.csv"
df = pd.read_csv(path_i, sep =",")
#create a new dataframe for AOIs:
d = {'marker': []}
df_aoi = pd.DataFrame(data=d)
### Creating an Aoi list
item = df.which_AOI
aoi = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] #list for search
aoi_array = [0, 0 , 0, 0, 0, 0, 0, 0, 0, 0] #list for filling
num = 0
for i in range (0, len (df.marker)): #loop through the dataframe
    if df.marker == num: ## if marker = num its one picture
        for index, item in enumerate(aoi): #look for item (being a number in which_AOI) in aoi list
            if (item == aoi[index]):
                aoi_array[index] += 1
        print (aoi)
        print (aoi_array)
        se = pd.Series(aoi_array) # make list into a series to attach to dataframe
        df_aoi['new_col'] = se.values #add list to dataframe
        aoi_array.clear() #clears list before next picture
    else:
        num +=1
index pos_time pos_x pos_y pup_time pup_diameter marker which_AOI fixation Picname shock
1 16300 168.608779907227 -136.360855102539 16300 2.935715675354 0 7 18 5 save
2 16318 144.97673034668 -157.495513916016 16318 3.08838820457459 0 8 33 5 save
3 16351 152.92560577392598 -156.64172363281298 16351 3.0895299911499 0 7 17 5 save
4 16368 152.132453918457 -157.989685058594 16368 3.111008644104 0 7 18 5 save
5 16386 151.59835815429702 -157.55587768554702 16386 3.09514689445496 0 7 18 5 save
6 16404 150.88092803955098 -152.69479370117202 16404 3.10009074211121 1 7 37 5 save
7 16441 152.76554107666 -142.06188964843798 16441 3.0821495056152304 1 7 33 5 save
It's not 100% clear from your question, but it sounds like you want to count the number of rows for each which_AOI value within each marker.
You can accomplish this using groupby:
df_aoi = df.groupby(['marker','which_AOI']).size().unstack('which_AOI',fill_value=0)
In:
pos_time pos_x pos_y pup_time pup_diameter marker \
0 16300 168.608780 -136.360855 16300 2.935716 0
1 16318 144.976730 -157.495514 16318 3.088388 0
2 16351 152.925606 -156.641724 16351 3.089530 0
3 16368 152.132454 -157.989685 16368 3.111009 0
4 16386 151.598358 -157.555878 16386 3.095147 0
5 16404 150.880928 -152.694794 16404 3.100091 1
6 16441 152.765541 -142.061890 16441 3.082150 1
which_AOI fixation Picname shock
0 7 18 5 save
1 8 33 5 save
2 7 17 5 save
3 7 18 5 save
4 7 18 5 save
5 7 37 5 save
6 7 33 5 save
Out:
which_AOI 7 8
marker
0 4 1
1 2 0
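A self-contained version of the same idea, as a sketch. The small frame reproduces only the marker and which_AOI columns of the sample. The reindex step is an addition that forces all ten AOI columns (0 to 9) to appear even when a value never occurs, and the aoi_ prefix and the join back onto the rows are assumptions about the output the question is after.

```python
import pandas as pd

# Only the two relevant columns of the sample data.
df = pd.DataFrame(
    {
        "marker": [0, 0, 0, 0, 0, 1, 1],
        "which_AOI": [7, 8, 7, 7, 7, 7, 7],
    }
)

# Rows per (marker, which_AOI), pivoted so each AOI value is a column.
counts = (
    df.groupby(["marker", "which_AOI"])
    .size()
    .unstack("which_AOI", fill_value=0)
    .reindex(columns=range(10), fill_value=0)  # keep AOIs that never occur
)

# Attach the per-marker counts to every row of the original frame.
with_counts = df.join(counts.add_prefix("aoi_"), on="marker")
print(with_counts)
```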
I have multiple csv files with the same format (14 rows, 4 columns).
I tried to load all of them into a single DataFrame and use each file's name to rename the values of the first column (1-14):
1 500 0 0
2 350 0 1
3 500 1 0
.............
13 600 0 0
14 800 0 0
I tried the following code but I am not getting what I am expecting:
filenames = os.listdir('Threshold/')
Y = pd.DataFrame () #empty df
# file names are in the following format "subx_ICA_thre.csv"
# need to get x (subject number to be used later for renaming columns values)
Sub_list=[]
for filename in filenames:
    s= int(''.join(filter(str.isdigit, filename)))
    Sub_list.append(int(s))
S_Sub_list= sorted(Sub_list)
for x in S_Sub_list: # get the file according to the subject number
    temp = pd.read_csv('sub' +str(x)+'_ICA_thre.csv' )
    df = pd.concat([Y, temp]) # concat the obtained frame with the empty frame
    df.columns = ['id', 'data', 'isEB', 'isEM']
    # replace the column values using subject id
    for sub in range(1,15):
        df['id'].replace(sub, 'sub' +str(x)+'_ICA_'+str(sub) ,inplace=True)
    print (df)
output:
id data isEB isEM
0 sub1_ICA_2 200 0 0
1 sub1_ICA_3 275 0 0
2 sub1_ICA_4 500 1 0
................................
11 sub1_ICA_13 275 0 0
12 sub1_ICA_14 300 0 0
id data isEB isEM
0 sub2_ICA_2 275 0 0
1 sub2_ICA_3 500 0 0
2 sub2_ICA_4 400 0 0
.................................
11 sub2_ICA_13 300 0 0
12 sub2_ICA_14 450 0 0
First, the code seems to produce separate DataFrames rather than a single one. Second, the first row is dropped (sub1_ICA_1 is missing; it may have been consumed as the column names).
I couldn't find the problem in the loop that I am using.
I think you need to create a list of DataFrames first, then concat them with the keys parameter so that each file gets its own value in the first level of a MultiIndex, then build the id column from that level, and finally remove the MultiIndex with reset_index:
The names parameter was also added to read_csv to set custom column names, so the first data row is no longer consumed as the header.
Y = []
for x in S_Sub_list:
    n = ['id', 'data', 'isEB', 'isEM']
    temp = pd.read_csv('sub' + str(x) +'_ICA_thre.csv', names = n)
    Y.append(temp)
#list comprehension alternative
#n = ['id', 'data', 'isEB', 'isEM']
#Y = [pd.read_csv('sub' + str(x) +'_ICA_thre.csv', names = n) for x in S_Sub_list]
df = pd.concat(Y, keys=range(1,len(S_Sub_list) + 1))
df['id'] = 'sub' + df.index.get_level_values(0).astype(str) +'_ICA_'+ df['id'].astype(str)
df = df.reset_index(drop=True)
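To try this without the files on disk, here is a minimal sketch that reads two made-up CSV bodies from memory. The contents and the subject numbers 1 and 3 are hypothetical. It also passes the actual subject numbers as keys rather than a range, an adjustment that only matters when the subjects are not numbered consecutively from 1.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for sub1_ICA_thre.csv and sub3_ICA_thre.csv.
files = {
    1: "1,500,0,0\n2,350,0,1\n",
    3: "1,200,0,0\n2,275,1,0\n",
}
names = ["id", "data", "isEB", "isEM"]
subjects = sorted(files)

# names= supplies the header, so the first data row is kept as data.
frames = [pd.read_csv(io.StringIO(files[s]), names=names) for s in subjects]

# keys= puts the subject number in the first level of a MultiIndex.
df = pd.concat(frames, keys=subjects)
df["id"] = (
    "sub"
    + df.index.get_level_values(0).astype(str)
    + "_ICA_"
    + df["id"].astype(str)
)
df = df.reset_index(drop=True)
print(df)
```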
I have a tab-separated CSV file.
I use the following code fragment:
data = tf.decode_csv(csv_row, record_defaults=listoflists,field_delim="\t")
but it raises the following error:
tensorflow.python.framework.errors.InvalidArgumentError: Expect 5 fields but have 1 in record 0
But when I convert the file to comma-separated or space-separated, it works correctly:
1. Comma-separated
data = tf.decode_csv(csv_row, record_defaults=listoflists)
2. Space-separated
data = tf.decode_csv(csv_row, record_defaults=listoflists,field_delim=" ")
The full code:
from __future__ import print_function
import tensorflow as tf
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
filename = "test.csv"
# setup text reader
file_length = file_len(filename)
filename_queue = tf.train.string_input_producer([filename])
reader = tf.TextLineReader(skip_header_lines=1)
_, csv_row = reader.read(filename_queue)
# setup CSV decoding
listoflists = []
for i in range(0,5):
    listoflists.append((list([0])))
data = tf.decode_csv(csv_row, record_defaults=listoflists,field_delim="\t")
# turn features back into a tensor
print("loading, " + str(file_length) + " line(s)\n")
with tf.Session() as sess:
    tf.initialize_all_variables().run()
    # start populating filename queue
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(file_length):
        # retrieve a single instance
        example = sess.run(data)
        print(example)
    coord.request_stop()
    coord.join(threads)
    print("\ndone loading")
Sample Data
Tab Separated :
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
Comma Separated :
1,0,1,1,1
1,0,1,1,1
1,0,1,1,1
1,0,1,1,1
1,0,1,1,1
1,0,1,1,1
Space Separated :
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
1 0 0 0 0
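The error reports one field where five were expected, which is what splitting on a tab returns when the line contains no tab character at all (for example, if an editor saved the "tab-separated" file with spaces). This is only a guess at the cause. A quick check that does not need TensorFlow, using two made-up sample lines and a hypothetical helper count_fields:

```python
def count_fields(line, delimiter):
    """Return how many fields a line splits into for the given delimiter."""
    return len(line.rstrip("\r\n").split(delimiter))


# Made-up sample lines: one with real tabs, one with spaces.
tab_line = "1\t0\t0\t0\t0\n"
space_line = "1 0 0 0 0\n"

print(count_fields(tab_line, "\t"))    # real tabs are present
print(count_fields(space_line, "\t"))  # no tab found, so the line stays whole
```

Running the same helper on the first data line of test.csv would show whether the file really contains tab characters.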