Django: docfile adding %0D to URL?

I have a weird issue where the docfile.url of a file on our server has a %0D (carriage return) appended to the end of the URL. This only happens to files that I linked manually. By that I mean there were about 1,000 files in a directory, and I created a CSV file containing the id and filename of each file and loaded it into the MySQL database with some code. All files uploaded normally through my Django app's interface link correctly: clicking their link opens the file properly.
Here's a sample of the CSV file:
792,asbuilts/C0010.pdf
793,asbuilts/C0011.pdf
794,asbuilts/C0012.pdf
795,asbuilts/C0013.pdf
796,asbuilts/C0014.pdf
797,asbuilts/C0015.pdf
798,asbuilts/C0016.pdf
799,asbuilts/C0017.pdf
I have all these asbuilt files in the directory static_media/asbuilts/. In mysql I ran this command:
load data local infile '/srv/www/cpm/CPM_CSV_Files/comm_asbuilts.csv' into table systems_asbuilt fields terminated by ',' lines terminated by '\n' (id, docFile);
A sample output of select * from systems_asbuilt is like this:
|846 | asbuilts/C0057.pdf
|847 | asbuilts/C0059.pdf
|848 | asbuilts/C0060.pdf
|849 | asbuilts/C0061.pdf
|850 | asbuilts/C0062.pdf
|851 | asbuilts/C0063.pdf
|852 | asbuilts/C0064.pdf
Everything looks good, right?
But when I look at the generated link, it looks like this:
www.ourdomain.com/static_media/asbuilts/R0546.pdf%0D
If I manually delete the %0D from the link, the file opens as expected. Any idea why there's the extra %0D on there? Where is it coming from?
Thanks

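My guess is that this:
lines terminated by '\n'
should be:
lines terminated by '\r\n'
Your result "looks" right because of the client you are using to browse it, but the stored value still has the \r appended, so it comes back with the record when it is retrieved. This is what happens when the CSV file has Windows-style \r\n line endings: splitting on \n alone leaves the \r at the end of the last field.
So you can either strip it before you load the data into the database, or call .strip() on the value before you generate the link.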
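To make the effect visible, here is a minimal Python sketch. It only mimics the import in plain Python (it is not what MySQL or Django do internally), and the helper names split_like_the_import and clean_path are made up for illustration. It splits CRLF-terminated CSV text on '\n' alone, then shows how the leftover '\r' turns into %0D once the path is URL-encoded.

```python
from urllib.parse import quote

# Sample CSV text with Windows-style line endings (assumed for illustration).
raw = "792,asbuilts/C0010.pdf\r\n793,asbuilts/C0011.pdf\r\n"


def split_like_the_import(text):
    """Split on '\n' only, which leaves a trailing '\r' on the last field."""
    return [line.split(",") for line in text.split("\n") if line]


def clean_path(path):
    """Drop surrounding whitespace, including a stray carriage return."""
    return path.strip()


rows = split_like_the_import(raw)
dirty = rows[0][1]

print(repr(dirty))               # the path still carries the carriage return
print(quote(dirty))              # the carriage return is percent-encoded
print(quote(clean_path(dirty)))  # clean path, nothing extra appended
```

Applying the same strip to the values before they go into the table (or just before building the link) should be enough to get rid of the %0D.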

Related

Copying txt file to Redshift

I am trying to copy a text file from S3 to Redshift using the command below, but I keep getting the same error.
Error:
Missing newline: Unexpected character 0xffffffe2 found at location 177
copy table from 's3://abc_def/txt_006'
credentials '1234567890'
DELIMITER '|'
NULL AS 'NULL'
NULL AS '' ;
The text file has no header, and the field delimiter is |.
I tried passing the ACCEPTINVCHARS parameter, but Redshift shows the same error:
1216 error code: invalid input line.
Can anyone explain how to resolve this issue?
Thanks in advance.
Is your file in UTF-8 format? If not, convert it and try reloading.
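One way to check this locally before uploading is sketched below in Python. The 0xffffffe2 in the error message is most likely the byte 0xE2 (that reading is my assumption), which is the lead byte of many three-byte UTF-8 sequences such as curly quotes and dashes. The helper names first_non_ascii and is_valid_utf8 are made up for this example; you would pass them the bytes read from your own file.

```python
def first_non_ascii(data):
    """Return (offset, byte value) of the first byte >= 0x80, or None."""
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            return offset, byte
    return None


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative sample: a pipe-delimited line containing a curly apostrophe.
sample = "a|b|it\u2019s\n".encode("utf-8")

print(first_non_ascii(sample))
print(is_valid_utf8(sample))
```

For a real file, something like `data = open(path, "rb").read()` would supply the bytes; the reported offset can then be compared with the location given in the Redshift error.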
I am assuming the path to the text file is correct, and that you generated the text file with some tool and uploaded it manually.
I faced the same issue, and the cause was whitespace. I recommend generating the text file with whitespace trimmed and empty strings converted to NULL.
Your query should be: select RTRIM(LTRIM(NULLIF({columnname}, ''))), ... from {table}. Write the output of this query to the text file.
If you are using SQL Server, export the table with BCP.exe, passing it the above query with all the columns and functions applied.
Then, after uploading the text file to S3, use the copy command below:
copy {table}
from 's3://{path}.txt'
access_key_id '{value}'
secret_access_key '{value}' -- alternatively, use credentials as in the question
delimiter '|' COMPUPDATE ON
removequotes
acceptinvchars
emptyasnull
trimblanks
BLANKSASNULL
FILLRECORD
;
commit;
This solved my problem. Please let us know if you are facing anything else.
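If regenerating the export with the query above is not an option, the trimming can also be approximated on the existing file. This is a rough Python sketch, not part of the fix described above; trim_fields is a hypothetical helper, and it assumes the delimiter never appears inside a field.

```python
def trim_fields(line, delimiter="|"):
    """Strip whitespace around every field of one delimited line."""
    fields = line.rstrip("\r\n").split(delimiter)
    return delimiter.join(field.strip() for field in fields)


# Illustrative input: padded fields, an all-blank field, CRLF line ending.
print(trim_fields(" a | b |  \r\n"))
```

Empty fields are left empty here, so options such as emptyasnull or BLANKSASNULL in the copy command would still be what turns them into NULL.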

Django: No such file or directory

I have a process that scans a tape library and looks for media that has expired, so the tapes can be removed and reused before being sent to an offsite vault. (We have some 7-day policies that never make it offsite.) This process takes around 20 minutes to run, so I didn't want it to run on demand when loading or refreshing the page. Instead, I set up a django-cron job (I know I could have done this in Linux cron, but I wanted the project to be as self-contained as possible) to run the scan and create a file in /tmp. I've verified that this works: the file from this morning's execution exists in /tmp. The problem I'm having is that I now want to display a list of those expired (scratch) media on my web page, but the script says it can't find the file. When the file was created, I used the absolute filename "/tmp/scratch.2015-11-13.out" (for example), but here's the error I get in the browser:
IOError at /
[Errno 2] No such file or directory: '/tmp/corpscratch.2015-11-13.out'
My assumption is that this is a "web root" issue, but I just can't figure it out. I tried copying the file to the /static/ and /media/ directories configured in Django, and even to the Django root directory and the project root directory, but nothing seems to work. When it says it can't find /tmp/file, where is it really looking?
import datetime

def sample():
    """ Just testing """
    today = datetime.date.today()  # format: YYYY-MM-DD, e.g. 2015-11-13
    inputfile = "/tmp/corpscratch.%s.out" % str(today)
    with open(inputfile) as fh:  # This is the line reporting the error
        lines = [line.strip('\n') for line in fh]
    print(lines)
The print statement was used for testing in the shell (which works, I might add), but the browser gives an error.
And the file does exist:
$ ls /tmp/corpscratch.2015-11-13.out
/tmp/corpscratch.2015-11-13.out
Thanks.
Edit: was mistaken, doesn't work in python shell either. Was thinking of a previous issue.
Use this instead:
today = datetime.datetime.today().date()
inputfile = "/tmp/corpscratch.%s.out" % str(today)
Or:
today = datetime.datetime.today().strftime('%Y-%m-%d')
inputfile = "/tmp/corpscratch.%s.out" % today # No need to use str()
See the difference:
>>> str(datetime.datetime.today().date())
'2015-11-13'
>>> str(datetime.datetime.today())
'2015-11-13 15:56:19.578569'
I ended up finding this elsewhere:
today = datetime.date.today()  # format: YYYY-MM-DD, e.g. 2015-11-13
inputfilename = "tmp/corpscratch.%s.out" % str(today)
inputfile = os.path.join(settings.PROJECT_ROOT, inputfilename)
With settings.py containing the following:
PROJECT_ROOT = os.path.abspath(os.path.dirname(__file__))
This completely resolved my issue.
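For reference, a standalone sketch of the same path construction, runnable outside Django. The PROJECT_ROOT value and the scratch_file_path helper are placeholders of mine (in the real project the root comes from settings.PROJECT_ROOT), and the example assumes a POSIX-style filesystem. Note that the relative name has no leading slash: os.path.join discards everything before an absolute component.

```python
import datetime
import os

# Hypothetical project root; in the post this comes from settings.PROJECT_ROOT.
PROJECT_ROOT = "/srv/www/myproject"


def scratch_file_path(root, day):
    """Build <root>/tmp/corpscratch.<YYYY-MM-DD>.out for the given date."""
    filename = "corpscratch.%s.out" % day.isoformat()
    return os.path.join(root, "tmp", filename)


path = scratch_file_path(PROJECT_ROOT, datetime.date(2015, 11, 13))
print(path)
print(os.path.exists(path))  # quick check before trying to open the file

# With a leading slash, the root is dropped and the path is absolute again.
print(os.path.join(PROJECT_ROOT, "/tmp/corpscratch.2015-11-13.out"))
```

This only helps if the cron job really writes the file under the project root, so it is worth printing the final path and checking os.path.exists from the same process that serves the page.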

iTunes exported XML playlist refers to files in the wrong music structure

This is a simplified version of a question I asked at the end of last year but could not get to the bottom of it. I hope that somebody can help me with this explanation.
I exported my iTunes playlist as an XML file (songs.xml) onto an external drive. Each song exported appears to have its metadata stored under a node in the XML file. A fragment containing 2 Adele songs is below.
After exporting the playlist, I copied the music files to the /Music folder on the external drive. The issue is that ALL files are now directly in this folder and not within the subfolders. The songs.xml file references each song as being in a subfolder of /Music e.g. /Music/Adele/21 - but that is no longer the case - all files are in /Music. Thus when I attempt to import the songs back in they cannot be found.
Can somebody tell me how I can parse songs.xml and replace the /Music/Artist/Album references with just /Music ? Then I could successfully re-import my tunes with their metadata as described in the file! An added difficulty is that some songs are referenced just under the Music/Artist, and not Music/Artist/Album. e.g. the Artist could be 'Various' or a compilation.
I can get access to a Mac or Linux terminal to run SED or a RegEx or any other command that you can advise. If you can help I'd be very grateful.
Thanks a lot in advance.
Ben
<dict>
<key>Track ID</key><integer>22041</integer>
<key>Name</key><string>Rolling in the Deep</string>
<key>Artist</key><string>Adele</string>
<key>Album Artist</key><string>Adele</string>
<key>Album</key><string>21</string>
<key>Persistent ID</key><string>B123AA625019E726</string>
<key>Track Type</key><string>File</string>
<key>Purchased</key><true/>
<key>Location</key><string>file://localhost/Volumes/External%20Hard%20Drive/Music/Adele/21/RollingInTheDeep.m4a</string>
<key>File Folder Count</key><integer>1</integer>
<key>Library Folder Count</key><integer>1</integer>
</dict>
<dict>
<key>Track ID</key><integer>22042</integer>
<key>Name</key><string>Someone Like You</string>
<key>Artist</key><string>Adele</string>
<key>Album Artist</key><string>Adele</string>
<key>Album</key><string>Someone Like You</string>
<key>Persistent ID</key><string>A274ED723536E610</string>
<key>Track Type</key><string>File</string>
<key>Purchased</key><true/>
<key>Location</key><string>file://localhost/Volumes/External%20Hard%20Drive/Music/Adele/SomeoneLikeYou.mp3</string>
<key>File Folder Count</key><integer>1</integer>
<key>Library Folder Count</key><integer>1</integer>
</dict>
This awk may be what you need.
awk '/Drive\/Music/ {sub(/\/string/,":string");sub(/Music.*\//,"Music/");sub(/:string/,"/string")}1' file
It will change lines like these:
<key>Location</key><string>file://localhost/Volumes/External%20Hard%20Drive/Music/Adele/21/RollingInTheDeep.m4a</string>
<key>Location</key><string>file://localhost/Volumes/External%20Hard%20Drive/Music/Adele/SomeoneLikeYou.mp3</string>
to
<key>Location</key><string>file://localhost/Volumes/External%20Hard%20Drive/Music/RollingInTheDeep.m4a</string>
<key>Location</key><string>file://localhost/Volumes/External%20Hard%20Drive/Music/SomeoneLikeYou.mp3</string>
How it works:
awk '
/Drive\/Music/ {                 # Search for all lines containing Drive/Music
    sub(/\/string/,":string")    # Temporarily replace the last / (in </string>) so the greedy regex in the next step does not consume it
    sub(/Music.*\//,"Music/")    # Replace everything from Music to the last / with just Music/ (.* is greedy)
    sub(/:string/,"/string")     # Restore the last / to its original form
}
1                                # Print all lines, changed or not
' file                           # Input file
sed '/Location/ s|\(Drive/Music\)[^<]*\(/[^/<]*<\)|\1\2|' YourFile
This uses | instead of the traditional / as the delimiter, which makes the slashes in the path easier to read and handle.
You could also use a shell variable instead of Drive/Music to adapt it easily to another location (in that case, put double quotes around the sed expression).
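If Python is easier to get hold of than awk or sed, the same flattening can be sketched with a regular expression. This is an illustrative alternative and not taken from either answer above; the name flatten_location and the pattern are my own, and it assumes each Location entry sits on a single line and contains only one /Music/ segment.

```python
import re

# Group 1: everything up to and including the first "/Music/".
# Optional middle part: any Artist/ or Artist/Album/ folders (dropped).
# Group 2: the file name plus the closing </string> tag.
LOCATION_RE = re.compile(
    r"(<key>Location</key><string>.*?/Music/)(?:[^<]*/)?([^/<]*</string>)"
)


def flatten_location(line):
    """Remove the sub-folders between /Music/ and the file name."""
    return LOCATION_RE.sub(r"\1\2", line)


before = (
    "<key>Location</key><string>file://localhost/Volumes/"
    "External%20Hard%20Drive/Music/Adele/21/RollingInTheDeep.m4a</string>"
)
print(flatten_location(before))
```

To rewrite the whole playlist, apply flatten_location to every line of songs.xml and write the result to a new file, keeping the original as a backup.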

Weka 3-7 CSVLoader do not work with ";" (semicolon) as field separator

I think I found a bug in Weka 3.7.
When I try to load a csv file using weka.core.converters.CSVLoader with separator ";", I get the following error:
Exception in thread "main" java.io.IOException: number expected, read Token[1;2], line 1
at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:294)
at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:656)
at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:477)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:445)
at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:430)
at weka.core.converters.ArffLoader$ArffReader.<init>(ArffLoader.java:202)
at weka.core.converters.CSVLoader.getDataSet(CSVLoader.java:803)
at de.tuhh.thesis.repower.pcanalysis.BinningWindSpeed.from_CSV_to_ARFF(BinningWindSpeed.java:99)
at de.tuhh.thesis.repower.pcanalysis.Main.main(Main.java:49)
My csv file is:
a;b
1;2
My code is:
CSVLoader loader = new CSVLoader();
File inputFile = new File(csvFileName);
loader.setSource(inputFile);
loader.setFieldSeparator(";");
data = loader.getDataSet();
If I try the same code but change ";" to "," and use the following file, the program succeeds:
a,b
1,2
I really need to work with ";"
Thanks and regards
There is (at least by now) an option to set the field separator:
CSVLoader loader = new CSVLoader();
loader.setFieldSeparator(";");
Just in case someone else stumbles upon this question..

Getting "new-line character seen in unquoted field" when parsing csv document using django-storages

I am trying to parse CSV files that have been uploaded to Amazon S3 using django-storages. I keep getting the error "Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?". The normal workaround for this is to open the file with "rU", but that does not seem to work with django-storages. If I drop the file directly on the server and open it from there, it works; I just want to avoid storing the files directly on the server if possible. Here is the code I am using:
import csv
from django.core.files.storage import default_storage as s3_storage
n = 'csvdumps/130331548894.csv'
csvf = s3_storage.open(n, "rU")
csvReader = csv.reader(csvf)
for item in csvReader:
    print(item)
I can see that this is a reported django-storages bug (http://jgrid.org/david/django-storages/issue/80/trying-to-parse-csv-file-from-django), but perhaps you can try splitting the file contents into lines yourself before handing them to the reader:
csvReader = csv.reader(csvf.read().splitlines())
It would also be great if you could share a link to some of your (sample) S3 CSV files, so I can open them and check the line endings.
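A minimal sketch of the splitlines approach, independent of any storage backend: it takes the raw bytes of the file, decodes them, and lets str.splitlines handle \r, \n and \r\n alike before csv.reader sees the data. The helper name parse_csv_bytes and the UTF-8 default are assumptions for illustration, and the approach will break quoted fields that contain embedded newlines.

```python
import csv


def parse_csv_bytes(raw, encoding="utf-8"):
    """Parse CSV bytes whose lines may end in CR, LF or CRLF."""
    text = raw.decode(encoding)
    # csv.reader accepts any iterable of strings, one per row.
    return list(csv.reader(text.splitlines()))


# Illustrative sample using bare carriage returns as line endings.
for row in parse_csv_bytes(b"a,b\r1,2\r"):
    print(row)
```

With django-storages, the bytes would presumably come from something like `s3_storage.open(n, "rb").read()`; whether that mode is honoured depends on the backend, so treat that part as untested.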