django csv import encoding - django

I am using the Django csv import (https://pypi.python.org/pypi/django-csvimport) to populate some models. The problem is that my CSV files are encoded in ANSI (Windows-1252) and contain words with special characters, e.g. JOSÉ. When I import them into my models, the word becomes JOSи.
Could you help me with this?
P.S.:
1 - I have filled in the encoding field of the csv import with many options (ansi, utf-8...), but it seems to have no effect.
2 - I have tried converting my CSV files to many different formats (using VB.NET), such as utf-8, utf-32, unicode..., but all of them cause some error in the Django csv import.

After some tries I found the solution:
While converting my text file with VB.NET I was opening it with OpenText(), which opens the file with UTF-8 encoding. So I opened it instead with something like "Using SR As StreamReader = New StreamReader(Fl.FullName, System.Text.Encoding.GetEncoding("Windows-1252"), True)" and wrote it out as UTF-8. This solved the problem.
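For readers who would rather do the same conversion in Python, here is a minimal sketch of the idea: read the file as Windows-1252 and write it back as UTF-8. The function name and file paths are hypothetical, and the sketch assumes the source file really is Windows-1252.

```python
def convert_to_utf8(src_path, dst_path, source_encoding="cp1252"):
    """Re-encode a text file from source_encoding to UTF-8.

    newline="" keeps the original line endings untouched, which matters
    for CSV files that contain quoted multi-line fields.
    """
    with open(src_path, "r", encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    convert_to_utf8("contacts_ansi.csv", "contacts_utf8.csv")
```

If the source encoding is guessed wrongly, the read step either raises UnicodeDecodeError or silently produces wrong characters, so it is worth checking a few known words (such as JOSÉ) in the output.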

Related

Problem reading file with pandas uploaded to a django view

I'm uploading some Excel and CSV files to a Django view using axios, then I pass those files to a function that uses the pandas read_csv and read_excel functions to process them. The first problem I had was with some files containing non-UTF-8 characters that pandas was unable to read. The only solution I found was to set "engine = 'python'" when reading the file (changing the encoding to utf-8-sig or utf-16 didn't work).
This works when I test the script from my terminal, but when I use the same script in a view I get the following error: ValueError("The 'python' engine cannot iterate through this file buffer.")
This is the code i'm using:
try:
    data = pandas.read_csv(request.FILES['file'], engine="python")
except Exception:
    # Report whatever went wrong while parsing the uploaded file
    print("Oops!", sys.exc_info(), "occurred.")
Trying the same function through the terminal works fine:
pandas.read_csv("file.csv", engine="python")
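One commonly suggested workaround for this kind of error is to decode the uploaded bytes into a text buffer before handing it to the parser. The sketch below shows only the decoding step with the standard library. The helper name is hypothetical, `io.BytesIO` stands in for `request.FILES['file']`, and whether `pandas.read_csv(text_buffer, engine="python")` then works in the asker's setup is an assumption, not something the question confirms.

```python
import csv
import io


def uploaded_bytes_to_text(uploaded_file, encoding="utf-8-sig"):
    """Read a binary file-like object fully and return an in-memory text buffer."""
    raw = uploaded_file.read()
    return io.StringIO(raw.decode(encoding))


# Stand-in for request.FILES['file'] (an assumption for illustration).
fake_upload = io.BytesIO("name;city\nJosé;Málaga\n".encode("utf-8"))

text_buffer = uploaded_bytes_to_text(fake_upload)

# Any parser that expects text can now iterate over the buffer line by line.
rows = list(csv.reader(text_buffer, delimiter=";"))
print(rows)
```

Reading the whole upload into memory is acceptable for small files; for large uploads a streaming wrapper such as `io.TextIOWrapper` may be a better fit.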

How to fix 'Imported file has a wrong encoding: 'charmap' codec can't decode byte 0x9d in position 21221: character maps to <undefined>' error?

I'm trying to import a CSV file into my Django project. The previous times I did this I never had a problem. However, all of a sudden I keep getting an error that says "Imported file has a wrong encoding: 'charmap' codec can't decode byte 0x9d in position 21221: character maps to <undefined>" when I try to import the CSV file.
I don't understand why I am getting the error.
[Screenshots in the original post: the error message, the import form used to upload the file, and the contents of the CSV file.]
The CSV file contains data that is invalid for the encoding you are using to interpret it. Depending on how it was generated, you might be able to tell Python the correct encoding to apply when you open it:
f = open(csv_file_name, encoding= ...)
or you might specify an appropriate encoding when you generate the CSV file, or you might be processing dodgy data and have to resort to encoding="latin-1", which may result in putting bad data into your database if you don't validate what comes out of the CSV file through a Django form before saving it.
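The fallback order described above can be sketched as follows. This is only an illustration: the function name and the list of candidate encodings are assumptions, and the right candidates depend on how the file was produced.

```python
def decode_csv_bytes(raw, candidates=("utf-8-sig", "cp1252")):
    """Try each candidate encoding in order; fall back to latin-1.

    Returns (text, encoding_used). latin-1 maps every byte to a code
    point, so it never raises, but the result may contain characters
    the author never intended and should be validated before saving.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    return raw.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    text, used = decode_csv_bytes("JOSÉ".encode("cp1252"))
    print(used, text)
```

Returning the encoding that was actually used makes it easy to log or reject files that only decoded through the latin-1 fallback.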
I'd recommend always processing rows of CSV data through Django forms or modelforms. It makes it very easy to catch errors (form is not valid, form.errors, etc.) and print out useful error messages about what is wrong with which field (column) of the row.
Hex character 9d is not a printable character (https://www.codetable.net/hex/9d). In Unicode, U+009D is a C1 control character (Operating System Command), and in Windows-1252 the byte 0x9d is not assigned at all, which is why the 'charmap' codec cannot decode it. You will need to sanitise this character to be able to handle it in a CSV file.
Edit: As @snakecharmerb points out in the comments, there are encodings where this is a valid character. However, I suspect from your question that you are not using one of these.
You can also look into decode, which lets you specify a charset for reading the data. If you know of a charset in which this is a valid character, then perhaps your routine is picking up a different default charset from the system.
I actually do something like this to make sure I get Swedish chars properly set. This is directly from my code when extracting fields
output.decode('iso-8859-1').strip()
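To make the point about byte 0x9d concrete, the following small check shows how the same byte behaves under two codecs. It uses only the standard library; the variable names are illustrative.

```python
import unicodedata

byte = b"\x9d"

# Windows-1252 leaves 0x9d unassigned, so decoding fails.
try:
    byte.decode("cp1252")
    cp1252_ok = True
except UnicodeDecodeError:
    cp1252_ok = False

# ISO-8859-1 (latin-1) accepts every byte; 0x9d becomes U+009D.
latin1_char = byte.decode("iso-8859-1")

# 'Cc' is the Unicode general category for control characters.
category = unicodedata.category(latin1_char)

print(cp1252_ok, repr(latin1_char), category)
```

So decoding with iso-8859-1 avoids the exception, but the resulting text still carries a control character that may need to be stripped or replaced.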

Telegram: Import numbers with format .txt to telegram

I have a list of numbers (about 2000 contacts) in a .txt file.
Now I want to add these numbers to my Telegram. What is the best way to import the list into Telegram?
Is this possible on desktop or Android?
I don't think you can do it on desktop. Either convert the txt file to CSV and download an "import csv" app from the Play Store, which will add all the contacts from your file so that you can see them in your Telegram list (if they have Telegram), or convert them to VCF and open the file from your file manager, which will automatically add them to your contacts list.
Also, if you search the Play Store you may find an application that imports numbers directly from a txt file.
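The txt-to-VCF conversion mentioned above could be scripted along these lines. This is a rough sketch that assumes one phone number per line in the txt file; the function name, the placeholder contact names ("Contact 1", "Contact 2", ...) and the file names are all hypothetical.

```python
def numbers_to_vcard(lines, name_prefix="Contact"):
    """Build vCard 3.0 text from an iterable of phone-number lines."""
    cards = []
    index = 0
    for line in lines:
        number = line.strip()
        if not number:
            continue  # skip blank lines
        index += 1
        name = f"{name_prefix} {index}"
        cards.append("\r\n".join([
            "BEGIN:VCARD",
            "VERSION:3.0",
            f"N:{name};;;;",
            f"FN:{name}",
            f"TEL;TYPE=CELL:{number}",
            "END:VCARD",
        ]))
    # Each card ends with CRLF, as vCard lines are CRLF-terminated.
    return "".join(card + "\r\n" for card in cards)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    with open("numbers.txt", encoding="utf-8") as src:
        vcf_text = numbers_to_vcard(src)
    with open("contacts.vcf", "w", encoding="utf-8", newline="") as dst:
        dst.write(vcf_text)
```

Numbers are copied as they appear in the file, so they should already include the country code if the phone's contact app needs it.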

Stata Import File - Troubleshooting

I am having difficulty importing a CSV File into Stata. I have tried using the import delimited feature. Stata does not recognize the semi-colon separated data as separate data points. I also have access to a plain text file but I haven't had success with that either. Any suggestions?
I see that you've figured out a solution already, but did you specify "delimiters(";")" when you were using "import delimited"? Otherwise, Stata assumes the delimiters are commas.

Builder c++ Rave Reports encoding problem with cyrillic

When I try to save a Rave project to a PDF/HTML file, the encoding is incorrect.
When I choose the format and press SAVE, it usually saves in ISO-8859-1.
But I need CP1251 (Cyrillic).
For example, I get "Ïëîùàäü" instead of "Площадь".
I would guess that the best solution to your problem would be to use Unicode, rather than a codepage such as CP1251. Is it possible to use Unicode with Rave Reports?
I have the same problem when I want to save a report in PDF format. I have to create a TRvRenderPDF and set it as the RenderObject, but the PDF file is not displayed correctly.
The TRvRenderPDF component is not Unicode-compatible (which is very bad), so all text in the report is converted to ANSI using the active code page (for Cyrillic it is CP1251). The result is a PDF file with text in CP1251 encoding.
By default TRvRenderPDF generates the PDF with the Type 1 font Helvetica (one of the built-in fonts in the PDF standard). The text is then interpreted as ISO 8859-1 (or CP1252) although it was encoded as CP1251, which is why we get "Ïëîùàäü" or something similar.
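The mismatch described above is easy to reproduce outside Rave Reports. The short Python sketch below encodes the word as CP1251 and decodes the same bytes as CP1252, which yields exactly the garbled string from the question; reversing the two steps recovers the original.

```python
original = "Площадь"

# The report text is stored as CP1251 bytes ...
raw = original.encode("cp1251")

# ... but the viewer interprets those bytes as CP1252.
garbled = raw.decode("cp1252")

# Reversing the wrong decoding recovers the original text.
repaired = garbled.encode("cp1252").decode("cp1251")

print(garbled)
print(repaired)
```

This only demonstrates the cause; it does not change how TRvRenderPDF writes the file.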
What we can do:
Get a Type 1 font (CP1252) in which the code positions that hold Cyrillic letters in CP1251 carry Cyrillic glyphs, and install it (the original answer linked to an example font).
Then replace the old font name (Helvetica) in the generated PDF document with the new font name (AGHelvetica). You can do it with a text editor or in your own program (read file -> find -> replace -> save file).
That is the whole situation.
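The read -> find -> replace -> save step could look roughly like this in Python. It is a sketch under several assumptions: the function name and file names are hypothetical, the font name is assumed to appear uncompressed in the file as the PDF name `/Helvetica`, and the replacement changes the file length, so byte offsets in the PDF's cross-reference table may no longer match (some viewers repair this, others may complain).

```python
from pathlib import Path


def replace_font_name(src_path, dst_path, old=b"/Helvetica", new=b"/AGHelvetica"):
    """Copy a PDF, replacing every occurrence of one font name with another.

    Works on raw bytes because a PDF is not a plain text file.
    Note that b"/Helvetica" also matches variants such as /Helvetica-Bold.
    Returns the number of occurrences that were replaced.
    """
    data = Path(src_path).read_bytes()
    count = data.count(old)
    Path(dst_path).write_bytes(data.replace(old, new))
    return count


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    replaced = replace_font_name("report.pdf", "report_cyrillic.pdf")
    print(f"replaced {replaced} occurrence(s)")
```

Writing to a separate output file keeps the original report intact in case the modified PDF does not open.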
P.S. Sorry for my English.
P.P.S. If you set the PDF renderer property EmbedBaseFonts = true, the PDF document is saved with TrueType fonts, but the problem remains. You would need to look for a Unicode-capable renderer, and this one is not.