Why the csv file in S3 is empty after uploading from Lambda

import os
import csv

import boto3

client = boto3.client('s3')

fields = ['dt', 'dh', 'key', 'value']
row = [dt, dh, key, value]
print(row)

# name of csv file
filename = "/tmp/sns_file.csv"

# writing to csv file
with open(filename, 'a', newline='') as csvfile:
    # creating a csv writer object
    csvwriter = csv.writer(csvfile)
    # writing the fields
    csvwriter.writerow(fields)
    # writing the data row
    csvwriter.writerow(row)
    final_file_name = "final_report_" + dt + ".csv"
    client.upload_file('/tmp/sns_file.csv', BUCKET_NAME, final_file_name)
    if os.path.exists('/tmp/sns_file.csv'):
        os.remove('/tmp/sns_file.csv')
    else:
        print("The file does not exist")

Python's with statement uses a context manager, which means, in simple terms, that it "cleans up" after all operations inside the block are done.
In the context of files, "cleaning up" means closing the file: the data you write may still sit in a buffer and is only guaranteed to be on disk once the file is closed. So you need to move the upload operation outside of, and after, the with block.
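For illustration, a corrected version of the handler body might look like this (a sketch; dt, dh, key, value and BUCKET_NAME are assumed to be defined elsewhere in the Lambda handler):
filename = "/tmp/sns_file.csv"

# write the header and the data row, then let the with block close the file
with open(filename, 'a', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow(fields)
    csvwriter.writerow(row)

# the file is closed and flushed to disk here, so the upload sees the full content
final_file_name = "final_report_" + dt + ".csv"
client.upload_file(filename, BUCKET_NAME, final_file_name)

if os.path.exists(filename):
    os.remove(filename)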

Related

Read uploaded fasta file in django using Bio library

in index.html I used
<input type="file" name="upload_file">
in views.py
from Bio import SeqIO

def index(request):
    if request.method == "POST":
        try:
            text_file = request.FILES['upload_file']
            list_1, list_2 = sequence_extract_fasta(text_file)
            context = {'files': text_file}
            return render(request, 'new.html', context)
        except:
            text_file = ''
            context = {'files': text_file}
            return render(request, 'index.html')
def sequence_extract_fasta(fasta_files):
    # Defining empty lists for the fasta id and fasta sequence variables
    fasta_id = []
    fasta_seq = []
    # opening a given fasta file using the file path
    with open(fasta_files, 'r') as fasta_file:
        print("pass")
        # extracting multiple records from a single fasta file using biopython
        for record in SeqIO.parse(fasta_file, 'fasta'):  # (file handle, file format)
            print(record.seq)
            # appending extracted fasta data to the empty list variables
            fasta_seq.append(record.seq)
            fasta_id.append(record.id)
    # returning fasta_id and fasta_seq to both call_compare_fasta and call_reference_fasta
    return fasta_id, fasta_seq
The method sequence_extract_fasta(fasta_files) works in plain Python, but not inside the Django framework. If I could find the temporary location of the uploaded file, then using that path I might be able to call the method. Is there any efficient way to solve this? Your help is highly appreciated. Thank you for your time.
I found one way of doing this.
def sequence_extract_fasta(fasta_file):
    # Defining empty lists for the fasta id and fasta sequence variables
    fasta_id = []
    fasta_seq = []
    # fasta_file = fasta_file.chunks()
    print(fasta_file)
    # opening the given fasta file using the file path
    # creating a backup file with the original uploaded file data
    with open('data/temp/name.bak', 'wb+') as destination:
        for chunk in fasta_file.chunks():
            destination.write(chunk)
    # opening the created backup file and reading it
    with open('data/temp/name.bak', 'r') as fasta_file:
        # extracting multiple records from a single fasta file using biopython
        for record in SeqIO.parse(fasta_file, 'fasta'):  # (file handle, file format)
            fasta_seq.append(record.seq)
            fasta_id.append(record.id)
    # returning fasta_id and fasta_seq to both call_compare_fasta and call_reference_fasta
    return fasta_id, fasta_seq
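As a side note, SeqIO.parse also accepts an in-memory text handle, so assuming the upload is UTF-8 text you could skip the backup file entirely. A minimal sketch (not part of the original answer):
import io
from Bio import SeqIO

def sequence_extract_fasta(fasta_file):
    fasta_id = []
    fasta_seq = []
    # decode the uploaded bytes and wrap them in a text handle for SeqIO
    handle = io.StringIO(fasta_file.read().decode('utf-8'))
    for record in SeqIO.parse(handle, 'fasta'):
        fasta_seq.append(record.seq)
        fasta_id.append(record.id)
    return fasta_id, fasta_seq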

Django send excel file to Celery Task. Error InMemoryUploadedFile

I have a background process that reads an Excel file and saves the data from it. I need to read the file in the background process, but I get an error about InMemoryUploadedFile.
My code
def create(self, validated_data):
    company = ''
    file_type = ''
    email = ''
    file = validated_data['file']
    import_data.delay(file=file,
                      company=company,
                      file_type=file_type,
                      email=email)
my method looks like
@app.task
def import_data(
        file,
        company,
        file_type,
        email):
    # some code
But I get the InMemoryUploadedFile error.
How can I send a file to Celery without errors?
When you delay a task, Celery tries to serialize the parameters, which in your case include a file.
Files, and especially in-memory files, can't be serialized.
So to fix the problem you have to save the file to disk and pass the file path to your delayed task, then read the file there and do your calculations.
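A minimal sketch of that approach (the /tmp location is an illustrative assumption, and in a multi-machine setup the path must be on storage the Celery worker can reach; the task would then open the path instead of a file object):
import os
import uuid

def create(self, validated_data):
    upload = validated_data['file']
    # write the upload to disk so the worker can read it by path
    tmp_path = os.path.join('/tmp', '{}_{}'.format(uuid.uuid4(), upload.name))
    with open(tmp_path, 'wb') as destination:
        for chunk in upload.chunks():
            destination.write(chunk)
    # only the (serializable) path is sent to Celery
    import_data.delay(file=tmp_path,
                      company='',
                      file_type='',
                      email='')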
Celery does not know how to serialize complex objects such as file objects. However, this can be solved pretty easily. What I do is encode/decode the file to its Base64 string representation, which allows me to send the file directly through Celery.
The following example shows how (I intentionally placed each conversion separately, though this could be arranged in a more Pythonic way):
import base64
import os
import tempfile

# (Django, HTTP server)
file = request.FILES['files'].file
file_bytes = file.read()
file_bytes_base64 = base64.b64encode(file_bytes)
file_bytes_base64_str = file_bytes_base64.decode('utf-8')  # this is a str

# (...send string through Celery...)

# (Celery worker task)
file_bytes_base64 = file_bytes_base64_str.encode('utf-8')
file_bytes = base64.b64decode(file_bytes_base64)
# Write the file to a temporary location, deletion is guaranteed
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_file = os.path.join(tmp_dir, 'something.zip')
    with open(tmp_file, 'wb') as f:
        f.write(file_bytes)
    # Process the file
This can be inefficient for large files but it becomes pretty handy for small/medium sized temporary files.

How to load csv file data into pandas using request.FILES (Django 1.11) without saving the file on the server

I just want to upload a .csv file via a form, directly into a pandas dataframe in Django, without physically saving the file on the server.
def post(self, request, format=None):
    try:
        from io import StringIO, BytesIO
        import io
        print("data===", request.FILES['file'].read().decode("utf-8"))
        # print("file upload FILES data=====", pd.read_csv(request.FILES['file'].read(), sep=','))
        # print(request.FILES)
        print("file upload data df=====11")
        mm = pd.read_csv(BytesIO(request.FILES['file'].read().decode("utf-8")))
        print("dataframe data=====", mm)
        # import io, csv
        # urlData = request.FILES['file']
        # data = [row for row in (csv.reader(urlData))]
        # print("file upload data df=====222", data)
        # mm = pd.read_csv()
        # excel_file = request.FILES['file']
        # movies = pd.read_excel(request.FILES['file'])
    except Exception as e:
        print(e)
        log.debug("Error in CheckThreadStatus api key required " + str(e))
        return Response(responsejson('api key required', status=404))
The answer is straightforward:
pd.read_csv(request.FILES['file'])
works perfectly fine; the mistake I was making was that my csv file was not in the correct format.
Check with
pd.read_csv('data.csv')
If using the post method you can try
getFile = request.FILES['file_name']
pd.read_csv(getFile)
You can use StringIO for reading and decoding your csv:
import csv
from io import StringIO

csv_file = request.FILES["csv_file"]
content = StringIO(csv_file.read().decode('utf-8'))
reader = csv.reader(content)
After reading, you can populate your database like this:
csv_rows = [row for row in reader]
field_names = csv_rows[0]  # Get the header row
del csv_rows[0]  # Delete the header after storing its values in field_names
for index, row in enumerate(csv_rows):
    data_dict = dict(zip(field_names, row))
    Model.objects.update_or_create(id=row[0],
                                   defaults=data_dict)
Make sure to validate the data before inserting it, especially if the data is critical.
HINT: use Django forms to do the validation for you.
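For illustration, a minimal upload form might look like this (the field name csv_file and the extension check are assumptions, not part of the original answer):
from django import forms

class CSVUploadForm(forms.Form):
    csv_file = forms.FileField()

    def clean_csv_file(self):
        f = self.cleaned_data['csv_file']
        if not f.name.lower().endswith('.csv'):
            raise forms.ValidationError('Please upload a .csv file.')
        return f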

Django: upload csv file and stays in memory

I'm trying to build a web app using Django where the user uploads a csv file, possibly a big one. The code then cleans the file of bad data, and the user can run queries against the clean data.
Now, I believe that whenever the user makes a query the whole code runs again, which means it starts cleaning again, and so on.
Question:
Is there any way that, once the csv data is clean, it stays in memory so the user can query the clean data?
import numpy as np
import pandas as pd
from django.http import JsonResponse

def converter(num):
    try:
        return float(num)
    except ValueError:
        try:
            num = num.replace("-", '0.0').replace(',', '')
            return float(num)
        except ValueError:
            return np.nan

def get_clean_data(request):
    # Read the data from the csv file:
    df = pd.read_csv("data.csv")

    # Clean the data and send a JSON response
    df['month'] = df['month'].str.split("-", expand=True)[1]
    df[df.columns[8:]] = df[df.columns[8:]].astype(str).applymap(converter)
    selected_year = df[df["Departure Datetime: Year (YYYY)"] == 2015]
    data_for_user = (selected_year.groupby(by="route")
                     .sum()
                     .sort_values(by="revenue")
                     .to_json())
    return JsonResponse(data_for_user, safe=False)
One way to achieve this could be to cache the dataframe in memory after it has been cleaned. Subsequent requests could then use the cleaned version from the cache.
from django.core.cache import cache

def get_clean_data(request):
    # Check the cache for cleaned data
    df = cache.get('cleaned_data')
    if df is None:
        # Read the data from the csv file:
        df = pd.read_csv("data.csv")

        # Clean the data
        df['month'] = df['month'].str.split("-", expand=True)[1]
        df[df.columns[8:]] = df[df.columns[8:]].astype(str).applymap(converter)

        # Put it in the cache
        cache.set('cleaned_data', df, timeout=600)

    selected_year = df[df["Departure Datetime: Year (YYYY)"] == 2015]
    data_for_user = (selected_year.groupby(by="route")
                     .sum()
                     .sort_values(by="revenue")
                     .to_json())
    return JsonResponse(data_for_user, safe=False)
You'd need to be a little bit careful, because if the csv file is very large it may consume a large amount of memory when cached.
Django supports a number of different cache backends, from simple local-memory caching to more complex Memcached caching.
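For illustration, a local-memory cache can be enabled in settings.py like this (the LOCATION value is arbitrary; note that LocMemCache is per-process, so a shared backend such as Memcached or Redis is usually needed when you run several workers):
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'cleaned-data',
    }
}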

django RequestFactory file upload

I'm trying to create a request using RequestFactory and post it with a file, but I don't get request.FILES.
from django.test.client import RequestFactory
from django.core.files import temp as tempfile

tdir = tempfile.gettempdir()
file = tempfile.NamedTemporaryFile(suffix=".file", dir=tdir)
file.write(b'a' * (2 ** 24))
file.seek(0)

post_data = {'file': file}
request = self.factory.post('/', post_data)
print(request.FILES)  # get an empty request.FILES: <MultiValueDict: {}>
How can I get request.FILES with my file?
If you open the file first and then assign the open file object to request.FILES, you can access your file.
request = self.factory.post('/')
with open(file, 'r') as f:
    request.FILES['file'] = f
    request.FILES['file'].read()
Now you can access request.FILES like you normally would. Remember that when you leave the open block, request.FILES['file'] will be a closed file object.
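As a usage sketch, the view under test therefore has to be called while the file is still open (my_view is a placeholder for whatever view you are testing):
request = self.factory.post('/')
with open(file, 'rb') as f:
    request.FILES['file'] = f
    response = my_view(request)  # the view sees the open file object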
I made a few tweaks to @Einstein's answer to get it to work for a test that saves the uploaded file in S3:
request = request_factory.post('/')
with open('my_absolute_file_path', 'rb') as f:
    request.FILES['my_file_upload_form_field'] = f
    request.FILES['my_file_upload_form_field'].read()
    f.seek(0)
    ...
Without opening the file as 'rb' I was getting some unusual encoding errors with the file data.
Without f.seek(0) the file that I uploaded to S3 was zero bytes.
You need to provide a proper content type and a proper file object before updating your FILES.
from django.core.files.uploadedfile import File
# Let django know we are uploading files by stating content type
content_type = "multipart/form-data; boundary=------------------------1493314174182091246926147632"
request = self.factory.post('/', content_type=content_type)
# Create a file object that contains both `size` and `name` attributes
my_file = File(open("/path/to/file", "rb"))
# Update FILES dictionary to include our new file
request.FILES.update({"field_name": my_file})
The boundary=------------------------1493314174182091246926147632 is part of the multipart content type; I copied it from a POST request made by my web browser.
None of the previous answers worked for me. This seems to be an alternative solution:
from django.core.files.uploadedfile import SimpleUploadedFile

with open(file, "rb") as f:
    file_upload = SimpleUploadedFile("file", f.read(), content_type="text/html")

data = {
    "file": file_upload
}
request = request_factory.post("/api/whatever", data=data, format='multipart')
Be sure that 'file' really is the name of your file input field in your form.
I got that error when it was not (use the field's name, not its id).
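For completeness, a sketch of the same upload exercised through Django's test Client instead of RequestFactory (the URL and the field name are assumptions); the client builds the multipart body for you:
from django.test import Client

client = Client()
with open("/path/to/file", "rb") as f:
    response = client.post("/api/whatever", {"file": f})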