Python request download a file and save to a specific directory - django

Hello sorry if this question has been asked before.
But I have tried a lot of methods that provided.
Basically, I want to download the file from a website, which is I will show my coding below. The code works perfectly, but the problem is the file was auto download in our download folder path directory.
My concern is to download the file and save it to a specific folder.
I'm aware we can change our browser setting since this was a server that will remote by different users. So, it will automatically download to their temporarily /users/adam_01/download/ folder.
I want it to save in server disk which is, C://ExcelFile/
Below are my script and some of the data have been changing because it is confidential.
import pandas as pd
import html5lib
import time from bs4
import BeautifulSoup
import requests
import csv
from datetime
import datetime
import urllib.request
import os
with requests.Session() as c:
proxies = {"http": "http://:911"}
url = 'https://......./login.jsp'
USERNAME = 'mwirzonw'
PASSWORD = 'Fiqr123'
c.get(url,verify= False)
csrftoken = ''
login_data = dict(proxies,atl_token = csrftoken, os_username=USERNAME, os_password=PASSWORD, next='/')
c.post(url, data=login_data, headers={"referer" : "https://.....com"})
page = c.get('https://........s...../SearchRequest-96010.csv')
location = 'C:/Users/..../Downloads/'
with open('asdsad906010.csv', 'wb') as output:
output.write(page.content )
print("Done!")
Thank you, be pleased to ask if any confusing information was given.
Regards,
Fiqri

It seems that from your script you are writing the file to asdsad906010.csv. You should be able to change the output directory as follows.
# Set the output directory to your desired location
output_directory = 'C:/ExcelFile/'
# Create a file path by joining the directory name with the desired file name
file_path = os.path.join(output_directory, 'asdsad906010.csv')
# Write the file
with open(file_path, 'wb') as output:
output.write(page.content)

Related

Creating a dmarc parser using parsedmarc in python3 for use in AWS s3

I am very new to programming. I am working on a pipeline to analyze DMARC report files that are sent to my email account, that I am manually placing in an s3 bucket. The goal of this task is to download, extract, and analyze files using parsedmarc: https://github.com/domainaware/parsedmarc The part I'm having difficulty with is setting a conditional statement to extract .gz files if the target file is not a .zip file. I'm assuming the gzip library will be sufficient for this purpose. Here is the code I have so far. I'm using python3 and the boto3 library for AWS. Any help is appreciated!
import parsedmarc
import pprint
import json
import boto3
import zipfile
import gzip
pp = pprint.PrettyPrinter(indent=2)
def main():
#Set default session profile and region for sandbox account. Access keys are pulled from /.aws/config and /.aws/credentials.
#The 'profile_name' value comes from the header for the account in question in /.aws/config and /.aws/credentials
boto3.setup_default_session(region_name="aws-region-goes-here")
boto3.setup_default_session(profile_name="aws-account-profile-name-goes-here")
#Define the s3 resource, the bucket name, and the file to download. It's hardcoded for now...
s3_resource = boto3.resource(s3)
s3_resource.Bucket('dmarc-parsing').download_file('source-dmarc-report-filename.zip' '/home/user/dmarc/parseme.zip')
#Use the zipfile python library to extract the file into its raw state.
with zipfile.ZipFile('/home/user/dmarc/parseme.zip', 'r') as zip_ref:
zip_ref.extractall('/home/user/dmarc')
#Ingest all locations for xml file source
dmarc_report_directory = '/home/user/dmarc/'
dmarc_report_file = 'parseme.xml'
"""I need an if statement here for extracting .gz files if the file type is not .zip. The contents of every archive are .xml files"""
#Set report output variables using functions in parsedmarc. Variable set to equal the output
pd_report_output=parsedmarc.parse_aggregate_report_file(_input=f"{dmarc_report_directory}{dmarc_report_file}")
#use jsonify to make the output in json format
pd_report_jsonified = json.loads(json.dumps(pd_report_output))
dkim_status = pd_report_jsonified['records'][0]['policy_evaluated']['dkim']
spf_status = pd_report_jsonified['records'][0]['policy_evaluated']['spf']
if dkim_status == 'fail' or spf_status == 'fail':
print(f"{dmarc_report_file} reports failure. oh crap. report:")
else:
print(f"{dmarc_report_file} passes. great. report:")
pp.pprint(pd_report_jsonified['records'][0]['auth_results'])
if __name__ == "__main__":
main()
Here is the code using the parsedmarc.parse_aggregate_report_xml method I found. Hope this helps others in parsing these reports:
import parsedmarc
import pprint
import json
import boto3
import zipfile
import gzip
pp = pprint.PrettyPrinter(indent=2)
def main():
#Set default session profile and region for account. Access keys are pulled from ~/.aws/config and ~/.aws/credentials.
#The 'profile_name' value comes from the header for the account in question in ~/.aws/config and ~/.aws/credentials
boto3.setup_default_session(profile_name="aws_profile_name_goes_here", region_name="region_goes_here")
source_file = 'filename_in_s3_bucket.zip'
destination_directory = '/tmp/'
destination_file = 'compressed_report_file'
#Define the s3 resource, the bucket name, and the file to download. It's hardcoded for now...
s3_resource = boto3.resource('s3')
s3_resource.Bucket('bucket-name-for-dmarc-report-files').download_file(source_file, f"{destination_directory}{destination_file}")
#Extract xml
outputxml = parsedmarc.extract_xml(f"{destination_directory}{destination_file}")
#run parse dmarc analysis & convert output to json
pd_report_output = parsedmarc.parse_aggregate_report_xml(outputxml)
pd_report_jsonified = json.loads(json.dumps(pd_report_output))
#loop through results and find relevant status info and pass fail status
dmarc_report_status = ''
for record in pd_report_jsonified['records']:
if False in record['alignment'].values():
dmarc_report_status = 'Failed'
#************ add logic for interpreting results
#if fail, publish to sns
if dmarc_report_status == 'Failed':
message = "Your dmarc report failed a least one check. Review the log for details"
sns_resource = boto3.resource('sns')
sns_topic = sns_resource.Topic('arn:aws:sns:us-west-2:112896196555:TestDMARC')
sns_publish_response = sns_topic.publish(Message=message)
if __name__ == "__main__":
main()

How to download a snappy.parquet file from s3 using Boto in Python

I'm new to this, and trying to download a snappy.parquet file from Amazon s3 I can later convert to CSV file.
I tried working with the following example I've found online, and I get an empty folder. can anyone please help me?
import boto
import sys, os
from boto.s3.key import Key
from boto.exception import S3ResponseError
DOWNLOAD_LOCATION_PATH =""
BUCKET_NAME = ""
AWS_ACCESS_KEY_ID= ""
AWS_ACCESS_SECRET_KEY = ""
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_ACCESS_SECRET_KEY)
bucket = conn.get_bucket(BUCKET_NAME)
#goto through the list of files
bucket_list = bucket.list()
for l in bucket_list:
key_string = str(l.key)
s3_path = DOWNLOAD_LOCATION_PATH + key_string
try:
print ("Current File is ", s3_path)
l.get_contents_to_filename(s3_path)
except (OSError, S3ResponseError) as e:
pass
# check if the file has been downloaded locally
if not os.path.exists(s3_path):
try:
os.makedirs(s3_path)
except OSError as exc:
# let guard againts race conditions
import errno
if exc.errno != errno.EEXIST:
raise
The script you are using appears to recursively download the contents of the specified S3 bucket (BUCKET_NAME) to the specified local directory (DOWNLOAD_LOCATION_PATH). FWIW, I notice this script looks like it comes from here.
The "Current File is ..." output line should show you the progress of these files being written. One problem you might be having is due to this line:
s3_path = DOWNLOAD_LOCATION_PATH + key_string
If you had specified DOWNLOAD_LOCATION_PATH at the top as a directory without a trailing '/' character, e.g. like this:
DOWNLOAD_LOCATION_PATH = '/tmp/my_dir'
then the files being downloaded would be written not underneath the /tmp/my_dir directory, but directly in /tmp/ with a my_dir prefix on each filename! You can fix this by changing this line to:
s3_path = os.path.join(DOWNLOAD_LOCATION_PATH, key_string)
Other than that, the script appears to work alright. You may want to add this line at the very top:
from __future__ import print_function
if you are still using Python 2.x, otherwise the print output will look a bit odd (print will think you are printing a 2-Tuple).
Your question also makes it sound like you really only want/need to download a single file from the bucket -- if so, this isn't really a great script to be using, since it's downloading everything.

Scrapy how to save a State between spider runs (via scrapinghub)?

I have a spider that will run on schedule. Spider input is based on Date. From date of last scrape to todays date. So the question is how to save the date of last scrape within the Scrapy project? There is an option to get data from scrapy settings using pkjutil module, but i did not find any reference in the docs on how to write data in that file. Any idea? Maybe an alternative?
P.S. My other option is to use some free remote MySql DB just for this. But looks like more work if simple solution is available.
import pkgutil
class CodeSpider(scrapy.Spider):
name = "code"
allowed_domains = ["google.com.au"]
def start_requests(self):
f = pkgutil.get_data("au_go", "res/state.json")
ids = json.loads(f)
id = ids[0]['state']
yield {'state':id}
ids[0]['state'] = 'New State'
with open('./au_go/res/state.json', 'w') as f:
json.dump(ids, f)
The above solution works fine when ran locally. But I am getting no such file or directory when running the code at Scrapinghub.
File "/tmp/unpacked-eggs/__main__.egg/au_go/spiders/test_state.py", line 33, in parse
with open(savePath, 'w') as f:
IOError: [Errno 2] No such file or directory: './au_go/res/state.json'
The problem is fixed with use of Scrapinghub Colections
And scrapinghub API. Works nice now.
Here is an example code in case somebody will find it usefull.
from scrapinghub import ScrapinghubClient
client = ScrapinghubClient(Your API KEY)
project = client.get_project(Your Project ID)
collections = project.collections
last_accessed = collections.get_store('last_accessed')
last_accessed.set({'_key': 'Date', 'value': '12-54-1235'})
print last_accessed.get('Date')['value']

Unable to access .xlsx file from folder created during execution of script [Python]

I want to access(print) a xlsx file from the folder which is created during execution of the script.
But, I am not sure how to print it. Can anyone help ??
Thanks :)
In main.py script, I asked user to input folder name :
output_folder = raw_input('Enter the output folder name: ')
Then in another script,
import main
import pandas as pd
import os
output_rm_folder = os.path.join(dir, 'ODEs', main.output_folder)
output_xls = pd.read_excel('ODEs', main.output_folder, 'name.xlsx') (I know this is not correct)
Now I want to access the xls file from this new generated folder.

Upload an mp3 files to soundcloud using Python (file name is random)

I'd like to upload an mp3 file from hotfolder without knowing the name of the file. (such as *.mp3)
here's what I tried (to upload specific file / known file name)
import soundcloud
# create client object with app and user credentials
client = soundcloud.Client(client_id='***',
client_secret='***',
username='***',
password='***')
# print authenticated user's username
print client.get('/me').username
mp3_file=('test.mp3')
# upload audio file
track = client.post('/tracks', track={
'title': 'Test Sound',
'asset_data': open(mp3_file, 'rb')
})
# print track link
print track.permalink_url
how can I make the script upload any mp3 file in that folder ? (script and files are located in the same folder)
From the language as written here, it's not precisely clear what you mean by "upload any mp3 file in that folder." Does uploading the first file in the folder satisfy your need, or does it need to be a different file each time the script executes? If the latter, my suggestion is to get a list of files and then randomly select one of them.
To get a list of all files in python,
from os import listdir
from os.path import isfile, join
onlyfiles = [ f for f in listdir(mypath) if isfile(join(mypath,f)) ]
and then to randomly select one of them:
import random
print(random.choice(onlyfiles))
Hope this helps