How to fill an AJAX form periodically (cron-style) in a Django app? - django

I have the following problem. Every hour, I need my app to:
1. Go to https://airqualityegg.wickeddevice.com/download and fill out the form
(I found the django-cron package. Is it a good idea to use it?)
2. Wait until the "Download file" button appears after the JS effects finish
3. Download the .zip file by clicking that button
4. Extract the .csv file from this .zip archive and work with it.
What should I do?

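I've used Selenium WebDriver:
from datetime import timedelta

from django.utils import timezone
from pyvirtualdisplay import Display
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

DOWNLOAD_LINK_XPATH = '/html/body/table/tbody/tr[9]/td[2]/a'

def get_zipfile_file_link(self):
    display = Display(visible=0, size=(1024, 768))  # virtual X display for a headless server
    display.start()
    driver = webdriver.Firefox()
    try:
        driver.get("https://airqualityegg.wickeddevice.com/download")
        driver.find_element(By.ID, "serial_numbers").send_keys(str(self.egg_id))
        # send_keys() needs a string, not a datetime; the date format the form expects is assumed here
        start = timezone.now() - timedelta(hours=5)
        driver.find_element(By.ID, "start_date").send_keys(start.strftime("%Y-%m-%d %H:%M"))
        driver.find_element(By.ID, "zipfilename").send_keys("q")
        driver.find_element(By.ID, "download_submit").click()
        # wait up to 150 s for the JS-generated "Download file" link to appear
        link = WebDriverWait(driver, 150).until(lambda d: d.find_element(By.XPATH, DOWNLOAD_LINK_XPATH))
        return link.get_attribute('href')
    finally:
        driver.quit()  # quit() also ends the driver process; close() only closes the window
        display.stop()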
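Steps 3 and 4 do not need the browser once the link's href is known. A minimal sketch, assuming the href can be fetched with a plain HTTP GET (no browser session cookie) and that the CSV inside the archive is UTF-8; `read_csv_rows_from_zip` and `download_and_read` are hypothetical helper names:

```python
import csv
import io
import urllib.request
import zipfile
from typing import Dict, List


def read_csv_rows_from_zip(zip_bytes: bytes) -> Dict[str, List[Dict[str, str]]]:
    """Return {member name: rows as dicts} for every .csv file in the archive."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                # zip members are binary; wrap them so the csv module gets text
                # (UTF-8 is an assumption about the downloaded file)
                with io.TextIOWrapper(archive.open(name), encoding="utf-8", newline="") as text:
                    result[name] = list(csv.DictReader(text))
    return result


def download_and_read(url: str) -> Dict[str, List[Dict[str, str]]]:
    # url: the href returned by get_zipfile_file_link(); nothing is written to disk
    with urllib.request.urlopen(url, timeout=60) as resp:
        return read_csv_rows_from_zip(resp.read())
```

For the hourly schedule, one common alternative to django-cron is a Django management command that calls these helpers, triggered by a plain crontab entry such as `0 * * * * /path/to/venv/bin/python /path/to/manage.py fetch_egg_data` (the paths and command name are placeholders).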

Related

Django: pipe youtube-dl output to a view for download

TL;DR: I want to pipe the output of youtube-dl to the user's browser on a button click, without having to save the video on my server's disk.
So I'm trying to have a "download" button on a page (django backend) where the user is able to download the video they're watching.
I am using the latest version of youtube-dl.
In my download view I have this piece of code:
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    # note: download() returns a status code, not the downloaded file
    file = ydl.download([f"https://clips.twitch.tv/{pk}"])
And it works, to some extent. It does download the file to my machine, but I am not sure how to let users download the file.
I thought of a few ways to achieve this, but the only one that really suits me would be to pipe the download to the user (client) without storing any video on my disk. I found this issue on the same matter, but I am not sure how to make it work. I successfully piped the download to stdout using ydl_opts = {'outtmpl': '-'}, but I'm not sure how to pipe that into my view's response. One of the maintainer's responses mentions subprocess.Popen; I looked it up but couldn't work out how to implement it in my case.
I did a workaround.
I download the file under a specific name, return it from the view as an HttpResponse with a force-download content type, and then delete the file with Python.
It's not what I originally had in mind, but it's the second-best solution I could come up with. I will mark this answer as the accepted solution until a Python wizard answers the original question.
The code that I have right now:
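import os

import youtube_dl
from django.http import Http404, HttpResponse

def download_clip(request, pk):
    file_path = f"{pk}.mp4"
    ydl_opts = {
        'outtmpl': file_path  # save the clip under a predictable name
    }
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([f"https://clips.twitch.tv/{pk}"])
    if os.path.exists(file_path):
        with open(file_path, 'rb') as fh:
            data = fh.read()
        os.remove(file_path)  # the bytes are in memory now, so the temporary file can go
        response = HttpResponse(data, content_type="application/force-download")
        # "attachment" asks the browser to save the file instead of displaying it
        response['Content-Disposition'] = 'attachment; filename="' + os.path.basename(file_path) + '"'
        return response
    raise Http404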
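For the original question (no file on disk), the subprocess.Popen idea can be sketched as a generator that reads youtube-dl's stdout in chunks. This is a minimal sketch, not tested against Twitch: it assumes the youtube-dl CLI is on PATH, and `iter_process_stdout` and `youtube_dl_stream_cmd` are hypothetical names. `-o -` sends the media to stdout and `-q` keeps progress messages quiet.

```python
import subprocess
from typing import Iterator, List


def iter_process_stdout(cmd: List[str], chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Run cmd and yield its stdout in chunks without writing anything to disk."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
    finished = False
    try:
        while True:
            chunk = proc.stdout.read(chunk_size)
            if not chunk:  # EOF: the process closed its stdout
                finished = True
                break
            yield chunk
    finally:
        proc.stdout.close()
        if not finished:
            # the generator was closed early (e.g. the client disconnected)
            proc.kill()
        proc.wait()


def youtube_dl_stream_cmd(pk: str) -> List[str]:
    # assumption: the youtube-dl executable is installed and on PATH
    return ["youtube-dl", "-q", "-o", "-", f"https://clips.twitch.tv/{pk}"]
```

In the view, the generator could then be returned as `StreamingHttpResponse(iter_process_stdout(youtube_dl_stream_cmd(pk)), content_type="video/mp4")` with an `attachment` Content-Disposition header, so Django sends the bytes as youtube-dl produces them.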

Flask - Generated PDF can be viewed but cannot be downloaded

I recently started learning Flask and created a simple web app that randomly generates kids' math worksheets in PDF based on user input.
The PDF opens automatically in the browser and can be viewed. But when I try to download it, both on a PC and in Chrome on iOS, I get error messages (Chrome PC: Failed - Network error / Chrome iOS: The file could not be downloaded at this time).
You can try it out here: kidsmathsheets.com
I suspect it has something to do with the way I'm generating and returning the PDF file. FYI I'm using ReportLab to generate the PDF. My code below (hosted on pythonanywhere):
from reportlab.lib.pagesizes import A4, letter
from reportlab.pdfgen import canvas
from reportlab.platypus import Table
from flask import Flask, render_template, request, Response
import io
from werkzeug.wsgi import FileWrapper  # the top-level "from werkzeug import FileWrapper" was removed in Werkzeug 1.0
# Other code to take in input and generate data
    filename = io.BytesIO()
    if letter_size:
        c = canvas.Canvas(filename, pagesize=letter)
    else:
        c = canvas.Canvas(filename, pagesize=A4)
    pdf_all(c, p_set, answer=answers, letter=letter_size)
    c.save()
    filename.seek(0)
    wrapped_file = FileWrapper(filename)
    return Response(wrapped_file, mimetype="application/pdf", direct_passthrough=True)
else:
    return render_template('index.html')
Any idea what's causing the issue? Help is much appreciated!
Please check whether you are invoking the endpoint that generates your data and displays the PDF with an AJAX POST request. If so, that is quite probably what causes the behaviour you observe. Try invoking the endpoint with a GET request to /my-endpoint/some-hashed-non-reusable-id-of-my-document instead, where some-hashed-non-reusable-id-of-my-document tells the endpoint which document to serve without letting users guess at what other documents you might have. You might start with something like:
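from flask import Flask, send_file

app = Flask(__name__)

@app.route('/display-document/<document_id>')
def display_document(document_id):
    # both helpers are placeholders for your own lookup and storage code
    document = get_my_document_from_wherever_it_is(document_id)
    binary = get_binary_data_from_document(document)  # e.g. an io.BytesIO positioned at 0
    # Prepare the response here
    return send_file(binary, mimetype="application/pdf")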
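One way to produce such hard-to-guess ids is to keep each generated PDF under a random token. A minimal sketch, assuming a single worker process (with several workers you would need a shared store such as a database or the filesystem); `DocumentStore` is a hypothetical name. The id is deliberately not deleted on first use, because the browser's PDF viewer may request the same URL again when the user clicks download; entries expire after a TTL instead.

```python
import secrets
import threading
import time
from typing import Dict, Optional, Tuple


class DocumentStore:
    """In-memory map from unguessable ids to generated PDF bytes, with expiry."""

    def __init__(self, ttl_seconds: float = 600.0) -> None:
        self._ttl = ttl_seconds
        self._docs: Dict[str, Tuple[float, bytes]] = {}
        self._lock = threading.Lock()

    def put(self, data: bytes) -> str:
        # token_urlsafe(32) draws 32 random bytes, so ids cannot realistically be guessed
        doc_id = secrets.token_urlsafe(32)
        with self._lock:
            self._docs[doc_id] = (time.monotonic() + self._ttl, data)
        return doc_id

    def get(self, doc_id: str) -> Optional[bytes]:
        with self._lock:
            entry = self._docs.get(doc_id)
            if entry is None:
                return None
            expires_at, data = entry
            if time.monotonic() > expires_at:
                del self._docs[doc_id]  # expired entries are dropped when looked up
                return None
            return data
```

In the Flask app, the POST handler would call `doc_id = store.put(pdf_bytes)` and redirect to `/display-document/<doc_id>`, whose view returns `send_file(io.BytesIO(data), mimetype="application/pdf", download_name="worksheet.pdf")` (Flask 2.0 and later; older versions call that parameter `attachment_filename`). Entries that are never requested again stay in memory, so a real app would also need periodic cleanup.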
Note: right-clicking and choosing 'Print to PDF' works, but that is not the solution we want.

Get a list of files in a SharePoint directory using Python

I have a URL for a SharePoint directory (intranet) and need an API that returns the list of files in that directory, given the URL. How can I do that using Python?
Posting in case anyone else comes across this issue of getting files from a SharePoint folder from just the folder path.
This link really helped me do this: https://github.com/vgrem/Office365-REST-Python-Client/issues/98. I found plenty of information about doing this over plain HTTP but little for Python, so hopefully this serves as a Python reference for someone else.
I am assuming you are already set up with a client_id and client_secret for the SharePoint API. If not, you can use this for reference: https://learn.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs
I basically wanted to grab the names/relative urls of the files within a folder and then get the most recent file in the folder and put into a dataframe.
I'm sure this isn't the "Pythonic" way to do this but it works which is good enough for me.
!pip install Office365-REST-Python-Client
from office365.runtime.auth.client_credential import ClientCredential
from office365.runtime.client_request_exception import ClientRequestException
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import io
import datetime
import pandas as pd
sp_site = 'https://<org>.sharepoint.com/sites/<my_site>/'
relative_url = "/sites/<my_site>/Shared Documents/<folder>/<sub_folder>"
# credentials: your own dict holding the app's client_id and client_secret (not shown here)
client_credentials = ClientCredential(credentials['client_id'], credentials['client_secret'])
ctx = ClientContext(sp_site).with_credentials(client_credentials)
libraryRoot = ctx.web.get_folder_by_server_relative_path(relative_url)
ctx.load(libraryRoot)
ctx.execute_query()
#if you want to get the folders within <sub_folder>
folders = libraryRoot.folders
ctx.load(folders)
ctx.execute_query()
for myfolder in folders:
    print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))
#if you want to get the files in the folder
files = libraryRoot.files
ctx.load(files)
ctx.execute_query()
#collect the important file properties for each file in the folder
rows = []
for myfile in files:
    #use mod_time to get in better date format
    mod_time = datetime.datetime.strptime(myfile.properties['TimeLastModified'], '%Y-%m-%dT%H:%M:%SZ')
    #DataFrame.append was removed in pandas 2.0, so gather dicts and build the dataframe once
    rows.append({'Name': myfile.properties['Name'],
                 'ServerRelativeUrl': myfile.properties['ServerRelativeUrl'],
                 'TimeLastModified': myfile.properties['TimeLastModified'],
                 'ModTime': mod_time})
    #print statements if needed
    # print("File name: {0}".format(myfile.properties["Name"]))
    # print("File link: {0}".format(myfile.properties["ServerRelativeUrl"]))
    # print("File last modified: {0}".format(myfile.properties["TimeLastModified"]))
df_files = pd.DataFrame(rows, columns=['Name', 'ServerRelativeUrl', 'TimeLastModified', 'ModTime'])
#get the index label of the most recently modified file and its ServerRelativeUrl
newest_index = df_files['ModTime'].idxmax()
newest_file_url = df_files.loc[newest_index, 'ServerRelativeUrl']
# Get Excel File by newest_file_url identified above
response= File.open_binary(ctx, newest_file_url)
# save data to BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0) # set file object to start
# load Excel file from BytesIO stream
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet1', header= 0)
Here is another helpful link listing the file properties you can view: https://learn.microsoft.com/en-us/previous-versions/office/developer/sharepoint-rest-reference/dn450841(v=office.15). Scroll down to the file properties section.
Hopefully this is helpful to someone. Again, I am not a pro and most of the time I need things to be a bit more explicit and written out. Maybe others feel that way too.
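If pandas is only used to pick the newest file, a plain `max()` over the file properties does the same job. A minimal sketch, assuming `TimeLastModified` is a UTC string in the format the answer parses with strptime; `newest_file_url` is a hypothetical helper name:

```python
import datetime
from typing import Dict, Iterable


def newest_file_url(file_props: Iterable[Dict[str, str]]) -> str:
    """Return the ServerRelativeUrl of the most recently modified file."""
    newest = max(
        file_props,
        # same timestamp format as the strptime call in the answer above
        key=lambda p: datetime.datetime.strptime(p["TimeLastModified"], "%Y-%m-%dT%H:%M:%SZ"),
    )
    return newest["ServerRelativeUrl"]
```

With the loaded `files` collection this would be called as `newest_file_url(f.properties for f in files)`; note that `max()` raises ValueError if the folder is empty.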
You need to do 2 things here.
1. Get a list of files (which can be directories or simple files) in the directory of interest.
2. Loop over each item in this list and check whether it is a file or a directory. For each directory, repeat steps 1 and 2.
You can find more documentation at https://learn.microsoft.com/en-us/sharepoint/dev/sp-add-ins/working-with-folders-and-files-with-rest#working-with-files-attached-to-list-items-by-using-rest
def getFilesList(directoryName):
    ...
    return filesList

# This will tell you if the item is a file or a directory.
def isDirectory(item):
    ...
    return isDir  # True for a directory, False for a file
Hope this helps.
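The two steps above amount to a recursive walk. A minimal sketch that takes the two functions as parameters (here called `list_items` and `is_directory`, hypothetical stand-ins for getFilesList and isDirectory), so the recursion itself does not depend on any SharePoint client:

```python
from typing import Callable, Iterable, Iterator, TypeVar

Item = TypeVar("Item")


def walk_files(
    directory: Item,
    list_items: Callable[[Item], Iterable[Item]],
    is_directory: Callable[[Item], bool],
) -> Iterator[Item]:
    """Yield every file under `directory`, descending into sub-directories."""
    for item in list_items(directory):  # step 1: list the directory
        if is_directory(item):  # step 2: recurse into directories
            yield from walk_files(item, list_items, is_directory)
        else:
            yield item
```

With the Office365-REST-Python-Client code from the earlier answer, `list_items` could return a folder's `folders` and `files` collections after `ctx.load(...)` and `ctx.execute_query()`.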
I have a url for sharepoint directory
Assuming you are asking about a library, you can use SharePoint's REST API and make a web service call to:
https://yourServer/sites/yourSite/_api/web/lists/getbytitle('Documents')/items?$select=Title
This will return a list of documents at: https://yourServer/sites/yourSite/Documents
See: https://msdn.microsoft.com/en-us/library/office/dn531433.aspx
You will of course need the appropriate permissions / credentials to access that library.
You cannot use "server name/sites/Folder name/Subfolder name/_api/web/lists/getbytitle('Documents')/items?$select=Title" as a URL in the SharePoint REST API.
The URL structure should be as follows, where WebSiteURL is the URL of the site/subsite containing the document library you are trying to get files from, and Documents is the display name of the document library:
WebSiteURL/_api/web/lists/getbytitle('Documents')/items?$select=Title
If you want to list metadata field values, add the field names, separated by commas, to $select.
Quick tip: if you are not sure how to form the REST API URL, try pasting it into Chrome (you must be logged in to the SharePoint site with appropriate permissions) and check whether you get a proper XML result. Once it works, use that REST URL in your code and run it. This saves you from re-running your Python code for every attempt.
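The URL rules above can be wrapped in a small helper. A minimal sketch using only the standard library; `list_items_url`, `field_values` and `fetch_field` are hypothetical names, and how you authenticate is an assumption: `fetch_field` simply forwards whatever headers you pass (for example an app-only bearer token), while an on-premises farm using NTLM would need a different HTTP client. With `Accept: application/json;odata=verbose`, SharePoint wraps collections as `{"d": {"results": [...]}}`.

```python
import json
import urllib.request
from typing import Dict, Iterable, List, Optional
from urllib.parse import quote


def list_items_url(web_site_url: str, library_title: str, fields: Iterable[str] = ("Title",)) -> str:
    """Build WebSiteURL/_api/web/lists/getbytitle('<library>')/items?$select=<fields>."""
    # A single quote inside an OData string literal is escaped by doubling it.
    title = library_title.replace("'", "''")
    select = ",".join(fields)  # several fields are separated by commas
    return (f"{web_site_url.rstrip('/')}/_api/web/lists/getbytitle('{quote(title)}')"
            f"/items?$select={quote(select, safe=',')}")


def field_values(payload: dict, field: str) -> List[Optional[str]]:
    """Extract one field from an odata=verbose collection response."""
    return [item.get(field) for item in payload["d"]["results"]]


def fetch_field(url: str, auth_headers: Dict[str, str], field: str = "Title") -> List[Optional[str]]:
    # auth_headers: assumption, whatever your SharePoint setup requires
    headers = {"Accept": "application/json;odata=verbose", **auth_headers}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request) as resp:
        return field_values(json.load(resp), field)
```

For example, `list_items_url("https://yourServer/sites/yourSite", "Documents")` gives the same URL as the one quoted in the earlier answer.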

PhantomJS with Selenium doesn't scroll to the bottom

With the Firefox driver, the code below works (it scrolls to the bottom of the page), but not with the PhantomJS webdriver. The page uses infinite scroll, so I need to scroll down to gather more information. Could you help me identify why this doesn't work with PhantomJS?
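import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.PhantomJS()
driver.maximize_window()
driver.get("http://www.betpawa.co.ke/upcoming")
element = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.events-wrapper")))
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(2)
soup = BeautifulSoup(driver.page_source.encode('utf-8'), "html.parser")
print(len(soup.findAll("div", {"class": "prematch"})))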
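With infinite scroll, a single scroll only loads one more batch. A common pattern is to keep scrolling until `document.body.scrollHeight` stops growing. A minimal sketch; `scroll_to_end` is a hypothetical helper that works with any driver exposing `execute_script`, and the pause length is a guess you would tune for the site. Whether it fixes the PhantomJS case specifically is not confirmed; PhantomJS starts with a small window, so calling `driver.set_window_size(1920, 1080)` instead of `maximize_window()` may also be worth trying, and since PhantomJS is no longer maintained, headless Firefox or Chrome is the usual replacement.

```python
import time


def scroll_to_end(driver, pause: float = 2.0, max_rounds: int = 50) -> int:
    """Scroll until the page height stops growing; return the final height."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):  # upper bound so a truly endless feed cannot loop forever
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page's JavaScript time to load the next batch
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return last_height
```

After `scroll_to_end(driver)` returns, `driver.page_source` can be parsed with BeautifulSoup as in the question.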

browser.click() & browser.send_keys() conflict - Selenium 3.0 Python 2.7

I am currently trying to implement a subtitle downloader with the help of the http://www.yifysubtitles.com website.
The first part of my code clicks the accept-cookies button and then sends keys to search for the movie of interest.
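from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = "http://www.yifysubtitles.com"
profile = SetProfile()  # A function returning my favorite profile for Firefox
browser = webdriver.Firefox(profile)
WindowSize(400, 400)  # helper not shown in the question
browser.get(url)
# the button has two classes, so a CSS selector is the reliable locator;
# until() returns the element, so there is no need to look it up again
accept_cookies_btn = WebDriverWait(browser, 100).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, ".cc_btn.cc_btn_accept_all")))
accept_cookies_btn.click()
search_bar = browser.find_element(By.ID, "qSearch")
search_bar.send_keys("Harry Potter and the Chamber of Secrets")
search_bar.send_keys(Keys.RETURN)
print("Successfully clicked!")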
But it only works once, or seemingly at random. If I turn on my computer and run the code, it clicks, performs the search and prints the last statement. The second time, it doesn't click but still performs the search and prints the final statement.
After each try, I close the session with the browser.quit() method.
Any idea on what might be the issue here?
Specify an explicit wait for both the button and the search bar; that should solve your problem.