get list of files in a sharepoint directory using python - python-2.7

I have a URL for a SharePoint directory (intranet) and need an API that returns the list of files in that directory, given the URL. How can I do that using Python?

Posting in case anyone else comes across this issue of getting files from a SharePoint folder when all you have is the folder path.
This link really helped me do it: https://github.com/vgrem/Office365-REST-Python-Client/issues/98. I found plenty of info about doing this over raw HTTP but not in Python, so hopefully this helps anyone else who needs a Python reference.
I am assuming you are already set up with a client_id and client_secret for the SharePoint API. If not, you can use this for reference: https://learn.microsoft.com/en-us/sharepoint/dev/solution-guidance/security-apponly-azureacs
I basically wanted to grab the names/relative URLs of the files within a folder, then find the most recent file in the folder and load it into a dataframe.
I'm sure this isn't the most "Pythonic" way to do this, but it works, which is good enough for me.
!pip install Office365-REST-Python-Client
from office365.runtime.auth.client_credential import ClientCredential
from office365.runtime.client_request_exception import ClientRequestException
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File
import io
import datetime
import pandas as pd
sp_site = 'https://<org>.sharepoint.com/sites/<my_site>/'
relative_url = "/sites/<my_site>/Shared Documents/<folder>/<sub_folder>"
# credentials is assumed to be a dict holding your registered app's client_id and client_secret
client_credentials = ClientCredential(credentials['client_id'], credentials['client_secret'])
ctx = ClientContext(sp_site).with_credentials(client_credentials)
libraryRoot = ctx.web.get_folder_by_server_relative_path(relative_url)
ctx.load(libraryRoot)
ctx.execute_query()
#if you want to get the folders within <sub_folder>
folders = libraryRoot.folders
ctx.load(folders)
ctx.execute_query()
for myfolder in folders:
    print("Folder name: {0}".format(myfolder.properties["ServerRelativeUrl"]))
#if you want to get the files in the folder
files = libraryRoot.files
ctx.load(files)
ctx.execute_query()
#create a dataframe of the important file properties for each file in the folder
rows = []
for myfile in files:
    #use mod_time to get the date in a friendlier format
    mod_time = datetime.datetime.strptime(myfile.properties['TimeLastModified'], '%Y-%m-%dT%H:%M:%SZ')
    #collect the info for each file as a dict; build the dataframe once afterwards
    #(DataFrame.append is deprecated in newer pandas)
    rows.append({'Name': myfile.properties['Name'], 'ServerRelativeUrl': myfile.properties['ServerRelativeUrl'], 'TimeLastModified': myfile.properties['TimeLastModified'], 'ModTime': mod_time})
    #print statements if needed
    # print("File name: {0}".format(myfile.properties["Name"]))
    # print("File link: {0}".format(myfile.properties["ServerRelativeUrl"]))
    # print("File last modified: {0}".format(myfile.properties["TimeLastModified"]))
df_files = pd.DataFrame(rows, columns=['Name', 'ServerRelativeUrl', 'TimeLastModified', 'ModTime'])
#get index of the most recently modified file and the ServerRelativeUrl associated with that index
newest_index = df_files['ModTime'].idxmax()
#idxmax returns an index label, so use .loc rather than .iloc for the lookup
newest_file_url = df_files.loc[newest_index, 'ServerRelativeUrl']
# Get the Excel file at newest_file_url identified above
response = File.open_binary(ctx, newest_file_url)
# save data to a BytesIO stream
bytes_file_obj = io.BytesIO()
bytes_file_obj.write(response.content)
bytes_file_obj.seek(0)  # rewind to the start of the stream
# load the Excel file from the BytesIO stream
df = pd.read_excel(bytes_file_obj, sheet_name='Sheet1', header=0)
Here is another helpful link listing the file properties you can view: https://learn.microsoft.com/en-us/previous-versions/office/developer/sharepoint-rest-reference/dn450841(v=office.15). Scroll down to the file properties section.
Hopefully this is helpful to someone. Again, I am not a pro, and most of the time I need things to be a bit more explicit and written out. Maybe others feel that way too.

You need to do 2 things here.
1. Get a list of files (which can be directories or simple files) in the directory of your interest.
2. Loop over each item in this list of files and check if the item is a file or a directory. For each directory, repeat steps 1 and 2. A sketch of both steps follows the helpers below.
You can find more documentation at https://learn.microsoft.com/en-us/sharepoint/dev/sp-add-ins/working-with-folders-and-files-with-rest#working-with-files-attached-to-list-items-by-using-rest
def getFilesList(directoryName):
    ...
    return filesList

# This will tell you if the item is a file or a directory.
def isDirectory(item):
    ...
    return True/False
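A minimal sketch of those two steps using the Office365-REST-Python-Client setup from the answer above (the ctx object and the ServerRelativeUrl property are the same ones used there; treat this as a sketch, since details vary by library version):
def get_files_recursive(ctx, folder_url, files_list=None):
    # Step 1: list the files and subfolders in the folder of interest.
    if files_list is None:
        files_list = []
    folder = ctx.web.get_folder_by_server_relative_path(folder_url)
    ctx.load(folder.files)
    ctx.load(folder.folders)
    ctx.execute_query()
    for f in folder.files:
        files_list.append(f.properties["ServerRelativeUrl"])
    # Step 2: recurse into each subfolder and repeat.
    for sub in folder.folders:
        get_files_recursive(ctx, sub.properties["ServerRelativeUrl"], files_list)
    return files_list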
Hope this helps.

I have a url for sharepoint directory
Assuming you are asking about a library, you can use SharePoint's REST API and make a web service call to:
https://yourServer/sites/yourSite/_api/web/lists/getbytitle('Documents')/items?$select=Title
This will return a list of documents at: https://yourServer/sites/yourSite/Documents
See: https://msdn.microsoft.com/en-us/library/office/dn531433.aspx
You will of course need the appropriate permissions / credentials to access that library.
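If you want to call that endpoint from Python, a minimal sketch with the requests library might look like this (access_token here is an assumed, already-acquired OAuth bearer token from your app registration; how you obtain it depends on your auth setup):
import requests

url = ("https://yourServer/sites/yourSite/_api/web/lists/"
       "getbytitle('Documents')/items?$select=Title")
headers = {
    "Accept": "application/json;odata=verbose",
    # access_token is an assumed, already-acquired OAuth bearer token
    "Authorization": "Bearer " + access_token,
}
response = requests.get(url, headers=headers)
response.raise_for_status()
# the verbose OData format wraps results in d/results
for item in response.json()["d"]["results"]:
    print(item["Title"])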

You cannot use "server name/sites/Folder name/Subfolder name/_api/web/lists/getbytitle('Documents')/items?$select=Title" as the URL in the SharePoint REST API.
The URL structure should be like below, where WebSiteURL is the URL of the site/subsite containing the document library from which you are trying to get files, and Documents is the display name of the document library:
WebSiteURL/_api/web/lists/getbytitle('Documents')/items?$select=Title
And if you want to list metadata field values, you should add the field names, separated by commas, in $select.
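For example, to also return the file name and last-modified time stored on each item (FileLeafRef and Modified are standard SharePoint field names; substitute your own fields as needed):
WebSiteURL/_api/web/lists/getbytitle('Documents')/items?$select=Title,FileLeafRef,Modified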
Quick tip: if you are not sure about the REST API URL formation, try pasting the URL into the Chrome browser (you must be logged in to the SharePoint site with appropriate permissions) and see whether you get a proper result as XML. If you do, update the REST URL in your code and run it. This way you save the time of repeatedly running your Python code.

Related

Openpyxl Django

Trying to automate modifying an Excel file using the openpyxl lib, and the below code works:
def modify_excel_view(request):
    wb = openpyxl.load_workbook('C:\\my_dir\\filename.xlsx')
    sh1 = wb['Sheet1']
    row = sh1.max_row
    column = sh1.max_column
    for i in range(2, row + 1):
        # ...some_code...
    wb.save('C:\\my_dir\\new_filename.xlsx')
    return render(request, 'my_app/convert.html', {'wb': wb})
But how can I implement a similar method without hardcoding the path to the file and the filename? For example, choosing the xlsx file from a modal window?
And saving the modified copy to the desktop by default? (Maybe like the ReportLab library: response.write(pdf) => return response, which saves to the desktop by default.)
Thank you in advance!
You can send the file path and name as POST parameters and extract them in the views.py function using request.POST.get('file_path').
Then just add a line to save the copy to the desktop before returning.
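A rough sketch of that idea (file_path is an assumed POST field name; adjust it to match your form):
import os
import openpyxl
from django.shortcuts import render

def modify_excel_view(request):
    # file_path is an assumed POST parameter carrying the workbook location
    file_path = request.POST.get('file_path')
    wb = openpyxl.load_workbook(file_path)
    sh1 = wb['Sheet1']
    for i in range(2, sh1.max_row + 1):
        pass  # ...some_code...
    # save the modified copy to the user's desktop
    desktop = os.path.join(os.path.expanduser('~'), 'Desktop')
    wb.save(os.path.join(desktop, 'new_' + os.path.basename(file_path)))
    return render(request, 'my_app/convert.html', {'wb': wb})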

How to recreate/reload app when external config/database changes in Flask

I'm making a server to convert R Markdown to Dash apps. The idea is to parse all the params in the Rmd file and make corresponding Dash inputs, then add a submit button which compiles the Rmd to HTML somewhere and iframes it back. I use an external database to store the Rmd paths, so users can add files dynamically. The problem is that when an Rmd file changes, the server has to re-parse the file, recreate the app, and serve it at the same URL. I don't have an elegant solution. Right now I'm doing something like this:
server = Flask(__name__)

@server.route("/rmd/<path:path>")
def convert_rmd_to_dash(path):
    file = get_file_path_from_db(path)
    mtime = get_last_modified_time(file)
    cached_app, cached_mtime = get_cache(path)
    if cached_mtime == mtime:
        return cached_app
    inputs = parse_file(file)
    app = construct_dash_app(inputs)
    return app.index()

def construct_dash_app(inputs):
    app = dash.Dash(
        __name__,
        server=server,
        # cast time.time() to str to avoid a TypeError when concatenating
        routes_pathname_prefix='/some_url_user_will_never_use/' + file_name + str(time.time()))
    app.layout = ...
    ...
    return app
It works, but I get many routing rules under /some_url_user_will_never_use. Directly overwriting the rmd/path rule might be possible but feels hacky, based on Stack Overflow answers. Is there a better solution? Thanks.

Referencing image files in markdown for flask app

I am trying to create a simple blog app with Flask, using flask_flatpages to fill a jinja2 template with the contents of a markdown file for each post.
app = Flask(__name__)
app.config.from_pyfile('settings.py')
pages = FlatPages(app)
@app.route('/<path>/')
def blog_post(path):
    post = pages.get_or_404(path)
    return render_template('post.html', post=post)
The issue I'm having is that I'm unable to link an image in the markdown file. For example, the following example_post.md file returns a 404 error in the rendered HTML for the image.png file (when accessing e.g. http://localhost:5000/example_post/):
# Heading
Here is an example image.
![png](image.png)
I think this is because accessing the image attempts to find example_post/image.png, due to the route I created, but the image is actually in the same directory as the example_post.md file (there is no example_post/ directory). The file structure is as follows:
--app.py
--posts/
----example_post.md
----image.png
--templates/
----post.html
Any suggestions for how to correctly reference the image.png file in this case, or how to better structure the app to make this work?
You can use something like the example below to fix your problem.
import os
from flask import send_from_directory

@app.route('/<path:filename>')
def serve_static(filename):
    root_dir = os.path.dirname(os.getcwd())
    return send_from_directory(os.path.join(root_dir, 'md'), filename)
Example :
https://www.programcreek.com/python/example/65747/flask.send_from_directory
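Adapted to the file layout in the question (posts/ holding both the markdown files and the images), a sketch might look like this; the route and function name are just illustrative:
import os
from flask import send_from_directory

@app.route('/<path>/<filename>')
def post_asset(path, filename):
    # serve image.png (and any other asset) from the posts/ directory,
    # so a link like /example_post/image.png resolves
    posts_dir = os.path.join(app.root_path, 'posts')
    return send_from_directory(posts_dir, filename)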

Web crawling and extracting data using Scrapy

I am new to Python as well as Scrapy.
I am trying to crawl a seed URL https://www.health.com/patients/status/. This seed URL contains many URLs, but I want to fetch only the URLs that contain Faci/Details/#somenumber. The link structure is like below:
https://www.health.com/patients/status/ -> https://www.health.com/Faci/Details/2
                                        -> https://www.health.com/Faci/Details/3
                                        -> https://www.health.com/Faci/Details/4
https://www.health.com/Faci/Details/2 -> https://www.health.com/provi/details/64
                                      -> https://www.health.com/provi/details/65
https://www.health.com/Faci/Details/3 -> https://www.health.com/provi/details/70
                                      -> https://www.health.com/provi/details/71
Inside each https://www.health.com/Faci/Details/#somenumber page there are links like https://www.health.com/provi/details/64, https://www.health.com/provi/details/65, and so on. Finally, I want to fetch some data from the https://www.health.com/provi/details/#somenumber pages. How can I achieve that?
As of now I have tried the below code from the Scrapy tutorial and am able to crawl only the URLs that contain https://www.health.com/Faci/Details/#somenumber. It's not going on to https://www.health.com/provi/details/#somenumber. I tried to set a depth limit in the settings.py file, but it didn't work.
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from news.items import NewsItem
class MySpider(CrawlSpider):
    name = 'provdetails.com'
    allowed_domains = ['health.com']
    start_urls = ['https://www.health.com/patients/status/']

    rules = (
        Rule(LinkExtractor(allow=('/Faci/Details/\d+', )), follow=True),
        Rule(LinkExtractor(allow=('/provi/details/\d+', )), callback='parse_item'),
    )

    def parse_item(self, response):
        self.logger.info('Hi, this is an item page! %s', response.url)
        item = NewsItem()
        item['id'] = response.xpath("//title/text()").extract()
        item['name'] = response.xpath("//title/text()").extract()
        item['description'] = response.css('p.introduction::text').extract()
        filename = 'details.txt'
        with open(filename, 'wb') as f:
            f.write(item)
        self.log('Saved file %s' % filename)
        return item
Please help me to proceed further.
To be honest, the regex-based and mighty Rule/LinkExtractor often gave me a hard time. For a simple project, one approach is to extract all links on the page and then look at the href attribute. If the href matches your needs, yield a new Request for it. For instance:
from scrapy.http import Request
from scrapy.selector import Selector
...
# follow links
for href in response.xpath('//div[@class="contentLeft"]//div[@class="pageNavigation nobr"]//a').extract():
    linktext = Selector(text=href).xpath('//a/text()').extract_first()
    if linktext == "Weiter":
        link = Selector(text=href).xpath('//a/@href').extract()[0]
        url = response.urljoin(link)
        print(url)
        yield Request(url, callback=self.parse)
Some remarks on your code:
response.xpath(...).extract()
This returns a list; you may want to have a look at extract_first(), which provides just the first item (or None).
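For example, this gives you the bare title string (or None) instead of a one-element list:
item['name'] = response.xpath("//title/text()").extract_first()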
with open(filename, 'wb') as f:
This will overwrite the file on every item; you will only end up with the last item saved. Also, you open the file in binary mode ('b'), but from the filename I guess you want to write text? Use 'a' to append? See the open() docs.
An alternative is to use the -o flag to use Scrapy's built-in facilities to store the items as JSON or CSV.
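For example, using the spider name from your code:
scrapy crawl provdetails.com -o details.json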
return item
It is better style to yield items instead of returning them; at the very least, if you need to create several items from one page, you have to yield them.
Another good approach is: use one parse function per type/kind of page.
For instance, every page in start_urls will end up in parse(). From there you could extract the /Faci/Details/N links and yield a Request for each, with callback=self.parse_faci_details. In parse_faci_details() you again extract the links of interest, create Requests, and pass them via callback= to e.g. parse_provi_details().
In that function you create the items you need.
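A rough sketch of that structure, keeping your item fields (the link-matching checks mirror your URL patterns; the XPaths are the ones from your own parse_item):
import scrapy

class ProviSpider(scrapy.Spider):
    name = 'provdetails'
    allowed_domains = ['health.com']
    start_urls = ['https://www.health.com/patients/status/']

    def parse(self, response):
        # seed page: follow only the /Faci/Details/N links
        for href in response.xpath('//a/@href').extract():
            if '/Faci/Details/' in href:
                yield scrapy.Request(response.urljoin(href),
                                     callback=self.parse_faci_details)

    def parse_faci_details(self, response):
        # facility page: follow the /provi/details/N links
        for href in response.xpath('//a/@href').extract():
            if '/provi/details/' in href:
                yield scrapy.Request(response.urljoin(href),
                                     callback=self.parse_provi_details)

    def parse_provi_details(self, response):
        # provider page: build the item here
        yield {
            'name': response.xpath('//title/text()').extract_first(),
            'description': response.css('p.introduction::text').extract(),
        }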

Arcpy Report all MXDs using specified Layer

I have some code I'm working on using Python 2.7 and arcpy.
The user will plug in a layer file and a root directory to search.
The code will os.walk the directory and find every .mxd file. It will then search each .mxd file for the specified .shp file. If an .mxd reports using the requested .shp, it will log that .mxd file.
Here lyr = E1.get() is the .shp file being searched for.
lyr = E1.get()
for root, dirs, files in os.walk(path):
    for name in files:
        basename, extension = os.path.splitext(name)
        if extension == '.mxd':
            fullPath = os.path.join(root, name)
            mxd = arcpy.mapping.MapDocument(fullPath)
            DataList = arcpy.mapping.ListLayers(mxd)
            for item in DataList:
                if item == lyr:
                    LOG_ME = mxd
                    l.info(LOG_ME)
                else:
                    pass
        else:
            pass
The logfile is created when the program runs but never populates any data. I receive no errors, even in a directory I know contains .mxds that use the specified .shp.
I've also tried
for item in DataList:
    if lyr in item:
        log
and
if lyr in DataList:
    log
Any ideas what the issue might be?
I believe the problem may be in your comparison
if item == lyr:
Note that arcpy.mapping.ListLayers returns a list of layer objects, not a list of layer names.
Try changing your comparison to
if item.name == lyr:
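Putting that into the walk loop from the question, and logging the document path, a sketch might look like:
for item in DataList:
    # Layer.name is the layer's name as shown in the table of contents
    if item.name == lyr:
        l.info(mxd.filePath)
Note also that if lyr holds a shapefile name rather than the layer's display name, comparing against the layer's data source may be closer to what you want, e.g. item.supports("DATASOURCE") and os.path.basename(item.dataSource) == lyr.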
Good luck!
Tom