I have a standalone app that takes in an excel file and outputs a word doc. This works fine as standalone.
I have now tried to integrate it into a Flask application, but it can no longer find its own "templates" subfolder. Here is my file structure:
my_flask_site
├── flask_app.py
├── __init__.py
├── templates
│   ├── index.html
│   └── report.html
├── uploads
│   └── myfile.xlsx
│
└── apps
    └── convert_app
        ├── __init__.py
        ├── main.py
        ├── report
        │   ├── __init__.py
        │   ├── data_ingest.py
        │   └── report_output.py
        └── templates
            └── output_template.docx
Now that it is inside the Flask application, I can't get report_output.py to find the output_template.docx file.
# main.py
# Imports are not shown in the original; presumably something like:
#     from report import data_ingest, report_output
# OUTPUT_FILE (the output directory) is defined elsewhere in the app.
import sys

def run_report(file):
    data = data_ingest.Incident(file)
    priority_count = dict(data.df_length())
    size = sum(priority_count.values())
    print(priority_count)
    print(size)
    report = report_output.Report()
    report.header()
    report.priority_header(0)
    i = 0
    if '1' in priority_count:
        for _ in range(priority_count['1']):
            field = data.fields(i)
            report.priority_body(field)
            i += 1
        report.break_page()
        report.priority_header(1)
    else:
        report.none()
        report.priority_header(1)
    if '2' in priority_count:
        for _ in range(priority_count['2']):
            field = data.fields(i)
            report.priority_body(field)
            i += 1
        report.break_page()
        report.priority_header(2)
    else:
        report.none()
        report.break_page()
        report.priority_header(2)
    if '3' in priority_count:
        for _ in range(priority_count['3']):
            field = data.fields(i)
            report.priority_body(field)
            i += 1
        report.break_page()
    if '4' in priority_count:
        for _ in range(priority_count['4']):
            field = data.fields(i)
            i += 1
    output = OUTPUT_FILE + f"/Platform Control OTT Daily Report {data.field[0]}.docx"
    report.save(output)
    print(f"Report saved to:\n\n\t {output}")

def main(file):
    run_report(file)

if __name__ == "__main__":
    # main() needs the path of the Excel file; take it from the command line
    main(sys.argv[1])
And here is report_output.py (without the Word formatting part):
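from datetime import datetime

from docx import Document

class Report(object):
    def __init__(self):
        # Relative path: resolved against the process's current working
        # directory, not against the folder this file lives in
        self.doc = Document('./templates/pcc_template.docx')
        self.p_title = ['Major Incident',
                        'Stability Incidents (HPI)',
                        'Other Incidents']
        self.date = datetime.now().strftime('%d %B %Y')

    def save(self, output):
        self.doc.save(output)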
There is more in the file, but it relates only to what the app does. Where I am stuck is how to get the app to see its own templates folder, and the template file inside it, again.
I solved my problem after finding this post: Refering to a directory in a Flask app doesn't work unless the path is absolute.
What I take from this is that a relative path such as ./templates/... is resolved against the current working directory of the running process, not against the file that contains it. When the Flask app runs, that working directory is the site's root folder (here "my_flask_site"), so the relative path pointed at the wrong place. Giving the full file path solved the problem.
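Instead of hard-coding the full path, one option is to build it from the module's own location. A minimal sketch, assuming the tree above (templates/ one level above the report/ package); the helper name template_path is hypothetical. In report_output.py it would be used as Document(template_path(__file__)), passing name='pcc_template.docx' if that is the real file name.

```python
import os


def template_path(module_file, name="output_template.docx"):
    """Return an absolute path to convert_app/templates/<name>.

    The path is built from the location of `module_file` (pass __file__),
    so it no longer depends on the process's current working directory.
    Assumes templates/ sits one level above the report/ package.
    """
    report_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.normpath(os.path.join(report_dir, os.pardir, "templates", name))


if __name__ == "__main__":
    # Example with a made-up install location
    print(template_path("/srv/my_flask_site/apps/convert_app/report/report_output.py"))
```

Because the path is anchored at __file__, the same code works whether the converter is run standalone or imported by the Flask app.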
What is the proper way of serving static files (images, PDFs, Docs etc) from a flask server?
I have used the send_from_directory method before and it works fine. Here is my implementation:
@app.route('/public/assignments/<path:filename>')
def file(filename):
    return send_from_directory("./public/assignments/", filename, as_attachment=True)
However if I have multiple different folders, it can get a bit hectic and repetitive because you are essentially writing the same code but for different file locations - meaning if I wanted to display files for a user instead of an assignment, I'd have to change it to /public/users/<path:filename> instead of /public/assignments/<path:filename>.
The way I thought of solving this is essentially making a /file/<path:filepath> route, where the filepath is the entire path to the destination folder + the file name and extension, instead of just the file name and extension. Then I did some formatting and separated the parent directory from the file itself and used that data when calling the send_from_directory function:
@app.route('/file/<path:filepath>', methods=["GET"])
def general_static_files(filepath):
    filepath = filepath.split("/")
    _dir = ""
    for i, p in enumerate(filepath):
        if i < len(filepath) - 1:
            _dir = _dir + (p + "/")
    return send_from_directory(("./" + _dir), filepath[len(filepath) - 1], as_attachment=True)
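The loop above just splits the URL path at its last slash. A shorter equivalent, sketched below, uses posixpath.split from the standard library (URL paths always use forward slashes); split_request_path is a hypothetical helper name. Since Flask's send_from_directory also accepts a path containing subfolders relative to its directory argument, passing a fixed base directory plus the whole filepath may avoid the split entirely, but that variant is not shown here.

```python
import posixpath


def split_request_path(filepath):
    """Split a URL path such as 'public/files/jobs/images/x.jpg' into the
    directory and file name that send_from_directory expects.

    Same result as the manual loop, minus the trailing slash on the
    directory, which send_from_directory does not need.
    """
    directory, filename = posixpath.split(filepath)
    return "./" + directory, filename


# In the Flask route this would be used roughly as:
#     directory, filename = split_request_path(filepath)
#     return send_from_directory(directory, filename, as_attachment=True)
```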
If we simulate the following request to this route:
curl http://127.0.0.1:5000/file/public/files/jobs/images/job_43_image_1.jpg
the _dir variable will hold the ./public/files/jobs/images/ value, and then filepath[len(filepath) - 1] holds the job_43_image_1.jpg value.
If I hit this route, I get a 404 - Not Found response, but all the code in the route body is being executed.
I suspect that the send_from_directory function is the reason why I'm getting a 404 - Not Found. However, I do have the image job_43_image_1.jpg stored inside the /public/files/jobs/images/ directory.
I'm afraid I don't see a lot I can do here except hope that someone has encountered the same issue/problem and found a way to fix it.
Here is the folder tree:
├── [2050] app.py
├── [2050] public
│   ├── [2050] etc
│   └── [2050] files
│       ├── [2050] jobs
│       │   ├── [2050] files
│       │   └── [2050] images
│       │       ├── [2050] job_41_image_decline_1.jpg
│       │       ├── [2050] job_41_image_decline_2554.jpg
│       │       ├── [2050] ...
│       ├── [2050] shop
│       └── [2050] videos
└── [2050] server_crash.log
Edit 1: I have set up the static_url_path. I have no reason to believe that that could be the cause of my problem.
Edit 2: Added tree
Pass these arguments when you initialise the app:
app = Flask(__name__, static_folder='public',
            static_url_path='/frontend_public')
This would make the file public/blah.txt available at http://example.com/frontend_public/blah.txt.
static_folder sets the folder on the filesystem
static_url_path sets the path used within the URL (it must start with a slash)
If neither is set, both default to 'static'.
Hopefully this is what you're asking.
Let's say I want to build a very simple package in R that wraps c++ code.
My test project would be called bananas.
Let's say I have a folder called "bananas", where inside I have two other folders one called c++ and one called R.
Inside the c++ folder I have folder called include that contains the bananas.hpp header file (with the class definition):
#ifndef BANANAS_H
#define BANANAS_H
class Bananas
{
public:
    void add_banana();
    int get_bananas();
protected:
    // Initialised so get_bananas() is defined before the first add_banana()
    int number_of_bananas = 0;
};
#endif
Outside include there is the bananas.cpp file that implements the methods of bananas.hpp:
#include "include/bananas.hpp"
void Bananas::add_banana(){
    // Add one banana to the counter
    number_of_bananas++;
}
int Bananas::get_bananas(){
    return number_of_bananas;
}
Now, in my R folder, I have the wrapper.cpp file, which uses the Rcpp library to expose the C++ class to R as a module:
#include <Rcpp.h>
#include "include/bananas.hpp"
using namespace Rcpp;
RCPP_EXPOSED_CLASS(Bananas)
RCPP_MODULE(Bananas_cpp){
    class_<Bananas>("BananasCPP")
    .default_constructor()
    .method("add_banana", &Bananas::add_banana)
    .method("get_bananas", &Bananas::get_bananas)
    ;
}
Note that #include "include/bananas.hpp" does not resolve from bananas/R right now; it only works once setup.R copies the header into the package's src/include folder.
Finally, my main concern is the R6 wrapper around this class, in bananas/R/bananas.R:
require(R6)
library(Rcpp)
# For exposing the error
print(ls(all.names = TRUE))
# Load Module
bcpp <- Module("Bananas_cpp", PACKAGE="RBananasC", mustStart = TRUE)$BananasCPP
Bananas <- R6Class("Bananas", list(
  bn = new(bcpp),
  print_bananas = function() {print(self$bn$get_bananas())},
  banana_tree = function(d) {
    for(row in 1:d){
      self$bn$add_banana()
    }
  }
))
Now I run the setup.R file in my main directory, which looks as follows:
require(Rcpp)
# Make a base package
unlink("RBananasC", recursive=TRUE)
Rcpp.package.skeleton(name = "RBananasC", list = character(),
path = ".", force = FALSE,
cpp_files = c("bananas/R/wrapper.cpp", "bananas/c++/bananas.cpp", "bananas/c++/include/bananas.hpp"))
dir.create("RBananasC/src/include")
file.rename("RBananasC/src/bananas.hpp", "RBananasC/src/include/bananas.hpp")
install.packages("RBananasC", repos=NULL, type="source")
# See that Bananas_cpp works
library(RBananasC) #Without this line it doesn't work
print(Module("Bananas_cpp", PACKAGE="RBananasC", mustStart = TRUE)$BananasCPP)
bcpp <- Module("Bananas_cpp", PACKAGE="RBananasC", mustStart = TRUE)$BananasCPP
ban <- new(bcpp)
ban$add_banana()
print(ban$get_bananas())
# Make the desired package
unlink("RBananas", recursive=TRUE)
Rcpp.package.skeleton(name = "RBananas", list = character(),
path = ".", force = FALSE,
code_files = c("bananas/R/bananas.R"), cpp_files = c("bananas/R/wrapper.cpp", "bananas/c++/bananas.cpp", "bananas/c++/include/bananas.hpp"))
dir.create("RBananas/src/include")
file.rename("RBananas/src/bananas.hpp", "RBananas/src/include/bananas.hpp")
install.packages("RBananas", repos=NULL, type="source")
I see very strange behaviour in my bananas.R file: the Bananas_cpp module of my package is not visible, so I cannot access the class BananasCPP after installation.
On the other hand, if I leave out bananas.R, I can import BananasCPP from the Bananas_cpp module of the package RBananasC, which I created using Rcpp.package.skeleton.
To sum up the total file structure looks like:
.
├── bananas
│   ├── c++
│   │   ├── bananas.cpp
│   │   └── include
│   │       └── bananas.hpp
│   └── R
│       ├── bananas.R
│       └── wrapper.cpp
└── setup.R
To reproduce the problem, just run setup.R.
I followed a standard tutorial, but despite searching the internet for days I couldn't find out how to load my BananasCPP class from the Bananas_cpp module inside the Bananas wrapper in bananas.R. The module does not appear in the namespace of the environment active inside the package, so I think the question to tackle is: which commands should I add, and where, to expose my Bananas_cpp module in the package's namespace?
Of course, this is a reproducible example I made from a real problem I had.
Thanks a lot for your support!
I have several thousand files in an S3 bucket in this form:
├── bucket
│   ├── somedata
│   │   ├── year=2016
│   │   ├── year=2017
│   │   │   ├── month=11
│   │   │   │   ├── sometype-2017-11-01.parquet
│   │   │   │   ├── sometype-2017-11-02.parquet
│   │   │   │   ├── ...
│   │   │   ├── month=12
│   │   │   │   ├── sometype-2017-12-01.parquet
│   │   │   │   ├── sometype-2017-12-02.parquet
│   │   │   │   ├── ...
│   │   ├── year=2018
│   │   │   ├── month=01
│   │   │   │   ├── sometype-2018-01-01.parquet
│   │   │   │   ├── sometype-2018-01-02.parquet
│   │   │   │   ├── ...
│   ├── moredata
│   │   ├── year=2017
│   │   │   ├── month=11
│   │   │   │   ├── moretype-2017-11-01.parquet
│   │   │   │   ├── moretype-2017-11-02.parquet
│   │   │   │   ├── ...
│   │   ├── year=...
Expected behavior:
The AWS Glue Crawler creates one table for each of somedata, moredata, etc. It creates partitions for each table based on the children's path names.
Actual Behavior:
The AWS Glue Crawler performs the behavior above, but ALSO creates a separate table for every partition of the data, resulting in several hundred extraneous tables (and more extraneous tables with every data addition and new crawl).
I see no setting that prevents this from happening. Does anyone have advice on the best way to stop these unnecessary tables from being created?
Adding the following exclude patterns
**_SUCCESS
**crc
worked for me (see the AWS Glue documentation page glue/add-crawler). The double star matches files at any folder (i.e. partition) depth; I had a _SUCCESS file living a few levels up.
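The difference between * and ** can be checked locally with a rough approximation. Per the Glue documentation, * does not cross folder boundaries while ** does; the sketch below implements only *, ** and ?, treats keys as relative to the crawler's include path, and is not Glue's actual matcher. The helper names are hypothetical.

```python
import re


def glue_glob_to_regex(pattern):
    """Translate a Glue-style exclude glob into a compiled regex.

    Approximation: '**' may cross '/' boundaries, '*' and '?' may not.
    Brace groups ({a,b}) and character classes ([...]) are not handled.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")        # any characters, including '/'
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # stays inside one folder level
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")      # exactly one non-'/' character
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(key, patterns):
    """True if an S3 key (relative to the include path) matches any pattern."""
    return any(glue_glob_to_regex(p).fullmatch(key) for p in patterns)


if __name__ == "__main__":
    patterns = ["**_SUCCESS", "**crc"]
    for key in ["year=2017/month=11/_SUCCESS",
                "year=2017/month=11/.part-0000.parquet.crc",
                "year=2017/month=11/sometype-2017-11-01.parquet"]:
        print(key, is_excluded(key, patterns))
```

Under this model a single-star pattern like *crc* only matches files at the top level of the include path, which is why deeper partitions need ** or an explicit */ prefix per level.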
Make sure you set up logging for Glue; it quickly points out permission errors and the like.
Use the Create a Single Schema for Each Amazon S3 Include Path option to avoid the AWS Glue Crawler adding all these extra tables.
I had this problem and ended up with ~7k tables 😅, so I wrote the following script to remove them. It requires jq.
#!/bin/sh
# Collect the names of the unwanted tables (one quoted name per line)
aws glue get-tables --region <YOUR AWS REGION> --database-name <YOUR AWS GLUE DATABASE> | jq '.TableList[] | .Name' | grep <A PATTERN THAT MATCHES YOUR TABLENAMEs> > /tmp/table-names.json
cd /tmp
mkdir -p table-names
cd table-names
# Split the list into batches of 50 names (files xaa, xab, ...)
split -l 50 ../table-names.json
# Delete each batch with one batch-delete-table call
for f in *; do
  tr '\r\n' ' ' < "$f" | xargs aws glue batch-delete-table --region <YOUR AWS REGION> --database-name <YOUR AWS GLUE DATABASE> --tables-to-delete
done
Check whether you have empty folders inside. When Spark writes to S3, the _temporary folder is sometimes not deleted, which makes the Glue crawler create a table for each partition.
I was having the same problem.
I added *crc* as an exclude pattern to the AWS Glue crawler and it worked.
Or, if you crawl entire directories, add */*crc*.
My case was a little different, but I was seeing the same behaviour.
I had a data structure like this:
├── bucket
│   ├── somedata
│   │   ├── event_date=2016-01-01
│   │   ├── event_date=2016-01-02
When I started the AWS Glue Crawler, instead of updating the tables, it created one table per date. After digging into the problem, I found that someone had mistakenly added a column named ID (instead of id) to the JSON file. Because my data is stored as Parquet, the pipeline kept working: it stored the data and read it back in EMR without problems. Glue, however, failed badly, probably because it converts all column names to lowercase, so ID and id collided. After removing the uppercase column, Glue started working like a charm.
You need to have separate crawlers for each table / file type. So create one crawler that looks at s3://bucket/somedata/ and a 2nd crawler that looks at s3://bucket/moredata/.
I have the directory structure below:
home/
└── jobs/
    ├── jobname1/
    │   ├── builds/
    │   └── config.xml
    └── jobname2/
        ├── builds/
        └── config.xml
There are many job folders under the jobs folder.
I want to copy all config.xml files to another backup directory while preserving the jobname folder structure.
Could you please help me understand how to use the shutil library efficiently for this?
My target Backup folder structure looks like this:
Backup/
└── jobs/
    ├── jobname1/
    │   └── config.xml
    └── jobname2/
        └── config.xml
Please help.
Here is my actual requirement.
Algorithm for the Python 2.7 script:
1. git clone the Backup folder, where Backup is an empty folder
2. Find all files named "config.xml" in the HOME folder
3. Copy all config.xml files to the Backup folder without changing the folder structure
3.1. If config.xml is a new file, git add it to the repository, commit, then git push
3.2. If config.xml already exists and is unchanged, no git push is required
3.3. If config.xml already exists and is modified, commit, then git push
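Steps 3.1 to 3.3 boil down to asking git what state each copied file is in. A minimal sketch, assuming Python 3.7+ (for capture_output and text) rather than 2.7, and using the two-character status codes of git status --porcelain: "??" for untracked files, empty output for unchanged files. The helper names are hypothetical; a file that was already staged (code "A ") falls into "modified", which leads to the same commit-and-push action.

```python
import subprocess


def classify_status(porcelain_output):
    """Map `git status --porcelain -- <file>` output to steps 3.1-3.3.

    Empty output      -> "unchanged" (3.2: nothing to commit or push)
    '??' status code  -> "new"       (3.1: git add, commit, push)
    anything else     -> "modified"  (3.3: commit, push)
    """
    line = porcelain_output.strip("\n")
    if not line:
        return "unchanged"
    if line[:2] == "??":
        return "new"
    return "modified"


def git_status(repo_dir, path):
    """Run git status for one path inside repo_dir (needs git on PATH)."""
    result = subprocess.run(
        ["git", "status", "--porcelain", "--", path],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Usage would be classify_status(git_status(backup_dir, "jobs/jobname1/config.xml")), then running git add, commit and push only for "new" or "modified".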
I worked out the logic; maybe it will be useful to someone. Apologies for the novice-level code.
# Module-level imports; walkfs is a method of the backup class
import fnmatch
import os
import shutil

def walkfs(self, findfile):
    ## dictionary to store full path of each source findfile
    matches_fullpath_dict = {}
    ## dictionary to store relative path of each findfile
    matches_trunc_dict = {}
    ## dictionary to store full path of each target file location
    matches_target_fullpath_dict = {}
    i = 0
    source = self.jenkins_Home
    destination = self.backupRepositoryPath
    for root, dirnames, filenames in os.walk(source):
        for filename in fnmatch.filter(filenames, findfile):
            i = i + 1
            matches_fullpath_dict[i] = os.path.join(root, filename)
            matches_trunc_dict[i] = os.path.join(root.replace(source, ""), filename)
            matches_target_fullpath_dict[i] = os.path.join(os.path.sep, destination + root.replace(source, ""))
    keys = matches_target_fullpath_dict.keys()
    for key in keys:
        if not os.path.exists(matches_target_fullpath_dict.get(key)):
            try:
                os.makedirs(matches_target_fullpath_dict.get(key))
            except OSError:
                pass
    keys = matches_target_fullpath_dict.keys()
    for key in keys:
        shutil.copy2(matches_fullpath_dict.get(key, None), matches_target_fullpath_dict.get(key, None))
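For comparison, a more compact sketch of the same walk-and-copy step, assuming Python 3 (pathlib and mkdir(exist_ok=True) are not available in the 2.7 the question targets). backup_files is a hypothetical name; it copies only the matched files, so the builds/ folders are not recreated in the backup.

```python
import shutil
from pathlib import Path


def backup_files(source, destination, filename="config.xml"):
    """Copy every `filename` under `source` into `destination`,
    recreating its relative folder structure. Returns the copied paths."""
    source = Path(source)
    destination = Path(destination)
    copied = []
    for src_file in source.rglob(filename):
        if not src_file.is_file():
            continue  # skip any directory that happens to share the name
        # Path relative to the source root, e.g. jobs/jobname1/config.xml
        target = destination / src_file.relative_to(source)
        # Create intermediate folders such as jobs/jobname1/ if needed
        target.parent.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps and permission bits
        shutil.copy2(src_file, target)
        copied.append(target)
    return copied
```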
I have following directory structure:
A
|
|--B--Hello.py
|
|--C--Message.py
Now, if the path of the root directory A is not fixed, how can I import "Hello.py" from B into "Message.py" in C?
First, I suggest adding an empty __init__.py file to every directory that contains Python sources. This prevents many import issues, because this is how packages work in Python.
In your case, it should look like this:
A
├── B
│ ├── Hello.py
│ └── __init__.py
├── C
│ ├── Message.py
│ └── __init__.py
└── __init__.py
Let's say the Hello.py contains the function foo:
def foo():
    return 'bar'
and the Message.py tries to use it:
from ..B.Hello import foo
print(foo())
The first way to make it work is to let the Python interpreter do its job and handle the package structure, by running the module with -m from the directory that contains A:
~ $ python -m A.C.Message
Another option is to add the parent directory (A) to sys.path, the list of places Python searches for modules, so that B.Hello can be found:
# Message.py file
import sys, os
sys.path.insert(0, os.path.abspath('..'))
from B.Hello import foo
print(foo())
In this case you can execute it with
~/A/C $ python Message.py
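os.path.abspath('..') is resolved against the current working directory, so the sys.path approach above only works when Message.py is started from inside A/C. A sketch that derives A from the script's own location instead, so it works wherever A lives; the helper names are hypothetical.

```python
import os
import sys


def project_root(module_file, levels=2):
    """Return the directory `levels` above module_file.

    For A/C/Message.py, levels=2 yields A, the folder containing B/ and C/.
    """
    path = os.path.abspath(module_file)
    for _ in range(levels):
        path = os.path.dirname(path)
    return path


def ensure_on_path(directory):
    """Prepend directory to sys.path unless it is already there."""
    if directory not in sys.path:
        sys.path.insert(0, directory)


# In Message.py this would be used as:
#     ensure_on_path(project_root(__file__))
#     from B.Hello import foo
```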